Formal Languages
Introduction to Formal Languages and Lambda Calculus
Key Concepts Introduced by Noam Chomsky
Noam Chomsky developed a theory of formal languages, which classifies languages into four categories based on their complexity and the type of automata that recognize them.
The Chomsky Hierarchy: Language Classes
-
Regular Languages (Type 3)
- Recognized by deterministic finite automata (DFA).
- Defined using regular expressions.
- Limitations:
- Can only describe simple patterns (e.g., sequences, repetitions).
- Lack the power to express nested or hierarchical structures.
- Example: Identifying whether a string contains "ab" using a regular expression.
-
Context-Free Languages (Type 2)
- Recognized by non-deterministic pushdown automata (PDA).
- Most programming languages fall under this category.
- Defined using context-free grammars (CFG).
- Recognition Process:
- Given a string, an algorithm determines if it belongs to the language.
- Involves parsing strategies with varying complexity.
- Example: Arithmetic expressions with balanced parentheses. Parsing HTML or XML.
-
Context-Sensitive Languages (Type 1)
- Recognized by bounded Turing machines, where the tape is limited in both directions.
- More powerful than context-free grammars but computationally expensive.
- Example: Grammar rules where production depends on surrounding symbols. Natural language processing (NLP) requiring context-sensitive grammar.
-
Recursively Enumerable Languages (Type 0)
- Recognized by Turing machines with unlimited tape.
- The most general and powerful type, covering all computable languages.
- Example: Any problem solvable by a computer with no constraints. AI computations.
The Role of Recursion in Language Definition
- Recursion is the backbone of language definition and recognition.
- Recursion is believed to be an inherent feature of human cognition, making natural languages potentially describable by Turing machines.
- Formal grammars often rely on recursive structures to describe complex patterns.
Defining Grammars with Backus-Naur Form (BNF)
BNF is a formal notation used to define the syntax of programming languages through recursion.
Example: Defining Numbers Using BNF
<digit> ::= 0 | 1 | 2 | ... | 9
<number> ::= <digit> | <digit><number>
Interpretation:
- A number can be a single digit or a digit followed by another number.
- Recursive expansion allows generating multi-digit numbers.
Key Point:
To prevent infinite recursion, a base case is essential (e.g., defining the digit separately).
Bad Example: Incorrect BNF Definition
<number> ::= <number><digit>
Why is this incorrect?
- Missing base case: This definition lacks a non-recursive base case, causing an infinite recursion when trying to derive a number.
- Evaluation issue: Parsing a number would require infinite lookahead without reaching a termination point.
- Example of failure:
- If we start with
<number>, it expands to<number><digit>, which again expands to<number><digit>, leading to an infinite loop.
- If we start with
Correcting the Issue:
To fix this, the definition should include a termination condition by allowing <number> to be a single <digit> without further recursion.
Application of Formal Grammars in Programming Languages
- Programming languages like Scala and Python use context-free grammars (CFG).
- BNF and Extended BNF (EBNF) are widely adopted in compiler design to describe syntax.
Example:
The syntax of Scala can be fully described using BNF notation, highlighting the flexibility of context-free grammars.
BNF in Programming Language Syntax
- Programming languages like Scala and Python define their syntax using BNF or Extended BNF (EBNF).
- Example (Python BNF syntax fragment):
<expression> ::= <term> "+" <expression> | <term> <term> ::= <factor> "*" <term> | <factor> <factor> ::= "(" <expression> ")" | <digit>
Key Takeaways:
- The BNF representation allows developers to formally describe the structure of expressions, statements, and blocks.
- Many programming language compilers and interpreters rely on CFGs and BNF to parse source code efficiently.
Introduction to Lambda Calculus
What is Lambda Calculus?
Lambda calculus is a formal system in mathematical logic and computer science for expressing computation based on function abstraction and application. It serves as the foundation for functional programming languages and provides a simple yet powerful model of computation.
Key Characteristics:
- Treats functions as first-class entities.
- Avoids mutable state and imperative control flow.
- Relies on function application as the primary means of computation.
Historical Background
Lambda calculus was introduced by Alonzo Church in the 1930s as a foundation for formalizing computation. It was initially developed as part of an attempt to understand the nature of functions and mathematical logic.
Real-World Impact:
- Forms the theoretical underpinning of languages such as Haskell, Scala, and Lisp.
- Provides insights into functional programming paradigms such as higher-order functions and immutability.