Skip to content

Latest commit

 

History

History
625 lines (332 loc) · 37.6 KB

Theory of Computation.md

File metadata and controls

625 lines (332 loc) · 37.6 KB
typora-copy-images-to
./Diagrams

Theory of Computation

Automata: The Methods and the Madness

Automata - a machine which performs a range of functions according to a predetermined set of coded instructions.

Automata theory is the study of abstract computing devices, or "machines".

Finite Automata - Simpler kinds of machines in the 1940s and 1950s.

The Central Concepts of Automata Theory

Alphabet

An alphabet is a finite nonempty set of symbols.

Conventionally, we use the symbol Σ for an alphabet.

Common alphabets include -

  • ​ ∑={0,1} - The binary alphabet.
  • ​ ∑={a,b,c....,z} - The set of all lower case letters

String

A String or sometimes word is a finite sequence of symbols chosen from some alphabet. For example 01101 is a string from the binary alphabet ∑={0,1}

The empty string is the string with zero occurences of symbols. This string, denoted by ε, is a string that maybe chosen from any alphabet whatsoever.

Length of a String - number of positions for symbols in the string. For instance, 01101 has length 5. It is common to say that the length of a string is "the number of symbols" in the string; this statement is colloquially accepted but not strictly correct.

Standard notation for the length of a string w is |w|.

For example, |011|=3 &|ε|=0.

Powers of an alphabet:

If Σ is an alphabet, we can express the set of all strings of a certain length from that alphabet by using an exponential notation.

We define (Σ)^k^ to be the set of strings of length k, each of whose symbols is in Σ.

Ex - (Σ)^0^ = {ε}, regardless of what alphabet Σ is. ε is the only string whose length is 0.

If Σ={0,1}, Σ^2^ = {00,01,10,11}, Σ^3^ = {000,001,010,011,100,101,110,111} and so on.

The set of all strings over an alphabet Σ is denoted by Σ^*^. For instance,

Σ={0,1} => Σ^*^ = {ε,0,1,00,01,10,11,000,…}

i.e Σ^*^= Σ^0^ U Σ^1^ U Σ^2^ U ...

Sometimes we exclude the empty string from the set of strings. The set of nonempty strings from alphabet Σ is denoted by Σ^+^. Thus, two appropriate equivalences are -

Σ^+^ = Σ^1^ U Σ^2^ U Σ^3^ ….

Σ^*^ = Σ^+^ U {ε}

Concatenation of Strings

Let x and y be strings. xy denotes the concatenation of x and y, that is the string formed by making a copy of x and following it a copy of y. More precisely, if x is the string composed of i symbols

x = a1 a2 a3 … ai and y is the string composed of j symbols y = b1 b2 b3 … bj , then xy is the string of length i+j : xy = a1 a2 a3 … ai b1 b2 b3 … bj .

ε is the identity for concatenation, since when concatenated with any string it yields the other string as a result (analogous to 0 for addition). For any string w, εw = = w hold.

Reverse of Strings

Languages

A set of strings all of which are chosen from some Σ^*^ , where Σ is a particular alphabet, is called a language.

If Σ is an alphabet, and L $\subseteq$ Σ^*^, then L is a language over Σ. A language over Σ need not include strings with all the symbols of Σ, so once we have established that L is a language over Σ, we also know that it is a language over any alphabet that is a superset of Σ.

Common languages can be viewed as sets of strings.

An example is English, where the colelction of legal English words is a set of strings over the alphabet that consist of collection of legal English words is a set of strings over the alphabet that consists of all the letters.

Another example is C, or any other programming language, where the legal programs are a subset of the possible strings that can be formed from the alphabet of the language, i.e syntax in this case. This alphabet is a subset of the ASCII characters.

  1. The language of all strings consisting of n 0s followed by n 1s for some n ≥ 0 : {ε,01,0011,000111,…}.

  2. The set of strings of 0's and 1's with an equal number of each: {ε,01,10,0011,0101,1001,...}

  3. The set of binary numbers whose value is a prime: {10,11,101,111,1011,...}

  4. Σ^*^ is a language for any alphabet Σ.

  5. Φ, the empty language, is a language over any alphabet. {ε}, the language consisting of only the empty string, is also a language over any alphabet. Notice that Φ ≠ {ε}; the former has no strings and the latter has one string.

The only important constraint on what can be a language is that all alphabets are finite.

Thus languages, although they can have an infinite number of strings, are restricted to consist of strings drawn from one fixed, finite alphabet.

In automata theory, a problem is the question of deciding whether a given string is a member of some particular language.

Finite Automata

A finite automaton has a set of states and its control moves from state to state in response to external inputs.

One of the crucial distinctions among classes of finite automata is whether that control is deterministic meaning that the automaton cannot be in more than one state at any one time or nondeterministic meaning that it may be in several states at once.

Adding nondeterminism does not let us define any language that cannot be defined by a deterministic finite automaton, but there can be a substantial efficiency in describing an application using a nondeterministic automaton.

In effect, nondeterminism allows us to "program" solutions to problems using a higher-level language. The nondeterministic finite automaton is then "compiled", by an algorithm, into a determinstic automaton that can be "executed" on a functional computer.

Deterministic Finite Automata

A deterministic finite automaton consists of:

  1. A finite set of states, often denoted Q.

  2. A finite set of input symbols, often denoted Σ.

  3. A transition function that takes as arguments a state and a n input symbol and returns a state. The transition function will commonly be denoted δ.

    δ: Q × ∑ → Q

If q is a state, and a is an input symbol, then δ(q,a) is a state p such that there is an arc labeled a from q to p.

  1. A start state (q0), one of the states in Q. (q0 ∈ Q)
  2. A set of final or accepting states F, where F $\subset$ Q.

A deterministic finite automaton will often be referred to by its acronym: DFA.

Five-tuple notation: A = (Q, Σ, δ, q0, F)

Two preferred notations for describing automata:

  1. Transition diagram (graph)
  2. Transition table (Tabular listing of the δ function)

Transition Diagrams

A trlansition diagram for a DFA A = (Q, Σ, δ, q0, F) is a graph defined as follows:

a) For each state in Q there is a node.

b) For each state q in Q and each input symbol ain b, let δ(q,a) = p.

Then the transition diagram has an arc from node q to node p,labeled a. If there are several input symbols that cause transitions from q to p, then the transition diagram can have one arc, labeled by the list of these symbols.

c) There is an arrow into the start state q0, labeled Start. This arrow does not originate at any node.

d) Nodes corresponding to accepting states (those in F) are marked by a double circle. States not in F have a single circle.

Transition Tables

A transition table is a conventional, tabular representation of a function like δ that takes two arguments and returns a value. The rows of the table correspond to the states , and the columns correspond to the inputs. The entry for the row corresponding to state q and the column corresponding to input a is the state δ(q,a).

How a DFA Processes Strings

The language of the DFA is the set of all strings that the DFA accepts.

Suppose a1 a2 a3 … an is a sequence of input symbols. We start out with the DFA in its start state, q0.

We consult the transition function δ, say δ(q0,a1) = q1 to find a state that the DFA enters after processing the first input symbol a1.

We continute in this manner for the symbols that follow in the sequence, finding states q3,q4,q5,….qn such that δ(qi-1,ai) = qi for each i.

If qn is a member of F, then the input sequence is accepted, and if not then it is "rejected".

Example 2.1: Let us formally specify a DFA that accepts all and only the strings of 0 and 1 that have the sequence 01 somewhere in the string. We can write this language L as:

{ w | w is of the form x01y for some strings x and y consisting of 0's and 1 's only}

Another equivalent description, using parameters x and y to the left of the vertical bar, is:

{ x01y | x and y are any strings of 0's and l's}

Examples of strings in the language include 01, 11010, and 100011.

Examples of strings not in the language include ε, 0, and 111000.

What do we know about an automaton that can accept this language L?

First, its input alphabet is Σ = {0,1}. It has some set of states, Q, of which one, say q0, is the start state. This automaton has to remember the important facts about what inputs it has seen so far. To decide whether 01 is a substring of the input, A needs to remember:

  1. Has it already seen 01? If so, then it accepts every sequence of further inputs; i.e., it will only be in accepting states from now on.
  2. Has it never seen 01, but its most recent input was 0, so if it now sees a 1, it will have seen 01 and can accept everything it sees from here on?
  3. Has it never seen 01, but its last input was either non-existent (it just started) or it has last seen a 1? In this case, A cannot accept until it first sees a 0 and then sees a 1 immediately after.

These three conditions can each be represented by a state. Condition (3) is represented by the start state, q0. Surely, when just starting, we need to see a 0 and then a 1. But if in state q0 we next see a 1, then we are no closer to seeing 01, and so we must stay in state q0.

i.e, δ(q0, 1) = q0.

However, if we are in state q0 and we next see a 0, we are in condition (2).

That is, we have never seen 01, but we have our 0. Thus, let us use q2 to represent condition (2). Our transition from q0 on input 0 is -

δ(q0, 0) = q2.

Now, let us consider the transitions from state q2. If we see a 0, we are nobetter off than we were, but no worse either. We have not seen 01, but 0 was the last symbol, so we are still waiting for a 1. State q2 describes this situation perfectly, so we want δ(q2,0) = q2.

If we are in state q2 and we see a 1 input, we now know there is a 0 followed by a 1. We can go to an accepting state, which we shall call q1, and which corresponds to condition (1) above.

i.e, δ(q2, 1)=q1·

Finally, we must design the transitions for state q1. In this state, we have already seen a 01 sequence, so regardless of what happens, we shall still be in a situation where we've seen 01.

i.e, δ(q1, 0) = δ(q1, 1) = q1.

Thus, Q = {q0, q1, q2}.

As we said, q0 is the start state, and the only accepting state is q1;

i.e, F = {q1}.

The complete specification of the automaton A that accepts the language L of strings that have a 01 substring, is

A = ({q0, q1 ,q2}, {0,1}, δ, q0, {q1})

where δ is the transition function described above.

Transition Diagram:

![Screen Shot 2018-02-17 at 20.46.02](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Figure 2.4.png)

Transition Table:

![Screen Shot 2018-02-17 at 20.45.52](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Transition Table.png)

Extending the Transition Function:

We have explained informally that the DFA defines a language: the set of all strings that result in a sequence of state transitions from the start state to an accepting state. In terms of the transition diagram, the language of a DFA is the set of labels along all the paths that lead from the start state to any accepting state.

Now, we need to make the notion of the language of a DFA precise. To do so,we define anextended transition function that describes what happens when we start any state and follow any sequence of inputs. If δ is our trnasition function, then the extended function contructed from δ will be called ð.

The extended transition function is a function that takes a state q and a string w and returns a state p - the state that the automaton reaches when starting in state q and processing the sequence of inputs w. We define ð by induction on the length of the input string, as follows:

Basis: ð(q,ε) = q. That is, if we are in state q and read no inputs, then we are still in state q.

Induction: Suppose w is a string of the form xa; that is, α is the last symbol of w, and x is the string consisting of all but the last symbol. For example, w = 1101 is broken into x = 110 and a = 1.

Then,

ð(q, w)=δ(ð(q, x), a)

To compute ð(q,w), first compute ð(q,x), the state that the automaton is in after processing all but the last symbol of w. Suppose this state is p; that is ,ð(q,x)=p. Then ð(q,w) is what we get by making a transition from state p on input a, the last symbol of w. i.e, ð(q,w) = δ(p,a).

![Screen Shot 2018-02-17 at 20.34.01](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Example 2.4.png)

![Screen Shot 2018-02-17 at 20.34.58](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Example 2.4 - II.png)

The Language of a DFA

Now, we can define the language of a DFA A = (Q, Σ, δ, q0, F). This language is denoted L(A), and is defined by -

L(A) = { w | ð(q0, w) is in F }

That is, the language of A is the set of strings w τhat take the start state q0 to one of the accepting states.

If L is L(A) for some DFA A, then we say L is a regular language.

Minimizing a DFA

A DFA is said to be minimzed when the removal of any state ε Q, changes the language accepted by it.

So if you find a state in a DFA, whose removal doesn't change the language accepted by it, the DFA is not the minimized DFA for the language.

Before we think about removing states from a DFA, let's talk about the types of states that a DFA has in the context of minimization.

Every Minimal DFA for a language is unique. Why? Let's prove it.

There are two ways to minimize a DFA -

(i) Partitioning Method (uses more of common sense)

(ii) Table Filling Method (application of the Myhill-Nerode Theorem)

Nondeterministic Finite Automata

A "nondeterministic" finite automaton (NFA) has the power to be in several states at once.

This ability is often expressed as an ability to "guess" something about its input.

For instance, when the automaton is used to search for certain sequences of characters (e.g., keywords) in a long text string, it is helpful to "guess" that we are at the beginning of one of those strings and use a sequence of states to do nothing but check that the string appears, character by character.

The NFA's accept exactly the regular languages, just as DFA's do. However, there are reasons to think about NFA's. They are often more succinct and easier to design than DFA's. Moreover,while we can always convert an NFA to a DFA, the latter may have exponentially more states than the NFA; fortunately, cases of this type are rare.

Like the DFA, an NFA has a finite set of states, a finite set of input symbols, one start state and a set of accepting states. It also has a transition function, which we shall commonly call δ .

The difference between the DFA and the NFA is in the type of δ.

For the NFA, δ is a function that takes a state and input symbol as arguments (like the DFA's transition function), but returns a set of zero ,one ,or more states (rather than returning exactly one state,as the DFA must).

An NFA is represented essentialy like a DFA:

A = (Q, Σ, δ, q0, F)

where:

  1. Q is a finite set of states.

  2. Σ is a finite set of input symbols.

  3. q0, a member of Q, is the start state.

  4. F, a subset of Q, is the set of final or accepting states.

  5. δ, the transition function that takes a state in Q and an input symbol in Σ as arguments and returns a subset of Q. (returns a single state in the case of a DFA)

    ![Screen Shot 2018-02-17 at 20.36.52](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Figure 2.9.png)

Given figure shows a nondeterministic finite automaton, whose job is to accept all and only the strings of 0's and 1's that end in 01. State q0 is the start state, and we can think of the automaton as being in state q0 (perhaps among other states) whenever it has not yet "guessed" that the final 01 has begun. It is always possible that the next symbol does not begin thefinal 01, even if that symbol is 0. Thus, state q0 may transition to itself on both 0 and 1.

However, if the next symbol is 0, this NFA also guesses that the final 01 has begun. An arc labeled 0 thus leads from q0 to state q1. Notice that there are two arcs labeled 0 out of q0. The NFA has the option of going either to q0 or to q1, and in fact it does both, as we shall see when we make the definitions precise. In state q1, the NFA checks that the next symbol is 1, and if so, it goes to state q2 and accepts.

Notice that there is no arc out of q1 labeled 0, and there are no arcs at all out of q2. In these situations, the thread of the NFA's existence corresponding to those states simply "dies," although other threads may continue to exist.

While a DFA has exactly one arc out of each state for each input symbol, an NFA has no such constraint; we have seen in the figure cases where the number of arcs is zero, one, and two, for example.

![Screen Shot 2018-02-17 at 20.38.06](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Figure 2.10.png)

Above figure suggests how an NFA processes inputs. We have shown whathappens when the automaton of the given NFA receives the input sequence 00101. It starts in only its start state, q0. When the first 0 is read, the NFA may go toeither state q0 or state q1, so it does both.

Then, the second 0 is read. State q0 may again go to both q0 and q1. However, state q1 has no transition on 0, so it "dies." When the third input, a1, occurs, we must consider transitions from both q0 and q1.

We find that q0 goes only to q0 on 1, while q1 goes only to q2. Thus, after reading 001, the NFA is in states q0 and q2. Since q2 is an accepting state, the NFA accepts 001.

However, the input is not finished. The fourth input, a 0, causes q2's thread to die, while q0 goes to both q0 and q1. The last input, a 1, sends q0 to q0 and q1 to q2. Since we are again in an accepting state, 00101 is accepted.

The given NFA can be specified formally as ({q0,q1,q2},{0, 1},δ,q0, {q2}) where the transition function δ is given by the transition table below:

![Screen Shot 2018-02-17 at 20.40.03](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Figure 2.11.png)

Notice that transition tables can be used to specify the transition function for an NFA as well as for a DFA. The only difference is that each entry in the table for the NFA is a set, even if the set is a singleton (has one member). Also notice that when there is no transition at all from a given state on a given input symbol, the proper entry is 0, the empty set.

As for DFA's, we need to extend the transition function δ of an NFA to a function ð that takes a state q and a string of input symbols w, and returns the set of states that the NFA is in if it starts in state q and processes the string w.

The idea was suggested by the previous diagram, in essence ð(q, w) is the column of states found after reading w, lf q is the lone state in the first column.

For instance, it suggests that ð(q0, 001) = {q0, q2}. Formally, we define ð for an NFA's transition function ð by:

BASIS: δ(q, ε) = {q}. That is, without reading any input symbols, we are only in the state we began in.

INDUCTION: Suppose w is of the form w=xa, where a is the final symbol of w and x is the rest of w. Also suppose that ð(q, x) = {pl, p2, …, pk}.

Let ![Screen Shot 2018-02-17 at 20.49.45](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Union I.png)

Then ð(q, w)= {r1,r2,. . . ,rm}. Less formally, we compute ð(q, w) by first computing ð(q,x), and by then following any transition from any of these states that is labeled a.

Let us use ð to describe the processing of input 00101 by the NFA

  1. ð(q0, e) = {q0}.
  2. ð(q0, 0) = δ(q0, 0) = {q0, q1}.
  3. ð(q0, 00) = δ(q0, 0) ∪ δ(q1, 0) = {q0, q1} ∪ ø = {q0, q1}.
  4. ð(q0, 001) = δ(q0, 1) ∪ δ(q1, 1)= {q0} ∪ {q2} = {q0, q2}.
  5. ð(q0, 0010) = δ(q0, 0) ∪ δ(q2, 0) = {q0, q1} ∪ ø = {q0, q1}.
  6. ð(q0, 00101) = δ(q0, 1) ∪ δ(q1, 1) = {q0} ∪ {q2} = {q0, q2}.

The Language of an NFA

As we have suggested, an NFA accepts a string w if it is possible to make any sequence of choices of next state, while reading the characters of w, and go from the start state to any accepting state. The fact thtat other choices using the input symbols of w lead to a non-accepting state, or do not lead to any state at all (i.e, the sequence of states "dies"), does not prevent w from being accepted by the NFA as a whole. Formally, if A = (Q, Σ, δ, q0, F) is an NFA, then

L(A) = { w |ð(q0, w) ∩ F ≠ ø }

That is, L(A) is the set of strings in Σ^*^ such that ð(q0, w) contains at least one acceptìng state.

Note: Make sure you go through Example 2.9, to understand proof of acceptance of a language by an NFA through mutual induction.

Equivalence of Deterministic and Nondeterministic Finite Automata

Every languge that can be described by an NFA can also be described by some DFA. The DFA in practice has about as many states as the NFA, but it often has more transitions than the latter. In the worst case, the smallest DFA can have 2^n^ states, while the smallest NFA for the same language has only n states.

The proof that DFA's can do whatever NFA's can do involves an important "construction" called the subset construction because it involves constructing all subsets of the set of states of the NFA.

The subset construction starts from an NFA N = (QN, Σ, δΝ, q0, FN). The idea here is the description of a DFA D = (QD, Σ, δD, {q0}, FD) such that L(D) = L(N). The input alphabets of the two automata are same, and the star state of D is the set containing only the start state of N. The other components of D are constructed as follows -

  • QD is the set of subsets of QN; i.e, QD is the power set of QN. If QN has n states, QD will have 2^n^ states. Often, not all these states are accessible from the start state of QD, and hence can be "thrown away", so effectively, the number of states of D may be much smaller than 2^n^.

  • FD is the set of subsets S of QN such that S ∩ FN ≠ Φ. That is, FD is all sets of N's states that include at least one accepting state of N.

  • For each set S ⊆ QN and for each input symbol a in Σ,

    δD(S, a) = ∪p in S δΝ(p, a)

    That is, to comput δD(S, a) we look at all the states p in S, see what states N goes to from p on input a, and take the union of all those states.

Theorem 2.12 - A Language L is accepted by some DFA if and only if L is accepted by some NFA.

Finite Automata with Epsilon-Transitions

The new feature is that we allow a transition on ε, the empty string. In effect, an NFA is allowed to make a transition spontaneously, without recieveing an input symbol. It is known as ε-ΝFA. As mentioned before in the case of comparing NFAs and DFAs, ε-NFAs don't expand the class of languages that can be accepted, but rather offers extra programming convenience.

Each ε along a path is invisible, i.e, it contributes nothing to the string along the path.

When you supply ε as an input to a state, the state does not change and control remains in the same state, unless a specified transition for ε is defined in the automata.

We may represent an ε-ΝFA exactly as we do an NFA, with one exception: the transition function ust include information about transitions on ε.

Formally, we represent an ε-NFA A by -

​ A = (Q, Σ, δ, q0, F)

where all components have their same interpretation as for an NFA, except that δ is now a function that takes as arguments:

  1. A state in Q, and
  2. A member of Σ ∪ {ε}, that is, either an input symbol or the symbol ε.

We require that ε cannot be a member of the alphabet Σ, to avoid any confusion.

An interesting notion regarding ε-transitions is the fact that you can divide a system into separate entities using them, without having the need to connect each entity with a logical input.

One must also consider that, the power of ε-transitions in the context of non-determinism lies in the fact that when you are at a state that has multiple outgoing ε-transitions to a set of other states, you are basically present in those states parallely as well, since transitioning to those states requires no input. What I can make of this is the idea that ε-transitions give you a crude way to implement parallel processing.

Example: ε-ΝFA that accepts decimal numbers consisting of:

1. An optional + or - sign,
2. A string of digits,
3. A decimal point, and
4. Another string of digits. Either this string of digits, or the string in (2) can be empty, but at least one of the two strings of digits must be nonempty.

![Screen Shot 2018-02-20 at 12.57.56](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Figure 2.18.png)

Note the transition from q0 to q1 on any of ε, + or —. The state q1 represents the situation in which we have seen the sign if there is one, and perhaps some digits, but not the decimal point. State q2 represents the situation where we have just seen the decimal point, and may or may not have seen prior digits. In q4, we have definitely seen at least one digit, but not the decimal point.

At q3, we have seen a decimal point and at least one digit, either before or after the decimal point. We may stay in q3 reading whatever digits there are, and also have the option of "guessing" the string of digits is complete and going spontaneously to q5, the accepting state.

ε-closure

Informally, we ε-close a state q by following all transitions out of q that are labelled ε. However, when we get to other states by following ε, we follow the ε-transitions out of those states and so on, eventually finding every state that can be reached from q along any path whose arcs are all labelled ε.

ε-closure of a state is the set of all the states that can be reached on providing the input symbol ε to the state, recursively.

Formally, we define the (ε-closure) ECLOSE(q) recursively, as follows:

BASIS: State q is in ECLOSE(q).

INDUCTION: If state p is in ECLOSE(q), and there is a transition from state p to state r labelled ε, then r is in ECLOSE(q). More precisely, if δ is the transition function of the ε-NFA involved, and p is in ECLOSE(q), then ECLOSE(q) also contains all the states in δ(p, e).

In the above figure (2.18), each state is its own ε-closure, with two exceptions: ECLOSE(q0) ={q0, q1} and ECLOSE(q3) = {q3, q5}, as there are two ε-transitions defined for q1 and q3 that adds q1 to ECLOSE(q0) and q5 to ECLOSE(q3).

ε-closure of a set of states S, is equal to the union of ε-closures of the individual states.

Extended Transitions and Langues for ε-ΝFAs

The ε-closure basically tells you what the transition of an ε-NFA looks like when given a sequence of non-ε inputs. From there, you can contemplate what it means for an ε-ΝFA to accept its input.

Suppose that E = (Q, Σ, δ, q0, F) is an ε-ΝFA. We first define ð, the extended transition function, to reflect what happens on a sequence of inputs. The intent is that ð(q,w) is the set of states that can be reached along a path whose labels, when concatenated, form the string w.

As always, ε along the paths do not contribute to the w. The appropriate recursive definition of ð is -

BASIS: ð(q, ε) = ECLOSE(q), i.e - if the label of the path is ε, then we can follow only ε-labelled arcs extending from state q; exactly what ECLOSE does.

INDUCTION: Suppose w is of the form xa, where a is the last symbol of w. (Note that a is a member of Σ), it cannot be ε, which is not in Σ. ð(q,w) is computed as follows -

  1. Let {p1, p2, p3, … , pk} be ð(q,x). That is, the pi's are all and only the states that we can reach from q following the path labelled x. This path may end with one or more transitions labeled ε, and may have other ε-transitions, as well.
  2. Let ![Screen Shot 2018-02-25 at 00.04.55](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Union.png) be the set {r1, r2, … rm}. That is, follow all transitions labeled a from states we can reach from q along paths labeled x. The ri's are some of the states we cna reach from q along the paths labeled w. The additional states we can reach are found from rj's by following ε-labeled arcs in step(3), below.
  3. Then ð(q,w) = ECLOSE({r1, r2, … rm}). This additional closure step includes all the paths from q labelled w, by considering the possibility that there are additional ε-labeled arcs that we can follow after making a transition on the final "real" symbol, a.

Basically, you can infer the following from the above steps -

ð(q, ε) = ECLOSE(q)

For a symbol 'a', ð(q, a) = ECLOSE(δ(ð(q, ε), a))

For a sequence of symbols w = xa, ð(q, w) = ð(q, xa) = ECLOSE(δ(ð(q, x), a))

![Screen Shot 2018-02-25 at 00.25.10](/Users/vishhvaksrinivasan/Stuff/Course Stuff/TOC-Notes/Diagrams/Example 2.20.png)

L(E) = { w | ð(q0, w) F ≠ φ }

That is, the language of E is the set of strings w that take the start state to at least one accepting state.

Eliminating ε-Τransitions:

When you convert an ε-ΝFA to an NFA, the number of states remain the same.

You use ε-closure as a tool for this conversion process in the context of redefining the transition function.

ðNFA(qi, x) = ECLOSE(δε-ΝFA(ECLOSE(qi), x))

The states of the ε-NFA that contain the final state of the ε-ΝFA in it's ε-closure will be the final states of the converted NFA.

One can make sense of the above, with a proper understanding of ε-transitions.

Consider a case where you have an ε-ΝFA, and you are in a state qi. In order to determine the transition from qi upon the input x, you would also have to consider the states that qi leads to upon recieving ε, as they can somewhat be considered synonymous to qi in a very crude sense, as giving the input x to qi is also equivalent to giving x to those states as well, as they are just a null input away from qi. Hence you take ε-closure(qi) and then give those set of states the input x, and find their transitions. Once you find out the transitions of these set of states upon recieving an input x, you would then arrive at another set of state(s), after which you have to repeat the ε-closure for these states, as the states connected to these states upon ε are the states that are reached in the end, for the NFA we are trying to construct without these ε transitions. Similarly, the final states of the converted NFA are constructed upon the same logic.

Regular Expressions

  • Form of language-defining notation.
  • Closely related to nondeterministic finite automata, can be though of as a user-friendly alternative.
  • Capable of defining all and only the regular languages.
  • Algebraic descriptions of languages.
  • Defines the same languages that other forms of automata define - regular langugages.
  • Serve as input language for many systems that process strings.

Example - 01 + 10** - represents the language consisting of strings either starting with a single 0 and followed by any number of 1's, or a single 1 followed by any number of 0's.

Operators of Regular Expressions

Before describing the regular-expression notation, we need to learn the three operations on languages that the operators of regular expressions represent -

  1. The union . of two languages L and M, denoted by L ∪ M, is the set of strings that are either in L or M, or both. For example, if L = {001, 10, 111} and M = {ε, 001}, then L ∪ M = {ε, 10, 001, 111}.

  2. Τhe concatenation of languages L and M is the set of strings that can be formed by taking any string in L and concatenating it with any string in M. Taking the languages mentioned in (1), L.M or LM = {10, 001, 111, 10001, 001001, 111001}.

  3. The closure (or star, or Kleene closure) of a language L is denoted L* and represents the set of those strings that can be formed by taking any number of strings from L, possibly with repetitions (i.e, the same string may be selected more than once), and concatenating all of them. For instance, if L = {0,1}, then L* is all strings of 0's and 1's. If L = {0, 11}, then L* consists of those strings of 0's and 1's such that 1's come in pairs, e.g - 011, 11110, and ε, but not 010111, or 101.

    More formally,

    L is the inifnite union ∪i≥0 L^i^, where L^0^= {ε} and L^i^, for i > 1, is LL…L (the concatenation of i copies of L).*

Note: Go through Example 3.1 (page 87), in the book to get a better idea of Kleene Closure.

Use of the Star Operator

We saw the star operator first, where we applied it to an alphabet, e.g Σ*. That operator formed all strings whose symbols were chosen from alphabet Σ. Τhe closure operator is basically the same, although there is a very small difference.

Say L is the language consisting of strings of length 1, and for each symbol s in Σ, there is a string s, in L. Then, even though L and Σ look the same, they are of different types. L is a set of strings, and Σ is a set of symbols. On the other hand, L* denotes the same language as Σ*.

Expressions and their Languages - Strictly speaking, a regular expression E is just an expression, and not a language, and we should ideally be using L(E) when speaking of the language of the regular expression. However, we shall use the commonly used or refferred convention that employs the usage of "E" when we really mean "L(E)".

Building Regular Expressions

Algebra of regular expression follows the basic patten followed in any algebra, using constants and variables that denote languages, and operators for the three operations described above, namely - union, dot and star. The recursive definition of RE's goes as -

BASIS:

  1. The constants ε and Φ are regular expressions, denoting the languages {ε} and Φ respectively. That is, L(ε) = {e}, and L(Φ) = Φ.
  2. Ιf a is any symbol, then a is a regular expression. This denotes the language {a}, i.e, L(a) = {a}.
  3. A variable, usually capitalized (and maybe italic), such as L, is a variable representing any language.

INDUCTION:

  1. If E and F are regular expressions, E + F is a regular expression denoting the union of L(E) and L(F).

    L(E+F) = L(E) + L(F)

  2. If E and F are regular expressions, EF is a RE denoting the concatenation of L(E) and L(F).

    L(EF) = L(E)L(F)

  3. If E is a regular expression, then E* is a regular expression, denoting the Kleene Closure (or just Closure) of L(E).

    L(E*) = (L(E))*

  4. If E is a regular expression, then (E) is also a regular expression.

L((E)) = L(E)

Example 3.2: Write a Regular expression for the set of strings that consist of alternating 0's and 1's.

There are 2 cases, with 2 subcases each.

  1. The string begins with 0.

    a. The string ends with 1.

    b. The string ends with 0.

  2. The String begins with 1.

    a. The string ends with 1.

    b. The string ends with 0.

Case 1 (a) = (01), Case 1(b) = 0(10) Case 2(a) = 1(01), Case 2(b) = 1(10)

​The overall regular expression is the union of the expressions produced in all cases -

(01)* + (10)* + 1(01)* + 0(10)*

There is another approach to this problem, mentioned in the book. Being the reader, I would suggest you try to think. (Hint: Use ε!)

Precedence of Regular Expression Operators

Like other algebras, RE operators have an assumed order of "precedence", as follows -

  1. .

i.e - ==*== > ==.== > ==+==

You begin with the star operator first. The star operator applies to the smallest sequence of symbols to its left that is a well-formed expression.

The concatenation operator comes next. Without the "dot" operator, all expressions that are juxtaposed are grouped together. Concatenation is associative. It doesn't matter in what order you group consecutive concats, although if there is a choice to be made, you would want to group them from the left. For instance, 012 is grouped as (01)2.

Finally, all unions (+ operators) are grouped with their operands. Union is also associative. Order of grouping doesn't matter, although we shall assume grouping from the left.

We are free to use parantheses as well to group operands exactly as we choose. You can also put parantheses around operands you want to group, even if the desired grouping is implied by the rules of precedence.

To have a thorough understanding of the use of precedence and parantheses, make sure you go through Example 3.3 (page-91).

A language is said to be regular if there exists a regular expression to represent it.

Finite Automata and Regular Expressions

​ ​

​ ​