唧唧复唧唧，十日九风雨。
# Regular language
Language is the set of strings.
# Finite automata
Definition(Finite automaton)
A finite automaton is a 5tuple $(Q,\Sigma,\delta,q_0,F)$, where
 $Q$ is a finite set called the states.
 $\Sigma$ is a finite set called the alphabet.
 $\delta:Q\times\Sigma\rightarrow Q$ is the transition function. (Not partial function!)
 $q_0\in Q$ is the start state.
 $F\subseteq Q$ is the set of accept states.
If $A$ is the set of all strings that machine $M$ accepts, we say that $A$ is the
language of machine $M$ and write $L(M) = A$.
We say that $M$ recognizes $A$ or that M accepts A. Because the term accept has different meanings when we refer to machines accepting strings and machines accepting languages, we prefer the
term recognize for languages in order to avoid confusion.
A machine may accept several strings, but it always recognizes only one language.
Definition(Computation)
Let $M=(Q,\Sigma,\delta,q_0,F)$ be a finite automaton and let $w=w_1w_2...w_n$ be a string where each $w_i$ is a member of the alphabet $\Sigma$. Then $M$ accepts $w$ if a sequence of states $r_0,r_1,...,r_n$ in $Q$ exists with three conditions:
 $r_0=q_0$.
 $\delta(r_i,w_{i+1})=r_{i+1}$, for $i=0,...,n1$.
 $r_n\in F$.
We say that $M$ recognizes language $A$ if $A=\{wM\;accepts\;w\}$.
Definition(Regular language)
A language is called a regular language if some finite automaton recognizes it.
Definition
Let $A$ and $B$ be languages. We define the regular operations:

Union:
$A\cup B=\{xx\in A\;or\;x\in B\}$

Concatenation:
$A\circ B=\{xyx\in A,y\in B\}$

Star:
$A^*=\{x_1x_2...x_kk\geq 0,x_i\in A\}$
Theorem:
The class of regular languages is closed under the union operation.
Proof: Let $M_1=(Q_1,\Sigma_1,\delta_1,q_1,F_1)$ recognize $A_1$ and $M_2=(Q_2,\Sigma_2,\delta_2,q_2,F_2)$ recognize $A_2$. Then we construct $M=(Q,\Sigma,\delta,q,F)$ to recognize $A_1\cup A_2$.

$Q=\{(r_1,r_2)r_1\in Q_1,r_2\in Q_2\}$

$\Sigma=\Sigma_1\cup\Sigma_2$

$\delta((r_1,r_2),a)=(\delta_1(r_1,a),\delta_2(r_2,a))$

$q=(q_1,q_2)$

$F=\{(r_1,r_2)r_1\in F_1\;or\;r_2\in F_2\}$
# Nondeterminism
Definition(Nondeterministic finite automata)
A nondeterministic finite automaton is a 5tuple $(Q,\Sigma,\delta,q_0,F)$, where
 $Q$ is a finite set called the states.
 $\Sigma$ is a finite set called the alphabet.
 $\delta:Q\times(\Sigma\cup\{\varepsilon\})\rightarrow \mathcal{P}(Q)$ is the transition function.
 $q_0\in Q$ is the start state.
 $F\subseteq Q$ is the set of accept states.
Definition(Computation)
Let $M=(Q,\Sigma,\delta,q_0,F)$ be a nondeterministic finite automaton and let $w=w_1w_2...w_n$ be a string where each $w_i$ is a member of the alphabet $\Sigma$. Then $M$ accepts $w$ if we can write $w$ as $w=y_1y_2...y_m$, where $y_i\in\Sigma\cup\{\varepsilon\}$ and a sequence of states $r_0,r_1,...,r_m$ exists in $Q$ with three conditions:

$r_0=q_0$.

$r_{i+1}\in \delta(r_i,y_{i+1})$, for $i=0,...,m1$.

$r_m\in F$.
Theorem:
Every nondeterministic finite automaton has an equivalent deterministic finite automaton.
Proof: Let $N=(Q,\Sigma,\delta,q_0,F)$ be the NFA recognizing some language $A$. We construct a DFA $M=(Q',\Sigma,\delta',q_0',F')$ recognizing $A$. Before doing the full construction, let’s first consider the easier case when $N$ has no $\varepsilon$ arrows.

$Q'=\mathcal{P}(Q)$

$\delta'(R,a)=\bigcup_{r\in R}\delta(r,a)$

$q_0'=\{q_0\}$

$F'=\{R\in Q'R\;contains\;an\;accept\;state\;of\;N\}$
Now we need to consider the $\varepsilon$ arrows. To do so we set up an extra bit of notation. For any state $R$ of $N$ we define $E(R)$ to be the collection of states that can be reached from $R$ by going only along $\varepsilon$ arrows, including the members of $R$ themselves. We need to replace $\delta'(R,a)$ by
$\delta'(R,a)=\{q\in Qq\in E(\delta(r,a))\}$
and replace $q_0'$ with
$q_0'=E(\{q_0\})$
Example:
$I$  $I_a$  $I_b$ 

[1,3]  [1,3]  [2] 
[2]  [2,3]  [3] 
[2,3]  [1,2,3]  [3] 
[3]  [1,3]  [] 
[1,2,3]  [1,2,3]  [2,3] 
[]  []  [] 
Then the DFA is
Theorem:
The class of regular languages is closed under the union operation.
Proof: since the theorem has been proved before, here we give an informal proof: given two DFAs: $N_1$ and $N_2$, we can construct an NFA to recognize $L(N_1)\cup L(N_2)$:
Theorem:
The class of regular languages is closed under the concatenation operation.
Proof: Suppose we are given two DFAs: $N_1$ and $N_2$, we can construct an NFA to recognize $L(N_1)\circ L(N_2)$:
Theorem:
The class of regular languages is closed under the star operation.
Proof:
Theorem:
The class of regular languages is closed under the complement.
Proof: we can swap the accept states and other states in the DFA
Theorem:
The class of regular languages is closed under the intersection.
Proof: $L_1\cap L_2=\sim(\sim L_1\cup \sim L_2)$, since regular language is closed under union, complement, so it is closed under intersection.
# Regular expressions
Definition(regular expression)
Say that $R$ is a regular expression if $R$ is
 $a$ for some $a$ in the alphabet $\Sigma$
 $\varepsilon$
 $\emptyset$
 $(R_1\cup R_2)$, where $R_1,R_2$ are regular expressions
 $(R_1\circ R_2)$, where $R_1,R_2$ are regular expressions
 $(R_1)^*$, where $R_1$ is regular expressions
Example:
$1^*\circ\emptyset=\emptyset\\ \emptyset^*=\{\varepsilon\}\\ (\Sigma\Sigma)^*=\{wthe\;length\;of\;w\;is\;even\}\\ (01^+)^*=(01^+)(01^+)...(01^+)=\\ \{wevery\;0\;in\;w\;is\;followed\;by\;at\;least\;one\;1\}$
Theorem:
A language is regular if and only if some regular expression describe it.
Proof:
One direction: If a language is described by regular expression, then it is a regular language.
We need to construct NFA to recognize corresponding regular expressions, which is the same as the proof of closure under operations.
Example: building an NFA recognizing $(a\cup b)^*aba$
The other direction: If a language is regular, then it is described by a regular expression. I have my own method to do it. example:
 Step1. construct a new accept state, and add $\varepsilon$ arrows from accept states to the new state:
 Step2. remove selfloops. If a state $q$ has a selfloop with character $a$, then we need to add $a^*$ to every arrow from $q$ to other states. Specially if $q$ has no arrows from it, then you can just remove $q$.
 Step3. remove states. If we want to remove state $q$, then we need to enumerate all states $q_1,q_2$ which form following transitions:
$q_1\stackrel{R_1}{\rightarrow}q\stackrel{R_2}{\rightarrow}q_2$
where $R_1$ and $R_2$ are regular expressions. Then we can remove $q$, and add arrows from $q_1$ to $q_2$ with regular expressions $R_1\circ R_2$.
since in the example, the state 2 has only one inarrow and only one outarrow, so it can be removed easily. However, for those states who have $n$ inarrows and $m$ outarrows, we need to add $n\times m$ new arrows.
# Nonregular languages
Theorem (Pumping Lemma)
If $A$ is a regular languages, then there is a number $p$ (the pumping length) where, if $s$ is any string in $A$ of length at least $p$, then $s$ can be divided into three pieces, $s=xyz$, satisfying the following conditions:
 for each $i\geq 0$, $xy^iz\in A$ (Notice! when $i=0$, $xz\in A$)
 $y>0$
 $xy\leq p$
Proof: Let $M=(Q,\Sigma,\delta,q_1,F)$ be a DFA recognizing $A$ and $p$ be the number of states of M: $p=Q$.
Let $s=s_1s_2...s_n$ be a string in $A$ of length $n$, where $n\geq p$. Let $r_1,...,r_{n+1}$ be the sequence of states that $M$ enters when processing $s$, so $r_{i+1}=\delta(s_i,r_i)$.
Since $n\geq p$, there are two states in the sequence are the same. Without loss of generality, we assume $r_i=r_j$. Besides, $j\leq p+1$ since if $j>p+1$, then there must be two same states in $r_1,...,r_{p+1}$.
Then we can divide the string as
$x=s_1s_2...s_{i1}\\ y=s_{i}s_{i+1}...s_{j1}\\ z=s_{j}...s_n$
since $r_{j}=r_i$, so after reading $s_{i1}$ and $s_{j1}$, $M$ would enter the same state.
So no matter how many times $y$ repeats, $M$ always enter the same state and loop.
So $xy^iz$ can always be accepted.
Example: Consider the language $\{0^n1^nn\geq 0\}$. Now we can prove it’s not a regular language with pumping lemma.
Assume to the contrary that it’s regular, then there exists a pumping lemma $p$.
However, for any $p$, we can construct a string: $0^p1^p$, whose length is greater than $p$ and cannot be divided into three parts satisfying three conditions.
So it’s not a regular language.
# Contextfree language
Definition(contextfree grammar)
A contextfree grammar is a 4tuple $(V,\Sigma,R,S)$, where
 $V$ is a finite set called the variables
 $\Sigma$ is a finite set, disjoint from $V$, called the terminals
 $R$ is finite set of rules, with each rule being a variable and a string of variables and terminals
 $S\in V$ is the start variable
If $u,v$ and $w$ are strings of variables and terminals, and $A\rightarrow w$ is a rule of grammar, we say that $uAv$ yields $uwv$, written $uAv\Rightarrow uwv$/
Say that $u$ derives $v$, written $u\stackrel{*}{\Rightarrow} v$, if $u=v$ or if a sequence exists:
$u\Rightarrow u_1\Rightarrow...\Rightarrow u_k\Rightarrow v$
The language of grammar is $\{w\in\Sigma^*S\stackrel{*}{\Rightarrow} w\}$.
A derivation of a string $w$ in a grammar $G$ is a leftmost derivation if at every step the leftmost remaining variable is the one replaced.
Definition
A string $w$ is derived ambiguously in contextfree grammar $G$ if it has two or more different leftmost derivations. Grammar $G$ is ambiguous if it generates some string ambiguously.
Sometimes when we have the ambiguous grammar, we can construct another unambiguous grammar generating the same languages. While some ambiguous grammars cannot. We call the latter grammar inherently ambiguous.
Example: $\{a^ib^jc^ki=j\;or\;j=k\}$ s inherently ambiguous.
Definition(Chomsky normal form)
A contextfree grammar is in Chomsky normal form if every rule is of the form
$A\rightarrow BC\\ A\rightarrow a$
where $A,B,C$ are any variables except that $B,C$ can’t be the start variable, and $a$ is any terminals.
In addition we permit the rule $S\rightarrow\varepsilon$ when $S$ is the start variable.
Theorem
Any contextfree language is generated by a contextfree grammar in Chomsky normal form.
Proof: If we have a contextfree grammar, we can construct the corresponding Chomsky normal form as follows.
Example:
$S\rightarrow ASA\;\;aB\\ A\rightarrow B\;\;S\\ B\rightarrow b\;\;\varepsilon$

Step1, we add a new start variable $S_0$ and the rule $S_0\rightarrow S$, where $S$ was the original start variable.
$S_0\rightarrow S\\ S\rightarrow ASA\;\;aB\\ A\rightarrow B\;\;S\\ B\rightarrow b\;\;\varepsilon$

Step2, we take care of all $\varepsilon$ rules. We remove $A\rightarrow\varepsilon$ where $A$ is not the start variable. Then for each occurrence of an $A$ on the right side of a rule, we add a new rule with that occurrence deleted.
$\begin{cases}S_0\rightarrow S\\ S\rightarrow ASA\;\;aB\\ A\rightarrow B\;\;S\\ B\rightarrow b\;\;\varepsilon\end{cases}\Rightarrow \begin{cases}S_0\rightarrow S\\ S\rightarrow ASA\;\;aB\;\;a\\ A\rightarrow B\;\;S\;\;\varepsilon\\ B\rightarrow b\end{cases}(remove\;B\rightarrow\varepsilon)\\ \Rightarrow\begin{cases}S_0\rightarrow S\\ S\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\;\;S\\ A\rightarrow B\;\;S\\ B\rightarrow b\end{cases}(remove\;A\rightarrow\varepsilon)$

Step3, we remove unit rules $S\rightarrow S,A\rightarrow B,A\rightarrow S$:
$\begin{cases}S_0\rightarrow S\\ S\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ A\rightarrow B\;\;S\\ B\rightarrow b\end{cases}(remove\;S\rightarrow S)\\ \Rightarrow\begin{cases}S_0\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ S\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ A\rightarrow B\;\;S\\ B\rightarrow b\end{cases}(remove\;S_0\rightarrow S)\\ \Rightarrow\begin{cases}S_0\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ S\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ A\rightarrow b\;\;S\\ B\rightarrow b\end{cases}(remove\;A\rightarrow B)\\ \Rightarrow\begin{cases}S_0\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ S\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ A\rightarrow b\;\;ASA\;\;aB\;\;a\;\;SA\;\;AS\\ B\rightarrow b\end{cases}(remove\;A\rightarrow S)$

Step4, we destruct rules that length $\geq 3$
$\begin{cases}S_0\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ S\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ A\rightarrow b\;\;AT\;\;aB\;\;a\;\;SA\;\;AS\\ T\rightarrow SA\\ B\rightarrow b\end{cases}(destruct\;A\rightarrow ASA)\\ \Rightarrow \begin{cases}S_0\rightarrow ASA\;\;aB\;\;a\;\;SA\;\;AS\\ S\rightarrow AT\;\;aB\;\;a\;\;SA\;\;AS\\ A\rightarrow b\;\;AT\;\;aB\;\;a\;\;SA\;\;AS\\ T\rightarrow SA\\ B\rightarrow b\end{cases}(destruct\;S\rightarrow ASA)\\ \Rightarrow \begin{cases}S_0\rightarrow AT\;\;aB\;\;a\;\;SA\;\;AS\\ S\rightarrow AT\;\;aB\;\;a\;\;SA\;\;AS\\ A\rightarrow b\;\;AT\;\;aB\;\;a\;\;SA\;\;AS\\ T\rightarrow SA\\ B\rightarrow b\end{cases}(destruct\;S_0\rightarrow ASA)\\ \Rightarrow \begin{cases}S_0\rightarrow AT\;\;UB\;\;a\;\;SA\;\;AS\\ S\rightarrow AT\;\;UB\;\;a\;\;SA\;\;AS\\ A\rightarrow b\;\;AT\;\;UB\;\;a\;\;SA\;\;AS\\ T\rightarrow SA\\ U\rightarrow a\\ B\rightarrow b\end{cases}(destruct \;S_0,S,A\rightarrow aB)$
Done.
# Pushdown automata
Definition(Pushdown automaton)
A pushdown automaton is a 6tuple $(Q,\Sigma,\Gamma,\delta,q_0,F)$, where $Q,\Sigma,\Gamma$ and $F$ are all finite sets, and

$Q$ is the set of states

$\Sigma$ is the input alphabet

$\Gamma$ is the stack alphabet

$\delta:Q\times(\Sigma\cup\{\varepsilon\})\times(\Gamma\cup\{\varepsilon\})\rightarrow \mathcal{P}(Q\times(\Gamma\cup\{\varepsilon\}))$
is the transition function

$q_0\in Q$ is the start state

$F\subseteq Q$ is the set of accept states
Definition(Computation)
A pushdown automaton $M=(Q,\Sigma,\Gamma,\delta,q_0,F)$ computes as follows. It accepts input $w$ if $w$ can be written as $w=w_1w_2...w_m$, where each $w_i\in\Sigma\cup\{\varepsilon\}$ and sequences of states $r_0,r_1,...,r_m\in Q$ and strings $s_0,s_1,...,s_m\in\Gamma^*$ exists that satisfy the following three conditions:
 $r_0=q_0,s_0=\varepsilon$.
 $(r_{i+1},b)\in\delta(r_i,w_{i+1},a)$ where $s_i=at$ and $s_{i+1}=bt$.
 $r_m\in F$.
Intuitively, $s_0,s_1,...,s_m$ are the state of stacks at each position. Initially $s_0=\varepsilon$ means that the stack is empty.
Example:
Notice that the state transition function is of the form “$q_1\stackrel{w,a\rightarrow b}{\longrightarrow}q_2$” which means that at state $q_1$, we read character $w$ from tape, the pop character $a$ from the stack and push character $b$ into the stack. $a,b$ can be $\varepsilon$ so we can choose not to pop/push characters.
Theorem
A language is context free if and only if some pushdown automaton recognizes it.
Proof: we can transform any CFG to pushdown automaton and any pushdown automaton to CFG.

One direction: from CFG to pushdown automaton. Given a CFG and an input string $w$, we need to construct a pushdown automaton to accept $w$ if and only if the CFG can yield $w$.
The pushdown automaton work as follows:
 Place the marker symbol $ and the start variable on the stack,
 Repeat the following steps forever:
 If the top of stack is a variable $A$, then nondeterministically select one of the rules for $A$ and substitute $A$ by the string on the right side of the rule.
 If the top of stack is a terminal symbol $a$, read the next symbol from the input and compare it to $a$. If they match, repeat. If they do not match, reject on this branch of the nondeterminism.
 If the top stack is the symbol $, enter the accept state.
We can give the algorithm:

Construct four states as follows. And put all rules selfloop in $q_{loop}$ state:

Destruct the derivation rules:
In fact this step is to pop $S$ and push its derivation string. Pay attention that the string is pushed from right to left.

Add the compare of terminals:
if the top of stack and the readin symbol is the same, then pop it.
Now we can transform any CFG to PDA. Another example:

The other direction: transform PDA to CFG.
Given a PDA, for each pair of its states $p,q$, we create a variable $A_{pq}$ in the grammar. This variable will generates all the strings that can take the PDA from $p$ with an empty stack to $q$ with an empty stack. Obverve that such strings can also take the PDA from $p$ to $q$ with the same stack contents at $p$ and at $q$.
Firstly, we need to modify the PDA slightly to give it the following three features:
 It has a single accept state, $q_{accept}$.
 It empties its stack before accepting.
 Each transition either pushes a symbol onto the stack, or pops one off the stack, but does not do both at the same time.
Then, we construct the grammar as follows:

For each $p,q,r,s,t,a,b$, if we have the form:
$p\stackrel{a,\varepsilon\rightarrow t}{\longrightarrow}r\rightarrow...\rightarrow s\stackrel{b,t\rightarrow\varepsilon}{\longrightarrow}q$
then we add the rule: $A_{pq}\rightarrow aA_{rs}b$.

For each $p,q,r$, add the rule A_{pq}\rightarrow A_{pr}A_

For each $p$, add $A_{pp}\rightarrow \varepsilon$.

$A_{q_{start}q_{accept}}$ is the start variable.
You may get some insight for the construction from the following figures:
Example:
Theorem
Every regular language is context free.
Proof: obviously PDA can simulate any DFA.
Theorem
Context free language is closed under union.
Proof: Add the new rule $S\rightarrow S_1\;\;S_2$.
Theorem
Context free language is closed under concatenation.
Proof: Add the new rule $S\rightarrow S_1\;\;S_2$.
Theorem
Context free language is closed under star operation.
Theorem
Context free language is NOT closed under intersection.
Proof: $\{a^nb^nc^mn,m\geq 0\}\cap \{a^nb^mc^mn,m\geq 0\}$ is not context free.
Theorem
Context free language is NOT closed under complement.
Proof: We assume to the contrary, then $L_1\cap L_2=\sim(\sim L_1\cup\sim L_2)$. If CFL is closed under complement, then it should be closed under intersection, which is false.
Theorem
The intersection of a CFL and a regular language is context free.
Proof: For a PDA $P$ and NFA $Q$, we can construct a $PDA$ recognizing $L(P)\cap L(Q)$.
state: $(p_1,q_1)$ where $p_1\in P,q_1\in Q$.
transition:
$(p_1,q_1)\stackrel{a,b\rightarrow c}{\longrightarrow}(p_2,q_2)$
if and only if $p_1\stackrel{a,b\rightarrow c}{\longrightarrow}p_2$ in PDA and $q_1\stackrel{a}{\longrightarrow}q_2$ in NFA.
# Noncontextfree languages
Theorem(Pumping lemma for contextfree languages)
If $A$ is a contextfree languages, then there is a number $p$ (pumping length) where, if $s$ is any string in $A$ of length at least $p$, then $s$ may be divided into five pieces $s=uvxyz$ satisfying:
 for each $i\geq 0$, $uv^ixy^iz\in A$
 $vy>0$
 $vxy\leq p$
Proof: Say $V$ is the number of variables in $G$, $b$ is the maximum number of symbols in the right side of a rule. We set $p$ to be $b^{V}+1$.
If the parser tree’s height is $h$, then the string it parses is at most $b^h$ long.
So if the length of string is greater than $b^{V}$, then the height of the parser tree is bigger than $V$, in other words, we definitely have two same variables in the longest path in the parser tree.
Then we can substitute the small substree by the bigger subtree (their roots are $R$ both) as the figure above, then condition 1 is satisfied since $uv^ixy^iz$ can always be parsed.
Condition 2 is natually satisified since we don’t have the derivation $R\rightarrow R$, so either $v$ or $y$ is not empty.
Condition 3 is a bit tricky. We need to modify $p$ to $b^{V+2}$. Then the longest path in the parser tree is greater than $V+1$ still. We choose $R$ so that both occurences fall within tht bottom $V+1$ variables on the path. So the subtree is at most $V+2$ high (the bottom is terminals). Then we can guarantee that $vxy\leq b^{V+2}=p$.
Example: $\{a^nb^nc^nn\geq 0\}$ is not context free, since for any $p$, we can construct $a^pb^pc^p$, it cannot be divided into five pieces satisfying three conditions.
# The churchturing thesis
# Turing machines
Definition(Turing machine)
A Turing machine is a 7tuple, $(Q,\Sigma,\Gamma,\delta,q_0,q_{accept},q_{reject})$, where $Q,\Sigma,\Gamma$ are all finite sets and
 $Q$ is the set of states.
 $\Sigma$ is the input alphabet not containing the blank symbol $\sqcup$.
 $\Gamma$ is the tape alphabet, where $\sqcup\in\Gamma$ and $\Sigma\subseteq\Gamma$.
 $\delta:Q\times\Gamma\rightarrow Q\times\Gamma\times\{L,R\}$ is the transition function.
 $q_0\in Q$ is the start state.
 $q_{accept}\in Q$ is the accept state.
 $q_{reject}\in Q$ is the reject state, where $q_{accept}\neq q_{reject}$.
The configuration is
 a current state $q$
 a string $u$ over the tape alphabet
 a string $v$ over the tape alphabet
the configuration means that, the current of turing machine is $q$, the content of the tape is $uv$ and the head location is at the first symbol of $v$.
The configuration $C_1=(q,u,v)$ yields $C_2=(q',u',v')$ if:

$\delta(q,a)=(q',b,L)$
$u=wc\\ v=aw'\\ u'=w\\ v'=cbw'$

or $\delta(q,a)=(q',b,R)$
$u=wc\\ v=aw'\\ u'=wcb\\ v'=w'$
Intuitively, it’s just modifying the head location content from $a$ to $b$ and move the head.
Remark(Important!): When the head is already at the leftmost location of the tape, then if we try to move it left, then it will stay still. Thanks to this property, we can easily move head to the leftmost.
If we enter the accept configuration (the state in the configuration is $q_{accept}$) then the Turing machin will accept the input string. Similarly we have reject configuration. Both of them are called halting configuration.
Definition
Call a language Turingrecognizable if some Turing machine recoginizes it.
TM recognizes language $L$ means that for all string in $L$, TM will accept it. While for strings not in $L$, TM will reject or not halt.
Definition
Call a language Turingdecidable if some Turing machine decides it.
TM decides language $L$ means that for all string in $L$, TM will accept it. While for strings not in $L$, TM will reject it. The TM must halt.
Example: Here we describe a Turing machine that decides $A=\{0^{2^n}n\geq 0\}$.
$M=$ "On input string $w$:
 Sweep left to right across the tape, crossing off every other 0.
 If in stage 1 the tape contained a single 0, accept.
 If in stage 1 the tape contained more than a single 0 and the number of 0s was odd, reject.
 Return the head to the lefthand end of the tape
 Goto state 1."
The stage 1 is cutting the number of 0s in half.
# Varaints of turing machine
A multiple Turing machine is like an ordinary Turing machine with several tapes. Each tape has its own head for reading and writing.
If we have $k$ tapes, then the transition functions is like:
$\delta:Q\times\Gamma^k\rightarrow Q\times \Gamma^k\times \{L,R,S\}^k$
$S$ means “stay”. So all $k$ tapes can do their own work or pause.
Theorem
Every multitape Turing machine has an equivalent singletape Turing machine.
Proof: Say that $M$ has $k$ tapes, we can construct a single tape Turing machine to simulate its behavior.
$S$ = "On input $w=w_1...w_n$:

First $S$ puts its tape into the format that represents all $k$ tapes of $M$. The formatted tape contains:
$@\dot{w_1}w_2...w_n@\dot{\sqcup}@\dot{\sqcup}@...@\sqcup\sqcup\sqcup...$

To simulate a single move, $S$ scans its tape from the first @, which marks the lefthand end, to the $(k+1)$st @, which marks the righthand end, in order to determine the symbols under the virtual heads. Then $S$ makes a second pass to update the tapes according to the way that $M$'s transition function dictates.

If at any point $S$ moves one of the virtual heads to the right onto a @, this action signifies that $M$ has moved th corresponding head onto the previously unread blank portion of that tape. So $S$ writes a blank symbol on this tape cell and shifts the tape contents, from this cell until the rightmost @, one unit to the right.
$@\dot{w_1}w_2...w_n@a\dot{\sqcup}\underbrace {@\dot{\sqcup}@...@}_{shifted\;right}$
Then repeat the simulation."
Nondeterministic Turing machine can proceed different possibilities.
$\delta:Q\times\Gamma\rightarrow\mathcal{P}(Q\times\Gamma\times\{L,R\})$
and it accepts the input string if any branch accepts it.
Theorem
Every nondeterministic Turing machine has an equivalent singletape Turing machine.
Proof: We can use a threetapes Turing machine to simulate nondeterministic Turing machine.
Let’s first consider the data representation on tatpe 3. Every node in the searching tree can have most $b$ children, where $b$ is the size of largest set of possible choices given by $N$'s transition function.
To every node in the three we ssign an address that is a string over $\{1,2,...,b\}$.
For example, we assign the address 231 to the node we arrive at by starting at the root, going to its 2nd child, going to that node’s 3rd child, and finally going to that node’s 1st child. Each symbol in the string tells us which choice to make next when simulating a step in one branche in $N$'s nondeterministic computation.
If too few choices are available for a configuration, then the symbol is invalid.
Now we are ready to describe $D$.

Initially tape 1 contains the input $w$, and tapes 2 and 3 are empty.

Copy tape 1 to tape 2.

Use tape2 to simulate $N$ with input $w$ on one brach of its nondeterministic computation. Before each step of $N$ consult the next symbol on tape 3 to determine which choice to make.
If no more symbols remain on tape 3 or if this nondeterministic choice is invalid, abort this brach by going to stage 4. Also goto stage 4 if a rejecting configuration is encountered. If an accepting configuration is encountered, accept the input.

Replace the string on tape 3 the lexicographically next string. example:
$123\rightarrow 124\\ 12b\rightarrow 131\\ bbb\rightarrow1111\\$
Then repeat the simulation.
Theorem
A language is Turingrecognizable/decidable if and only if some nondeterministic Turing machine recognizes/decides it.
Definition(Enumerator)
An enumerator is a TM with two tapes: the work tape and the printer.

Initially both tapes are empty.

On the work tape it can read, write and move just like the ordinary TM.

On the printer, it can only print one of the symbols of the language alphabet, or @ serving as the “end of string” symbol. So the transition looks like:
$q_1\stackrel{\stackrel{c}{a\rightarrow b},D}{\longrightarrow}q_2$
meaning “if in state $q_1$, you see a symbol $a$ on the work tape, then replace it with $b$, move in the direction $D$, goto state $q_2$ and print $c$ to the printer”. Every time it prints a symbol, the head of the printer moves right. if $c$ is absent, then nothing happens to printer.

There is start state, but no accept or reject states. Entering a configuration from which there is no transition causes the enumerator to halt.

The language enumerated by an enumerator is the set of strings that it prints on the printer.
Example:
Theorem
A language is Turingrecognizable if and only if some enumerator enumerates it.
Proof: First we show that if we have an enumerator $E$ that enumerates a language $A$, a TM $M$ recognizes it.
$M$ = "On input $w$:
 Run $E$. Every time that $E$ outputs a string, compare it with $w$.
 If $w$ ever appears in the output of $E$, accept."
Now we do the other direction
$E$ = "Ignore the input
 Repeat the following for $i=1,2,3,...$
 Run $M$ for $i$ steps on each input $s_1,s_2,...,s_i$
 If any computations accept, print out the correspoinding $s_j$."
where $s_1,s_2,s_3,...$ is a list of all possible strings in $\Sigma^*$.
In fact, if $M$ accepts some string $w$, then it will appear on the printer infinite times.
ChurchTuring thesis: Intuitive notion of algorithms = algorithms defined by $\lambda$ calculus = algorithms defined by Turing machines.
# Decidability
# Decidable languegs
Theorem
$A_{DFA}=\{\langle B,w\rangleB$ is a DFA that accepts input string $w\}$ is a decidable language.
Proof:
$M$ = "On input $\langle B,w\rangle$, where $B$ is a DFA and $w$ is a string:
 Simulate $B$ on input $w$.
 If the simulation ends in an accept state, accept. If it ends in a nonaccepting state, reject."
Theorem
$A_{NFA}=\{\langle B,w\rangleB$ is a NFA that accepts input string $w\}$ is a decidable language.
Proof:
$M$ = "On input $\langle B,w\rangle$, where $B$ is a NFA and $w$ is a string:
 Convert $B$ to a DFA. Since the algorithm is precisely introduced before, it can be simulated by TM.
 Call $A_{DFA}$ decider."
Theorem
$A_{REX}=\{\langle B,w\rangleB$ is a regular expression that generates input string $w\}$ is a decidable language.
Proof:
$M$ = "On input $\langle B,w\rangle$, where $B$ is a regular expression and $w$ is a string:

Convert $B$ to its corresponding NFA.

Call $A_{NFA}$ decider."
Theorem
$E_{DFA}=\{\langle A\rangleA$ is a DFA and $L(A)=\emptyset\}$ is decidable.
Proof:
$T$ = "On input $\langle A\rangle$ where $A$ is a DFA:
 Mark the start state of $A$.
 Repeat until no new states get marked:
 Mark any state that has a transition coming into it from any state that is already marked.
 If no accept state is marked, accept; otherwise reject."
Theorem
$EQ_{DFA}=\{\langle A,B\rangleA$ and $B$ are DFAs and $L(A)=L(B)\}$ is decidable.
Proof:
$F$ = "On input $\langle A,B\rangle$, where $A$ and $B$ are DFAs:

Construct DFA $C$, where
$L(C)=(L(A)\cap\sim L(B))\cup(\sim L(A)\cap L(B))$
$L(C)$ is called the symmetric difference of $L(A)$ and $L(B)$.

Call $E_C$ to decider whether $L(C)$ is empty, if is, then accept. Otherwise reject."
Theorem
$A_{CFG}=\{\langle G,w\rangleG$ is a CFG that generates string $w\}$ is decidable.
Proof:
$S$ = "On input $\langle G,w\rangle$ where $G$ is a CFG and $w$ is a string:
 Convert $G$ to an equivalent grammar in Chomsky normal form.
 List all derivations with $2n1$ steps, where $n$ is the length of $w$, except if $n=0$, then check if we have the rule $S\rightarrow\varepsilon$.
 If any of these derivations generate $w$, accept; if not, reject."
Notice that if a string can be generated by a Chomsky normal form, then the number of derivation steps is at most $2n1$.
Theorem
$E_{CFG}=\{\langle G\rangleG$ is a CFG and $L(G)=\emptyset\}$ is decidable.
Proof:
$R$ = "On input $\langle G\rangle$ where $G$ is a CFG:
 Mark all terminal symbols in $G$.
 Repeat until no new variables get marked:
 Mark any variable $A$ where $G$ has a rule $A\rightarrow U_1U_2...U_k$ and each symbol $U_1,...,U_k$ has already been marked.
 If the start variable is not marked, accept; otherwise reject."
Theorem
$EQ_{CFG}=\{\langle A,B\rangleA,B$ are CFGs and $L(A)=L(B)\}$ is NOT decidable.
The proof is too hard now.
Theorem
Every contextfree language is decidable.
Proof: Let $G$ be a CFG for contextfree language $A$, then on input string $w$, we just need to call $A_{CFG}$ decider.
# The halting problem
Theorem
$A_{TM}=\{\langle M,w\rangle M$ is a TM and $M$ accepts $w\}$ is undecidable.
Proof: We assume to the contrary that $A_{TM}$ is decidable and has a decider $H$:
$H(\langle M,w\rangle)=\begin{cases}accept&if\;M\;accepts\;w\\ reject&if\;M\;does\;not\;accept\;w \end{cases}$
Now we construct a new TM:
$D$ = "On input $\langle M\rangle$, where $M$ is a TM:
 Run $H$ on input $\langle M,\langle M\rangle\rangle$.
 If $H$ accepts, then reject. If $H$ rejects, then accept."
Now what would happen if we run $D$ on input $\langle D\rangle$?
$D(\langle D\rangle)=\begin{cases} accept&D\;does\;not\;accept\;\lang D\rangle\\ reject&D\;accpet\;\langle D\rangle \end{cases}$
Which is obviously a contradiction.
So $H$ doesn’t exists.
Theorem
A language is decidable if and only it’s Turing recognizable and coTuringrecognizable (its complement is Turing recognizable).
Proof:
One direction: if $A$ is decidable, we can easily see that both $A$ and its complement $\bar{A}$ are Turingrecognizable.
The other direction: if both $A$ and $\bar{A}$ are Turingrecognizable, we let $M_1$ be the recognizer for $A$ and $M_2$ for $\bar{A}$. And we construct a Turing machine:
$M$ = "On input $w$:
 Run both $M_1$ and $M_2$ on input $w$ in parallel (two tapes).
 If $M_1$ accept, accept; If $M_2$ accept, reject."
Theorem
$\bar{A_{TM}}$ is not Turingrecognizable.
Proof: We already know that $A_{TM}$ is Turingrecognizable but not Turingdecidable. So its complement must NOT be Turingrecognizable.
# Reducibility
# Undecidable problems from language theory
Theorem
$HALT_{TM}=\{\langle M,w\rangleM$ is a TM and $M$ halts on input $w\}$ is undecidable.
Proof: We assume to the contrary, if $HALT_{TM}$ is decidable and $R$ is a decider. Then we can construct a decider for $A_{TM}$:
$S$ = "On input $\langle M,w\rangle$, an encoding of a TM and a string:
 Run $R$ on input $\langle M,w\rangle$.
 If $R$ rejects, reject
 If $R$ accepts, simulate $M$ on $w$ until it halts.
 If $M$ accepts, then accept; otherwise reject.
Since we already know that $A_{TM}$ is undecidable, so $R$ does not exist. So $HALT_{TM}$ is undecidable.
Theorem
$E_{TM}=\{\langle M\rangleM$ is a TM and $L(M)=\emptyset\}$ is undecidable.
Proof:
Like the last theorem, we can prove that, if we have the decider for $E_{TM}$, then $A_{TM}$ will be decidable too.
We assume to the contrary, $R$ is a decider for $E_{TM}$.
We construct a decider for $A_{TM}$:
$S$ = "On input string $\langle M,w\rangle$ where $M$ is a Turing machine and $w$ is a string:

Construct a new Turing machine $M'$:
$M'$ = "On input string $x$
 If $x\neq w$, reject
 If $x=w$, run $M$ on $w$ if $M$ accepts then accept."

Run $R$ on input $\langle M'\rangle$.

If $R$ accepts, reject; otherwise accept."
Remark: We don’t need to run $M$ on $w$, instead, we only need the encoding of $M'$ doing such work.
Obviously, $L(M')\neq\emptyset$ if and only if $M$ accepts $w$.
Theorem
$REGULAR_{TM}=\{\langle M\rangleM$ is a TM and $L(M)$ is a regular language$\}$ is undecidable.
Proof: We let $R$ be a decider that decides $REGULAR_{TM}$. Then we can construct a decider for $A_{TM}$:
$S$ = "On input $\langle M,w\rangle$, where $M$ is a TM and $w$ is a string:

Construct the following TM $M'$:
$M'$ = "On input $x$:
 If $x$ is of the form $0^n1^n$, accept.
 If $x$ does not have the form, run $M$ on input $w$ and accept if $M$ accepts $w$."

Run $R$ on input $\langle M'\rangle$.

If $R$ accepts, accept; otherwise reject."
Remark: Firstly, we don’t need to run $M$ on $w$, instead, we only need the encoding of $M'$ doing such work.
Obviously, if $M$ accept $w$, then $L(M')=\Sigma^*$. Because no matter of the input $x$, $M'$ will accept it. If $M$ doesn’t accept $w$, then $L(M')=\{0^n1^n\}$, since only $x=0^n1^n$, $M'$ will accept it. So $L(M')$ is regular $\Leftrightarrow$ $M$ accepts $w$.
So if $R$ acceptes $M'$, then $M$ accepts $w$.
Theorem
$EQ_{TM}=\{\langle M_1,M_2\rangleM_1,M_2$ are TMs and $L(M_1)=L(M_2)\}$ is undecidable.
Proof:
If we have the decider $R$ for $EQ_{TM}$, then we can construct the decider for $E_{TM}$:
$S$ = "On input $\langle M\rangle$, where $M$ is TM
 Run $R$ on $\langle M,M_1\rangle$, where $M_1$ is the Turing machine rejecting any input.
 If R accepts, accept; otherwise reject."
Obviously, $L(M)=\emptyset\Leftrightarrow L(M)=L(M_1)$.
Theorem
$ALL_{CFG}=\{\langle G\rangleG$ is a CFG and $L(G)=\Sigma^*\}$ is undecidable.
# Mapping reducibility
Definition
A function $f:\Sigma^*\rightarrow\Sigma^*$ is a computable function if some Turing machine $M$, on every input $w$, halts with just $f(w)$ on its tape.
Definition
Language $A$ is mapping reducible to language $B$, written $A\leq_m B$, if there is a computable function $f:\Sigma^*\rightarrow\Sigma^*$, where for every $w$:
$w\in A\Leftrightarrow f(w)\in B$
The function $f$ is called the reduction of $A$ to $B$.
Theorem
If $A\leq_m B$ and $A$ is undecidable, then $B$ is undecidable.
Proof:
If we have a decider for $B$, then for any input $w$, we can compute $f(w)$ and check whether it is in $B$. Then we can decide whether $w$ is in $A$.