In theoretical computer science, in particular in formal language theory, Kleene's algorithm transforms a given nondeterministic finite automaton (NFA) into a regular expression. Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages. Alternative presentations of the same method include the "elimination method" attributed to Brzozowski and McCluskey, the algorithm of McNaughton and Yamada,[1] and the use of Arden's lemma.

Algorithm description

According to Gross and Yellen (2004),[2] the algorithm can be traced back to Kleene (1956).[3] A presentation of the algorithm in the case of deterministic finite automata (DFAs) is given in Hopcroft and Ullman (1979).[4] The presentation of the algorithm for NFAs below follows Gross and Yellen (2004).[2]

Given a nondeterministic finite automaton M = (Q, Σ, δ, q0, F), with Q = { q0,...,qn } its set of states, the algorithm computes

the sets R^k_{ij} of all strings that take M from state qi to qj without going through any state numbered higher than k.

Here, "going through a state" means entering and leaving it, so both i and j may be higher than k, but no intermediate state may. Each set R^k_{ij} is represented by a regular expression; the algorithm computes them step by step for k = -1, 0, ..., n. Since there is no state numbered higher than n, the regular expression R^n_{0j} represents the set of all strings that take M from its start state q0 to qj. If F = { q1,...,qf } is the set of accept states, the regular expression R^n_{01} | ... | R^n_{0f} represents the language accepted by M.

The initial regular expressions, for k = -1, are computed as follows for i≠j:

R^{-1}_{ij} = a1 | ... | am       where qj ∈ δ(qi,a1), ..., qj ∈ δ(qi,am)

and as follows for i=j:

R^{-1}_{ii} = a1 | ... | am | ε       where qi ∈ δ(qi,a1), ..., qi ∈ δ(qi,am)

In other words, R^{-1}_{ij} mentions all letters that label a transition from i to j, and we also include ε in the case where i=j. If no letter labels such a transition, R^{-1}_{ij} is ∅ for i≠j and ε for i=j.
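As a minimal sketch of this initialization, the Python function below builds the table of R^{-1}_{ij} as sets of symbols, with an empty set standing for ∅. The function name `initial_expressions`, the dictionary encoding of δ, and the string "ε" used as a marker are assumptions of this sketch and do not come from the cited presentations. The sample automaton is the one from the Example section.

```python
def initial_expressions(num_states, delta):
    """Return table R with R[i][j] = set of symbols in R^{-1}_{ij}.

    delta maps (state index, letter) to the set of successor state indices.
    An empty set stands for the empty language; "ε" marks the empty word.
    """
    R = [[set() for _ in range(num_states)] for _ in range(num_states)]
    for (i, letter), targets in delta.items():
        for j in targets:
            R[i][j].add(letter)      # letter labels a transition i -> j
    for i in range(num_states):
        R[i][i].add("ε")             # the empty word leads from i to i
    return R


# Transition relation of the three-state automaton from the Example section.
delta = {
    (0, "a"): {0}, (0, "b"): {1},
    (1, "a"): {2}, (1, "b"): {1},
    (2, "a"): {1}, (2, "b"): {1},
}
R = initial_expressions(3, delta)
```

Reading off a row of `R` and joining its members with "|" gives the corresponding initial regular expression.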

After that, in each step the expressions R^k_{ij} are computed from the previous ones by

R^k_{ij} = R^{k-1}_{ik} (R^{k-1}_{kk})* R^{k-1}_{kj} | R^{k-1}_{ij}
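The recurrence can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the formulation of any cited source: the function name `kleene`, the encoding of δ as a dictionary from (state, letter) pairs to sets of states, the use of `None` for ∅ and the empty string for ε, and the output in Python `re` syntax are all choices made here. Only the trivial simplifications involving ∅ and ε are applied, so the resulting expressions grow quickly.

```python
import re


def kleene(num_states, delta, start, accepting):
    """Return a Python `re` pattern for the automaton's language (None for ∅)."""

    def union(x, y):
        if x is None:
            return y
        if y is None:
            return x
        return x if x == y else f"(?:{x}|{y})"

    def concat(x, y):
        if x is None or y is None:
            return None                      # ∅ absorbs concatenation
        return x + y

    def star(x):
        if x is None or x == "":
            return ""                        # ∅* = ε* = ε
        return f"(?:{x})*"

    n = num_states
    # k = -1: letters labelling direct transitions, plus ε on the diagonal.
    R = [[None] * n for _ in range(n)]
    for (i, letter), targets in delta.items():
        for j in targets:
            R[i][j] = union(R[i][j], re.escape(letter))
    for i in range(n):
        R[i][i] = union(R[i][i], "")
    # k = 0, ..., n-1: allow state k as an intermediate state.
    for k in range(n):
        R = [[union(concat(concat(R[i][k], star(R[k][k])), R[k][j]), R[i][j])
              for j in range(n)] for i in range(n)]
    # Union over all accept states of the expressions from the start state.
    result = None
    for f in accepting:
        result = union(result, R[start][f])
    return result


# Hypothetical two-state automaton accepting a*b.
pattern = kleene(2, {(0, "a"): {0}, (0, "b"): {1}}, 0, {1})
```

The returned pattern can be tested with `re.fullmatch(pattern, word)`. The three nested loops over k, i and j mirror the structure of the Floyd–Warshall algorithm.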

Another way to understand the operation of the algorithm is as an "elimination method", where the states from 0 to n are successively removed: when state k is removed, the regular expression R^{k-1}_{ij}, which describes the words that label a path from state i>k to state j>k, is rewritten into R^k_{ij} so as to take into account the possibility of going via the "eliminated" state k.

By induction on k, it can be shown that the length[5] of each expression R^k_{ij} is at most (1/3)(4^{k+1}(6s+7) - 4) symbols, where s denotes the number of characters in Σ. Therefore, the length of the regular expression representing the language accepted by M is at most (1/3)(4^{n+1}(6s+7)f - f - 3) symbols, where f denotes the number of final states. This exponential blowup is inevitable, because there exist families of DFAs for which any equivalent regular expression must be of exponential size.[6]
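The induction can be sketched as follows. The symbol ℓ_k is introduced here only for illustration and denotes an upper bound on the length of any R^k_{ij}; the sketch assumes that each initial expression lists at most all s letters plus ε, and that the concatenation operator counts as a symbol, as in footnote [5].

```latex
\begin{align*}
\ell_{-1} &= \underbrace{s + 1}_{a_1,\dots,a_s,\ \varepsilon}
             + \underbrace{s}_{\text{occurrences of } \mid}
           = 2s + 1
           = \tfrac{1}{3}\bigl(4^{0}(6s+7) - 4\bigr), \\
\ell_{k}  &= 4\,\ell_{k-1} + 4
           \qquad\text{(four subexpressions; two } \cdot \text{, one } {*} \text{, one } \mid \text{)} \\
          &= \tfrac{4}{3}\bigl(4^{k}(6s+7) - 4\bigr) + 4
           = \tfrac{1}{3}\bigl(4^{k+1}(6s+7) - 4\bigr), \\
f\,\ell_{n} + (f-1) &= \tfrac{1}{3}\bigl(4^{n+1}(6s+7)f - f - 3\bigr).
\end{align*}
```

The last line joins the f expressions for the accept states with f - 1 further union symbols.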

In practice, the size of the regular expression obtained by running the algorithm can be very different depending on the order in which the states are considered by the procedure, i.e., the order in which they are numbered from 0 to n.

Example

[Figure: Example DFA given to Kleene's algorithm]

The automaton shown in the picture can be described as M = (Q, Σ, δ, q0, F) with

  • the set of states Q = { q0, q1, q2 },
  • the input alphabet Σ = { a, b },
  • the transition function δ with δ(q0,a)=q0,   δ(q0,b)=q1,   δ(q1,a)=q2,   δ(q1,b)=q1,   δ(q2,a)=q1, and δ(q2,b)=q1,
  • the start state q0, and
  • the set of accept states F = { q1 }.
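For experimentation, the automaton can be written down directly. The following sketch encodes the states q0, q1, q2 as the integers 0, 1, 2; the names `DELTA`, `START`, `ACCEPTING` and `accepts` are illustrative and not part of the original description.

```python
# Example DFA from the text; states q0, q1, q2 are encoded as 0, 1, 2.
DELTA = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 1,
}
START, ACCEPTING = 0, {1}


def accepts(word):
    """Run the DFA on `word` and report whether it ends in an accept state."""
    state = START
    for letter in word:
        state = DELTA[(state, letter)]
    return state in ACCEPTING


for w in ["", "b", "ab", "ba", "baa", "bab"]:
    print(repr(w), accepts(w))
```

A word is accepted exactly when the run ends in q1, which gives a reference against which any regular expression produced for this automaton can be compared.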

Kleene's algorithm computes the initial regular expressions as

R^{-1}_{00} = a | ε
R^{-1}_{01} = b
R^{-1}_{02} = ∅
R^{-1}_{10} = ∅
R^{-1}_{11} = b | ε
R^{-1}_{12} = a
R^{-1}_{20} = ∅
R^{-1}_{21} = a | b
R^{-1}_{22} = ε

After that, the R^k_{ij} are computed from the R^{k-1}_{ij} step by step for k = 0, 1, 2. Kleene algebra equalities are used to simplify the regular expressions as much as possible.
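Identities of the following kind cover most of the simplifications in the steps that follow. The selection is illustrative and is not taken from the cited sources; x stands for an arbitrary regular expression.

```latex
\begin{align*}
\emptyset\, x = x\, \emptyset &= \emptyset, & x \mid \emptyset &= x, \\
\varepsilon\, x = x\, \varepsilon &= x, & x \mid x &= x, \\
(x \mid \varepsilon)^{*} &= x^{*}, & (x \mid \varepsilon)\, x^{*} &= x^{*}, \\
x^{*} x &= x\, x^{*}, & x^{*} \mid x \mid \varepsilon &= x^{*}.
\end{align*}
```

For instance, (a | ε) (a | ε)* (a | ε) | a | ε reduces to a* by applying the third and fourth lines.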

Step 0
R^0_{00} = R^{-1}_{00} (R^{-1}_{00})* R^{-1}_{00} | R^{-1}_{00} = (a | ε) (a | ε)* (a | ε) | a | ε = a*
R^0_{01} = R^{-1}_{00} (R^{-1}_{00})* R^{-1}_{01} | R^{-1}_{01} = (a | ε) (a | ε)* b | b = a* b
R^0_{02} = R^{-1}_{00} (R^{-1}_{00})* R^{-1}_{02} | R^{-1}_{02} = (a | ε) (a | ε)* ∅ | ∅ = ∅
R^0_{10} = R^{-1}_{10} (R^{-1}_{00})* R^{-1}_{00} | R^{-1}_{10} = ∅ (a | ε)* (a | ε) | ∅ = ∅
R^0_{11} = R^{-1}_{10} (R^{-1}_{00})* R^{-1}_{01} | R^{-1}_{11} = ∅ (a | ε)* b | b | ε = b | ε
R^0_{12} = R^{-1}_{10} (R^{-1}_{00})* R^{-1}_{02} | R^{-1}_{12} = ∅ (a | ε)* ∅ | a = a
R^0_{20} = R^{-1}_{20} (R^{-1}_{00})* R^{-1}_{00} | R^{-1}_{20} = ∅ (a | ε)* (a | ε) | ∅ = ∅
R^0_{21} = R^{-1}_{20} (R^{-1}_{00})* R^{-1}_{01} | R^{-1}_{21} = ∅ (a | ε)* b | a | b = a | b
R^0_{22} = R^{-1}_{20} (R^{-1}_{00})* R^{-1}_{02} | R^{-1}_{22} = ∅ (a | ε)* ∅ | ε = ε
Step 1
R^1_{00} = R^0_{01} (R^0_{11})* R^0_{10} | R^0_{00} = a*b (b | ε)* ∅ | a* = a*
R^1_{01} = R^0_{01} (R^0_{11})* R^0_{11} | R^0_{01} = a*b (b | ε)* (b | ε) | a* b = a* b* b
R^1_{02} = R^0_{01} (R^0_{11})* R^0_{12} | R^0_{02} = a*b (b | ε)* a | ∅ = a* b* ba
R^1_{10} = R^0_{11} (R^0_{11})* R^0_{10} | R^0_{10} = (b | ε) (b | ε)* ∅ | ∅ = ∅
R^1_{11} = R^0_{11} (R^0_{11})* R^0_{11} | R^0_{11} = (b | ε) (b | ε)* (b | ε) | b | ε = b*
R^1_{12} = R^0_{11} (R^0_{11})* R^0_{12} | R^0_{12} = (b | ε) (b | ε)* a | a = b* a
R^1_{20} = R^0_{21} (R^0_{11})* R^0_{10} | R^0_{20} = (a | b) (b | ε)* ∅ | ∅ = ∅
R^1_{21} = R^0_{21} (R^0_{11})* R^0_{11} | R^0_{21} = (a | b) (b | ε)* (b | ε) | a | b = (a | b) b*
R^1_{22} = R^0_{21} (R^0_{11})* R^0_{12} | R^0_{22} = (a | b) (b | ε)* a | ε = (a | b) b* a | ε
Step 2
R^2_{00} = R^1_{02} (R^1_{22})* R^1_{20} | R^1_{00} = a*b*ba ((a|b)b*a | ε)* ∅ | a* = a*
R^2_{01} = R^1_{02} (R^1_{22})* R^1_{21} | R^1_{01} = a*b*ba ((a|b)b*a | ε)* (a|b)b* | a* b* b = a* b (a (a | b) | b)*
R^2_{02} = R^1_{02} (R^1_{22})* R^1_{22} | R^1_{02} = a*b*ba ((a|b)b*a | ε)* ((a|b)b*a | ε) | a* b* ba = a* b* b (a (a | b) b*)* a
R^2_{10} = R^1_{12} (R^1_{22})* R^1_{20} | R^1_{10} = b* a ((a|b)b*a | ε)* ∅ | ∅ = ∅
R^2_{11} = R^1_{12} (R^1_{22})* R^1_{21} | R^1_{11} = b* a ((a|b)b*a | ε)* (a|b)b* | b* = (a (a | b) | b)*
R^2_{12} = R^1_{12} (R^1_{22})* R^1_{22} | R^1_{12} = b* a ((a|b)b*a | ε)* ((a|b)b*a | ε) | b* a = (a (a | b) | b)* a
R^2_{20} = R^1_{22} (R^1_{22})* R^1_{20} | R^1_{20} = ((a|b)b*a | ε) ((a|b)b*a | ε)* ∅ | ∅ = ∅
R^2_{21} = R^1_{22} (R^1_{22})* R^1_{21} | R^1_{21} = ((a|b)b*a | ε) ((a|b)b*a | ε)* (a|b)b* | (a | b) b* = (a | b) (a (a | b) | b)*
R^2_{22} = R^1_{22} (R^1_{22})* R^1_{22} | R^1_{22} = ((a|b)b*a | ε) ((a|b)b*a | ε)* ((a|b)b*a | ε) | (a | b) b* a | ε = ((a | b) b* a)*

Since q0 is the start state and q1 is the only accept state, the regular expression R^2_{01} denotes the set of all strings accepted by the automaton.
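The simplification of R^2_{01} can be sanity-checked by brute force. The sketch below transcribes the unsimplified and simplified forms into Python `re` syntax and compares them on all short words; the constant and function names are illustrative, and agreement up to a fixed length is evidence of equivalence, not a proof.

```python
import re
from itertools import product

# Unsimplified and simplified forms of R^2_{01}, transcribed into Python `re`
# syntax; ε becomes an empty alternative inside a non-capturing group.
UNSIMPLIFIED = r"a*b*ba(?:(?:a|b)b*a|)*(?:a|b)b*|a*b*b"
SIMPLIFIED = r"a*b(?:a(?:a|b)|b)*"


def same_language_up_to(max_len, alphabet="ab"):
    """Compare both patterns on every word over `alphabet` up to max_len."""
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            lhs = re.fullmatch(UNSIMPLIFIED, word) is not None
            rhs = re.fullmatch(SIMPLIFIED, word) is not None
            if lhs != rhs:
                return False
    return True
```

Calling `same_language_up_to(8)` checks all 511 words over {a, b} of length at most 8.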

References

  1. McNaughton, R.; Yamada, H. (March 1960). "Regular Expressions and State Graphs for Automata". IRE Transactions on Electronic Computers. EC-9 (1): 39–47. doi:10.1109/TEC.1960.5221603. ISSN 0367-9950.
  2. Jonathan L. Gross and Jay Yellen, ed. (2004). Handbook of Graph Theory. Discrete Mathematics and its Applications. CRC Press. ISBN 1-58488-090-2. Here: sect. 2.1, remark R13 on p. 65.
  3. Kleene, Stephen C. (1956). "Representation of Events in Nerve Nets and Finite Automata" (PDF). Automata Studies, Annals of Math. Studies. 34. Princeton Univ. Press. Here: sect. 9, pp. 37–40.
  4. John E. Hopcroft, Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X. Here: Section 3.2.1, pp. 91–96.
  5. More precisely, the number of regular-expression symbols, "ai", "ε", "|", "*", "·"; not counting parentheses.
  6. Gruber, Hermann; Holzer, Markus (2008). "Finite Automata, Digraph Connectivity, and Regular Expression Size". In Aceto, Luca; Damgård, Ivan; Goldberg, Leslie Ann; Halldórsson, Magnús M.; Ingólfsdóttir, Anna; Walukiewicz, Igor (eds.). Automata, Languages and Programming. Lecture Notes in Computer Science. Vol. 5126. Springer Berlin Heidelberg. pp. 39–50. doi:10.1007/978-3-540-70583-3_4. ISBN 9783540705833. S2CID 10975422. Here: Theorem 16.
