A Better Approach to DFA Minimization

DFA minimization is often described as follows:

  1. First eliminate any unreachable states (easy).
  2. Then create a table of all possible pairs of states \((p, q)\), initially unmarked. (E.g. a two-dimensional array of booleans, initially set to false.) We mark pairs \((p, q)\) as and when we discover that \(p\) and \(q\) cannot be equivalent.
    1. Start by marking all pairs \((p, q)\) where \(p \in F\) and \(q \not\in F\), or vice versa.
    2. Look for unmarked pairs \((p, q)\) such that for some \(u \in \Sigma\), the pair \((δ(p, u), δ(q, u))\) is marked. Then mark \((p, q)\).
    3. Repeat step 2.2 until no such unmarked pairs remain. If \((p, q)\) is still unmarked, can collapse \(p\) and \(q\) to a single state.

Source: Mary Cryan, Informatics 2A: Processing Formal and Natural Languages, Lecture 5, 2018.

Based on my very limited research, aforementioned algorithm seems to be the most common way of teaching undergrads how to minimise a given DFA, but I believe that it suffers from reasons soon listed below, and I will instead propose a better way of executing the same algorithm that does not (make the student) suffer from the same problems.

Current Problems

In order of increasing severity:

  1. Drawing a triangular half-table by hand is confusing as not getting the row/column spacing right will result in a shape that might confuse the student, and the distance between the cell and its column/row header makes it harder to process the information.
  2. “Looking for unmarked pairs” makes you revisit the same pairs again and again as new pairs are getting marked.

A Better Approach

  1. First eliminate any unreachable states (easy).
  2. Then create a list of all possible pairs of states \((p, q)\), initially unmarked. When we discover that \(p\) and \(q\) cannot be equivalent, we mark pairs \((p, q)\) by putting a mark, iteration number, and some other information we’ll soon learn.
    1. Start by marking all pairs \((p, q)\) where \(p \in F\) and \(q \not\in F\), or vice versa.

      While marking record the iteration number (#1) as well.
    2. For each marked pair \((p, q)\) from the previous iteration, mark all unmarked pairs \((p’, q’)\) if \(\exists u \in \Sigma\) such that \((δ(p’, u), δ(q’, u)) = (p, q)\).

      While marking, record the iteration number, the state \((p, q)\) that you came from, and the letter \(u\) as well.
    3. Repeat step 2.2 as a new iteration until no new pairs marked.

The greatest difference between the previous approach and ours is step 2.2 where, instead of looking for the same unmarked pairs again and again, we consider only the marked states from the previous iteration, which makes the whole process far more efficient and is easier to keep track of (as we record the iteration number while marking) and to verify if needed (as we record the previously marked state that we came from, and the letter \(u\)).

An Example

An exemplary DFA.

There are no unreachable states to eliminate.

Let us create a list of all possible pairs of states \((p, q)\), initially unmarked.

  • \((q0, q1)\)
  • \((q0, q2)\)
  • \((q0, q3)\)
  • \((q0, q4)\)
  • \((q0, q5)\)
  • \((q0, q6)\)
  • \((q1, q2)\)
  • \((q1, q3)\)
  • \((q1, q4)\)
  • \((q1, q5)\)
  • \((q1, q6)\)
  • \((q2, q3)\)
  • \((q2, q4)\)
  • \((q2, q5)\)
  • \((q2, q6)\)
  • \((q3, q4)\)
  • \((q3, q5)\)
  • \((q3, q6)\)
  • \((q4, q5)\)
  • \((q4, q6)\)
  • \((q5, q6)\)

Iteration #1

Start by marking all pairs \((p, q)\) where \(p \in F\) and \(q \not\in F\), or vice versa.

  • \((q0, q1)\)
  • \((q0, q2)\)
  • \((q0, q3)\)
  • \((q0, q4)\)
  • \((q0, q5)\ \unicode{10003}_{1}\)
  • \((q0, q6)\ \unicode{10003}_{1}\)
  • \((q1, q2)\)
  • \((q1, q3)\)
  • \((q1, q4)\)
  • \((q1, q5)\ \unicode{10003}_{1}\)
  • \((q1, q6)\ \unicode{10003}_{1}\)
  • \((q2, q3)\)
  • \((q2, q4)\)
  • \((q2, q5)\ \unicode{10003}_{1}\)
  • \((q2, q6)\ \unicode{10003}_{1}\)
  • \((q3, q4)\)
  • \((q3, q5)\ \unicode{10003}_{1}\)
  • \((q3, q6)\ \unicode{10003}_{1}\)
  • \((q4, q5)\ \unicode{10003}_{1}\)
  • \((q4, q6)\ \unicode{10003}_{1}\)
  • \((q5, q6)\)

Iteration #2

For each marked pair \((p, q)\) from the previous iteration, mark all unmarked pairs \((p’, q’)\) if \(\exists u \in \Sigma\) such that \((δ(p’, u), δ(q’, u)) = (p, q)\).

For instance, the pair \((q2, q5)\) that we’ve marked in the previous iteration, can be reached from \((q0, q3)\) with letter ‘b’.
  • \((q0, q1)\)
  • \((q0, q2)\ \unicode{10003}_{2\ (q2, q6)\ b}\)
  • \((q0, q3)\ \unicode{10003}_{2\ (q2, q5)\ b}\)
  • \((q0, q4)\ \unicode{10003}_{2\ (q2, q6)\ b}\)
  • \((q0, q5)\ \unicode{10003}_{1}\)
  • \((q0, q6)\ \unicode{10003}_{1}\)
  • \((q1, q2)\ \unicode{10003}_{2\ (q1, q3)\ b}\)
  • \((q1, q3)\ \unicode{10003}_{2\ (q3, q5)\ b}\)
  • \((q1, q4)\ \unicode{10003}_{2\ (q3, q6)\ b}\)
  • \((q1, q5)\ \unicode{10003}_{1}\)
  • \((q1, q6)\ \unicode{10003}_{1}\)
  • \((q2, q3)\)
  • \((q2, q4)\)
  • \((q2, q5)\ \unicode{10003}_{1}\)
  • \((q2, q6)\ \unicode{10003}_{1}\)
  • \((q3, q4)\)
  • \((q3, q5)\ \unicode{10003}_{1}\)
  • \((q3, q6)\ \unicode{10003}_{1}\)
  • \((q4, q5)\ \unicode{10003}_{1}\)
  • \((q4, q6)\ \unicode{10003}_{1}\)
  • \((q5, q6)\)

Iteration #3

For each marked pair \((p, q)\) from the previous iteration, mark all unmarked pairs \((p’, q’)\) if \(\exists u \in \Sigma\) such that \((δ(p’, u), δ(q’, u)) = (p, q)\).

For instance, the pair \((q1, q2)\) that we’ve marked in the previous iteration, can be reached from \((q0, q1)\) with letter ‘a’.
  • \((q0, q1)\ \unicode{10003}_{3\ (q1, q2)\ a}\)
  • \((q0, q2)\ \unicode{10003}_{2\ (q2, q6)\ b}\)
  • \((q0, q3)\ \unicode{10003}_{2\ (q2, q5)\ b}\)
  • \((q0, q4)\ \unicode{10003}_{2\ (q2, q6)\ b}\)
  • \((q0, q5)\ \unicode{10003}_{1}\)
  • \((q0, q6)\ \unicode{10003}_{1}\)
  • \((q1, q2)\ \unicode{10003}_{2\ (q1, q3)\ b}\)
  • \((q1, q3)\ \unicode{10003}_{2\ (q3, q5)\ b}\)
  • \((q1, q4)\ \unicode{10003}_{2\ (q3, q6)\ b}\)
  • \((q1, q5)\ \unicode{10003}_{1}\)
  • \((q1, q6)\ \unicode{10003}_{1}\)
  • \((q2, q3)\)
  • \((q2, q4)\)
  • \((q2, q5)\ \unicode{10003}_{1}\)
  • \((q2, q6)\ \unicode{10003}_{1}\)
  • \((q3, q4)\)
  • \((q3, q5)\ \unicode{10003}_{1}\)
  • \((q3, q6)\ \unicode{10003}_{1}\)
  • \((q4, q5)\ \unicode{10003}_{1}\)
  • \((q4, q6)\ \unicode{10003}_{1}\)
  • \((q5, q6)\)

Iteration #4

For each marked pair \((p, q)\) from the previous iteration, mark all unmarked pairs \((p’, q’)\) if \(\exists u \in \Sigma\) such that \((δ(p’, u), δ(q’, u)) = (p, q)\).

The only pair we marked in the previous iteration, \((q0, q1)\) is not reachable from any other unmarked state so no new pairs are marked in this iteration.

Thus we are done!

Results

The unmarked pairs in our list are:

  • \((q2, q3)\)
  • \((q2, q4)\)
  • \((q3, q4)\)
  • \((q5, q6)\)

And thus:

The Minimized DFA

Hope you find it useful!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.