Deboggler

Introduction

During the holidays, my father introduced me to a multiplayer Boggle game for Android and iOS called Chasse Au Mot (Wordz in English); after playing for a few hours and getting my butt solidly kicked, I thought it would be nice to have a program that would find words for me. As it happens, I had been looking for a small project to do in Haskell, and this was a good opportunity. As an added bonus, I've been meaning to do some work with graphs, a data structure that I learned about in my algorithms and data structure classes, but never really got the chance to use in a practical setting. Extra bonus: I used a trie, a data structure I've heard about, but also never got to use in an actual application.

The code for deboggler is available on GitHub.

(Note: I do not actually use deboggler when I play, since it kills the entire fun of the game; it was just a computer scientist itch that I needed to scratch.)

Application design

The application design is very simple:

Create a dictionary of words from a text file;
Build a graph of the board;
Perform a depth-first search of the graph (depth ranging from 3 to 10, the minimum and maximum length of words) starting at every cube;
Print out the valid words, sorted by ascending length.

The dictionary and the board letters are passed as command-line arguments by the user. The time to find words for a 4x4 grid with my French dictionary file takes about 2.3 seconds on my Thinkpad laptop. I'm pretty sure that time could be improved, and I will probably use the opportunity to learn about Haskell's profling tools.

Graph

When I thought about solving the boggle problem, using an undirected graph seemed like a natural solution; that made me happy as I had recently watched the graph lectures from Bob Sedgewick's Coursera algorithms class and this gave me a great opportunity to apply those notions.

The nodes of the graph are the cubes and there is an edge between two nodes if they are adjacent on the board. For a 4x4 board, the graph looks like this:

Boggle Graph

Following Professor Sedgewick's comment that an adjacency list implementation is the most common way of representing graphs (due to most graphs being sparse), I created a Graph data type that is simply a wrapper around a map from integers to sets of integers.

type Cube = Int

newtype Graph = MkGraph (M.IntMap (S.Set Cube))
    deriving (Show, Eq)

The graph uses integers for nodes. Initially I used the characters from the board, but since a letter can appear more than once on the board, I ended up with an incorrect graph. I also face the question of how do distinguish between two E's? In a language like Java, you might distinguish by comparing the identity of the objects, but in a language like Haskell, any character is as good as any other character. Using the numbers from 0 to 15 (for a 4x4 board) ensures uniqueness of the cubes and thus that the graph is properly formed. We can easily map from nodes to characters later.

The Graph module exports five operations; the first three are pretty standard:

empty :: Graph
empty = MkGraph (M.empty)

neighbors :: Cube -> Graph -> S.Set Cube
neighbors node (MkGraph m) =
    M.findWithDefault S.empty node m

addEdge :: Cube -> Cube -> Graph -> Graph
addEdge x y (MkGraph m) =
    MkGraph $ M.insertWith S.union x (S.singleton y)
            $ M.insertWith S.union y (S.singleton x) m

We have a function to create an empty graph, a function to return the nodes adjacent to a give node and a function to add an edge to a graph. Because the graph is undirected, the addEdge function adds a mapping from x to y and from y to x.

The next function, boggleGraph is specific to Boggle: it creates a graph for an NxN board. For every cube of the board, add an edge from that cube to its immediate neighbors. The code ensures that cubes on the edge of the board (those that don't have 8 neighbors) are handled properly.

Finally, we have our dfs function that returns a list of the paths of a certain depth starting at a given cube. I was particularly happy with how concise the definition of dfs was.

dfs :: Int -> Int -> Graph -> [[Cube]]
dfs depth cube g = go depth cube S.empty
    where go 0 _ _    = []
          go 1 curr _ = [[curr]]
          go d curr visited = map (curr:) $ concat [go (d-1) adj (S.insert curr visited)
                                                    | adj <- S.toList (neighbors curr g)
                                                    , adj `S.notMember` visited]

Trie

Initially, I used a Set (Data.Set) as my dictionary structure, but I decided to have some fun an implement a Trie. Beyond the savings in memory, and a small performance boost (runtime went from 2.9s to 2.3s), that was my first time using this data structure. I went with what seemed like a not-so-terrible type definition:

data Trie = Trie Bool (M.Map Char Trie)
          deriving (Show, Eq)

A trie is a pair containing a boolean value indicating if we are at the end of a word, and a map from characters to other tries. The trie module exports 4 basic functions. The empty function returns an empty trie. The add function takes a ByteString (my application handles user input as ByteStrings), inserts it into a trie and returns a new trie.

add :: B.ByteString -> Trie -> Trie
add word (Trie eow m) =
    case B.uncons word of
      Nothing -> Trie True m
      Just (c, cs) ->
          case M.lookup c m of
            Nothing -> Trie eow (M.insert c (add cs empty) m)
            Just t  -> Trie eow (M.insert c (add cs t) m)

The contains function checks whether a word is part of the trie:

contains :: B.ByteString -> Trie -> Bool
contains word (Trie eow m) =
    case B.uncons word of
      Nothing -> eow
      Just (c, cs) ->
          case M.lookup c m of
            Nothing -> False
            Just t  -> contains cs

Both functions are fairly straight-forward and actually share some common structure; perhaps there's some refactoring to be done there. Finally, we have a fromList function that adds an entire list of ByteString words into an empty trie.

Putting it all together

The Graph and Trie modules are imported in the Main module, and put to good use. Earlier I mentioned that we'd need to convert from integers to the board actual. A small function handles this job:

pathToWord :: B.ByteString -> [Int] -> B.ByteString
pathToWord board path = B.pack [board `B.index` x | x <- path]

pathToWord takes the letters of the board (written one after the other) and a path (as computed by dfs) and returns the word this path corresponds to.

To select only valid words, we have a function that takes our trie of words (which we call a dictionary), the board string, a list of paths and returns a set containing all the valid words. We use a set to weed out the duplicate words.

findWords :: Dict -> B.ByteString -> [[Int]] -> S.Set B.ByteString
findWords dict board paths =
    S.fromList [ word
               | path <- paths
               , let word = pathToWord board path
               , T.contains word dict
               ]

Finally, the main function gets the dictionary file and board string from the command line arguments, creates the trie, the graph and calls dfs. We do some massaging to the data and output it to the console.

Future work

This is only a first version of deboggler, there are many things I hope to improve in the future:

The problem should be easily parallelizable by starting a new thread for words starting at every cube; with Haskell's choice of parallel solutions, that should be a lot of fun.
I have not paid too much attention to the performance of my data structures; I hope to use the Haskell profiling tools to find possible bottlenecks.
The interface is a bit crude; I might want to try making a better command-line interface or perhaps a GUI using threepenny?

Closing thoughts

If you are a programmer who wants to learn Haskell, the next time you have a small project, use Haskell to do it! Yes, you could probably finish much quicker if you used your favorite language, but take the opportunity to flex you Haskell muscles!