During the holidays, my father introduced me to a multiplayer Boggle game for Android and iOS called Chasse Au Mot (Wordz in English); after playing for a few hours and getting my butt solidly kicked, I thought it would be nice to have a program that would find words for me. As it happens, I had been looking for a small project to do in Haskell, and this was a good opportunity. As an added bonus, I've been meaning to do some work with graphs, a data structure that I learned about in my algorithms and data structure classes, but never really got the chance to use in a practical setting. Extra bonus: I used a trie, a data structure I've heard about, but also never got to use in an actual application.
The code for deboggler is available on GitHub.
(Note: I do not actually use deboggler when I play, since it kills the entire fun of the game; it was just a computer scientist itch that I needed to scratch.)
The application design is very simple:
The dictionary and the board letters are passed as command-line arguments by the user. The time to find words for a 4x4 grid with my French dictionary file takes about 2.3 seconds on my Thinkpad laptop. I'm pretty sure that time could be improved, and I will probably use the opportunity to learn about Haskell's profling tools.
When I thought about solving the boggle problem, using an undirected graph seemed like a natural solution; that made me happy as I had recently watched the graph lectures from Bob Sedgewick's Coursera algorithms class and this gave me a great opportunity to apply those notions.
The nodes of the graph are the cubes and there is an edge between two nodes if they are adjacent on the board. For a 4x4 board, the graph looks like this:
Following Professor Sedgewick's comment that an adjacency list implementation is the most common way of representing graphs (due to most graphs being sparse), I created a Graph data type that is simply a wrapper around a map from integers to sets of integers.
type Cube = Int
newtype Graph = MkGraph (M.IntMap (S.Set Cube))
deriving (Show, Eq)
The graph uses integers for nodes. Initially I used the characters from the board, but since a letter can appear more than once on the board, I ended up with an incorrect graph. I also face the question of how do distinguish between two E's? In a language like Java, you might distinguish by comparing the identity of the objects, but in a language like Haskell, any character is as good as any other character. Using the numbers from 0 to 15 (for a 4x4 board) ensures uniqueness of the cubes and thus that the graph is properly formed. We can easily map from nodes to characters later.
The Graph
module exports five operations; the first three
are pretty standard:
empty :: Graph
empty = MkGraph (M.empty)
neighbors :: Cube -> Graph -> S.Set Cube
neighbors node (MkGraph m) =
M.findWithDefault S.empty node m
addEdge :: Cube -> Cube -> Graph -> Graph
addEdge x y (MkGraph m) =
MkGraph $ M.insertWith S.union x (S.singleton y)
$ M.insertWith S.union y (S.singleton x) m
We have a function to create an empty graph, a function to return the
nodes adjacent to a give node and a function to add an edge to a
graph. Because the graph is undirected, the addEdge
function adds a
mapping from x
to y
and from y
to x
.
The next function, boggleGraph
is specific to Boggle: it creates a
graph for an NxN board. For every cube of the board, add an edge from
that cube to its immediate neighbors. The code ensures that cubes on
the edge of the board (those that don't have 8 neighbors) are handled
properly.
Finally, we have our dfs
function that returns a list of the
paths of a certain depth starting at a given cube. I was particularly
happy with how concise the definition of dfs
was.
dfs :: Int -> Int -> Graph -> [[Cube]]
dfs depth cube g = go depth cube S.empty
where go 0 _ _ = []
go 1 curr _ = [[curr]]
go d curr visited = map (curr:) $ concat [go (d-1) adj (S.insert curr visited)
| adj <- S.toList (neighbors curr g)
, adj `S.notMember` visited]
Initially, I used a Set (Data.Set
) as my dictionary structure, but I
decided to have some fun an implement a Trie. Beyond the savings in
memory, and a small performance boost (runtime went from 2.9s to
2.3s), that was my first time using this data structure. I went with
what seemed like a not-so-terrible type definition:
data Trie = Trie Bool (M.Map Char Trie)
deriving (Show, Eq)
A trie is a pair containing a boolean value indicating if we are at
the end of a word, and a map from characters to other tries. The trie
module exports 4 basic functions. The empty
function returns an
empty trie. The add
function takes a ByteString (my application
handles user input as ByteStrings), inserts it into a trie and returns
a new trie.
add :: B.ByteString -> Trie -> Trie
add word (Trie eow m) =
case B.uncons word of
Nothing -> Trie True m
Just (c, cs) ->
case M.lookup c m of
Nothing -> Trie eow (M.insert c (add cs empty) m)
Just t -> Trie eow (M.insert c (add cs t) m)
The contains
function checks whether a word is part of the trie:
contains :: B.ByteString -> Trie -> Bool
contains word (Trie eow m) =
case B.uncons word of
Nothing -> eow
Just (c, cs) ->
case M.lookup c m of
Nothing -> False
Just t -> contains cs
Both functions are fairly straight-forward and actually share some
common structure; perhaps there's some refactoring to be done there.
Finally, we have a fromList
function that adds an entire list of
ByteString words into an empty trie.
The Graph and Trie modules are imported in the Main module, and put to good use. Earlier I mentioned that we'd need to convert from integers to the board actual. A small function handles this job:
pathToWord :: B.ByteString -> [Int] -> B.ByteString
pathToWord board path = B.pack [board `B.index` x | x <- path]
pathToWord
takes the letters of the board (written one after the
other) and a path (as computed by dfs
) and returns the word this
path corresponds to.
To select only valid words, we have a function that takes our trie of words (which we call a dictionary), the board string, a list of paths and returns a set containing all the valid words. We use a set to weed out the duplicate words.
findWords :: Dict -> B.ByteString -> [[Int]] -> S.Set B.ByteString
findWords dict board paths =
S.fromList [ word
| path <- paths
, let word = pathToWord board path
, T.contains word dict
]
Finally, the main function gets the dictionary file and board string
from the command line arguments, creates the trie, the graph and calls
dfs
. We do some massaging to the data and output it to the console.
This is only a first version of deboggler, there are many things I hope to improve in the future:
If you are a programmer who wants to learn Haskell, the next time you have a small project, use Haskell to do it! Yes, you could probably finish much quicker if you used your favorite language, but take the opportunity to flex you Haskell muscles!