An anagram is a type of puzzle in which letters of a word or phrase must be
rearranged to form a new word or phrase. Since ancient times, anagrams have
provided a source of entertainment, and occasionally, even religious
inspiration. The puzzle may specify that each letter must appear as many times
as in the original, or the rules may allow one to ignore some of the original
letters. Anagrams often have multiple solutions.
anagrams1.py
)The Python program anagrams1.py
finds all valid English words
that can be formed by reordering a given sequence of letters. Each solution word
must use all of the letters in the original "base word". The base word may be
specified via the command line; if not, the program prompts the user to enter
this word interactively. Note that the base word need not be a valid English
word.
The algorithm operates as follows: (1) We construct a letter-frequency
dict
(Python dictionary structure) from the base word; each key in
this dict
is a letter appearing in the base word, and the
corresponding value is the number of occurrences of that letter. (2) We loop
through the spelling dictionary, creating a letter-frequency dict
for each word and then comparing it against the base word's letter-frequency
dict
. If the two dict
s are identical, an anagram has
been found (unless the two words are identical).
Note that the word dictionary is used above in two very different senses.
A Python dict
is a
hash table (type of data
structure).
Selling points of this algorithm are (A) it is simple and (B) it is fast, with running times that are essentially independent of the length of the base word. An alternative algorithm permutes the letters in the base word, checking each permutation against the spelling dictionary. The alternative algorithm is very fast for short words, but impractically slow for words having more than about eight letters.
If you do not have a Python interpreter on your computer and don't want to
install one, you can run anagrams1.py
online (i.e., on the server)
by entering the base word into the box, selecting one of the two spelling
dictionaries, and then clicking on the button below the box.
anagrams2.py
)The Python program anagrams2.py
solves a more general anagram
problem, finding all valid English words that can be formed using any subset of
the letters in a given 'base word'. The 'base word' may be specified via the
command line; if not, the program prompts the user to enter this word
interactively. Note that the 'base word' need not be a valid English word, and
that no anagram may use any letter more times than it appears in the base word.
The algorithm of anagrams1.py
was extremely simple, but cannot
be practically applied to the more general problem. To see why, note that there
are 2**n-1
non-empty subsets of the letters in an n-letter word
(assuming that all letters are distinct). For a 10-letter word, for example,
there are 1023 non-empty subsets of the letters. To apply the algorithm of
anagrams1.py
, we would have to test each of these 1023 possible
subsets against each word in the 181,000-word spelling dictionary—a total
of 185,163,000 comparisons.
A slightly more sophisticated algorithm, known as depth-first search
, is needed here. In detail, anagrams2.py
operates as follows:
We load the spelling dictionary and store it as a Python
dict
.
We construct a second dict
that stores all word prefixes.
The definition of prefix that applies here is any sequence of letters
such that at least one longer English word begins with that sequence.
We initialize an empty list that will store anagrams as they are found.
We perform multiple top-level invocations of a recursive search function (see below), passing the following two strings to it:
a one-letter 'word' formed by extracting one letter from the base word, and
the remaining letters of the base word.
The search function tests the current word; if it appears in the spelling dictionary, it is added to the list of anagrams. Whether the current word appears in the spelling dictionary or not, if it appears in the prefixes dictionary and unused base word letters remain, the search function calls itself recursively, appending one letter to the base word and removing that letter from the list of available letters. (One recursive call is made for each unique letter in the list of available letters).
After the recursive search has terminated, we remove the base word from the list of anagrams if it is present. (The base word cannot be considered an anagram of itself).
To run anagrams2.py
online, enter the base word into the box,
select one of the two spelling dictionaries, and then click on the button below
the box.
anagrams3.py
)The Python program anagrams3.py
finds English words and phrases
that can be formed using all letters in a given 'base word'. Blanks are
automatically inserted between words as needed. The 'base word' may be
specified via the command line; if not, the program prompts the user to enter
this word interactively.
To limit the number of solutions, this program is currently configured to use the small spelling dictionary; one can change this by editing the code.
If the base word contains more than about 10 letters, the program running time may be quite long (more than 30 seconds on a fast computer, and longer on a slow one), and the number of solutions may also be very large.
The algorithm is somewhat more complex than that of
anagrams2.py
, but uses the same basic ideas. See the code, which is
thoroughly documented, for details.
The code filters out phrases that end with any of the following words on the grounds that such phrases are guaranteed to be nonsensical: a, an, and, are, as, at, for, i, if, in, is, its, my, of, on, or, the, to, and was.
The benefit of the above-mentioned filtering is limited. Given only the spelling dictionary, there is no way to make the code exclude all or even most phrases that are non-grammatical and/or unintelligible, and one may have to sift a large volume of output to find a few reasonable answers. (If you have a solution to this problem, please let me know).
The following test case demonstrates the problem. I entered
bestofthree
for the base 'word' and allowed a maximum of three
words in the output phrase. The code produced 484 results. The vast majority
of these are not meaningful. A few are meaningful, albeit neither profound nor
deeply inspiring. Here are a few examples of these:
be oft threes be soft there be the forest be there soft be threes oft Bee forth set bet for these both set free sort the beef the soft beer
Two almost make sense, but not quite:
be frost thee rob theft see
(Be frost thee is grammatically incorrect. (It should be be frost thou).
Last update: 18 June, 2015