Anagrams

Dr. Phillip M. Feldman

1. Introduction

An anagram is a type of puzzle in which letters of a word or phrase must be rearranged to form a new word or phrase. Since ancient times, anagrams have provided a source of entertainment, and occasionally, even religious inspiration. The puzzle may specify that each letter must appear as many times as in the original, or the rules may allow one to ignore some of the original letters. Anagrams often have multiple solutions.

2. Single-Word Anagrams Using All Letters (`anagrams1.py`)

The Python program anagrams1.py finds all valid English words that can be formed by reordering a given sequence of letters. Each solution word must use all of the letters in the original "base word". The base word may be specified via the command line; if not, the program prompts the user to enter this word interactively. Note that the base word need not be a valid English word.

The algorithm operates as follows: (1) We construct a letter-frequency dict (Python dictionary structure) from the base word; each key in this dict is a letter appearing in the base word, and the corresponding value is the number of occurrences of that letter. (2) We loop through the spelling dictionary, creating a letter-frequency dict for each word and then comparing it against the base word's letter-frequency dict. If the two dicts are identical, an anagram has been found (unless the two words are identical).

Note that the word dictionary is used above in two very different senses. A Python dict is a hash table (type of data structure).

Selling points of this algorithm are (A) it is simple and (B) it is fast, with running times that are essentially independent of the length of the base word. An alternative algorithm permutes the letters in the base word, checking each permutation against the spelling dictionary. The alternative algorithm is very fast for short words, but impractically slow for words having more than about eight letters.

anagrams1.py

If you do not have a Python interpreter on your computer and don't want to install one, you can run anagrams1.py online (i.e., on the server) by entering the base word into the box, selecting one of the two spelling dictionaries, and then clicking on the button below the box.

3. Single-Word Anagrams Using a Subset of the Letters (`anagrams2.py`)

The Python program anagrams2.py solves a more general anagram problem, finding all valid English words that can be formed using any subset of the letters in a given 'base word'. The 'base word' may be specified via the command line; if not, the program prompts the user to enter this word interactively. Note that the 'base word' need not be a valid English word, and that no anagram may use any letter more times than it appears in the base word.

The algorithm of anagrams1.py was extremely simple, but cannot be practically applied to the more general problem. To see why, note that there are 2**n-1 non-empty subsets of the letters in an n-letter word (assuming that all letters are distinct). For a 10-letter word, for example, there are 1023 non-empty subsets of the letters. To apply the algorithm of anagrams1.py, we would have to test each of these 1023 possible subsets against each word in the 181,000-word spelling dictionary—a total of 185,163,000 comparisons.

A slightly more sophisticated algorithm, known as depth-first search , is needed here. In detail, anagrams2.py operates as follows:

We load the spelling dictionary and store it as a Python dict.
We construct a second dict that stores all word prefixes. The definition of prefix that applies here is any sequence of letters such that at least one longer English word begins with that sequence.
We initialize an empty list that will store anagrams as they are found.
We perform multiple top-level invocations of a recursive search function (see below), passing the following two strings to it:
- a one-letter 'word' formed by extracting one letter from the base word, and
- the remaining letters of the base word.
One top-level invocation is performed per unique letter in the base word, e.g., if the base word is 'foot', three top-level invocations will be done.
The search function tests the current word; if it appears in the spelling dictionary, it is added to the list of anagrams. Whether the current word appears in the spelling dictionary or not, if it appears in the prefixes dictionary and unused base word letters remain, the search function calls itself recursively, appending one letter to the base word and removing that letter from the list of available letters. (One recursive call is made for each unique letter in the list of available letters).
After the recursive search has terminated, we remove the base word from the list of anagrams if it is present. (The base word cannot be considered an anagram of itself).

anagrams2.py

To run anagrams2.py online, enter the base word into the box, select one of the two spelling dictionaries, and then click on the button below the box.

4. Multi-Word Anagrams Using All Letters (`anagrams3.py`)

The Python program anagrams3.py finds English words and phrases that can be formed using all letters in a given 'base word'. Blanks are automatically inserted between words as needed. The 'base word' may be specified via the command line; if not, the program prompts the user to enter this word interactively.

To limit the number of solutions, this program is currently configured to use the small spelling dictionary; one can change this by editing the code.

If the base word contains more than about 10 letters, the program running time may be quite long (more than 30 seconds on a fast computer, and longer on a slow one), and the number of solutions may also be very large.

The algorithm is somewhat more complex than that of anagrams2.py, but uses the same basic ideas. See the code, which is thoroughly documented, for details.

The code filters out phrases that end with any of the following words on the grounds that such phrases are guaranteed to be nonsensical: a, an, and, are, as, at, for, i, if, in, is, its, my, of, on, or, the, to, and was.

The benefit of the above-mentioned filtering is limited. Given only the spelling dictionary, there is no way to make the code exclude all or even most phrases that are non-grammatical and/or unintelligible, and one may have to sift a large volume of output to find a few reasonable answers. (If you have a solution to this problem, please let me know).

The following test case demonstrates the problem. I entered bestofthree for the base 'word' and allowed a maximum of three words in the output phrase. The code produced 484 results. The vast majority of these are not meaningful. A few are meaningful, albeit neither profound nor deeply inspiring. Here are a few examples of these:

be oft threes
be soft there
be the forest
be there soft
be threes oft
Bee forth set
bet for these
both set free
sort the beef
the soft beer

Two almost make sense, but not quite:

be frost thee
rob theft see

(Be frost thee is grammatically incorrect. (It should be be frost thou).

anagrams3.py

Last update: 18 June, 2015

Anagrams

Dr. Phillip M. Feldman

1. Introduction

2. Single-Word Anagrams Using All Letters (anagrams1.py)

3. Single-Word Anagrams Using a Subset of the Letters (anagrams2.py)

4. Multi-Word Anagrams Using All Letters (anagrams3.py)

2. Single-Word Anagrams Using All Letters (`anagrams1.py`)

3. Single-Word Anagrams Using a Subset of the Letters (`anagrams2.py`)

4. Multi-Word Anagrams Using All Letters (`anagrams3.py`)