A Guessing Game

Lately I’ve been thinking about the best way of playing games, in particular, how one could write an algorithm to perform best in a game.  In the aptly-named “guessing game”, a person chooses a word from a predefined list, and the computer than attempts to guess this word using the fewest guesses possible.

Initially, I thought that the best way of tackling this would be to find a measure of how valuable each guess would be in reducing the number of words: a measure that would be dependent on firstly the entropy of the guess (how much each guess tells you about the word that has been chosen), but also the probability of this guess actually telling you anything.

Thinking about the probability of the guess telling you anything made me realise that however incorrect guesses could be just as useful as correct guesses.  Therefore, perhaps the best guess would be a guess that divides the word possibility space into two groups of equal size.  Consequently, no matter the result, the guess halves the possibility space.

I was feeling sarcastic, so I chose a list of motivating words (thanks dictionary-thesaurus.com!) as my predefined word collection.  Here is what I came up with in Python:

import string

def find_most_useful_guess(words):
    aim_occ = len(words)/2 # The ideal number of occurrences that would split the set of words in half
    best_guess, best_occ_diff = '', 26
    for ch in string.ascii_lowercase:
        occurences = sum(1 for word in words if ch in word)
        if (abs(aim_occ - occurences) < best_occ_diff):
            best_guess = ch
            best_occ_diff = abs(aim_occ - occurences)
    return best_guess

words = []
with open("motivating_words.txt", "r") as f:
    words = [line.strip().lower() for line in f]

while len(words) > 1:
    guess = find_most_useful_guess(words)
    in_word = input("Is " + guess + " in the word? [y] ") == "y"
    words = [w for w in words if guess in w] if in_word else [w for w in words if guess not in w]

print("I think your word was \"" + words[0] + "\".")

The program worked in limited testing using the set of motivating words. A potential improvement could be dealing with anagrams, which will currently leave the program in an infinite guessing loop.

Ideally, the program will produce a solution in \( \log_2 n \) guesses, as in an ideal scenario, each guess reduces the input size by half.