The Zebra Puzzle

You probably all know the Zebra Puzzle in one form or another (sometimes it is referred to as the Einstein’s Problem). Apart from the names and the many formulations, the puzzle has a well defined structure.
It starts with describing a system where usually 5 differently coloured houses (which are built in the same street, in a row) are inhabited by 5 humans of different countries. They each have a different animal and drink different beverages. Moreover, they all listen to different kind of music.
The problem then goes on by giving the unfortunate reader a list of relations between the above elements. “The englishman lives in the red house and its house is not the last nor the first of the row. He who owns the dog does not like Metal…” and so on. The final question is related to finding something that has never been mentioned in your data, like who owns the Zebra.
Let’s now see one possible formulation of this problem and how it is related to code optimization.

The green house is after the ivory house,
Who listens to Metal owns a snail,
Who listens to Rock lives next to the fox owner,
The coffee-drinker lives in the green house,
The Spanish lady has a dog,
The Norwegian businessman lives in the first house,
The Ukrainian student drinks tea,
The man who lives in the red house is from England,
Who listens to Country music drinks rum,
The Japanese bioinformatician likes Pop music,
The horse owner does not love Classical music, but his neighbour does,
The middle house hosts the human that drinks milk,
The Norwegian businessman lives next to the blue house,
From the yellow house, Classical music flows out every hour.
Who drinks water? Who owns the Zebra?

I now invite you to take 10 minutes of thinking to try and solve this puzzle by yourself (pen and paper, old style).

Done?
While it is possible to solve this problem with no more than logic (hard, but doable), a computer can solve the task easily if well programmed.
Think of this problem as finding the right label for each of the elements involved.
The set of labels can be anything ordered, as to reflect the ordering of the houses.
Assigning a numbered label to an element implies putting it into the house with the same number.
For example if we assign the number 1 to the red house, the englishman, the Metal music, the dog and the milk, we are restricting our selves to all the solutions that consider this set of elements in the first house, and so on. Then we need to find the set of elements to assign to the label 2 (second house) and so on up the the fourth house (the last one is trivial if all others are determined).
First of all, think about the universe of all possible states of the problem.
A state in this problem may be seen as 5 vectors (one for each category), each composed by 5 elements, with all possible numberings from 1 to 5.

nationality=[1,4,3,5,2]
color=[5,2,3,4,1]
animal=[4,1,2,3,5]
drink=[5,4,3,2,1]
music=[1,5,2,4,3]

where we must think that every position of the list represents one fixed feature of each category (e.g. in this possible state we may have decided that the third animal is the Zebra, so we are telling that the zebra is in the second house)
The number of different states is $5!^5$
(How much is that? Check for yourself)
An approach to solve this problem is to try all the possible states and, for each one, test if all the conditions are met, then extract the solution (the labels for “water” and “zebra”).
First we need to be able to formalize all the information contained in the problem.
Concepts to code:

• “Next to”
• “After”
• “First” and “Middle”

These are all pretty easy tasks: let’s define a function for the first two:

import itertools
#If the first argument minus the second is exactly one, then return True.
def after(right,left):
return 1==(right-left)
#if the difference between the two arguments is 1 (in any order), return True.
def nextto(A,B):
return 1==(A-B) or 1==(B-A)

Remember that a possible state involves each element to be translated into a number, so this formulation is perfectly fit for the problem.
Now for the puzzle:

def zebra_puzzle():
#definition of our set of labels, "first" and "middle" concepts are straightforward
houses = first, _ , middle, _, _ = [1,2,3,4,5]
#create all 5! permutations and send them to a list.
orderings = list(itertools.permutations(houses))
#Create a list comprehension where ALL possible solutions are written.
#A solution is the set of labels for WATER and ZEBRA
#for the problem states that meet all the conditions
solution=[(WATER, ZEBRA)
for (red,green,ivory,yellow,blue) in orderings
for (Englishman,Spaniard,Ukrainian,Japanese,Norwegian) in orderings
for (dog,snails, fox, horse, ZEBRA) in orderings
for (coffee, tea, milk, rum, WATER) in orderings
for (Metal, Classical, Rock, Country, Pop) in orderings
if imright(green,ivory)
if Metal is snails
if nextto(Rock, fox)
if coffee is green
if Spaniard is dog
if Norwegian is first
if Ukrainian is tea
if Englishman is red
if Country is rum
if Japanese is Pop
if nextto(Classical, horse)
if milk is middle
if nextto(Norwegian,blue)
if Classical is yellow
]
return solution
import time
t0=time.time()
print zebra_puzzle()
t1=time.time()
print str(t1-t0) + "s"

The problem is solved with a list comprehension, but what about time? If we measure the time needed for a standard modern computer to pass through all the solutions it should take about 20 seconds (a standard computer should make about 1 billion operations per second).
(Try to relate this number to the number of possible states you calculated earlier, roughly how much calculations can your computer do each second?)
This solution takes 20 seconds because it actually tries all possible labelings. This is defined by the way the for loops and the if statement are nested. The first if is tested after the last for jumped in. At that time, all the labels are already assigned and the restraints are tested.
In other words we are creating a possible labeling scheme for all of the 25 items in the problem and then checking if this labeling has something wrong. Yet for most of the schemes we could tell earlier if they were right or not. If green and ivory are not labeled with two consecutive numbers, we don’t have to test all 5x5x5x5 possible labelings of the other items, and so on.
With list comprehensions we don’t even have to care about indentation, we just need to swap the conditions inside the comprehension definition.

def zebra_puzzle():
#definition of our set of labels, "first" and "middle" concepts are straightforward
houses = first, _ , middle, _, _ = [1,2,3,4,5]
#create all 5! permutations and send them to a list.
orderings = list(itertools.permutations(houses))
#Create a list comprehension where ALL possible solutions are written.
#A solution is the set of labels for WATER and ZEBRA
#for the problem states that meet all the conditions
solution=[(WATER, ZEBRA)
for (red,green,ivory,yellow,blue) in orderings
if after(green,ivory) # is green after ivory?
for (Englishman,Spanish,Ukrainian,Japanese,Norwegian) in orderings
if Englishman is red
if Norwegian is first
if nextto(Norwegian,blue)
for (dog,snails, fox, horse, ZEBRA) in orderings
if Spanish is dog
for (coffee, tea, milk, rum, WATER) in orderings
if milk is middle
if coffee is green
if Ukrainian is tea
for (Metal, Classical, Rock, Country, Pop) in orderings
if Metal is snails
if Classical is yellow
if nextto(Rock, fox)
if nextto(Classical, horse)
if Country is rum
if Japanese is Pop
]
return solution
import time
t0=time.time()
print zebra_puzzle()
t1=time.time()
print str(t1-t0) + "s"


This takes about 0m0.036s on my computer.
It got faster by 500-fold from the previous solution. Think about this when your code will risk to take an year to execute.
Question: how many operations did the optimized code do (roughly)?

The take-home message here is that you should always consider the size of your problem, and the rough amount of computation needed to get to your goal. Do not stop at the first solution, and ask yourself if the time the code will need is acceptable for the kind of issue addressed. If not, the code optimization is a field where you should look, yet most of the work can be anticipated by a carefully planned design.
Rule of thumb: $10^{12}$ operations => bad. $10^{6}$ operations => good.

Code examples:

1-zebra slow (about 1 hour execution time – depends on your pc)
https://github.com/noise42/datastructures/blob/master/materials/tv19/misc_data/les3/zebraSlow.ipynb
2-zebra optimized (less than 1 sec)
https://github.com/noise42/datastructures/blob/master/materials/tv19/misc_data/les3/zebraOpt.ipynb