You probably all know the Zebra Puzzle in one form or another (sometimes it is referred to as the Einstein’s Problem). Apart from the names and the many formulations, the puzzle has a well defined structure.

It starts with describing a system where usually 5 differently **coloured** houses (which are built in the same street, in a row) are inhabited by 5 humans of different **countries**. They each have a different **animal** and drink different **beverages**. Moreover, they all listen to different kind of **music**.

The problem then goes on by giving the unfortunate reader a list of relations between the above elements. “The englishman lives in the red house and its house is not the last nor the first of the row. He who owns the dog does not like Metal…” and so on. The final question is related to finding something that has never been mentioned in your data, like who owns the Zebra.

Let’s now see one possible formulation of this problem and how it is related to code optimization.

The green house is after the ivory house,

Who listens to Metal owns a snail,

Who listens to Rock lives next to the fox owner,

The coffee-drinker lives in the green house,

The Spanish lady has a dog,

The Norwegian businessman lives in the first house,

The Ukrainian student drinks tea,

The man who lives in the red house is from England,

Who listens to Country music drinks rum,

The Japanese bioinformatician likes Pop music,

The horse owner does not love Classical music, but his neighbour does,

The middle house hosts the human that drinks milk,

The Norwegian businessman lives next to the blue house,

From the yellow house, Classical music flows out every hour.

Who drinks water? Who owns the Zebra?

I now invite you to take 10 minutes of thinking to try and solve this puzzle by yourself (pen and paper, old style).

…

Done?

While it is possible to solve this problem with no more than logic (hard, but doable), a computer can solve the task easily if well programmed.

Think of this problem as finding the right label for each of the elements involved.

The set of labels can be anything ordered, as to reflect the ordering of the houses.

Assigning a numbered label to an element implies putting it into the house with the same number.

For example if we assign the number 1 to the red house, the englishman, the Metal music, the dog and the milk, we are restricting our selves to all the solutions that consider this set of elements in the first house, and so on. Then we need to find the set of elements to assign to the label 2 (second house) and so on up the the fourth house (the last one is trivial if all others are determined).

First of all, think about the universe of all possible states of the problem.

A state in this problem may be seen as 5 vectors (one for each category), each composed by 5 elements, with all possible numberings from 1 to 5.

nationality=[1,4,3,5,2] color=[5,2,3,4,1] animal=[4,1,2,3,5] drink=[5,4,3,2,1] music=[1,5,2,4,3]

where we must think that every position of the list represents one fixed feature of each category (e.g. in this possible state we may have decided that the third animal is the Zebra, so we are telling that the zebra is in the second house)

The number of different states is $5!^5$

(How much is that? Check for yourself)

An approach to solve this problem is to try all the possible states and, for each one, test if all the conditions are met, then extract the solution (the labels for “water” and “zebra”).

First we need to be able to formalize all the information contained in the problem.

Concepts to code:

- “Next to”
- “After”
- “First” and “Middle”

These are all pretty easy tasks: let’s define a function for the first two:

import itertools #If the first argument minus the second is exactly one, then return True. def after(right,left): return 1==(right-left) #if the difference between the two arguments is 1 (in any order), return True. def nextto(A,B): return 1==(A-B) or 1==(B-A)

Remember that a possible state involves each element to be translated into a number, so this formulation is perfectly fit for the problem.

Now for the puzzle:

def zebra_puzzle(): #definition of our set of labels, "first" and "middle" concepts are straightforward houses = first, _ , middle, _, _ = [1,2,3,4,5] #create all 5! permutations and send them to a list. orderings = list(itertools.permutations(houses)) #Create a list comprehension where ALL possible solutions are written. #A solution is the set of labels for WATER and ZEBRA #for the problem states that meet all the conditions solution=[(WATER, ZEBRA) for (red,green,ivory,yellow,blue) in orderings for (Englishman,Spaniard,Ukrainian,Japanese,Norwegian) in orderings for (dog,snails, fox, horse, ZEBRA) in orderings for (coffee, tea, milk, rum, WATER) in orderings for (Metal, Classical, Rock, Country, Pop) in orderings if imright(green,ivory) if Metal is snails if nextto(Rock, fox) if coffee is green if Spaniard is dog if Norwegian is first if Ukrainian is tea if Englishman is red if Country is rum if Japanese is Pop if nextto(Classical, horse) if milk is middle if nextto(Norwegian,blue) if Classical is yellow ] return solution import time t0=time.time() print zebra_puzzle() t1=time.time() print str(t1-t0) + "s"

The problem is solved with a list comprehension, but what about time? If we measure the time needed for a standard modern computer to pass through all the solutions it should take about 20 seconds (a standard computer should make about 1 billion operations per second).

*(Try to relate this number to the number of possible states you calculated earlier, roughly how much calculations can your computer do each second?)*

This solution takes 20 seconds because it actually tries **all** possible labelings. This is defined by the way the *for* loops and the *if* statement are nested. The first *if* is tested after the last for jumped in. At that time, all the labels are already assigned and the restraints are tested.

In other words we are creating a possible labeling scheme for all of the 25 items in the problem and **then** checking if this labeling has something wrong. Yet for most of the schemes we could tell earlier if they were right or not. If green and ivory are not labeled with two consecutive numbers, we don’t have to test all 5x5x5x5 possible labelings of the other items, and so on.

With list comprehensions we don’t even have to care about indentation, we just need to swap the conditions inside the comprehension definition.

def zebra_puzzle(): #definition of our set of labels, "first" and "middle" concepts are straightforward houses = first, _ , middle, _, _ = [1,2,3,4,5] #create all 5! permutations and send them to a list. orderings = list(itertools.permutations(houses)) #Create a list comprehension where ALL possible solutions are written. #A solution is the set of labels for WATER and ZEBRA #for the problem states that meet all the conditions solution=[(WATER, ZEBRA) for (red,green,ivory,yellow,blue) in orderings if after(green,ivory) # is green after ivory? for (Englishman,Spanish,Ukrainian,Japanese,Norwegian) in orderings if Englishman is red if Norwegian is first if nextto(Norwegian,blue) for (dog,snails, fox, horse, ZEBRA) in orderings if Spanish is dog for (coffee, tea, milk, rum, WATER) in orderings if milk is middle if coffee is green if Ukrainian is tea for (Metal, Classical, Rock, Country, Pop) in orderings if Metal is snails if Classical is yellow if nextto(Rock, fox) if nextto(Classical, horse) if Country is rum if Japanese is Pop ] return solution import time t0=time.time() print zebra_puzzle() t1=time.time() print str(t1-t0) + "s"

This takes about 0m0.036s on my computer.

It got faster by 500-fold from the previous solution. Think about this when your code will risk to take an year to execute.*Question: how many operations did the optimized code do (roughly)?*

The take-home message here is that you should always consider the size of your problem, and the rough amount of computation needed to get to your goal. Do not stop at the first solution, and ask yourself if the time the code will need is acceptable for the kind of issue addressed. If not, the code optimization is a field where you should look, yet most of the work can be anticipated by a carefully planned design.

Rule of thumb: $10^{12}$ operations => bad. $10^{6}$ operations => good.

## Code examples:

1-zebra slow (about 1 hour execution time – depends on your pc)

https://github.com/noise42/datastructures/blob/master/materials/tv19/misc_data/les3/zebraSlow.ipynb

2-zebra optimized (less than 1 sec)

https://github.com/noise42/datastructures/blob/master/materials/tv19/misc_data/les3/zebraOpt.ipynb