Python: Itertools and why you should use it.
I’m sure most Python programmers are at least vaguely familiar with itertools, most likely in the form of itertools.count and itertools.product, which are probably sitting somewhere in your code.
What you may not be as familiar with (at least, I wasn’t until I bothered to read the full docs today) are some of the other convenient iterators and generators. I’ll cover just a few here.
Review: count and product
count and product are the two functions from itertools that I see used the most, although I do still see (and am guilty of writing) code that uses more lines to do the same things. A comparison of count and product to the naïve code that implements the same logic should suffice.
count
A fairly common pattern seen all over the place, implemented in dozens of programming languages, is something like this:
i = start_value
while True:
    do_something_with(i)
    if we_need_to_exit():
        break
    i += step_size
This is incredibly common when you don’t know in advance what the last value you’ll need to operate on is. Unfortunately, it’s somewhat uglier in Python than the same pattern in C. (It’s also fairly ugly in Pascal, since you have to use a while loop there too, and it’s just as tidy in other C-like languages as it is in C.)
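In C, we can write it as follows (the exit test is negated because the loop runs while its condition is true, and it is checked before each pass rather than after):
for (i = start_value; !we_need_to_exit(); i += step_size) {
    do_something_with(i);
}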
This is possible because the loop condition in C is explicitly stated, and can be any expression. Python doesn’t have a C-style for loop at all: its for loop is really a foreach loop over an iterable.
itertools.count, however, can help us improve our Python code. It’s still not quite as concise as the C version, but it’s getting there:
from itertools import count
for i in count(start_value, step_size):
    do_something_with(i)
    if we_need_to_exit():
        break
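This version has a few advantages:
- It makes the fact that you’re simply counting more obvious.
- i is initialised and advanced by the for statement itself, so there’s no separate setup or increment line to forget. (As with any Python for loop, i stays bound to its last value after the loop exits; it does not fall out of scope.)
- It’s shorter, which makes a bug less likely to hide in it.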
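To make the pattern concrete, here is a small runnable sketch. The function squares_until and its stopping rule are made up for illustration; they stand in for do_something_with and we_need_to_exit above.

```python
from itertools import count

def squares_until(limit, start=1, step=1):
    """Collect i*i for i = start, start+step, ... and stop once a square reaches limit."""
    seen = []
    for i in count(start, step):
        seen.append(i * i)      # plays the role of do_something_with(i)
        if i * i >= limit:      # plays the role of we_need_to_exit()
            break
    return seen

print(squares_until(20))  # [1, 4, 9, 16, 25]
```

As in the while True version, the body runs once before the exit test is checked, so the value that triggers the exit is still processed.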
product
product is less obviously advantageous than count, unless you (or your code reviewer) share Linus Torvalds’s belief that more than 3 levels of indentation is a sign of bad code.
In practical terms, product replaces a certain class of nested for loop: one where the inner loops iterate over the same values no matter where the outer loops currently are.
What was:
for i in range(10):
    for j in range(10):
        do_something_with(i, j)
becomes:
from itertools import product
for i, j in product(range(10), repeat=2):
    do_something_with(i, j)
or if your i and j aren’t iterating over the same thing:
for i in list_of_i_values:
    for j in list_of_j_values:
        do_something_with(i, j)
can become:
from itertools import product
for i, j in product(list_of_i_values, list_of_j_values):
    do_something_with(i, j)
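If you want to convince yourself the two forms really visit the same pairs in the same order, a quick check like the one below does it. The rows and cols lists are placeholder data, not anything from real code.

```python
from itertools import product

rows = ["a", "b"]       # placeholder outer values
cols = [1, 2, 3]        # placeholder inner values

# The nested-loop version, collecting each pair it visits.
nested = []
for r in rows:
    for c in cols:
        nested.append((r, c))

# product varies the rightmost iterable fastest, just like the inner loop.
flat = list(product(rows, cols))

print(flat == nested)  # True
print(flat)            # [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)]
```

One difference worth knowing: product reads each input all the way through before it yields anything, so it can’t be fed an infinite iterator such as count().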
Other nested for loops
That’s all well and good, but there are plenty of other uses of nested for loops that aren’t doable with product. So what are they?
permutations and combinations
If you remember anything at all about statistics, you probably know what to expect from these two.
In practical terms, permutations is equivalent to product with every tuple that repeats an element removed. A very general version might be:
from itertools import product
for working_values in product(values, repeat=n):
    if len(set(working_values)) < len(working_values):
        continue
    do_something_with(*working_values)
Any time you’re deduplicating product in a similar way, you may as well skip it and use permutations instead.
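Here’s a short sketch comparing the filtered product with permutations directly. The values list and n are example inputs I picked for the demonstration.

```python
from itertools import permutations, product

values = ["x", "y", "z"]   # example input with no repeated values
n = 2

# product, keeping only the tuples that never reuse a value.
filtered = [t for t in product(values, repeat=n) if len(set(t)) == len(t)]

# permutations produces the same tuples without the filtering step.
direct = list(permutations(values, n))

print(filtered == direct)  # True
print(direct)
# [('x', 'y'), ('x', 'z'), ('y', 'x'), ('y', 'z'), ('z', 'x'), ('z', 'y')]
```

The equivalence only holds when the input values are distinct. permutations treats elements as different based on their position, so permutations([1, 1], 2) yields (1, 1) twice, while the set-based filter above would throw both away.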
combinations is the same as the combinations you’ll see in statistics, too.
A hand of cards in bridge is a 13 card long combination of a deck, for example.
But lots of algorithms with running time ~n^2 can use combinations, too:
for i in range(start, stop):
    for j in range(i + 1, stop):
        do_something_with(i, j)
It’s fairly easy to understand, and if you’re working with pairs of points in a space, it looks alright to do:
for i, a in enumerate(points):
    for b in points[i+1:]:
        do_something_with(a, b)
combinations looks even better, though:
from itertools import combinations
for my_points in combinations(points, 2):
    do_something_with(*my_points)
Isn’t that so much prettier? Especially since you now can (but needn’t) have an iterable containing the combination, instead of 2, 3, 5, or 13 different variables.
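As a runnable illustration, here is one way the pairs-of-points case might look. The sample points and the distance helper are invented for the example; any function of two points would do.

```python
from itertools import combinations
from math import hypot

points = [(0, 0), (3, 4), (6, 8)]   # sample 2D points, made up for the demo

def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return hypot(a[0] - b[0], a[1] - b[1])

# The enumerate-and-slice version, collecting each unordered pair once.
nested = []
for i, a in enumerate(points):
    for b in points[i + 1:]:
        nested.append((a, b))

pairs = list(combinations(points, 2))
print(pairs == nested)  # True

# Because each pair arrives as one tuple, it can be passed straight to min().
closest = min(combinations(points, 2), key=lambda pair: distance(*pair))
print(closest)  # ((0, 0), (3, 4))
```

Unlike the slicing version, combinations accepts any iterable, not just sequences that support points[i+1:].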
combinations_with_replacement
As a quick side note, combinations_with_replacement may be one of the least-known functions in itertools. The biggest reason for this is its relative newness. It was only added in versions 2.7 and 3.1, so if you need to target versions of Python that are more than five years old, you can’t use it.
The nested for loop of:
for i, a in enumerate(points):
    for b in points[i:]:
        do_something_with(a, b)
can become:
from itertools import combinations_with_replacement as combine
for a, b in combine(points, 2):
    do_something_with(a, b)
If the order in which the pairs are processed doesn’t matter, this particular nested loop can become:
from itertools import combinations
for a, b in combinations(points, 2):
    do_something_with(a, b)
for a in points:
    do_something_with(a, a)
Although at that point, I’d probably just use the nested for loop.
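For the curious, this sketch checks how the three versions relate. The points list is just three placeholder labels.

```python
from itertools import combinations, combinations_with_replacement

points = ["p", "q", "r"]   # placeholder values standing in for real points

# Nested loop where b starts at a itself, so (a, a) pairs are included.
nested = [(a, b) for i, a in enumerate(points) for b in points[i:]]

# combinations_with_replacement yields the same pairs in the same order.
with_replacement = list(combinations_with_replacement(points, 2))
print(with_replacement == nested)  # True

# combinations plus an explicit pass for the (a, a) pairs covers the same
# set of pairs, but visits them in a different order.
split = list(combinations(points, 2)) + [(a, a) for a in points]
print(sorted(split) == sorted(nested))  # True
print(split == nested)                  # False
```

That last line is why the two-loop version is only a fair swap when the processing order is irrelevant.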
Final thoughts
itertools is a very powerful library in Python for looping, and can be used in a huge variety of ways. With product, it can even be used to make nested loops of unknown depth in just a couple of lines.
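A minimal sketch of what “unknown depth” can look like is below. The helper names bit_strings and all_coordinates are mine, chosen only for illustration; the depth is decided at run time by n or by how many axes are passed in.

```python
from itertools import product

def bit_strings(n):
    """Return every string of n binary digits, in counting order."""
    # product("01", repeat=n) behaves like n nested `for bit in "01"` loops.
    return ["".join(bits) for bits in product("01", repeat=n)]

def all_coordinates(axes):
    """Yield one tuple per cell of a grid with len(axes) dimensions."""
    # Unpacking with * gives one nested loop per axis, however many there are.
    yield from product(*axes)

print(bit_strings(2))                            # ['00', '01', '10', '11']
print(list(all_coordinates([range(2), "ab"])))   # [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]
```

Writing the same thing with literal for statements would need a different piece of code for every depth, or recursion.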
Know it. Use it. Love it.
Or at least know it exists for now. Once you know about it, you’ll probably find great ways to use it later.