One thing that many find very confusing when they first learn about generators is that once they’re exhausted they’re, erm, exhausted.
Eh?!
Let’s see what this means with an example. Let’s start with a list, first:
numbers = [item for item in range(10)]
# Loop through numbers
for number in numbers:
    print(number)
# And let's loop one more time…
for number in numbers:
    print(number)
We’ve used a list comprehension to create a list of numbers. It’s not the most exciting list you’ll see, but it will do here.
You’re then looping through the list twice, printing out the values each time.
Here’s the output from this code:
0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
5
6
7
8
9
I did say it’s not an exciting list. I know!
The numbers are printed out twice. Of course they are, since you’ve run the loop twice.
Let’s see what happens with a generator instead of a list:
# Notice that we've now made this a generator
numbers = (item for item in range(10))
# Loop through numbers
for number in numbers:
    print(number)
# And let's loop one more time…
for number in numbers:
    print(number)
Notice how we’re now using parentheses (round brackets) instead of square brackets when creating numbers
This creates a generator
If you print(type(numbers)), you’ll get: <class 'generator'>
And if you print(numbers), you’ll get something like: <generator object <genexpr> at 0x103309ff0>
So, what’s the output from the two for loops?
Let’s find out:
0
1
2
3
4
5
6
7
8
9
The numbers are only printed out once. The generator was “used up” when you looped through it the first time, so the second loop has nothing left to print.
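Here’s another quick way to see the same thing, as a minimal sketch. The names first_pass and second_pass are just made up for this illustration. Converting the generator to a list consumes it, so the second conversion gives you an empty list:

```python
numbers = (item for item in range(10))

first_pass = list(numbers)   # consumes every value in the generator
second_pass = list(numbers)  # the generator is already exhausted

print(first_pass)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(second_pass)  # []
```

Note that the second pass doesn’t raise an error. You just get nothing back, which is exactly why this can be confusing at first.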
Here’s why this happens (very abridged version):
A generator doesn’t store the data within it. Instead, it refers to data which is stored or created elsewhere
So, when you try and fetch the first item in numbers, the generator fetches 0 and it now “knows” it’s taken the first value…
So the next time you need a value from numbers, it will get the second one, and so on…
Once it fetches the last item, there’s nothing left
The generator is exhausted
When you try to fetch another item, there’s nothing there
Think of a generator as single-use
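You can watch this happen one step at a time using the built-in next(), which is roughly what a for loop does for you behind the scenes. This is just a sketch with a shorter generator, and the variable names (first, second, third, exhausted) are only for illustration:

```python
numbers = (item for item in range(3))

first = next(numbers)   # fetches 0
second = next(numbers)  # fetches 1
third = next(numbers)   # fetches 2, the last value

# Asking for one more value raises StopIteration,
# which is how Python signals that the generator is exhausted
try:
    next(numbers)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third)  # 0 1 2
print(exhausted)             # True
```

A for loop catches that StopIteration quietly and simply ends, which is why the second loop earlier printed nothing.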
@brwillems A big part of the answer is memory management. A generator doesn’t store all the values in memory. It fetches them or creates them as and when they’re required.
Here are two scenarios: you’re reading data from a large data set stored outside of your program (let’s say a large CSV). A generator allows you to represent the “whole data set” in your program, but only needs memory for one item at a time.
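Here’s a rough sketch of that first scenario. To keep it self-contained, an in-memory io.StringIO stands in for a large CSV file, and the data, the read_rows() function, and the column names are all made up for this example:

```python
import csv
import io

# Stand-in for a large CSV file on disk (hypothetical data)
fake_file = io.StringIO("name,age\nPriya,34\nSam,29\nPaolo,41\n")


def read_rows(file_obj):
    """Yield one row at a time instead of building a list of all rows."""
    reader = csv.DictReader(file_obj)
    for row in reader:
        yield row


rows = read_rows(fake_file)

# Only one row needs to be held in memory at any time
total_age = sum(int(row["age"]) for row in rows)
print(total_age)  # 104
```

With a real file, you’d pass in the object you get from open() instead of fake_file.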
Or, you have a large set of data stored in memory, but want to get lots of subsets of it: all the names starting with P in a large list of names, all those 5 letters long, and so on. Each subset can be a generator which uses the same data set (the original one) without duplicating the data in memory.
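And a small sketch of the second scenario. The list of names here is invented and tiny, but the idea is the same for a large one. Both generators read from the same names list, and neither makes its own copy of it:

```python
names = ["Priya", "Sam", "Paolo", "Alice", "Peter", "Jo"]

# Each generator refers to the original list rather than copying it
starts_with_p = (name for name in names if name.startswith("P"))
five_letters = (name for name in names if len(name) == 5)

# Converting to lists here only so the results can be printed
p_names = list(starts_with_p)
five_letter_names = list(five_letters)

print(p_names)            # ['Priya', 'Paolo', 'Peter']
print(five_letter_names)  # ['Priya', 'Paolo', 'Alice', 'Peter']
```

In real code you’d normally loop over each generator directly rather than turning it into a list, since building the list is what uses the extra memory.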