
What does yield do in Python, and what happens when I call the function?


[Illustration: a worker stands relaxed beside a conveyor belt that produces boxes one at a time labeled “yield”, while another worker is buried under a huge pile labeled “List”.]

What is a generator? #

You may have come to this site after encountering the yield keyword in Python and wondering what it does.

The yield keyword is used in generator functions in Python. A generator is a special type of iterator that allows you to iterate over a sequence of values without storing the entire sequence in memory at once.

Instead of returning a single value, a generator can yield multiple values one at a time, which makes it more memory-efficient, especially when dealing with large datasets or infinite sequences.

When you call a generator function, it returns a generator object that can be iterated over.

  • the function body does not run immediately; calling the function only creates the generator object
  • each yield hands one value back to the caller and pauses the function
  • the function’s local state is saved between yields
  • each subsequent call to next() resumes execution just after the last yield statement
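You can see the first and last points directly by putting print statements inside a small generator (the `demo` function here is a made-up example, not from the text above):

```python
def demo():
    print("body started")  # runs only when the first value is requested
    yield "a"
    print("resumed")       # runs only when the second value is requested
    yield "b"

gen = demo()        # nothing is printed yet: the body has not started
first = next(gen)   # prints "body started", then returns "a"
second = next(gen)  # prints "resumed", then returns "b"
```

Calling `demo()` prints nothing; the first print only appears once `next()` asks for a value.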

Example of a generator function #

# A simple generator function that yields numbers from 1 to n
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

# Using next() to manually iterate through the generator
counter = count_up_to(3)
print(next(counter))  # Output: 1
print(next(counter))  # Output: 2
print(next(counter))  # Output: 3

# Using the generator in a for loop
for number in count_up_to(3):
    print(number)

Let’s break down what happens between the caller and the generator when we call count_up_to(3):

sequenceDiagram
    participant Caller
    participant Generator
    Caller->>Generator: create count_up_to(3)
    Note right of Generator: Generator object created<br/>Function not executed yet
    Caller->>Generator: next()
    Generator->>Generator: i = 1
    Generator-->>Caller: yield 1
    Note right of Generator: Execution pauses<br/>State saved
    Caller->>Generator: next()
    Generator->>Generator: resume after yield<br/>i += 1 → i = 2
    Generator-->>Caller: yield 2
    Note right of Generator: Pauses again
    Caller->>Generator: next()
    Generator->>Generator: resume<br/>i += 1 → i = 3
    Generator-->>Caller: yield 3
    Caller->>Generator: next()
    Generator-->>Caller: StopIteration

Why use generators? #

There are several reasons to use generators in Python:

  1. Memory Efficiency: Generators do not store the entire sequence in memory. They generate values on the fly, which is especially beneficial when working with large datasets or infinite sequences.
  2. Lazy Evaluation: Generators produce values one at a time, which allows for lazy evaluation. This means that values are only computed when needed, which can improve performance and reduce memory usage.
  3. Pipelining: Generators can be used to create pipelines of data processing. You can chain multiple generators together to process data in stages without needing to store intermediate results in memory.
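The pipelining point can be sketched with two chained generators (the stage names here, `numbers` and `squares_of_evens`, are illustrative, not part of the original example):

```python
# Stage 1: produce the numbers from 1 to n, one at a time.
def numbers(n):
    for i in range(1, n + 1):
        yield i

# Stage 2: keep only the even values and square them.
def squares_of_evens(values):
    for v in values:
        if v % 2 == 0:
            yield v * v

# No intermediate list is built; each value flows through both stages on demand.
result = list(squares_of_evens(numbers(6)))
print(result)  # [4, 16, 36]
```

Each value travels through the whole pipeline before the next one is produced, so memory usage stays constant regardless of n.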

Let’s write the same example using a list instead of a generator to see the difference:

# A function that returns a list of numbers from 1 to n
def count_up_to(n):
    numbers = []
    i = 1
    while i <= n:
        numbers.append(i)
        i += 1
    # numbers is a list that contains all the values from 1 to n
    return numbers


# Using the function 
numbers = count_up_to(3)
for number in numbers:
    print(number)

Generators are more memory-efficient

In the non-generator version, the entire list of numbers from 1 to n is created and stored in memory before we start iterating over it. This means that if n is large, we could end up using a lot of memory to store the list.

For small values of n, this is not a problem, but larger values of n can lead to high memory usage. In contrast, the generator version only holds one value in memory at a time, making it more efficient for larger datasets, especially if multiple larger datasets are being processed in a pipeline.
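One way to make the difference visible is sys.getsizeof, which reports the size of an object itself. The exact numbers vary by Python version and platform, so this is a rough sketch using the generator version of count_up_to:

```python
import sys

def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

big_list = list(range(1_000_000))
gen = count_up_to(1_000_000)

# The list object alone takes megabytes; the generator object stays
# at roughly a couple hundred bytes, no matter how large n is.
print(sys.getsizeof(big_list))
print(sys.getsizeof(gen))
```

Note that sys.getsizeof does not count the memory of the int objects referenced by the list, so the true gap is even larger than the printed numbers suggest.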

Generators pause and resume

Many beginners assume that a generator runs in the background and produces values asynchronously, because its pausing and resuming resembles how asynchronous functions work. It does not.

A generator only produces values when the caller requests them (e.g., through next() or a loop). The generator function is paused at the yield statement and resumes execution from that point when the next value is requested.
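The standard library can even tell you which state a generator is in. inspect.getgeneratorstate reports whether the body has started, is paused at a yield, or has finished:

```python
import inspect

def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

gen = count_up_to(2)
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: body has not started
next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at a yield
list(gen)                              # drain the remaining values
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED: exhausted
```

Between next() calls the state is GEN_SUSPENDED, not "running in the background": nothing executes until the caller asks for the next value.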

What happens when a generator is exhausted? #

When a generator has no more values to yield, it raises a StopIteration exception to signal that the iteration is complete.

With next() #

After yielding all values, if you call next() again on the generator, it will raise a StopIteration exception.

counter = count_up_to(3)
print(next(counter))  # Output: 1
print(next(counter))  # Output: 2
print(next(counter))  # Output: 3
# Raises StopIteration, as there are no more values to yield
print(next(counter))  

Handling the StopIteration exception can be done using a try-except block:

counter = count_up_to(3)
try:
    while True:
        print(next(counter)) 
except StopIteration:
    print("No more values to yield.")
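Alternatively, next() accepts a second argument: a default value to return instead of raising StopIteration when the generator is exhausted.

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

counter = count_up_to(2)
print(next(counter, None))  # 1
print(next(counter, None))  # 2
print(next(counter, None))  # None: exhausted, the default is returned instead
```

This avoids the try-except block when a sentinel value such as None is an acceptable way to signal the end.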

With a for loop #

When using a for loop to iterate over a generator, the loop automatically handles the StopIteration exception. Once the generator is exhausted, the loop simply terminates without raising an error.

for number in count_up_to(3):
    print(number) # Automatically stops after yielding 3, no error raised

If there were no yield #

Let’s assume that the yield keyword did not exist in Python. To implement similar functionality, a class with an __iter__ method and a __next__ method would be needed to create an iterator that behaves like a generator.

class CountUpTo:
    def __init__(self, n):
        self.n = n
        self.i = 1

    def __iter__(self):
        return self

    def __next__(self):
        if self.i <= self.n:
            value = self.i
            self.i += 1
            return value
        else:
            # no more values to yield, raise StopIteration to signal the end of iteration
            raise StopIteration

# Using the iterator
counter = CountUpTo(3)
for number in counter:
    print(number)

The class illustrates what a generator does under the hood. There is no storage of the entire sequence in memory; only the current state of the iteration is maintained, just like with a generator.
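Going the other direction, the same iterator can be written even more compactly as a generator expression; this is a stylistic alternative, not something from the class example above:

```python
def count_up_to(n):
    # A generator expression creates a generator object, just like a
    # function containing yield would.
    return (i for i in range(1, n + 1))

for number in count_up_to(3):
    print(number)  # 1, 2, 3
```

Like the yield-based version, this holds only one value at a time and raises StopIteration when exhausted.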

Common pitfalls to avoid when using generators #

Cannot reuse an exhausted generator #

A common mistake is to try to iterate over a generator multiple times. Once a generator has been exhausted (i.e., all values have been yielded), it cannot be reset or reused. Attempting to iterate over it again will result in no output.

counter = count_up_to(3)
for number in counter:
    print(number) # Output: 1, 2, 3

for number in counter:
    print(number) # Output: (nothing, generator is exhausted)
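The fix is simply to call the generator function again; each call creates a fresh generator object with its own independent state:

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

# Two separate generator objects, each starting from scratch.
first_pass = list(count_up_to(3))   # [1, 2, 3]
second_pass = list(count_up_to(3))  # [1, 2, 3]
```

If recomputing the values is expensive, itertools.tee can split one generator into several independent iterators, at the cost of buffering values in memory.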

Confusing yield with return #

Another common mistake is to confuse yield with return. The yield keyword allows a function to return a value and pause its execution, while return exits the function entirely. If you use return instead of yield, the function will not behave as a generator and will not produce an iterable sequence.

def count_up_to(n):
    i = 1
    while i <= n:
        return i  # This will exit the function immediately, not yielding values
        i += 1  

counter = count_up_to(3)
print(counter)  # Output: 1 (only the first value is returned, not a generator)

No random access #

Generators do not support random access like lists. You cannot index into a generator or slice it.

If you need to access specific elements, you would need to convert the generator to a list first, which can negate the memory efficiency benefits.

counter = count_up_to(3)
print(counter[0])  # This will raise a TypeError, as generators do not support indexing

If you need indexed access, convert the generator to a list first:

counter = count_up_to(3)
counter_list = list(counter)  # Converting to a list to access elements
print(counter_list[0])  # Output: 1

Note that once converted to a list, the generator is exhausted and cannot be reused.
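When you only need a single element at a known position, itertools.islice can skip ahead without building a full list:

```python
from itertools import islice

def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

counter = count_up_to(10)
# Skip the first four values, then take the next one (index 4, the fifth value).
fifth = next(islice(counter, 4, None))
print(fifth)  # 5
```

This still consumes the generator up to that point, but it never stores more than one value at a time.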

Cannot use len() on a generator #

As with random access, the generator is not a list, so you cannot use len() to get the number of items it will yield.

If you need to know the length, you would have to convert it to a list first, which again can negate the memory efficiency benefits.

counter = count_up_to(3)
print(len(counter))  # This will raise a TypeError, as generators do not have a length

Convert the generator to a list and use len() on that instead:

counter_list = list(count_up_to(3))  # Converting to a list to get the length
print(len(counter_list))  # Output: 3
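If you only need the count and not the values themselves, you can count while consuming the generator, without materializing a list at all:

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

# sum() pulls one value at a time; only the running total is kept in memory.
count = sum(1 for _ in count_up_to(1000))
print(count)  # 1000
```

As with list(), this exhausts the generator, so count first or re-create the generator if you also need the values afterwards.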