Notes on Python’s iteration and generators.

Iteration is like a stream of values.

Iteration that has no limit

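import itertools
from itertools import chain, cycle, repeat

for i in itertools.count():
    print(i)    # 0, 1, 2, ... never stops on its own

l = chain(repeat(17, 3), cycle(range(4)))  # example from Ned Batchelder's talk (YouTube)
for num in l:
    print(num)  # 17, 17, 17, 0, 1, 2, 3, 0, 1, 2, 3, ... forever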
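
Because these iterators never end, a loop over them needs its own stopping condition. A minimal sketch, assuming itertools.islice is an acceptable way to take a bounded prefix (the variable names are illustrative and not from the talk):

```python
from itertools import chain, count, cycle, islice, repeat

# islice() stops after a fixed number of items, so the infinite
# iterators are only consumed as far as needed.
first_five = list(islice(count(), 5))

stream = chain(repeat(17, 3), cycle(range(4)))
first_ten = list(islice(stream, 10))

print(first_five)  # [0, 1, 2, 3, 4]
print(first_ten)   # [17, 17, 17, 0, 1, 2, 3, 0, 1, 2]
```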

Other iteration

# from Ned Batchelder's talk (YouTube)

new_list = list(iterable)           # create a list from an iterable

results = [f(x) for x in iterable]  # apply f(x) to every x

total = sum(iterable)

largest = max(iterable)

combined = "".join(iterable)

Getting the index along with each value: any time you see range(len(x)), the loop should be rewritten using enumerate().

# BAD !!!
for i in range(len(my_list)):
    v = my_list[i]
    print(i, v)

# BAD 2
i = 0
for v in my_list:
    print(i, v)
    i += 1

# GOOD !!!
for i, v in enumerate(my_list):
    print(i, v)

for i, name in enumerate(['apple', 'banana', 'cherry']):
    print(i, name)  # yields (0, 'apple'), (1, 'banana'), (2, 'cherry')

for i, name in enumerate(my_list, start=5):
    print(i, name)  # i counts from 5; every element is still visited
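
To check what start= does, here is a small sketch with a sample list (the name fruits is made up for illustration). It only shifts the counter and does not skip any elements:

```python
fruits = ['apple', 'banana', 'cherry']

# start changes the first index value; all three items still appear.
numbered = list(enumerate(fruits, start=5))
print(numbered)  # [(5, 'apple'), (6, 'banana'), (7, 'cherry')]
```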

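Use zip to loop over iterables pairwise: a pair of streams becomes a stream of pairs.

zip(x, y) ==> combines two iterables into a single sequence of tuples (a list in py2, a lazy iterator in py3).

>>> x = [1, 2, 3]
>>> y = ['a', 'b', 'c']
>>> list(zip(x, y))    # list() is only needed in py3
[(1, 'a'), (2, 'b'), (3, 'c')]

zip(*l) ==> unzips a single list of tuples back into separate tuples.

>>> l = [(0, 'a'), (1, 'b'), (2, 'c')]
>>> list(zip(*l))
[(0, 1, 2), ('a', 'b', 'c')]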
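
A short round-trip sketch in Python 3, with sample data chosen for illustration. It zips two lists, unzips the result by unpacking into two names, and shows that zip stops at the shortest input:

```python
x = [1, 2, 3]
y = ['a', 'b', 'c']

# In Python 3 zip() is lazy, so list() is needed to see the pairs.
pairs = list(zip(x, y))

# zip(*pairs) passes each pair as a separate argument, which regroups
# the first items together and the second items together.
numbers, letters = zip(*pairs)

# zip() stops as soon as the shortest input runs out.
truncated = list(zip(x, 'ab'))

print(pairs)      # [(1, 'a'), (2, 'b'), (3, 'c')]
print(numbers)    # (1, 2, 3)
print(letters)    # ('a', 'b', 'c')
print(truncated)  # [(1, 'a'), (2, 'b')]
```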


names = ['ann', 'bob', 'carl']   # sample names, assumed for illustration
age_list = [15, 34, 65]

for name, age in zip(names, age_list):
    print("%s: %s" % (name, age))

# use dict(iterable of pairs) to create a dict from zip

d = dict(zip(names, age_list))

print(max(d.values()))      # => 65, the oldest age
                            # py2: max(d.itervalues())

print(max(d.items(), key=lambda b: b[1]))   # => ('carl', 65)
                                            # py2: use d.iteritems()

print(max(d, key=d.get))    # => carl, the oldest person's name


A generator returns a stream of values (calling it creates an iterator), while a regular function returns only one value.

Simple example:

def hello_world():
    yield "hello"
    yield "world"

for x in hello_world():
    print(x)

Example: given a list, return a stream of its even numbers

def evens(stream):
    # Lazy: a value is pulled from stream only when the caller asks
    # for the next one, so nothing is computed or stored ahead of time.
    for n in stream:
        if n % 2 == 0:
            yield n

for n in evens([1, 2, 3, 5, 6, 86, 32]):
    print(n)    # 2, 6, 86, 32
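
One way to see the laziness is to feed evens() an endless input. This sketch assumes itertools.count and itertools.islice as the infinite source and the cutoff. The call finishes only because evens() pulls numbers on demand:

```python
from itertools import count, islice

def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n

# count() never ends, yet this returns: evens() reads only as many
# numbers from count() as islice() asks for.
first_evens = list(islice(evens(count()), 4))
print(first_evens)  # [0, 2, 4, 6]
```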

Example: a generator that reads an open config file and yields only the lines that matter, skipping comments and empty lines.

def non_comment_lines(f):
    for line in f:
        line = line.strip()
        if line.startswith('#'):
            continue    # skip commented lines
        if not line:
            continue    # skip blank lines
        yield line

with open("myfile.ini") as f:
    for line in non_comment_lines(f):
        print(line)

# reuse the generator on another file
with open("2nd.ini") as f:
    for line in non_comment_lines(f):
        print(line)


Why a generator helps here:

  1. It simplifies testing: a plain list [] of strings can be passed in instead of a file.
  2. It abstracts the filtering into one place that can be reused.
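
The testing point can be sketched as follows. The list fake_file and its contents are made up for illustration. It works because the generator only needs an iterable of lines, not a real file:

```python
def non_comment_lines(f):
    for line in f:
        line = line.strip()
        if line.startswith('#'):
            continue    # skip commented lines
        if not line:
            continue    # skip blank lines
        yield line

# A list of strings stands in for an open file.
fake_file = [
    "# settings\n",
    "\n",
    "name = demo\n",
    "   # indented comment\n",
    "debug = 1\n",
]
kept = list(non_comment_lines(fake_file))
print(kept)  # ['name = demo', 'debug = 1']
```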

Example: turn a double loop into a single loop

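# OK solution
# stream of 2d coordinates
def range_2d(width, height):
    for y in range(height):     # xrange() in py2
        for x in range(width):
            yield x, y

for col, row in range_2d(width, height):
    value = spreadsheet.get_value(col, row)
    if this_is_my_value(value):
        do_something(col, row)  # placeholder action

# BETTER solution: let the spreadsheet hand out its own cells
for cell in spreadsheet.cells():
    value = cell.get_value()    # assumes each cell can report its value
    if this_is_my_value(value):
        do_something(cell)      # placeholder action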
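
The notes do not show what spreadsheet.cells() looks like. Below is one possible sketch, where the Spreadsheet class, its rows attribute, and the (col, row, value) tuples are all assumptions made for illustration. The double loop is written once inside the generator method, and callers only see a flat stream:

```python
class Spreadsheet(object):
    """Hypothetical stand-in for the spreadsheet object in the notes."""

    def __init__(self, rows):
        self.rows = rows  # list of rows, each a list of values

    def cells(self):
        # The nested loop lives here; callers use a single for loop.
        for row_idx, row in enumerate(self.rows):
            for col_idx, value in enumerate(row):
                yield col_idx, row_idx, value

sheet = Spreadsheet([[1, 2], [3, 4]])
found = [(col, row) for col, row, value in sheet.cells() if value > 2]
print(found)  # [(0, 1), (1, 1)]
```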

Low-level iteration

  • iterable: produces an iterator (think of the pages of a book)
  • iterator: produces a stream of values (think of a bookmark in the book)


    iterator = iter(iterable)   # calls iterable.__iter__()
    value = next(iterator)      # calls iterator.__next__() (iterator.next() in py2)
    value = next(iterator)      # each call returns the following value

next() is the ONLY operation allowed on an iterator!!!

  • next() advances the iterator (moves the bookmark) by one.
  • next() raises StopIteration when there are no more values.
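
Roughly what a for loop does under the hood can be sketched with iter() and next(). The list pages and the other names here are illustrative only:

```python
pages = ['p1', 'p2']

bookmark = iter(pages)      # the iterable hands out a fresh iterator
seen = []
while True:
    try:
        seen.append(next(bookmark))   # advance the bookmark by one
    except StopIteration:             # raised when nothing is left
        break

print(seen)                 # ['p1', 'p2']

# next() also accepts a default, returned instead of raising.
after_end = next(bookmark, 'end')
print(after_end)            # end
```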


with open('file.txt') as f:
    header = next(f)    # read the 1st line

    for line in f:      # read from the 2nd line to the end
        print(line)

Make an object iterable by defining __iter__(self) and returning an iterator, usually iter() of an internal list.

class MyObj(object):
    def __init__(self):
        self.mylist = []
    def __iter__(self):
        return iter(self.mylist)

myobj = MyObj()
for i in myobj:     # iterate over the object itself; do not call it
    print(i)

__iter__ is a good place to put generators

def __iter__(self):
    for i in self.mylist:
        if not i.myflag:
            yield i

def get_all(self):
    return iter(self.mylist)

def done(self):
    return (t for t in self.mylist if t.done)
    # generator expression: a generator in one line
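
Put together, these methods could sit in a class like the sketch below. Task and TodoList are hypothetical names, and the done attribute on each item is an assumption taken from the done() method:

```python
class Task(object):
    def __init__(self, name, done=False):
        self.name = name
        self.done = done

class TodoList(object):
    def __init__(self, tasks):
        self.mylist = list(tasks)

    def __iter__(self):
        # Generator method: iterating the object yields unfinished tasks.
        for t in self.mylist:
            if not t.done:
                yield t

    def get_all(self):
        return iter(self.mylist)

    def done(self):
        # Generator expression over the finished tasks.
        return (t for t in self.mylist if t.done)

todo = TodoList([Task('write', done=True), Task('test'), Task('ship')])
pending = [t.name for t in todo]
finished = [t.name for t in todo.done()]
everything = [t.name for t in todo.get_all()]
print(pending)     # ['test', 'ship']
print(finished)    # ['write']
print(everything)  # ['write', 'test', 'ship']
```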

Modifying a list while iterating over it causes side effects, such as skipped elements. If elements need to be deleted, delete them while iterating backward. A dictionary raises RuntimeError if its size changes during iteration.
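
These three cases can be reproduced with a small sketch, using sample data made up for illustration. Indices are used in the backward loop because del works by position:

```python
# 1. Removing while iterating forward: the second 2 is skipped.
skipped = [1, 2, 2, 3]
for n in skipped:
    if n == 2:
        skipped.remove(n)
print(skipped)   # [1, 2, 3]

# 2. Deleting backward: removals only shift items already visited.
backward = [1, 2, 2, 3]
for i in reversed(range(len(backward))):
    if backward[i] == 2:
        del backward[i]
print(backward)  # [1, 3]

# 3. Changing a dict's size during iteration raises RuntimeError.
ages = {'ann': 15, 'bob': 34}
raised = False
try:
    for name in ages:
        del ages[name]
except RuntimeError:
    raised = True
print(raised)    # True

# Iterating over a snapshot of the keys avoids the error.
for name in list(ages):
    del ages[name]
print(ages)      # {}
```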