(Python) Transforming and Reducing Data at the Same Time
Let’s say you need to use a reduction function like sum() or min(), but you first have to transform the data, for example to calculate a sum of squares. A good way to do this is to pass a generator expression as the function's argument, which combines the transformation and the reduction in a single step.
You would do something like this:
nums = [1, 2, 3, 4, 5]
s = sum(x * x for x in nums)
This also shows a syntactic convenience of generator expressions: when one is supplied as the single argument to a function, you do not need a second pair of parentheses. For instance, these two statements are equivalent:
s = sum((x * x for x in nums)) # pass generator-expr as argument
s = sum(x * x for x in nums) # more elegant syntax
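Note that the shortcut applies only when the generator expression is the sole argument. As a small sketch (the start value of 10 is just an illustrative assumption), here is what happens once a second argument is involved: the parenthesized form works, while the bare form is rejected by the parser.

```python
nums = [1, 2, 3, 4, 5]

# With a second argument (here an assumed start value of 10),
# the generator expression needs its own parentheses.
total = sum((x * x for x in nums), 10)
print(total)

# The bare form does not even compile when another argument follows.
try:
    compile("sum(x * x for x in nums, 10)", "<example>", "eval")
    bare_is_valid = True
except SyntaxError:
    bare_is_valid = False
print(bare_is_valid)
```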
If you didn’t use a generator expression, you would do something like this:
nums = [1, 2, 3, 4, 5]
s = sum([x * x for x in nums])
This works, but it introduces an extra step and creates an extra list. If nums were huge, you would end up building a large temporary data structure that is used only once and then discarded. The generator solution transforms the data iteratively, one item at a time, and is therefore much more memory-efficient.
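To get a rough feel for the difference, you could compare the size of the two objects yourself. This is only a sketch: the input size of 100,000 is an arbitrary assumption, and sys.getsizeof() reports the size of the container alone, not of the integers it refers to, so the real gap is even larger than the numbers suggest.

```python
import sys

nums = range(100_000)  # arbitrary size, chosen only for illustration

squares_list = [x * x for x in nums]  # every result is stored up front
squares_gen = (x * x for x in nums)   # results are produced on demand

# getsizeof() measures the container object only, not the ints inside it.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)

# Both forms give the same answer when reduced.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)
```

The generator object stays the same small size no matter how long nums is, whereas the list grows with the input.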
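Some reduction functions, such as min() and max(), accept a key argument that can be useful in situations where you might otherwise be inclined to use a generator. The difference is in what you get back: the generator version returns only the transformed value, while the key version returns the whole original item. Like so: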
# original: returns 20
min_shares = min(s['shares'] for s in portfolio)
# alternative: returns {'name': 'AOL', 'shares': 20}
min_shares = min(portfolio, key=lambda s: s['shares'])
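The snippet above assumes a portfolio that is never defined. Here is a runnable sketch with made-up sample data (only the AOL entry with 20 shares comes from the comments above; the other holdings and the variable names fewest_shares and smallest_holding are hypothetical):

```python
# Hypothetical sample data; only the AOL entry is taken from the text.
portfolio = [
    {'name': 'GOOG', 'shares': 50},
    {'name': 'YHOO', 'shares': 75},
    {'name': 'AOL', 'shares': 20},
    {'name': 'SCOX', 'shares': 65},
]

# Generator expression: reduces the extracted values, so you get a number.
fewest_shares = min(s['shares'] for s in portfolio)

# key argument: compares by shares but returns the original dict.
smallest_holding = min(portfolio, key=lambda s: s['shares'])

print(fewest_shares)
print(smallest_holding)
```

Use the generator form when you only care about the value, and the key form when you need to know which item it belongs to.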