Vectorized code

This page is inspired by Code Mistakes: Python's For Loop by G. Ryan Spain. I suggest you read it before continuing with the rest of this post.

Python has about a dozen built in functions that enable a "vectorized" style of coding. I say "style" because this is not true vectorization. Even so, learning how to use them will save you time and produce code that is faster and has fewer bugs.

The problem statement

Given some function (lets call it F) and a list of values
[1, 2, 3, 4, 5]
generate every sequential 3-tuple  
(1, 2, 3) (2, 3, 4) (3, 4, 5) 
and find the 3-tuple which produces the largest return value of F.

This requires generating all the 3-tuples, passing them one by one to the function, and keeping track of the results.

The imperative-style solution employs a "test and swap" strategy.  It requires a for loop, an if statement, and some variables to keep track of intermediate results

i = 0
for x in num_list:
    some_var = some_function(num_list[i:i+3])
    if some_var > some_other_var:
        some_other_var = some_var
        final_list = num_list[i:i+3]
    i += 1


This looks pefectly reasonable if you come from a C, Java, or Basic background. In fact, with minor syntax changes, this code would work in any of those languages.

But there are a few problems

  • the some_other_var variable isn't initialized
  • the slice [i:i+3] goes beyond the end of the list

So you see, this kind of low-level code is difficult to get right even though it's only 7 lines of code.


Using Python's higher level strategy

You can reduce this block of code to this one line

max( zip(num_list, num_list[1:], num_list[2:]), key=F )

if you understand how the zip and max functions work.

Generating the 3-tuples

You can use list slicing combined with the zip function to generate these groups.

Start with a list of numbers

a = [1, 2, 3, 4, 5]

and take 3 slices where each slice is offset by 1 from the previous slice

a[0:]
a[1:]
a[2:]

Now, if you line up the slices they form a triangular matrix and the columns form the groups you are looking for

[ 1, 2, 3, 4, 5 ]
[ 2, 3, 4, 5    ]    group 1
[ 3, 4, 5       ]

[ 1, 2, 3, 4, 5 ]
[ 2, 3, 4, 5    ]    group 2
[ 3, 4, 5       ]

[ 1, 2, 3, 4, 5 ]
[ 2, 3, 4, 5    ]    group 3
[ 3, 4, 5       ]

You can generate the groups by passing the slices to the zip function

groups = zip(a[0:], a[1:], a[2:])

The zip function automatically stops when it reaches the end of the shortest slice. This prevents the "off by 1" errors that are so common in this type of code.



Finding the maximum return value


The max function normally works by comparing values with the "<" operator, but you can override this by specifying a key function.

max(groups, key=F)

I have another post that explains the key parameter in more detail.


Besides zip and max, there are other built in functions that operate on entire lists

Take the time to study them.

No comments:

Post a Comment