Process Pooling and Ordered Outputs
Lately, I have been searching for meaningful or interesting projects to motivate me to do the following things:
- Write more code
- Write better code
- Learn a new programming language (right now, I work almost entirely in Python, and can comfortably use Java, R, and MATLAB)
Problem description
Since I am feeling stuck with my data and machine learning projects, I want to stop looking at pandas `DataFrame`s for a while. To give myself a new bone to chew on, I picked up Al Sweigart's *The Big Book of Small Python Projects*. In one exercise, a Monte Carlo simulation demonstrates the effect of the so-called Birthday Problem: as the number of people in a room increases, the probability that some people in that room share a birthday grows quickly, faster than you might think!
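The Birthday Problem also has a well-known closed form, which makes for a handy sanity check on the simulation (this snippet is my own addition, not part of the book's exercise):

```python
import math

def exactCollisionProbability(n):
    """P(at least two of n people share a birthday): one minus the
    probability that all n independently chosen birthdays are distinct."""
    p_all_distinct = math.prod((365 - k) / 365 for k in range(n))
    return 1 - p_all_distinct

print(f"{exactCollisionProbability(30):.2%}")  # ~70.63% for a group of 30
```

For a group of 30, the exact answer is about 70.6%, which the simulation below should land close to.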
While the book's code is very clear, I want to pare the program down:

- Birthdays are now represented by integer values between 1 and 365 instead of `datetime` objects
- The function that checks for collisions returns a boolean, rather than the first colliding birthday it finds (this part of the code accounted for a lot of the original program's time complexity, since it found collisions using nested `for` loops)
```python
import random

def getBirthdays(n):
    """n is an integer; getBirthdays generates n birthdays
    for n people and returns them as a list of length n."""
    birthdays = [random.randint(1, 365) for _ in range(n)]
    return birthdays
```
```python
def checkCollisions(dates):
    """Checks a list of integer values for duplicates/collisions."""
    return len(set(dates)) != len(dates)
```
Since this is a Monte Carlo simulation, the process of generating birthdays → checking for collisions → reporting a collision has to be repeated on the order of 100,000 times. After running the code on a single core a few times and impatiently waiting for a few seconds, I remembered my current setup has 8 cores. Sounds like a job for `multiprocessing.Pool`!
To use `multiprocessing.Pool`'s `map`, I need a function and an iterable. My plan:

- Create a list of length 100,000 called `maplist`, where each item in the list is the number of birthdays we want to generate
- Create a function which, given a number n, generates n birthdays → checks these birthdays for collisions → returns `0` or `1`
Here is the function:
```python
def generateAndCheckBirthdays(n):
    """Generates n dates and checks the dates for collisions."""
    birthdays = getBirthdays(n)
    return 1 if checkCollisions(birthdays) else 0
```
I store these results in another list called `results`, sum over the list, and see how many of the 100,000 trials produced any colliding birthdays.
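Putting the pieces together, the `map` version looks roughly like this (the `if __name__ == "__main__":` guard and the constant names other than `maplist` and `results` are my reconstruction; the full script is linked at the end of the post):

```python
import multiprocessing

NUM_TRIALS = 100_000

if __name__ == "__main__":
    NUM_BDAYS = 30      # birthdays generated per trial
    NUM_PROCESSES = 8   # worker processes in the pool

    # Every trial generates the same number of birthdays.
    maplist = [NUM_BDAYS] * NUM_TRIALS
    with multiprocessing.Pool(NUM_PROCESSES) as pool:
        results = pool.map(generateAndCheckBirthdays, maplist)

    collisions = sum(results)
    print(f"{collisions}/{NUM_TRIALS:,} trials had matching birthdays.")
```

A sample run: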
```
$ python unattendedpartiesopt.py 30 8
Checking 30 random birthdays using 8 process(es) 100,000 times...
map 0.526920 seconds.
imap_unordered 0.000027 seconds.
imap_unordered 19732.625 times faster than map in 0.00507% of the ordered time.
70550/100,000 trials had matching birthdays.
Group of 30 has a 70.55% chance of matching birthdays.
```
`map` or `imap_unordered`?
`map` preserves ordering. This means that if I need a result to be accessible at the same index as its input, `map` will do this for me, at an increased cost in runtime. However, if I am not concerned with the order in which I receive results, I can use `Pool.imap_unordered`.
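A sketch of what that swap looks like (reusing the pool setup from above; note that `imap_unordered` returns a lazy iterator, so the results still have to be consumed, here by `sum`):

```python
with multiprocessing.Pool(NUM_PROCESSES) as pool:
    # Results arrive in completion order, not submission order.
    collisions = sum(pool.imap_unordered(generateAndCheckBirthdays, maplist))
```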
In the case of this Birthday Problem, I do not care about the ordering: if a result generated using data at `maplist[i]` is accessible at `results[j]`, this is not a big deal, since all members of `maplist` are the same value. The reason `imap_unordered` is faster is that it yields each result as soon as a worker finishes it, rather than holding results back until they can be delivered in submission order. The question is… how much faster?
Setting up the simulations
I wanted to see how much faster `imap_unordered` worked than `map`; keeping in mind that I am just tracking how fast this program executes, I added some pieces to the code for consistency across different CPUs and numbers of birthdays generated:

- setting Python's `random.seed(23)` to generate the same pseudo-random birthdays every time
- importing `sys` so `NUM_BDAYS` and `NUM_PROCESSES` can be passed in as command line arguments via `sys.argv[1]` and `sys.argv[2]`, respectively (sketched below)
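In sketch form, assuming the script's preamble looks something along these lines:

```python
import random
import sys

random.seed(23)  # same pseudo-random birthdays on every run

NUM_BDAYS = int(sys.argv[1])      # e.g. 30 birthdays per trial
NUM_PROCESSES = int(sys.argv[2])  # e.g. 8 worker processes
```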
I have two machines to run the program on:
- A laptop with a 4 core / 8 thread i7-8550U CPU
- A desktop with a 16 core / 32 thread Ryzen 9 5950x CPU
There is a world of difference between these two chips, so the only meaningful comparison we can make in this experiment is how much faster the Ryzen chip runs this program than the i7 with the same parameters.
The results are grouped by `NUM_PROCESSES` (1, 8, 32); each row displays a trial's `NUM_BDAYS` generated, and how fast it ran on the:

- i7 using `map`
- Ryzen using `map`
- i7 using `imap_unordered` (`imap_un`)
- Ryzen using `imap_unordered` (`imap_un`)
| NUM_PROCESSES | NUM_BDAYS | i7 `map` (s) | Ryzen `map` (s) | i7 `imap_un` (s) | Ryzen `imap_un` (s) |
| --- | --- | --- | --- | --- | --- |
| 1 | 30 | 1.850070 | 1.884730 | 0.000026 | 0.000019 |
| 1 | 70 | 4.160560 | 4.421220 | 0.000030 | 0.000020 |
| 1 | 700 | 1.850070 | 1.884730 | 0.000026 | 0.000019 |
| 8 | 30 | 0.538490 | 0.252430 | 0.000032 | 0.000024 |
| 8 | 70 | 1.233160 | 0.560390 | 0.000034 | 0.000023 |
| 8 | 700 | 13.265560 | 5.698570 | 0.000035 | 0.000031 |
| 32 | 30 | / | 0.138530 | / | 0.000026 |
| 32 | 70 | / | 0.303690 | / | 0.000021 |
| 32 | 700 | / | 3.081280 | / | 0.000026 |

(`/` marks combinations I did not run on the 8-thread i7.)
Even when I try to get away from data, there is no escape!

Clearly `imap_unordered`, pardon my French, whips ass, and so does that Ryzen 9 5950x.
Seeing the huge time gap between `map` and `imap_unordered` got me thinking about using `imap_unordered` with a results dictionary, which led me down the stack hole to `multiprocessing.Manager` objects and the `itertools` library. While I dig myself out, feel free to check out the script from this post, `unattendedpartiesopt.py`, here on my GitHub. Perhaps I will have enough material to write a post on `itertools` by the time I emerge!
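For the curious, the rough shape of the results-dictionary idea is something like this sketch (the `indexedTrial` wrapper is a placeholder of mine, not code from the script): tag each task with its index, so the unordered results can still be reassembled into an ordered structure.

```python
def indexedTrial(args):
    """Carries a task's index along with its result so ordering can be rebuilt."""
    i, n = args
    return i, generateAndCheckBirthdays(n)

with multiprocessing.Pool(NUM_PROCESSES) as pool:
    results = dict(pool.imap_unordered(indexedTrial, enumerate(maplist)))
    # results[i] now holds the outcome of the trial submitted as maplist[i]
```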
Until next time!