# Handy Parallel (Distributed) Computing in Python

Liang Bo Wang (亮亮), 2014-05-17


• Master's student at Bioinfo & Biostat Core Lab, NTU CGM
• Learning to speak DNA
• R / Python
• Taiwan R co-organizer
• PyCon APAC 2014/TW staff (PyCon APAC web content)

## TOC

• multiprocessing: batteries-included standard modules
• IPython Parallel: debuggable and interactive
• Scenarios
• Summary

## Awful feelings ...

Engineers' time is more valuable than computing time, but many algorithms are hard to parallelize.

## That's exactly what I want

Not hard if your tasks are independent of each other. Here we focus on that easiest, embarrassingly parallel pattern.

## Prime number validation

import math

PRIMES = [int(l) for l in open('prime_list.txt')]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

## Straightforward way

Validate them one by one and benchmark with timeit:

def orig_way():
    ans = [is_prime(prime) for prime in PRIMES]
    return ans

from timeit import timeit
timeit('orig_way()', 'from __main__ import orig_way',
       number=3)  # returns the total time of all 3 runs

## Threads and Processes in Python

Multithreading with the threading module does not speed up CPU-bound work as expected, because of the GIL (Global Interpreter Lock): only one thread executes Python bytecode at a time.

(Figure from David Beazley's Understanding the Python GIL)
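
To see this, here is a minimal sketch (assuming is_prime and PRIMES from before; the one-thread-per-task layout is only for illustration). It runs no faster than the sequential version, because the GIL serializes the computation:

from threading import Thread
from validate_prime import is_prime, PRIMES

def use_raw_threads():
    # one thread per task, just for illustration
    threads = [Thread(target=is_prime, args=(p,)) for p in PRIMES[:8]]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for all threads; total time is about sequential time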

## Multiprocessing: easy and elegant in Python

>>> from validate_prime import is_prime, PRIMES
>>> from multiprocessing import Pool
>>> p = Pool(2) # max concurrent processes
>>> p.map(is_prime, PRIMES[:4])
[True, True, True, True]
>>> # Ctrl-C / Ctrl-Z
Traceback x N ... endless
BOOOOOOOOM!! # NEVER run this interactively
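
A common workaround is to keep the Pool inside a __main__ guard in a script instead; a minimal sketch (the file name demo_pool.py is made up):

# demo_pool.py (hypothetical file name)
from multiprocessing import Pool
from validate_prime import is_prime, PRIMES

if __name__ == '__main__':  # guard: required on Windows, safer everywhere
    with Pool(2) as p:      # Pool is a context manager in Py 3.3+
        print(p.map(is_prime, PRIMES[:4]))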

## Use concurrent.futures (Py 3.2+)

Still not safe in interactive mode. It provides async APIs: executor.submit() immediately returns a Future object.

from time import sleep
from concurrent.futures import ProcessPoolExecutor

def use_process():
    with ProcessPoolExecutor(4) as executor:
        all_ans = executor.map(is_prime, PRIMES)
    return all_ans  # blocks when leaving the with block

use_process()

executor = ProcessPoolExecutor()
# every submit returns a Future, non-blocking
all_future_ans = [executor.submit(is_prime, p)
                  for p in PRIMES[:6]]

while not all(fu.done() for fu in all_future_ans):
    print('do sth else, waiting')
    sleep(1)

# get results from the done futures
print([fu.result() for fu in all_future_ans])
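
Instead of polling done() in a loop, concurrent.futures also provides as_completed, which yields each future as soon as it finishes; a minimal sketch reusing the futures above:

from concurrent.futures import as_completed

for fu in as_completed(all_future_ans):
    print(fu.result())  # results arrive in completion order, not submit order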
Running the polling script above as demo_async.py:

$ python3 demo_async.py
do sth else, waiting
do sth else, waiting
do sth else, waiting
do sth else, waiting
[True, True, True, True, True, False]

More on the asyncio module in Py 3.4, which lets you escape callback hell through the coroutine (generator) pattern ... like magic. See TP's talk on day 1, Yielding a Tulip (slide).

## concurrent.futures for thread pooling

# replace ProcessPoolExecutor with ThreadPoolExecutor
from concurrent.futures import ThreadPoolExecutor

def use_thread():
    with ThreadPoolExecutor(4) as executor:
        ans = executor.map(is_prime, PRIMES)
    return ans

You don't gain any computation boost this way; the GIL still applies to threads.

## Demo codes

Ways to speed up using the standard library so far: demo codes are under /demo_code/builtin-cpuheavy

## Benchmark on a 12 core machine

> python3 benchmark.py
# best average time of 3 runs, in 3 repeats (seconds)
Straight:     42.10
Multithread:  39.47
Multiprocess:  7.98

You don't get a 12x speedup because of the overhead of forking processes.

## Summary for built-in modules

• Easy to write
• High-level APIs available
• Fail in interactive mode, which is bad because that's the way we do data mangling
• Limited scalability: they simply don't scale across machines, and it is hard to find a 100-core server
• Use multithreading for IO-bound tasks; multiprocessing for CPU-bound tasks

## Introducing IPython Parallel

## Features of IPython Parallel

• Get all the goodness of IPython
• Both interactive and parallel; highly useful in data analysis
• Scale across multiple machines
• Support common schemes such as MPI, PBS, and SGE

Simply, awesome.

## Setup

Trivial if you set up your Python correctly.

python3 -m ensurepip
# or: apt-get install python3-pip
# or: python3 ez_setup.py from setuptools
pip3 install ipython[all]

Windows? (... get a *unix) or search for Unofficial Windows Binaries for Python.

## Overview

In [1]: from IPython.parallel import Client
   ...: rc = Client()  # connect to the controller
In [2]: dview = rc[:]  # return a View

## Start IPython Cluster

$ ipcluster3 start
# [IPClusterStart] Using existing profile dir: '.../profile_default'
# [IPClusterStart] Starting ipcluster with [daemon=False]
# ... (automatically set up controller and engines locally)

Or use IPython Notebook to start

In [1]: from IPython.parallel import Client
rc = Client()
In [2]: rc.ids
Out[2]: [0, 1, 2, 3]
In [3]: dview = rc[:]  # DirectView use all engines
In [4]: with dview.sync_imports():
...:     import math  # import on all engines
In [5]: ar = dview.map_sync(is_prime, PRIMES[:8])  # blocks until done
In [6]: ar
Out[6]: [True, True, True, True, True, False, True, True]

In [1]: %%px  # run on all engines by ipython magic
...: import numpy as np
...: rand_n = np.random.randint(10, size=2)
In [2]: dview['talk'] = 'MLDM Monday'   # spread variables
In [3]: dview.scatter('a', list(range(8)))
...: dview['a']   # and collect them back
Out[3]: [[0, 1], [2, 3], [4, 5], [6, 7]]
In [4]: dview.gather('a')
Out[4]: [0, 1, 2, 3, 4, 5, 6, 7]
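
Besides DirectView, a LoadBalancedView schedules tasks dynamically across the engines; a minimal sketch on the same cluster:

In [5]: lview = rc.load_balanced_view()
In [6]: ar = lview.map(is_prime, PRIMES[:8], block=False)  # async map
In [7]: ar.progress  # number of tasks finished so far
In [8]: ar.get()     # block until all results arrive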


## More in demo code

Or see this online IPython notebook (hosted by Notebook Viewer).

You can open it in IPython Notebook and actually run it.

ipython3 notebook
# Launch the IPython Notebook server
# on http://localhost:8888/
# ... (stop it with Ctrl-C)

## Summary for IPy Parallel

• Awesome.
• Provides more scalability (across machines) and flexibility (interactive) through network connections
• Passing objects may bring up issues like pickling, but IPython handles most of them correctly

## Celery Intro

• Messages are first produced by the producer when it starts a new task
• Messages are then passed to Celery workers (the consumers; there can be many of them)
• RabbitMQ, the message broker, manages the queue

## Overview

(Figure from Abhishek Tiwari, AMQP, RabbitMQ and Celery - A Visual Guide For Dummies)

# tasks.py: the application's tasks
from celery import Celery

app = Celery('tasks', broker='amqp://localhost',
             backend='amqp')  # assumes a local RabbitMQ broker

@app.task
def is_prime(n):
    # ... same as before

# run Celery workers: celery -A tasks worker

result = is_prime.delay(PRIME)  # returns an AsyncResult immediately
print(result.get())             # block until a worker finishes the task
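
Queuing many tasks works the same way; a small usage sketch, assuming the tasks.py above:

from tasks import is_prime

results = [is_prime.delay(p) for p in PRIMES]  # queue all tasks at once
answers = [r.get() for r in results]           # collect the results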

## Summary

|                       | Builtin | IPy Parallel | Celery |
|-----------------------|---------|--------------|--------|
| High-level APIs       | ✔       | ✔            | ✔      |
| Interactive mode      | ✕       | ✔            | △      |
| Scale across machines | ✕       | ✔            | ✔      |

✔ = trivial; △ = some lines of code; ✕ = plenty lines of code
They can work as partners : )

## How is our naive parallelism?

• Easy to use; clean APIs
• You can work out a parallel strategy by trial and error
• Should boost the computation by some factor
• Heavy jobs (those that run for days) see the most significant improvement

But that comes with a price:

• Copying objects a lot → use a database
• Memory usage → buy more memory XD
• Sometimes unacceptable overhead → avoid this pattern

## Take-home message

• Put these tools under your pillow
• multiprocessing, concurrent.futures
• IPython Parallel
• Celery
• Python 3.4 and asyncio (if your tasks are IO-bound)
• Parallelism is hard, but keep calm: there are simple situations

## Thank You!

Fork me on GitHub