Tuesday, September 17, 2019

Python Threading and a little Pypy

Multithreading in Python - useful for io...

Python is really easy to pick up and learn. It comes with almost any Linux distribution. Once you know what you're doing, it's quick to put together an application. That said, it's slow. If you're hitting databases, elasticsearch, http pulls, running external programs etc - then perhaps that doesn't matter - but if you're doing cpu bound operations, it's slowwwww. 

There are useful ways to speed up processing. The first to consider is probably pypy. This is often quick to implement (no code change) - though don't do it in production unless your company is fine with it :P. The next step - parallelism.

So let's talk threading. Python has something called the GIL - Global Interpreter Lock. It is there to protect non-thread safe threads, and it makes it simpler to write the interpreter. More here.

In Python threading works fine for IO bound threads. It does not work for CPU bound threads because of the implemented GIL. e.g. lets say you write a simple Fibonacci function which if done recursively can be heavy on CPU. If I wanted to handle generating nth term on a list, multithreading in python wont help get this done faster. Multiprocessing can though, but we're looking at threading here.

Here's what it looks like running a cpu heavy function in series:

import time

def fib(n):
    if n<2:return 1
    return fib(n-1)+fib(n-2)

def fiblist(ns):
    na=[]
    for n in ns:
        na.append(fib(n))
    return na

def timeathing(f,v):
    tstart=time.time()
    f(v)
    tfinsh=time.time()
    print("timetaken={}s".format(tfinsh-tstart))

timeathing(fiblist,[30,31,29,30,31,29])

So fib will find the n'th term of Fibonacci. Fiblist calls fib for each element in the list, and does this in a loop - one after the other. "timeathing" simply outputs the time a function took.
So dropping this in python 3 the output is:
timetaken=2.3240543842315674s

So lets try threading.

Again using fibonacci - an area where we do NOT expect any improvement.


import time
import threading

def fib(n):
    if n<2:return 1
    return fib(n-1)+fib(n-2)

def fiblist(ns):
    na=[]
    for n in ns:
        na.append(fib(n))
    return na

threadoutput=[]
def threadhelper(func,arg):
    threadoutput.append((arg,func(arg)))

def fibthreading(tasklist):
    #start a bunch of threads and save handles to a list
    threadlist=[]
    for task in tasklist:
        threadlist.append(threading.Thread(target=threadhelper,args=(fib,task,)))
        threadlist[-1].start()
    #poll list of threads until all are done
    notdone=True
    while notdone:
        notdone=False
        for task in threadlist:
            if task.isAlive():
                notdone=True
                time.sleep(0.001)

def timeathing(f,v):
    tstart=time.time()
    f(v)
    tfinsh=time.time()
    print("timetaken={}s".format(tfinsh-tstart))

timeathing(fiblist,[30,31,29,30,31,29])
timeathing(fibthreading,[30,31,29,30,31,29])

print(threadoutput)

Running this we get the following :

timetaken=2.200274705886841s
timetaken=2.2959671020507812s
[(29, 832040), (29, 832040), (30, 1346269), (30, 1346269), (31, 2178309), (31, 2178309)]



So we see no time improvement - as expected really. But threading can help if our function was IO based - e.g. calling external commands, polling web pages etc. The output is shown to show this works.



Lets try that.
So, here I'll pull weather data sequentially, then in parallel using threads. 

import time
import threading
import requests

def urlpull(url):
    r=requests.get(url)
    return r.text

def urllist(ns):
    na=[]
    for n in ns:
        na.append((n,urlpull(n)))
    return na

threadoutput=[]
def threadhelper(func,arg):
    threadoutput.append((arg,func(arg)))

def urlthreading(tasklist):
    #start a bunch of threads and save handles to a list
    threadlist=[]
    for task in tasklist:
        threadlist.append(threading.Thread(target=threadhelper,args=(urlpull,task,)))
        threadlist[-1].start()
    #poll list of threads until all are done
    notdone=True
    while notdone:
        notdone=False
        for task in threadlist:
            if task.isAlive():
                notdone=True
                time.sleep(0.001)

def timeathing(f,v):
    tstart=time.time()
    f(v)
    tfinsh=time.time()
    print("timetaken={}s".format(tfinsh-tstart))

urls=[]
url="https://www.wrh.noaa.gov/mesowest/getobext.php?sid={}&table=1&num=168&banner=off"
for a in ["KNYC","KHTO","KFRG","KISP","KJFK"]:
    urls.append(url.format(a))

timeathing(urllist,urls)
timeathing(urlthreading,urls)

Running this gives the following:

timetaken=2.074526071548462s
timetaken=0.4535403251647949s



Sequential time is the first. Threaded time to pull 5 urls is the second. That's pretty huge time savings. Printing "threadoutput" shows the data successfully acquired. Of course you'll probably want to limit the number of threads actually alive - that's not difficult, just a couple more loops really.

PS: You can have a cpubound function called in a subprocess, and threading WILL actually help - because you're no longer actually threading, you're calling external processes, but having threads manage them. Of course if you want to run python functions in this way, just use multiprocessing :P


PS2: Pypy was mentioned as well. Here's what it looks like running pypy:

Running the fibonacci function sequential and threaded:
timetaken=0.303519964218s
timetaken=0.300674200058s

Note the HUGE gain here - from 2.2 seconds or so, down to 0.3 seconds. Running pypy genericname.py instead of python3 was a 7x speedup here! My dual core has no chance catching up with parallelism.

Running the urlfetch function sequential and threaded:
timetaken=2.09613394737s
timetaken=0.464855194092s


Note there is really no gain here. The times here match those from python3. This is an IO bound operation. A faster python cannot help here - but the threading used shows a similar boost as in regular python.




1 comment: