Wednesday, September 18, 2019

Threading continued - timing thread

A thread for timing is useful:

The last post discussed parallelism. Threads are also useful for timing stats. In this example, a single thread is spawned for the timing function; obviously you can spawn purpose-specific threads to do whatever you need. Communication with threads is simple because they share the module's variables. Thanks to the GIL, single operations on Python's built-in types (like assigning a dict item) are effectively atomic, so a simple case like this gets away without threading.Lock() - but read-modify-write updates shared across threads do still want a lock.
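As a quick illustration (my sketch, not part of the original program): a shared += is a read-modify-write, so when several threads do it to the same variable, guarding it with a Lock guarantees no update is lost.

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    # increment the shared counter n times, holding the lock for
    # each read-modify-write so no increment can be lost
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(100000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 - without the lock the total may come up short
```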


import random
import threading
import json
import time

timingstats={"starttime":0,"processtime":0,"rate":0,"requests":0,"lastreq":0}
exitflag=0

def timingfunction():
    a=0
    while True:
        time.sleep(0.01)
        a+=1
        if a%500==0 or exitflag==1:
            print("## WORD RATE ## : {}".format(timingstats["rate"]))
        if exitflag==1: return

def req(n):
    global exitflag
    entry=""
    while entry != n:
        entry=input("Type: {} :".format(n))
        if entry=="exit":
            exitflag=1
            exit()
    timingstats["requests"]+=1
    now=time.time()
    timingstats["lastreq"]=now
    timingstats["rate"]=float(timingstats["requests"])/(now-timingstats["starttime"])



timingthread=threading.Thread(target=timingfunction)
timingthread.start()
timingstats["starttime"]=time.time()

words="hairy,scary,mary,carbon,chief,hat,blah,lunchbox,cloud,crap"
wordlist=words.split(",")
while True:
    chk=random.randint(0,len(wordlist)-1)
    req(wordlist[chk])

sample output:
python3 tst3.py
Type: chief :chief
Type: blah :blah
Type: chief :chief
Type: blah :## WORD RATE ## : 0.6276005416066457
blah
Type: blah :blah
Type: lunchbox :lunchbox
Type: hat :## WORD RATE ## : 0.5847226421265149
hat
Type: blah :blah
Type: crap :crap
Type: lunchbox :## WORD RATE ## : 0.6115188345845941
exit
## WORD RATE ## : 0.6115188345845941
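As an aside (my sketch, not from the program above): the integer exitflag can be swapped for a threading.Event, which also lets the timing thread sleep in a single call instead of counting 0.01s naps:

```python
import threading
import time

timingstats = {"rate": 0}
exitevent = threading.Event()

def timingfunction():
    # wait() sleeps up to 5s per loop, but returns True immediately
    # once the event is set - so exit is prompt, with no polling counter
    while not exitevent.wait(timeout=5):
        print("## WORD RATE ## : {}".format(timingstats["rate"]))

t = threading.Thread(target=timingfunction)
t.start()
time.sleep(0.1)   # the main program would do its work here
exitevent.set()   # signal the timing thread to stop
t.join()
```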


The timing thread is running on its own, interrupting me mid-word etc. This is useful if you're making a network app and want statistics, checks, monitoring etc. Maybe the output goes to a file, or over the network. For terminal apps, it works well if you specify the text position.
e.g.
print("\033[6;53H###Hello")
will output ###Hello at row 6, column 53 in the terminal (ANSI escape codes - most terminals, but not the old Windows console).
e.g. you can run several different checks on different threads, and each can write its output status to a different area of the screen. Alternatively (and generally a better idea for bigger things), the check threads write their status to a shared status list, and a dedicated screen-updater thread uses that list to draw the screen.
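A minimal sketch of that second pattern (names and the fake check are mine, not from a real app): worker threads write into a shared status list, and a single updater thread draws each entry at its own row using the escape codes above.

```python
import threading
import time

status = ["starting", "starting"]   # one slot per check thread
done = threading.Event()

def checker(i):
    # a pretend check - a real one might poll a service or a disk
    time.sleep(0.05)
    status[i] = "OK"

def screenupdater():
    # redraw until told to stop; only this thread touches the screen
    while not done.wait(timeout=0.05):
        for row, s in enumerate(status):
            # draw each status at its own row, column 40 (ESC[row;colH)
            print("\033[{};40H check {}: {}".format(row + 1, row, s), end="")

checks = [threading.Thread(target=checker, args=(i,)) for i in range(2)]
updater = threading.Thread(target=screenupdater)
updater.start()
for c in checks:
    c.start()
for c in checks:
    c.join()
done.set()
updater.join()
```

Keeping all drawing in one thread avoids two checks interleaving their escape sequences mid-write.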

Tuesday, September 17, 2019

Python Threading and a little Pypy

Multithreading in Python - useful for io...

Python is really easy to pick up and learn. It comes with almost any Linux distribution. Once you know what you're doing, it's quick to put together an application. That said, it's slow. If you're hitting databases, Elasticsearch, HTTP pulls, running external programs etc. - then perhaps that doesn't matter - but if you're doing CPU-bound operations, it's slowwwww.

There are useful ways to speed up processing. The first to consider is probably pypy. It's often quick to adopt (no code change) - though don't do it in production unless your company is fine with it :P. The next step - parallelism.

So let's talk threading. Python has something called the GIL - the Global Interpreter Lock. It exists to protect the interpreter's internal, non-thread-safe state, and it makes the interpreter simpler to write.

In Python, threading works fine for IO-bound work. It does not help CPU-bound work, because of the GIL. E.g. let's say you write a simple Fibonacci function, which done recursively is heavy on CPU. If I want to generate the nth term for each entry in a list, multithreading in Python won't get it done faster. Multiprocessing can, but we're looking at threading here.

Here's what it looks like running a cpu heavy function in series:

import time

def fib(n):
    if n<2:return 1
    return fib(n-1)+fib(n-2)

def fiblist(ns):
    na=[]
    for n in ns:
        na.append(fib(n))
    return na

def timeathing(f,v):
    tstart=time.time()
    f(v)
    tfinsh=time.time()
    print("timetaken={}s".format(tfinsh-tstart))

timeathing(fiblist,[30,31,29,30,31,29])

So fib finds the nth Fibonacci term. fiblist calls fib for each element in the list, one after the other. timeathing simply prints the time a function took.
Dropping this into python3, the output is:
timetaken=2.3240543842315674s

So let's try threading.

Again using Fibonacci - an area where we do NOT expect any improvement.


import time
import threading

def fib(n):
    if n<2:return 1
    return fib(n-1)+fib(n-2)

def fiblist(ns):
    na=[]
    for n in ns:
        na.append(fib(n))
    return na

threadoutput=[]
def threadhelper(func,arg):
    threadoutput.append((arg,func(arg)))

def fibthreading(tasklist):
    #start a bunch of threads and save handles to a list
    threadlist=[]
    for task in tasklist:
        threadlist.append(threading.Thread(target=threadhelper,args=(fib,task,)))
        threadlist[-1].start()
    #poll list of threads until all are done
    notdone=True
    while notdone:
        notdone=False
        for task in threadlist:
            if task.is_alive():
                notdone=True
                time.sleep(0.001)

def timeathing(f,v):
    tstart=time.time()
    f(v)
    tfinsh=time.time()
    print("timetaken={}s".format(tfinsh-tstart))

timeathing(fiblist,[30,31,29,30,31,29])
timeathing(fibthreading,[30,31,29,30,31,29])

print(threadoutput)

Running this we get the following :

timetaken=2.200274705886841s
timetaken=2.2959671020507812s
[(29, 832040), (29, 832040), (30, 1346269), (30, 1346269), (31, 2178309), (31, 2178309)]



So we see no time improvement - as expected really. But threading can help when the function is IO bound - e.g. calling external commands, polling web pages etc. The output is printed to show the work actually got done.



Let's try that.
Here I'll pull weather data sequentially, then in parallel using threads.

import time
import threading
import requests

def urlpull(url):
    r=requests.get(url)
    return r.text

def urllist(ns):
    na=[]
    for n in ns:
        na.append((n,urlpull(n)))
    return na

threadoutput=[]
def threadhelper(func,arg):
    threadoutput.append((arg,func(arg)))

def urlthreading(tasklist):
    #start a bunch of threads and save handles to a list
    threadlist=[]
    for task in tasklist:
        threadlist.append(threading.Thread(target=threadhelper,args=(urlpull,task,)))
        threadlist[-1].start()
    #poll list of threads until all are done
    notdone=True
    while notdone:
        notdone=False
        for task in threadlist:
            if task.is_alive():
                notdone=True
                time.sleep(0.001)

def timeathing(f,v):
    tstart=time.time()
    f(v)
    tfinsh=time.time()
    print("timetaken={}s".format(tfinsh-tstart))

urls=[]
url="https://www.wrh.noaa.gov/mesowest/getobext.php?sid={}&table=1&num=168&banner=off"
for a in ["KNYC","KHTO","KFRG","KISP","KJFK"]:
    urls.append(url.format(a))

timeathing(urllist,urls)
timeathing(urlthreading,urls)

Running this gives the following:

timetaken=2.074526071548462s
timetaken=0.4535403251647949s



Sequential time is the first; threaded time to pull 5 URLs is the second. That's a pretty huge time saving. Printing "threadoutput" shows the data was successfully acquired. Of course you'll probably want to limit the number of threads alive at once - that's not difficult, just a couple more loops really.
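One way to cap it without the extra loops (my sketch - the stand-in fakepull just sleeps so it runs without network access; swap in urlpull from above for real fetches): the standard library's ThreadPoolExecutor takes a max_workers limit.

```python
import concurrent.futures
import threading
import time

peak = 0
active = 0
lock = threading.Lock()

def fakepull(url):
    # stand-in for an HTTP pull; tracks how many threads run at once
    global peak, active
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)
    with lock:
        active -= 1
    return url

urls = ["url{}".format(i) for i in range(10)]
# max_workers caps how many worker threads are alive at any one time
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(urls=None, func=fakepull, iterables=urls)) if False else list(pool.map(fakepull, urls))
print("peak concurrent threads: {}".format(peak))
```

pool.map also hands results back in input order, which saves the (arg, result) bookkeeping from threadhelper.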

PS: You can have a CPU-bound function run via a subprocess, and threading WILL actually help - because the heavy work is no longer in your Python threads, it's in external processes, with the threads just managing them. Of course, if you want to run Python functions this way, just use multiprocessing :P
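A sketch of that pattern (assuming a Unix-like shell, and echo standing in for a real CPU-heavy command): each thread just blocks on subprocess.run, so the actual work happens in separate OS processes side by side.

```python
import subprocess
import threading

output = []
lock = threading.Lock()

def runcmd(cmd):
    # the thread blocks on the external process; the GIL is released
    # while waiting, so the processes genuinely run in parallel
    r = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    with lock:
        output.append(r.stdout.strip())

cmds = ["echo one", "echo two", "echo three"]
threads = [threading.Thread(target=runcmd, args=(c,)) for c in cmds]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(output))  # sorted, since completion order isn't guaranteed
```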


PS2: Pypy was mentioned as well. Here's what it looks like running pypy:

Running the Fibonacci function sequentially and threaded:
timetaken=0.303519964218s
timetaken=0.300674200058s

Note the HUGE gain here - from around 2.2 seconds down to 0.3 seconds. Running pypy genericname.py instead of python3 was a 7x speedup here! My dual core has no chance of catching up via parallelism.

Running the URL fetch function sequentially and threaded:
timetaken=2.09613394737s
timetaken=0.464855194092s


Note there is no real gain here - the times match those from python3. This is an IO-bound operation, so a faster Python cannot help - but the threading shows a similar boost as in regular Python.