December 06, 2012

Python and Multi-threadining

As a final project in one of the courses I'm taking, I proposed to implement a parallel optimization algorithm for the LASSO problem, in Python (see details in this paper and C implementation here). I immediately ruled out Matlab, my preferred weapon of choice for numerical optimization, since Its parallel computation toolbox is not free. After that, it was either Perl (which I know and love) or Python (which I practically knew nothing about).

The trouble is that lack of knowledge leads to really poor decision making. Naive as I was, I assumed that a language glorified by the open source community, known to be widely used in Google, and has been around for more than 20 years will, well basically support everything you think it should, and in a way any experienced developer will be able to anticipate before hand.

Well. The situation is a bit more complex than that: (/rant)

Here are some quick lessons I learned about Python and *multi-threading*:
  1. Python has several implementation (in C, in Java, in Python, in .Net…)
  2. The most common, widely used implementation is the C version, called CPython. It DOES NOT support threads in an expected manner - The GIL component (Global Interpreter Lock, satan take his soul) actually makes all the threads compete on a *single* core, even if your system has multiple cores. This means python threads do not run in parallel. They run one at a time (interleaved).
  3. Some of the non-C implementations for python do claim to support threading properly.
I tested (3) on jython and pypy (see code below).
Jython worked okay (however, acceleration scaled poorly on my mac).
pypy - I still don't understand if they claim to support threads properly or not. Practicaly it doesn't work (tested w/ pypy-1.9).

Unfortunately, the non-C implementations may not support external libraries (such as Numpy), so if you need real parallel computation with CPython, you should use the multiprocessing library.
Unfortunately++, in this case you have to manage the shared memory by yourself.
(To contrast with threads, in most programming languages, threads can easily access shared variables with almost not syntactic effort on the developers behalf. Processes however, are a different breed. They were designed to be memory disjoint, and when shared memory is required, some other explicit mechanism is used.)

My next planned post is about Python + multiprocessing + shared memory + numpy, where I hope to release a small library that encapsulates the basic requirements.
In the meantime, you can read the following:
  • Impressive and professionally written post on multiprocessing.
  • stackexchange discussion on numpy + shared memory
My code will be based on both.
-- Tomer


Other References
* A good overview of threading in python can be found here.

* Python code I used to measure thread performance (based on this):
#!/usr/bin/python
import threading
import sys
import time

class myThread (threading.Thread):
    def __init__(self, threadID, name, P, MAX, lock):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.MAX = MAX
        self.lock = lock
        self.P = P
    
    def run(self):
        global A
        for i in range(self.MAX):
            # do some work
            for j in range(1000):
                s = j
            k = (self.threadID+i) % self.P
            # REMOVE ME if you want counter locking
            #            with self.lock[k]:
            A[k] = A[k] + 1;

def runParallel(P,N):
    # distribute N to P 
    global A
    A = [0 for i in range(P)]; # zeros(P,1)
    lock = [threading.Lock() for i in range(P)]
    threads = []
    t = time.time()
    for p in range(0,P):
        thread = myThread(p, "Thread-"+str(p), P, N/P, lock)
        thread.start()
        threads.append(thread)

    for p in threads:
        p.join()
    elapsed = time.time() - t
    print 'time for', P, ':', elapsed

    print A[:]
    return elapsed

if __name__ == '__main__':
    global A
    N = int(sys.argv[1]) # counter
    e1 = runParallel(1,N)*1.0
    e2 = runParallel(2,N)
    e4 = runParallel(4,N)
    e8 = runParallel(8,N)
    print 'time:', e1, e2, e4, e8
    print 'efficiency:', e1/(e1), e1/(2*e2), e1/(4*e4), e1/(8*e8)

No comments: