
PyAtl: Atlanta Python Programmers Message Board › NEWS: Does Python have a concurrency problem?

NEWS: Does Python have a concurrency problem?

A former member
Post #: 5


There are a number of concurrency models available for Python, both in the standard library and in home-grown solutions. Standard threading, which uses the OS's threading library, is perhaps the most common. Select-based concurrency, such as Twisted uses, is also quite popular in the Python community. Generator-based "threads", such as David Mertz describes, are another mechanism for supporting concurrent tasks in Python. The problem is that none of these methods currently scales across multiple CPUs and takes full advantage of them. This isn't as much of a problem for IO-bound processes (such as network applications) as it is for CPU-bound applications, so a select-based concurrency model does have an advantage there. The only options currently available (that I know of) which can take full advantage of multiple CPUs involve multiple processes, either forking or using shared memory...or both. I'm sure this approach works very well in a number of situations, but it just feels like a mess. I have a hard time attaching the words "elegant" and "Pythonic" to "forked processes" and "shared memory".
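To make the generator-based approach concrete, here is a minimal sketch in the spirit of what Mertz describes (the names task and run are my own, purely illustrative): each "thread" is a generator that yields to hand control back, and a trivial round-robin scheduler drives them all.

```python
# A minimal sketch of generator-based "threads": each task is a generator
# that yields after each unit of work, and a simple scheduler round-robins
# them. Names (task, run) are illustrative, not from any library.

def task(name, steps):
    """A cooperative 'thread': yields control after each unit of work."""
    for i in range(steps):
        yield (name, i)  # hand control back to the scheduler

def run(tasks):
    """Round-robin scheduler: advance each task until all are exhausted."""
    results = []
    while tasks:
        for t in list(tasks):
            try:
                results.append(next(t))
            except StopIteration:
                tasks.remove(t)
    return results

print(run([task("a", 2), task("b", 2)]))
# The tasks interleave: [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Everything here runs cooperatively inside one thread of one interpreter, which is exactly why this model, elegant as it is, cannot use a second CPU.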

CPython comes equipped with the global interpreter lock (GIL), which allows only one thread to execute Python bytecode at a time, regardless of how many threads may be running in a given Python process. This is by design and, from what I gather, is a protection mechanism that keeps the internals of the Python interpreter (and, I assume, running code) from being mangled by threads accessing the same spots in memory. The end result is that a single CPython process running threaded code will not fully utilize more than a single processor in a single system. This means that, all other things being equal, a single process, even a threaded one, will run no faster on a 128-CPU machine than on a single-CPU machine.

There has been some talk recently about a more scalable (and more Pythonic) concurrency model. Bruce Eckel started a discussion around this topic the other day on the python-dev mailing list. There doesn't appear to be a consensus yet on the exact approach, but some good ideas floated around for a while. Unfortunately, the discussion appears to be over - prematurely, by my estimation. I'd love to see a PEP come out of it, though. There are so many sticking points, both technical and ideological, that it will take some time to formulate a PEP that gains general acceptance. This sounds like a case where some really bright person (of which there are plenty in the Python community, and on the python-dev list specifically) needs to just write a PEP that isn't too strongly hated by any one side and let it get BDFLed into existence.

I'm no language-design expert, and the only experience I've had with concurrent programming has been with Python, so I'm sure there are nuances of lower-level concurrency that I'm missing, but I am forming an idea of what kind of concurrency model I'd like to see. I liked the idea of each task creating another Python interpreter instance within the Python process. Why not just spawn a new process? Because that makes it a bit harder to share information between the starting task and the started task. Of course, you want the ability to share information, but you don't want too much shared. Another idea that I liked was a queue-like interface between the starting task and the started task. The starting task should have to explicitly pass in the specific pieces of information it wants the started task to work on or have available to it. The starting task should also be able to query the started task and find out whether it is still working or done.

Now, if the started task needs to return something to the starting task, how does it do it? I don't know. I really don't like the thought of the starting task polling a queue to see if there is anything in it. What if the started task isn't intended to return anything? I know, you can set flags when starting it... You can't really make its "run" method return anything, or you would block until it finished, which is self-defeating. I'm sure one or more of the Pythonian intelligentsia will come up with something brilliant. It will probably look nothing like what I've described, and I'm sure I'll love it and think it is better than I could have imagined. I would just like to see it happen.
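The model sketched above - spawn a separate interpreter, explicitly pass in only the data the task needs, query whether it is done, and receive results over a queue rather than a blocking "run" return value - can be roughed out today with the multiprocessing module (which landed in the standard library in Python 2.6, after this post). This is only a sketch of the shape of the idea, not the proposal from the python-dev thread; worker and run_task are names I made up.

```python
# A rough sketch of the desired model using multiprocessing: the started
# task gets only what is explicitly passed in, the starting task can query
# it, and the result comes back over a queue (a blocking get, not polling).
import multiprocessing

def worker(data, result_queue):
    # The started task sees only the arguments explicitly handed to it.
    result_queue.put(sum(data))

def run_task(data):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(data, q))
    p.start()
    alive = p.is_alive()   # the starting task can query the started task
    total = q.get()        # blocks until a result arrives - no busy polling
    p.join()
    return total

if __name__ == "__main__":
    print(run_task([1, 2, 3, 4]))  # 10
```

A blocking queue get sidesteps the polling problem for the common case, though it still doesn't answer the harder question of tasks that may return nothing.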

You may think that I presupposed that Python needs a new concurrency scheme. Well, maybe that's a discussion for another day.

Jeremy Jones is a script monkey who works for The Weather Channel as a software quality assurance engineer.
