Ask HN: Who's building on Python NoGIL?

I'm interested in what the community is building that specifically takes advantage of the NoGIL aspect of Python. For example, is anyone building frameworks around threads instead of async?


This is going to be bananas for libpython-clj[1]. One of the biggest limiting factors right now is that you can't mix Java/Clojure concurrency with Python concurrency; you need to have a really clear separation of concurrency models. But with this, you will be able to freely mix Clojure and Python concurrency. Just from a compositional standpoint, Clojure atoms and core.async with Python functions will be fantastic. More practically, this will unlock a lot of performance gains with PyTorch and TensorFlow, which historically we've had to lock to single-threaded mode. Yay!

[1]: https://github.com/clj-python/libpython-clj


I’m here for it :) love the Clojure approach to symbiosis. Parens consume all the things!

Neat, guess we'll finally see if cross-language concurrency stops being such a pain.

My RSS reader is written in async Python, but I think that was a mistake, so I built my image sorter on gunicorn instead. That means I have to run it inside WSL on Windows, but it actually works really well. My "image sorter" is really a lot of different things (it has a web crawler in it, a tagging system, and will probably take over the RSS reader's job someday), and it does an unholy mix of (i) "things that require significant CPU" (like... math) and (ii) "things that require just a touch of CPU", like serving images.

I found that (i) was blocking (ii) making the image sorter unusable.

So far, though, that uses processes, not threads.

For the last few weeks, for the hell of it, I've been writing a small and very pedagogical chess-playing program in Python (trying to outdo Lisp), and now that I've got the signs figured out in the alpha-beta negamax algorithm, it can beat my tester most of the time. (When I had the signs wrong it managed to find the fool's mate, which is not too surprising in retrospect since it looks ahead enough plies.)

That was my major goal, but I'd also like to try an MCTS chess program, which is more of a leap into the unknown. Unlike alpha-beta, MCTS can be almost trivially parallelized (run 16 threads of it for, say, 0.1 s, merge the trees, repeat, ...), and threads would be a convenient way to handle concurrency here, although multiprocessing ought to be good enough. So I am thinking about using a non-GIL Python, but on the other hand I could also rewrite it in Java and get a 50x or so speedup and great thread support.

(Note the problem here is that, unlike games where you fill up a board, chess doesn't really progress when you play out random moves. With random moves, for instance, you can't reproduce White's advantage at the beginning of the game, and if your evaluation function can't see that, you are doing really badly. I need a weak player for the playouts that plays well enough to take advantage of situations that real players can take advantage of at least some of the time. A really good move-ordering function for an alpha-beta search might do the trick.)
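A minimal sketch of the "run N threads, merge, repeat" idea, under heavy assumptions: it is flat Monte Carlo with root parallelization (each worker samples root moves uniformly and the workers' per-move statistics are summed), not a full UCT tree merge, and `playout()` is a hypothetical coin flip standing in for the weak playout policy described above. The move list and function names are illustrative only. Only on a free-threaded build would the threads actually run the playouts in parallel.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical root moves; a real engine would generate the legal moves.
ROOT_MOVES = ["e2e4", "d2d4", "g1f3", "c2c4"]


def playout(move: str, rng: random.Random) -> bool:
    """Hypothetical stand-in for a playout after `move`; True means a win.

    A real engine would play the game out with a weak-but-sensible policy
    and score the final position; here it is just a coin flip.
    """
    return rng.random() < 0.5


def worker(seed: int, n_playouts: int) -> tuple[Counter, Counter]:
    """One independent search: per-move (visits, wins) at the root."""
    rng = random.Random(seed)  # private RNG per worker, no shared mutable state
    visits, wins = Counter(), Counter()
    for _ in range(n_playouts):
        move = rng.choice(ROOT_MOVES)
        visits[move] += 1
        wins[move] += playout(move, rng)  # bool adds as 0 or 1
    return visits, wins


def root_parallel(n_threads: int = 16, n_playouts: int = 500):
    """Run independent workers in threads, then sum their root statistics."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(worker, range(n_threads), [n_playouts] * n_threads))
    visits, wins = Counter(), Counter()
    for v, w in results:
        visits.update(v)  # Counter.update adds counts rather than replacing
        wins.update(w)
    # Flat Monte Carlo: choose the root move with the best observed win rate.
    best = max(ROOT_MOVES, key=lambda m: wins[m] / visits[m] if visits[m] else 0.0)
    return best, visits, wins
```

Merging whole trees would extend the same idea with per-node visit and win counts. Because each worker seeds its own `random.Random`, the merged result is reproducible regardless of how the threads are scheduled.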

PyO3 0.23.0 was a big release I’ve been tinkering with extensively. Support for “free-threaded Python” is a headline feature, and I imagine NoGIL Python will be extremely nice for Rust interoperability, so there is definitely interest in that crate. It could also be huge for queueing data for GPUs, API servers, and bulk data fetching.

For whatever reason (maybe post 2to3 PTSD), Python community seems not extremely eager to jump on latest versions of Python and it often takes a long time for popular libraries to support the latest and greatest, so I’d recommend patience and baby steps

https://github.com/PyO3/pyo3/releases/tag/v0.23.0

> For whatever reason (maybe post 2to3 PTSD), Python community seems not extremely eager to jump on latest versions of Python

Well, you'd think after the 2 to 3 debacle, Python might take backwards compatibility more seriously, but they don't.

Follow semver, and stop breaking things on 3.x. If it's deprecated in 3.x, don't remove it until 4.

Python doesn’t follow semver [0], it follows a general major.minor.bugfix, but with an extremely liberal definition of minor (“less earth-shattering,” as they describe it).

PEP 387 [1] requires that introduced incompatibilities have a “large benefit to breakage ratio,” and that any deprecations last a minimum of two years.

FWIW, Kubernetes has a similar approach. Breaking changes occur all the time with “minor” version updates.

[0]: https://docs.python.org/3/faq/general.html

[1]: https://peps.python.org/pep-0387/

Why should Python follow semver? There are plenty of successful projects that don't use semver. If you feel so strongly about them making the change, then make a case for it.

I don't think there's even a plan for a v4. The fallout from 2 to 3 was that bad. So to keep improvements going takes deprecating something several versions before removing, and research is done to find how popular that particular thing is to determine its candidacy for removal. Thus it's best practice to pin all dependencies, and read the release notes before doing a version update.

Yeah, it'd be better if the name were understood as "Python 3" instead of just "Python", to avoid confusion with semver.

What have they broken on 3.x? Genuine question, as I haven't followed Python's development super closely.

Removal of setuptools in 3.12 broke a ton of legacy builds. It basically created a wall of forced upgrades across a huge number of PyPI packages, where end users have to bump a bunch of stuff if they want to migrate from < 3.12 to 3.12+.

setuptools was never part of Python's standard library. I think you're thinking of distutils which was removed from Python in the 3.12 release. You can easily access distutils again by installing a package from PyPI.

You are right, it was distutils. Good call out. Not sure why I thought of setuptools.

This comment thread is a microcosm of the problems with Python packaging :D I appreciate the work the ecosystem does on it, and everyone is doing their best, but it's still a hard problem that doesn't feel solved.

One that I ran into at a previous job: 3.10 removed the ability to implicitly cast floats to ints in a bunch of places. That was very much a breaking change for a bunch of code out there.

Never heard of that. Do you have any concrete examples? It would be good to know about in 2025.
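One documented instance of this class of change (not necessarily the one the commenter hit): since Python 3.10, `math.factorial()` rejects floats even when they hold an integral value such as `5.0`, after a deprecation period in 3.9. A minimal sketch of the usual fix, making the conversion explicit; the helper name `factorial_of_count` is hypothetical:

```python
import math


def factorial_of_count(n: float) -> int:
    """Compute n! for a count that may arrive as a float (e.g. 5.0).

    Python 3.10+ raises TypeError for math.factorial(5.0), so convert
    explicitly, and refuse values that would silently lose a fraction.
    """
    if isinstance(n, float):
        if not n.is_integer():
            raise ValueError(f"expected an integral value, got {n!r}")
        n = int(n)
    return math.factorial(n)
```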

Some seldom used standard modules have been deprecated and later removed. Like recently I revisited a project I initially made using v3.6, but it broke on v3.13 due to an indirect dependency no longer present in the stdlib. It was a simple fix though as a quick search identified the issue and pointed to the removed module in a package on PyPI.
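A minimal sketch for auditing a project before an upgrade: check which legacy modules the running interpreter can still find. The list below is a partial, illustrative subset of the modules PEP 594 removed in Python 3.13; it is not necessarily the module the commenter ran into, and the helper name is hypothetical.

```python
import importlib.util

# Partial, illustrative subset of modules removed in Python 3.13 (PEP 594).
REMOVED_IN_313 = ["cgi", "cgitb", "imghdr", "pipes", "telnetlib", "uu", "xdrlib"]


def missing_stdlib_modules(names=REMOVED_IN_313) -> list[str]:
    """Return the names in `names` that this interpreter cannot import.

    find_spec() only locates a module; it does not execute it, so this
    does not trigger deprecation warnings on older versions.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

On 3.13 the result should list the removed modules unless a replacement package providing the same name has been installed from PyPI.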

Python 3.6 is from 2016 and 3.13 is from 2024. Similar things happen on most platforms on this timescale, eg on the Java side[1], you'd be going from Java 8 to Java 23.

Clojure is pretty good even on that timescale though.

[1] See eg https://stackoverflow.com/a/50445603 up until 2021

Yep, it's totally understandable, and OK by me, as these changes are documented in the release docs and the fix is a pip install away.

I often see people recommending Python as a replacement for Bash scripts (utilizing common Unix tools like grep and awk). I'm pretty sure a script from 2016 will still be working fine now.

> I'm pretty sure a bash script from 2016 will still be working fine now.

In some environments, yes. A bunch of platforms have started using lighter shells that are not actually bash (like dash as /bin/sh) to help with startup performance. Apple has stuck with a truly ancient version of bash in that time and gone all-in on zsh instead. Things change on the scale of a decade.

The bash side will still work but the tools called by bash won't. Same goes for python/packages.

Yeah, it's nothing crazy, but it makes upgrades a lot more unpredictable. It's harder to communicate to management why the 3.x update took a day and the 3.y upgrade took a whole quarter.

It's harder to upgrade services centrally with any amount of leverage; it generally requires more coordination overhead and moving more carefully.

Compare with, say, golang, where it's pretty much a non-issue. My experience with Ruby was a lot better too, until Ruby 3, but hey, that was a major version bump!

They have been removing features every release for the past few years. Code that was working fine on 3.10 may break on 3.13 just because it was using a feature they didn't like.

Every release removes / breaks stuff in the standard library and has been for a while. That's because prior to ~3.2 or so, deprecations were basically never followed up on, but now deprecation means it's going to be removed.

E.g. Python 3.12 has deprecated datetime.utcnow(). So it will probably be removed in Python 3.14 or 3.15.

For all intents and purposes, "Python 3" is now the brand and "Python 3.x" releases are major, breaking releases.
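For the `datetime.utcnow()` deprecation specifically, the documented replacement is a timezone-aware call. A minimal sketch; note the behavior change, since the old call returned a naive datetime:

```python
from datetime import datetime, timezone

# Deprecated since Python 3.12; returns a *naive* datetime (tzinfo is None):
# legacy = datetime.utcnow()

# Replacement: an *aware* datetime explicitly in UTC.
now_utc = datetime.now(timezone.utc)

# If old code compares against naive UTC values, drop tzinfo explicitly.
# This is a stopgap; aware datetimes end to end are the safer design.
naive_utc = now_utc.replace(tzinfo=None)
```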

My thoughts exactly. Python was supposed to be this ultra-portable thing. But... I am finding myself having to write patches to get my software to work on different Python versions.

People who have Python 3 installed can be on many different versions. The thing is, depending on the version, bug fixes included in later versions quite often aren't in older ones. So if you want your code to work, you've got to get the patches in manually, monkey-patch broken code, and do it that way. Then there's the seemingly random deprecation of standard library modules and other breaking changes.

I take python version support seriously because if people install your packages you'll be outsourcing all of the above crap to the user. They might not even know how to 'upgrade' python. Or end up on the wrong version. If your package doesn't work when they install it they'll just move on to something else. Python is a total shit show for packaging.
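One common way to cope without making users upgrade is a version-gated shim: use the standard-library implementation when it exists and fall back to a local copy otherwise. `itertools.batched` (added in Python 3.12) is used here purely as an example; the fallback is a sketch of its documented behavior, not the stdlib's own code.

```python
import sys
from itertools import islice

if sys.version_info >= (3, 12):
    from itertools import batched  # in the standard library since 3.12
else:
    def batched(iterable, n):
        """Fallback for older Pythons: yield tuples of up to n items."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        # islice pulls the next n items; an empty tuple means we're done.
        while batch := tuple(islice(it, n)):
            yield batch
```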

I use Poetry and pin the Python version. Ugly, but it works.

Given the many ways of creating virtual environments, it's becoming more usual for a program's installation to create a dedicated virtual environment just to support that application (pre-commit is a good example of this technique). Perhaps that's a way ahead?
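A minimal sketch of the per-application environment idea using the standard library's `venv` module. The `create_app_env` helper and the `base_dir/app_name` layout are hypothetical; real tools such as pre-commit manage their own environment locations.

```python
import venv
from pathlib import Path


def create_app_env(app_name: str, base_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment dedicated to a single application.

    The base_dir/app_name layout is a hypothetical convention.
    """
    env_dir = base_dir / app_name
    # clear=True wipes and recreates the environment if it already exists.
    venv.create(env_dir, clear=True, with_pip=with_pip)
    return env_dir
```

After creation, the application would be installed into and run with that environment's own interpreter (`bin/python` on POSIX, `Scripts\python.exe` on Windows), so it never depends on whatever packages the user's system Python has.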

This is almost mandatory if you want to ship something which might be considered "standalone" on Python.

> For whatever reason (maybe post 2to3 PTSD), Python community seems not extremely eager to jump on latest versions of Python

I suspect this is a mix of Python 3 being “good enough” for most cases and companies not updating their stacks that often. I think most of us came into Python professionally around 2.7 so the need to keep updating our version hasn’t been heavily ingrained into our thinking.

> For whatever reason (maybe post 2to3 PTSD), Python community seems not extremely eager to jump on latest versions of Python

I don't think it's the 2 -> 3 thing any more, that was a while ago. Honestly there are just a lot of things that don't work well in 3.x.0 Python releases. For example, 3.12.0 had the per-interpreter GIL thing; I tried that in 3.12.0 and ran into a completely breaking issue almost immediately. They were responsive & helpful and did fix the first issue in 3.12.1, but we still had more issues with parts of the C API which seemed to work in 3.11 and better again in 3.13, but it really felt like that needed another release to solidify. (Also you can't import datetime in that setup in 3.12, which is also a pretty big deal-breaker).

I can only imagine the free-threading thing will need at least the same kind of time to work the kinks out, although it is nice to see them moving in that direction.

Oh, what a surprise, I thought PyO3 was dead. Glad to see it's not!

It's merged into CPython 3.13 but labeled as experimental.

Single-threaded CPU-bound workloads suffer in benchmarks (vs. I/O-bound workloads) until they re-enable the specializing adaptive interpreter (PEP 659) for the free-threaded build in 3.14. The docs say there's a 40% hit now; the target is 10% in the next release.

C extensions will have to be re-built and ported to support free threaded mode.

Some interesting and impactful bits of open source work for those with a c++ multithreading background.
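To check which build you are running, and whether the GIL is actually off at runtime, something like the following should work. `sysconfig.get_config_var("Py_GIL_DISABLED")` and `sys._is_gil_enabled()` come from CPython's free-threading support (the latter is underscore-prefixed, i.e. private, and exists only on 3.13+); the wrapper function names are hypothetical. On a free-threaded build, importing a C extension that hasn't been rebuilt with free-threading support can turn the GIL back on, which is why the runtime check matters.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    """True if this interpreter was compiled with --disable-gil."""
    # The config var is 1 on free-threaded builds, 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_enabled_at_runtime() -> bool:
    """True if the GIL is active right now.

    sys._is_gil_enabled() only exists on 3.13+; older interpreters always
    have a GIL, so fall back to True when it is missing.
    """
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()
```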

May I ask which benchmarks you saw? I was looking for some reliable one and couldn’t find them.

Unfortunately, unfavorable benchmarks are flagged here. Python generally relies on suppressing information that is not expedient.

If the docs finally mention the issue, it means that the slowdown can no longer be hidden. The 40% is in line with several 50% slowdowns that have been posted here and have been flagged or buried.

In new code I try to use threads, but certain things like yield which rely on async are simply too common and useful to stop using.

So far in production if I need to use multiple cores, I use multiple processes and design apps that way. The discipline this imposes does seem to result in better apps than I wrote in an environment like Java with tons of threads.

> In new code I try to use threads, but certain things like yield which rely on async are simply too common and useful to stop using.

Huh? In Python you have to choose either threads or async/await? Why not combine both of them? I am so confused. C# allows both to be combined quite naturally, and JavaScript also allows workers with async/await.

What do you mean? Async/await uses threads

I was trying to make sense of this sentence of the original commenter: "In new code I try to use threads, but certain things like yield which rely on async are simply too common and useful to stop using."

Seems to suggest that threads and yield/async were mutually exclusive. I misunderstood. I will move on.

To use async/await you have to execute the code in an event loop.
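In Python, threads and async/await can be combined: asyncio runs coroutines on a single-threaded event loop, and `asyncio.to_thread()` (3.9+) hands blocking calls to a thread pool while the loop keeps running. A minimal sketch; `blocking_io` is a hypothetical stand-in for a blocking library call.

```python
import asyncio
import time


def blocking_io(delay: float) -> str:
    """Hypothetical blocking call, e.g. a legacy synchronous client."""
    time.sleep(delay)
    return f"slept {delay}s"


async def main() -> list[str]:
    # Each to_thread() call runs in the loop's default thread pool,
    # so the event loop stays free to run the plain coroutine meanwhile.
    return await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.1),
        asyncio.to_thread(blocking_io, 0.1),
        asyncio.sleep(0.1, result="plain coroutine"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```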

I've always used threads despite the GIL. I haven't tried NoGIL and am waiting to find out how many bugs it surfaces. I do get the impression that multi-threaded Python code is full of hazards that the GIL covers up. There will have to be locks inserted all over the place. CPython should have simply been retired as part of the 2 to 3 transition. It was great in its day, on 1-core machines with the constraints of that era. I have a feeling of tragedy that this didn't happen and now it can never be repaired. I probably wouldn't use Python for web projects these days. I haven't done anything in Elixir yet but it looks like about the best option. (I've used Erlang so I think I have a decent idea of what I'd be getting into with Elixir).
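As a minimal sketch of the kind of locking the commenter expects: `x += 1` on a shared attribute is a read-modify-write that the GIL has never guaranteed to be atomic, so shared mutable state needs an explicit lock whether or not the GIL is present. Class and function names here are illustrative.

```python
import threading


class SafeCounter:
    """A shared counter whose updates are serialized by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, two threads can read the same old value and
        # both write back old + 1, losing an update.
        with self._lock:
            self.value += 1


def run_demo(n_threads: int = 8, n_increments: int = 10_000) -> int:
    counter = SafeCounter()

    def hammer() -> None:
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=hammer) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```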

In a strange way, Python being so bad at interpreting bytecode and limited by the GIL, plus being good at interfacing with C cheaply (unlike Go and Java), induced a programming style that is extremely well suited to data-parallel computing, which is the way to efficiently scale compute in today's SIMD/GPU world. If you wanted to be efficient, you had to prepare your data ahead of time and hand it off. Any intermediate interaction with that data would ruin your performance. That's mostly how efficient Python libraries and the ecosystem are built.

Weakness may have turned into a strength.

Why would NoGIL change that, though? It's not like large data-parallel operations can suddenly be done efficiently in Python if you remove the GIL. The problem with GIL afaik is mostly latency problems in interactive applications.

Oh, I didn't mean to imply something will change. I am simply concurring with the parent while observing that a limitation turned into a strength by establishing, early on, an ecosystem that fits modern architectural developments well. I don't think that is going to change now, but had nogil been the original, it could have led to a different style of libraries being designed.

Can you elaborate more on the data-parallel computing part?

Pretty much the entire data science/machine learning landscape, from NumPy to TensorFlow and the like, consists of thin wrappers over C code, and if you want performance you had better batch and structure your operations beforehand and minimize the back and forth with Python.

The GIL is only a problem if you’re trying to access the same memory.

If you take the time to split your data into chunks, you can avoid the GIL entirely with multiprocessing, or by handing it off to a library that does it for you, just as long as they're not using the same Python objects.
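A minimal sketch of the split-into-chunks pattern, assuming a toy CPU-bound task (sum of squares); the function names are hypothetical. Each worker gets its own slice and shares no Python objects with the others. With the GIL, a `ThreadPoolExecutor` gives no CPU parallelism for pure-Python work like this; `concurrent.futures.ProcessPoolExecutor` has the same interface and is the multiprocessing variant described above, at the cost of pickling each chunk. On a free-threaded build, the thread version can use multiple cores.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data: list, n_chunks: int) -> list[list]:
    """Split data into n_chunks contiguous, roughly equal slices."""
    size, rem = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < rem else 0)  # spread the remainder
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk: list[int]) -> int:
    """CPU-bound work on one chunk; touches no shared Python objects."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data: list[int], workers: int = 4) -> int:
    # Swapping in ProcessPoolExecutor keeps this code unchanged, but then
    # the entry point should be guarded with `if __name__ == "__main__":`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunked(data, workers)))
```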

Using multiprocessing adds some overhead because of serialization, so it's slower than just handing the Python objects directly to the workers (as you can with threads), because the process doing the parsing also has to spend time serializing them again. So you can avoid the GIL, but it has a cost.

For example, as a quick experiment I parsed an XML document with ElementTree: parsing it takes ~1 second, and serializing all the element names and attributes to JSON takes an additional ~0.5 seconds. Serializing the whole ElementTree object with pickle takes ~4 seconds. Serializing it back to XML takes roughly as long as parsing it.
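To see what the two serialization strategies from that experiment actually send to a worker, here is a minimal sketch on a tiny hypothetical document. It does not reproduce the timings, which depend on the real input; `SAMPLE` and `flatten` are illustrative names.

```python
import json
import pickle
import xml.etree.ElementTree as ET

SAMPLE = "<root><item id='1' kind='a'/><item id='2' kind='b'><sub/></item></root>"


def flatten(root: ET.Element) -> list[tuple[str, dict]]:
    """Reduce a tree to (tag, attributes) pairs in document order.

    Nesting and text are dropped, which is why this payload is smaller.
    """
    return [(el.tag, dict(el.attrib)) for el in root.iter()]


root = ET.fromstring(SAMPLE)
as_json = json.dumps(flatten(root))  # compact payload: names and attributes only
as_pickle = pickle.dumps(root)  # the whole tree, roughly what a process pool would send
```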

He's probably talking about libraries like PySpark or PyFlink, which are used a lot.

PyFlink seems promising. I love vanilla Flink, but as soon as you need to debug your PyFlink job, PyFlink becomes a hurdle. That translation layer between Python and Java can be opaque.

Not really... What I see in practice is that Python's shortfalls are covered by throwing more hardware at them (besides the more efficient, but also more complex, option of rewriting in C).

There are all kinds of micro-optimizations, as in: one has to know which Pandas operations are going to be more expensive than others, and organize the code accordingly, but these things often teach programmers the wrong ideas. It's not uncommon in Python world that a superior solution (from algorithmic perspective, i.e. the one that should use less time or space) is in practice inferior to a solution that's implemented in C. And so, writing more efficient Python code comes down to knowing which functions are faster or cheaper in some other way, but it doesn't generalize and doesn't transfer to other languages.

What usually happens in situations like this is that the developers of the language (or a product, a framework, etc. that suffers a similar fate) start optimizing the bad solutions (because they are the go-to tools for their users) instead of actually improving the language (the product, the framework, etc.). To give an example of this happening in Python: there's a lot of work dedicated to the performance of lists and dicts. But if anyone really wanted performance, they'd have to look for more specialized collections rather than optimizing very generic ones.

So what would the alternative history have been if CPython was retired after Python 3 came out in 2008, what would we be using now? IronPython or GraalPy?

It would have suffered the same fate as Perl 6 and we'd all be on 2.1x.

I have the same question! I love Python and asynchronous stuff, and I do not know too much about threading.

Is threading potentially better for IO bound tasks than async?

Potentially, but probably not. The benefit of a parallel-enabled interpreter would be two CPU cores executing bytecode instructions at the same time in the same interpreter. So, you could have one python thread working on one set of data, and another python thread working on another set of data, and the two threads would not interfere with each other much or at all. Today, with the global interpreter lock, only one of those threads can be executing bytecode at a time.

> Today, with the global interpreter lock, only one of those threads can be executing bytecode at a time.

Yes, but Python now has a version without the GIL, which prompted this post in the first place. So my question is: if I use a build of Python 3.13 without the GIL, can a threaded Flask app do better than an AIOHTTP server?

I think it will depend on which task we are talking about. For CPU-bound tasks (i.e. heavy computation, data processing), a no-GIL Flask app with threading would likely perform better than AIOHTTP, since it can truly parallelize computation across cores. For I/O-bound tasks (database queries, API calls, or file operations), AIOHTTP would still likely be more efficient due to its lower overhead and memory use.

So for your original question

> Is threading potentially better for IO bound tasks than async?

Async will be better in general, likely because async coroutines use far less memory than threads and hold up better under high concurrency. But that all depends on the details of the server's implementation.


Quite honestly I'd tell you not to mix threads with asyncio. As you note: IO bound tasks aren't CPU hogs and there's little benefit to mixing it with threads. It will lead to unnecessary bugs, complexity, and problems with event loop management.

Asyncio can run tens of thousands of tasks if it's used properly. If you think something will block it, you should check out "process pool executors." Note that it's very tricky to share resources like sockets between processes, so it's kind of another reason to avoid stuff like this.

I think Python 3.14 will have interpreter pools for even more concurrency options.
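A minimal sketch of the "tens of thousands of tasks" claim, with `asyncio.sleep()` standing in for real I/O (`fake_request` and `run_many` are hypothetical names). The process-pool route for blocking CPU work (`loop.run_in_executor()` with a `concurrent.futures.ProcessPoolExecutor`) is not shown here.

```python
import asyncio


async def fake_request(i: int) -> int:
    """Stand-in for an I/O-bound call; it awaits instead of blocking."""
    await asyncio.sleep(0.01)
    return i


async def run_many(n: int = 10_000) -> int:
    # All n coroutines wait concurrently on one event-loop thread, so the
    # total wall time is far less than n sequential sleeps would take.
    results = await asyncio.gather(*(fake_request(i) for i in range(n)))
    return len(results)


if __name__ == "__main__":
    print(asyncio.run(run_many()))
```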

> there's little benefit to mixing it with threads

> If you think something will block it you should check out "process pool executors." Note that its very tricky to share resources like sockets between processes so its kind of another reason to avoid stuff like this.

Isn't that the benefit of no-gil? The ability to run CPU-intensive operations without incurring the overhead and friction of multiprocessing? Now you can do multicore processing while also having shared memory

Waiting for CFFI or pywin32 free-threaded support since I don't have time to work on CFFI myself

Started working on a no-gil gRPC Python implementation but super low priority.

Has anyone started deploying nogil at scale in prod?

I don't think it is ready to be used in prod. The feature is still experimental

No, I am not personally aware of anyone using it in prod.

Hi