In 1994 at the second WWW conference we presented "An API to Mosaic". It was TCL embedded inside the (only![1]) browser at the time - Mosaic. The functionality available was substantially similar to what Javascript ended up providing. We used it in our products, especially for integrating help and preferences - for example an HTML page could describe colour settings; you could click on one, select a colour from the chooser, and the page and the setting in our products would immediately update. In another demo we were able to print multiple pages of content from the start page, and got a standing ovation! There is an alternate universe where TCL could have become the browser language.
For those not familiar with TCL, the C API is flavoured like main. Callbacks take a list of strings argv style and an argc count. TCL is stringly typed which sounds bad, but the data comes from strings in the HTML and script blocks, and the page HTML is also text, so it fits nicely and the C callbacks are easy to write.
[1] Mosaic Netscape 0.9 was released the week before
Another excellent GUI is gitg. You can select specific lines for staging, but also for discarding. The latter is especially useful for temporary debug-only changes that you want to throw away.
Heap allocation does also allow other things. For example stack frames (PyFrame) are also heap allocated. When there is an exception, the C stack gets unwound but the heap allocated frames are retained forming the traceback. You can then examine the values of the local variables at each level of the traceback.
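A rough sketch of what that enables (the function names here are just made up for illustration) - walk the traceback chain and look at each frame's locals:

    import sys

    def inner(x):
        y = x * 2
        raise ValueError("boom")

    def outer():
        item = "example"
        inner(21)

    try:
        outer()
    except ValueError:
        tb = sys.exc_info()[2]
        # The C stack has been unwound, but the heap allocated frames survive
        # as the traceback chain, so their local variables are still available.
        while tb is not None:
            frame = tb.tb_frame
            print(frame.f_code.co_name, frame.f_locals)
            tb = tb.tb_next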
And it also allows async functions, since state is held off the C stack, so frames can be easily switched when returning to the event loop.
The other thing made easy is C extension authoring. You compile CPython with free lists disabled and the address sanitizer enabled, and getting reference counting wrong shows up immediately.
Note that free threaded compatible doesn't necessarily mean the package supports free threading (concurrent execution), just that it can be loaded into a free threaded interpreter.
This is the case with my own package (apsw), which is on the hugovk list, and which will cause the GIL to be re-enabled if you load it into a free threaded Python. The reason I provide a binary wheel is so that you don't have to keep separate GIL and free threaded interpreters around. They have different ABIs so you can't use extensions compiled against one with the other.
Free threading is at the beginning of its journey. There is a *lot* of work to do on all the C code that works with Python objects, and the current documentation and tools are immature. In particular, anyone doing concurrent mutation of Python objects can cause corruption and crashes if they try, until more auditing and locking has been done in the C code. Even modules in the standard library have only been partially updated.
> so that you don't have to keep separate GIL full and free threaded interpreters around
It means the user doesn't have to keep two Pythons around, install packages in both of them, etc.
It is also possible with the free threaded Python to keep the GIL disabled even if a package such as mine says it needs the GIL. And my package will indeed work just fine, until you supply it with mutable data and concurrently modify it in another thread.
But the users install the free threaded python to do free threaded stuff. The second they use your package they have a GIL again, which entirely defeats the point.
Wouldn't it be much better to just not support it if it's not supported?
1) Saying your package supports free threading, but it isn't safe - ie concurrent mutation can result in corruption and crashes
2) Allowing the package to be loaded into a free threaded Python, which immediately enables the GIL. Concurrent mutation does not result in corruption and crashes because of the GIL. The user doesn't have to maintain two Python installations. They can set the environment variable PYTHON_GIL=0 or start Python with -Xgil=0 which will keep the GIL disabled, and they will be fine if they avoid concurrent mutation.
I chose 2. The stdlib json package (along with many others) picked 1. Heck I'll guarantee that most that picked 1 aren't 100% safe either, because doing the changes is hard work, *every* case has to be covered, and tools like thread sanitizers don't work.
The reason I chose 2 is because I care about data integrity. I will eventually reach 1, but only once I can be certain the code is correct.
You aren't forced to use a GIL as I keep stating. You can set an environment variable or a command line flag to Python and the GIL will remain disabled. My package will work just fine if you do that, unless you provide it with data you concurrently modify in which case you can get corruption and crashes.
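For example (a sketch - the interpreter binary name varies by install, and sys._is_gil_enabled() is available on recent CPython):

    # Run with the GIL kept disabled despite extensions that ask for it:
    #   PYTHON_GIL=0 python3.14t myscript.py
    # or
    #   python3.14t -Xgil=0 myscript.py

    import sys
    import apsw  # would normally cause the GIL to be re-enabled

    print(sys._is_gil_enabled())  # False - the flag above overrides the request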
Yes. The interpreter warns by default, and requires steps to disable the warning. My release notes say that the GIL will be enabled when the package is loaded.
Is it madness that other packages claim they support running without the GIL, yet it is possible to cause corruption and crashes just by writing concurrent Python code? That is the case with the standard library. Compiler thread sanitizers don't work with free threaded Python. Diligent code inspection by humans is the only way to update C code so far.
Free threading is at the beginning of the project. It works. You get slow downs in single threaded performance due to extra locking, and speedups in concurrent performance due to threading. But it is possible to cause corruption and crashes via Python code. Don't expose it to untrusted data and code.
But do investigate it and see what works well and what doesn't. See what code patterns are now possible. Help improve the tools and documentation. It had to start somewhere, and the current state is somewhere.
I think what you are doing is hiding problems. I think crashes and bugs are preferable at this point so the issues get found. People who want the safe option will run the regular Python.
1) Use regular GIL Python and you get the highest levels of integrity and correctness of operation of my package
2) Use a free threaded Python, the GIL will be enabled at load time, and you get the highest levels of integrity and correctness
3) Use a free threaded Python, and set $PYTHON_GIL=0 or start with -Xgil=0 to keep the GIL disabled, and providing you do not do concurrent mutation of data provided to my package, you get the highest levels of integrity and correctness
BTW I did not randomly choose to provide the free threaded builds. I specifically asked the setuptools maintainers (under the Python Packaging Authority) how to prevent free threaded builds for PyPI. They encouraged me to do the free threaded builds so that a user doesn't have to maintain parallel regular and free threaded Python installations. And it allows option 3 above.
C code needs to be updated to be safe in a GIL free execution environment. It is a lot of work! The pervasive problem is that mutable data structures (lists, dicts, etc) could change at any arbitrary point while the C code is working with them, and an object's reference count could drop to zero while *anyone* is still using a borrowed reference to it (borrowed references are common for performance in CPython APIs). Previously the GIL protected where those changes could happen. In simple cases the fix is adding a critical section, but often there are multiple data structures in play. As an example these are the changes that had to be done to the standard library json module:
The json changes above are in Python 3.15, not the just released 3.14.
The consequences of the C changes not being made are crashes and corruption if unexpected mutation or object freeing happens. Web services are exposed to adversarial input so be *very* careful.
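To make it concrete, here is a sketch of the kind of concurrent mutation the C code must now tolerate. With the GIL, the mutating thread simply cannot run while the C encoder holds it; on a free threaded build without the audit and locking work, a pattern like this is what can corrupt memory or crash, instead of at worst raising a clean exception:

    import json
    import threading

    data = {"items": list(range(100)), "nested": {}}
    stop = False

    def mutate():
        # grow and shrink the structures that json is walking
        while not stop:
            data["items"].append(0)
            data["items"].pop()
            data["nested"]["k"] = 1
            data["nested"].pop("k", None)

    t = threading.Thread(target=mutate)
    t.start()
    for _ in range(10_000):
        try:
            json.dumps(data)
        except (RuntimeError, ValueError):
            pass  # a clean error such as "dictionary changed size during iteration" is the acceptable outcome
    stop = True
    t.join()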
It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.
I think Java got this mostly right. On the threading front, very little is thread-safe or atomic (x += 1 is not thread-safe), so as soon as you expose something to threads, you have to think about safe access. For interacting with C code, your choices are either shared buffers or copying data between C and Java. It's painful, but it's needed for memory safety.
The core Python data structures are atomic to Python developers. For example, there is no way you can corrupt a list or dictionary no matter how much concurrency you try to use. This was traditionally done under the protection of the global interpreter lock, which ensured that only one piece of C code at a time was operating on the internals of those objects. C code can also release the GIL, e.g. during I/O or operations in other libraries that aren't interacting with Python objects, allowing concurrency.
The free threaded implementation adds what amounts to individual object locks at the C level (critical sections). This still means developers writing Python code can do whatever they want, and they will not experience corruption or crashes. The base objects have all been updated.
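A quick illustration of what that guarantee means at the Python level (it holds on both GIL and free threaded builds) - individual operations on a list are never lost or torn, even though compound invariants still need your own locks:

    import threading

    shared = []

    def worker(n):
        for _ in range(10_000):
            shared.append(n)  # each append is atomic; the list itself never corrupts

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    assert len(shared) == 8 * 10_000  # every append landed, no torn state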
Python is popular because of many extensions written in C, including many in the standard library. Every single piece of that code must be updated to operate correctly in free threaded mode. That is a lot of work and is still in progress in the standard library. But in order to make the free threaded interpreter useful at this point, some have been marked as free thread safe, when that is not the case.
So it's the worst of all possible worlds then. It has the poorest performance due to forced locking even when not necessary, and if you load a library in another language (C), then you can still get corruption. If you really care about performance, probably best to avoid Python entirely, even when it's compiled like it is in CPython.
PS For extra fun, learn what the LD_PRELOAD environment variable does and how it can be used to abuse CPython (or other things that dynamically load shared objects).
It is multiple fine grained locking versus a single global lock. The latter lets you do less locking, but only have a single thread of execution at a time. The former requires more locking but allows multiple concurrent threads of execution. There is no free lunch. But hardware has become parallel so something has to be done to take advantage of that. The default Python remains the GIL version.
The locking is all about reading and writing Python objects. It is not applicable to outside things like external libraries. Python objects are implemented in C code, but Python users do not need to know or care about that.
As a Python user you cannot corrupt or crash things by code you write no matter how hard you try with mutation and concurrency. The locking ensures that. Another way of looking at Python is that it is a friendly syntax for calling code written in C, and that is why people use it - the C code can be where all the performance is, while retaining the ergonomic access.
C code has to opt in to free threading - see my response to this comment
It is true that more fine grained locking can end up being done than is strictly necessary, but users' code is loaded at runtime, so you don't know in advance which locking could be omitted. And this is the beginning of the project - things will get better.
Aside: Yes you can use ctypes to crash things, other compiled languages can be used, concurrency is hard
It depends on how you define "corruption". You can't get a torn read or write, or mess up a collection to the point where attempts to use it will segfault, sure. You can still end up with corrupt data in a sense of not upholding the expected logic invariants, which is to say, it's still corrupt for any practical purpose (and may in turn lead to taking code paths that are not supposed to ever happen etc).
A library written in another language would have a Python extension module wrapping it, which would still hold the GIL for the duration of the native call (it can be released, but this is opt-in not opt-out), so that is usually not the issue with this arrangement.
The bigger problem is that it teaches people dangerously misguided notions such as "I don't need to synchronize if I work with built-in Python collections". Which, of course, is only true if a single guaranteed-atomic operation on the collection actually corresponds to a single logical atomic operation in your algorithm. What often happens is people start writing code without locks and it works, so they keep doing it until at some point they do something that actually requires locking (like atomic remove from one collection & add to another) without realizing that they have crossed a line.
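A minimal sketch of that line being crossed (the names are hypothetical): each dict operation below is atomic on its own, but the move as a whole is not, so without the lock another thread can observe the item in neither collection:

    import threading

    pending = {"job1": "payload"}
    done = {}
    move_lock = threading.Lock()

    def finish(key):
        # pop and assignment are each atomic, but between them the item is
        # in neither dict - the lock is what preserves the logical invariant
        with move_lock:
            done[key] = pending.pop(key)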
Interestingly, we've been there before, multiple times even. The original design of Java collections entailed implicit locking on every operation, with the same exact outcome. Then .NET copied that design in its own collections. Both frameworks dropped it pretty fast, though - Java in v1.2 and .NET in v2.0. But, of course, they could do it because the locking was already specific to collections - it wasn't a global lock used for literally every language object, as in Python.
> If you really care about performance, probably best to avoid Python entirely
This has been true forever. Nothing more needs to be said. Please, avoid Python.
On the other hand, I’ve never had issues with Python performance, in 20 years of using it, for all the reasons that have been beaten to death.
It’s great that some people want to do some crazy stuff to CPython, but honestly, don’t hold your breath. Please don’t use Python if Python interpreter performance is your top concern.
Arguably, it's a step in the wrong direction. "Share memory by communicating" is already doable in Python with Pipe() and Queue() and sidesteps the issue entirely.
Python and Javascript (in the browser) due to their single threaded nature. C++ too as long as you have a std::atomic on the left hand side (since they overload the operator).
There is NB_INPLACE_ADD... but I'm struggling to find enough details to be truly confident :\ possibly its existence is misleading other people (thus me) to think += is a single operation in bytecode.
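You can see it directly with dis - += compiles to a separate load, add, and store (exact opcodes vary by CPython version), so another thread can run in between and increments can be lost:

    import dis

    x = 0

    def bump():
        global x
        x += 1

    dis.dis(bump)
    # Roughly:
    #   LOAD_GLOBAL  x
    #   LOAD_CONST   1
    #   BINARY_OP    13 (+=)    <- this is where NB_INPLACE_ADD comes in
    #   STORE_GLOBAL x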
In Java, the bytecode emitted for an increment is the same whether 'someField' is volatile or not. The volatile just affects the load/store semantics of the GETFIELD/PUTFIELD ops. For atomic increment you have to go through something like AtomicInteger that will internally use an Unsafe instance to ensure it emits a platform-specific atomic increment instruction.
> It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.
Create or extend a list of answers to:
What heuristics predict that code will fail in CPython's nogil "free threaded" mode?
But as an example, neither includes PySequence_Fast, which is in the json.c changes I pointed to. The folks doing the auditing of the stdlib do have an idea of what they are looking for, and so would be best suited to keep a list (and tool) up to date with what is needed.
A list of Issue and PR URLs that identify and fix free threading issues would likely also be of use for building a 2to3-like tool to lint and fix C extensions to work with CPython free threading nogil mode
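As a very rough sketch of what such a lint pass could look like - the pattern list here is purely illustrative and would need to be curated by the people doing the stdlib audits:

    import re
    import sys
    from pathlib import Path

    # Illustrative heuristics only: APIs that hand out borrowed references or
    # assume the GIL is serialising access to the object being walked.
    SUSPECT = [
        "PySequence_Fast",   # items are borrowed and the sequence can mutate underneath
        "PyList_GET_ITEM",   # borrowed reference, no locking
        "PyDict_GetItem",    # borrowed reference
        "PyDict_Next",       # iteration that assumes the dict is not mutated
    ]

    pattern = re.compile("|".join(map(re.escape, SUSPECT)))

    for path in Path(sys.argv[1]).rglob("*.c"):
        for lineno, line in enumerate(path.read_text(errors="replace").splitlines(), 1):
            if pattern.search(line):
                print(f"{path}:{lineno}: {line.strip()}")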
I agree and honestly it may as well be considered a form of ABI incompatibility. They should make this explicit such that existing C extensions need to be updated to use some new API call for initialization to flag that they are GILless-ready, so that older extensions cannot even successfully be loaded when GIL is disabled.
This has already been done. There is a 't' suffix in the ABI tag.
You have to explicitly compile the extension against a free threaded interpreter in order to get that ABI tag in your extension and even be able to load the extension. The extension then has to opt-in to free threading in its initialization.
If it does not opt-in then a message appears saying the GIL has been enabled, and the interpreter continues to run with the GIL.
This may seem a little strange but is helpful. It means the person running Python doesn't have to keep regular and free threaded Python around, and duplicate sets of extensions etc. They can just have the free threaded one, anything loaded that requires the GIL gives you the normal Python behaviour.
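For reference, you can check from Python which kind of build you are on and what the extension suffix looks like - Py_GIL_DISABLED is the documented config variable for detecting a free threaded build:

    import sys
    import sysconfig

    # 1 on a free threaded build, 0/None on a regular GIL build
    print(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # e.g. '.cpython-314t-x86_64-linux-gnu.so' - note the 't' in the ABI tag
    print(sysconfig.get_config_var("EXT_SUFFIX"))

    # whether the GIL is actually enabled right now (an extension that has not
    # opted in can cause it to be re-enabled); available on recent CPython
    print(sys._is_gil_enabled())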
What is a little more problematic is that some of the standard library is marked as supporting free threading, even though they still have the audit and update work outstanding.
Also the last time I checked, the compiler thread sanitizers can't work with free threaded Python.
the problem with that is it affects the entire application and makes the whole thing free-threading incompatible.
it's quite possible to make a python app that requires libraries A and B, can be loaded into a free-threaded interpreter, and doesn't actually do any unsafe operations with them. we need to be able to let people load these libraries, but say: this thing may not be safe, add your own mutexes or whatever
SQLite has a builtin session extension that can be used to record and replay groups of changes, with all the necessary handling. I don't necessarily recommend session as your solution, but it is at least a good idea to see how it compares to others.
That provides a C level API. If you know Python and want to do some prototyping and exploration then you may find my SQLite wrapper useful as it supports the session extension. This is the example giving a feel for what it is like to use:
Unless you compile SQLite yourself, you'll find the maximum mmap size is 2GB. ie even with your pragma above, only the first 2GB of the database are memory mapped. It is defined by the SQLITE_MAX_MMAP_SIZE compile time constant. You can use pragma compile_options to see what the value is.
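A quick way to check with the stdlib sqlite3 module - the request below is silently capped at the compile time maximum:

    import sqlite3

    con = sqlite3.connect("example.db")

    # compile_options lists options set at build time; look for MAX_MMAP_SIZE
    for (opt,) in con.execute("pragma compile_options"):
        if "MMAP" in opt:
            print(opt)

    # ask for 1TB; what comes back is capped at SQLITE_MAX_MMAP_SIZE
    con.execute("pragma mmap_size=1099511627776")
    print(con.execute("pragma mmap_size").fetchone())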
SQLite has 32 bit limits. For example the largest string or blob it can store is 2GB. That could only be addressed by an incompatible file format change. Many APIs also use int in places again making limits be 32 bits, although there are also a smattering of 64 bit APIs.
Changing this default requires knowing it is a 64 bit platform when the C preprocessor runs, and would surprise anyone who was ok with the 2GB value.
There are two downsides of mmap - I/O errors can't be caught and handled by SQLite code, and buggy stray writes by other code in the process could corrupt the database.
It is best practice to directly include the SQLite amalgamation into your own projects, which allows you to control version updates and configuration.
The general test suite is not proprietary, and is a standard part of the code. You can run make test. It uses TCL to run the testing, and covers virtually everything.
There is a separate TH3 test suite which is proprietary. It generates C code of the tests so you can run the testing in embedded and similar environments, as well as coverage of more obscure test cases.
This isn't an issue as SQLite doesn't accept contributions because they don't want to risk someone submitting proprietary code and lying about its origin.
I've never understood why other large open-source projects are just willing to accept contributions from anyone. What's the plan when someone copy-pastes code from some proprietary codebase and the rights holder finds it?
If the rights holder is particularly litigious then I could see them suing even if you agreed to take out their code under the argument that you've distributed it and profited from it. I don't know if there's been any cases of this historically but I'd be surprised if there hasn't been.
I'd love to see an analysis of byte ordering impact on CPU implementation. Does little vs big endian make any difference to the complexity of the algorithms and circuits?