uWSGI is a flexible application container that we use extensively at ticketea. It is very popular in the Python/WSGI ecosystem. It normally sits between your application and a webserver or reverse proxy such as NGINX.
One of its most interesting (and perhaps controversial?) design choices is the way uWSGI manages preforking, which is the subject of this post.
What is preforking?
In the context of a webserver or an application container, preforking means that the server spawns a certain number of processes up front, each of which handles incoming requests.
Sometimes, these processes are also called "worker processes", or simply "workers".
On Unix-like operating systems, the fork() system call is used to create these processes and have them inherit a copy of the parent's address space.
Forking and copy-on-write
When a parent process calls fork(), operating systems don't actually copy the parent's memory pages. Instead, they implement a strategy called "copy-on-write", which means that only a few data structures are created for the forked process. In particular, the heap pages are not copied: they point to the parent's pages until one of them is written to.
uWSGI, forking and copy-on-write
When using multiple processes with uWSGI (e.g. with the --processes parameter), uWSGI will instantiate your application in its first process, and will then fork() multiple times until the desired number of workers is reached.
This will create several copies of the first process, each of them a fully instantiated web application ready to serve connections. Given the way fork() and copy-on-write work, spawning these processes is quick and memory efficient.
This is also explained in uWSGI's documentation.
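To make the mechanism concrete, here is a minimal, uWSGI-independent sketch of preforking in Python (Unix-only). The state dictionary and worker count are illustrative, not anything uWSGI actually uses:

```python
import os

# Illustrative sketch (not uWSGI itself): the parent builds application
# state once, then forks workers that inherit it via copy-on-write.
app_state = {"config": "loaded once in the parent"}

children = []
for _ in range(3):
    pid = os.fork()
    if pid == 0:
        # Worker process: sees app_state without re-loading anything.
        assert app_state["config"] == "loaded once in the parent"
        os._exit(0)
    children.append(pid)

# Parent: wait for every worker to exit cleanly.
statuses = [os.waitpid(pid, 0)[1] for pid in children]
```

Each worker exits immediately here; a real server would instead enter an accept loop.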
Is this always safe?
Most of the time, it is. But there are corner cases, particularly if you make explicit use of threads in your code.
Let's look at an example. At ticketea, we're trying to rely on managed services as much as possible. For this reason, we're migrating to Google Cloud Stackdriver Logging using its Python logging library.
This library creates a background thread that queues up logging calls and periodically flushes them to the Stackdriver Logging service. This is a good thing, because it eliminates the round-trip latency that a naive, synchronous approach would incur on every logging call.
The background thread is created when the application starts in the first uWSGI process, and (by default) forking happens afterwards. fork() duplicates only the calling thread: the background thread is not carried over into the workers, which effectively prevents their log buffers from ever being flushed.
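A minimal sketch of this hazard (CPython on Unix; not uWSGI- or Stackdriver-specific). A background thread is started before fork(), and the child reports how many threads it actually has:

```python
import os
import threading

# Start a background thread before forking, mimicking a logging
# library that flushes buffers from a worker thread.
stop = threading.Event()
background = threading.Thread(target=stop.wait, daemon=True)
background.start()

assert threading.active_count() == 2  # main thread + background thread

pid = os.fork()
if pid == 0:
    # Child: only (a copy of) the calling thread exists here.
    # Use the exit code to report the child's thread count.
    os._exit(threading.active_count())

_, status = os.waitpid(pid, 0)
child_thread_count = os.WEXITSTATUS(status)  # 1: the background thread is gone
stop.set()
```

The parent still has two threads, but the forked child has only one, so anything the missing thread was responsible for (like flushing buffers) silently stops happening.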
Alternative 1: postfork hooks
uWSGI offers a way to execute fixup actions after fork(). These fixup actions can be registered with the postfork() decorator. Using this decorator, we can make sure that each worker is backed by its own background thread.
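A sketch of what such a hook could look like. The postfork decorator comes from uWSGI's uwsgidecorators module; the fallback stand-in and the start_worker_flusher/flush_loop names are our own illustrative assumptions, not part of any library:

```python
import threading

try:
    from uwsgidecorators import postfork
except ImportError:
    # Outside uWSGI (e.g. when testing) the module is unavailable;
    # use a no-op stand-in so this file can still be imported.
    def postfork(func):
        return func

@postfork
def start_worker_flusher():
    # Hypothetical fixup: each worker re-creates the background thread
    # that was lost during fork().
    def flush_loop(stop):
        # Placeholder: a real implementation would drain the log
        # queue here on every iteration.
        while not stop.wait(5.0):
            pass

    stop = threading.Event()
    worker = threading.Thread(target=flush_loop, args=(stop,), daemon=True)
    worker.start()
    return worker
```

Under uWSGI, functions decorated with postfork run once in every worker right after it is forked.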
Alternative 2: Lazy Apps
uWSGI offers a way to instantiate your application after the fork() call, once for each worker. This is called lazy apps mode. Using lazy apps, we can make sure that each process starts up independently, ensuring more isolation and predictability. In the specific use case mentioned before, this ensures that each process starts its own background thread.
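A sketch of what this could look like in a uWSGI ini file; the module name is a placeholder for your own WSGI entry point:

```ini
[uwsgi]
# Hypothetical entry point; replace with your own WSGI module.
module = myapp.wsgi:application
processes = 4
# Instantiate the application in each worker after fork().
lazy-apps = true
```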
Although the lazy apps mode is safer than the default fork-after-instantiation, it comes with two drawbacks:
Start-up times will be (a bit) slower. Loading the application n times takes longer than forking it n times, since forking benefits from copy-on-write. Usually, this is not a big deal.
It will use more memory. Each worker will be a completely separate and distinct process from the OS point of view. Less memory will be shared between them.
The difference in physical memory consumption (the RES column in top) depends on the size of the application. For a small hello-world application written in Django, the difference is about +15%. For larger applications, the difference will be higher.