Runaway complexity

Last week I had to work on a Django app again. Since Python is a very portable language that works on many different platforms, of course I had to work on it in a Docker container, in a Linux VM in Qemu, on an arm64 Mac running macOS. And because the official Docker Desktop app is somewhat annoying, I’ve been giving Lima a try. And because the standard Django development web server doesn’t offer the best debugging experience, I’ve been running an alternative server through django-extensions.

I’ve counted at least 8 distinct software vendors so far in that paragraph. When I hit a bug that completely killed my productivity, it was far from obvious which one to look at. Let’s dive in and see what happened.

Why so complex?

A development web server has three tasks: 1. when something goes wrong, show me more context to help me understand and fix the problem; 2. watch the source code for changes, and reload the application to speed up each iteration cycle; 3. otherwise, keep things as similar to the production environment as possible.

These three are absolutely crucial in maintaining productivity. If you don’t get enough context from a crash, you will be stumbling around in the dark. If you need to manually restart the application server, or worse yet, rebuild the Docker container, you will quickly lose focus and become annoyed or distracted. If your local environment diverges too far from how you run things in production, you will eventually hit production-only bugs. None of these things are desirable.

The bug

The runserver_plus command from django-extensions would keep detecting changes in files that did not come from my application. Here’s an excerpt from the logs:

* Detected change in '/usr/local/lib/python3.10/dist-packages/django/contrib/messages/storage/session.py', reloading
* Detected change in '/usr/local/lib/python3.10/dist-packages/django/contrib/messages/storage/base.py', reloading
* Detected change in '/usr/local/lib/python3.10/dist-packages/django/contrib/messages/utils.py', reloading
* Detected change in '/usr/local/lib/python3.10/dist-packages/django/contrib/messages/storage/cookie.py', reloading

This caused the server to remain stuck in a reload loop, which means every time I wanted to request a new page, it would take a few seconds for the server to start responding again, and then it would shortly go into another restart. Like a crash loop, but it does a little bit of work in between, so you can limp around for a while before you decide it’s too annoying and look for a solution.

The investigation

The exact set of files would differ, but what remained consistent was that these were library files. Not only did my text editor not touch them - they were on a separate volume in the VM, which was not shared with the host OS, where I was doing the editing.
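
That clue can be made explicit with a few lines of Python. What follows is a minimal sketch, assuming the log format shown in the excerpt above. The function names and the choice of dist-packages / site-packages as markers for library code are made up for illustration, and are not part of django-extensions or Werkzeug.

```python
import re

# Matches reloader log lines such as:
#  * Detected change in '/path/to/file.py', reloading
CHANGE_RE = re.compile(r"Detected change in '([^']+)', reloading")

# Assumption: installed third-party code lives under one of these directories.
LIBRARY_MARKERS = ("/dist-packages/", "/site-packages/")


def reloaded_paths(log_text):
    """Return every path the reloader reported as changed, in log order."""
    return CHANGE_RE.findall(log_text)


def is_library_path(path):
    """Heuristic: anything under dist-packages or site-packages is library code."""
    return any(marker in path for marker in LIBRARY_MARKERS)


def split_by_origin(log_text):
    """Split the reported paths into (library_paths, application_paths)."""
    library, application = [], []
    for path in reloaded_paths(log_text):
        (library if is_library_path(path) else application).append(path)
    return library, application
```

If the application list comes back empty while the library list keeps growing, the reloads are not coming from the editor.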

However, it took me a moment to connect the dots on that clue, so in the meantime I tried the following:

Notice that our list of vendors has grown by another four, some of which are now suspects as well.

The culprit

I don’t know why it took me so long to suspect that django-extensions was causing the problem; searching their bug tracker did indeed turn up issue 1805. Except the bug wasn’t in django-extensions; it was a bug in Werkzeug, which django-extensions uses directly for the reloader functionality. And Werkzeug itself didn’t do anything wrong either; the issue came from the watchdog package, which changed its default behavior by including file open events in the notification stream. That explains it - the files that were triggering the reloads were not being changed, they were being opened.
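
The difference between "opened" and "changed" is easy to see with a check based on modification times. This is a minimal sketch of that idea, not the way Werkzeug or watchdog implement their reloaders; snapshot and changed_files are hypothetical names.

```python
import os


def snapshot(paths):
    """Record the modification time (in nanoseconds) of each path."""
    return {path: os.stat(path).st_mtime_ns for path in paths}


def changed_files(before, after):
    """Return the paths whose modification time differs between two snapshots."""
    return sorted(path for path in after if before.get(path) != after[path])
```

Opening and reading a file leaves its modification time alone, so a Python process importing Django modules would not show up in changed_files. A notification stream that includes open events has no such protection, unless the consumer filters them out.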

So we’ve named twelve suspects… And the problem originated in the thirteenth, which was a transitive dependency of a transitive dependency of a “support goodies” package. Quite a game of Cluedo.

Conclusions?

As I said, I don’t know why I looked everywhere else first, before checking the bug tracker of django-extensions. Perhaps because through the crazy mix of arm64 / x86-64, Mac / Linux / Windows, Docker / Compose / Swarm, Qemu / HVM, AWS / Hetzner / on-prem, Debian / Ubuntu / CentOS / macOS / OpenBSD, Terraform / Judo / NixOS, Python / Rust / Go / JS / TS / Swift / Kotlin, Django / Flask / Vue / Angular, Traefik / nginx / ELB, Postgres / sqlite, S3 / MediaStore (+CloudFront), and maybe a couple hundred smaller things that are too small to name, I’ve come to expect the issue to usually be at the boundaries?

If you stare at it long enough, you may notice that Werkzeug’s fix is also incomplete, in that it does not protect their code against a similar change in watchdog in the future: should watchdog choose to expose more kinds of inotify events, e.g. IN_ACCESS, Werkzeug still does not whitelist the narrow set that is actually relevant to their use case, so the code remains a ticking bomb for another change at the boundary.
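
A whitelist of that kind could look roughly like this. It is a minimal sketch under stated assumptions, not Werkzeug’s code: FileEvent is a hypothetical stand-in for a file system notification, the event type strings are modelled on the names watchdog uses, and which kinds count as relevant for a reloader is a guess.

```python
from dataclasses import dataclass

# Assumption: only these kinds of events mean that source code may have changed.
RELEVANT_EVENT_TYPES = frozenset({"created", "modified", "moved", "deleted"})


@dataclass(frozen=True)
class FileEvent:
    """Hypothetical stand-in for a notification from a file watcher."""
    event_type: str
    src_path: str


def should_reload(event):
    """Reload only for explicitly listed event kinds and ignore everything else."""
    return event.event_type in RELEVANT_EVENT_TYPES
```

The point of listing what is allowed, rather than what is ignored, is that an event kind nobody has heard of yet falls through to "ignore" instead of "reload".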

My conclusion is that I would love to go back to programming on a Commodore 64. I was six years old when the C64 marked the first step of my journey into both programming and creating music. The machine is too simple and limited to afford the complexity that leads to these kinds of issues.

Of course I couldn’t use a C64 as my sole computing and development platform nowadays, unless perhaps I was already retired. But I’d like to write some software for it. To paraphrase ESR, I believe it is worth learning for the experience, which may leave you a better programmer for the rest of your days, even if you never use the C64 a lot.

I don’t think we can throw away 40 years of progress in computing, but I do think that we can reflect upon this runaway complexity and, sometimes, start removing and simplifying things.