We are aware of an intermittent issue over the past two days that has made both staging and live sites occasionally unavailable. It is caused by failed connections to central services, the database in particular.
Each occurrence in our live environment has been picked up by our automated monitoring and alerting systems and resolved manually. We have also been monitoring staging manually to catch occurrences more quickly, and these have likewise been resolved manually.
Our investigation has determined that the issue is caused by a security update distributed by the maintainers of the operating system several days ago, which has had unintended knock-on effects for us and for others. More information about the specific bug is available at https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1674733
We are looking into a temporary fix while the upstream package maintainers test and release their own fix, and we are also looking into more automated monitoring for our staging environments.
Mar 22, 14:55 GMT