On December 8, several of our services experienced a brief period of instability due to an issue originating from our database provider. The incident began at 11:42 UTC and services recovered by 11:51 UTC.
What happened
Our vendor identified the root cause as a failure in their backend node replacement service. This failure caused delays in deploying new nodes, which led to performance issues and affected tasks related to DNS updates, including node replacements.
The impact was limited to databases whose nodes had recently been replaced or newly created, for example after forking or scheduled maintenance.
Resolution
The vendor has identified and fixed the underlying issue in their node replacement service. Once the affected database recovered, all Datacake services returned to normal operation and have remained stable since.
Next steps
We are continuing to work with the vendor to ensure the issue is fully understood and that safeguards are in place to prevent a recurrence.