
Category: Computer Science

Rolling out patches and changes, often and fast

Fefe had a short pointer to an article, Patching is Hard. It is, but you can make it a lot easier by doing a few things right. I did a small writeup (in German) to explain this, which Fefe posted.

I do have an older talk on this, titled “8 rollouts a day” (more like 30 these days). There are slides and a recording. The DevOps talk “Go away or I will replace you with a little shell script” addresses it, too, but from a different angle (slides, recording).

Here is the English version of the writeup:


Cloud Costs

Cloud cost models are sometimes weird, and billing is not always transparent. The cost model can also change at will.

The Medium story by Home Automation is an extreme example, and contains a non-trivial amount of naiveté on their side, but it underlines the importance of being spread across more than one cloud provider and having an exit strategy. Which is kind of a dud if you are using more than simple IaaS – if you tie yourself to a database-as-a-service offering, you can’t really have an exit strategy at all.


TL;DR: Firebase accidentally wasn’t billing some traffic, and fixed that (the billing). They did not communicate the change, they did not update their status panels to report the increased traffic, and they did not measure the billing impact of their change to find extreme cases before rolling it out and contact those customers.

The customer, Home Automation, had close to zero clue about using TLS correctly, was using connections inefficiently and kind of maximised overhead, ran into the worst-case scenario for the change, and got fucked. They would want out, but also had zero strategy for that, because of the DBaaS fuckup.
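To see why inefficient connection handling maximises overhead, here is a back-of-the-envelope model. All numbers are my assumptions for illustration, not Firebase’s or Home Automation’s actual figures: opening a fresh TLS connection per request means paying for the full handshake (including the certificate chain) every time, while a reused connection pays it once.

```python
# Back-of-the-envelope model (assumed numbers, purely illustrative) of how
# opening a fresh TLS connection per request inflates billed traffic
# compared to reusing one pooled connection.

HANDSHAKE_BYTES = 5_000     # assumed: full TLS handshake incl. certificate chain
PAYLOAD_BYTES = 200         # assumed: a tiny status-update payload
REQUESTS_PER_DAY = 1_000_000

def billed_bytes(requests: int, reuse: bool) -> int:
    """Total bytes on the wire; with reuse, the handshake is paid once."""
    if reuse:
        return requests * PAYLOAD_BYTES + HANDSHAKE_BYTES
    return requests * (PAYLOAD_BYTES + HANDSHAKE_BYTES)

naive = billed_bytes(REQUESTS_PER_DAY, reuse=False)   # 5_200_000_000 bytes/day
pooled = billed_bytes(REQUESTS_PER_DAY, reuse=True)   # 200_005_000 bytes/day
print(f"amplification: {naive / pooled:.0f}x")        # prints "amplification: 26x"
```

With these assumed numbers the per-request handshake multiplies billed traffic by roughly 26x – exactly the kind of worst case that only shows up once someone starts billing that traffic.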

In the cloud you don’t need operations. Until you do.


Handling Wannacrypt – a few words about technical debt

So Microsoft had a bug in their systems. Many of their systems. For many years. That happens. People write code. These people write bugs.

Microsoft over the years has become decently good with fixing bugs and rolling out upgrades, quickly. That’s apparently important, because we all are not good enough at not writing bugs. So if we cannot prevent them, we need to be able to fix them and then bring these fixes to the people. All of them.

The NSA found a bug. They called it ETERNALBLUE and they have been using it for many years to compromise systems.

In order to be able to continue doing that, they kept the bug secret. That did not work. The bug is now MS17-010, or a whole list of CVE entries.

The NSA told MS about the bug when they learned that it had leaked, but not before. Microsoft patched the bug in March 2017, even for systems as old as Windows XP (which lost all support in 2014), but many people did not install the patch.

The result is “the largest cyberattack in the world”.


RTT-based vs. drop-based congestion management

APNIC discusses different TCP congestion control algorithms, coming from Reno, going through CUBIC and Vegas, then introducing BBR (which, like CoDel, aims at keeping queueing delay low rather than filling buffers) and what they observed when running BBR in a network alongside other implementations.

TCP congestion control algorithms try to estimate the bandwidth limit of a multi-segment network path, where a stream crosses many routers. Each segment may have a different available capacity. Overloading the total path (that is, its thinnest segment) forces packet drops by overflowing the buffers of the router just in front of that thin segment. That in turn requires retransmits, which is inefficient and introduces nasty delays.

To make matters more complicated, the Internet is a dynamic environment and conditions can change during the lifetime of a connection.
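The drop-based schemes the article starts from (Reno, CUBIC) infer congestion only from packet loss, so they keep growing the window until the thinnest segment’s buffer overflows, then back off. A toy AIMD sketch (the event model and numbers here are illustrative, not a faithful TCP implementation) shows that behaviour:

```python
# Toy sketch of drop-based (Reno-style) congestion control: slow start,
# additive increase per ACK round, multiplicative decrease on a drop.
# Event model and constants are illustrative assumptions, not real TCP.

def aimd(events, cwnd=1.0, ssthresh=64.0):
    """Walk a sequence of 'ack'/'drop' events and return the final window."""
    for event in events:
        if event == "ack":
            if cwnd < ssthresh:
                cwnd *= 2.0        # slow start: exponential growth
            else:
                cwnd += 1.0        # congestion avoidance: additive increase
        else:                      # a drop is the only congestion signal
            ssthresh = cwnd / 2.0
            cwnd = max(1.0, cwnd / 2.0)  # multiplicative decrease
    return cwnd

print(aimd(["ack"] * 6))             # 64.0: slow start doubles up to ssthresh
print(aimd(["ack"] * 6 + ["drop"]))  # 32.0: one drop halves the window
```

The point of BBR is to avoid exactly this probe-until-it-drops cycle by building a model of delivery rate and round-trip time instead, which is also why mixing it with loss-based flows produces the interactions the APNIC article measured.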