Die wunderbare Welt von Isotopp

Change Data Capture

Kristian Köhntopp - December 5, 2022
Change Data Capture is a way to capture, well, events from a system that describe how the data in the system changed. For a system that does business transactions, that may be, at the lowest level, the Create, Update, or Delete of entities or relationships. Systems that emit these kinds of events are called Entity Services, and their events are about the lowest level of events that you can have in such a system.
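
To make the idea concrete, here is a minimal sketch of an in-memory store that emits one change event per Create, Update, or Delete. The names `ChangeEvent` and `EntityStore` are hypothetical and do not come from the article; a real system would typically derive such events from the database log rather than from application code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str                            # "create", "update" or "delete"
    entity: str                        # entity type, e.g. "user"
    key: Any                           # primary key of the changed entity
    before: Optional[Dict[str, Any]]   # state before the change, None on create
    after: Optional[Dict[str, Any]]    # state after the change, None on delete


class EntityStore:
    """Toy entity store that records every change as an event."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, Any], Dict[str, Any]] = {}
        self.events: List[ChangeEvent] = []

    def upsert(self, entity: str, key: Any, values: Dict[str, Any]) -> None:
        before = self.rows.get((entity, key))
        op = "create" if before is None else "update"
        self.rows[(entity, key)] = dict(values)
        self.events.append(ChangeEvent(op, entity, key, before, dict(values)))

    def delete(self, entity: str, key: Any) -> None:
        before = self.rows.pop((entity, key))
        self.events.append(ChangeEvent("delete", entity, key, before, None))


store = EntityStore()
store.upsert("user", 1, {"name": "kris"})
store.upsert("user", 1, {"name": "isotopp"})
store.delete("user", 1)
```

After these three calls, `store.events` holds a create, an update, and a delete event, each carrying the before and after state of the entity.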

USENET and Animal Networks

Kristian Köhntopp - December 2, 2022
Almost exactly 30 years ago, the Internet was taking its first steps in Germany, but there were also networks running on different, much older technology: the BBS networks (Mailboxnetze). These were decentralized networks in which local computers were fitted with modems; you could dial in and read messages online, à la Mastodon. Or you had software at home that called the BBS and downloaded the messages. Then you could read offline, write replies, and dial in a second time.

Systemd and docker -H fd://

Kristian Köhntopp - November 28, 2022
Based on what I learned in Systemd Service and Socket Activation and Systemd Service and stdio, we can now have a look at Docker. The code for handling -H fd:// is here. The file descriptors come from activation.Listeners() and end up in the listeners slice. In our case, the part after the fd:// is empty, so lines 83-85 are executed, and the incoming fds are passed to Docker proper.
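
As a rough illustration of where such file descriptors come from, the following sketch implements the environment-variable protocol documented in sd_listen_fds(3): systemd sets LISTEN_PID and LISTEN_FDS, and the passed descriptors start at fd 3. This is not Docker's Go code; the function name `listen_fds` and its parameters are assumptions made for this example.

```python
import os

# First passed file descriptor, as documented in sd_listen_fds(3).
SD_LISTEN_FDS_START = 3


def listen_fds(environ=None, pid=None):
    """Return the list of fd numbers passed in by socket activation.

    environ and pid are injectable for testing; by default the real
    process environment and PID are used.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid

    # The variables are only meant for the process named in LISTEN_PID.
    if environ.get("LISTEN_PID") != str(pid):
        return []

    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []

    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


# Simulated environment: a process with PID 42 that was handed two sockets.
fds = listen_fds({"LISTEN_PID": "42", "LISTEN_FDS": "2"}, pid=42)
```

A real service would then wrap each number in a socket object, for example with `socket.socket(fileno=fd)`, instead of opening its own listening socket.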

Systemd Service and stdio

Kristian Köhntopp - November 28, 2022
After yesterday’s article, Arne Blankerts pointed me at a note showing how to install a program using stdio with systemd.

Code and Unit files

The code:

    #! /usr/bin/env python3
    import sys

    if __name__ == "__main__":
        while True:
            line = input().strip()
            print(f"ECHO: {line}")
            if line == "QUIT":
                sys.exit(0)

The Socket Unit:

    $ systemctl --user cat kris2.socket
    # /home/kris/.config/systemd/user/kris2.socket
    [Unit]
    Description=My second service
    PartOf=kris2.service

    [Socket]
    ListenStream=127.0.0.1:12346
    Accept=Yes

    [Install]
    WantedBy=sockets.target

And the Service Unit, which has to be a template:

Systemd Service and Socket Activation

Kristian Köhntopp - November 27, 2022
In today’s Yak Shaving session I needed to understand how to expose the docker socket of a remote machine over the network. You should not do that, it is totally insecure, but I needed to do that to test something.

Socket Activation

I discovered that dockerd is running with -H fd://.

    # ps axuwww | grep docker[d]
    root 1616732 0.5 0.1 2930892 52168 ? Ssl 15:32 2:25 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.

ETL from a Django Model

Kristian Köhntopp - November 20, 2022
Continued from last week’s article on data warehouses. At work, I was tasked with building a capacity model for data center growth. The basic assumption of these things is often that the future behaves similarly to the past, so the predicted capacity model is essentially an extension of past growth. I needed old server usage data, and was indeed able to find that in one of our systems, called ServerDB.

Of Stars and Snowflakes

Kristian Köhntopp - November 16, 2022
A sample system

When you have an Online Transactional Database, you have to record transactions at some point in time. That means you get a table with a time dimension in your OLTP system. Consider for example a system that records Reservations. Users exist and can reserve Things to use, for a day. You probably get a structure such as this: In an OLTP database, a reservation is a (resid, userid, thingid, date).
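
A minimal sketch of such a structure, using SQLite from Python. The column names (resid, userid, thingid, date) are from the text above; the table names, the `name` columns, the `reserve` helper, and the rule that a Thing can be reserved only once per day are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;

CREATE TABLE users  (userid  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE things (thingid INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One row per reservation: (resid, userid, thingid, date).
CREATE TABLE reservations (
    resid   INTEGER PRIMARY KEY,
    userid  INTEGER NOT NULL REFERENCES users(userid),
    thingid INTEGER NOT NULL REFERENCES things(thingid),
    date    TEXT    NOT NULL,       -- ISO date, e.g. '2022-11-16'
    UNIQUE (thingid, date)          -- assumption: one reservation per thing and day
);

INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
INSERT INTO things VALUES (1, 'projector');
""")


def reserve(conn, userid, thingid, date):
    """Try to record a reservation; return False if the thing is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO reservations (userid, thingid, date) VALUES (?, ?, ?)",
                (userid, thingid, date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = reserve(conn, 1, 1, "2022-11-16")    # alice gets the projector
second = reserve(conn, 2, 1, "2022-11-16")   # bob is rejected for the same day
third = reserve(conn, 2, 1, "2022-11-17")    # bob can have it the next day
```

The `date` column is what gives the table its time dimension: every row is a fact tied to a specific day.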

Databases on un-RAID-ed storage?

Kristian Köhntopp - November 9, 2022
Where I work, we run bare-metal databases on non-redundant local storage. That is, a database is a very cheap frontend blade server. It has 2 CPUs, with 8 cores/16 threads each. It contains 128 GB of memory and 2 or 4 TB of local NVMe, and it has a 10 GBit/s network interface. It costs around 120 to 150 Euro per month to run for 5 years, including purchase price and all datacenter costs.
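
For a sense of scale, the monthly figure can be turned into a total over the 5-year lifetime. This is plain arithmetic on the numbers quoted above; the helper name `lifetime_cost` is made up for this example.

```python
MONTHS = 5 * 12  # 5 years of operation


def lifetime_cost(monthly_eur, months=MONTHS):
    """Total cost in Euro over the whole lifetime of one server."""
    return monthly_eur * months


low = lifetime_cost(120)
high = lifetime_cost(150)
print(f"Total per server over 5 years: {low} to {high} Euro")
```

That puts one database server at roughly 7200 to 9000 Euro over its lifetime, purchase price and datacenter costs included.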

Bandwidth, IOPS and Latency

Kristian Köhntopp - November 7, 2022
A hard disk from 1998.

The opening image for this post shows the stock photo of a hard disk platter. You can see a movable arm that can ride in and out of a stack of rotating platters coated with some kind of metal oxide. We sometimes condescendingly call this kind of storage “rotating rust”, when in reality it is a triumph of material science. Moving an arm costs time: bringing that arm into the right position, and then waiting until the right segment of disk rotates underneath it so that we can write things to disk, takes time.

Proper O11y for MySQL

Kristian Köhntopp - October 25, 2022
Three years ago, I learned that due to SREcon, Charity Majors was in Amsterdam. I set up a meeting between her and Benjamin Tyler, Yves Orton and a few more colleagues of mine. That is because, apparently in a case of co-evolution, our company-internal “Events” system and Honeycomb’s observability tooling, modelled after experience with Facebook’s “Scuba”, seemed to be doing a lot of the same things. These days, we are using Honeycomb a lot to record events, and debug code running in distributed systems.

Software Supply Chain Issues

Kristian Köhntopp - October 18, 2022
The GitHub Security Lab had a long, hard look at “Apache Commons Text” in March this year. That resulted in CVE-2022-42889. The exploit goes like this:

    final StringSubstitutor interpolator = StringSubstitutor.createInterpolator();
    String out = interpolator.replace("${script:javascript:java.lang.Runtime.getRuntime().exec('touch /tmp/foo')}");
    System.out.println(out);

Next to ${script:...} there are apparently also ${url:...} and ${dns} as other unsuitable substitutions, and they nest. This was fixed in October 2022, after being reminded by GHSL in May and August.

Groups and Places

Kristian Köhntopp - October 12, 2022
In a distributed, asynchronous environment, there is a need for distributed, asynchronous interaction. This interaction is often written, but “writing” these days is actually a media-rich process that includes much more than letters. It also needs to be able to build some structure, and some gateway to level up to more synchronous and even richer communication. Let’s have a chat about chats, and what properties they have. Historically, chat was lines of text, without much structure.

Pan Narrans and Better Meetings

Kristian Köhntopp - October 10, 2022
When you are looking for a better Remote First culture, you are looking for better meetings. If you go for better meetings, you will also have fewer of them. “The anthropologists got it wrong when they named our species Homo sapiens (‘wise man’). In any case it’s an arrogant and bigheaded thing to say, wisdom being one of our least evident features. In reality, we are Pan narrans, the storytelling chimpanzee.

MySQL: Local and distributed storage

Kristian Köhntopp - September 27, 2022
Where I work, we are using MySQL in a scale-out configuration to handle our database needs. That means, you write to a primary server, but reads generally go to a replica database further down in a replication tree. A number of additional requirements that should not concern you as a developer make it a little bit more elaborate than a simple “primary and a number of replicas” configuration. But the gist of all that is:

MySQL: Data for Testing

Kristian Köhntopp - September 26, 2022
Where I work, there is an ongoing discussion about test data generation. At the moment we do not replace or change any production data for use in the test environments, and we don’t generate test data. That is safe and legal, because production data is tokenized. That is, PII and PCI data is being replaced by placeholder tokens which can be used by applications to access the actual protected data through specially protected access services.

MySQL: Sometimes it is not the database

Kristian Köhntopp - September 19, 2022
Query latencies in one data center are larger than elsewhere for one replication hierarchy, but only in the high percentiles. This impacts production, and traffic is being failed away from that data center to protect production. When the P50 and P90 look okay, but the P99 and the P99.9 do not, the database(s) operate normally, and only some queries are running slow. The initial guess was “for some queries the plan has flipped, but only in that data center.

MySQL: Artifactory Conclusion

Kristian Köhntopp - September 14, 2022
Two weeks ago I was drawn into the debugging of artdb, the replication hierarchy used by our Artifactory instance.

TL;DR: Artifactory overloaded the database. This was incident-handled by optimizing a number of slow queries using some covering index trickery, and by upgrading the hardware substantially. Using the runway we bought, we found and partially fixed the following problems:

Fixed: A number of very expensive reporting queries were sped up 16x to 20x using covering indexes, from 180s runtime to 8s-12s runtime.

MySQL: Straight lines

Kristian Köhntopp - September 8, 2022
A database is showing replication delay, and so are all the other instances of the same replication hierarchy, all of which reside in Openstack. Shortly before 21:30 the database begins to lag, until around 23:45, when it starts to catch up, slowly. After 00:30, we gain delay again, plateau and then around 01:45, we catch up. The database is moving deep into replication delay sometimes. It does not do that on bare metal.

MySQL: YOLO mode

Kristian Köhntopp - September 2, 2022
OH: “And now let’s quickly push 2 billion rows into this database VM.” That is best done in YOLO mode. This is a mode of operation for a database that minimizes disk writes in favor of batched bulk writes. It is not ACID, so if anything goes wrong during the load, the instance is lost. That is why it is called YOLO mode. You are supposed to do this on a spare replica and not the production primary.

MySQL: Boiling JFrogs

Kristian Köhntopp - August 25, 2022
A work problem: A commercial application, Artifactory, where we do not control the source or the schema, has performance problems involving a certain long-running query. The data size and row counts are not outrageous, and the query itself and the schema are not broken. But the data is very skewed, and for certain values the query is very slow, as almost the entire table is selected. We introduce an experimental covering index, and show a 16x improvement, going from 143s to 9s execution time.