Die wunderbare Welt von Isotopp

Systemd Service and stdio

Kristian Köhntopp - November 28, 2022
After yesterday’s article, Arne Blankerts pointed me at a note showing how to install a program using stdio with systemd. Code and Unit files The code: #! /usr/bin/env python3 import sys if __name__ == "__main__": while True: line = input().strip() print(f"ECHO: {line}") if line == "QUIT": sys.exit(0) The Socket Unit: $ systemctl --user cat kris2.socket # /home/kris/.config/systemd/user/kris2.socket [Unit] Description=My second service PartOf=kris2.service [Socket] ListenStream=127.0.0.1:12346 Accept=Yes [Install] WantedBy=sockets.target And the Service Unit, which has to be a template:

Systemd Service and Socket Activation

Kristian Köhntopp - November 27, 2022
In today’s Yak Shaving session I needed to understand how to expose the docker socket of a remote machine over the network. You should not do that, it is totally insecure, but I needed to do that to test something. Socket Activation I discovered that dockerd is running with -H fd://. # ps axuwww | grep docker[d] root 1616732 0.5 0.1 2930892 52168 ? Ssl 15:32 2:25 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.

ETL from a Django Model

Kristian Köhntopp - November 20, 2022
Continued from last weeks article on data warehouses. At work, I was tasked with building a capacity model for data center growth. The basic assumption of these things is often that the future behaves similarly to the past, so the future predicted capacity model is somehow an extension of past growth. I needed old server usage data, and was indeed able to find that in one of our systems, called ServerDB.

Of Stars and Snowflakes

Kristian Köhntopp - November 16, 2022
A sample system When you have an Online Transactional Database, you have to record transactions at some point in time. That means you get a table with time dimension in your OLTP system. Consider for example a system that records Reservations. Users exist and can reserve Things to use, for a day. You probably get a structure such as this: In an OLTP database, a reservation is a (resid, userid, thingid, date).

Databases on un-RAID-ed storage?

Kristian Köhntopp - November 9, 2022
Where I work, we run bare-metal databases on non-redundant local storage. That is, a database is a very cheap frontend blade server. It has 2 CPUs, with 8 cores/16 threads each. It contains 128 GB of memory, 2 or 4 TB of local NVME and it has a 10 GBit/s network interface. It costs around 120 to 150 Euro per month to run for 5 years, including purchase price and all datacenter costs.

Bandwidth, IOPS and Latency

Kristian Köhntopp - November 7, 2022
A harddisk from 1998. The opening image for this post shows the stock photo of a hard disk platter. You can see a movable arm that can ride in and out of a stack of rotating platters coated with some kind of metal oxide. We sometimes call this kind of storage condescendingly “rotating rust”, when in reality it is a triumph of material science. Moving an arm costs time, and bringing that arm into the right position and then waiting until the right segment of disk rotates underneath it so that we can write things to disk takes time.

Proper O11y for MySQL

Kristian Köhntopp - October 25, 2022
Three years ago, I learned that due to SREcon, Charity Majors was in Amsterdam. I set up a meeting between Benjamin Tyler, Yves Orton and a few more colleagues of mine, and her. That is, because apparently in a case of co-evolution, our company internal “Events” system and Honeycombs observability tooling, modelled after experiencing Fabooks “Scuba” seemed to be doing a lot of the same things. These days, we are using Honeycomb a lot to record events, and debug code running in distributed systems.

Software Supply Chain Issues

Kristian Köhntopp - October 18, 2022
The GitHub Security Lab has a long hard look at “Apache Commons Text” in March this year. That resulted in CVE-2022-42889 . The exploit goes like this: final StringSubstitutor interpolator = StringSubstitutor.createInterpolator(); String out = interpolator.replace("${script:javascript:java.lang.Runtime.getRuntime().exec('touch /tmp/foo')}"); System.out.println(out); Next to ${script:...} there are apparently also a ${url:...} and `${dns} as other unsuitable substitutions, and they nest. This was fixed in October 2022, after being reminded by GHSL in May and August.

Groups and Places

Kristian Köhntopp - October 12, 2022
In a distributed, asynchronous environment, there is a need for distributed, asynchronous interaction. This interaction is often written, but “writing” these days is actually a media-rich process that includes much more than letters. It also needs to be able to build some structure, and some gateway to level up to more synchronous and even richer communication. Let’s have a chat about chats, and what properties they have. Historically, chat was lines of text, without much structure.

Pan Narrans and Better Meetings

Kristian Köhntopp - October 10, 2022
When you are looking for a better Remote First culture, you are looking for better meetings. If you go for better meetings, you will also have fewer of them. “The anthropologists got it wrong when they named our species Homo sapiens (‘wise man’). In any case it’s an arrogant and bigheaded thing to say, wisdom being one of our least evident features. In reality, we are Pan narrans, the storytelling chimpanzee.

MySQL: Local and distributed storage

Kristian Köhntopp - September 27, 2022
Where I work, we are using MySQL in a scale-out configuration to handle our database needs. That means, you write to a primary server, but reads generally go to a replica database further down in a replication tree. A number of additional requirements that should not concern you as a developer make it a little bit more elaborate than a simple “primary and a number of replicas” configuration. But the gist of all that is:

MySQL: Data for Testing

Kristian Köhntopp - September 26, 2022
Where I work, there is an ongoing discussion about test data generation. At the moment we do not replace or change any production data for use in the test environments, and we don’t generate test data. That is safe and legal, because production data is tokenized. That is, PII and PCI data is being replaced by placeholder tokens which can be used by applications to access the actual protected data through specially protected access services.

MySQL: Sometimes it is not the database

Kristian Köhntopp - September 19, 2022
Query latencies in one data center are larger than elsewhere for one replication hierarchy, but only in the high percentiles. This impacts production and traffic is being failed away from that data center to protect production. When the P50 and P90 look okay, but the P99 and the P99.9 do not, the database(s) operate normally, and only some queries are running slow. The initial guess was “for some queries the plan has flipped, but only in that data center.

MySQL: Artifactory Conclusion

Kristian Köhntopp - September 14, 2022
Two weeks ago I was being drawn into the debug of artdb, the Replication hierarchy used by our Artifactory instance. TL;DR Artifactory overloaded the database. This was incident-handled by optimizing a number of slow queries using some covering index trickery, and by upgrading the hardware substantially. Using the runway we bought, we found and partially fixed the following problems: Fixed: A number of very expensive reporting queries were sped up 16x to 20x using covering indexes, from 180s runtime to 8s-12s runtime.

MySQL: Straight lines

Kristian Köhntopp - September 8, 2022
A database is showing replication delay, and so are all the other instances of the same replication hierarchy, all of which reside in Openstack. Shortly before 21:30 the database begins to lag, until around 23:45, when it starts to catch up, slowly. After 00:30, we gain delay again, plateau and then around 01:45, we catch up. The database is moving deep into replication delay sometimes. It does not do that on bare metal.

MySQL: YOLO mode

Kristian Köhntopp - September 2, 2022
OH: “And now let’s quickly push 2 billion rows into this database VM.” That is best done in YOLO mode. This is a mode of operation for a database that minimizes disk writes in favor of batched bulk writes. It is not ACID, so if anything goes wrong during the load, the instance is lost. That is why it is called YOLO mode. You are supposed to do this on a spare replica and not the production primary.

MySQL: Boiling JFrogs

Kristian Köhntopp - August 25, 2022
A work problem: A commercial application, Artifactory, where we do not control the source or the schema has performance problems involving a certain long running query. The data size and row counts are not outrageous, and the query itself and the schema are not broken. But the data is very skewed and for certain values the query is very slow, as almost the entire table is selected. We introduce an experimental covering index, and show a 16x improvement, going from 143s to 9s execution time.

MySQL: GIPK (InnoDB and Primary Keys)

Kristian Köhntopp - August 23, 2022
The MySQL InnoDB storage engine lives off primary keys: Records are stored physically in a B+-Tree, a balanced tree where the data pages are the leaves of the primary key B-Tree. That implies that the data is stored physically in primary key order – records with similar primary keys are usually physically closer together than records with larger differences in primary key value. In literature, such physically ordered indexes are often called clustered indexes.

MySQL: Row Literals

Kristian Köhntopp - August 22, 2022
Question on the Libera/#mysql IRC channel: Is there a way to split a simple select into multiple returned rows? For example, select 1, 2, 3 to be returned as rows? This is actually asking for a table literal notation. I know of four ways to construct a table literal in MySQL: UNION ALL The oldest way to construct a table literal in any SQL that supports UNION is the UNION ALL construct.

Good Energy News

Kristian Köhntopp - August 15, 2022
Two good news in one day. First, my energy provider tells me: Levert u meer terug dan u van het net afneemt, dan betalen we u vanaf die datum 16.8 eurocent per kWh. De oude vergoeding was 7 eurocent per kWh. “If you push energy into the grid, we now pay you 16.8 Eurocent per kWh. The old payment was 7.0 Eurocent per kWh.” The other good news is that it is very likely that the existing “Salderingsregeling” may be continued until 2025.