Die wunderbare Welt von Isotopp
What are the problems with POSIX?
Every once in a while there is the IT news article that kind of triggers me. This time it was “Object-Storage-Protokoll könnte Posix ablösen” in german computer news site Golem . The article speaks about mmap(), NVMEoF and object storage and how it could revolutionize or complete object storages, but does not link to an original article, names no persons and no paper. Also, what do these things - mmap, NVMEoF, object storage and Posix, even have in common? It is not explained anywhere in the article.
MySQL: Import CSV, not using LOAD DATA
All over the Internet people are having trouble getting LOAD DATA and LOAD DATA LOCAL to work. Frankly, do not use them, and especially not the LOCAL variant. They are insecure, and even if you get them to work, they are limited and unlikely to do what you want. Write a small data load program as shown below.
Not using LOAD DATA LOCAL
The LOCAL version of LOAD DATA has two potential security issues:
Importing account statements and building a data warehouse
This is an update and translation of a much older article , which I originally wrote in German. Back then, I was experimenting with importing account statements from my German Sparkasse, which were available as CSV files.
The initial data load
The data appeared as follows:
$ head -2 /home/kris/Documents/banking/umsatz-22758031-29122004.csv
"Local Account";"Book Date";"Valuta Date";"Transaction Type";
"Purpose";
"Remote Party";"Remote Account";"Bank Code";
"Amount";"Currency";"Info"
"08154711";"30.12";"30.12.05";"Direct Debit";
"DRP 08154711 040441777 INKL. 16% UST 5.38 EUR";
"STRATO MEDIEN AG";"040441777";"10050000";
"-39,00";"EUR";"Direct Debit booked"
Because I want to know how I spend my money, I am loading the data into MySQL. Here is how: To understand my spending habits, I decided to load the data into MySQL. Here’s how:
MySQL: automatic partitions surely would be nice
In Deleting data we have been looking at a process that loads data into MySQL, leveraging partitions to make it easier and faster to later get rid of the data again. For this, we created three processes, a data loader process, and two observers - one for creating partitions, and one for deleting them.
The observer processes have been running ANALYZE TABLES and then polling INFORMATION_SCHEMA.PARTITIONS every 1/10th of a second to check if intervention is needed. They then have been dynamically generating the necessary ALTER TABLE statements maintaining the proper partitioning of the table by adding and dropping additional partitions.
MySQL: Deleting data
Completing the data lifecycle is often harder than originally expected: Deleting data can sometimes cost way more than inserting it in the first place. MySQL Partitions can offer a way out. We have an earlier post on the subject.
A sample table, and a problem statement
Let’s define a kind of log table, to which data is added with an auto_increment id value and some data.
#! /usr/bin/env python3
from time import sleep
from random import randint
from multiprocessing import Process
import click
import MySQLdb
import MySQLdb.cursors
db_config = dict(
host="localhost",
user="kris",
passwd="geheim",
db="kris",
cursorclass=MySQLdb.cursors.DictCursor,
)
@click.group(help="Load and delete data using partitions")
def sql():
pass
@sql.command()
def setup_tables():
sql_setup = [
"drop table if exists data",
""" create table data (
id integer not null primary key auto_increment,
d varchar(64) not null,
e varchar(64) not null
)""",
"alter table data partition by range (id) ( partition p1 values less than (10000))",
"insert into data (id, d, e) values ( 1, 'keks', 'keks' )",
"commit",
]
db = MySQLdb.connect(**db_config)
for cmd in sql_setup:
try:
c = db.cursor()
c.execute(cmd)
except MySQLdb.OperationalError as e:
click.echo(f"setup_tables: failed {e} with {cmd}.")
sql()
This is our basic Python framework for experimentation, using the click framework, and a command setup-tables. This command will run a number of SQL statements to initialize our log table named data.
MySQL: Provisioning .mylogin.cnf
MySQL uses connection and config parameters from a number of possible sources. The easiest way to find out where it is looking for config files is to run
$ mysql --help | grep cnf
order of preference, my.cnf, $MYSQL_TCP_PORT,
/etc/my.cnf /etc/mysql/my.cnf /Users/kkoehntopp/homebrew/etc/my.cnf ~/.my.cnf
As can be seen, my version of the MySQL client checks in this order
- /etc/my.cnf
- /etc/mysql/my.cnf
- /Users/kkoehntopp/homebrew/etc/my/cnf
- ~/.my.cnf
The cnf file is a file in dot-ini syntax, so you have [groups] and each group contains lines with key = value pairs. Which groups are read?
MySQL: ALTER TABLE for UUID
A question to the internal #DBA channel at work: »Is it possible to change a column type from BIGINT to VARCHAR ? Will the numbers be converted into a string version of the number or will be it a byte-wise transition that will screw the values?«
Asking yielded more information: »The use-case is to have strings, to have UUIDs.«
So we have two questions to answer:
- Is
ALTER TABLE t CHANGE COLUMN clossy? INTEGER AUTO_INCREMENTvs.UUID
Is ALTER TABLE t CHANGE COLUMN c lossy?
ALTER TABLE is not lossy. We can test.
MySQL: Encoding fields for great profit.
Iterating schemas over time is not an uncommon thing. Often requirements emerge only after you have data, and then directed action is possible. Consequently, working on existing data, and structuring and cleaning it up is a common task.
In today’s example we work with a log table that logged state transitions of things in freeform VARCHAR fields. After some time the log table grew quite sizeable, and the log strings are repeated rather often, contributing to the overall size of the table considerably.
Mein selbstfahrendes Auto hat Kameras, oh nein!
Bei der ARD ist man empört! Teslas Kameras: ARD rückt Datenschutzbedenken ins Licht . Will sagen, jemand hat älteren Personen gesteckt, daß ein selbstfahrendes Auto wenig überraschend Kameras braucht, um bei seiner bestimmungsgemäße Verwendung weniger ältere Personen überzumöllern.
So weit so wenig überraschend.
Bei einem Tesla sind es 9, bei anderen Automobilen anderer Hersteller noch mehr, und Radar, und Lidar und weitere Sensorsysteme. Problematisch sind nur Kameras, weil Menschen als visuelle Wesen zum Bild eine emotional andere Beziehung haben als zu anderen Daten, die objektiv nicht weniger kritisch sind. Aber Bilder, das ist Hexenwerk, das die Seele stiehlt, das muß man dringend streng kontrollieren.
Feeds tagged, and a MySQL feed
I have made changes to the RSS Feed of this blog:
- Each
<item/>does now contain a container<tags/>, inside a sequence of<tag/>containers, with each posts tags. - There is now a second RSS feed for posts tagged
#mysql, because of demand. You can find it at https://blog.koehntopp.info/feed_mysql.xml .
Example for the tags:
<channel>
<title>Die wunderbare Welt von Isotopp</title>
...
<item>
<title>A post title</title>
<link>https://blog.koehntopp.info/2020/09/...</link>
<guid isPermalink="false">/2020/09/...</guid>
<tags><tag>lang_en</tag><tag>mysql</tag><tag>database</tag></tags>
<description>blah blah blah</description>
</item>
</channel>
So if you want only the MySQL content, subscribe to this feed
, if you want all the content, subscribe to the original feed
. In both cases, use the <tags/> to filter further down.