MySQL: GIPK (InnoDB and Primary Keys)

isotopp image Kristian Köhntopp -
August 23, 2022
a featured image
Previous Post
MySQL: Row Literals

The MySQL InnoDB storage engine lives off primary keys: Records are stored physically in a B+-Tree, a balanced tree where the data pages are the leaves of the primary key B-Tree.

That implies that the data is stored physically in primary key order – records with similar primary keys are usually physically closer together than records with larger differences in primary key value. In literature, such physically ordered indexes are often called clustered indexes.

InnoDB depends on primary keys

Secondary indexes in databases need to store the indexed data together with a row address, a pointer to the full row (in InnoDB, into the primary key).

If that row address was a physical address (a disk block number or a similar pointer tied to the position of the data), growing data pages and moving them around would often also make it necessary to update all secondary indexes. For example when a page is split or records move around for other reasons, the physical position of a page changes and all references to that page in SKs would also need an update.

By using the primary key value as a pointer to the full row, InnoDB achieves several things, good and bad:

  • The primary key is added as an (invisible) suffix to every SK. If you have a large PK, you are adding a lot of bytes to every SK. Keep PK values short.
  • The optimizer can leverage the PK at the end of the SK. Some queries are unexpectedly “covering”: SELECT id FROM t WHERE a = 10 is using index for a table t that has an INDEX(a), because internally it is INDEX(a, id).
  • Secondary Index lookups are taking exactly 2x as much time as a primary key lookup, because we do a lookup in the SK, and then using the PK value found, we dive into the PK.
  • Changes to the primary key, for example inserts into the middle of the table, are localized: Page splits happen internally in the primary key, but are not exposed to any SK. So while lookups to an SK cost more, SKs need a lot fewer updates.
  • Changes to any primary key value (UPDATE t SET id = -id WHERE id = 10) are fantastically expensive: The record is physically moved around in the PK, and all SK need changes, too. This can trigger a cascade of page splits and merges. Treat PK values as immutable, non-reusable tokens.

Because InnoDB has this clustered index as the PK, it is completely dependent on the primary key being present.

If you define a table without a primary key, InnoDB tries to secretly promote any other index that is suitable to primary key. Since the primary key is a non-nullable unique index that happens to be named PRIMARY in InnoDB, any other non-nullable unique index would be suitable.

If even that is absent, InnoDB starts to make up a primary key. For that, a global counter exists.

Which is shared between all tables that have no primary key. This counter is protected by a mutex, and if you have insert heavy tables without a primary key, you can end up having contention of this mutex.

Apparently this is common knowledge since 2013, as documented by Jeremy Cole and by Percona , but I did not know this (that the counter is globally shared).

Replication depends on Primary Keys

Other important subsystems also depend on primary keys: For example, all modern replication absolutely requires primary keys to function.

In Row Based Replication, row changes are communicated to the replica as row change events. These contain at minimum the primary key of the row change (or the entire old row, if there is no primary key), and the new value of all changed columns.

Of course, the replica must apply this. In the absence of any primary key, the row to change in the replica is found using a full table scan. And that does not scale for obvious reasons.

Why, do you ask, is that a problem, if InnoDB always has to have a primary key?

Because this autogenerated, internal, global primary key is not exposed in the binlog, and is local to an instance. The replica does not see it, and if it saw it, could not use it.

Worse, Group Replication will simply not work without primary key at all.

So make primary keys mandatory

Since 8.0.13 we can make an explicit primary key mandatory in MySQL. For that, we need to set sql_require_primary_key . If that is set, any table creation without a primary key present is rejected.

mysql> set sql_require_primary_key = on;
Query OK, 0 rows affected (0.00 sec)

mysql> create table rejected ( id integer, d varchar(200));
ERROR 3750 (HY000): Unable to create or change a table without a primary key,
when the system variable 'sql_require_primary_key' is set. Add a primary key 
to the table or unset this variable to avoid this message. Note that tables 
without a primary key can cause performance problems in row-based replication, 
so please consult your DBA before changing this setting.

Also, fix existing PK-less tables ex post

Adding a missing primary key is easy. Simple add it.

mysql> alter table t add id integer not null primary key auto_increment first;
...

But of course, this can break badly written SQL: Anything that does a SELECT *, or otherwise depends on column count, position or order will break. We would need a mechanism to add a primary key column that is not there.

Since 8.0.23, we can have invisible columns . They allow us to add an auto_increment to an existing table without primary key, without breaking badly written SQL.

mysql> show create table scooby\G
       Table: scooby
Create Table: CREATE TABLE `scooby` (
  `id` int DEFAULT NULL,
  `d` varchar(200) DEFAULT NULL,
  `scooby` int NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
  PRIMARY KEY (`scooby`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)

mysql> insert into scooby (id, d) values (1, "keks");
Query OK, 1 row affected (0.01 sec)

mysql> select * from scooby;
+------+------+
| id   | d    |
+------+------+
|    1 | keks |
+------+------+
1 row in set (0.00 sec)

mysql> select *, scooby from scooby;
+------+------+--------+
| id   | d    | scooby |
+------+------+--------+
|    1 | keks |      1 |
+------+------+--------+
1 row in set (0.00 sec)

We define a table scooby with a hidden primary key. When we insert data int this table, this just works normally, and auto_increment supplies the primary key value automatically.

When we look at the data using select *, the hidden column is not shown. When we ask for it explicitly, it is there.

Automating this: GIPK

Beginning with 8.0.30, we get automation for this: We could Generate Invisible Primary Keys , or GIPK.

That is of course a fix for broken designs. You should never need this for your own schemas. Simply go back to the source and fix that instead.

But if you happen to run JAMF or other software that you do not control and that insists on PK-less tables, you can now have it autofixed.

When you set sql_generate_invisible_primary_key , and try to create a table without a primary key, a my_row_id bigint unsigned not null auto_increment invisible primary key is added for you to the table.

mysql> show variables like '%primary%';
+------------------------------------+-------+
| Variable_name                      | Value |
+------------------------------------+-------+
| sql_generate_invisible_primary_key | OFF   |
| sql_require_primary_key            | OFF   |
+------------------------------------+-------+
2 rows in set (0.01 sec)

mysql> set sql_generate_invisible_primary_key = on;
Query OK, 0 rows affected (0.00 sec)

mysql> create table gipk_demo (id integer, d varchar(200));
Query OK, 0 rows affected (0.10 sec)

mysql> show create table gipk_demo\G
       Table: gipk_demo
Create Table: CREATE TABLE `gipk_demo` (
  `my_row_id` bigint unsigned NOT NULL AUTO_INCREMENT /*!80023 INVISIBLE */,
  `id` int DEFAULT NULL,
  `d` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`my_row_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

Bad naming choices, but good protection

The naming of that primary is bad, of course, because MySQL here takes arbitrary names out of a public namespace.

It would be much better if MySQL reserved the prefix __ (underbar underbar) in all namespaces for itself, and for example generated __id here.

We can try if they are smart about this, though:

mysql> set sql_generate_invisible_primary_key = on;
Query OK, 0 rows affected (0.00 sec)

mysql> create table breakit (
    ->   id integer, 
    ->   d varchar(200), 
    ->   my_row_id varchar(200)
    -> );
ERROR 4108 (HY000): Failed to generate invisible primary key.
Column 'my_row_id' already exists.

Nicely done. Next level:

mysql> set sql_generate_invisible_primary_key = off;
Query OK, 0 rows affected (0.00 sec)

mysql> create table breakit (
    ->   id integer primary key, 
    ->   d varchar(200)
    -> );
Query OK, 0 rows affected (0.12 sec)

mysql> alter table breakit add column my_row_id varchar(200);
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> set sql_generate_invisible_primary_key = on;
Query OK, 0 rows affected (0.00 sec)

mysql> alter table breakit drop column id;
ERROR 1235 (42000): This version of MySQL doesn't yet support 
'existing primary key drop without adding a new primary key. In 
@@sql_generate_invisible_primary_key=ON mode table should have a primary 
key. Please add a new primary key to be able to drop existing primary key.'

So that needs to be a

mysql> alter table breakit 
    ->   drop column id, 
    ->   add column keks int not null primary key;

to succeed.

Hiding the hidden columns even more

While invisible columns are invisible in select *, then are shown in show create table and also in I_S.

The self-documentingly named variable show_gipk_in_create_table_and_information_schema controls this.

When set to off, invisible columns are also no longer shown there. That should be even less necessary, and only very broken ORMs should need this.

Good primary keys

You should never need any of this. You should define an explicit primary key for all your tables. If they are missing and you control application source and schema, you can fix this at the source instead of using this kludge. Go back and fix things properly instead.

Every Table in MySQL Always Must Have a Primary Key When you are creating or altering a table in MySQL, make sure you have a primary key defined. We also advise you to never change a primary key value once it is set, and to never re-use a primary key value once a record is deleted. We advise you to keep the primary key short, below 16 or 32 bytes in size.

See the SQL Engineering Guide for a discussion of this topic.

  • A good primary key is short, less than 32 bytes.
  • A good primary key is immutable and never re-used.
  • A good primary key is an auto_increment value, or a UUIDv1 that is stored in a BINARY(16) that is being accessed with UUID_TO_BIN(...,1) (the swap flag is set to true).
    • A Java generated UUIDv4 is a random number and has horrible performance properties for InnoDB.
    • The topic of UUID is covered at length in MySQL and UUIDs and in ALTER TABLE for UUID .
  • auto_increment is not slow. MySQL keeps an auto_increment counter per table, and assigns values to each thread that inserts data in batches to limit contention on the counter. This is automatic and highly efficient.
    • That has the side-effect that there may be gaps in the auto_increment sequence, and values are not always assigned in a strict temporal order. The manual has a discussion of this.

TL;DR

MySQL 8.0.30 adds GIPK, generated invisible primary keys. They should only ever be needed if you have external software running where you cannot control and fix the schema, but they would allow you to run such software with RBR or GR without cooperation from the vendor.

GIPK is correctly implemented with respect to name clashes and dropping required primary keys. The column naming choice has the potential to clash with existing names and sits right there in the public namespace, which is a bit jarring.

This is a feature that is never needed unless you absolutely need it to get something broken running in a modern MySQL environment.

Share
Previous Post
MySQL: Row Literals