All over the Internet people are having trouble getting
LOAD DATA and
LOAD DATA LOCAL to work. Frankly, do not use them, and especially not the
LOCAL variant. They are insecure, and even if you get them to work, they are limited and unlikely to do what you want. Write a small data load program as shown below.
Not using LOAD DATA LOCAL
The LOCAL version of LOAD DATA has two potential security issues:
Because LOAD DATA LOCAL is an SQL statement, parsing occurs on the server side, and transfer of the file from the client host to the server host is initiated by the MySQL server, which tells the client the file named in the statement. In theory, a patched server could tell the client program to transfer a file of the server’s choosing rather than the file named in the statement. Such a server could access any file on the client host to which the client user has read access. (A patched server could in fact reply with a file-transfer request to any statement, not just LOAD DATA LOCAL, so a more fundamental issue is that clients should not connect to untrusted servers.)
In a Web environment where the clients are connecting from a Web server, a user could use LOAD DATA LOCAL to read any files that the Web server process has read access to (assuming that a user could run any statement against the SQL server). In this environment, the client with respect to the MySQL server actually is the Web server, not a remote program being run by users who connect to the Web server.
The second issue in reality means that if the web server has a suitable SQL injection vulnerability, the attacker may use that to read any file the web server has access to, bouncing this through the database server.
In short, never use (or even enable)
LOAD DATA LOCAL.
local_infileis disabled in the server config, and you should keep it that way.
- client libraries are by default compiled with
ENABLED_LOCAL_INFILEset to off. It can still be enabled using a call to the
mysql_options()C-API, but never do that.
- 8.0.21+ places additional restrictions on this, to prevent you from being stupid (that is, actually enabling this anywhere).
Not using LOAD DATA
LOAD DATA variant of the command assumes that you place a file on the database server, into a directory in the file system of the server, and load it from there. In the age of “MySQL as a service” this is inconvenient to impossible, so forget about this option, too.
If you were able to do place files onto the system where your mysqld lives,
- your user needs to have
FILEas a privilege, a global privilege (
GRANT FILE TO ... ON *.*)
- the server variable
secure_file_privneeds to be set to a directory name, and that directory needs to be world-readable.
SELECT INTO OUTFILEwork only with filenames below this directory. Setting this variable requires a server restart, this is not a dynamic variable (on purpose).
Note that the variable can be
NULL (this is secure in the sense that
LOAD DATA is disabled) or empty (this is insecure in that there are no restrictions).
There is nothing preventing you from setting the variable to
/var/lib/mysql or other dumb locations which would expose vital system files to load and save operations. Do not do this.
Also, a location such as
/tmp or any other world-writeable directory would be dumb: Use a dedicated directory that is writeable by the import user only, and make sure that it is world-readable in order to make the command work.
Better: Do not use this command at all (and set
secure_file_priv to NULL).
Using data dump and load programs instead
We spoke about dumping a schema into CSV files in Export the entire database to CSV already.
To complete the discussion we need to provide a way to do the inverse and load data from a CSV file into a table.
The full code is in load.py.
The main idea is to open a
.csv file with
csv.reader, and then iterate over the rows. For each row, we execute an
INSERT statement, and every few rows we also
In terms of dependencies, we rely on
We need to know the name of a table, and the column names of that table (in the order in which they appear).
We should also make sure we can change the delimiter and quoting character used by the CSV, and make the commit interval variable.
Finally, we need to be able to connect to the database.
From this, we can build a database connection and an
INSERT statement, using the table name and column names:
The actual code is then rather simple: Open the CSV file, named after the table, and create a
csv.reader(). Using this, we iterate over the rows.
For each row, we execute the insert statement.
commit_interval rows we commit, and for good measure we also commit after finishing, to make sure any remaining rows also get written out.
And that it. That’s all the code.
- No special permissions besides
insert_privinto the target table.
- No config in the database.
- No server restart to set up the permissions.
And using Python’s multiprocessing, you could make it load multiple tables in parallel or chunk a very large table and load that in parallel - assuming you have database hardware that could profit from any of this.
In any case - this is simpler, more secure and less privileged than any of the broken
LOAD DATA variants.
Don’t use them, write a loader program.
Let’s run it. First we generate some data, using the previous example from the partitions tutorial:
We then dump the data, truncate the table, and reload the data. We count the rows to be sure we get all of them back.