The difference between knowledge and experience

Tweet via Turbo Sandzwerg

I have a personal story that goes very well with this tweet:

Many years ago, I was doing database scalability work at a company I was working for. A new DBA needed to restart a central MySQL master server in order for some config changes to /etc/my.cnf to go live.

Back then, when you ran “service mysql stop”, it could go fast or it could take rather long. You would not know beforehand. Stopping and starting the server the polite way would create an outage of unpredictable length, and consequently a loss of money, because the company would not have any income while this server restarted.

It was 2 in the afternoon, and the server was in good shape – I checked, and the redo log was at less than 400 MB in size. Also, critically, innodb_flush_log_at_trx_commit was set at the proper value, the default 1. That is, commits would go properly to disk, to the redo log, on each commit.

I knew from personal experience that this would be safe, and fast: a kill -9 on the mysql pid would terminate the server, the server would go into recovery on startup, and this particular hardware would recover 400 MB of redo log in less than 20 seconds. I had in fact at one time written a script that restored a server from backup, started up recovery and replication, killed the server, and backed up the server again, in a loop, for a week, in order to test and provoke a certain bug.
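The two preconditions above can be turned into a small pre-flight check. This is only a minimal sketch under stated assumptions, not the check I actually ran back then: it assumes “redo log size” means the checkpoint age (the difference between the “Log sequence number” and “Last checkpoint at” lines in the output of SHOW ENGINE INNODB STATUS), it assumes the single-integer LSN format that newer MySQL versions print, and it reuses the 400 MB figure from this story as the threshold. The function names and the sample status text are hypothetical, made up for illustration.

```python
import re

# Assumption: the 400 MB figure from the story. The right limit depends on
# how fast your hardware replays redo log during crash recovery.
MAX_CHECKPOINT_AGE_BYTES = 400 * 1024 * 1024


def checkpoint_age(innodb_status: str) -> int:
    """Return the bytes of redo log written since the last checkpoint.

    Expects the text of SHOW ENGINE INNODB STATUS with single-integer LSNs.
    """
    lsn = re.search(r"^Log sequence number\s+(\d+)\s*$", innodb_status, re.MULTILINE)
    ckpt = re.search(r"^Last checkpoint at\s+(\d+)\s*$", innodb_status, re.MULTILINE)
    if lsn is None or ckpt is None:
        raise ValueError("could not find the LSN lines in the InnoDB status text")
    return int(lsn.group(1)) - int(ckpt.group(1))


def crash_restart_looks_reasonable(
    innodb_status: str,
    flush_log_at_trx_commit: int,
    max_age: int = MAX_CHECKPOINT_AGE_BYTES,
) -> bool:
    """True only if commits are flushed to disk and little redo log is pending."""
    if flush_log_at_trx_commit != 1:
        # With other values a crash can lose recently committed transactions.
        return False
    return checkpoint_age(innodb_status) < max_age


# Hypothetical excerpt of a status dump, invented for this example.
SAMPLE_STATUS = """\
---
LOG
---
Log sequence number 5000000000
Log flushed up to   5000000000
Last checkpoint at  4700000000
"""

if __name__ == "__main__":
    print(checkpoint_age(SAMPLE_STATUS))
    print(crash_restart_looks_reasonable(SAMPLE_STATUS, 1))
```

A check like this only tells you that the preconditions hold. How long recovery really takes on a given machine is something you still have to measure yourself.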

The new DBA knew everything I knew, but from a book. He had not tested this personally the way I had. He knew the server was important, and to him kill -9’ing this server at 2 in the afternoon was a high-risk operation.

So we agreed to restart the server the next morning at 6am. We met in a chat and in a shared “screen -x” session. I typed the kill command, started a “tail -F” on the server's error log in a second screen window, then asked him to hit return.

He could not.

He was physically unable to hit the return key. He knew intellectually that the operation was safe and that the server should recover in no time. He still could not bring himself to do it.

So I did.

Of course the server went through recovery, and came back in 12 seconds. We lost a minimal amount of business, far less than an orderly shutdown of the server would have cost.

The admin saw, and learned. He knew before, but now he had experienced personally that his knowledge was true, and trustworthy. He did not just know he was safe, he felt safe. In the following months he became an excellent and courageous DBA and did whatever was operationally necessary.

The lesson here is that there is a huge difference between knowing things (knowledge in the head), having experienced things personally (knowledge in the heart), and having done things so often that you no longer have to think about what goes which way when you act (knowing things intuitively, in the stomach).

These are knowledge, experience and intuition.

Experience and intuition cannot be taught by a book. They can only be gathered from practice, repetition and survived failure.

Published in MySQL

9 Comments

  1. … Afterwards the three Juniors were busy restoring the Icinga database – while the rest of the company flew blind and without metrics.

    • kris

      See, that is precisely where experience comes in. As an experienced person, you will know which configurations of a database are crash safe, and how to quickly check for the critical parameters. And then take advantage of them.

  2. True. I remember a similar situation with an apprentice Oracle DBA who was afraid to “shutdown abort” instances (essentially the same as kill -9) because he wasn’t sure recovery would work. I told him: “Look, this is how we just built this server: we made an inconsistent, hot copy of the datafiles from another server and recovered them from the logs…this server would not exist if recovery didn’t work. And none of our backups would work. Just do it!”

    Experience…the difference between a DBA who makes cold copies by the runbook and causes 5 hours of downtime and one who knows how to safely do a hot copy and migrate with less than 5 minutes of downtime. Too bad this will be a lost art when everybody just clones whole VMs instead.

  3. You forgot to mention that you even had a budget for the downtime. So even if the company lost some money, management had told the devs/ops that that was o.k.

  4. Andre

    And that’s one of the reasons I tell those devopsy people that it wasn’t all bad in the ITIL world.. Just make sure you have a plan to follow if things go wrong.. Knowing that you won’t have to panic if something fails makes it a lot easier to be bold.. And once you get an idea of what could go wrong, you can set up a small test system for exactly that problem..

    “I don’t know what will happen”, “I don’t know how this works” or “it says so in the manual, but I’m not sure how accurate the documentation is” are good indicators for things that are likely to break and they SHOULD break.. For you to fix them afterwards and make things more robust…

    A bit like those people with flashing clocks on their HDD or dvd recorders.. They’re afraid of deleting something by setting the clock.. But what would be the worst thing that could happen? They could delete a couple of minutes of a TV show.. Or they could insert a blank disc to begin with :)

    You MIGHT lose some data, but you will always gain on knowledge.. Dare to fail ;)

  5. Glenn Nadeau

    I think I WAS this DBA you are talking about. Hope everything is great for you Kris!

  6. There is just one mistake in it: those admins were hesitant to use -9, they thought about the usage … so they weren’t juniors. I have seen junior admins use -9 quite freely several times, without thinking about the impact of the bad habit of using -9 a second after the normal kill … or using it directly “as it always works” …

  7. Rince

    IIRC there is good ITIL process terminology for it:
    – Data becomes Information
    – Information becomes Knowledge
    – Knowledge becomes Wisdom

    That sums it up quite well.
