The history of unserialize() in PHP begins with Boris Erdmann and me, and we have to go 20 years back in time. These are the days of the prerelease versions of PHP 3, some time in 1998.
Boris and I were working on code for a management system for employee education for German Telekom. The front side is a web shop that sells classes and courses, the back end is a complex structure that manages attendance, keeps track of a line manager approval hierarchy and provides alternative dates for overfull classes. In order to manage authentication, shopping carts and other internal state, we needed something that allowed us to go from a stateless system to a stateful thing, securely. The result was PHPLIB, and especially the code in session.inc.
That code contained a function serialize(), which created a stringified representation of a PHP variable and appended it to a string. There was no unserialize() necessary, because serialize() generated PHP code.
One of the problems we had to solve was that PHP3 was slow. So if we were to write out a serialized representation of variables, we would have to write a parser for that, and that parser had better be fast, because it would be executed at the beginning of each and every page load. We already had such a parser, and it was the fastest parser available: PHP itself. So we decided to write a function that takes a variable, recurses through it and generates PHP code that recreates the variable when executed.
For scalars, that is trivial.
For an array, that is possible with a simple recursion over each element.
For objects, things are complicated.
Firstly, in PHP3, an object does not have the slightest clue what its class name is. Secondly, an object might have instance variables that we do not want to be part of the stringified representation of the object, for example file and image file handles or other resources. We solved both problems by requiring serializable objects to have metadata instance variables: one naming the class, one listing the variables to save.
For unserialize, we simply executed that code inside an eval().
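The approach can be sketched in modern PHP as follows. This is not the PHPLIB code: emit_php() is a hypothetical helper, the Cart class is invented for the example, and the property names classname and persistent_slots are assumed stand-ins for the metadata instance variables. Scalars are turned into literals with var_export().

```php
<?php
// Hypothetical helper: emit PHP source that rebuilds $value under the expression $target.
function emit_php(string $target, $value): string
{
    if (is_array($value)) {
        // Arrays: create an empty array, then recurse over each element.
        $code = "$target = array();\n";
        foreach ($value as $key => $element) {
            $code .= emit_php($target . '[' . var_export($key, true) . ']', $element);
        }
        return $code;
    }
    if (is_object($value)) {
        // Assumed convention: the object names its own class and lists the slots to persist.
        $code = "$target = new {$value->classname}();\n";
        foreach ($value->persistent_slots as $slot) {
            $code .= emit_php($target . '->' . $slot, $value->$slot);
        }
        return $code;
    }
    // Scalars and null: var_export() yields a valid PHP literal.
    return "$target = " . var_export($value, true) . ";\n";
}

// Invented example class carrying the assumed metadata properties.
class Cart
{
    public $classname = 'Cart';
    public $persistent_slots = array('items', 'owner');
    public $items = array();
    public $owner = '';
    public $handle = null; // not listed in persistent_slots, so never written out
}

$cart = new Cart();
$cart->items = array('php-101' => 2, 'sql-201' => 1);
$cart->owner = "O'Brien";

$code = emit_php('$restored', $cart);
echo $code;

// "Unserializing" is just running the generated code.
eval($code);
```

After the eval(), $restored is a fresh Cart with the saved items and owner. The sketch also shows why the scheme is only acceptable for data the application wrote itself: whatever ends up in the stored string gets executed.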
For PHP 4, Sascha Schumann wrote a C implementation of session management, largely based on our ideas, which became a standard part of PHP. Unlike us, he made PHP store session data in the filesystem, inheriting our idea of probabilistic session expiration. He also converted the data format from PHP code to the current representation.
Another problem that needed addressing was code uniformity:
If you load a serialized object written by another web page of your application, that object has a class. And the code for that class needs to be loaded before you can load and re-instantiate that object. Basically, you saved an instance of DemoClass, and if you load that instance again, using unserialize() on it, PHP needs to know what a DemoClass is and how to make one, using $a = new DemoClass(), before it can fill in the values of all instance variables. So the includes of your app better be the same on all pages, or you invent the autoloader.
The Autoloader originally was a simple callback function that would be invoked every time you tried to make an instance of an unknown class. The function would get the name of the missing class as a parameter, and would then be responsible for producing a class definition for that class.
The classic suicide autoloader works like this: whenever an undefined class is instantiated, __autoload() is called with that class name. It would then try to include a file with that name (and all dots and slashes that you manage to smuggle into that name) and hope that this would contain code that defined a class with a matching name.
Real autoloaders should not look like this (but often they are close enough to this code to contain exploitable security problems).
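As an illustration, here is a minimal sketch of both variants. It is not the original listing: it uses spl_autoload_register() because __autoload() no longer exists in PHP 8, and DEMO_CLASS_DIR, naive_autoload() and careful_autoload() are names made up for this example. The careful variant only accepts plain identifiers, which is one possible check, not a complete defense.

```php
<?php
// Hypothetical class directory for this sketch, filled with one demo class file.
define('DEMO_CLASS_DIR', sys_get_temp_dir() . '/autoload_demo_' . uniqid());
mkdir(DEMO_CLASS_DIR);
file_put_contents(
    DEMO_CLASS_DIR . '/DemoClass.php',
    "<?php\nclass DemoClass { public \$a = 1; }\n"
);

// The naive variant: the class name goes into the path unchecked.
// Defined for comparison only, never registered.
function naive_autoload(string $class): void
{
    include DEMO_CLASS_DIR . '/' . $class . '.php';
}

// A more careful variant: accept plain identifiers only, so dots and slashes
// in the requested name never reach the filesystem.
function careful_autoload(string $class): void
{
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $class)) {
        return;
    }
    $file = DEMO_CLASS_DIR . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
}

spl_autoload_register('careful_autoload');

$obj = new DemoClass();      // triggers careful_autoload('DemoClass')
echo get_class($obj), "\n";  // DemoClass
```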
Today the autoload mechanism of PHP is more complicated. It has a default autoloader, a mechanism to manage a stack of autoloaders, and a bunch of other things such as file extensions and include path names to influence all that. It is also quite good, even if more sophisticated serializers exist.
So what is the state of unserialize() in PHP these days, and why?
It links to this:
but given the little tour above, you should be able to understand why unserialize() cannot really be made safe for import of foreign data. You can unserialize() stuff your code has written, after having verified that the data you read still is the data you have written, using hash_hmac() with a secret key on said data. That’s going to be reasonably safe.
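A minimal sketch of that verification step, assuming a secret key that only the application knows. The function names pack_signed() and unpack_signed() and the "mac:payload" layout are invented for this example; hash_hmac(), hash_equals() and the allowed_classes option of unserialize() are standard PHP.

```php
<?php
// Serialize a value and prepend an HMAC over the serialized bytes.
function pack_signed($value, string $key): string
{
    $payload = serialize($value);
    return hash_hmac('sha256', $payload, $key) . ':' . $payload;
}

// Verify the HMAC first; only then hand the payload to unserialize().
function unpack_signed(string $blob, string $key)
{
    $parts = explode(':', $blob, 2);
    if (count($parts) !== 2) {
        throw new RuntimeException('malformed blob');
    }
    list($mac, $payload) = $parts;
    // hash_equals() compares in constant time.
    if (!hash_equals(hash_hmac('sha256', $payload, $key), $mac)) {
        throw new RuntimeException('signature mismatch');
    }
    // Extra belt and braces: refuse to instantiate any classes.
    return unserialize($payload, array('allowed_classes' => false));
}

$key  = random_bytes(32); // hypothetical secret; a real application would store and reuse it
$blob = pack_signed(array('user' => 'alice', 'cart' => array(1, 2, 3)), $key);
print_r(unpack_signed($blob, $key));
```

Any change to the stored bytes makes unpack_signed() throw before unserialize() ever sees the data.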
You can’t, ever, use unserialize() as a data interchange format with other applications or an end user. unserialize() is not an exposable API. Never was. Other functions and formats for this exist. Use those. Here is some code to try:
You can use this to see what serialized data looks like:
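A small sketch (not the original listing) that prints the serialized form of a few scalar values:

```php
<?php
// One value per type: integer, float, boolean, null, string.
$samples = array(42, 1.5, true, null, 'hello');
foreach ($samples as $value) {
    echo serialize($value), "\n";
}
// i:42;
// d:1.5;
// b:1;
// N;
// s:5:"hello";
```

Each value is written as a type letter, an optional length, and the value itself, terminated by a semicolon.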
More complicated stuff such as arrays:
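For example, a nested array with made-up contents (again a sketch, not the original listing):

```php
<?php
$nested = array('a' => 1, 'b' => array(true, null));
$data = serialize($nested);
echo $data, "\n";
// a:2:{s:1:"a";i:1;s:1:"b";a:2:{i:0;b:1;i:1;N;}}

// The round trip gives back an identical array.
var_dump(unserialize($data) === $nested); // bool(true)
```

An array is written as a:, the element count, and then alternating keys and values inside braces.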
Also, references half-work, if they are internal:
but serializing a reference $b that points to $a does not work: The value $b is referencing is saved, and the unserialized variable will have a value, but will not reference the previous value. There is no error or warning.
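Both cases can be tried with a short sketch; the variable names are chosen for this example only.

```php
<?php
// Internal reference: element 1 is a reference to element 0 of the same array.
$a = array(1);
$a[1] = &$a[0];
$copy = unserialize(serialize($a));
$copy[0] = 99;
echo $copy[1], "\n"; // 99: the two elements are still linked after the round trip

// External reference: $y refers to $x, which lives outside the serialized value.
$x = 1;
$y = &$x;
$restored = unserialize(serialize($y));
$restored = 2;
echo $x, "\n"; // 1: only the value was saved, the link to $x is gone
```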
Here is what we do to Objects:
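A minimal sketch of such a class, not the original listing: the properties and their values are made up, and the serialized data is kept in a variable instead of being written to a file.

```php
<?php
class DemoClass
{
    public $a = 1;
    public $b = 'two';
    public $handle = null; // stands in for a resource we do not want to persist

    public function __construct()
    {
        echo "constructor called\n";
        $this->handle = fopen('php://memory', 'r');
    }

    // Called by serialize(); returns the names of the properties to save.
    public function __sleep()
    {
        echo "__sleep() called\n";
        return array('a', 'b');
    }
}

$obj = new DemoClass();
$obj->a = 10;

$data = serialize($obj);
echo $data, "\n";
// O:9:"DemoClass":2:{s:1:"a";i:10;s:1:"b";s:3:"two";}
```

The serialized string carries the class name and the two properties named by __sleep(), but not the handle.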
This shows how the constructor is being executed, how the __sleep() callback is called and the content of the serialized file with the stringified DemoClass instance referencing the class name. If we load this without DemoClass being defined, we get an object of class __PHP_Incomplete_Class instead.
We can load DemoClass using an autoloader we define. Then we get
You can see the Autoloader being called, the __wakeup() function running (it didn’t in the example before!) and then the properly re-instantiated object.
Note that our new version of DemoClass, as defined by the autoloader, does define different default values for the instance variables so we can show that the values from the file are being loaded.
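Both load attempts can be reproduced in one standalone script. This is a sketch under assumptions: the serialized string is written by hand instead of being read from a file, the property values are made up, and the autoloader defines the class with eval() only to keep the example in a single file.

```php
<?php
// Hand-written serialized DemoClass instance standing in for the file contents.
$data = 'O:9:"DemoClass":2:{s:1:"a";i:10;s:1:"b";s:3:"two";}';

// 1. No class definition and no autoloader: PHP falls back to a placeholder class.
$incomplete = unserialize($data);
echo get_class($incomplete), "\n"; // __PHP_Incomplete_Class

// 2. Register an autoloader that supplies DemoClass on demand.
//    The defaults differ from the stored values on purpose.
$source = <<<'PHP'
class DemoClass
{
    public $a = 1;
    public $b = 'default';

    public function __wakeup()
    {
        echo "__wakeup() called\n";
    }
}
PHP;

spl_autoload_register(function (string $class) use ($source): void {
    echo "autoloader called for $class\n";
    if ($class === 'DemoClass') {
        eval($source);
    }
});

$obj = unserialize($data);
var_dump($obj->a, $obj->b); // int(10) and string(3) "two": values from the data, not the defaults
```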