There is a very nice talk by John Laban on the accumulation of cruft and legacy features in how we currently build data centers. Laban is an advocate for the Open Compute Project (OCP), which at its core has several ideas.
One of them is the vision of data center room, rack, and machine as a single system whose parts depend on each other by design.
The outcome is that you might be able to run OCP systems in a traditional data center room, but you won’t unlock all the advantages of Open Compute that way.
Open Compute machines are built for Open Compute racks – they depend on each other. That, for example, enables the machines to be built without individual power supplies: power comes from larger, central power supplies that are part of the rack.
The rack, in turn, is built for an Open Compute room – they depend on each other, too. That, for example, allows the room to be built with relatively warm “cold aisles”: the equipment is rated for inlet air temperatures of 35 °C or even 40 °C, and the hot aisle is really hot – so hot that you can’t actually put people in it to perform service without venting it first. So the machinery is built so that all human access to the equipment happens from the cold aisle side.
Machine, rack row, and room come with dependencies on each other that maximise utilisation and minimise cost. A concrete floor with no raised floor allows relatively heavy racks. Prescribing hot aisle containment, very hot hot aisles, a large delta-T across the equipment, and relatively warm cold-aisle inlet temperatures allows free-air cooling in most locations. Airflow direction is prescribed – front to back is the only way it works – and equipment access sides are prescribed – front side only. That also allows much narrower hot aisles: since neither people nor equipment must pass through, the width of the hot aisle is determined only by thermodynamics, and density can go up.
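Why a large delta-T matters can be shown with the basic sensible-heat relation Q = ṁ·cᵖ·ΔT: for a fixed heat load, doubling the air-side temperature rise halves the airflow you have to move. The sketch below uses illustrative numbers (a hypothetical 30 kW rack, approximate air properties), not figures from any OCP spec:

```python
# Sensible heat removal: Q = m_dot * c_p * dT
# Approximate air properties around 35 degC (assumed values):
C_P = 1006.0   # specific heat of air, J/(kg*K)
RHO = 1.15     # density of air, kg/m^3

def airflow_m3_per_s(heat_load_w: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to carry heat_load_w at a given air-side delta-T."""
    mass_flow = heat_load_w / (C_P * delta_t_k)  # kg/s
    return mass_flow / RHO                       # m^3/s

# Hypothetical 30 kW rack: doubling delta-T halves the required airflow.
for dt in (10.0, 20.0):
    print(f"dT = {dt:>4} K -> {airflow_m3_per_s(30_000, dt):.2f} m^3/s")
```

Less air moved means smaller fans, less fan power, and narrower aisles – which is exactly the coupling between machine, rack, and room that the OCP design exploits.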
So for a successful Open Compute deployment, you need a room that matches the equipment. The equipment you can buy, if you ask around.
The room – that is much more complicated.
If you are not building yourself, or having somebody build to your spec, you depend on commercially available colocation space. That is not a problem for “the hyperscalers” – basically Google, Apple, Facebook, Microsoft, IBM, and maybe a dozen others. They have a queue of data centers waiting to be built.
It is a problem for smaller companies, because commercial colo space is defined by Uptime Institute Tier certifications. A nicely formatted overview can be had from Uptime, or from many data centers that advertise and explain their product, such as this page.
Of course, most of the Uptime requirements make very limited sense in the face of modern OCP designs, even more so once you put software-defined redundancy, as provided by a Kubernetes cluster or a similar system, on top. Redundant power, redundant networking, and many other levels of redundancy at the building, room, row, and rack level make little sense if redundancy is provided in other ways and the loss of a machine, a rack, or even a full row is a normal and survivable event.
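The claim that a rack or row loss is survivable can be sketched as a toy placement check. This is a deliberately simplified model (in Kubernetes, this role is played by mechanisms such as topology spread constraints and pod anti-affinity); the service, quorum size, and placements below are hypothetical:

```python
from collections import Counter

# Toy model: each replica of a service is placed in a (row, rack) failure domain.
# Software-defined redundancy means: losing any single rack (or row) must still
# leave at least a quorum of replicas running.

def survives_loss(placements, quorum, domain):
    """True if losing any one failure domain leaves >= quorum replicas.
    domain: index into the placement tuple (0 = row, 1 = rack)."""
    per_domain = Counter(p[domain] for p in placements)
    worst_loss = max(per_domain.values())
    return len(placements) - worst_loss >= quorum

# Hypothetical 5-replica service with quorum 3, spread over 3 racks in 2 rows.
placements = [("row1", "rackA"), ("row1", "rackB"),
              ("row2", "rackC"), ("row2", "rackC"), ("row1", "rackA")]
print(survives_loss(placements, quorum=3, domain=1))  # rack loss -> True
print(survives_loss(placements, quorum=3, domain=0))  # row loss  -> False
```

In this toy layout the service rides out any single rack failure but not a row failure – which is precisely the kind of trade-off that the software layer, not the building, should be deciding.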
So in order to deploy OCP successfully you need to either build or buy, but according to which standards?
Uptime has nothing.
And OCP itself has this, but it seems thin. It certainly is not a spec that a data center provider and seller of colo space can build against and then use to advertise a product.
In CMMI terms, a colo data center room is an outsourced resource, and successful outsourcing is only possible when specifications for process and product exist, with quantitative metrics, so that there are KPIs against which process success or proper product delivery can be measured. That means Level 4 or better on the CMMI scale.
For OCP data center space, that does not currently exist – we are at Level 1 (individual heroics) or Level 2 (fragmentary specs, often with qualitative metrics only) – hence, no market.
Looking in the direction of hardware vendors, rack-scale design initiatives exist outside of OCP – most are based on Intel’s Rack Scale Design.
They, too, are starting to view system, rack, row (pod), and room as a unit, but the requirements come from a hardware vendor’s point of view. These specs have more coverage and are more flexible than OCP, which is good, because they allow a more gradual migration from enterprisey snowflake systems with their composable hardware to the uniform, kubernetised mass server farms of the hyperscalers.
From the hyperscaler POV, much of that is bullshit. Nobody needs the “VMs in hardware” capabilities of composable hardware designs if you are already doing the same and more with Kubernetes, Trident, and iSCSI/RoCE – and all servers have the same specs anyway. But if you aren’t at that level, it might be useful to combine an arbitrary CPU/memory block with an arbitrary set of disks in a rack, assign them a free network card elsewhere in the same rack, in software, and call that a machine – no hardware touched.
Still, PEX? LOL. This is wrong. If it’s not Ethernet, it’s the wrong network.
The thing is: for this to actually work, RSD, like OCP, needs a cruft-free, modern, minimised data center design that is far outside the way of thinking the Uptime Institute has set into standard. The Tiers are no longer a good fit for a machine-rack-row-room system in a world that software is about to eat.
There is none.
So no space is being built to a spec that does not exist.
And again, we are at an impasse: we can’t unlock all of the savings such a design makes possible.