
Redfish, Rack Scale Design and Remote Anything

The Intel RSD Platform Guide (PDF) is the one document you should skim front to back – it's really useful.

Back in the bad old days, server computers had a proprietary baseboard management controller (BMC), for example HP iLO or Dell iDRAC. These varied widely in capabilities and, worse, in the data structures presented to the management software controlling the data center.

A lot of standards came and failed until, under pressure from certain customers with a lot of machines, everybody more or less settled on Redfish. All modern servers, no matter who makes them, understand Redfish.

But Redfish does not stop at the server, nor is it the whole story. It is cross-linked to Rack Scale Design (RSD), an initiative led by Intel and joined by many vendors to build composable hardware.

In RSD, we have a Pod: a collection of Racks and their Rack Management Modules (RMM). The racks contain drawers, which in turn contain parts of machines, pooled resources – Compute Drawers contain compute modules with memory and processors, Storage Drawers contain disk drives or SSDs. Top of Rack Switches (TORS) connect the racks together and are built for a lot of rack-to-rack (East-West) traffic. At least one Pod Manager (PODM) exists, which together with lower-level controllers orchestrates the configuration of these pooled resources – and all of them speak Redfish.
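
To give a feel for what "speaking Redfish" means in practice, here is a minimal sketch that walks the service root of a Redfish endpoint and lists the systems it knows about. The address and credentials are made up; the endpoints and properties used are standard Redfish.

```python
# Minimal Redfish walk: follow @odata.id links from the service root.
# BMC address and credentials below are hypothetical.
import requests

BMC = "https://10.0.0.42"       # hypothetical BMC / PODM address
AUTH = ("admin", "password")    # hypothetical credentials

def get(path):
    # Most BMCs ship self-signed certificates, hence verify=False in this sketch.
    r = requests.get(BMC + path, auth=AUTH, verify=False, timeout=10)
    r.raise_for_status()
    return r.json()

root = get("/redfish/v1")                     # the Redfish service root
systems = get(root["Systems"]["@odata.id"])   # collection of computer systems

for member in systems["Members"]:
    system = get(member["@odata.id"])
    print(system["Id"], system.get("PowerState"), system.get("Model"))
```

The same pattern – follow the @odata.id links starting at /redfish/v1 – works against a single BMC just as well as against a PODM.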

The game that is being played here is disaggregation – taking a traditional server computer and cutting it into the generic pieces that make up the machine, then pooling these pieces and composing the actual computers from them dynamically, as needed.

It’s like building a virtual machine in hardware, and if that sounds weird to you, that is because it is weird.

The level of disaggregation is variable – some of it even makes sense.

Looking at the way we consume hardware at work, the biggest cause of variation between machines is storage.

The bulk of our machines are either quite small (dual E5-2620v4) or quite large (dual E5-2690v4). Intermediates exist, but could be shoehorned into the smaller type with some effort.

But the fact is: even the small type of machine is often already too large for what an application needs. A dual E5-2620v4 delivers 32 threads, and with 128 GB of RAM and one or two 10 GBit/s Ethernet interfaces that’s 4 GB of memory and roughly 300 (or 600) MBit/s of networking per thread. A very useful and balanced configuration.
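
For the record, the per-thread arithmetic, using the numbers from above:

```python
# Per-thread arithmetic for a dual E5-2620v4, numbers as given in the text.
threads = 2 * 8 * 2        # 2 sockets x 8 cores x 2 hyperthreads = 32 threads
ram_gb = 128
nic_gbit = [10, 20]        # one or two 10 GBit/s interfaces

print(ram_gb / threads)                        # 4.0 GB of RAM per thread
print([n * 1000 / threads for n in nic_gbit])  # [312.5, 625.0] MBit/s per thread
```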

What varies is the storage provided to these machines – different types of storage (HDD or SSD) and different sizes of storage (one or two small disks, because the machine does not persist anything locally, or many disks, because it’s running a persistence service like a database).

NVMe over Fabrics (PDF)

Disaggregating storage makes a lot of sense – have storage boxes in each rack and provide disk storage to each machine that needs it, and – this is the important point – in exactly the size needed. Right now each box consumes the type of storage hardware one size larger than it needs in order to have sufficient space, and that leaves a lot of disk space unused. So it is useful to have a kind of distributed filer that creates volumes of an arbitrary, application-defined size, not the size of the disks that make up the total storage volume available to the filer. Having that kind of storage definition will most likely dramatically improve my storage utilisation.
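
A toy calculation, with made-up numbers, of why exact-size volumes help: boxes that need an odd amount of storage either get the next-larger local disk, or get a volume carved from a rack-level pool in exactly the size requested.

```python
# Toy numbers only: whole-disk local provisioning vs. exact-size pooled volumes.
needed_tb = [0.3, 0.9, 1.4, 2.2, 3.1, 6.5]   # hypothetical per-box requirements
disk_tb = [0.5, 1, 2, 4, 8]                  # available local disk sizes

# Local provisioning: each box gets the smallest disk that still fits its need.
local = sum(min(d for d in disk_tb if d >= need) for need in needed_tb)
# Pooled provisioning: volumes sized exactly as requested.
pooled = sum(needed_tb)

print(f"local disks provisioned: {local:.1f} TB")             # 19.5 TB
print(f"actually needed:         {pooled:.1f} TB")            # 14.4 TB
print(f"utilisation with local disks: {pooled / local:.0%}")  # ~74%
```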

But I am not going to use a PCI interconnect across my racks to do that – most likely the existing Ethernet and things like iSCSI or RoCE will do. One bus tying together all my racks, one topology to manage, and one way of routing things.

Having smaller CPUs and proportionally less memory also makes sense – to have, but probably not to buy. And that is where I wonder why I would want the composability that RSD provides. I am not really going to take a number of CPUs, DIMM modules and network adapters somewhere in my rack and assemble them, Voltron-style, into a software-defined machine; I am much more likely to take a machine that I have and slice and dice it into smaller units.

That is, I am more likely to use Kubernetes or Openstack to take a set of machines and subdivide them into smaller, software-defined units of useful size than to take a bunch of software-configurable hardware parts and make machines of the appropriate size from them.
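
As an illustration of what that slicing looks like in practice, here is a minimal sketch that asks Kubernetes for a fixed-size slice of an existing machine. The pod name, image and sizes are made up; the calls are from the official Python client and assume a working kubeconfig.

```python
# Carve a fixed-size, software-defined unit out of an existing machine
# by requesting CPU and memory from Kubernetes (hypothetical names/sizes).
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="worker-slice"),        # hypothetical name
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="app",
                image="registry.example.com/app:latest",       # hypothetical image
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "4", "memory": "16Gi"},    # an eighth of a small box
                    limits={"cpu": "4", "memory": "16Gi"},
                ),
            )
        ]
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```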

I am not going to assign special hardware like a GPU to an arbitrary machine either; I am more likely to have the special hardware in one machine and then assign software to run on that machine as needed.

And then there is latency. Companies like HP have been talking about disaggregated memory in many forms for more than half a decade now, but of course that is not easily done – while you can provide bandwidth, simply moving memory away from the compute is going to incur latency. That is not fixable, so we are not going to disaggregate memory properly at all, ever, but add a new intermediate tier to the cache hierarchy in the data center – below local RAM, but above remote flash. Funnily enough, this again seems to point to RoCE, and not to PCI switches.

So, if I can have disaggregated storage without RSD, and I don’t really want disaggregated memory because it will never materialise, what other good may come from RSD?

RSD provides the management you do not have. Through the PODM, the RMM and the lower-level modules, all understanding the RESTful Redfish API and the tools built on top of it, we are finally getting software that can manage data center hardware in a useful way. Rack performance data (power draw, airflow, temperatures) is exposed and can be collected by modern software. Machines can be BIOS-updated, partitioned and installed automatically without manual intervention, and dashboards are finally possible without writing a gazillion low-level proprietary hardware abstraction layers first.
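
As a taste of that, a sketch that collects power draw and temperatures per chassis via the standard Redfish Power and Thermal resources – the address and credentials are made up, the property names come from the Redfish schema.

```python
# Collect per-chassis power draw and temperatures over Redfish.
# BMC/PODM address and credentials below are hypothetical.
import requests

BMC = "https://10.0.0.42"
AUTH = ("admin", "password")

def get(path):
    r = requests.get(BMC + path, auth=AUTH, verify=False, timeout=10)
    r.raise_for_status()
    return r.json()

for member in get("/redfish/v1/Chassis")["Members"]:
    chassis = get(member["@odata.id"])
    # Not every chassis exposes Power/Thermal; skip those that don't.
    if "Power" not in chassis or "Thermal" not in chassis:
        continue
    power = get(chassis["Power"]["@odata.id"])
    thermal = get(chassis["Thermal"]["@odata.id"])

    watts = [pc.get("PowerConsumedWatts") for pc in power.get("PowerControl", [])]
    temps = {t["Name"]: t.get("ReadingCelsius") for t in thermal.get("Temperatures", [])}
    print(chassis["Id"], watts, temps)
```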

So the composability is mostly bullshit, or at least a solution in search of a problem. Most people don’t need PCI switches, as they already have trouble managing their Ethernet switches, and most people are way better off having one network and one topology to manage. Also, you need IaaS/PaaS software such as Openstack or Kubernetes anyway, and you are most likely not going to assemble your hardware from parts across a rack.

But the management is really nice, if you don’t happen to have such management already.
