Previously, I wrote about how to survive OpenStack development; now I'll write about how I applied that knowledge. The goal of this project was to build infrastructure that allows an untrusting cloud tenant to verify for themselves the integrity of their VM and of the hypervisor it runs on. If you've done any work in cyber security, you'll know how hard trust is to come by, so this is a big step toward letting security-sensitive organizations make use of cloud computing infrastructure.
Building on that motivation, there's this idea of Trusted Computing: a set of technologies developed to ensure that a computer "behaves in an expected way". For example, if there were a rootkit in the BIOS or kernel, we don't necessarily want to patch it up and carry on; we instead want to recognize that something unexpected happened and remove the system from the ring of trust before it can do any harm. That is, we want to prevent this machine from accessing any higher-level security services.
In order to do this we make use of a piece of hardware known as a TPM, which has a set of registers known as PCRs and a hardware-bound endorsement key (EK) for signing them. This configuration allows us to trust that the PCR values are coming from a particular TPM and that they haven't been tampered with. To make this useful, we measure (hash) every critical piece of software before running it and "extend" the measurement into a PCR, creating a hash chain. What you end up with is a set of registers that represents all the software that got you into user space. If you have a whitelist of known-good PCR values, you can then immediately detect any anomaly. The caveat here is that this is only as good as your measurement and attestation infrastructure.
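The extend operation is simple enough to sketch in a few lines. This is a minimal illustration of the hash-chain idea (using SHA-1, the digest classic TPM 1.2 PCRs use), not the actual TPM command interface:

```python
import hashlib

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: new PCR value = H(old PCR value || measurement)."""
    return hashlib.sha1(pcr + measurement).digest()

# PCRs start out zeroed; each boot component is measured (hashed) and
# folded into the register in the order it runs.
pcr = bytes(20)
for component in (b"bootloader", b"kernel", b"initrd"):
    pcr = extend(pcr, hashlib.sha1(component).digest())

# The final value depends on every measurement *and* their order, so
# comparing against a whitelist catches any change anywhere in the chain.
print(pcr.hex())
```

Because the chain is order-sensitive and one-way, an attacker who tampers with any measured component can't steer the PCR back to a whitelisted value.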
What we did was extend the measurement infrastructure beyond the physical hardware and into VMs. What's tricky is that you can no longer rely on a single physical TPM when you have multiple operating systems trying to measure things concurrently. To get around this, hypervisors have implemented Virtual TPMs (VTPMs), which both provide a unique TPM to each VM and expose the physical TPM's PCR values. That way, a tenant running in the VM's user space can first attest the integrity of their hypervisor against the physical TPM and then extend that root of trust up to their OS with the VTPM. Currently this is a fairly cumbersome process, so we sought to automate all of it by allowing OpenStack to provision VTPM resources and integrating it with attestation infrastructure (Keylime) developed by this project's mentor.
The diagram above illustrates our stack, which consists of Xen, OpenStack, and Keylime. In Xen, you have domain0, which you can think of as the "root" user for the hypervisor. It exposes hypervisor management through a native library known as LibXL. In order to support multiple hypervisors, OpenStack uses LibVirt as a common abstraction layer. What's missing is support for VTPMs in everything above the LibXL layer.
Starting with the LibVirt layer, we needed a way to declare our intent to spawn a VTPM, so I wrote a specification for a new device in the domain configuration file. This was a matter of parsing parameters out of the XML file into internal data structures; on the other end, I then translated those internal configuration structures into the native Xen structures used to spawn the VM. Overall it was a very straightforward patch to carry out, and my hope is to get it pushed upstream at some point.
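To give a feel for the XML-to-internal-structure step, here is a rough sketch in Python. The `<vtpm>` element, its children, and the file paths are all hypothetical stand-ins; the actual patch defines its device schema in LibVirt's C code, and its element names may differ:

```python
import xml.etree.ElementTree as ET

# Hypothetical domain config fragment with a made-up vTPM device element.
DOMAIN_XML = """
<domain type='xen'>
  <name>guest0</name>
  <devices>
    <vtpm>
      <backenduuid>3d4c2f96-0000-0000-0000-000000000000</backenduuid>
      <image file='/var/lib/xen/vtpm/guest0.img'/>
    </vtpm>
  </devices>
</domain>
"""

def parse_vtpm(domain_xml: str) -> dict:
    """Pull the vTPM device definition out of the domain config, mirroring
    the parse-into-internal-structs step described above."""
    root = ET.fromstring(domain_xml)
    vtpm = root.find("./devices/vtpm")
    if vtpm is None:
        return {}
    return {
        "backend_uuid": vtpm.findtext("backenduuid"),
        "image": vtpm.find("image").get("file"),
    }

print(parse_vtpm(DOMAIN_XML))
```

The resulting dict plays the role of LibVirt's internal device struct, which the driver side then translates into the native Xen (LibXL) structures.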
The tricky bit was dealing with the OpenStack layer and its immensity. On top of spawning the VM, it also had to provision the VTPM resources beforehand. As far as Xen was concerned, you needed two things to create a VTPM: a UUID from the VTPMMGR and a small backing image. To get the UUID, we exposed a REST API on the Keylime VM which acted as a proxy to the VTPMMGR. This was necessary because Domain0's kernel yields the physical TPM to the hypervisor by removing support for TPMs entirely. For the image, we simply had Nova create a zero-filled file out of /dev/zero. Putting this all together: Nova provisions these resources and generates an XML file for the VTPM, which goes into the patched LibVirt. After that, we generate the last XML file for the VM that connects it to the VTPM.
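The two provisioning steps can be sketched as follows. The proxy URL and the image size are assumptions for illustration; the real Keylime REST proxy and Nova driver code surely differ in their details:

```python
import os
import urllib.request

def fetch_vtpm_uuid(proxy_url: str) -> str:
    """Ask the REST proxy on the Keylime VM (which fronts the VTPMMGR,
    since dom0's kernel has no TPM access) to allocate a vTPM UUID.
    The endpoint shape here is hypothetical."""
    with urllib.request.urlopen(proxy_url) as resp:
        return resp.read().decode().strip()

def create_backing_image(path: str, size_bytes: int = 2 * 1024 * 1024) -> str:
    """Create the small zero-filled backing file Xen wants for a vTPM,
    equivalent to dd'ing from /dev/zero."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # sparse file; reads back as all zeros
    return path
```

With the UUID and image in hand, Nova can fill in the VTPM's domain XML and hand it to the patched LibVirt, then do the same for the VM that attaches to it.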
I learned a lot from this project, and in some ways it may have changed my career trajectory. For one, I realized that I still love building backend-y, infrastructure-type things. Unexpectedly, I also developed more of an appreciation for FOSS and its community. I'm now much more comfortable diving into these projects, reaching out for help, and contributing patches than I ever was before. Bugs that I would once quietly complain to myself about now get submitted as bug reports, and I make an honest effort to patch them myself. Most importantly, though, this project and the class that came with it opened my eyes to the exciting work going on in cloud computing. While I missed being on the ground floor of this work, I believe we're on the cusp of a Cambrian explosion of sorts in this field.