Blogging has been very difficult for me over the last 4 months. My move to the Office of the CTO within EMC changed much of what I did and left me searching for content I could write about. Most of what I was dealing with on a daily basis was either too early to mention or too secret to reveal.
Today, this changes with the release of a project I have spent the majority of my days and nights working on this year. Without a long-winded windup, I am proud to announce the release of Razor, a cloud-provisioning tool that changes the way we look at provisioning hardware for cloud stacks.
Razor is a software application, written in a combination of Ruby (main logic) and Node.js (API, Image Service), for rapidly provisioning operating systems and hypervisors for BOTH physical and virtual servers. It is designed to make standing up the base substrate underneath cloud deployments both simple and transactional.
Now at this point, many of you are thinking: “Great, another *cloud* provisioning tool.” And I don’t blame you at all. So what makes Razor different than many other tools out there like Cobbler, Dell’s Crowbar, or other deployment services? Just about everything.
The real answer to that question is related to the reason this project is named Razor. We based much of our design theory on Ockham’s razor. It is based on the belief that OS/hypervisor deployment should be simple, succinct, and incredibly flexible. Many products out there try to solve ALL the problems for every layer instead of focusing on their layer correctly. They try to make their software layer the most important piece. Razor is designed to enable other tools rather than replace them.
As part of this, Razor was designed to be extremely simple to extend and manage. Unlike other popular tools out there right now, Razor allows you to add support for an entirely new operating system with a single file. It allows you to create multiple versions of an operating system model by changing a few lines. And with this first release it fully supports VMware’s ESXi 5, CentOS 6, openSUSE 12, Ubuntu Oneiric & Precise, and Debian Wheezy.
But the ability to extend Razor is just part of the magic. Another critical design decision is how Razor links to upper-level configuration. Early on, the team I belong to was researching and exploring different newer provisioning tools. We found that many had very limited support. Others were just glorified scripts. And some were even linked to DevOps tools (awesome) but chose a design where they wrapped a state machine around the DevOps state machine (not so awesome). And I won’t even start on the horrible installers we ran into. If it takes 60-120 minutes of manual work to set up a DevOps-integrated tool – then you are not getting the point behind DevOps.
We ended up at a point where we said to ourselves, “If I could have a new cloud provisioning tool, what would it look like?” And we came up with some core ideas:
- Adding new OS or Hypervisor distributions should be simple – Mentioned this above, but many tools require major work to extend.
- Must be event-driven instead of user-driven – Many tools claim automation but require a user to select the 24 servers and push a button. We wanted Razor to enable users to create policy that automatically accomplishes what is needed when given physical or virtual hardware.
- Should have powerful vendor-agnostic discovery of physical or virtual compute resources – A powerful tool is useful whether it is a 5 year old HP server, KVM virtual machine, or brand new Cisco UCS Blade. It should be able to discover, understand, and provision to any of these and all of these based on a user’s needs.
- It should scale well – No monolithic structure. You should be able to run one instance or 50 instances without issue. This is a major reason we chose Node.js for the API and Image Service layers. Event-driven and fast.
- It should directly integrate with DevOps toolsets to allow for cloud configuration – The biggest and most important requirement. And unlike tools that wrap and cripple a DevOps tool – Razor should integrate with them without affecting their ability to scale or manage resources.
- The control structure must support REST interface control out of the box – If you are going to build a system for automation, make sure it can be automated by another system.
With these requirements Razor was born. So, let me walk you through how Razor works and why this is so powerful.
Razor uses a powerful discovery mechanism that has a single purpose: find out what a compute node is made of. With Razor we designed what we call the MicroKernel (known as the MK) for Node discovery. When a server is booted your DHCP server will point the server to a static PXE file provided by Razor. This PXE file will point the server at the Razor API and automatically pull down the MK and load it. The MK is tiny, around 20MB in size, and is an in-memory Linux kernel that will boot, inventory the hardware, contact the Razor server, and register the node with Razor. It will then sit idle and check in with a lightweight ping, waiting for Razor to tell it what to do. The control link between Razor and the MK on the nodes is via REST to Razor’s Node.js API.
The MK uses Puppet Labs’ Facter to gather information on every piece of hardware as well as what kind of server it is (virtual or physical) and even what kind of virtualization it is on (VMware, KVM, Xen). The MK sends this information back to Razor and even sends updates should you change hardware on the fly. The end result is that Razor can automatically discover and know the makeup of hundreds of physical or virtual servers. It will know the CPU version, the server vendor, the number of NICs, the number of physical disks, and much much more. All of which becomes very important soon. This information is available for every node within Razor via the CLI or REST interface.
The next step is taking this inventory of nodes and classifying and carving it into something useful for deployment. This is where tagging comes in. In Razor you have a construct called a Tag Rule. A Tag Rule applies a Tag to a Node (discovered compute node). A Tag Rule contains qualifying rules called Matchers. It may seem a little complex but is actually incredibly simple.
Let’s say you have 64 servers. 16 of them are brand new Cisco UCS blades. 18 of them are HP servers about 3 years old. And finally you have 24 old Dell servers that are quite dated. What you want to do is separate these servers by type so you can deploy and configure them differently. You want to put the Cisco UCS blades into a vSphere cluster for running Cloud Foundry. You want to take the HP servers and stand up an OpenStack test bed. And since the Dells are a bit dated, you want to provision them as development servers running Ubuntu for use by a development team.
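To make the register-then-check-in loop a bit more concrete, here is a minimal sketch in Python. It is not Razor's code and it does not use Razor's real REST API: the `Registry` class, the `"register"`/`"idle"` command strings, and the fact names are all assumptions made up for illustration. It only models the idea that an unknown or changed node is asked for a full inventory, while a known, unchanged node is told to keep waiting.

```python
from dataclasses import dataclass, field


def fingerprint(facts):
    """Order-independent summary of a facts dict, used to detect hardware changes."""
    return tuple(sorted(facts.items()))


@dataclass
class Node:
    """A discovered compute node as the server might record it (hypothetical shape)."""
    uuid: str
    facts: dict = field(default_factory=dict)


class Registry:
    """In-memory stand-in for the server-side node inventory (not Razor's API)."""

    def __init__(self):
        self.nodes = {}

    def checkin(self, uuid, facts_fingerprint):
        """Lightweight ping from the MK; returns the next command for the node."""
        node = self.nodes.get(uuid)
        if node is None:
            return "register"  # unknown node: ask for a full inventory
        if facts_fingerprint != fingerprint(node.facts):
            return "register"  # hardware changed: ask for fresh facts
        return "idle"          # nothing to do yet: keep pinging

    def register(self, uuid, facts):
        """Store (or refresh) a copy of the full hardware inventory for a node."""
        self.nodes[uuid] = Node(uuid=uuid, facts=dict(facts))


# Hypothetical fact names; a real inventory would come from Facter on the MK.
registry = Registry()
facts = {"vendor": "Cisco", "is_virtual": "false", "nic_count": "4"}

first = registry.checkin("node-1", fingerprint(facts))   # node not seen before
registry.register("node-1", facts)
second = registry.checkin("node-1", fingerprint(facts))  # known and unchanged

facts["nic_count"] = "6"                                 # hardware changed on the fly
third = registry.checkin("node-1", fingerprint(facts))   # asked to register again
```

Sending only a small fingerprint on each ping is one way to keep the check-in lightweight; the full inventory travels only when something is new or different.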
Part of the beauty of Razor’s discovery mechanism is that it has already gathered all the information you need. Each Node contains attributes for both vendor and product name from the MK registration.
Tagging allows you to group these servers by applying a common tag. You create a Tag Rule called ‘CiscoUCS’. And then you add a single Matcher to that rule that says: if ‘vendor’ equals ‘Cisco’ then apply the Tag. Immediately every Node Razor has that matches that rule will be tagged: ‘Cisco’. Likewise you can set up Tag Rules for the HP and Dell servers. You can also tag on things like how much memory or CPU version and create helpful tags like ‘big_server’, ‘medium_server’, or ‘small_server’. And Tags stack also. So you can create multiple Tag Rules that can apply and classify complex servers. You can have a Node with [‘Cisco’,’big_server’,’4_nics’,’cluster_01’,’Dallas’] describing size, location, and grouping. Tag Rules also allow you to insert attributes into the Tag. So you can create rules that automatically name like ‘memory_96GB’ for one server and ‘memory_48GB’ for another.
So in our example use case we now have taken our 64 servers and applied a bunch of useful Tags to them. We now need to make those Tags useful and do something based on them. This is where Policy in Razor comes into play. A Policy in Razor is a rule that takes a Model (more on this in a second) and applies it to a Node (remember, discovered servers) based on matching against Tags. This is incredibly simple to set up. In our case we would have a Model called ‘ESXi5Standard’ which deploys VMware’s vSphere hypervisor to a Node. We would create a new Policy that says: if a Node matches the Tags ‘Cisco’,’big_server’,’cluster_01’ then apply my ‘ESXi5Standard’ model to this Node.
Now what makes this very cool is that it is completely automatic. As the Node checks in from the MK, the Razor Engine checks the Policies to see if there is a match. Policies work like a firewall rule list. The Engine starts at the top and moves down until it finds a match. When it finds that match it applies the Policy and binds the Model as the Active Model to the Node. This immediately starts applying whatever the Model is to the Node. In our case above, each one of our 16 UCS servers would quickly reboot and begin installing ESXi 5. What is more important to understand is that if only 1 Node was on when the rule was created, only 1 would be installed. But as long as the Policy is enabled you could turn on the remaining 15 Nodes as you want and have them bind to the same Model and move on to become part of the vSphere Cluster.
In our use case we would build 3 Policies. One for our UCS blades, one to install Red Hat on our HP servers for OpenStack, and one to install Ubuntu Precise on our Dell servers. Any servers not fitting the Policies remain idle. If you were handed 16 more UCS blades and wanted them to deploy to the same cluster, you would just have to turn them on and let Razor continue to enforce the appropriate Policy.
Let me take a second to describe how Models work. You have two components to the Model structure: the Model Template and the Model itself. The Model Template is one or many files that describe how to do something. They are actually very simple to write and Razor comes with a bunch of Model Templates for installing a ton of common stuff. The Model is an instance of a Model Template plus some required metadata (like password, license key, hostname, etc). You may want to install ESXi or Ubuntu differently depending on whether the destination is production or development, or based on factors like location. So, you have the ability to use the Ubuntu Precise Model Template to create a Model for UbuntuProduction and a Model for UbuntuDevelopment with different settings for things like username, password, domain, etc. Or, you can use one Model for all and let upper-level tools like DevOps manage the configuration differences. I won’t go into how to create your own Model Template in this blog but even the Model Template can be customized for the same OS but different needs.
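If it helps to see the idea in code, here is a small Python sketch of Tag Rules and Matchers. This is only an illustration and not how Razor implements tagging: the class names, the `"equal"`/`"at_least"` comparison names, the `{memory_gb}` placeholder syntax, and the attribute names are all assumptions. It shows a rule applying its Tag only when every Matcher passes, Tags stacking across rules, and an attribute being inserted into the Tag name.

```python
from dataclasses import dataclass, field


@dataclass
class Matcher:
    """One qualifying condition on a node attribute (hypothetical shape)."""
    key: str
    compare: str  # "equal" or "at_least" in this sketch
    value: object

    def matches(self, attributes):
        if self.key not in attributes:
            return False
        actual = attributes[self.key]
        if self.compare == "equal":
            return actual == self.value
        if self.compare == "at_least":
            return actual >= self.value
        raise ValueError(f"unknown comparison: {self.compare}")


@dataclass
class TagRule:
    """Applies `tag` to a node when every matcher passes."""
    name: str
    tag: str  # may contain {attribute} placeholders
    matchers: list = field(default_factory=list)

    def tag_for(self, attributes):
        if all(m.matches(attributes) for m in self.matchers):
            # Insert node attributes into the tag, e.g. memory_{memory_gb}GB.
            return self.tag.format_map(attributes)
        return None


def tags_for(attributes, rules):
    """Tags stack: collect the tag from every rule that matches the node."""
    tags = []
    for rule in rules:
        tag = rule.tag_for(attributes)
        if tag is not None:
            tags.append(tag)
    return tags


rules = [
    TagRule("CiscoUCS", "Cisco", [Matcher("vendor", "equal", "Cisco")]),
    TagRule("BigServer", "big_server", [Matcher("memory_gb", "at_least", 96)]),
    TagRule("Memory", "memory_{memory_gb}GB", [Matcher("memory_gb", "at_least", 1)]),
]

ucs_tags = tags_for({"vendor": "Cisco", "memory_gb": 96}, rules)
dell_tags = tags_for({"vendor": "Dell", "memory_gb": 48}, rules)
# ucs_tags  -> ['Cisco', 'big_server', 'memory_96GB']
# dell_tags -> ['memory_48GB']
```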
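The firewall-style evaluation is easy to sketch. The Python below is an illustration under assumptions, not Razor's engine: the `Policy` class, the `bind` function, and the two model labels other than ‘ESXi5Standard’ are made up for this example, and it assumes a Policy matches only when the Node carries all of the Policy's Tags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Binds a model to any node carrying all of the required tags (sketch)."""
    name: str
    required_tags: frozenset
    model: str
    enabled: bool = True

    def matches(self, node_tags):
        return self.enabled and self.required_tags <= set(node_tags)


def bind(node_tags, policies) -> Optional[str]:
    """Walk the list top-down, like a firewall rule list; first match wins."""
    for policy in policies:
        if policy.matches(node_tags):
            return policy.model
    return None  # no policy fits: the node stays idle


# Model labels other than 'ESXi5Standard' are hypothetical.
policies = [
    Policy("ucs_vsphere", frozenset({"Cisco", "big_server", "cluster_01"}), "ESXi5Standard"),
    Policy("hp_openstack", frozenset({"HP"}), "RedHatOpenStack"),
    Policy("dell_dev", frozenset({"Dell"}), "UbuntuPreciseDev"),
]

ucs_model = bind(["Cisco", "big_server", "4_nics", "cluster_01", "Dallas"], policies)
dell_model = bind(["Dell", "small_server"], policies)
other_model = bind(["small_server"], policies)
# ucs_model -> 'ESXi5Standard', dell_model -> 'UbuntuPreciseDev', other_model -> None
```

Because `bind` runs on each check-in rather than once at Policy creation, a blade powered on next week is handled exactly like one powered on today, as long as the Policy stays enabled.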
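Here is a rough Python sketch of the Model Template versus Model split. It is an assumption-heavy illustration, not Razor's Model format: the class names, the `required` field, and the toy `answer_file` text (which is not a real preseed or kickstart file) are invented for this example. The point is simply that one template can back several Models that differ only in metadata.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class ModelTemplate:
    """Describes how to install something and which metadata it needs (sketch)."""
    name: str
    required: tuple   # metadata keys every Model must supply
    answer_file: str  # toy install answers, written as string.Template text


@dataclass
class Model:
    """An instance of a template plus the metadata it requires."""
    label: str
    template: ModelTemplate
    metadata: dict

    def __post_init__(self):
        missing = [k for k in self.template.required if k not in self.metadata]
        if missing:
            raise ValueError(f"{self.label}: missing metadata {missing}")

    def render(self, hostname):
        """Fill the template for one node; hostname is supplied per node."""
        return Template(self.template.answer_file).substitute(
            self.metadata, hostname=hostname
        )


ubuntu_precise = ModelTemplate(
    name="ubuntu_precise",
    required=("username", "password", "domain"),
    answer_file="hostname=$hostname\nuser=$username\npass=$password\ndomain=$domain\n",
)

production = Model(
    "UbuntuProduction", ubuntu_precise,
    {"username": "ops", "password": "change-me", "domain": "prod.example.com"},
)
development = Model(
    "UbuntuDevelopment", ubuntu_precise,
    {"username": "dev", "password": "change-me", "domain": "dev.example.com"},
)

prod_answers = production.render("web01")
dev_answers = development.render("build01")
```

Validating the metadata when the Model is created, rather than when a Node is being installed, means a missing license key or password shows up before any server reboots.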
So far I have covered how Razor allows for dynamic provisioning of OS/Hypervisor layers. But, if I stopped here then all I would have done is talked about a slightly better mousetrap. The creation of Razor was based on the principle of the simplest solution being the most elegant. And the point of deploying Ubuntu or vSphere is to host something at a higher level. That is where Brokers come into play and where the real magic of Razor is important.
If I want to deploy something like OpenStack I want to do it with a system that is designed to do it right. We looked at DevOps products like Puppet from Puppet Labs and realized they are much better at managing configurations for cloud stacks. So by design Razor is integrated to enable the Handoff of a provisioned Node to a system that will manage it long-term.
To do this, Razor uses a Broker Plugin. A Broker is an external system like a Puppet Master from Puppet Labs that will properly configure a Node for its true purpose in life. Out of the box we have worked hand-in-hand with Puppet Labs to include a Broker Plugin for Puppet that enables both agent handoff (Linux variants) and proxy handoff (vSphere ESXi). There are a couple of really important things to point out here. First, we don’t wrap Puppet or attempt to control the Puppet Master from Razor (like the other guys do). Razor’s purpose in life is to get the Node attached to the right Model and get it to a state where it can give the Node to Puppet. It also delivers all the metadata it gathered along the way, including tags. But once Puppet gives the thumbs up – Razor is done.
When a Broker like Puppet receives the Node, it can use the tags passed with it to make decisions on the configuration. You can link similarly tagged ESXi Nodes into the same cluster. You can set up one node as a Swift Proxy and the next 5 as Swift Object servers based on tagging. The important thing here is that Puppet is able to consume the hardware details and classification and, in sequence, turn provisioned Nodes into stacks of applications and services.
Which means in the end, when everything is set up, Razor and Puppet can take groups of servers and turn them into deployed services automatically. They can scale them incrementally. With the way binding works with Razor, you can even re-provision quickly.
This blog post is getting long enough as it is. I will leave much of the configuration details for the videos below. I won’t cover how the entire control structure is completely available via REST API (proper GET, POST, PUT, DELETE). I won’t mention the slick Image Service that allows you to load ISOs into Razor and choose which Models to attach them to. I will even skip the lightning fast Node.js layer which dynamically serves the API and the Image Service for all Nodes. And I won’t mention the detailed logging you get with provisioning tasks including the ability to define your own state machine within a Model.
But I will mention the best part about the Razor announcement. EMC has decided to donate the Razor code for release under the Puppet Labs community with an open-source Apache license. We feel strongly that releasing Razor as open-source software for enabling the next generation of cloud computing and properly integrated DevOps toolsets is critical to the community. We are looking forward to the cloud community helping us create something that can benefit everyone.
At this point you may be asking how you can start using Razor. Unlike some other tools, Razor is incredibly simple to install and run. You can manually download the required pieces and clone from the GitHub repo. But the slick cloud-magician way is to actually use Puppet and deploy Razor quickly and easily. Because what good is a DevOps-integrated tool if you cannot deploy it with a DevOps toolset?
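As a rough picture of what a Broker can do with the tags it is handed, here is a tiny Python sketch of the Swift example. To be clear, this is not Puppet code and not the Broker Plugin's interface: in practice the decision would live in Puppet's own configuration. The `swift` tag, the role names, and the `assign_swift_roles` function are assumptions for illustration only.

```python
def assign_swift_roles(nodes, proxies=1):
    """Pick roles from tags, in handoff order.

    nodes: list of (name, tags) pairs as received at handoff.
    The first `proxies` nodes tagged 'swift' become proxies; the rest of
    the 'swift' nodes become object servers. Untagged nodes are skipped.
    """
    roles = {}
    assigned_proxies = 0
    for name, tags in nodes:
        if "swift" not in tags:
            continue
        if assigned_proxies < proxies:
            roles[name] = "swift_proxy"
            assigned_proxies += 1
        else:
            roles[name] = "swift_object"
    return roles


handed_off = [
    ("node-a", ["swift", "Dallas"]),
    ("node-b", ["swift", "Dallas"]),
    ("node-c", ["Cisco", "cluster_01"]),  # not part of the Swift stack
    ("node-d", ["swift", "Dallas"]),
]

roles = assign_swift_roles(handed_off)
# roles -> {'node-a': 'swift_proxy', 'node-b': 'swift_object', 'node-d': 'swift_object'}
```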
Here is a quick video by Nan Liu of Puppet Labs showing how easy it is to install Razor via Puppet on Ubuntu Precise.
This video is a quick example of deployment with ESXi Models and handoff to Puppet Labs for automatic install of vSphere and configuration of clusters and virtual machines.
And finally here are some critical links to get started on working with Razor.
- MUST READ. Awesome quick guide by Puppet Labs on how to deploy and use Razor with Puppet. http://puppetlabs.com/blog/puppet-razor-module/?utm_campaign=blog&utm_medium=socnet&utm_source=twitter&utm_content=motwrazor
- The Puppet Labs press release regarding Razor: http://puppetlabs.com/company/news/press-releases/puppet-labs-announces-next-generation-provisioning-solution/
- Great Cloudcast podcast with myself, Dan Hushon, and the Puppet Labs team: http://www.thecloudcast.net/2012/05/cloudcast-eps38-project-razor-google.html
- The Razor code in the Puppet Labs GitHub account: https://github.com/puppetlabs/Razor
- A great post by my boss and cloud veteran, Dan Hushon: http://www.vdatacloud.com/blogs/?p=106
- Good EMC insider viewpoint from the one and only Chuck Hollis: http://chucksblog.emc.com/chucks_blog/2012/05/of-puppet-and-razor.html
I will be following up this blog post with videos on how to use each component of Razor and examples of using the REST API, and much, much more.
Feel free to leave comments and questions here. But please use the resources Puppet Labs has set up for the community projects as well.
Thanks for reading this long post – and have fun cutting some servers up,