Virtual Selection: Cloud Constructs for Rapid Prototyping/Testing

Check out that title. Pretty awesome way to sound smart, right? Well, this blog post is another one of my long-winded ones and concerns my recent six-week side project. So a little warning in advance: this is a long read and a mind-bender in spots. Have a hot or cold drink and some time before you start. I think you will enjoy the ending.

The Idea

I am a firm believer that virtualization and cloud computing are creating new paradigms for approaching innovation, operation, and execution within information technology. I find myself inspired by ideas and concepts that would have been impossible before the advent of virtualization as a common approach to the logical abstraction of x86 compute, storage, and networking. In my feeble mind, I see endless possibilities not only in automation, but also in creating intelligent systems able to respond in ways far more organic than we may have thought possible.

It is from this belief that this new idea came to me. The lifecycle of applications and infrastructure has traditionally been a very manual and heavily managed process. Creation, changes, and death (decommissioning) are all things that can be automated, but they require prerequisite knowledge to orchestrate correctly. You would know, specifically, the quantity, scope, and configuration of physical or virtual servers before building for an application. Likewise, configured settings and metadata for the application would have been tested and discovered beforehand through intense integration and regression cycles by development and quality teams. All of this would be wrapped in processes and models (ITIL, COBIT) with the goal of ensuring control and accountability.

And I am not proposing that the model above is inherently wrong. Rather, I just think we may be missing opportunities to let the infrastructure work for us.

This idea has actually been with me for a while. Virtualization's ability to convert the paradigm of a rigid physical infrastructure, in the form of a server, into a logical construct known as the virtual server opens a wealth of options. The lifecycle of the virtual server is something much different. Unlike the physical server, it can be created, modified, and decommissioned without physical access and through multiple methods. Even more interesting is the ability to clone copies of virtual machines, with the *children* possessing the same makeup as the *parent*. I look at this and see a model similar to the common creatures in the world around us. An animal is a product of its parents: a genetic clone with mixed attributes and minor mutations that make it unique.

So the question in my mind is:
If I took a virtual machine, used the ability to clone as a basis, and added the ability to spawn generations that not only inherit attributes from the parent generation but also introduce mutations, what would I have?

Would I be able to see virtual machines become more sophisticated or efficient over time? How would they evolve?

But then I ran into a brick wall. Random mutations and multiple generations of virtual machines would, in the end, produce nothing. There is no goal, no end game. The result would be nothing but virtual white noise.

Then my thought process went even further. Creatures in our world are exposed to and live within an environment. Just as an environment enforces life and death, favoring the strong and killing the weak, what if I could create an environment in which certain virtual machines were allowed to thrive and others were not, based on random mutations within each generation? I could contrive a goal, using either a single linear metric against which every generation is graded or even a multidimensional map that would judge each virtual machine.

And that is how I came up with the idea for Virtual Selection. It is simply taking the unique constructs that represent virtualization and applying the model of real-world Natural Selection, with the objective of allowing virtual machines to evolve, unassisted, toward a goal.

But to create this I first had to decide on a few strict rules to follow:

1. The creatures (virtual machines) would have no ability to know the goal.

They must produce their result without any input from the environment, working only from the genetic map inherited from their parent. Mutation must be random and varied in both rate and effect.

2. The environment would have no ability to enforce a result other than selection by reproduction.

The environment would choose which virtual machines procreate and which do not based on how well their genetics compare against an *ideal* synthetic genetic code. It cannot change the mutation rate, specific attributes, or anything *within* the creature.

3. I would use as little code as possible and as much of the virtualization construct as possible.

This entire process can be, and has been, done with purely logical code constructs. But what is important about this test is that the goal is not just an abstract exercise. It is the idea that a virtual machine can become *better* when Natural Selection concepts are applied to it. If I wrote a bunch of code that did all of this in memory, I would have made it irrelevant to the virtual machine model and, ultimately, to a cloud construct. This test must be useful against a virtual machine that includes an operating system, settings, and applications. The idea is to prove that this is possible in our virtual infrastructure world.

Now I had some real problems to solve. First, how would I represent a synthetic genetic map that was inheritable, mutable, and something my environment could evaluate? How would I have this exist inside a virtual machine? And more importantly: how could I make this genetic map something an outside observer could watch evolve through the generations?

The answer came to me while staring at a restaurant menu. It is both simple and perfect: I should use an image. An image is a two-dimensional map of pixels, each of which is a combination of possible values. It is inheritable, as each generation can inherit the image of its parent. It is mutable, as individual pixels can mutate within the image for each generation. It can be evaluated by the environment using a *goal* image that represents the ideal genetic map for the environment. And it is easily relatable as a progression to outside viewers, since images are a visual representation of information.

I still had to build the pieces, but I had the overall design ready in my mind. I would have creatures as virtual machines. These would take a genetic map on boot and build their own version of it. I would have an environment in the form of an orchestration application that would evaluate each creature and choose which creatures would be allowed to replicate for the next generation. And all of this would run in a lab environment on VMware’s vSphere 4.1, using the vSphere Web SDK.

I will go through each piece below and talk about the design behind it and the lessons learned. And I will finish this post with both the results and some ideas on how this can be further extended into business-relevant applications.

1. The Creature

To build my creatures I had to think in small terms. This entire test was to run on my single Intel Core i7 system with 12GB of RAM and an SSD. Not a massive amount of power, which meant I had to be as frugal as possible. After starting with several Ubuntu builds I realized I needed smaller and faster VMs. I needed to run some generations (more on that below) in quantities of 32-64+ creatures at a time, and using 6GB creatures was too much for my poor SSD.

I ended up using Embedded Debian (Emdebian) to build an 800MB creature with all the components I needed. This included PHP for all the genetic functionality and VMware Tools (200MB+) to allow the environment to manage and maintain unique identification of the genetic models. I also ended up utilizing linked clones to enable greater space savings and quicker cloning.

Within the creature itself I used PHP to load the genetic map (image) and replicate it with randomized mutation rates. Instead of a preconfigured mutation rate, I tested every piece of genetic code (pixel) against a mutation rate that was randomly established at creation (boot). I also wanted the scope of the mutation rate to be somewhat variable, so I added algorithms that made subsets of each generation, by percentage, either especially prone to mutation or entirely free of mutation. I found it was especially important to make sure that each generation contained a few creatures with very little mutation. This prevented any positive progress in a generation from being completely reset by unproductive mutation across all creatures.
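
In rough terms, the per-boot mutation step looks something like the minimal PHP sketch below. This is not the actual creature code: the GD image handling, the file paths, the way the identity is read, and the specific mutation-rate percentages are all illustrative assumptions.

```php
<?php
// Minimal sketch of a creature's per-boot mutation step (illustrative only).
// Assumes a truecolor grayscale PNG, the GD extension, and hypothetical paths.

$parentMap = '/genome/parent.png';            // hypothetical: the inherited genetic map
$dropShare = '/mnt/environment-drop';         // hypothetical: the mounted drop share
$identity  = trim(shell_exec('hostname -I')); // DHCP-assigned IP used as the unique key

$img = imagecreatefrompng($parentMap);
$w   = imagesx($img);
$h   = imagesy($img);

// A per-creature mutation rate chosen at boot: a slice of each generation barely
// mutates at all, a slice mutates heavily, and the rest fall somewhere in between.
// The exact percentages here are made up for illustration.
$roll = mt_rand(1, 100);
if ($roll <= 10) {
    $mutationRate = 0.0;                      // carbon copies guard progress against bad luck
} elseif ($roll <= 20) {
    $mutationRate = mt_rand(50, 200) / 10000; // mutation-prone creatures
} else {
    $mutationRate = mt_rand(1, 50) / 10000;   // light, random mutation
}

// Every gene (pixel) is tested independently against the mutation rate.
for ($x = 0; $x < $w; $x++) {
    for ($y = 0; $y < $h; $y++) {
        if (mt_rand() / mt_getrandmax() < $mutationRate) {
            $v = mt_rand(0, 255);             // new random grayscale value
            imagesetpixel($img, $x, $y, imagecolorallocate($img, $v, $v, $v));
        }
    }
}

// Drop the mutated map keyed by this creature's identity; the creature never
// sees the goal image (rule #1).
imagepng($img, $dropShare . '/' . $identity . '.png');
```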

The funny thing about spawning hundreds of thousands of creatures (virtual machines) is that, because they inherit all the details of their parent, I had to build in resets of log files and other incremental state to prevent file space shortages. I also had to deal with operating system errors that only occur once in a few thousand boots. Needless to say, I had quite a stable build by the end.

The end result is a very robust little creature (virtual machine) that performs a specific function on each boot and is not impacted by cloning. Each creature takes what it inherited by being cloned and mutates it ever so slightly, and randomly, to become a unique creature.

2. The Environment

The first problem to solve was the ability of the environment to know the uniqueness of each creature. It must be able to evaluate each creature to determine which *thrived* against the goal genetic model. To do this I used DHCP on a private Class A range (10.x.x.x) with a one-hour lease time. Each creature would boot, obtain a unique IP, and produce its genetic map. The environment would use this IP address as a unique key for the creature. The environment possessed several key components that allowed evaluation.

1. Genetic map drop

This was a network share that allowed each creature to drop its unique genetic map, keyed with its unique identity. Since this location was asynchronous (drop only), it maintains rules #1 and #2 above. For simplicity, this is the only touch point between the genetic maps and the environment.

2. Evaluation algorithm (written in C#)

This was functionality that would inventory both the entire generation of creatures (virtual machines, via the vSphere SDK) and the corresponding genetic maps (image files). It would then grade each map against the goal by comparing the images and produce a map of the results. (A sketch of this grading idea follows the component list below.)

3. Life & Death engine (also C#)

This was an orchestration layer that would be fed the map of generational results from the step above and then clone, kill, and power on the next generation. It had to manage the genetic map drop, the clone/delete/power-on/power-off integrations, and the linked clones. It also handled taking the image comparison scores and choosing who would procreate and who would not. And finally, it had to be able to handle errors and recover should anything go wrong in any of the other components.
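
The evaluation itself was written in C#, but the grading idea can be sketched in a few lines; here it is in PHP for consistency with the creature example above. The paths and the exact closeness measure (summed per-pixel distance to the goal, lower is better) are assumptions for illustration.

```php
<?php
// Minimal sketch of the grading idea (illustrative only; the real evaluation
// algorithm was C#). Fitness = summed per-pixel distance to the goal image.

function scoreMap(string $goalPath, string $mapPath): int
{
    $goal = imagecreatefrompng($goalPath);
    $map  = imagecreatefrompng($mapPath);
    $distance = 0;

    for ($x = 0, $w = imagesx($goal); $x < $w; $x++) {
        for ($y = 0, $h = imagesy($goal); $y < $h; $y++) {
            // Grayscale: the low byte of the truecolor value carries the gray level.
            $g = imagecolorat($goal, $x, $y) & 0xFF;
            $m = imagecolorat($map,  $x, $y) & 0xFF;
            $distance += abs($g - $m);
        }
    }
    return $distance;
}

// Grade every dropped genetic map, keyed by the creature's DHCP-assigned IP.
$scores = [];
foreach (glob('/mnt/environment-drop/*.png') as $mapPath) {  // hypothetical share path
    $scores[basename($mapPath, '.png')] = scoreMap('/goal/goal.png', $mapPath);
}
asort($scores); // best (lowest distance) first
```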

The environmental process is quite simple. First, the environment's L&D engine is enabled and started. It examines the existing creatures (the first generation) using the evaluation algorithm. Then it chooses the best performers and clones a set of children. The size of this set is itself slightly random: it has a defined minimum to make sure there are always new generations and a defined maximum to ensure that I do not crash my workstation. The actual value is influenced by the variance of change. If there is little change from generation to generation, the number of children increases; otherwise it remains the same, or is reduced when there is a massive amount of mutation. This serves as an environmental balance that maintains progressive mutations without overproducing children.

While the new children are cloned, every previous generation is destroyed. This ensures that every generation is unique and no comparison is done cross-generation. The use of linked-clones here makes the delete/clone process more difficult, but saves on space and time.

After all of this is complete, the L&D engine goes idle, watching the environment for when the children have completed their cycle. Once this has happened, it evaluates the new generation and starts from the top.
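
The L&D engine was likewise C#, driving the vSphere SDK for the clone, power, and delete operations. Just to make the selection and child-count balancing concrete, here is a minimal PHP sketch of that logic with the vSphere calls stubbed out; the survivor fraction, the improvement-based scaling, and all names are assumptions.

```php
<?php
// Minimal sketch of the Life & Death selection logic (illustrative only; the
// real engine was C# calling the vSphere SDK, which is stubbed out here).

function cloneCreature(string $parentId): void     { /* stub: linked-clone + power on */ }
function destroyCreature(string $creatureId): void { /* stub: power off + delete */ }

// $scores: creature id => distance to the goal (lower is better), as graded above.
function runGeneration(array $scores, int $prevBest, int $minChildren, int $maxChildren): int
{
    asort($scores);
    $ranked    = array_keys($scores);
    $survivors = array_slice($ranked, 0, max(1, intdiv(count($ranked), 4))); // top quarter breed

    // Child count floats between a minimum (generations never die out) and a
    // maximum (the workstation survives). Little change between generations
    // pushes the count up; a big jump keeps it flat or pulls it down.
    $best        = $scores[$ranked[0]];
    $improvement = max(0, $prevBest - $best);
    $children    = max($minChildren, min($maxChildren, $maxChildren - $improvement));

    foreach ($survivors as $parent) {
        for ($i = 0; $i < intdiv($children, count($survivors)); $i++) {
            cloneCreature($parent);
        }
    }
    foreach ($ranked as $creature) {   // the entire previous generation is destroyed
        destroyCreature($creature);
    }
    return $best;                      // fed into the next cycle once the children report in
}
```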

4. The Genetic Map

I chose a picture that was 100 pixels by 100 pixels with an 8-bit grayscale range (256 possible values per pixel). That is 10,000 genes with 256 possible values each, so a single random mutation has roughly a one in 2,560,000 chance of setting any particular pixel to its correct value. In hindsight this was probably too ambitiously large a genetic map for my meager workstation. When I started, I knew I would have no idea of the efficiency of the system until I tried to use it. I quickly realized that the closer I got to the goal, the slower it would go and the longer it would take. To complete the project I compressed the color range to compensate for the limited CPU and storage in my lab.

But even with the color range reduction I still had to find ways to optimize the process. I learned how to clone creatures as fast as possible, how to boot Linux in half the time, and how to orchestrate vSphere in the correct order. Evolving a genetic map with millions of possible pixel-and-value combinations is quite a feat. However, because the learning process included the slow evolution of this genetic map, I stayed true to my goal and simply had this running 24 hours a day, 7 days a week.

5. Results

And the result is that it works quite well. I cannot really describe my own amazement as these little creatures' genetic maps started looking more and more like the end goal. What started as white noise evolved, on its own, into a result that you might agree is pretty darn cool.

The video below is a segment covering about 85% of the generations I was able to run against my original goal. Remember, this progress was completely random, with only procreation selection by the environment. Because I was constantly refining the process, the rate of change varied; random mutations are also, in fact, very random.

Each frame in the video represents the highest-scoring creature's genetic map for that generation. I am only playing the generations with a noticeable change (with a few exceptions) to keep the video small. The closer to the end, the more sparse positive change became. This is a symptom of a static goal; I am convinced that a dynamic, open-ended goal would see a more linear rate of change.

6. Possible Applications

This whole experiment is meant to prove the possibility of using something like Virtual Selection to bring improvements to virtual infrastructure and cloud computing. I can imagine a development environment where modules of an Enterprise Architecture design are loaded and each generation is an iteration of the previous one with a finite set of values mutated. Each generation of the model can be recorded and compared with Virtual Selection using metrics based on the desired performance results. This would create a system that could organically improve upon natural development cycles in a way not possible with human processes. In addition, it does so using the existing virtualization constructs meant to house the end product (the virtual machine).

The possible applications are endless. To imagine a virtual machine running applications as a creature, and to apply Natural Selection as a model to influence it, is just one of the many things cloud computing will enable. I am hoping this project will help open your mind to cool ideas and maybe inspire you to try something new.

I plan on writing a front-end report for this application and loading it on my site. The goal is to show a live view of Virtual Selection running, with a simpler goal image, for readers to watch and observe. Whether I get this done or end up on the next project, only time and my own natural selection will tell.

As always, comments/complaints/ideas are welcome below.

.nick


20 Comments

  1. First of all, cool project. A couple of thoughts and questions:
    1) I think to really fit the coming cloud model you would need to add the idea of a collective: identical pieces functioning toward a common goal. SaaS and web-services-type apps will ultimately depend on a loosely coupled infrastructure of like systems, so if you could organically improve the performance over time while keeping consistency across the collective, you allow users and applications to roam across that infrastructure. So, a group of systems that evolve identically over time.
    2) I am really struggling with the idea of iterative improvement toward a defined end state. How are you evaluating one generation to the next? Is it progress toward a performance goal or set of goals? Not sure if I am interpreting it correctly, but I am reading that you are evaluating how closely the set of dimensions (variables) matches the goal and then, over time, randomly changing the values of those that don't match until you have a 100% match.
    3) Would be really cool for this type of thing to dump that genetic match into a CMDB, which could then be used to generate pre-evolved clones 🙂

    • 1) Cool idea
      2) That is pretty close. Actually, each generation can change even the correct bits. Everything is truly random, but mutations are only a percentage of the genetic map, with the majority a carbon copy of the parent. And each generation would have several creatures with no changes at all. The environment would judge based on the number of pixels that were closer to the goal image. The closest were replicated and had children; the farthest were deleted. The big deal is that I knew the goal image. So, to make the test relevant to a situation where the goal is open-ended (it just gets *better*), the creatures have no idea whether they got a pixel right or wrong. All they know is the image their parent gave them.
      3) Another killer idea

  2. Where are you running the C# CLR? I wonder if you could run the PHP bootstrap and the environment on bare metal and would they change the results? Also, is the environment virtualized?

    • Everything is virtualized (the true point behind all of this). Also, bare metal wouldn’t give me the ability to rapidly clone using vSphere and would defeat the purpose of the experiment.

      The L&D Engine ran from my machine with a Windows 2008R2 server serving as the Environment.

  3. Truly mind-boggling project, Nick. Great job! I can’t wait to see where this goes. Once the OS is also abstracted, this could be applied to a code base that self-evolves and improves over time.

  4. Nick, this is simply awesome out of the box thinking.

    Reminds me of what Veracode is doing with code testing – using lessons learned from earlier unrelated code reviews to automatically fix new code submissions, so that the more time the system runs, the more secure the code it spits out.

    It seems that the work we did on DCML – Data Center Markup Language – might be helpful. That was a proposed standard championed by Opsware (LoudCloud at first, Andreessen’s post-NetScape company) and EDS (now Dell, and the acquirer of LoudCloud’s hosting division). I joined the DCML standards body because of my work on auto-provisioning clouds at Exodus and Speedera/Akamai. Anyway, the goal was to be able to use XML to describe the complete state of a data center, including physical topology, server/OS/app configuration, interconnectivity, and state. DCML is now part of OASIS, and it has not evolved to my knowledge to include virtualization, although it would easily work for that too.

  5. I’m having trouble understanding what sets your method apart from traditional genetic algorithms. I understand that you’re using the virtualization infrastructure to manage the instantiation, reproduction (“cloning”) and termination of your chromosomes (“creatures”). I also see that you introduce random mutations and that your fitness function compares against a known ideal.

    Is the image representative of some deeper state of the virtual machine’s configuration or is it really just a bitmap image that you randomly seeded and evolves closer to the target? It seems like maybe you’re sacrificing efficiency (much slower to spawn brand new VMs instead of processes/threads in an existing GA framework) for ease of setup (you’ve minimized the amount of code you have to write).

    Further, you have the luxury of sidestepping the classic problem that GAs have which is the tendency to converge on local optima since your ideal end state is known.

    I’m not trying to hate, I’m really just wondering if I’m missing something. If you’re demonstrating that you’re using the virtualization mechanisms to do some basic GA, then that’s cool in the making-it-do-something-it-wasn’t-intended-for way but it doesn’t seem practical for evolving solutions to real problems.

    • Thanks Robert,

      The point of this is not to reinvent genetic simulations. It is to apply the logic to a different construct: in this case, the virtual machine and cloud computing. And yes, the image is just meant to easily show a simple progression toward a static state. I am less concerned with GA problems directly because I am more concerned with results than with achieving a simulation.

      I think you are looking at this from the perspective of trying to represent true GA in a model. Virtual Selection is not meant to accomplish this at all.

      This was created to illustrate a way to use new logical abstractions of previously physical components to do things much more like purely logical constructs. It is a way to enable automation of strict constructs such as operating system and application configurations, with macro-level testing using GA-type models.

      I called out in the article that this has been done before with in-memory development designs and that this is strictly a test to prove viability against asynchronous results from a virtual machine construct. For anything that can be easily modeled logically, this is NOT an effective method by any means. I could have done this whole simulation in code and achieved the image in an extremely short amount of time. But that would have been a worthless model to apply to a virtual infrastructure application.

      In the world of web servers, enterprise architecture modules, or even online gaming; this could be used to quickly test finite configurations with either static or open-ended metrics. Especially in scenarios where modeling logically is difficult at best, e.g. try modelling a Windows 2008 R2 in memory 🙂

      I had already tested this against a web server farm with similar results. I chose the image generation example to illustrate the simple concept.

      Nick

      • Thanks for the reply Nick.

        Cast in that light I could see some utility in things like automatically tuning server performance. I suspect the level of success there would depend on what levers were available and how closely the simulation mimicked real-world conditions, but I wouldn’t be too surprised if you could come up with some interesting findings based only on mutations of sysctl settings and off the shelf load testing tools.

        Nice work.
