I am very prone to drifting off into thoughts about patterns in real life and how they correlate to things I deal with in my work life. I am fascinated by the thought of the constantly blurring line between ourselves and technology. It is really amazing to think about how social and mobile technologies have changed the way we work, communicate, and relax. I am just as guilty as the next guy of constantly tweeting during my vacation, contacting people only through Facebook, and not having written an actual letter in over ten years.
It is out of this day-dreaming that I often start thinking about current cloud designs and how I would change them. In my mind, both public and private clouds have several core demands that have been around for a while and are an essential part of the expectations for any computing utility. A simple list would include being cost effective, performant, reliable, secure, and scalable. I could spend a large amount of time defining the rules for what makes a good “cloud,” but instead I will move forward with the assumption that a cloud service provides the same or better relative utility while being cost effective to the consumer. You can find a great many blogs and personalities out there that do a much better job of defining a robust cloud service offering. My thoughts are more focused on how that actually happens.
I am an oddball when it comes to technology. I do a lot of different things and have made it a habit to refuse to fit in any box. I am easily inspired and even more easily entertained. I have a reasonable amount of experience in fun infrastructure topics like networking, storage, systems, patching, and security. I also have a solid background on the software engineering side, with terms like software development life cycles, integration, regression, unit testing, APIs, monads, semaphores, enterprise service buses, and builds lodged permanently in my head. From this I have been able to look at the whole stack in technology terms and have conversations with people across multiple areas of responsibility. And what I see with most deployments is a mindset still focused on a *stack*, even in cloud computing.
I am not saying that the approach used today to stack-build most private and some public clouds is completely wrong. I just ask myself: what if there were a better way? In my mind we waste effort in making silos of people and processes. We waste effort in segmentation of knowledge and designs. Even worse, we adopt principles that lock cloud designs into a layers-in-a-stack-only approach. The greatest benefit of the logical abstraction brought by virtualization in compute, networking, storage, and more is that we can now utilize these models via constructs normally reserved for software engineering designs. Of these, I believe logical automation is one of the biggest benefits.
If you think about it, logical automation has already been around a long time in the software engineering world. It would be ludicrous for a modern software application or enterprise system to require a user to manually type code to patch to a new version. Or imagine if you had to manually modify registry entries every time you needed to change a setting in Outlook. Because modern software languages and frameworks allow us to orchestrate software to meet maintenance and operational needs, we install things with a few clicks and drastically change behavior the same way. And we use automation via software to run some of the most complex systems in the world. This is important because it is the basis of my theory on why a change is possible. If my server’s compute resources can be divided and subdivided, storage latency can be maintained dynamically, and network traffic can be adjusted, all based on frameworks, languages, and APIs, then I can control these components in much the same way I control my applications above. It is out of this push towards generalization through abstraction that I thought of the Organic Cloud.
My idea came from reading a discussion on one of my favorite websites about the feasibility of humankind ever truly reaching a faraway star system, given what we know of the limits of physics. Reading this, I realized my own personal opinion is that ultimately, the best vehicle for travel outside our solar system will be organic-based. It is amazing to observe the adaptability and tenacity of the world around us. Take an organic vehicle designed to adapt to varying degrees of radiation, foreign objects, gravitational forces, and heat, and you have a possible survivor of such a long voyage. What is important is the idea that any one portion of the vehicle is able to sustain damage without dying, heal itself, and adapt to changing conditions. This got me thinking about the nature of ourselves. What makes *me* adaptable? What makes my body self-heal? Or, a better question: what is the design structure my body follows that allows this? With the goal of building cloud services that provide high-level utility, at high reliability, while dealing with random failures throughout the stack, how can patterning cloud designs on an organic model help?
Out of that thought process, I considered how almost every component of our body is built around a simple design at the root: the human cell. Each cell can be of a specific type that performs a specific function. Take these tiny constructs with a multitude of purposes, organize and manage them, and you have a robust system that is the basis for all organic life. And interestingly enough, other than during early fertilization, at no point does a human life fail because a single cell dies. This suggests an interesting correlation for building reliability into a cloud service. So from this I will present a couple of ideas on how we can synthesize a cloud that is more organic than artificial in design.
As more and more of the lower levels of infrastructure make their way into legitimate components driven as abstracted virtual machines or services, we can change the way we look at a *layer* in the stack. I propose that we choose design principles that assume every layer is built of Basic Units. A Basic Unit is the representation of functionality alone. This functionality is open-ended and is explained further on in the post. And, similar to the way organic cells work, these Basic Units follow rules that are implicit throughout the service. The following rules are meant to be strict in concept but loose in how they are implemented. The rules for the Basic Units are:
Rule #1: They are allowed to die.
Even better, we assume they will die. From the pages of telephony history and the models used in Erlang and Akka, we need to assume that any Basic Unit can fail. If we assume this, we automatically adopt design patterns that won’t place dependencies on a single unit. More importantly, we design functionality as provided by groups of units. We see each Basic Unit as expendable and easily replaced. This does require managing atomicity at the data level (another Basic Unit function) when dealing with certain cases. But most of these problems can themselves be broken down further into the Organic Model. Accepting death has benefits when coupled with the management that automation allows (more on this later).
Rule #2: A layer is made of Basic Units
Just as my skin is made of many cells, each layer of a service designed to produce a specific utility should depend on the rule above and be built as a group of interconnecting Basic Units. This means we have to think of layers as interconnecting units as well as unique services to the needs above or below them in a design stack. In a lot of ways, designs in many layers already fit this: networking switches, clustered file systems, and even web farms are designed as a service provided by a group of objects built to handle failures. A layer can be built of multiple types of Basic Units, and with complex services may have dozens or a hundred different types. The end layer produces a similar result to a common stack-based design, albeit with many possible new benefits.
Rule #3: A layer must self-heal lost/failed units
This is definitely the big one in the list. In order for the layer to be resilient, it must be able to easily replace Basic Units that have failed (we assume they will). There needs to be built-in logic in each layer that deals with this. Even if the end result requires user intervention, we still must have something in place to remediate and monitor. I cover how this is done later in the post.
Rule #4: A Basic Unit can be anything
Being the virtualization guy I am, I would love to insist that every Basic Unit is a virtual machine. However, that is too strict a rule. A virtual machine is a great example of something that can be designed as a Basic Unit for a multitude of possible uses. In cases where encapsulating the functionality (core driver) into a VM is easy and can follow the rules above, it is a time-saving way to create our unit. And in the case of legacy x86 applications that must be adapted into the model, it is a must. But at some point things have to become physical. It is at the lowest infrastructure layers that hardware must be designed to be scalable, modular, programmable, and easily replaceable to fit our Basic Unit model. In my own personal opinion, technology like Cisco UCS is a great example of something that can be extended with this design. Take the statelessness of the UCS blades and use other Basic Unit types (Support Units, explained below) to manage and monitor the core Basic Units (the blades). This application of the design supplies a cell-based, resilient layer of compute resources to the layers above. It gives us the flexible Basic Unit model by extending the logical abstraction used by UCS to create blades that are: allowed to die, make up a layer, provide a similar service, and can be self-healed by automatic replacement using stateless identification (given reserve capacity).
These same rules also apply further up the stack. If a function is extremely efficient as a memory-resident process within an operating environment, then a Basic Unit can be as small as a single software process. We ensure that the VMs (or stateless physical servers) that run the processes follow the same rules, but design for a layer within memory or partitioned memory inside a few or many VMs (never just one). This is important for efficiency of resources. In the end we don’t place a Basic Unit into a single box. We design for the need while being cognizant of efficiency and following the simple rules.
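To make the four rules a bit more concrete, here is a minimal Python sketch of them. Everything here is a hypothetical illustration, not a real framework: `BasicUnit`, `Layer`, and the factory are names I am inventing for the example. The point is only that a layer depends on a group of interchangeable units, any of which may be dead, and that healing is just replacing the dead ones.

```python
import random


class BasicUnit:
    """Rule #4: a unit can be anything -- here, just a wrapper around a callable."""

    def __init__(self, unit_id, func):
        self.unit_id = unit_id
        self.func = func
        self.alive = True  # Rule #1: it is allowed (and expected) to die

    def handle(self, request):
        if not self.alive:
            raise RuntimeError("unit %s is dead" % self.unit_id)
        return self.func(request)


class Layer:
    """Rule #2: a layer is nothing but a group of interchangeable Basic Units."""

    def __init__(self, unit_factory, size):
        self.unit_factory = unit_factory
        self.units = [unit_factory(i) for i in range(size)]

    def handle(self, request):
        # Rule #1 again: never depend on one unit; try survivors in random order.
        for unit in random.sample(self.units, len(self.units)):
            if unit.alive:
                return unit.handle(request)
        raise RuntimeError("entire layer has failed")

    def heal(self):
        # Rule #3: replace dead units with fresh ones so attrition never wins.
        self.units = [u if u.alive else self.unit_factory(u.unit_id)
                      for u in self.units]


# A unit dies, the layer still serves, and a heal pass restores full strength.
layer = Layer(lambda i: BasicUnit(i, lambda r: r * 2), size=3)
layer.units[0].alive = False
result = layer.handle(21)   # served by a surviving unit
layer.heal()                # dead unit replaced
```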
The Support Unit
At this point I have ventured a medium distance from mainstream design patterns. I have taken common design patterns in use already, modified the picture, and applied them broadly with my own perspective. But it is the broadly applied design that enables my favorite part of this idea.
When you take a look at organic models you find that the cells with primary purposes are supported by systems of cells that provide essential management and support. In my proposed design I call Basic Units dedicated to assisting core units: Support Units.
The Support Units are the foundation of what makes the Organic Cloud work. They provide the automation, monitoring, management, and even security necessary to enable a shifting, organic-like mass of complex services and functionality. As with any design, the support services must be as scalable and reliable as the core utility. The Support Unit is based on the Basic Unit model, with the addition that its core functionality is dedicated to supporting one or many Basic Units. I will walk through a couple of examples to illustrate.
Since we accept that any Basic Unit can fail, we accept that attrition will, given enough time, reduce a layer's ability to provide its service. We must have a system in place that identifies units that are failing or about to fail, kills units gracefully where possible (units may self-kill), and successfully spawns new units to prevent attrition. Regeneration Support Units would follow the same rules as regular Basic Units while providing this functionality. They would understand how to kill and spawn new units of one or multiple types, depending on the layer in which they reside. They would also manage their own level of service, self-killing and spawning sibling units. This enables recovery from failures through an organic, self-healing design. These units could also be fed information on how to measure and evaluate their units by external layers tasked with adapting to change.
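At its core, a Regeneration Support Unit is a reconciliation loop. A minimal sketch of one pass, with hypothetical hook names (`is_failing` would be fed by Monitoring Support Units, `kill` and `spawn` are layer-specific), might look like this:

```python
def regenerate(units, desired_count, is_failing, kill, spawn):
    """One reconciliation pass for a hypothetical Regeneration Support Unit.

    units: the Basic Units currently in the layer
    desired_count: how many units the layer should maintain
    is_failing(u): predicate supplied by monitoring
    kill(u) / spawn(): layer-specific teardown and creation hooks
    """
    # Gracefully kill units that are failing (or about to fail).
    survivors = []
    for unit in units:
        if is_failing(unit):
            kill(unit)
        else:
            survivors.append(unit)

    # Spawn replacements until the layer is back at full strength.
    while len(survivors) < desired_count:
        survivors.append(spawn())
    return survivors


# Example: unit 2 is flagged as failing, gets killed, and is replaced.
killed = []
healed = regenerate([1, 2, 3], desired_count=3,
                    is_failing=lambda u: u == 2,
                    kill=killed.append,
                    spawn=lambda: 99)
```

A real version would run continuously, and the Regeneration units themselves would be watched and regenerated by their siblings under the same rules.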
Similar to the Regeneration Support Units, the Monitoring Support Units would be charged with gathering information across all layers. They would not only gather data independently but also share it with each other, creating a shared source of truth about the environment. They would provide functionality to groups of other units to allow them to perform core or support roles. Specialized versions of these units could supply specific information, such as geographical location or closest-neighbor data, that could be critical to cloud service needs. Performance gathering would be one of the critical functions as well. Metrics must be gathered not only on the health of the units within each layer, but also on the overall health of the service provided. This opens interesting possibilities, such as using service-demand information to orchestrate automatic scaling of a layer's units.
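One way the shared source of truth could emerge is through gossip: each Monitoring Support Unit keeps timestamped observations and merges views with its peers, keeping the freshest value for each metric. A toy sketch (the class and method names are mine, purely illustrative):

```python
class MonitoringUnit:
    """Hypothetical Monitoring Support Unit: gathers observations locally
    and gossips them to peers so all units converge on one view."""

    def __init__(self):
        self.view = {}  # metric key -> (timestamp, value)

    def observe(self, key, value, ts):
        self.view[key] = (ts, value)

    def gossip(self, peer):
        # Merge views in both directions, keeping the freshest observation
        # of each metric; tuples compare by timestamp first.
        for key in set(self.view) | set(peer.view):
            best = max(self.view.get(key, (-1, None)),
                       peer.view.get(key, (-1, None)))
            self.view[key] = peer.view[key] = best


# Two units with partial, stale views converge after one gossip exchange.
a, b = MonitoringUnit(), MonitoringUnit()
a.observe("cpu", 0.5, ts=1)
b.observe("cpu", 0.9, ts=2)   # fresher reading wins
b.observe("disk", 0.3, ts=1)  # a learns about "disk" from b
a.gossip(b)
```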
A major component of our own organic bodies is the self-defense system that guards against attacks from microorganisms. I feel this is one of the strongest examples of a well-designed system able to adapt to rapidly changing attacks, and it suggests interesting ways to use the Support Unit model to audit, identify, isolate, and remediate security attacks against core Basic and Support Units. A Security Support Unit could be one of multiple types designed to gather information, directly and by hooking into Monitoring Support Units, and use it to audit activity. They would use this monitoring to identify rogue activity within the cloud service, drawing both on existing knowledge of patterns that indicate an attack and on observing and correlating what happens when an attack takes down a Basic Unit. The important ability is for the Security Support Unit to adapt and learn. Because the Basic Unit model requires that each type of unit behave similarly, units dedicated to security can use this to isolate strange behavior and either alert or remediate with isolation. Since a single unit is never a failure point for a layer or service, our Security Support Units can be aggressive in killing suspect units and triggering Regeneration Support Units to replace them. The ideal design allows the attack surface of a cloud service to shift so quickly that any attack becomes extremely difficult to embed. And by having Security Support Units of multiple types at multiple layers, we can use different collection methods for capturing network traffic, storage traffic, hypervisor commands, application API calls, and database queries. Again, we use a simple model within the unit while applying it broadly. This gives us active and resilient security across all layers.
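The key insight above, that every unit of a type should behave similarly, makes rogue detection surprisingly simple in principle. A hypothetical sketch: compare each unit's behavioral metric against the peer median and flag strong outliers for killing and regeneration (the function name and threshold are my own illustrative choices; a real system would use richer behavioral profiles).

```python
from statistics import median


def find_rogue_units(metrics, threshold=5.0):
    """Flag units whose behavior deviates sharply from their peers.

    metrics: dict of unit_id -> a behavioral measure (e.g. outbound calls/sec).
    Uses median absolute deviation, which is robust to the outlier itself.
    Flagged units can be killed aggressively, since no single unit is
    ever a failure point for its layer.
    """
    values = list(metrics.values())
    if len(values) < 3:
        return set()  # not enough peers to establish "normal"

    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All peers behave identically; anything different is suspect.
        return {uid for uid, v in metrics.items() if v != med}
    return {uid for uid, v in metrics.items()
            if abs(v - med) / mad > threshold}


# Unit "e" is making ~50x the outbound calls of its siblings: suspect.
suspects = find_rogue_units({"a": 10, "b": 11, "c": 9, "d": 12, "e": 500})
```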
The Management Support Units would be a very broad group of many types. These would be the central orchestration systems, similar to our central nervous system. There would be a need for many different types, ranging from units designed to control storage replication policies to units designed to trigger updates within pools of application-functionality Basic Units. Perhaps the coolest use of a Management Support Unit would be in managing the *size* of a layer of units. Let's say we have a set of Basic Units that provide a REST API used by external customers to make transactions. A Management Support Unit could be designed to gather information from Monitoring Support Units on the trend over time in requests to the API units. It would then communicate with the Regeneration Support Units responsible for the API core units and change the required number of units. The Regeneration units would spawn more API units before a commonly busy period and kill redundant units afterward. This would allow for greater efficiency of resources by letting the cloud service adapt to demand by learning patterns.
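The sizing decision at the heart of that example reduces to a small calculation. A sketch, with every name and number being a hypothetical assumption of mine: the Management unit takes demand samples from monitoring, adds headroom so a unit death does not cause overload, and hands the Regeneration units a new target count.

```python
import math


def desired_unit_count(request_rates, capacity_per_unit,
                       min_units=2, headroom=1.25):
    """Hypothetical Management Support Unit sizing decision.

    request_rates: recent requests/sec samples from Monitoring units
    capacity_per_unit: requests/sec one API Basic Unit can sustain
    headroom: spare capacity so losing a unit doesn't overload the rest
    """
    peak = max(request_rates)
    needed = math.ceil(peak * headroom / capacity_per_unit)
    # Rule #1 says units die; never let a layer run on a single unit.
    return max(min_units, needed)


# Peak of 120 req/s with 25% headroom needs 3 units of 50 req/s each;
# a near-idle layer still keeps the two-unit floor.
busy = desired_unit_count([80, 120, 100], capacity_per_unit=50)
idle = desired_unit_count([10], capacity_per_unit=50)
```

The learning the post describes would come from feeding trend forecasts, rather than raw samples, into the same calculation ahead of the expected busy period.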
When I mentioned the orchestration that could enable a Cisco UCS to use its blades as Basic Units, the added functionality that would enable this is a Management Support Unit. A UCS is equipped with all the functionality needed to perform the task; a support management layer that adds monitoring and orchestration is the key to extending UCS into the Basic Unit model. It does this by automatically adapting to change and leveraging UCS capabilities to react as needed.
I could go on and on with examples of how to use Support Units to provide a superior service alongside Basic Units. The use cases for simple things like networking, message buses, load balancing, and more are numerous. What is important about the Support Unit is that it is the collective design that enables the required adaptation. By building unique support services as viable options, you open up the ability to leverage them across the stack in new and exciting ways.
As I type out my nirvana of a cloud service, I am cognizant of the hurdles that would still exist. Things like managing updates of data schemas, locking, control, and more would be obstacles to overcome in some portions of the stack. But even as I type this, many of these problems have been and are being solved in new ways. Design patterns like NoSQL and event-driven architectures also offer interesting possibilities when coupled with organic modeling.
However, almost all of this blog post requires a drastic change in thinking for some groups responsible for areas of the stack (looking at you, infrastructure people). A design of this kind needs extremely strong generalists, innovative architects, and very strong communication within the core teams (DevOps time?). Because of the cellular nature of the design, APIs would be king and intercommunication would be one of the most critical aspects of the environment.
So there it is: my crazy idea of what my dream cloud service would look like. Organic Cloud is my crack at theorizing a new approach at what I consider a critical time in information technology. I believe we have to learn to create services and systems with intelligence and adaptability in order to really make the next step as human beings. I do want to clarify that this entire post is about looking at things differently. I am not saying that my ideas above should be taken strictly. I am just illustrating how the natural environment around us is such a powerful model for adaptation. And I strongly feel the success of virtualization and the newly adopted models of cloud computing make this a great place to start exploring and modeling our designs after the natural world around us.
This being one of my longer posts, I greatly appreciate any comments or feedback.