Eucalyptus, a “self-build” Amazon Cloud

Open Source virtualization is not just about Xen and KVM any more.  The innovation is now elsewhere, in companies like Eucalyptus Systems Inc., founded and venture-funded in 2009 and based in Santa Barbara.

Eucalyptus Systems is a cloud computing company. Its product, Eucalyptus, is not in itself a Cloud; rather, it is a software stack that, when added to a standard virtualized data-center or co-located server network, turns it into a Cloud which looks exactly like the Amazon Elastic Compute Cloud (EC2). It is a “self-build” Amazon Cloud kit. Just add hypervisor.

Eucalyptus Systems seems most focused on Enterprises and/or Outsourcers building private clouds in their datacenters. Service providers could also use Eucalyptus to build their own public cloud to compete with Amazon (although no billing software is included with Eucalyptus).

Eucalyptus is an additional layer on top of the virtualization technologies of VMware, Xen and multiple varieties of Linux, and provides APIs to this layer which make it look exactly the same as Amazon Cloud.  Whilst Eucalyptus is not wedded to the Amazon Cloud API, its use has two main advantages:

  • Customers can make use of the tools ecosystem of the Amazon Cloud.
  • Customers can build a part-private part-Amazon cloud (a phenomenon known as cloud-bursting).
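
To make the first advantage concrete: if the API is the same, a client tool mostly needs a different endpoint to talk to a private cloud instead of Amazon. The following is a minimal sketch, assuming EC2 Query-style parameters (`Action`, `ImageId`, `MinCount`, `MaxCount`). The host names and the image ID are placeholders invented for illustration, and request signing is deliberately left out, so a real service would reject these URLs as they stand.

```python
from urllib.parse import urlencode

# Placeholder endpoints: both values are assumptions for illustration only.
ENDPOINTS = {
    "amazon": "https://ec2.amazonaws.com/",
    "private": "http://cloud.example.internal:8773/services/Eucalyptus",
}


def run_instances_url(cloud: str, image_id: str, count: int = 1) -> str:
    """Build an unsigned RunInstances request URL for the chosen cloud."""
    params = {
        "Action": "RunInstances",
        "ImageId": image_id,
        "MinCount": count,
        "MaxCount": count,
    }
    return ENDPOINTS[cloud] + "?" + urlencode(params)


# The request itself is identical; only the endpoint differs.
public_url = run_instances_url("amazon", "img-placeholder")
private_url = run_instances_url("private", "img-placeholder")
print(public_url)
print(private_url)
```

The point of the sketch is that the query string is built once and reused: tooling written against one endpoint should need little more than a configuration change to target the other.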

Amazon EC2 works through the instantiation of pre-configured virtual machine images on-demand. These images are sourced from and dynamically connected to an underlying storage system, and network access is dynamically configured for them as they are instantiated.

Amazon EC2 is based on a lot of Open Source software, but because Amazon does not distribute code, it is exempt from the provisions of the GNU General Public License (GPL) which might otherwise require onward re-distribution of the actual EC2 codebase.  Eucalyptus is a separate codebase, a re-implementation of the same APIs.  Eucalyptus is architected around five components which together mimic EC2. The components each expose web service APIs described in WSDL and inter-authenticate through WS-Security.

The Cloud Controller is the entry point to the cloud.  It provides the external web services API which allows instantiation of images, and performs load-balancing amongst physical hosts, as well as providing a gateway to external clouds.  It is the gateway by which users access another web service which offers a cloud Management Platform.
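
As a rough illustration of the placement role described above, the sketch below models a front door that accepts a run request and picks a physical host for it. The class names, the notion of "slots", and the most-free-slots policy are all assumptions made for this example; the text says only that the Cloud Controller load-balances, not how.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: int  # assumed unit: maximum number of VM instances
    running: list = field(default_factory=list)

    @property
    def free_slots(self) -> int:
        return self.capacity - len(self.running)


class CloudController:
    """Toy entry point: accepts run requests and places them on hosts."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def run_instance(self, image_id: str) -> str:
        # Assumed policy: choose the host with the most free slots.
        candidates = [h for h in self.hosts if h.free_slots > 0]
        if not candidates:
            raise RuntimeError("no capacity left in the cloud")
        target = max(candidates, key=lambda h: h.free_slots)
        target.running.append(image_id)
        return target.name


cloud = CloudController([Host("node-a", capacity=2), Host("node-b", capacity=1)])
placements = [cloud.run_instance("img-placeholder") for _ in range(3)]
print(placements)
```

The caller never names a host. That is the property that matters here: the physical layout stays hidden behind the cloud-level request.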

Clouds are composed of multiple clusters (think of these as typically being internal IP subnets) each of which is managed by a Cluster Controller which speaks to an agent known as a Node Controller, which sits on each of the physical machines. The Node Controller’s job is to execute, configure and terminate the instances of virtual machines which run on the physical nodes according to requests from the Cloud Controller. It will use the underlying APIs of the virtual environment, just as regular third-party management or automation tools might. Its function is to map the API calls coming in at the Cloud API level into a corresponding set of calls at the relevant Virtual Infrastructure API.  It dynamically fetches and configures VM instance images (kernel, root file system, ramdisk, etc.) and configures the local networking on the host.
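
The mapping from cloud-level calls to hypervisor-level calls is essentially an adapter. The sketch below shows the shape of that idea only: the `Hypervisor` interface, its method names, and the recording backend are hypothetical stand-ins, not the actual Eucalyptus or hypervisor APIs.

```python
from abc import ABC, abstractmethod


class Hypervisor(ABC):
    """Hypothetical interface standing in for a virtual infrastructure API."""

    @abstractmethod
    def define_vm(self, name: str, kernel: str, rootfs: str, ramdisk: str) -> None: ...

    @abstractmethod
    def start_vm(self, name: str) -> None: ...

    @abstractmethod
    def destroy_vm(self, name: str) -> None: ...


class RecordingHypervisor(Hypervisor):
    """Fake backend that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def define_vm(self, name, kernel, rootfs, ramdisk):
        self.calls.append(("define", name))

    def start_vm(self, name):
        self.calls.append(("start", name))

    def destroy_vm(self, name):
        self.calls.append(("destroy", name))


class NodeController:
    """Translates cloud-level requests into hypervisor-level calls."""

    def __init__(self, hypervisor: Hypervisor):
        self.hypervisor = hypervisor

    def run_instance(self, instance_id: str, image: dict) -> None:
        # One cloud-level call fans out into several lower-level calls.
        self.hypervisor.define_vm(
            instance_id, image["kernel"], image["rootfs"], image["ramdisk"]
        )
        self.hypervisor.start_vm(instance_id)

    def terminate_instance(self, instance_id: str) -> None:
        self.hypervisor.destroy_vm(instance_id)
```

Supporting another hypervisor would then mean writing another `Hypervisor` subclass, while everything above the Node Controller keeps speaking the Cloud API unchanged.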

Images are stored and retrieved through the Amazon Machine Image image-management interface which is layered onto the S3 API, one of the two Amazon Storage APIs.  S3 provides a simple put/get interface with no locking or guaranteed ordering of transactions.  It is hugely scalable, but given its “eventually-consistent” semantics is only applicable for certain forms of transactions (e.g. Web Services).  The other API is known as Elastic Block Store (EBS).  This is essentially a remote block device, usually formatted with a file system and accessed through traditional filesystem APIs, but attached to a single VM.  The volume can be persisted and/or snapshotted via the S3 interface.
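
To show what “eventually-consistent” means for a put/get interface, here is a toy in-memory model. It is not how S3 or any real object store is implemented: the replica list, the `pending` queue and the explicit `sync()` step are assumptions chosen to make the behaviour visible, and for simplicity the toy propagates writes in order, which the real interface does not promise.

```python
class EventuallyConsistentStore:
    """Toy put/get store: writes land on one replica and spread later."""

    def __init__(self, replicas: int = 2):
        self.replicas = [{} for _ in range(replicas)]
        self.pending = []  # writes not yet copied to every replica

    def put(self, key: str, value: bytes) -> None:
        # No locking: a later put simply overwrites an earlier one.
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def get(self, key: str, replica: int = 0):
        # A read may hit a replica that has not seen the latest write yet.
        return self.replicas[replica].get(key)

    def sync(self) -> None:
        """Propagate pending writes, in order, to the other replicas."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("image", b"v1")
stale = store.get("image", replica=1)   # may not see the write yet
store.sync()
fresh = store.get("image", replica=1)   # sees the write after propagation
print(stale, fresh)
```

A reader that hits the second replica before propagation gets nothing back, which is why this kind of store suits write-once objects such as machine images better than workloads that need read-after-write guarantees.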

Eucalyptus implements the S3 API through its own storage stack called “Walrus” and the EBS persistent storage abstractions through its Storage Controller. Both of these systems are written as independent service components that deploy as part of a Eucalyptus installation.

Eucalyptus does not implement the two other services, Amazon SimpleDB and Amazon Simple Queue Service (Amazon SQS), as part of the cloud platform itself because equivalent tools are available that can be run in Eucalyptus-hosted VMs.

Eucalyptus was derived from a UCSB research project and is licensed under an “Open-Core” model where the core technology is Open Source under GPL, and there are additional modules which are commercially licensed to form Eucalyptus Enterprise Edition. The only feature for which a commercial licence is currently required is the support of VMware virtualized servers.  Support for Open Source hypervisors is free.  Additional features may be added to the Enterprise Edition in due course, and of course it provides a support contract.

The Open Source version is bundled with Debian Squeeze and Ubuntu 9.04 (Jaunty), and will be part of the next Ubuntu release, 9.10 (Karmic Koala), and the 10.04 LTS release (Lucid Lynx) next Spring.  Eucalyptus Systems themselves provide integrations with the other main Linux distros.  The Open Source version has a sizable community, and the Enterprise Edition is sold through a mix of direct sales and strategic systems integrators, in the manner of a traditional enterprise software product.

Our interest in Eucalyptus is that it has the potential to be game-changing. The emergence of a broadly-adopted API can often be associated with a significant market transition.  The key factor here is that the owner of the application infrastructure within the cloud does not need to concern him or herself with the APIs to the underlying virtualization layer that implements the cloud.  Rather they work to the Cloud APIs.

The higher level API is used to encapsulate and subsume any underlying APIs.  If the Cloud API is widely adopted, not only is the hypervisor subsumed (and thus commoditized), but much of what we now consider virtualization management is commoditized too.

One other interesting aspect of the Amazon Cloud API is that it may provide an opportunity for standardization. There are not that many truly cross-platform standards in the virtualization arena.  There is libvirt, a de facto standard API to a Linux VM which Red Hat has promoted, but it doesn’t work on VMware or Hyper-V; and there are standards emerging around Virtual Machine packaging (DMTF’s OVF), but that is a small piece of the overall puzzle. The problem is that standards emerge through usage, but when there is too much usage there are entrenched positions that are difficult to resolve into standards.

We are at the point in the adoption curve where sufficient usage of the Amazon API has occurred to understand its efficacy. Furthermore, since the definition of the Amazon APIs is based on existing open web standards, they aren’t bound into specific implementation platforms and languages (like, for example, WMI on Microsoft), and can be universally implemented. In fact there are at least two other standards initiatives in the area, involving VMware, IBM and Microsoft, and it should not be beyond this group to get together and sort out a common standard. Eucalyptus Systems seems content to take whatever API Amazon gives it, but would likely gain significantly from standardization of the Amazon Cloud API.  Standards would bring confidence to end-users and to the open source and commercial tools ecosystem around Cloud.

The key difference Eucalyptus Systems sees between a standard virtualized datacenter environment and a cloud is the separation between the configuration of the layer of physical machines, and the configuration of the layer of virtual machines.  One group of administrators cares about physical machines and about providing an infrastructure up to and including the Cloud API.  Another group of administrators cares about the virtualized infrastructure that is instantiated and configured and persisted through the Cloud API. The administrator of the Cloud provides a service to the administrator of the application infrastructure, who in turn provides a service to the user.

Assuming some momentum builds around the use of Amazon Cloud APIs to bridge the gap between Public and Private clouds, it is worth re-visiting our recent post “Cloud Computing and the End Run around IT – Here We Go Again” which discussed the cultural issues around adoption of Cloud computing by enterprises.

Whilst some cultural barriers remain (will IT really be content to own the private Cloud, ceding ownership of the virtualized application infrastructure to the Department, in a re-run of the 1990s departmental server?), Eucalyptus at least helps with one scenario we considered: the use of a public cloud for quick Departmental application development, followed by bringing the application into IT’s datacenter to operate in production.  Eucalyptus can facilitate that transition because, rather than providing a data-center-like environment in the Cloud, it provides a cloud-identical environment in the data-center.