Microsoft is in the news for finally ending its extended, months-long preview period for HDInsight and rolling out the welcome mat for big data workloads in the Microsoft Windows Azure cloud computing platform. Just in case you’re not quite sure what exactly Hadoop is, let me give a quick definition. Hadoop is an open-source software framework that was developed by the Apache Software Foundation. This platform is written in Java with a specially designed Hadoop Distributed File System (HDFS), created for cross-platform operating systems. All of the Hadoop modules are designed with the fundamental assumption that hardware is going to fail; because of that, the software of the framework can and will handle the hardware failure of a single machine or rack of machines. In a side note on its history, Apache Hadoop’s MapReduce and HDFS components were originally derived by developers from Google’s own MapReduce and Google File System (GFS).
Hadoop’s primary purpose is batch processing, which has made it one of the dominant big data analytics platforms. It owes a lot of its popularity to its SQL-friendly hooks as well as to its enablement of real time interactive analytics and its ability to scale its ecosystem of products. In fact, back in 2010, Facebook claimed they had the largest Hadoop cluster in the world, with over 21 petabytes of storage; by 2012 they had announced that the data had grown to over 100 petabytes.
It seems like almost all of the major technology companies, like Amazon, Google, IBM, EMC, Netflix, Twitter, and VMware, to name a few, have jumped on the Hadoop bandwagon. It is no surprise that Microsoft would become part of that list. It has been stated that Microsoft’s long-term strategy is to bring big data to a billion people. This is going to be a great fit within the Microsoft world, especially with the SQL-friendly hooks available with Hadoop. It also appears that most companies that have jumped on the Hadoop bandwagon fully realize that Hadoop is currently the cornerstone of how value is realized from big data. Microsoft realizes this and has been investing in the technology to ensure that Hadoop will be an integral part of the Microsoft Enterprise, as well as of Microsoft’s future, by working to integrate HDInsight into the Microsoft Office platform and ecosystem.
How does Microsoft plan to achieve its stated goal of bringing big data to a billion people? The answer is by making it easy to spin up and deploy Hadoop clusters as a new deployment or by scaling out a current cluster. This cluster should take a just a few minutes to create and deploy the virtual machines that make up the Hadoop Cluster.
Although this is not new news about Microsoft’s HDInsight, since it has been available in its preview period for about six months, it is a milestone for Microsoft to take HDInsight to general availability and continue the progression of the technology into the Microsoft world of technology. Microsoft is touting the success of HDInsight with partners like the city of Barcelona, Spain. The city is employing the technology to comb through data on traffic patterns, social media mentions, and other sources to drive decision making in transportation, security, and spending. Back here in the states, computer scientists at Virginia Tech are using HDInsight to provide cost-effective access to DNA sequencing tools and resources.
In my opinion, the Hadoop framework is clearly the future of big data analytics. Would we really expect anything less from Microsoft when it makes up its mind to go after and own a technology? Virtualization is another example that comes to mind.