In the previous blog post, we looked at how the Apache Software Foundation (ASF) works. The software developed under the ASF, such as Hadoop, Pig, Hive, Spark and others, is becoming the foundation for building platforms. There is no denying that without the ASF, the IT world would look entirely different. The ASF is shaping not only how we think about what software is, but also how it should be built. So far, so good. The only gripe is that ASF projects focus much more on building new and cool features than on usability.
It's not easy for those who want to get started with Big Data to install the myriad of Apache Big Data software and integrate it. There are entire companies (like Cloudera, Hortonworks, MapR and others) whose main purpose is to take the different software from the ASF and make sure the pieces play nicely with each other. For those who want to get started with Big Data but are stuck on installation and configuration, Collabera TACT has created the following.
Big Data Virtual Machine (VM): The VM runs Ubuntu, with Big Data software such as Hadoop, Hive, Pig, Sqoop and Oozie already installed and configured, so that any Big Data enthusiast can get started easily. The VM runs on a laptop or desktop with at least 3 GB of RAM, 20 GB of hard disk space, and a processor bought within the last 3-4 years.
All the software used in the Big Data VM is free and open source, so there is no expiry date. A further advantage is that the VM is self-contained and works offline; there is no need to be connected to the internet. For those who are curious about the Big Data VM, the stack below shows how the different pieces of software fit together.
Big Data Cluster: The VM runs on the local machine. It is like a personal edition of the different Big Data software and makes it very easy to get started with Big Data. Collabera has also created a cluster (a group of machines) using Cloudera CDH. This cluster mimics a production environment with multiple machines. From their local machine, users can log in remotely to the cluster's gateway and run MapReduce programs, put and get data in HBase, and carry out other Big Data activities.
Since the cluster is a production-like environment that multiple users may be accessing at any point in time, access to it will be limited. Access to the Big Data Virtual Machine (VM) and the Cloudera Big Data Cluster is provided to those who enroll in a Big Data course with Collabera TACT. There is no time limit on using the VM, while access to the cluster is provided to all participants for two years from the beginning of the course. Here are more details about the different course offerings from Collabera TACT.