The previous blog discussed the different ways of setting up a Big Data cluster. The challenge with many of these approaches is that they require multiple machines connected to each other in a cluster. One of the approaches discussed was to use virtualization to set up a small Cloudera cluster on a single machine, as shown below. This blog looks into that approach in a bit more detail.
Any virtualization software, such as VMware or VirtualBox, can be used to run multiple operating systems. VirtualBox from Oracle, however, is free and easy to use. In the screenshot below, there are six virtual machines with different roles, set up as guest operating systems and not yet started. Only one of the guest OSes has to be installed; the others can be cloned from it. Both VMware and VirtualBox support cloning.
VirtualBox server not started
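The cloning step can also be scripted instead of clicking through the VirtualBox UI. Below is a minimal sketch that only builds the `VBoxManage clonevm` command lines and prints them for review. The VM names (`cdh-node1` through `cdh-node6`) are hypothetical, since the actual guest names are not given in this post.

```python
import shlex

# Hypothetical names: one installed guest and five clones made from it.
BASE_VM = "cdh-node1"
CLONE_NAMES = ["cdh-node%d" % i for i in range(2, 7)]


def clone_commands(base_vm, clone_names):
    """Return one `VBoxManage clonevm` argument list per clone."""
    return [
        ["VBoxManage", "clonevm", base_vm, "--name", name, "--register"]
        for name in clone_names
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in clone_commands(BASE_VM, CLONE_NAMES):
        print(shlex.join(cmd))
```

Each clone would still need its own hostname and network settings inside the guest before it can join the cluster.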
In the screen below, all the guest operating systems have started. The Cloudera cluster services are configured to start automatically at OS startup, so the Cloudera software comes up without any manual steps. The different Cloudera cluster services require different amounts of resources (CPU and memory), so each guest OS should be allocated resources appropriate to its role.
VirtualBox server started
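Before allocating resources, it helps to check that the guests fit into the host's memory. The sketch below adds up a per-guest memory plan against a 16GB host and builds the matching `VBoxManage modifyvm` commands. The guest names, roles, sizes and the 2GB host reserve are all assumptions made for illustration, not the values used in this setup.

```python
HOST_RAM_MB = 16 * 1024      # a 16GB host
HOST_RESERVE_MB = 2 * 1024   # assumption: memory kept free for the host OS

# Hypothetical plan: names, roles and sizes are illustrative only.
GUESTS = {
    "cdh-node1": {"memory_mb": 4096, "cpus": 2},  # e.g. Cloudera Manager
    "cdh-node2": {"memory_mb": 2048, "cpus": 2},  # e.g. master services
    "cdh-node3": {"memory_mb": 2048, "cpus": 1},  # e.g. worker
    "cdh-node4": {"memory_mb": 2048, "cpus": 1},  # e.g. worker
    "cdh-node5": {"memory_mb": 2048, "cpus": 1},  # e.g. worker
    "cdh-node6": {"memory_mb": 1024, "cpus": 1},  # e.g. gateway
}


def headroom_mb(guests, host_ram_mb=HOST_RAM_MB, reserve_mb=HOST_RESERVE_MB):
    """Memory left after the host reserve and all guests (negative = overcommitted)."""
    used = sum(g["memory_mb"] for g in guests.values())
    return host_ram_mb - reserve_mb - used


def modify_commands(guests):
    """Return one `VBoxManage modifyvm` argument list per guest.

    modifyvm is meant to be run while the VM is powered off.
    """
    return [
        ["VBoxManage", "modifyvm", name,
         "--memory", str(g["memory_mb"]), "--cpus", str(g["cpus"])]
        for name, g in guests.items()
    ]


if __name__ == "__main__":
    print("Headroom (MB):", headroom_mb(GUESTS))
    for cmd in modify_commands(GUESTS):
        print(" ".join(cmd))
```

A negative headroom means the plan overcommits the host, and some guests should be given less memory.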
Once all the machines have started and the Cloudera cluster has initialized, the Cloudera Manager Web UI can be accessed as shown below. Because everything runs on the same machine, it might take some time for the Web UI to become available.
Cloudera Manager login screen
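Since the Web UI can take a while to come up, a small polling script saves repeated browser refreshes. The sketch below retries an HTTP request until the server answers. The hostname `cdh-node1` is hypothetical, and port 7180 is the Cloudera Manager default, which may differ in a customized install.

```python
import http.client
import time
import urllib.error
import urllib.request

# Hypothetical host name; 7180 is the default Cloudera Manager HTTP port.
CM_URL = "http://cdh-node1:7180/"


def http_probe(url, timeout=5):
    """Return True if the server at `url` sends back any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server is up, it just returned an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # refused, timed out, or not ready to answer yet


def wait_until_up(url, probe=http_probe, attempts=60, delay=10, sleep=time.sleep):
    """Probe `url` up to `attempts` times; return the attempt number that
    succeeded, or None if the server never answered."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


if __name__ == "__main__":
    result = wait_until_up(CM_URL)
    print("Web UI is up" if result else "Web UI did not come up in time")
```

The `probe` and `sleep` parameters exist only so the retry logic can be exercised without a running cluster.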
After logging into Cloudera Manager, the hosts can be managed and monitored from the Hosts tab. Existing nodes can be deleted, and new nodes can be added and monitored, from within the Cloudera Manager UI.
Cloudera Manager hosts
Similarly, from the Clusters tab, new services can be added and existing services can be monitored and managed. For example, Accumulo can be installed, or the default replication factor of blocks in HDFS can be changed. All these activities can also be done from the command line, but the Cloudera Manager UI makes it easier to monitor and manage the Big Data cluster.
Cloudera Manager services
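To see why the replication factor matters on a small cluster, here is a rough sketch of the arithmetic. HDFS stores each block `dfs.replication` times (3 by default), so the raw disk used is roughly the data size multiplied by that factor. The 10GB figure is just an example, and the sketch ignores block-size rounding and non-HDFS overhead.

```python
def raw_storage_gb(logical_gb, replication):
    """Approximate raw disk consumed across the cluster for `logical_gb` of data."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    return logical_gb * replication


if __name__ == "__main__":
    for factor in (1, 2, 3):
        print("replication", factor, "->", raw_storage_gb(10, factor), "GB raw")
```

On a learning cluster with limited disk, lowering the factor trades redundancy for space.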
The whole setup was done on an AMD FX 8320 with 16GB RAM. The resource utilization can be observed in the screenshot below. Tuning the guest and host OSes really helped in running a simple cluster on a single machine.
The above setup is not meant for running Big Data processing in production, but for learning how to create and manage a cluster. This particular configuration is really helpful for those who want to get started with Big Data administration.
In a future blog, we will look into the sequence of detailed steps required to set up the above cluster.