Become an expert Hadoop Administrator by getting hands-on with Hadoop clusters, including planning and deployment and monitoring the Hadoop Distributed File System. The course also takes a hands-on approach to the Hadoop ecosystem, which consists of YARN, MapReduce, HDFS, Cloudera Manager, Hadoop clusters with Hive, HBase, Pig, and Flume, and RDBMS integration using Sqoop.
Become a Hadoop Administrator by mastering Hadoop clusters! Cognixia’s Big Data Hadoop Administrator course is specifically designed to provide hands-on experience installing, configuring, and managing the Apache Hadoop platform.
By the end of the module, the student will understand the basics of big data and will have a foundation in Hadoop daemons and Hadoop architecture.
- a. Understanding Big Data Basics
b. Big Data Use Cases
c. Introduction to Hadoop
d. Understanding Hadoop Ecosystem
e. Introduction to HDFS
- a. Introduction to Namenode
b. Introduction to Datanode
- a. Introduction to Secondary Namenode
- a. Introduction to MapReduce
- a. Introduction to JobTracker
b. Introduction to TaskTracker
- a. Summarizing Hadoop Architecture
b. Roles and Responsibilities of a Hadoop Administrator
By the end of the module, the student will be able to create a multi-node Hadoop cluster. To prepare students for this, the module gives a deep understanding of how Linux works, how to set up virtual machines, and how to set up password-less SSH.
- Linux internals
- i. Commands that are required
ii. Linux basics
- Hadoop Cluster Installation Pre-requisites
- Pre-requisites of Hadoop Installation
- i. Software Downloads
ii. Preparing your VM
iii. Enabling VM with VMware
iv. Understanding mandatory changes in the operating system
- Installation and Configuration
- i. Understanding Hadoop cluster installation modes
ii. Understanding Hadoop Version 1 installation and configuration
iii. Password-less SSH setup
- Hands-On Practice for creating a Hadoop cluster
- Helping individuals in practicing Hadoop cluster installation
By the end of the module, the student will understand how to plan a production Hadoop cluster. Students will understand the hardware and software requirements of a Hadoop cluster, performance tuning after cluster creation, and benchmarking.
By the end of the module, the student will be able to administer a Hadoop cluster. Students will understand how to copy data from one Hadoop cluster to another, how to use different Hadoop schedulers to run jobs, how to back up and recover metadata, data, configurations, and application data, and how to recover cluster data.
By the end of the module, the student will understand how the next version of Hadoop and YARN work. The module also covers the new features of Hadoop Version 2 and the YARN framework, and how to deploy a Hadoop 2 cluster in pseudo-distributed and multi-node distributed modes.
- i. Hadoop 2.0 new features
- i. Understanding Resource Manager
ii. Understanding Application Master
iii. Understanding Node Manager
iv. Understanding Hadoop 2 Job Execution Framework
- Hadoop 2 Multi-node cluster creation
- i. Pre-requisites of Hadoop Installation
ii. Software Downloads
iii. Preparing your VM
iv. Enabling VM with VMware
v. Understanding mandatory changes in the operating system
vi. Installation and Configuration
vii. Understanding Hadoop version 2 installation and configuration
viii. Password-less SSH setup
By the end of the module, the student will know how to achieve Namenode high availability, how to enable Namenode Federation, and what the various improvements in Hadoop 2 are.
- Practice Hadoop 2 Multi-node Cluster Creation
- Helping individuals in practicing Hadoop 2 cluster installation
- a. Sample YARN job execution
c. Understanding Issues of Hadoop 1
d. Understanding improvements in Hadoop 2
e. Namenode Federation
- Enable segregation of HDFS using multiple Namenodes
- Namenode – High Availability
- i. Achieving Namenode High-Availability using Quorum Journal Manager
ii. Achieving Namenode High-Availability using Network File System
- Implementation of NN High Availability
- Helping individuals achieve Namenode High Availability
By the end of the module, the student will be able to administer the basics of Hadoop ecosystem components such as Hive, HBase, Sqoop, Flume, and Pig.
- Hadoop Ecosystem Introduction
- Understanding the integration of Hadoop ecosystem
- Touch base with Hive
- i. What is Hive?
ii. Architecture of Hive
iii. Understanding Hive meta-store concepts
- i. Understanding HBase Basics
ii. Understanding HBase Storage Model
iii. Understanding HBase Architecture
iv. Cluster Installation and Configuration
- i. What is Pig?
ii. How does Pig integrate with a Hadoop cluster?
iii. Demo of Pig Jobs using MapReduce
- i. What is Sqoop?
ii. How to import and export data between Hadoop and an RDBMS using Sqoop
iii. Example of Sqoop jobs using MySQL
- i. What is Flume?
ii. Sample Flume jobs
By the end of the module, the student will be able to build a multi-node Cloudera cluster using Cloudera Manager, achieve high availability, and add a new node to the cluster using Cloudera Manager.
- Understanding the internals of Cloudera Manager
a. Understanding the automation of Hadoop installation using Cloudera Manager
b. Understanding Cloudera Hadoop Distribution and Cloudera Manager
c. Understanding the underlying directory structure of Cloudera Hadoop
d. Cloudera Hadoop Cluster Installation – CDH
Yes, the course completion certificate is provided once you successfully complete the training program. You will be evaluated on parameters such as attendance in sessions, an objective examination, and other factors. Based on your overall performance, you will be certified by Cognixia.