Introduction to Hadoop Administration

Course Code: 1203



This three-day introductory course focuses on helping participants gain a thorough understanding of maintaining a Hadoop cluster and its components. Compared to other cluster architectures, Hadoop is designed for massive scalability and has superior fault tolerance. The course also covers how to install, configure, and maintain Hadoop on Linux in various computing environments.

Course Delivery

This course is available in the following formats:

Live Classroom
Duration: 3 days

Live Virtual Classroom
Duration: 3 days

What You'll Learn

  • Learn to install, configure and maintain the Apache Hadoop framework
  • Explore MapReduce, YARN and Spark
  • Explore Mahout and MLlib as well as other frameworks
  • Explore Hadoop architecture (MapReduce, YARN, HDFS, Spark, Cassandra, HBase, Pig, Hive)
  • Install Hadoop
  • Test-run Hadoop programs (Explore basic tests)
  • Learn to optimize and performance-tune Hadoop
  • Explore installing Hadoop for the cloud and HBase (optional)


Course Outline

  • Hadoop history and concepts
  • Ecosystem
  • Distributions
  • High-level architecture
  • Hadoop myths
  • Hadoop challenges (hardware / software)
  • Selecting software and Hadoop distributions
  • Sizing the cluster and planning for growth
  • Selecting hardware and network
  • Rack topology
  • Installation
  • Multi-tenancy
  • Directory structure and logs
  • Benchmarking
  • Concepts (horizontal scaling, replication, data locality, rack awareness)
  • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
  • Health monitoring
  • Command-line and browser-based administration
  • Adding storage and replacing defective drives
  • Parallel computing before MapReduce: compare HPC versus Hadoop administration
  • MapReduce cluster loads
  • Nodes and daemons (JobTracker, TaskTracker)
  • MapReduce UI walkthrough
  • MapReduce configuration
  • Job config
  • Job schedulers
  • Administrator view of MapReduce best practices
  • Optimizing MapReduce
  • Foolproofing MapReduce: what to tell your programmers
  • YARN: architecture and use
  • Hardware monitoring
  • System software monitoring
  • Hadoop cluster monitoring
  • Adding and removing servers and upgrading Hadoop
  • Backup, recovery, and business continuity planning
  • Cluster configuration tweaks
  • Hardware maintenance schedule
  • Oozie scheduling for administrators
  • Securing your cluster with Kerberos
  • The future of Hadoop


Prerequisites

Participants should be comfortable navigating the Linux command line and have basic knowledge of a Linux text editor, such as vi or nano, for editing code.

Who Should Attend

The course is highly recommended for:

  • Hadoop administrators
  • Software administrators
  • System administrators
