Introduction to Hadoop Administration

Live Classroom
Duration: 3 days
Live Virtual Classroom
Duration: 3 days

Overview

This three-day introductory course focuses on helping participants gain a thorough understanding of how to maintain a Hadoop cluster and its components. Compared to other cluster architectures, Hadoop is designed for massive scalability and has superior fault tolerance. The course also covers how to install, configure and maintain Hadoop on Linux in various computing environments.

What You'll Learn

  • Learn to install, configure and maintain the Apache Hadoop framework
  • Explore MapReduce, YARN and Spark
  • Explore Mahout and MLlib as well as other frameworks
  • Explore Hadoop architecture (MapReduce, YARN, HDFS, Spark, Cassandra, HBase, Pig, Hive)
  • Install Hadoop
  • Test-run Hadoop programs (explore basic tests)
  • Learn to optimize and performance-tune Hadoop
  • Explore installing Hadoop for the cloud and HBase (optional)

Curriculum

  • Hadoop history and concepts
  • Ecosystem
  • Distributions
  • High-level architecture
  • Hadoop myths
  • Hadoop challenges (hardware / software)

  • Selecting software and Hadoop distributions
  • Sizing the cluster and planning for growth
  • Selecting hardware and network
  • Rack topology
  • Installation
  • Multi-tenancy
  • Directory structure and logs
  • Benchmarking

  • Concepts (horizontal scaling, replication, data locality, rack awareness)
  • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
  • Health monitoring
  • Command-line and browser-based administration
  • Adding storage and replacing defective drives

  • Parallel computing before MapReduce: compare HPC versus Hadoop administration
  • MapReduce cluster loads
  • Nodes and daemons (JobTracker, TaskTracker)
  • MapReduce UI walkthrough
  • MapReduce configuration
  • Job config
  • Job schedulers
  • Administrator view of MapReduce best practices
  • Optimizing MapReduce
  • Foolproofing MapReduce: what to tell your programmers
  • YARN: architecture and use

  • Hardware monitoring
  • System software monitoring
  • Hadoop cluster monitoring
  • Adding and removing servers and upgrading Hadoop
  • Backup, recovery, and business continuity planning
  • Cluster configuration tweaks
  • Hardware maintenance schedule
  • Oozie scheduling for administrators
  • Securing your cluster with Kerberos
  • The future of Hadoop

Who should attend

The course is highly recommended for:

  • Hadoop administrators
  • Software administrators
  • System administrators

Prerequisites

Participants need to be familiar with navigating the Linux command line and have basic knowledge of a Linux text editor, such as vi or nano, for editing code.
