Apache Spark Primer | Spark Essentials, Components, RDDs & UDFs

Live Classroom
Duration: 5 days
Live Virtual Classroom
Duration: 5 days

Overview

Apache Spark Primer is a one-day course designed to introduce developers to the concepts and practices of Spark programming. The course covers a broad range of Spark concepts and gives participants hands-on exposure to the Spark shell, RDDs, and DataFrames. Participants also explore NoSQL, Spark Streaming, Spark SQL, and Spark MLlib to learn how these pieces are integrated to build a larger application.

What You'll Learn

  • Use cutting-edge technologies in the data science spectrum, with an emphasis on Spark and related tools
  • Learn about the basics of Spark architecture and applications
  • Execute Spark programs and understand how the core components of Spark are assembled to build whole applications
  • Create and manipulate both RDDs (Resilient Distributed Datasets) and UDFs (Unified DataFrames)

Curriculum

  • Hadoop ecosystem
  • Hadoop YARN vs. Mesos
  • Spark vs. MapReduce
  • Spark: Lambda architecture
  • Spark in the enterprise data science architecture

  • Spark shell
  • RDDs: Resilient distributed datasets
  • DataFrames
  • Spark 2 unified DataFrames
  • Spark sessions
  • Functional programming
  • Spark SQL
  • MLlib
  • Structured streaming
  • SparkR
  • Spark and Python
  • Exercise: Hello, Spark

  • Coding with RDDs
  • Transformations
  • Actions
  • Lazy evaluation and optimization
  • RDDs in MapReduce
  • Exercise: Working with RDDs
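
The split between transformations, actions, and lazy evaluation in this module can be illustrated without a cluster. The sketch below is a toy, single-machine model written in plain Python; `LazyDataset` and its methods are hypothetical names chosen for illustration and are not Spark's API. It assumes only that transformations record work to be done and that an action triggers it.

```python
from functools import reduce


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable collection, e.g. a list
        self._steps = tuple(steps)   # recorded transformations, not yet run

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def _run(self):
        # Replay the recorded steps over the source as a chain of iterators.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: trigger the computation and return a plain Python value.
    def collect(self):
        return list(self._run())

    def reduce(self, fn):
        return reduce(fn, self._run())


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset([1, 2, 3, 4, 5])
even_squares = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)   # still 0: nothing has run yet

result = even_squares.collect()                      # first action
total = even_squares.reduce(lambda a, b: a + b)      # second action
```

In this toy model each action replays the whole chain, so `square` runs once per element for `collect` and again for `reduce`. Spark behaves similarly unless a dataset is explicitly cached or persisted.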

  • RDDs vs. DataFrames
  • Unified DataFrames (UDF) in Spark 2.x
  • Partitioning
  • Exercise: Working with unified DataFrames
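
As a rough illustration of the partitioning topic, the sketch below assigns keyed records to partitions by hashing the key, which is the general idea behind hash partitioning. It is plain Python rather than Spark code; the function names are hypothetical, and `zlib.crc32` is an assumption chosen only because it gives the same result on every run.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a string key to a partition index in [0, num_partitions)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(records, num_partitions):
    """Group (key, value) records by the partition their key hashes to."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


records = [("spark", 1), ("hadoop", 1), ("spark", 1), ("yarn", 1), ("hadoop", 1)]
partitions = partition_records(records, num_partitions=3)

# Every record with a given key sits in one partition, so per-key totals
# can be computed partition by partition without moving data between them.
totals = {}
for part in partitions.values():
    for key, value in part:
        totals[key] = totals.get(key, 0) + value
```

Because a key always hashes to the same index, all records for that key are co-located, which is what makes per-key aggregation possible without reshuffling.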

  • NoSQL
  • Spark SQL
  • Spark Streaming
  • Spark MLlib
  • Demo/Lab [Optional]: Advanced spark overview

Who should attend

This is an intermediate-level course geared toward data scientists, software engineers, data engineers, and developers who have basic experience with Python, R, or Scala and need to learn the essentials of working with Spark.

This course is highly recommended for:

  • Project leads
  • Data scientists
  • Senior data platform engineers
  • Data solution architects
  • Software engineers
  • Data engineers
  • Software developers

Prerequisites

Participants must have a working knowledge of programming languages such as Scala, Python, or R.
