Apache Spark Primer | Spark Essentials, Components, RDDs & UDFs

Course Code: 1338



Apache Spark Primer is a one-day course designed to introduce developers to the practices and concepts of Spark programming. The course surveys the core concepts of Spark, giving participants hands-on exposure to the Spark shell, RDDs, and DataFrames. Participants also explore NoSQL, Spark Streaming, Spark SQL, and Spark MLlib to learn how these pieces are integrated to build a larger application.

Course Delivery

This course is available in the following formats:

Live Classroom
Duration: 5 days

Live Virtual Classroom
Duration: 5 days

What You'll Learn

  • Use cutting-edge technologies in the data science spectrum, with an emphasis on Spark and related tools
  • Learn about the basics of Spark architecture and applications
  • Execute Spark programs and learn how the core components of Spark are assembled to build whole applications
  • Create and manipulate both RDDs (Resilient Distributed Datasets) and UDFs (Unified DataFrames)

Course Outline

  • Overview of Spark
    • Hadoop ecosystem
    • Hadoop YARN vs. Mesos
    • Spark vs. MapReduce
    • Spark: Lambda architecture
    • Spark in the enterprise data science architecture
  • Spark component overview
    • Spark shell
    • RDDs: Resilient distributed datasets
    • DataFrames
    • Spark 2 unified DataFrames
    • Spark sessions
    • Functional programming
    • Spark SQL
    • MLlib
    • Structured Streaming
    • SparkR
    • Spark and Python
    • Exercise: Hello, Spark
  • RDDs: Resilient distributed datasets
    • Coding with RDDs
    • Transformations
    • Actions
    • Lazy evaluation and optimization
    • RDDs in MapReduce
    • Exercise: Working with RDDs
  • DataFrames
    • RDDs vs. DataFrames
    • Unified DataFrames (UDF) in Spark 2.x
    • Partitioning
    • Exercise: Working with unified DataFrames
  • Advanced Spark overview
    • NoSQL
    • Spark SQL
    • Spark Streaming
    • Spark MLlib
    • Demo/Lab [Optional]: Advanced Spark overview

Participants must have a working knowledge of programming languages such as Scala, Python, or R.

Who Should Attend

This is an intermediate-level course geared toward data scientists, software engineers, data engineers, and developers who have basic experience with Python, R, or Scala and need to learn the essentials of working with Spark.

This course is highly recommended for:

  • Project leads
  • Data scientists
  • Senior data platform engineers
  • Data solution architects
  • Software engineers
  • Data engineers
  • Software developers

