JumpStart to Developing in Spark | Spark Programs, RDDs, NoSQL, Spark Machine Learning & More

Live Classroom
Duration: 5 days
Live Virtual Classroom
Duration: 5 days

Overview

This course offers an overview of current technologies in the data science space, with an emphasis on Apache Spark and related tools. It is structured for developers who want to strengthen their skills and learn enterprise-grade Spark programming. Topics range from Spark's core features to hands-on practice with RDDs, DataFrames, Spark SQL, NoSQL data access, MLlib, and Spark Streaming with Kafka.

What You'll Learn

  • Basics of Spark architecture and applications
  • Executing Spark Programs
  • Creating and manipulating both RDDs (Resilient Distributed Datasets) and unified DataFrames
  • Saving and restoring DataFrames
  • Essential NoSQL data access
  • Integrating machine learning into Spark applications
  • Using Spark Streaming and Kafka to create streaming applications

Curriculum

  • Overview of Spark
  • Hadoop ecosystem
  • Hadoop YARN vs. Mesos
  • Spark vs. MapReduce
  • Spark: Lambda architecture
  • Spark in the enterprise data science architecture

  • Spark shell
  • RDDs: Resilient distributed datasets
  • DataFrames
  • Spark 2 unified DataFrames
  • Spark sessions
  • Functional programming
  • Spark SQL
  • MLlib
  • Structured streaming
  • SparkR
  • Spark and Python
  • Exercise: Hello, Spark

  • Coding with RDDs
  • Transformations
  • Actions
  • Lazy evaluation and optimization
  • RDDs in MapReduce
  • Exercise: Working with RDDs

  • RDDs vs. DataFrames
  • Unified DataFrames in Spark 2.x
  • Partitioning
  • Exercise: Working with unified DataFrames

  • RDD persistence
  • DataFrame and unified DataFrame persistence
  • Distributed persistence
  • Exercise: Saving and restoring DataFrames

  • Ingesting data
  • Relational databases and Sqoop
  • Interacting with Hive
  • Graph data
  • Accessing Cassandra data
  • Exercise: NoSQL data access

  • Spark SQL
  • SQL and DataFrames
  • Spark SQL and Hive
  • Spark SQL and JDBC
  • Exercise: Working with Spark SQL

  • MLlib
  • Mahout
  • Exercise: Hello, MLlib

  • Streaming overview
  • Streams
  • Structured streaming
  • Lambda streaming
  • Spark and Kafka
  • Exercise: Hello, Spark Streaming

Who should attend

This course is geared toward experienced developers, and architects with development experience, who want to become proficient in working with Apache Spark in an enterprise data environment.

This course is highly recommended for:

  • Hadoop/Spark developers
  • Data scientists
  • Data engineers
  • Big Data engineers
  • Java developers
  • Application developers
  • Full stack developers

Prerequisites

Participants must be proficient in Java programming fundamentals and have a solid grasp of basic Python programming and SQL.
