Training with Iverson classes

Training is not a commodity, and not all training centres are the same. Iverson Associates Sdn Bhd is the most established, most reputable, and top professional IT training provider in Malaysia. With a large pool of experienced and certified trainers, state-of-the-art facilities, and well-designed courseware, Iverson offers superior training, a more impactful learning experience, and highly effective results.

At Iverson, our focus is on providing high-quality IT training to corporate customers, meeting their learning needs and helping them achieve their training objectives. Iverson has the flexibility to provide training solutions, whether for a single individual or the largest corporation, in well-paced or accelerated training programmes.

Our courses continue to evolve along with the fast-changing technological advances. Our instructor-led training services are available on a public and a private (in-company) basis. Some of our courses are also available as online, on demand, and hybrid training.

This four-day workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges.

 

Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment.

Additional Info

  • Certification Course only
  • Course Code CDST
  • Price RM9000
  • Exam Price Exclude
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    Available upon request

  • At Course Completion

    The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.

    Workshop participants should have a basic understanding of Python or R and some experience exploring and analyzing data and developing statistical or machine learning models. Knowledge of Hadoop or Spark is not required.

     

  • Module 1 Title Overview of data science and machine learning at scale
  • Module 2 Title Overview of the Hadoop ecosystem
  • Module 3 Title Working with HDFS data and Hive tables using Hue
  • Module 4 Title Introduction to Cloudera Data Science Workbench
  • Module 5 Title Overview of Apache Spark 2
  • Module 6 Title Reading and writing data
  • Module 7 Title Inspecting data quality
  • Module 8 Title Cleansing and transforming data
  • Module 9 Title Summarizing and grouping data
  • Module 10 Title Combining, splitting, and reshaping data
  • Module 11 Title Exploring data
  • Module 12 Title Configuring, monitoring, and troubleshooting Spark applications
  • Module 13 Title Overview of machine learning in Spark MLlib
  • Module 14 Title Extracting, transforming, and selecting features
  • Module 15 Title Building and evaluating regression models
  • Module 16 Title Building and evaluating classification models
  • Module 17 Title Building and evaluating clustering models
  • Module 18 Title Cross-validating models and tuning hyperparameters
  • Module 19 Title Building machine learning pipelines
  • Module 20 Title Deploying machine learning models
RM9,000.00 (+RM540.00 Tax)

Cloudera Data Science Workbench Training prepares learners to complete exploratory data science and machine learning projects using Cloudera Data Science Workbench (CDSW).

Get hands-on experience
Through narrated demonstrations and hands-on exercises, learners gain familiarity with CDSW and develop the skills required to:

  • Navigate CDSW’s options and interfaces with confidence
  • Create projects in CDSW and collaborate securely with other users and teams
  • Develop and run reproducible Python and R code
  • Customize projects by installing packages and setting environment variables
  • Connect to a secure (Kerberized) Cloudera cluster
  • Work with large-scale data using Apache Spark 2 with PySpark and sparklyr
  • Perform full exploratory data science and machine learning workflows in CDSW using Python or R—read, inspect, transform, visualize, and model data
  • Work collaboratively using CDSW together with Git
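
The read-inspect-transform-model workflow listed above can be previewed without a cluster. The sketch below is a hedged, pure-Python stand-in: the sample data and column names are invented, and the `csv` module plus a simple average stand in for PySpark reads and MLlib models.

```python
import csv
import io

# Hypothetical sample data standing in for a dataset read from the cluster.
raw = """user,rating
alice,4
bob,5
carol,3
"""

# Read: parse the CSV (in CDSW this would be spark.read.csv on HDFS data).
rows = list(csv.DictReader(io.StringIO(raw)))

# Inspect: check the schema and row count before transforming.
assert set(rows[0]) == {"user", "rating"}
print(len(rows))                          # 3

# Transform: cast types and filter.
ratings = [int(r["rating"]) for r in rows]
high = [r for r in rows if int(r["rating"]) >= 4]

# Summarize/model: a simple statistic standing in for an MLlib model.
print(sum(ratings) / len(ratings))        # 4.0
print([r["user"] for r in high])          # ['alice', 'bob']
```

In the actual course the same loop is run at scale with PySpark or sparklyr inside a CDSW session.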

What to Expect
This course is designed for learners at organizations using CDSW under a Cloudera Enterprise license or a trial license. The learner must have access to a CDSW environment on a Cloudera cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.

Additional Info

  • Certification Course only
  • Course Code CDSW
  • Price RM3180
  • Exam Price Exclude
  • Principals Cloudera
  • Schedule

    Self-paced Online 

  • Module 1 Title Overview of CDSW
  • Module 1 Content
    • Introduction to CDSW
    • How to Access CDSW
    • Navigating around CDSW
    • User Settings
    • Hadoop Authentication
  • Module 2 Title Projects in CDSW
  • Module 2 Content
    • Creating a New Project
    • Navigating around a Project
    • Project Settings
  • Module 3 Title The CDSW Workbench Interface
  • Module 3 Content
    • Using the Workbench
    • Using the Sidebar
    • Using the Code Editor
    • Engines and Sessions
  • Module 4 Title Running Python and R Code in CDSW
  • Module 4 Content
    • Running Code
    • Using the Session Prompt
    • Using the Terminal
    • Installing Packages
    • Using Markdown in Comments
  • Module 5 Title Using Apache Spark 2 in CDSW
  • Module 5 Content
    • Scenario and Dataset
    • Copying Files to HDFS
    • Interfaces to Apache Spark 2
    • Connecting to Spark
    • Reading Data
    • Inspecting Data
  • Module 6 Title Exploratory Data Science in CDSW
  • Module 6 Content
    • Transforming Data
    • Using SQL Queries
    • Visualizing Data from Spark
    • Machine Learning with MLlib
    • Session History
  • Module 7 Title Teams and Collaboration in CDSW
  • Module 7 Content
    • Collaboration in CDSW
    • Teams in CDSW
    • Using Git for Collaboration
    • Conclusion
RM9,000.00 (+RM540.00 Tax)

Individuals who earn the CCA Administrator certification have demonstrated the core systems and cluster administrator skills sought by companies and organizations deploying Cloudera in the enterprise.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a pre-configured Cloudera Enterprise cluster.
  • Time Limit: 120 minutes
  • Passing Score: 70%
  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-131
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Administrator
  • Principals Cloudera
  • Audience

    There are no prerequisites for taking any Cloudera certification exam; however, a background in system administration or equivalent training is strongly recommended. The CCA Administrator exam (CCA131) follows the same objectives as Cloudera Administrator Training, and the training course is excellent preparation for the exam.

  • Prerequisites

    Install

    Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.

     

    • Set up a local CDH repository
    • Perform OS-level configuration for Hadoop installation
    • Install Cloudera Manager server and agents
    • Install CDH using Cloudera Manager
    • Add a new node to an existing cluster
    • Add a service using Cloudera Manager

     

    Configure

    Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

    • Configure a service using Cloudera Manager
    • Create an HDFS user's home directory
    • Configure NameNode HA
    • Configure ResourceManager HA
    • Configure proxy for Hiveserver2/Impala

    Manage

    Maintain and modify the cluster to support day-to-day operations in the enterprise

    • Rebalance the cluster
    • Set up alerting for excessive disk fill
    • Define and install a rack topology script
    • Install a new type of I/O compression library in the cluster
    • Revise YARN resource assignment based on user feedback
    • Commission/decommission a node

    Secure

    Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

    • Configure HDFS ACLs
    • Install and configure Sentry
    • Configure Hue user authorization and authentication
    • Enable/configure log and query redaction
    • Create encrypted zones in HDFS

    Test

    Benchmark the cluster's operational metrics and test the system configuration for operation and efficiency

    • Execute file system commands via HTTPFS
    • Efficiently copy data within a cluster/between clusters
    • Create/restore a snapshot of an HDFS directory
    • Get/set ACLs for a file or directory structure
    • Benchmark the cluster (I/O, CPU, network)

    Troubleshoot

    Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios

    • Resolve errors/warnings in Cloudera Manager
    • Resolve performance problems/errors in cluster operation
    • Determine reason for application failure
    • Configure the Fair Scheduler to resolve application delays
RM1,252.36 (+RM75.14 Tax)

A CCA Spark and Hadoop Developer has proven their core skills to ingest, transform, and process data using Apache Spark and core Cloudera Enterprise tools.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a Cloudera Enterprise cluster. See below for full cluster configuration
  • Time Limit: 120 minutes
  • Passing Score: 70%
  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-175
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Spark and Hadoop Developer
  • Principals Cloudera
  • Audience

    There are no prerequisites required to take any Cloudera certification exam. The CCA Spark and Hadoop Developer exam (CCA175) follows the same objectives as Cloudera Developer Training for Spark and Hadoop, and the training course is excellent preparation for the exam.

  • Prerequisites

    Data Ingest

    The skills to transfer data between external systems and your cluster. This includes the following:

    • Import data from a MySQL database into HDFS using Sqoop

    • Export data to a MySQL database from HDFS using Sqoop

    • Change the delimiter and file format of data during import using Sqoop

    • Ingest real-time and near-real-time streaming data into HDFS

    • Process streaming data as it is loaded onto the cluster

    • Load data into and out of HDFS using the Hadoop File System commands

    Transform, Stage, and Store

    Convert a set of data values in a given format stored in HDFS into new data values or a new data format and write them into HDFS.

    • Load RDD data from HDFS for use in Spark applications

    • Write the results from an RDD back into HDFS using Spark

    • Read and write files in a variety of file formats

    • Perform standard extract, transform, load (ETL) processes on data

    Data Analysis

    Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by using queries against loaded data.

    • Use metastore tables as an input source or an output sink for Spark applications

    • Understand the fundamentals of querying datasets in Spark

    • Filter data using Spark

    • Write queries that calculate aggregate statistics

    • Join disparate datasets using Spark

    • Produce ranked or sorted data

    Configuration

    This is a practical exam and the candidate should be familiar with all aspects of generating a result, not just writing code.

    • Supply command-line options to change your application configuration, such as increasing available memory
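
The data-analysis objectives above (filtering, aggregating, joining, and sorting with Spark SQL) can be previewed locally. The sketch below is a hedged stand-in that uses Python's built-in sqlite3 in place of Spark SQL over metastore tables; the table names and rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Spark SQL over metastore tables;
# the 'customers' and 'orders' tables and their rows are invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'acme'), (2, 'globex');
INSERT INTO orders VALUES (1, 10.0), (1, 20.0), (2, 5.0);
""")

# Filter, join, aggregate, and sort -- the same operations the exam
# objectives list for Spark SQL.
rows = con.execute("""
SELECT c.name, COUNT(*) AS n, SUM(o.amount) AS total
FROM orders o JOIN customers c ON o.customer_id = c.id
WHERE o.amount > 0
GROUP BY c.name
ORDER BY total DESC
""").fetchall()
print(rows)   # [('acme', 2, 30.0), ('globex', 1, 5.0)]
```

On the exam itself the equivalent queries would run through Spark SQL (e.g. `spark.sql(...)`) against tables on the cluster.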
RM1,252.36 (+RM75.14 Tax)

A CCA Data Analyst has proven their core analyst skills to load, transform, and model Hadoop data in order to define relationships and extract meaningful results from the raw input.

You are given eight to twelve customer problems, each with a unique, large data set, a CDH cluster, and 120 minutes. For each problem, you must implement a technical solution with a high degree of precision that meets all the requirements. You may use any tool or combination of tools on the cluster (see list below); you get to pick the tool(s) that are right for the job. You must possess enough knowledge to analyze the problem and arrive at an optimal approach in the time allowed. You need to know what you should do and then do it on a live cluster, under a time limit, while being watched by a proctor.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a CDH 5 cluster. See below for full cluster configuration

  • Time Limit: 120 minutes

  • Passing Score: 70%

  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-159
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Data Analyst
  • Principals Cloudera
  • Audience

    Candidates for CCA Data Analyst can be SQL developers, data analysts, business intelligence specialists, developers, system architects, and database administrators. There are no prerequisites.

  • Prerequisites

    Prepare the Data

    Use Extract, Transform, Load (ETL) processes to prepare data for queries.

    • Import data from a MySQL database into HDFS using Sqoop

    • Export data to a MySQL database from HDFS using Sqoop

    • Move data between tables in the metastore

    • Transform values, columns, or file formats of incoming data before analysis

    Provide Structure to the Data

    Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

    • Create tables using a variety of data types, delimiters, and file formats

    • Create new tables using existing tables to define the schema

    • Improve query performance by creating partitioned tables in the metastore

    • Alter tables to modify existing schema

    • Create views in order to simplify queries

    Data Analysis

    Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

    • Prepare reports using SELECT commands including unions and subqueries

    • Calculate aggregate statistics, such as sums and averages, during a query

    • Create queries against multiple data sources by using join commands

    • Transform the output format of queries by using built-in functions

    • Perform queries across a group of rows using windowing functions
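
The windowing objective above is the one least like ordinary SQL, so a small worked example may help. This is a hedged sketch using Python's sqlite3 (SQLite 3.25+ supports the same `OVER (...)` syntax Hive and Impala use); the `sales` table and its rows are invented.

```python
import sqlite3

# In-memory SQLite stands in for Hive/Impala; the data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('east', 100), ('east', 300), ('west', 200);
""")

# A windowed query: each row keeps its detail columns while also seeing
# an aggregate (region total) and a rank computed over its partition.
rows = con.execute("""
SELECT region, amount,
       SUM(amount) OVER (PARTITION BY region)   AS region_total,
       ROW_NUMBER() OVER (PARTITION BY region
                          ORDER BY amount DESC) AS rank_in_region
FROM sales
ORDER BY region, rank_in_region
""").fetchall()
for r in rows:
    print(r)
# ('east', 300, 400, 1)
# ('east', 100, 400, 2)
# ('west', 200, 200, 1)
```

Unlike a plain `GROUP BY`, the window functions leave the individual rows intact, which is exactly what "queries across a group of rows" refers to.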
RM1,252.36 (+RM75.14 Tax)

Cloudera University’s four-day Data Analyst Training course will teach you to apply traditional data analytics and business intelligence skills to big data. This course presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

Advance Your Ecosystem Expertise

Apache Hive makes transformation and analysis of complex, multi-structured data scalable in Cloudera environments. Apache Impala enables real-time interactive analysis of the data stored in Hadoop using a native SQL environment. Together, they make multi-structured data accessible to analysts, database administrators, and others without Java programming expertise.

Additional Info

  • Certification Course & Certificate
  • Course Code CDAT
  • Price RM9000
  • Exam Price Exclude
  • Exam Code CCA-159
  • Duration 4 Days
  • CertificationInfo Cloudera Certified Data Analyst
  • Principals Cloudera
  • Schedule

    2-5 Dec 2019

    6-9 Apr 2020

    15-18 June 2020

    13-16 Jul 2020

    7-10 Sep 2020

    7-10 Dec 2020

  • Audience

    This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity. Prior knowledge of Apache Hadoop is not required.

  • At Course Completion

    Get Certified

    Upon completion of the course, attendees are encouraged to continue their study and register for the CCA Data Analyst exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

  • Module 1 Title Apache Hadoop Fundamentals
  • Module 1 Content
    • The Motivation for Hadoop
    • Hadoop Overview
    • Data Storage: HDFS
    • Distributed Data Processing: YARN, MapReduce, and Spark
    • Data Processing and Analysis: Hive and Impala
    • Database Integration: Sqoop
    • Other Hadoop Data Tools
    • Exercise Scenario Explanation
       
  • Module 2 Title Introduction to Apache Hive and Impala
  • Module 2 Content
    • What Is Hive?
    • What Is Impala?
    • Why Use Hive and Impala?
    • Schema and Data Storage
    • Comparing Hive and Impala to Traditional Databases
    • Use Cases
  • Module 3 Title Querying with Apache Hive and Impala
  • Module 3 Content
    • Databases and Tables
    • Basic Hive and Impala Query Language Syntax
    • Data Types
    • Using Hue to Execute Queries
    • Using Beeline (Hive's Shell)
    • Using the Impala Shell
  • Module 4 Title Common Operators and Built-In Functions
  • Module 4 Content
    • Operators
    • Scalar Functions
    • Aggregate Functions
  • Module 5 Title Data Management
  • Module 5 Content
    • Data Storage
    • Creating Databases and Tables
    • Loading Data
    • Altering Databases and Tables
    • Simplifying Queries with Views
    • Storing Query Results
  • Module 6 Title Data Storage and Performance
  • Module 6 Content
    • Partitioning Tables
    • Loading Data into Partitioned Tables
    • When to Use Partitioning
    • Choosing a File Format
    • Using Avro and Parquet File Formats
  • Module 7 Title Working with Multiple Datasets
  • Module 7 Content
    • UNION and Joins
    • Handling NULL Values in Joins
    • Advanced Joins
  • Module 8 Title Analytic Functions and Windowing
  • Module 8 Content
    • Using Common Analytic Functions
    • Other Analytic Functions
    • Sliding Windows
  • Module 9 Title Complex Data
  • Module 9 Content
    • Complex Data with Hive
    • Complex Data with Impala
  • Module 10 Title Analyzing Text
  • Module 10 Content
    • Using Regular Expressions with Hive and Impala
    • Processing Text Data with SerDes in Hive
    • Sentiment Analysis and n-grams
  • Module 11 Title Apache Hive Optimization
  • Module 11 Content
    • Understanding Query Performance
    • Bucketing
    • Hive on Spark
  • Module 12 Title Apache Impala Optimization
  • Module 12 Content
    • How Impala Executes Queries
    • Improving Impala Performance
  • Module 13 Title Extending Apache Hive and Impala
  • Module 13 Content
    • Custom SerDes and File Formats in Hive
    • Data Transformation with Custom Scripts in Hive
    • User-Defined Functions
    • Parameterized Queries
  • Module 14 Title Choosing the Best Tool for the Job
  • Module 14 Content
    • Comparing Hive, Impala, and Relational Databases
    • Which to Choose?
  • Module 15 Title Conclusion
RM9,000.00 (+RM540.00 Tax)

Cloudera University’s one-day Python training course will teach you the key language concepts and programming techniques you need so that you can concentrate on the subjects covered in Cloudera’s developer courses without also having to learn a complex programming language and a new programming paradigm on the fly.

Additional Info

  • Certification Course only
  • Course Code JEP
  • Price RM2500
  • Exam Price Exclude
  • Duration 1 Day
  • Principals Cloudera
  • Schedule

    9 Dec 2019

    20 Jan 2020

    4 May 2020

    3 Aug 2020

    14 Dec 2020

  • Audience

    Prior knowledge of Hadoop is not required. Since this course is intended for developers who do not yet have the prerequisite skills for writing code in Python, basic programming experience in at least one commonly used programming language (ideally Java, but Ruby, Perl, Scala, C, C++, PHP, or JavaScript will suffice) is assumed.

    NOTE: This course does not teach Big Data concepts, nor does it cover how to use Cloudera software. Instead, it is meant as a precursor for one of our developer-focused training courses that provide those skills, such as Developer Training for Spark and Hadoop or Developer Training for Apache Spark.

  • At Course Completion
    • How to define, assign, and access variables
    • Which collection types are commonly used, how they differ, and how to use them
    • How to control program flow using conditional statements, looping, iteration, and exception handling
    • How to define and use both named and anonymous (Lambda) functions
    • How to organize code into separate modules
    • How to use important features of standard Python libraries, including mathematical and regular expression support
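
The outcomes above can be previewed in a few lines of ordinary Python. This is an illustrative sketch, not course material; all names and values are invented.

```python
import re

# Variables and collections: dict, tuple, set.
scores = {"alice": 4, "bob": 7}          # dict
pair = ("x", 1)                          # tuple
unique = set([1, 1, 2])                  # {1, 2} -- duplicates collapse

# Flow control: iteration, conditionals, and exception handling.
total = 0
for name, score in scores.items():
    if score > 5:
        total += score                   # only bob's 7 qualifies
try:
    1 / 0
except ZeroDivisionError:
    total += 1                           # handled, execution continues
print(total)                             # 8

# Named and anonymous (lambda) functions, plus a standard library module.
double = lambda n: n * 2
print(double(21))                        # 42
print(re.findall(r"\d+", "a1b22"))       # ['1', '22']
```

The course covers each of these constructs in depth, with exercises, before they are needed in the Spark developer courses.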
  • Module 1 Title Introduction
  • Module 2 Title Introduction to Python
  • Module 2 Content
    • Python Background Information
    • Scope
    • Exercises
  • Module 3 Title Variables
  • Module 3 Content
    • Python Variables
    • Numerical
    • Boolean
    • String
  • Module 4 Title Collections
  • Module 4 Content
    • Lists
    • Tuples
    • Sets
    • Dictionaries
  • Module 5 Title Flow Control
  • Module 5 Content
    • Code Blocks
    • Repetitive Execution
    • Iterative Execution
    • Conditional Execution
    • Tentative Execution (Exception Handling)
  • Module 6 Title Program Structure
  • Module 6 Content
    • Named Functions
    • Anonymous Functions (Lambda)
    • Generator Functions
  • Module 7 Title Working with Libraries
  • Module 7 Content
    • Storing and Retrieving Functions
    • Module Control
    • Common Standard Libraries
  • Module 8 Title Conclusion
RM2,500.00 (+RM150.00 Tax)

Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification
Cloudera University’s three-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second.


Advance Your Ecosystem Expertise
Apache HBase is a distributed, scalable, NoSQL database built on Apache Hadoop. HBase can store data in massive tables consisting of billions of rows and millions of columns, serve data to many users and applications in real time, and provide fast, random read/write access to users and applications.
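
The data model described above (huge tables addressed by row key, with column families and versioned cells) can be illustrated with a toy sketch. The dictionary below is an invented, in-memory stand-in for an HBase table, not a real HBase client; the row keys and columns are hypothetical.

```python
from collections import defaultdict

# Toy model of an HBase table: row key -> "family:qualifier" -> list of
# (timestamp, value) pairs, newest first, mimicking HBase's versioned cells.
table = defaultdict(dict)

def put(row, column, value, ts):
    """Add a new version of a cell, like an HBase Put."""
    table[row].setdefault(column, []).insert(0, (ts, value))

def get(row, column):
    """Return the newest version of a cell, as an HBase Get does by default."""
    versions = table[row].get(column, [])
    return versions[0][1] if versions else None

put("user#42", "info:name", "alice", ts=1)
put("user#42", "info:name", "alicia", ts=2)   # newer version of the same cell
put("user#42", "stats:logins", 7, ts=2)

print(get("user#42", "info:name"))     # 'alicia' -- newest version wins
print(get("user#42", "stats:logins"))  # 7
```

Real HBase adds to this model everything the course covers: region-based distribution, schema design around row keys, and fast random access at scale.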

Additional Info

  • Certification Course only
  • Course Code CCSHB
  • Price RM7420
  • Exam Price Exclude
  • Duration 3 Days
  • Principals Cloudera
  • Schedule

    4-6 Nov 2019

    13-15 Apr 2020

    6-8 Jul 2020

    2-4 Nov 2020

  • Audience

    This course is appropriate for developers and administrators who intend to use HBase. Prior experience with databases and data modeling is helpful, but not required. Prior knowledge of Java is helpful. Prior knowledge of Hadoop is not required, but Cloudera Developer Training for Apache Hadoop provides an excellent foundation for this course.

  • Module 1 Title Introduction to Hadoop and HBase
  • Module 1 Content
    • What Is Big Data?
    • Introducing Hadoop
    • Hadoop Components
    • What Is HBase?
    • Why Use HBase?
    • Strengths of HBase
    • HBase in Production
    • Weaknesses of HBase
  • Module 2 Title HBase Tables
  • Module 2 Content
    • HBase Concepts
    • HBase Table Fundamentals
    • Thinking About Table Design
  • Module 3 Title The HBase Shell
  • Module 3 Content
    • Creating Tables with the HBase Shell
    • Working with Tables
    • Working with Table Data
  • Module 4 Title HBase Architecture Fundamentals
  • Module 4 Content
    • HBase Regions
    • HBase Cluster Architecture
    • HBase and HDFS Data Locality
  • Module 5 Title HBase Schema Design
  • Module 5 Content
    • General Design Considerations
    • Application-Centric Design
    • Designing HBase Row Keys
    • Other HBase Table Features
  • Module 6 Title Basic Data Access with the HBase API
  • Module 6 Content
    • Options to Access HBase Data
    • Creating and Deleting HBase Tables
    • Retrieving Data with Get
    • Retrieving Data with Scan
    • Inserting and Updating Data
    • Deleting Data
  • Module 7 Title More Advanced HBase API Features
  • Module 7 Content
    • Filtering Scans
    • Best Practices
    • HBase Coprocessors
  • Module 8 Title HBase on the Cluster
  • Module 8 Content
    • How HBase Uses HDFS
    • Compactions and Splits
  • Module 9 Title HBase Reads and Writes
  • Module 9 Content
    • How HBase Writes Data
    • How HBase Reads Data
    • Block Caches for Reading
  • Module 10 Title HBase Performance Tuning
  • Module 10 Content
    • Column Family Considerations
    • Schema Design Considerations
    • Configuring for Caching
    • Dealing with Time Series and Sequential Data
    • Pre-Splitting Regions
  • Module 11 Title HBase Administration and Cluster Management
  • Module 11 Content
    • HBase Daemons
    • ZooKeeper Considerations
    • HBase High Availability
    • Using the HBase Balancer
    • Fixing Tables with hbck
    • HBase Security
  • Module 12 Title HBase Replication and Backup
  • Module 12 Content
    • HBase Replication
    • HBase Backup
    • MapReduce and HBase Clusters
  • Module 13 Title Using Hive and Impala with HBase
  • Module 13 Content
    • Using Hive and Impala with HBase
  • Module 14 Title Conclusion
RM7,000.00 (+RM420.00 Tax)

Take your knowledge to the next level
This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark (including Spark Streaming and Spark SQL), Flume, Kafka, and Sqoop, this training course is the best preparation for the real-world challenges faced by Hadoop developers. With Spark, developers can write sophisticated parallel applications that enable faster, better decisions and interactive actions across a wide variety of use cases, architectures, and industries.


Get hands-on experience
Through expert-led discussion and interactive, hands-on exercises, participants will learn how to:

  • Distribute, store, and process data in a Hadoop cluster
  • Write, configure, and deploy Apache Spark applications on a Hadoop cluster
  • Use the Spark shell for interactive data analysis
  • Process and query structured data using Spark SQL
  • Use Spark Streaming to process a live data stream
  • Use Flume and Kafka to ingest data for Spark Streaming
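
The Spark processing pattern behind several of the skills above (distribute, map, then aggregate by key) can be previewed with the classic word count. This is a hedged, pure-Python stand-in; the input lines are invented, and `Counter` plays the role of `reduceByKey`.

```python
from collections import Counter

# Stand-in for the classic Spark word count:
#   sc.textFile(...).flatMap(split).map(lambda w: (w, 1)).reduceByKey(add)
lines = ["to be or not", "to be"]

# flatMap: one record per word.
words = [w for line in lines for w in line.split()]

# map + reduceByKey: Counter aggregates (word, 1) pairs by key, much as
# reduceByKey combines values per key within and across partitions.
counts = Counter(words)
print(counts["to"], counts["be"], counts["or"])   # 2 2 1
```

In the course the same pipeline is written against RDDs and DataFrames in the Spark shell, where the aggregation is distributed across the cluster.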

What to expect
This course is designed for developers and engineers who have programming experience, but prior knowledge of Hadoop is not required.

  • Apache Spark examples and hands-on exercises are presented in Scala and Python. The ability to program in one of those languages is required
  • Basic familiarity with the Linux command line is assumed
  • Basic knowledge of SQL is helpful


Get certified
Upon completion of the course, attendees are encouraged to continue their study and register for the CCA Spark and Hadoop Developer exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

Additional Info

  • Certification Course & Certificate
  • Course Code CDTSH
  • Price RM9540
  • Exam Price Exclude
  • Exam Code CCA-175
  • Duration 4 Days
  • CertificationInfo CCA Spark and Hadoop Developer
  • Principals Cloudera
  • Schedule

    7-11 Oct 2019

    16-19 Mar 2020

    30 Mar – 2 Apr 2020 (Penang)

    22-25 Jun 2020

    21-24 Sep 2020 (Penang)

    5-8 Oct 2020

  • Module 1 Title Introduction
  • Module 1 Content

    Introduction to Apache Hadoop and the Hadoop Ecosystem

    • Apache Hadoop Overview
    • Data Storage and Ingest
    • Data Processing
    • Data Analysis and Exploration
    • Other Ecosystem Tools
    • Introduction to the Hands-On Exercises
  • Module 2 Title Apache Hadoop File Storage
  • Module 2 Content
    • Problems with Traditional Large-Scale Systems
  • Module 3 Title HDFS Architecture and File Formats
  • Module 3 Content
    • HDFS Architecture
    • Using HDFS
    • Apache Hadoop File Formats
  • Module 4 Title Data Processing on an Apache Hadoop Cluster
  • Module 4 Content
    • YARN Architecture
    • Working With YARN
  • Module 5 Title Importing Relational Data with Apache Sqoop
  • Module 5 Content
    • Apache Sqoop Overview
    • Importing Data
    • Importing File Options
    • Exporting Data
  • Module 6 Title Apache Spark Basics
  • Module 6 Content
    • What is Apache Spark?
    • Using the Spark Shell
    • RDDs (Resilient Distributed Datasets)
    • Functional Programming in Spark
  • Module 7 Title Working with RDDs
  • Module 7 Content
    • Creating RDDs
    • Other General RDD Operations
  • Module 8 Title Aggregating Data with Pair RDDs
  • Module 8 Content
    • Key-Value Pair RDDs
    • Map-Reduce
    • Other Pair RDD Operations
  • Module 9 Title Writing and Running Apache Spark Applications
  • Module 9 Content
    • Spark Applications vs. Spark Shell
    • Creating the SparkContext
    • Building a Spark Application (Scala and Java)
    • Running a Spark Application
    • The Spark Application Web UI
  • Module 10 Title Configuring Apache Spark Applications
  • Module 10 Content
    • Configuring Spark Properties
    • Logging
  • Module 11 Title Parallel Processing in Apache Spark
  • Module 11 Content
    • Review: Apache Spark on a Cluster
    • RDD Partitions
    • Partitioning of File-Based RDDs
    • HDFS and Data Locality
    • Executing Parallel Operations
    • Stages and Tasks
  • Module 12 Title RDD Persistence
  • Module 12 Content
    • RDD Lineage
    • RDD Persistence Overview
    • Distributed Persistence
  • Module 13 Title Common Patterns in Apache Spark Data Processing
  • Module 13 Content
    • Common Apache Spark Use Cases
    • Iterative Algorithms in Apache Spark
    • Machine Learning
    • Example: k-means
  • Module 14 Title DataFrames and Spark SQL
  • Module 14 Content
    • Apache Spark SQL and the SQL Context
    • Creating DataFrames
    • Transforming and Querying DataFrames
    • Saving DataFrames
    • DataFrames and RDDs
    • Comparing Apache Spark SQL, Impala, and Hive-on-Spark
    • Apache Spark SQL in Spark 2.x
  • Module 15 Title Message Processing with Apache Kafka
  • Module 15 Content
    • What is Apache Kafka?
    • Apache Kafka Overview
    • Scaling Apache Kafka
    • Apache Kafka Cluster Architecture
    • Apache Kafka Command Line Tools
  • Module 16 Title Capturing Data with Apache Flume
  • Module 16 Content
    • What is Apache Flume?
    • Basic Flume Architecture
    • Flume Sources
    • Flume Sinks
    • Flume Channels
    • Flume Configuration
  • Module 17 Title Integrating Apache Flume and Apache Kafka
  • Module 17 Content
    • Overview
    • Use Cases
    • Configuration

  • Apache Spark Streaming: Introduction to DStreams
    • Apache Spark Streaming Overview
    • Example: Streaming Request Count
    • DStreams
    • Developing Streaming Applications
  • Apache Spark Streaming: Processing Multiple Batches
    • Multi-Batch Operations
    • Time Slicing
    • State Operations
    • Sliding Window Operations
  • Apache Spark Streaming: Data Sources
    • Streaming Data Source Overview
    • Apache Flume and Apache Kafka Data Sources
    • Example: Using a Kafka Direct Data Source
  • Module 18 Title Conclusion

Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification

Cloudera University’s four-day administrator training course for Apache Hadoop provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster using Cloudera Manager. From installation and configuration through load balancing and tuning, Cloudera’s training course is the best preparation for the real-world challenges faced by Hadoop administrators.

Additional Info

  • Certification Course & Certificate
  • Course Code CCAH
  • Price RM9540
  • Exam Price Exclude
  • Exam Code CCA-131
  • Duration 4 Days
  • CertificationInfo Cloudera Certified Administrator (CCA)
  • Principals Cloudera
  • Schedule

    18-21 Nov 2019

    2-5 Dec 2019 (Penang)

    17-20 Feb 2020

    8-11 Jun 2020

    15-18 Jun 2020 (Penang)

    24-27 Aug 2020

    16-19 Nov 2020

    30 Nov – 3 Dec 2020 (Penang)

  • Audience

    This course is best suited to systems administrators and IT managers who have basic Linux experience. Prior knowledge of Apache Hadoop is not required.

  • Module 1 Title The Case for Apache Hadoop
  • Module 1 Content
    • Why Hadoop?
    • Fundamental Concepts
    • Core Hadoop Components
  • Module 2 Title Hadoop Cluster Installation
  • Module 2 Content
    • Rationale for a Cluster Management Solution
    • Cloudera Manager Features
    • Cloudera Manager Installation
    • Hadoop (CDH) Installation
  • Module 3 Title The Hadoop Distributed File System (HDFS)
  • Module 3 Content
    • HDFS Features
    • Writing and Reading Files
    • NameNode Memory Considerations
    • Overview of HDFS Security
    • Web UIs for HDFS
    • Using the Hadoop File Shell
  • Module 4 Title MapReduce and Spark on YARN
  • Module 4 Content
    • The Role of Computational Frameworks
    • YARN: The Cluster Resource Manager
    • MapReduce Concepts
    • Apache Spark Concepts
    • Running Computational Frameworks on YARN
    • Exploring YARN Applications Through the Web UIs and the Shell
    • YARN Application Logs
  • Module 5 Title Hadoop Configuration and Daemon Logs
  • Module 5 Content
    • Cloudera Manager Constructs for Managing Configurations
    • Locating Configurations and Applying Configuration Changes
    • Managing Role Instances and Adding Services
    • Configuring the HDFS Service
    • Configuring Hadoop Daemon Logs
    • Configuring the YARN Service
  • Module 6 Title Getting Data Into HDFS
  • Module 6 Content
    • Ingesting Data From External Sources With Flume
    • Ingesting Data From Relational Databases With Sqoop
    • REST Interfaces
    • Best Practices for Importing Data
  • Module 7 Title Planning Your Hadoop Cluster
  • Module 7 Content
    • General Planning Considerations
    • Choosing the Right Hardware
    • Virtualization Options
    • Network Considerations
    • Configuring Nodes
  • Module 8 Title Installing and Configuring Hive, Impala, and Pig
  • Module 8 Content
    • Hive
    • Impala
    • Pig
  • Module 9 Title Hadoop Clients Including Hue
  • Module 9 Content
    • What Are Hadoop Clients?
    • Installing and Configuring Hadoop Clients
    • Installing and Configuring Hue
    • Hue Authentication and Authorization
  • Module 10 Title Advanced Cluster Configuration
  • Module 10 Content
    • Advanced Configuration Parameters
    • Configuring Hadoop Ports
    • Configuring HDFS for Rack Awareness
    • Configuring HDFS High Availability
  • Module 11 Title Hadoop Security
  • Module 11 Content
    • Why Hadoop Security Is Important
    • Hadoop’s Security System Concepts
    • What Kerberos Is and How It Works
    • Securing a Hadoop Cluster With Kerberos
    • Other Security Concepts
  • Module 12 Title Managing Resources
  • Module 12 Content
    • Configuring cgroups with Static Service Pools
    • The Fair Scheduler
    • Configuring Dynamic Resource Pools
    • YARN Memory and CPU Settings
    • Impala Query Scheduling
  • Module 13 Title Cluster Maintenance
  • Module 13 Content
    • Checking HDFS Status
    • Copying Data Between Clusters
    • Adding and Removing Cluster Nodes
    • Rebalancing the Cluster
    • Directory Snapshots
    • Cluster Upgrading
  • Module 14 Title Cluster Monitoring and Troubleshooting
  • Module 14 Content
    • Cloudera Manager Monitoring Features
    • Monitoring Hadoop Clusters
    • Troubleshooting Hadoop Clusters
    • Common Misconfigurations
  • Module 15 Title Conclusion

