
Training with Iverson classes

Training is not a commodity – not all training centres are the same. Iverson Associates Sdn Bhd is the most established, most reputable, and top professional IT training provider in Malaysia. With a large pool of experienced and certified trainers, state-of-the-art facilities, and well-designed courseware, Iverson offers superior training, a more impactful learning experience, and highly effective results.

At Iverson, our focus is on providing high-quality IT training to corporate customers, meeting their learning needs and helping them to achieve their training objectives. Iverson has the flexibility to provide training solutions whether for a single individual or the largest corporation in a well-paced or accelerated training programme.

Our courses continue to evolve along with the fast-changing technological advances. Our instructor-led training services are available on a public and a private (in-company) basis. Some of our courses are also available as online, on demand, and hybrid training.

This four-day instructor-led course begins by introducing Apache Kafka, explaining its key concepts and architecture, and discussing several common use cases. Building on this foundation, you will learn how to plan a Kafka deployment, and then gain hands-on experience by installing and configuring your own cloud-based, multi-node cluster running Kafka on the Cloudera Data Platform (CDP).

You will then use this cluster during more than 20 hands-on exercises that follow, covering a range of essential skills, starting with how to create Kafka topics, producers, and consumers, then continuing through progressively more challenging aspects of Kafka operations and development, such as those related to scalability, reliability, and performance problems. Throughout the course, you will learn and use Cloudera’s recommended tools for working with Kafka, including Cloudera Manager, Schema Registry, Streams Messaging Manager, and Cruise Control.

Additional Info

  • Certification Course & Certificate
  • Course Code CTAK
  • Price RM13600
  • Exam Price Exclude
  • Exam Code CDP-4001
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    19-22 Mar 2024

    12-15 Aug 2024

    1-4 Oct 2024

  • Audience

    This course is designed for system administrators, data engineers, and developers. 

  • Prerequisites

    All students are expected to have basic Linux experience, and basic proficiency with the Java programming language is recommended. No prior experience with Apache Kafka is necessary.

  • At Course Completion

    During this course, you learn how to:

    • Plan, deploy and operate Kafka clusters
    • Create and manage topics
    • Develop producers and consumers
    • Use replication to improve fault tolerance
    • Use partitioning to improve scalability
    • Troubleshoot common problems and performance issues
  • Module 1 Title Kafka Overview
  • Module 1 Content
    • High-Level Architecture
    • Common Use Cases
    • Cloudera's Distribution of Apache Kafka
  • Module 2 Title Deploying Apache Kafka
  • Module 2 Content
    • System Requirements and Dependencies
    • Service Roles
    • Planning Your Deployment
    • Deploying Kafka Services
    • Exercise: Preparing the Exercise Environment
    • Exercise: Installing the Kafka Service with Cloudera Manager
    • Exercise (optional): Create Metrics Dashboards
    • Exercise (optional): Using the CM API
  • Module 3 Title Kafka Command Line Basics
  • Module 3 Content
    • Create and Manage Topics
    • Running Producers and Consumers
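As a flavour of these command-line basics, topic creation and console producing/consuming look roughly like the following (topic name, partition and replica counts, and broker address are placeholders, and the commands need a live Kafka cluster, so they are shown for illustration only):

```shell
# Create a six-partition topic with three replicas
kafka-topics --create --topic orders --partitions 6 --replication-factor 3 \
    --bootstrap-server broker-1:9092

# Type messages into the topic, then read them back from the beginning
kafka-console-producer --topic orders --bootstrap-server broker-1:9092
kafka-console-consumer --topic orders --from-beginning --bootstrap-server broker-1:9092
```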


  • Module 4 Title Using Streams Messaging Manager (SMM)
  • Module 4 Content
    • Streams Messaging Manager Overview 
    • Producers, Topics, and Consumers
    • Data Explorer
    • Brokers
    • Topic Management
    • Exercise: Managing Topics using the CLI
    • Exercise: Connecting Producers and Consumers from the Command Line
  • Module 5 Title Kafka Java API Basics
  • Module 5 Content
    • Overview of Kafka's APIs
    • Topic Management from the Java API
    • Exercise (optional): Managing Kafka Topics Using the Java API
    • Using Producers and Consumers from the Java API
    • Exercise: Developing Producers and Consumers with the Java API
  • Module 6 Title Improving Availability through Replication
  • Module 6 Content
    • Replication
    • Exercise: Observing Downtime Due to Broker Failure
    • Considerations for the Replication Factor
    • Exercise: Adding Replicas to Improve Availability
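The trade-off behind choosing a replication factor can be sketched in a few lines of Python (the function name is ours, not part of any Kafka API):

```python
def max_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """How many broker failures a topic can absorb while still accepting
    acks=all writes: in-sync replicas must stay at or above min.insync.replicas."""
    return replication_factor - min_insync_replicas

# A common production choice is replication factor 3 with
# min.insync.replicas=2, which tolerates a single broker failure.
```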


  • Module 7 Title Improving Application Scalability
  • Module 7 Content
    • Partitioning
    • How Messages are Partitioned
    • Exercise: Observing How Partitioning Affects Performance
    • Consumer Groups
    • Exercise: Implementing Consumer Groups
    • Consumer Rebalancing
    • Exercise: Using a Key to Control Partition Assignment
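The idea behind key-based partition assignment, as used in the exercise above, can be sketched in Python (crc32 stands in for Kafka's murmur2 hash here, so real partition numbers will differ):

```python
import zlib

def partition_for_key(key: bytes, num_partitions: int) -> int:
    # Kafka's default partitioner hashes the message key and takes it
    # modulo the partition count, so the same key always lands in the
    # same partition, preserving per-key ordering.
    return zlib.crc32(key) % num_partitions
```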
  • Module 8 Title Improving Application Reliability
  • Module 8 Content
    • Delivery Semantics
    • Demonstration (optional): ISRs vs. ACKs
    • Producer Delivery
    • Exercise: Idempotent Producer
    • Transactions
    • Exercise: Transactional Producers and Consumers
    • Handling Consumer Failure
    • Offset Management
    • Exercise: Detecting and Suppressing Duplicate Messages
    • Exercise: Handling Invalid Records
    • Handling Producer Failure
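The duplicate-suppression idea behind these exercises can be sketched as follows (a toy Python class of our own invention, not a Kafka API):

```python
class DedupingConsumer:
    """Suppress duplicate deliveries by remembering message IDs
    that have already been processed."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.processed: list[str] = []

    def handle(self, msg_id: str, payload: str) -> bool:
        if msg_id in self.seen_ids:
            return False              # duplicate redelivery: skip it
        self.seen_ids.add(msg_id)
        self.processed.append(payload)
        return True
```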
  • Module 9 Title Analyzing Kafka Clusters with SMM
  • Module 9 Content
    • End-to-End Latency
    • Notifiers 
    • Alert Policies 
    • Use Cases 
  • Module 10 Title Monitoring Kafka
  • Module 10 Content
    • Monitoring Overview
    • Monitoring using Cloudera Manager
    • Charts and Reports in CM
    • Monitoring Recommendations
    • Metrics for Troubleshooting
    • Diagnosing Service Failure
    • Exercise: Monitoring Kafka


  • Module 11 Title Managing Kafka
  • Module 11 Content
    • Managing Kafka Topic Storage
    • Demonstration (optional): Message Retention Period
    • Log Cleanup and Collection
    • Rebalancing Partitions
    • Cruise Control
    • Exercise: Installing Cruise Control
    • Exercise: Troubleshooting Kafka Topics
    • Unclean Leader Election
    • Exercise: Unclean Leader Election
    • Adding and Removing Brokers
    • Exercise: Adding and Removing Brokers
    • Best Practices
  • Module 12 Title Message Structure, Format, and Versioning
  • Module 12 Content
    • Message Structure
    • Schema Registry
    • Defining Schemas
    • Schema Evolution and Versioning
    • Schema Registry Client
    • Exercise: Using an Avro Schema
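For reference, an Avro record schema of the kind used in the exercise looks like this (record and field names are invented for illustration):

```json
{
  "type": "record",
  "name": "SensorReading",
  "namespace": "com.example.sensors",
  "fields": [
    {"name": "sensor_id", "type": "string"},
    {"name": "reading", "type": "double"},
    {"name": "unit", "type": "string", "default": "celsius"}
  ]
}
```

Giving a new field a default, as `unit` has here, is what lets a schema evolve while staying backward compatible – the property that Schema Registry's compatibility checks enforce.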
  • Module 13 Title Improving Application Performance
  • Module 13 Content
    • Message Size
    • Batching
    • Compression
    • Exercise: Observing How Compression Affects Performance
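Why batching and compression pay off together can be seen in a few lines of Python (gzip stands in for the codecs Kafka producers actually offer, such as lz4 or zstd):

```python
import gzip

# A batch of repetitive JSON messages, like a producer batch.
batch = b'{"sensor":"s1","reading":21.5}\n' * 200
compressed = gzip.compress(batch)
ratio = len(compressed) / len(batch)
# Repetitive batches compress very well, trading a little CPU
# for far less network and disk usage.
```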


  • Module 14 Title Improving Kafka Service Performance
  • Module 14 Content
    • Performance Tuning Strategies for the Administrator
    • Cluster Sizing
    • Exercise: Planning Capacity Needed for a Use Case
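A back-of-the-envelope version of such capacity planning can be written down as follows (the formula and the 30% headroom figure are illustrative, not Cloudera's sizing guidance):

```python
def required_storage_gb(write_mb_per_sec: float, retention_days: int,
                        replication_factor: int, headroom: float = 1.3) -> float:
    """Rough cluster-wide disk capacity for a Kafka use case:
    ingest rate x retention x replication, plus headroom."""
    retention_seconds = retention_days * 24 * 3600
    data_gb = write_mb_per_sec * retention_seconds / 1024
    return data_gb * replication_factor * headroom

# e.g. 10 MB/s for 7 days at replication factor 3 needs ~23 TB.
```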
  • Module 15 Title Securing the Kafka Cluster
  • Module 15 Content
    • Encryption
    • Authentication
    • Authorization
    • Auditing
RM13,600.00 (+RM1,088.00 Tax)

This three-day hands-on training course delivers the key concepts and expertise developers need to improve the performance of their Apache Spark applications. During the course, participants will learn how to identify common sources of poor performance in Spark applications, techniques for avoiding or solving them, and best practices for Spark application monitoring.

Apache Spark Application Performance Tuning presents the architecture and concepts behind Apache Spark and the underlying data platform, then builds on this foundational understanding by teaching students how to tune Spark application code. The course format emphasizes instructor-led demonstrations that illustrate both performance issues and the techniques that address them, followed by hands-on exercises that give students an opportunity to practice what they've learned through an interactive notebook environment. The course applies to Spark 2.4, but also introduces the Spark 3.0 Adaptive Query Execution framework.

Additional Info

  • Certification Course only
  • Course Code ASPT
  • Price RM10200
  • Exam Price Exclude
  • Duration 3 Days
  • Principals Cloudera
  • Schedule

    22-24 Jan 2024

    24-26 Apr 2024

    17-19 Jul 2024

    9-11 Sep 2024

  • Audience

    This course is designed for software developers, engineers, and data scientists who have experience developing Spark applications and want to learn how to improve the performance of their code. This is not an introduction to Spark.

  • Prerequisites

    Spark examples and hands-on exercises are presented in Python, and the ability to program in this language is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful.

  • At Course Completion

    Students who successfully complete this course will be able to:

    • Understand Apache Spark's architecture, job execution, and how techniques such as lazy execution and pipelining can improve runtime performance
    • Evaluate the performance characteristics of core data structures such as RDD and DataFrames
    • Select the file formats that will provide the best performance for your application
    • Identify and resolve performance problems caused by data skew
    • Use partitioning, bucketing, and join optimizations to improve SparkSQL performance
    • Understand the performance overhead of Python-based RDDs, DataFrames, and user-defined functions
    • Take advantage of caching for better application performance
    • Understand how the Catalyst and Tungsten optimizers work
    • Understand how Workload XM can help troubleshoot and proactively monitor Spark application performance
    • Learn about the new features in Spark 3.0 and specifically how the Adaptive Query Execution engine improves performance
  • Module 1 Title Spark Architecture
  • Module 1 Content
    • RDDs
    • DataFrames and Datasets
    • Lazy Evaluation
    • Pipelining
  • Module 2 Title Data Sources and Formats
  • Module 2 Content
    • Available Formats Overview
    • Impact on Performance
    • The Small Files Problem
  • Module 3 Title Inferring Schemas
  • Module 3 Content
    • The Cost of Inference
    • Mitigating Tactics
  • Module 4 Title Dealing With Skewed Data
  • Module 4 Content
    • Recognizing Skew
    • Mitigating Tactics
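One common mitigation covered here, key salting, can be sketched in plain Python – no Spark needed to see the idea:

```python
import random
from collections import Counter

def salt_key(key: str, num_salts: int) -> str:
    """Append a random salt so one hot key spreads over several
    synthetic keys, and hence over several Spark partitions."""
    return f"{key}#{random.randrange(num_salts)}"

random.seed(0)                        # deterministic for the example
salted = Counter(salt_key("hot_customer", 8) for _ in range(10_000))
# All 10,000 records for the hot key now split across 8 buckets
# instead of overloading a single task.
```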
  • Module 5 Title Catalyst and Tungsten Overview
  • Module 5 Content
    • Catalyst Overview
    • Tungsten Overview


  • Module 6 Title Mitigating Spark Shuffles
  • Module 6 Content
    • Denormalization
    • Broadcast Joins
    • Map-Side Operations
    • Sort Merge Joins
  • Module 7 Title Partitioned and Bucketed Tables
  • Module 7 Content
    • Partitioned Tables
    • Bucketed Tables
    • Impact on Performance
  • Module 8 Title Improving Join Performance
  • Module 8 Content
    • Skewed Joins
    • Bucketed Joins
    • Incremental Joins
  • Module 9 Title PySpark Overhead and UDFs
  • Module 9 Content
    • PySpark Overhead
    • Scalar UDFs
    • Vector UDFs using Apache Arrow
    • Scala UDFs
  • Module 10 Title Caching Data for Reuse
  • Module 10 Content
    • Caching Options
    • Impact on Performance
    • Caching Pitfalls
  • Module 11 Title Workload XM (WXM) Introduction
  • Module 11 Content
    • WXM Overview
    • WXM for Spark Developers
  • Module 12 Title What's New in Spark 3.0?
  • Module 12 Content
    • Adaptive Number of Shuffle Partitions
    • Skew Joins
    • Convert Sort Merge Joins to Broadcast Joins
    • Dynamic Partition Pruning
    • Dynamic Coalesce Shuffle Partitions
RM10,200.00 (+RM816.00 Tax)

One of the most critical functions of a data-driven enterprise is the ability to manage ingest and data flow across complex ecosystems. Does your team have the tools and skill sets to succeed at this? Apache NiFi provides this capability, and our three-day Cloudera Dataflow: Flow Management with Apache NiFi course delivers the foundational training you'll need to succeed with NiFi. In addition to learning NiFi's key features and concepts, participants will gain hands-on experience creating, executing, managing, and optimizing NiFi dataflows throughout a variety of scenarios.

Additional Info

  • Certification Course & Certificate
  • Course Code CDFM
  • Price RM10200
  • Exam Price Exclude
  • Exam Code CDP-3001
  • Duration 3 Days
  • Principals Cloudera
  • Schedule

    5-7 Feb 2024

    13-15 May 2024

    26-28 Aug 2024

    4-6 Nov 2024

  • Audience

    This course is designed for developers, data engineers, administrators, and others with an interest in learning NiFi's innovative no-code, graphical approach to data ingest.

  • Prerequisites

    Although programming experience is not required, basic experience with Linux is presumed, and previous exposure to big data concepts and applications is helpful.

  • At Course Completion

    During this course, you learn how to:

    • Navigate the NiFi user interface
    • Define, configure, organize, and manage dataflows
    • Transform and trace data as it flows to its destination
    • Track changes to dataflows with NiFi Registry
    • Use the NiFi Expression Language to control dataflows
    • Optimize dataflows for better performance and maintainability
    • Connect dataflows with other systems, such as Apache Kafka, Apache Hive, and HDFS
  • Module 1 Title Introduction to Cloudera Flow Management
  • Module 1 Content
    • Overview of Cloudera Flow Management and NiFi
    • The NiFi User Interface
    • Demonstration: NiFi User Interface
    • Exercise: Build Your First Dataflow
  • Module 2 Title Processors
  • Module 2 Content
    • Overview of Processors
    • Processor Surface Panel
    • Processor Configuration
    • Exercise: Start Building a Dataflow Using Processors
  • Module 3 Title Connections
  • Module 3 Content
    • Overview of Connections
    • Connection Configuration
    • Connector Context Menu
    • Exercise: Connect Processors in a Dataflow
  • Module 4 Title Dataflows
  • Module 4 Content
    • Command and Control of a Dataflow
    • Processor Relationships
    • Back Pressure
    • Prioritizers
    • Labels
    • Exercise: Build a More Complex Dataflow
    • Exercise: Creating a Fork Using Relationships
    • Exercise: Set Back Pressure Thresholds
  • Module 5 Title Process Groups
  • Module 5 Content
    • Anatomy of a Process Group
    • Input and Output Ports
    • Exercise: Simplify Dataflows Using Process Groups
  • Module 6 Title FlowFile Provenance
  • Module 6 Content
    • Data Provenance Events
    • FlowFile Lineage
    • Replaying a FlowFile
    • Exercise: Using Data Provenance
  • Module 7 Title Dataflow Templates
  • Module 7 Content
    • Templates Overview
    • Managing Templates
    • Exercise: Creating, Using, and Managing Templates
  • Module 8 Title Apache NiFi Registry
  • Module 8 Content
    • Apache NiFi Registry Overview
    • Using the Registry
    • Exercise: Versioning Flows Using NiFi Registry
  • Module 9 Title FlowFile Attributes
  • Module 9 Content
    • FlowFile Attributes
    • Routing on Attributes
    • Exercise: Working with FlowFile Attributes
  • Module 10 Title NiFi Expression Language
  • Module 10 Content
    • NiFi Expression Language Overview
    • Syntax
    • Expression Language Editor
    • Setting Conditional Values
    • Exercise: Using the NiFi Expression Language
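For reference, NiFi Expression Language expressions look like the following (the attribute values and the `.bak` suffix are illustrative):

```
${filename:toUpper():append('.bak')}     uppercases the filename attribute, then appends a suffix
${fileSize:gt(1048576)}                  true when the FlowFile is larger than 1 MiB
```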
  • Module 11 Title Dataflow Optimization
  • Module 11 Content
    • Dataflow Optimization
    • Control Rate
    • Managing Compute
    • Exercise: Building an Optimized Dataflow
  • Module 12 Title NiFi Architecture
  • Module 12 Content
    • NiFi Architecture Overview
    • Cluster Architecture
    • Heartbeats
    • Managing Clusters
  • Module 13 Title Site-to-Site Dataflows
  • Module 13 Content
    • Site-to-Site Theory
    • Site-to-Site Architecture
    • Anatomy of a Remote Process Group
    • Adding and Configuring Remote Process Groups
    • Exercise: Building Site-to-Site Dataflows
  • Module 14 Title Cloudera Edge Management and MiNiFi
  • Module 14 Content
    • Overview of MiNiFi
    • Example Walk-through
  • Module 15 Title Monitoring and Reporting
  • Module 15 Content
    • Monitoring from NiFi
    • Overview of Reporting
    • Examples of Common Reporting Tasks
    • Exercise: Monitoring and Reporting
  • Module 16 Title Controller Services
  • Module 16 Content
    • Controller Services Overview
    • Common Controller Services
    • Exercise: Adding Apache Hive Controller
  • Module 17 Title Integrating NiFi with the Cloudera Ecosystem
  • Module 17 Content
    • NiFi Integration Architecture
    • NiFi Ecosystem Processors
    • A Closer Look at NiFi and Apache Hive
    • A Closer Look at NiFi and Apache Kafka
    • Exercise: Integrating Dataflows with Kafka and HDFS
  • Module 18 Title NiFi Security
  • Module 18 Content
    • NiFi Security Overview
    • Securing Access to the NiFi UI
    • Authentication
    • Authorization
    • NiFi Registry Security
    • NiFi Security Summary
RM10,200.00 (+RM816.00 Tax)

Cloudera's four-day administrator training course for CDP Private Cloud Base provides participants with a comprehensive understanding of all the steps necessary to operate and maintain on-premises clusters using Cloudera Manager. From installation and configuration through load balancing and tuning, this Cloudera training course is the best preparation for the real-world challenges faced by administrators who run CDP Private Cloud Base.

This course is best suited to systems administrators who have at least basic Linux experience. Prior knowledge of CDP, or of earlier platforms such as Cloudera's CDH or Hortonworks HDP, is not required.

Get certified

Upon completion of the course, attendees are encouraged to continue their studies and register for the CCA Administrator exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

Additional Info

  • Certification Course & Certificate
  • Course Code CDPA
  • Price RM13600
  • Exam Price Exclude
  • Exam Code CDP-2001
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    26-29 Feb 2024

    11-14 Jun 2024

    3-6 Sep 2024

    11-14 Nov 2024

  • Audience
  • Prerequisities
  • At Course Completion

    Through instructor-led discussion and interactive, hands-on exercises, you will learn to:

    • Install Cloudera Manager
    • Use Cloudera Manager to install a CDP Private Cloud Base cluster
    • Configure and monitor the cluster using Cloudera Manager
    • Understand, evaluate, and select the most appropriate data storage option
    • Optimize cluster performance
    • Perform routine cluster maintenance tasks
    • Detect, troubleshoot, and repair problems with the cluster
  • Module 1 Title Cloudera Data Platform
  • Module 1 Content
    • Industry Trends for Big Data
    • The Challenge to Become Data-Driven
    • The Enterprise Data Cloud
    • CDP Overview
    • CDP Form Factors
  • Module 2 Title CDP Private Cloud Base Installation
  • Module 2 Content
    • Installation Overview
    • Cloudera Manager Installation
    • CDP Runtime Overview
    • Cloudera Manager Introduction
  • Module 3 Title Cluster Configuration
  • Module 3 Content
    • Overview
    • Configuration Settings
    • Modifying Service Configurations
    • Configuration Files
    • Managing Role Instances
    • Adding New Services
    • Adding and Removing Hosts
  • Module 4 Title Data Storage
  • Module 4 Content
    • Overview
    • HDFS Topology and Roles
    • HDFS Performance and Fault Tolerance
    • HDFS and Hadoop Security Overview
    • Working with HDFS
    • HBase Overview
    • Kudu Overview
    • Cloud Storage Overview
  • Module 5 Title Data Ingest
  • Module 5 Content
    • Data Ingest Overview
    • File Formats
    • Ingesting Data using File Transfer or REST Interfaces
    • Importing Data from Relational Databases with Apache Sqoop
    • Ingesting Data Using NiFi
    • Best Practices for Importing Data
  • Module 6 Title Data Flow
  • Module 6 Content
    • Overview of Cloudera Flow Management and NiFi
    • NiFi Architecture
    • Cloudera Edge Flow Management and MiNiFi
    • Controller Services
    • Apache Kafka Overview
    • Apache Kafka Cluster Architecture
    • Apache Kafka Command Line Tools
  • Module 7 Title Data Access and Discovery
  • Module 7 Content
    • Apache Hive
    • Apache Impala
    • Apache Impala Tuning
    • Search Overview
    • Hue Overview
    • Managing and Configuring Hue
    • Hue Authentication and Authorization
    • CDSW Overview
  • Module 8 Title Data Compute
  • Module 8 Content
    • YARN Overview
    • Running Applications on YARN
    • Viewing YARN Applications
    • YARN Application Logs
    • MapReduce Applications
    • YARN Memory and CPU Settings
    • Tez Overview
    • Hive on Tez
    • ACID for Hive
    • Spark Overview
    • How Spark Applications Run on YARN
    • Monitoring Spark Applications
    • Phoenix Overview
  • Module 9 Title Managing Resources
  • Module 9 Content
    • Configuring cgroups with CPU Scheduling
    • The Capacity Scheduler
    • Managing Queues
    • Impala Query Scheduling
    • Planning Your Cluster
    • General Planning Considerations
    • Choosing the Right Hardware
    • Network Considerations
    • CDP Private Cloud Considerations
    • Configuring Nodes
  • Module 10 Title Advanced Cluster Configuration
  • Module 10 Content
    • Configuring Service Ports
    • Tuning HDFS and MapReduce
    • Managing Cluster Growth
    • Erasure Coding
    • Enabling HDFS High Availability
  • Module 11 Title Cluster Maintenance
  • Module 11 Content
    • Checking HDFS Status
    • Copying Data Between Clusters
    • Rebalancing Data in HDFS
    • HDFS Directory Snapshots
    • Host Maintenance
    • Upgrading a Cluster
  • Module 12 Title Cluster Monitoring
  • Module 12 Content
    • Cloudera Manager Monitoring Features
    • Health Tests
    • Events and Alerts
    • Charts and Reports
    • Monitoring Recommendations
  • Module 13 Title Cluster Troubleshooting
  • Module 13 Content
    • Overview
    • Troubleshooting Tools
    • Misconfiguration Examples
  • Module 14 Title Security
  • Module 14 Content
    • Data Governance with SDX
    • Hadoop Security Concepts
    • Hadoop Authentication Using Kerberos
    • Hadoop Authorization
    • Hadoop Encryption
    • Securing a Hadoop Cluster
    • Apache Ranger
    • Apache Atlas
    • Backup and Recovery
  • Module 15 Title Private Cloud / Public Cloud
  • Module 15 Content
    • CDP Overview
    • Private Cloud Capabilities
    • Public Cloud Capabilities
    • What is Kubernetes?
    • WXM Overview
    • Auto-scaling
RM13,600.00 (+RM1,088.00 Tax)

This four-day workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges.


Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment.

Additional Info

  • Certification Course only
  • Course Code CDST
  • Price RM13600
  • Exam Price Exclude
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    11-14 Mar 2024

    4-7 Jun 2024

    16-19 Dec 2024

  • Audience
  • Prerequisites
  • At Course Completion

    The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.

    Workshop participants should have a basic understanding of Python or R and some experience exploring and analyzing data and developing statistical or machine learning models. Knowledge of Hadoop or Spark is not required.


  • Module 1 Title Overview of data science and machine learning at scale
  • Module 1 Content
  • Module 2 Title Overview of the Hadoop ecosystem
  • Module 2 Content
  • Module 3 Title Working with HDFS data and Hive tables using Hue
  • Module 3 Content
  • Module 4 Title Introduction to Cloudera Data Science Workbench
  • Module 4 Content
  • Module 5 Title Overview of Apache Spark 2
  • Module 5 Content
  • Module 6 Title Reading and writing data
  • Module 6 Content
  • Module 7 Title Inspecting data quality
  • Module 7 Content
  • Module 8 Title Cleansing and transforming data
  • Module 8 Content
  • Module 9 Title Summarizing and grouping data
  • Module 9 Content
  • Module 10 Title Combining, splitting, and reshaping data
  • Module 10 Content
  • Module 11 Title Exploring data
  • Module 11 Content
  • Module 12 Title Configuring, monitoring, and troubleshooting Spark applications
  • Module 12 Content
  • Module 13 Title Overview of machine learning in Spark MLlib
  • Module 13 Content
  • Module 14 Title Extracting, transforming, and selecting features
  • Module 14 Content
  • Module 15 Title Building and evaluating regression models
  • Module 15 Content
  • Module 16 Title Building and evaluating classification models
  • Module 16 Content
  • Module 17 Title Building and evaluating clustering models
  • Module 17 Content
  • Module 18 Title Cross-validating models and tuning hyperparameters
  • Module 18 Content
  • Module 19 Title Building machine learning pipelines
  • Module 19 Content
  • Module 20 Title Deploying machine learning models
  • Module 20 Content
RM13,600.00 (+RM1,088.00 Tax)

Individuals who earn the CCA Administrator certification have demonstrated the core systems and cluster administrator skills sought by companies and organizations deploying Cloudera in the enterprise.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a pre-configured Cloudera Enterprise cluster
  • Time Limit: 120 minutes
  • Passing Score: 70%
  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-131
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Administrator
  • Principals Cloudera
  • Audience

    There are no prerequisites for taking any Cloudera certification exam; however, a background in system administration or equivalent training is strongly recommended. The CCA Administrator exam (CCA-131) follows the same objectives as Cloudera Administrator Training, and the training course is an excellent part of preparation for the exam.

  • Prerequisites

    Install

    Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.


    • Set up a local CDH repository
    • Perform OS-level configuration for Hadoop installation
    • Install Cloudera Manager server and agents
    • Install CDH using Cloudera Manager
    • Add a new node to an existing cluster
    • Add a service using Cloudera Manager


    Configure

    Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

    • Configure a service using Cloudera Manager
    • Create an HDFS user's home directory
    • Configure NameNode HA
    • Configure ResourceManager HA
    • Configure proxy for Hiveserver2/Impala

    Manage

    Maintain and modify the cluster to support day-to-day operations in the enterprise

    • Rebalance the cluster
    • Set up alerting for excessive disk fill
    • Define and install a rack topology script
    • Install a new type of I/O compression library in the cluster
    • Revise YARN resource assignment based on user feedback
    • Commission/decommission a node

    Secure

    Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

    • Configure HDFS ACLs
    • Install and configure Sentry
    • Configure Hue user authorization and authentication
    • Enable/configure log and query redaction
    • Create encrypted zones in HDFS

    Test

    Benchmark the cluster operational metrics, test system configuration for operation and efficiency

    • Execute file system commands via HttpFS
    • Efficiently copy data within a cluster/between clusters
    • Create/restore a snapshot of an HDFS directory
    • Get/set ACLs for a file or directory structure
    • Benchmark the cluster (I/O, CPU, network)
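
Several of the tasks above are driven from the Hadoop command line. A minimal sketch of how those invocations are typically shaped, built here as argument lists rather than executed; the paths and snapshot names are hypothetical examples, not part of the exam:

```python
# Hedged sketch: composing common HDFS maintenance commands as argument
# lists. All paths and names below are hypothetical examples.

def hdfs_snapshot_cmd(directory, name):
    """Create a named snapshot of an HDFS directory. The directory must
    first be made snapshottable (hdfs dfsadmin -allowSnapshot)."""
    return ["hdfs", "dfs", "-createSnapshot", directory, name]

def distcp_cmd(src, dst, num_mappers=10):
    """Copy data efficiently within or between clusters using DistCp,
    which runs the copy as a parallel MapReduce job."""
    return ["hadoop", "distcp", "-m", str(num_mappers), src, dst]

def getfacl_cmd(path, recursive=False):
    """List the ACLs of a file, or of a whole directory tree with -R."""
    cmd = ["hdfs", "dfs", "-getfacl"]
    if recursive:
        cmd.append("-R")
    cmd.append(path)
    return cmd

print(hdfs_snapshot_cmd("/data/sales", "pre-upgrade"))
```

In practice these commands are run directly in a shell on a cluster node; building them programmatically as above is just one way to see the flags side by side.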

    Troubleshoot

    Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios

    • Resolve errors/warnings in Cloudera Manager
    • Resolve performance problems/errors in cluster operation
    • Determine reason for application failure
    • Configure the Fair Scheduler to resolve application delays
RM1,252.36(+RM100.19 Tax)

A CCA Spark and Hadoop Developer has proven their core skills to ingest, transform, and process data using Apache Spark and core Cloudera Enterprise tools.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a Cloudera Enterprise cluster. See below for the full cluster configuration
  • Time Limit: 120 minutes
  • Passing Score: 70%
  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-175
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Spark and Hadoop Developer
  • Principals Cloudera
  • Audience

    There are no prerequisites for taking any Cloudera certification exam. The CCA Spark and Hadoop Developer exam (CCA-175) follows the same objectives as Cloudera Developer Training for Spark and Hadoop, and the training course is excellent preparation for the exam.

  • Prerequisites

    Data Ingest

    These are the skills needed to transfer data between external systems and your cluster, including the following:

    • Import data from a MySQL database into HDFS using Sqoop

    • Export data to a MySQL database from HDFS using Sqoop

    • Change the delimiter and file format of data during import using Sqoop

    • Ingest real-time and near-real-time streaming data into HDFS

    • Process streaming data as it is loaded onto the cluster

    • Load data into and out of HDFS using the Hadoop File System commands
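
The Sqoop tasks above share a common command shape. A minimal sketch, assuming a hypothetical MySQL JDBC URL, table, and target directory, that builds the import invocation as an argument list (not executed here):

```python
# Hedged sketch of a typical Sqoop import invocation. The JDBC URL,
# table name, and HDFS target directory are hypothetical examples.

def sqoop_import_cmd(jdbc_url, table, target_dir,
                     fields_terminated_by="\t", as_parquet=False):
    """Import a MySQL table into HDFS, optionally changing the field
    delimiter or storing the result as Parquet instead of text."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--fields-terminated-by", fields_terminated_by,
    ]
    if as_parquet:
        cmd.append("--as-parquetfile")
    return cmd

print(sqoop_import_cmd("jdbc:mysql://example-host/shop", "orders", "/data/orders"))
```

The export direction uses `sqoop export` with the same `--connect`/`--table` options and an `--export-dir` pointing at the HDFS source.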

    Transform, Stage, and Store

    Convert a set of data values in a given format stored in HDFS into new data values or a new data format and write them into HDFS.

    • Load RDD data from HDFS for use in Spark applications

    • Write the results from an RDD back into HDFS using Spark

    • Read and write files in a variety of file formats

    • Perform standard extract, transform, load (ETL) processes on data

    Data Analysis

    Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by using queries against loaded data.

    • Use metastore tables as an input source or an output sink for Spark applications

    • Understand the fundamentals of querying datasets in Spark

    • Filter data using Spark

    • Write queries that calculate aggregate statistics

    • Join disparate datasets using Spark

    • Produce ranked or sorted data
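
The query patterns listed above (filtering, aggregation, joining disparate datasets, sorted output) are plain SQL, whether issued through Spark SQL against metastore tables or any other engine. A minimal illustration using Python's built-in sqlite3 so it runs anywhere; the table names and rows are made up:

```python
import sqlite3

# Hedged illustration of the aggregate-and-join pattern. In the exam
# context the same SQL would run through Spark SQL; data here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(order_id INTEGER, cust_id INTEGER, total REAL);
    CREATE TABLE customers(cust_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 10, 75.0), (3, 11, 40.0);
    INSERT INTO customers VALUES (10, 'alice'), (11, 'bob');
""")

# Join disparate datasets, calculate aggregate statistics, sort the output.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.total) AS revenue
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    GROUP BY c.name
    ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('alice', 2, 100.0), ('bob', 1, 40.0)]
```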

    Configuration

    This is a practical exam and the candidate should be familiar with all aspects of generating a result, not just writing code.

    • Supply command-line options to change your application configuration, such as increasing available memory
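
A sketch of what "supplying command-line options" looks like in practice: a `spark-submit` invocation with standard configuration flags such as increased executor memory, built as an argument list. The application JAR and class name are hypothetical:

```python
# Hedged sketch of a spark-submit invocation with resource options.
# The JAR path and main class are hypothetical examples.

def spark_submit_cmd(app_jar, main_class,
                     executor_memory="4g", num_executors=4):
    """Build a spark-submit command that raises executor memory and
    sets the executor count via standard spark-submit flags."""
    return [
        "spark-submit",
        "--class", main_class,
        "--executor-memory", executor_memory,
        "--num-executors", str(num_executors),
        app_jar,
    ]

print(spark_submit_cmd("app.jar", "com.example.Main", executor_memory="8g"))
```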
RM1,252.36(+RM100.19 Tax)

A CCA Data Analyst has proven their core analyst skills to load, transform, and model Hadoop data in order to define relationships and extract meaningful results from the raw input.

You are given eight to twelve customer problems, each with a unique large data set, a CDH cluster, and 120 minutes. For each problem, you must implement a technical solution with a high degree of precision that meets all the requirements. You may use any tool or combination of tools on the cluster (see list below); you choose the tool(s) that are right for the job. You must know enough to analyze the problem and arrive at an optimal approach in the time allowed, then carry it out on a live cluster, under the time limit, while being observed by a proctor.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a CDH 5 cluster. See below for the full cluster configuration

  • Time Limit: 120 minutes

  • Passing Score: 70%

  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-159
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Data Analyst
  • Principals Cloudera
  • Audience

    Candidates for CCA Data Analyst can be SQL developers, data analysts, business intelligence specialists, developers, system architects, and database administrators. There are no prerequisites.

  • Prerequisites

    Prepare the Data

    Use Extract, Transform, Load (ETL) processes to prepare data for queries.

    • Import data from a MySQL database into HDFS using Sqoop

    • Export data to a MySQL database from HDFS using Sqoop

    • Move data between tables in the metastore

    • Transform values, columns, or file formats of incoming data before analysis

    Provide Structure to the Data

    Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

    • Create tables using a variety of data types, delimiters, and file formats

    • Create new tables using existing tables to define the schema

    • Improve query performance by creating partitioned tables in the metastore

    • Alter tables to modify existing schema

    • Create views in order to simplify queries
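
A minimal illustration of the create-from-existing-table (CTAS) and view patterns above, using Python's built-in sqlite3 for portability. Hive and Impala add clauses that SQLite lacks (e.g. PARTITIONED BY, STORED AS PARQUET, row-format delimiters), but the DDL shape carries over; the table names and rows are made up:

```python
import sqlite3

# Hedged illustration of metastore-style DDL using sqlite3. Table names
# and data are hypothetical; Hive/Impala-specific clauses are omitted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events(ts TEXT, user TEXT, amount REAL)")
conn.execute("INSERT INTO raw_events VALUES ('2024-01-01', 'alice', 9.5)")

# Create a new table using an existing table to define the schema (CTAS).
conn.execute("CREATE TABLE events_copy AS SELECT * FROM raw_events")

# Create a view to simplify downstream queries.
conn.execute("""CREATE VIEW big_events AS
                SELECT user, amount FROM raw_events WHERE amount > 5""")
view_rows = conn.execute("SELECT * FROM big_events").fetchall()
print(view_rows)  # [('alice', 9.5)]
```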

    Data Analysis

    Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

    • Prepare reports using SELECT commands including unions and subqueries

    • Calculate aggregate statistics, such as sums and averages, during a query

    • Create queries against multiple data sources by using join commands

    • Transform the output format of queries by using built-in functions

    • Perform queries across a group of rows using windowing functions
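
A minimal illustration of the windowing-function objective above, run through Python's built-in sqlite3 (window functions require SQLite 3.25+, which ships with recent Python versions). In Hive and Impala the same OVER clause applies; the data is made up:

```python
import sqlite3

# Hedged illustration: aggregate over a window plus an overall ranking.
# Requires SQLite 3.25+ for window-function support; data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales(region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 10), ('east', 30), ('west', 20);
""")
rows = conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total,
           RANK() OVER (ORDER BY amount DESC)     AS overall_rank
    FROM sales
    ORDER BY overall_rank
""").fetchall()
print(rows)  # [('east', 30.0, 40.0, 1), ('west', 20.0, 20.0, 2), ('east', 10.0, 40.0, 3)]
```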
RM1,252.36(+RM100.19 Tax)

Cloudera University’s four-day Data Analyst Training course will teach you to apply traditional data analytics and business intelligence skills to big data. This course presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

Advance Your Ecosystem Expertise

Apache Hive makes transformation and analysis of complex, multi-structured data scalable in Cloudera environments. Apache Impala enables real-time interactive analysis of the data stored in Hadoop using a native SQL environment. Together, they make multi-structured data accessible to analysts, database administrators, and others without Java programming expertise.

Additional Info

  • Certification Course & Certificate
  • Course Code CDAT
  • Price RM13600
  • Exam Price Exclude
  • Exam Code CDP-4001
  • Duration 4 Days
  • CertificationInfo Cloudera Certified Data Analyst
  • Principals Cloudera
  • Schedule

    16-19 Jan 2024

    26-29 Feb 2024 (Penang Date)

    7-10 May 2024

    16-19 Jul 2024 (Penang Date)

    6-9 Aug 2024

    22-25 Oct 2024

  • Audience

    This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity. Prior knowledge of Apache Hadoop is not required.

  • Prerequisites
  • At Course Completion

    Get Certified

    Upon completion of the course, attendees are encouraged to continue their study and register for the CCA Data Analyst exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

  • Module 1 Title Apache Hadoop Fundamentals
  • Module 1 Content
    • The Motivation for Hadoop
    • Hadoop Overview
    • Data Storage: HDFS
    • Distributed Data Processing: YARN, MapReduce, and Spark
    • Data Processing and Analysis: Hive and Impala
    • Database Integration: Sqoop
    • Other Hadoop Data Tools
    • Exercise Scenario Explanation
       
  • Module 2 Title Introduction to Apache Hive and Impala
  • Module 2 Content
    • What Is Hive?
    • What Is Impala?
    • Why Use Hive and Impala?
    • Schema and Data Storage
    • Comparing Hive and Impala to Traditional Databases
    • Use Cases
  • Module 3 Title Querying with Apache Hive and Impala
  • Module 3 Content
    • Databases and Tables
    • Basic Hive and Impala Query Language Syntax
    • Data Types
    • Using Hue to Execute Queries
    • Using Beeline (Hive's Shell)
    • Using the Impala Shell
  • Module 4 Title Common Operators and Built-In Functions
  • Module 4 Content
    • Operators
    • Scalar Functions
    • Aggregate Functions
  • Module 5 Title Data Management
  • Module 5 Content
    • Data Storage
    • Creating Databases and Tables
    • Loading Data
    • Altering Databases and Tables
    • Simplifying Queries with Views
    • Storing Query Results
  • Module 6 Title Data Storage and Performance
  • Module 6 Content
    • Partitioning Tables
    • Loading Data into Partitioned Tables
    • When to Use Partitioning
    • Choosing a File Format
    • Using Avro and Parquet File Formats
  • Module 7 Title Working with Multiple Datasets
  • Module 7 Content
    • UNION and Joins
    • Handling NULL Values in Joins
    • Advanced Joins
  • Module 8 Title Analytic Functions and Windowing
  • Module 8 Content
    • Using Common Analytic Functions
    • Other Analytic Functions
    • Sliding Windows
  • Module 9 Title Complex Data
  • Module 9 Content
    • Complex Data with Hive
    • Complex Data with Impala
  • Module 10 Title Analyzing Text
  • Module 10 Content
    • Using Regular Expressions with Hive and Impala
    • Processing Text Data with SerDes in Hive
    • Sentiment Analysis and n-grams
  • Module 11 Title Apache Hive Optimization
  • Module 11 Content
    • Understanding Query Performance
    • Bucketing
    • Hive on Spark
  • Module 12 Title Apache Impala Optimization
  • Module 12 Content
    • How Impala Executes Queries
    • Improving Impala Performance
  • Module 13 Title Extending Apache Hive and Impala
  • Module 13 Content
    • Custom SerDes and File Formats in Hive
    • Data Transformation with Custom Scripts in Hive
    • User-Defined Functions
    • Parameterized Queries
  • Module 14 Title Choosing the Best Tool for the Job
  • Module 14 Content
    • Comparing Hive, Impala, and Relational Databases
    • Which to Choose?
  • Module 15 Title Conclusion
  • Module 15 Content
RM13,600.00(+RM1,088.00 Tax)

This four-day hands-on training course teaches the key concepts and knowledge developers need to use Apache Spark in developing high-performance, parallel applications on the Cloudera Data Platform (CDP).

Hands-on exercises allow students to practice writing Spark applications that integrate with CDP core components, such as Hive and Kafka. Participants will learn how to use Spark SQL to query structured data, use Spark Streaming to perform real-time processing on streaming data, and work with “big data” stored in a distributed file system.

After taking this course, participants will be prepared to face real-world challenges and build applications that make fast and relevant decisions, implementing interactive analysis applied to a wide variety of use cases, architectures, and industries.

Additional Info

  • Certification Course & Certificate
  • Course Code CDTSH
  • Price RM13600
  • Exam Price Exclude
  • Exam Code CDP-3001
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    20-23 Feb 2024

    4-7 Mar 2024

    2-5 Apr 2024

    2-5 Jul 2024

    7-10 Oct 2024

  • Audience

    This course is designed for developers and data engineers. 

  • Prerequisites

    All students are expected to have basic Linux experience and basic proficiency with either the Python or Scala programming language. Basic knowledge of SQL is helpful. Prior knowledge of Spark and Hadoop is not required.

  • At Course Completion

    Through instructor-led discussion and interactive, hands-on exercises, you will learn how to:

    • Distribute, store, and process data in a CDP cluster
    • Write, configure, and deploy Apache Spark applications
    • Use Spark interpreters and Spark applications to explore, process, and analyze distributed data
    • Query data using Spark SQL, DataFrames, and Hive tables
    • Use Spark Streaming together with Kafka to process a data stream
  • Module 1 Title Introduction to Zeppelin
  • Module 1 Content
    • Why Notebooks?
    • Zeppelin Notes
    • Demo: Apache Spark In 5 Minutes
  • Module 2 Title HDFS Introduction
  • Module 2 Content
    • HDFS Overview
    • HDFS Components and Interactions
    • Additional HDFS Interactions
    • Ozone Overview
    • Exercise: Working with HDFS
  • Module 3 Title YARN Introduction
  • Module 3 Content
    • YARN Overview
    • YARN Components and Interaction
    • Working with YARN
    • Exercise: Working with YARN
  • Module 4 Title Distributed Processing History
  • Module 4 Content
    • The Disk Years: 2000-2010
    • The Memory Years: 2010-2020
    • The GPU Years: 2020 onward
  • Module 5 Title Working with DataFrames
  • Module 5 Content
    • Introduction to DataFrames
    • Exercise: Introducing DataFrames
    • Exercise: Reading and Writing DataFrames
    • Exercise: Working with Columns
    • Exercise: Working with Complex Types
    • Exercise: Combining and Splitting DataFrames
    • Exercise: Summarizing and Grouping DataFrames
    • Exercise: Working with UDFs
    • Exercise: Working with Windows
  • Module 6 Title Introduction to Apache Hive
  • Module 6 Content
    • About Hive
  • Module 7 Title Hive and Spark Integration
  • Module 7 Content
    • Hive and Spark Integration
    • Exercise: Spark Integration with Hive
  • Module 8 Title Data Visualization with Zeppelin
  • Module 8 Content
    • Introduction to Data Visualization with Zeppelin
    • Zeppelin Analytics
    • Zeppelin Collaboration
    • Exercise: AdventureWorks
  • Module 9 Title Distributed Processing Challenges
  • Module 9 Content
    • Shuffle
    • Skew
    • Order
  • Module 10 Title Spark Distributed Processing
  • Module 10 Content
    • Spark Distributed Processing
    • Exercise: Explore Query Execution Order
  • Module 11 Title Spark Distributed Persistence
  • Module 11 Content
    • DataFrame and Dataset Persistence
    • Persistence Storage Levels
    • Viewing Persisted RDDs
    • Exercise: Persisting DataFrames
  • Module 12 Title Writing, Configuring, and Running Spark Applications
  • Module 12 Content
    • Writing a Spark Application
    • Building and Running an Application
    • Application Deployment Mode
    • The Spark Application Web UI
    • Configuring Application Properties
    • Exercise: Writing, Configuring, and Running a Spark Application
  • Module 13 Title Introduction to Structured Streaming
  • Module 13 Content
    • Introduction to Structured Streaming
    • Exercise: Processing Streaming Data
  • Module 14 Title Message Processing with Apache Kafka
  • Module 14 Content
    • What is Apache Kafka?
    • Apache Kafka Overview
    • Scaling Apache Kafka
    • Apache Kafka Cluster Architecture
    • Apache Kafka Command Line Tools
  • Module 15 Title Structured Streaming with Apache Kafka
  • Module 15 Content
    • Receiving Kafka Messages
    • Sending Kafka Messages
    • Exercise: Working with Kafka Streaming Messages
  • Module 16 Title Aggregating and Joining Streaming DataFrames
  • Module 16 Content
    • Streaming Aggregation
    • Joining Streaming DataFrames
    • Exercise: Aggregating and Joining Streaming DataFrames
  • Module 17 Title Appendix: Working with Datasets in Scala
  • Module 17 Content
    • Working with Datasets in Scala
    • Exercise: Using Datasets in Scala
RM13,600.00(+RM1,088.00 Tax)
