
Training with Iverson classes

Training is not a commodity – not all training centres are the same. Iverson Associates Sdn Bhd is the most established, most reputable, and top professional IT training provider in Malaysia. With a large pool of experienced and certified trainers, state-of-the-art facilities, and well-designed courseware, Iverson offers superior training, a more impactful learning experience, and highly effective results.

At Iverson, our focus is on providing high-quality IT training to corporate customers, meeting their learning needs and helping them to achieve their training objectives. Iverson has the flexibility to provide training solutions whether for a single individual or the largest corporation in a well-paced or accelerated training programme.

Our courses continue to evolve along with the fast-changing technological advances. Our instructor-led training services are available on a public and a private (in-company) basis. Some of our courses are also available as online, on demand, and hybrid training.

This four-day instructor-led course begins by introducing Apache Kafka, explaining its key concepts and architecture, and discussing several common use cases. Building on this foundation, you will learn how to plan a Kafka deployment, and then gain hands-on experience by installing and configuring your own cloud-based, multi-node cluster running Kafka on the Cloudera Data Platform (CDP).

You will then use this cluster during more than 20 hands-on exercises that follow, covering a range of essential skills, starting with how to create Kafka topics, producers, and consumers, then continuing through progressively more challenging aspects of Kafka operations and development, such as those related to scalability, reliability, and performance problems. Throughout the course, you will learn and use Cloudera’s recommended tools for working with Kafka, including Cloudera Manager, Schema Registry, Streams Messaging Manager, and Cruise Control.

Additional Info

  • Certification Course & Certificate
  • Course Code CTAK
  • Price RM13600
  • Exam Price Exclude
  • Exam Code CDP-4001
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    19-22 Mar 2024

    12-15 Aug 2024

    1-4 Oct 2024

  • Audience

    This course is designed for system administrators, data engineers, and developers. 

  • Prerequisites

    All students are expected to have basic Linux experience, and basic proficiency with the Java programming language is recommended. No prior experience with Apache Kafka is necessary.

  • At Course Completion

    During this course, you learn how to:

    • Plan, deploy and operate Kafka clusters
    • Create and manage topics
    • Develop producers and consumers
    • Use replication to improve fault tolerance
    • Use partitioning to improve scalability
    • Troubleshoot common problems and performance issues
  • Module 1 Title Kafka Overview
  • Module 1 Content
    • High-Level Architecture
    • Common Use Cases
    • Cloudera's Distribution of Apache Kafka
  • Module 2 Title Deploying Apache Kafka
  • Module 2 Content
    • System Requirements and Dependencies
    • Service Roles
    • Planning Your Deployment
    • Deploying Kafka Services
    • Exercise: Preparing the Exercise Environment
    • Exercise: Installing the Kafka Service with Cloudera Manager
    • Exercise (optional): Create Metrics Dashboards
    • Exercise (optional): Using the CM API
  • Module 3 Title Kafka Command Line Basics
  • Module 3 Content
    • Create and Manage Topics
    • Running Producers and Consumers
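As a flavour of these command-line basics, topic creation and console producing/consuming look roughly like the following (topic name, partition and replica counts, and broker address are placeholders, and the commands need a live Kafka cluster, so they are shown for illustration only):

```shell
# Create a six-partition topic with three replicas
kafka-topics --create --topic orders --partitions 6 --replication-factor 3 \
    --bootstrap-server broker-1:9092

# Type messages into the topic, then read them back from the beginning
kafka-console-producer --topic orders --bootstrap-server broker-1:9092
kafka-console-consumer --topic orders --from-beginning --bootstrap-server broker-1:9092
```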


  • Module 4 Title Using Streams Messaging Manager (SMM)
  • Module 4 Content
    • Streams Messaging Manager Overview 
    • Producers, Topics, and Consumers
    • Data Explorer
    • Brokers
    • Topic Management
    • Exercise: Managing Topics using the CLI
    • Exercise: Connecting Producers and Consumers from the Command Line
  • Module 5 Title Kafka Java API Basics
  • Module 5 Content
    • Overview of Kafka's APIs
    • Topic Management from the Java API
    • Exercise (optional): Managing Kafka Topics Using the Java API
    • Using Producers and Consumers from the Java API
    • Exercise: Developing Producers and Consumers with the Java API
  • Module 6 Title Improving Availability through Replication
  • Module 6 Content
    • Replication
    • Exercise: Observing Downtime Due to Broker Failure
    • Considerations for the Replication Factor
    • Exercise: Adding Replicas to Improve Availability
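The trade-off behind choosing a replication factor can be sketched in a few lines of Python (the function name is ours, not part of any Kafka API):

```python
def max_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """How many broker failures a topic can absorb while still accepting
    acks=all writes: in-sync replicas must stay at or above min.insync.replicas."""
    return replication_factor - min_insync_replicas

# A common production choice is replication factor 3 with
# min.insync.replicas=2, which tolerates a single broker failure.
```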


  • Module 7 Title Improving Application Scalability
  • Module 7 Content
    • Partitioning
    • How Messages are Partitioned
    • Exercise: Observing How Partitioning Affects Performance
    • Consumer Groups
    • Exercise: Implementing Consumer Groups
    • Consumer Rebalancing
    • Exercise: Using a Key to Control Partition Assignment
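The idea behind key-based partition assignment, as used in the exercise above, can be sketched in Python (crc32 stands in for Kafka's murmur2 hash here, so real partition numbers will differ):

```python
import zlib

def partition_for_key(key: bytes, num_partitions: int) -> int:
    # Kafka's default partitioner hashes the message key and takes it
    # modulo the partition count, so the same key always lands in the
    # same partition, preserving per-key ordering.
    return zlib.crc32(key) % num_partitions
```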
  • Module 8 Title Improving Application Reliability
  • Module 8 Content
    • Delivery Semantics
    • Demonstration (optional): ISRs vs. ACKs
    • Producer Delivery
    • Exercise: Idempotent Producer
    • Transactions
    • Exercise: Transactional Producers and Consumers
    • Handling Consumer Failure
    • Offset Management
    • Exercise: Detecting and Suppressing Duplicate Messages
    • Exercise: Handling Invalid Records
    • Handling Producer Failure
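The duplicate-suppression idea behind these exercises can be sketched as follows (a toy Python class of our own invention, not a Kafka API):

```python
class DedupingConsumer:
    """Suppress duplicate deliveries by remembering message IDs
    that have already been processed."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.processed: list[str] = []

    def handle(self, msg_id: str, payload: str) -> bool:
        if msg_id in self.seen_ids:
            return False              # duplicate redelivery: skip it
        self.seen_ids.add(msg_id)
        self.processed.append(payload)
        return True
```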
  • Module 9 Title Analyzing Kafka Clusters with SMM
  • Module 9 Content
    • End-to-End Latency
    • Notifiers 
    • Alert Policies 
    • Use Cases 
  • Module 10 Title Monitoring Kafka
  • Module 10 Content
    • Monitoring Overview
    • Monitoring using Cloudera Manager
    • Charts and Reports in CM
    • Monitoring Recommendations
    • Metrics for Troubleshooting
    • Diagnosing Service Failure
    • Exercise: Monitoring Kafka


  • Module 11 Title Managing Kafka
  • Module 11 Content
    • Managing Kafka Topic Storage
    • Demonstration (optional): Message Retention Period
    • Log Cleanup and Collection
    • Rebalancing Partitions
    • Cruise Control
    • Exercise: Installing Cruise Control
    • Exercise: Troubleshooting Kafka Topics
    • Unclean Leader Election
    • Exercise: Unclean Leader Election
    • Adding and Removing Brokers
    • Exercise: Adding and Removing Brokers
    • Best Practices
  • Module 12 Title Message Structure, Format, and Versioning
  • Module 12 Content
    • Message Structure
    • Schema Registry
    • Defining Schemas
    • Schema Evolution and Versioning
    • Schema Registry Client
    • Exercise: Using an Avro Schema
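For reference, an Avro record schema of the kind used in the exercise looks like this (record and field names are invented for illustration):

```json
{
  "type": "record",
  "name": "SensorReading",
  "namespace": "com.example.sensors",
  "fields": [
    {"name": "sensor_id", "type": "string"},
    {"name": "reading", "type": "double"},
    {"name": "unit", "type": "string", "default": "celsius"}
  ]
}
```

Giving a new field a default, as `unit` has here, is what lets a schema evolve while staying backward compatible – the property that Schema Registry's compatibility checks enforce.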
  • Module 13 Title Improving Application Performance
  • Module 13 Content
    • Message Size
    • Batching
    • Compression
    • Exercise: Observing How Compression Affects Performance
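Why batching and compression pay off together can be seen in a few lines of Python (gzip stands in for the codecs Kafka producers actually offer, such as lz4 or zstd):

```python
import gzip

# A batch of repetitive JSON messages, like a producer batch.
batch = b'{"sensor":"s1","reading":21.5}\n' * 200
compressed = gzip.compress(batch)
ratio = len(compressed) / len(batch)
# Repetitive batches compress very well, trading a little CPU
# for far less network and disk usage.
```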


  • Module 14 Title Improving Kafka Service Performance
  • Module 14 Content
    • Performance Tuning Strategies for the Administrator
    • Cluster Sizing
    • Exercise: Planning Capacity Needed for a Use Case
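A back-of-the-envelope version of such capacity planning can be written down as follows (the formula and the 30% headroom figure are illustrative, not Cloudera's sizing guidance):

```python
def required_storage_gb(write_mb_per_sec: float, retention_days: int,
                        replication_factor: int, headroom: float = 1.3) -> float:
    """Rough cluster-wide disk capacity for a Kafka use case:
    ingest rate x retention x replication, plus headroom."""
    retention_seconds = retention_days * 24 * 3600
    data_gb = write_mb_per_sec * retention_seconds / 1024
    return data_gb * replication_factor * headroom

# e.g. 10 MB/s for 7 days at replication factor 3 needs ~23 TB.
```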
  • Module 15 Title Securing the Kafka Cluster
  • Module 15 Content
    • Encryption
    • Authentication
    • Authorization
    • Auditing
RM13,600.00 (+RM1,088.00 Tax)

This three-day hands-on training course delivers the key concepts and expertise developers need to improve the performance of their Apache Spark applications. During the course, participants will learn how to identify common sources of poor performance in Spark applications, techniques for avoiding or solving them, and best practices for Spark application monitoring.

Apache Spark Application Performance Tuning presents the architecture and concepts behind Apache Spark and the underlying data platform, then builds on this foundational understanding by teaching students how to tune Spark application code. The course format emphasizes instructor-led demonstrations that illustrate both performance issues and the techniques that address them, followed by hands-on exercises that give students an opportunity to practice what they've learned through an interactive notebook environment. The course applies to Spark 2.4, but also introduces the Spark 3.0 Adaptive Query Execution framework.

Additional Info

  • Certification Course only
  • Course Code ASPT
  • Price RM10200
  • Exam Price Exclude
  • Duration 3 Days
  • Principals Cloudera
  • Schedule

    22-24 Jan 2024

    24-26 Apr 2024

    17-19 Jul 2024

    9-11 Sep 2024

  • Audience

    This course is designed for software developers, engineers, and data scientists who have experience developing Spark applications and want to learn how to improve the performance of their code. This is not an introduction to Spark.

  • Prerequisites

    Spark examples and hands-on exercises are presented in Python, and the ability to program in this language is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful.

  • At Course Completion

    Students who successfully complete this course will be able to:

    • Understand Apache Spark's architecture, job execution, and how techniques such as lazy execution and pipelining can improve runtime performance
    • Evaluate the performance characteristics of core data structures such as RDD and DataFrames
    • Select the file formats that will provide the best performance for your application
    • Identify and resolve performance problems caused by data skew
    • Use partitioning, bucketing, and join optimizations to improve SparkSQL performance
    • Understand the performance overhead of Python-based RDDs, DataFrames, and user-defined functions
    • Take advantage of caching for better application performance
    • Understand how the Catalyst and Tungsten optimizers work
    • Understand how Workload XM can help troubleshoot and proactively monitor Spark application performance
    • Learn about the new features in Spark 3.0 and specifically how the Adaptive Query Execution engine improves performance
  • Module 1 Title Spark Architecture
  • Module 1 Content
    • RDDs
    • DataFrames and Datasets
    • Lazy Evaluation
    • Pipelining
  • Module 2 Title Data Sources and Formats
  • Module 2 Content
    • Available Formats Overview
    • Impact on Performance
    • The Small Files Problem
  • Module 3 Title Inferring Schemas
  • Module 3 Content
    • The Cost of Inference
    • Mitigating Tactics
  • Module 4 Title Dealing With Skewed Data
  • Module 4 Content
    • Recognizing Skew
    • Mitigating Tactics
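One common mitigation covered here, key salting, can be sketched in plain Python – no Spark needed to see the idea:

```python
import random
from collections import Counter

def salt_key(key: str, num_salts: int) -> str:
    """Append a random salt so one hot key spreads over several
    synthetic keys, and hence over several Spark partitions."""
    return f"{key}#{random.randrange(num_salts)}"

random.seed(0)                        # deterministic for the example
salted = Counter(salt_key("hot_customer", 8) for _ in range(10_000))
# All 10,000 records for the hot key now split across 8 buckets
# instead of overloading a single task.
```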
  • Module 5 Title Catalyst and Tungsten Overview
  • Module 5 Content
    • Catalyst Overview
    • Tungsten Overview


  • Module 6 Title Mitigating Spark Shuffles
  • Module 6 Content
    • Denormalization
    • Broadcast Joins
    • Map-Side Operations
    • Sort Merge Joins
  • Module 7 Title Partitioned and Bucketed Tables
  • Module 7 Content
    • Partitioned Tables
    • Bucketed Tables
    • Impact on Performance
  • Module 8 Title Improving Join Performance
  • Module 8 Content
    • Skewed Joins
    • Bucketed Joins
    • Incremental Joins
  • Module 9 Title PySpark Overhead and UDFs
  • Module 9 Content
    • PySpark Overhead
    • Scalar UDFs
    • Vector UDFs using Apache Arrow
    • Scala UDFs
  • Module 10 Title Caching Data for Reuse
  • Module 10 Content
    • Caching Options
    • Impact on Performance
    • Caching Pitfalls
  • Module 11 Title Workload XM (WXM) Introduction
  • Module 11 Content
    • WXM Overview
    • WXM for Spark Developers
  • Module 12 Title What's New in Spark 3.0?
  • Module 12 Content
    • Adaptive Number of Shuffle Partitions
    • Skew Joins
    • Convert Sort Merge Joins to Broadcast Joins
    • Dynamic Partition Pruning
    • Dynamic Coalesce Shuffle Partitions
RM10,200.00 (+RM816.00 Tax)

One of the most critical functions of a data-driven enterprise is the ability to manage ingest and data flow across complex ecosystems. Does your team have the tools and skill sets to succeed at this? Apache NiFi provides this capability, and our three-day Cloudera Dataflow: Flow Management with Apache NiFi course delivers the foundational training you'll need to succeed with NiFi. In addition to learning NiFi's key features and concepts, participants will gain hands-on experience creating, executing, managing, and optimizing NiFi dataflows throughout a variety of scenarios.

Additional Info

  • Certification Course & Certificate
  • Course Code CDFM
  • Price RM10200
  • Exam Price Exclude
  • Exam Code CDP-3001
  • Duration 3 Days
  • Principals Cloudera
  • Schedule

    5-7 Feb 2024

    13-15 May 2024

    26-28 Aug 2024

    4-6 Nov 2024

  • Audience

    This course is designed for developers, data engineers, administrators, and others with an interest in learning NiFi's innovative no-code, graphical approach to data ingest.

  • Prerequisites

    Although programming experience is not required, basic experience with Linux is presumed, and previous exposure to big data concepts and applications is helpful.

  • At Course Completion

    During this course, you learn how to:

    • Navigate the NiFi user interface
    • Define, configure, organize, and manage dataflows
    • Transform and trace data as it flows to its destination
    • Track changes to dataflows with NiFi Registry
    • Use the NiFi Expression Language to control dataflows
    • Optimize dataflows for better performance and maintainability
    • Connect dataflows with other systems, such as Apache Kafka, Apache Hive, and HDFS
  • Module 1 Title Introduction to Cloudera Flow Management
  • Module 1 Content
    • Overview of Cloudera Flow Management and NiFi
    • The NiFi User Interface
    • Demonstration: NiFi User Interface
    • Exercise: Build Your First Dataflow
  • Module 2 Title Processors
  • Module 2 Content
    • Overview of Processors
    • Processor Surface Panel
    • Processor Configuration
    • Exercise: Start Building a Dataflow Using Processors
  • Module 3 Title Connections
  • Module 3 Content
    • Overview of Connections
    • Connection Configuration
    • Connector Context Menu
    • Exercise: Connect Processors in a Dataflow
  • Module 4 Title Dataflows
  • Module 4 Content
    • Command and Control of a Dataflow
    • Processor Relationships
    • Back Pressure
    • Prioritizers
    • Labels
    • Exercise: Build a More Complex Dataflow
    • Exercise: Creating a Fork Using Relationships
    • Exercise: Set Back Pressure Thresholds
  • Module 5 Title Process Groups
  • Module 5 Content
    • Anatomy of a Process Group
    • Input and Output Ports
    • Exercise: Simplify Dataflows Using Process Groups
  • Module 6 Title FlowFile Provenance
  • Module 6 Content
    • Data Provenance Events
    • FlowFile Lineage
    • Replaying a FlowFile
    • Exercise: Using Data Provenance
  • Module 7 Title Dataflow Templates
  • Module 7 Content
    • Templates Overview
    • Managing Templates
    • Exercise: Creating, Using, and Managing Templates
  • Module 8 Title Apache NiFi Registry
  • Module 8 Content
    • Apache NiFi Registry Overview
    • Using the Registry
    • Exercise: Versioning Flows Using NiFi Registry
  • Module 9 Title FlowFile Attributes
  • Module 9 Content
    • FlowFile Attributes
    • Routing on Attributes
    • Exercise: Working with FlowFile Attributes
  • Module 10 Title NiFi Expression Language
  • Module 10 Content
    • NiFi Expression Language Overview
    • Syntax
    • Expression Language Editor
    • Setting Conditional Values
    • Exercise: Using the NiFi Expression Language
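For reference, NiFi Expression Language expressions look like the following (the attribute values and the `.bak` suffix are illustrative):

```
${filename:toUpper():append('.bak')}     uppercases the filename attribute, then appends a suffix
${fileSize:gt(1048576)}                  true when the FlowFile is larger than 1 MiB
```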
  • Module 11 Title Dataflow Optimization
  • Module 11 Content
    • Dataflow Optimization
    • Control Rate
    • Managing Compute
    • Exercise: Building an Optimized Dataflow
  • Module 12 Title NiFi Architecture
  • Module 12 Content
    • NiFi Architecture Overview
    • Cluster Architecture
    • Heartbeats
    • Managing Clusters
  • Module 13 Title Site-to-Site Dataflows
  • Module 13 Content
    • Site-to-Site Theory
    • Site-to-Site Architecture
    • Anatomy of a Remote Process Group
    • Adding and Configuring Remote Process Groups
    • Exercise: Building Site-to-Site Dataflows
  • Module 14 Title Cloudera Edge Management and MiNiFi
  • Module 14 Content
    • Overview of MiNiFi
    • Example Walk-through
  • Module 15 Title Monitoring and Reporting
  • Module 15 Content
    • Monitoring from NiFi
    • Overview of Reporting
    • Examples of Common Reporting Tasks
    • Exercise: Monitoring and Reporting
  • Module 16 Title Controller Services
  • Module 16 Content
    • Controller Services Overview
    • Common Controller Services
    • Exercise: Adding Apache Hive Controller
  • Module 17 Title Integrating NiFi with the Cloudera Ecosystem
  • Module 17 Content
    • NiFi Integration Architecture
    • NiFi Ecosystem Processors
    • A Closer Look at NiFi and Apache Hive
    • A Closer Look at NiFi and Apache Kafka
    • Exercise: Integrating Dataflows with Kafka and HDFS
  • Module 18 Title NiFi Security
  • Module 18 Content
    • NiFi Security Overview
    • Securing Access to the NiFi UI
    • Authentication
    • Authorization
    • NiFi Registry Security
    • NiFi Security Summary
RM10,200.00 (+RM816.00 Tax)

Cloudera's four-day administrator training course for CDP Private Cloud Base provides participants with a comprehensive understanding of all the steps necessary to operate and maintain on-premises clusters using Cloudera Manager. From installation and configuration through load balancing and tuning, this Cloudera training course is the best preparation for the real-world challenges faced by administrators who run CDP Private Cloud Base.

This course is best suited to systems administrators who have at least basic Linux experience. Prior knowledge of CDP, or of earlier platforms such as Cloudera's CDH or Hortonworks HDP, is not required.

Get certified

Upon completion of the course, attendees are encouraged to continue their studies and register for the CCA Administrator exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

Additional Info

  • Certification Course & Certificate
  • Course Code CDPA
  • Price RM13600
  • Exam Price Exclude
  • Exam Code CDP-2001
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    26-29 Feb 2024

    11-14 Jun 2024

    3-6 Sep 2024

    11-14 Nov 2024

  • Audience
  • Prerequisities
  • At Course Completion

    Through instructor-led discussion and interactive, hands-on exercises, you will learn to:

    • Install Cloudera Manager
    • Use Cloudera Manager to install a CDP Private Cloud Base cluster
    • Configure and monitor the cluster using Cloudera Manager
    • Understand, evaluate, and select the most appropriate data storage option
    • Optimize cluster performance
    • Perform routine cluster maintenance tasks
    • Detect, troubleshoot, and repair problems with the cluster
  • Module 1 Title Cloudera Data Platform
  • Module 1 Content
    • Industry Trends for Big Data
    • The Challenge to Become Data-Driven
    • The Enterprise Data Cloud
    • CDP Overview
    • CDP Form Factors
  • Module 2 Title CDP Private Cloud Base Installation
  • Module 2 Content
    • Installation Overview
    • Cloudera Manager Installation
    • CDP Runtime Overview
    • Cloudera Manager Introduction
  • Module 3 Title Cluster Configuration
  • Module 3 Content
    • Overview
    • Configuration Settings
    • Modifying Service Configurations
    • Configuration Files
    • Managing Role Instances
    • Adding New Services
    • Adding and Removing Hosts
  • Module 4 Title Data Storage
  • Module 4 Content
    • Overview
    • HDFS Topology and Roles
    • HDFS Performance and Fault Tolerance
    • HDFS and Hadoop Security Overview
    • Working with HDFS
    • HBase Overview
    • Kudu Overview
    • Cloud Storage Overview
  • Module 5 Title Data Ingest
  • Module 5 Content
    • Data Ingest Overview
    • File Formats
    • Ingesting Data using File Transfer or REST Interfaces
    • Importing Data from Relational Databases with Apache Sqoop
    • Ingesting Data Using NiFi
    • Best Practices for Importing Data
  • Module 6 Title Data Flow
  • Module 6 Content
    • Overview of Cloudera Flow Management and NiFi
    • NiFi Architecture
    • Cloudera Edge Flow Management and MiNiFi
    • Controller Services
    • Apache Kafka Overview
    • Apache Kafka Cluster Architecture
    • Apache Kafka Command Line Tools
  • Module 7 Title Data Access and Discovery
  • Module 7 Content
    • Apache Hive
    • Apache Impala
    • Apache Impala Tuning
    • Search Overview
    • Hue Overview
    • Managing and Configuring Hue
    • Hue Authentication and Authorization
    • CDSW Overview
  • Module 8 Title Data Compute
  • Module 8 Content
    • YARN Overview
    • Running Applications on YARN
    • Viewing YARN Applications
    • YARN Application Logs
    • MapReduce Applications
    • YARN Memory and CPU Settings
    • Tez Overview
    • Hive on Tez
    • ACID for Hive
    • Spark Overview
    • How Spark Applications Run on YARN
    • Monitoring Spark Applications
    • Phoenix Overview
  • Module 9 Title Managing Resources
  • Module 9 Content
    • Configuring cgroups with CPU Scheduling
    • The Capacity Scheduler
    • Managing Queues
    • Impala Query Scheduling
    • Planning Your Cluster
    • General Planning Considerations
    • Choosing the Right Hardware
    • Network Considerations
    • CDP Private Cloud Considerations
    • Configuring Nodes
  • Module 10 Title Advanced Cluster Configuration
  • Module 10 Content
    • Configuring Service Ports
    • Tuning HDFS and MapReduce
    • Managing Cluster Growth
    • Erasure Coding
    • Enabling HDFS High Availability
  • Module 11 Title Cluster Maintenance
  • Module 11 Content
    • Checking HDFS Status
    • Copying Data Between Clusters
    • Rebalancing Data in HDFS
    • HDFS Directory Snapshots
    • Host Maintenance
    • Upgrading a Cluster
  • Module 12 Title Cluster Monitoring
  • Module 12 Content
    • Cloudera Manager Monitoring Features
    • Health Tests
    • Events and Alerts
    • Charts and Reports
    • Monitoring Recommendations
  • Module 13 Title Cluster Troubleshooting
  • Module 13 Content
    • Overview
    • Troubleshooting Tools
    • Misconfiguration Examples
  • Module 14 Title Security
  • Module 14 Content
    • Data Governance with SDX
    • Hadoop Security Concepts
    • Hadoop Authentication Using Kerberos
    • Hadoop Authorization
    • Hadoop Encryption
    • Securing a Hadoop Cluster
    • Apache Ranger
    • Apache Atlas
    • Backup and Recovery
  • Module 15 Title Private Cloud / Public Cloud
  • Module 15 Content
    • CDP Overview
    • Private Cloud Capabilities
    • Public Cloud Capabilities
    • What is Kubernetes?
    • WXM Overview
    • Auto-scaling
RM13,600.00 (+RM1,088.00 Tax)

This four-day workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges.


Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment.

Additional Info

  • Certification Course only
  • Course Code CDST
  • Price RM13600
  • Exam Price Exclude
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    11-14 Mar 2024

    4-7 Jun 2024

    16-19 Dec 2024

  • Audience
  • Prerequisites
  • At Course Completion

    The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.

    Workshop participants should have a basic understanding of Python or R and some experience exploring and analyzing data and developing statistical or machine learning models. Knowledge of Hadoop or Spark is not required.


  • Module 1 Title Overview of data science and machine learning at scale
  • Module 1 Content
  • Module 2 Title Overview of the Hadoop ecosystem
  • Module 2 Content
  • Module 3 Title Working with HDFS data and Hive tables using Hue
  • Module 3 Content
  • Module 4 Title Introduction to Cloudera Data Science Workbench
  • Module 4 Content
  • Module 5 Title Overview of Apache Spark 2
  • Module 5 Content
  • Module 6 Title Reading and writing data
  • Module 6 Content
  • Module 7 Title Inspecting data quality
  • Module 7 Content
  • Module 8 Title Cleansing and transforming data
  • Module 8 Content
  • Module 9 Title Summarizing and grouping data
  • Module 9 Content
  • Module 10 Title Combining, splitting, and reshaping data
  • Module 10 Content
  • Module 11 Title Exploring data
  • Module 11 Content
  • Module 12 Title Configuring, monitoring, and troubleshooting Spark applications
  • Module 12 Content
  • Module 13 Title Overview of machine learning in Spark MLlib
  • Module 13 Content
  • Module 14 Title Extracting, transforming, and selecting features
  • Module 14 Content
  • Module 15 Title Building and evaluating regression models
  • Module 15 Content
  • Module 16 Title Building and evaluating classification models
  • Module 16 Content
  • Module 17 Title Building and evaluating clustering models
  • Module 17 Content
  • Module 18 Title Cross-validating models and tuning hyperparameters
  • Module 18 Content
  • Module 19 Title Building machine learning pipelines
  • Module 19 Content
  • Module 20 Title Deploying machine learning models
  • Module 20 Content
RM13,600.00 (+RM1,088.00 Tax)

Individuals who earn the CCA Administrator certification have demonstrated the core systems and cluster administrator skills sought by companies and organizations deploying Cloudera in the enterprise.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a pre-configured Cloudera Enterprise cluster
  • Time Limit: 120 minutes
  • Passing Score: 70%
  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-131
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Administrator
  • Principals Cloudera
  • Audience

    There are no prerequisites for taking any Cloudera certification exam; however, a background in system administration or equivalent training is strongly recommended. The CCA Administrator exam (CCA-131) follows the same objectives as Cloudera Administrator Training, and the training course is an excellent part of preparation for the exam.

  • Prerequisites

    Install

    Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects.


    • Set up a local CDH repository
    • Perform OS-level configuration for Hadoop installation
    • Install Cloudera Manager server and agents
    • Install CDH using Cloudera Manager
    • Add a new node to an existing cluster
    • Add a service using Cloudera Manager


    Configure

    Perform basic and advanced configuration needed to effectively administer a Hadoop cluster

    • Configure a service using Cloudera Manager
    • Create an HDFS user's home directory
    • Configure NameNode HA
    • Configure ResourceManager HA
    • Configure proxy for Hiveserver2/Impala

    Manage

    Maintain and modify the cluster to support day-to-day operations in the enterprise

    • Rebalance the cluster
    • Set up alerting for excessive disk fill
    • Define and install a rack topology script
    • Install a new type of I/O compression library in the cluster
    • Revise YARN resource assignment based on user feedback
    • Commission/decommission a node

    Secure

    Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices

    • Configure HDFS ACLs
    • Install and configure Sentry
    • Configure Hue user authorization and authentication
    • Enable/configure log and query redaction
    • Create encrypted zones in HDFS

    Test

    Benchmark the cluster operational metrics, test system configuration for operation and efficiency

    • Execute file system commands via HttpFS
    • Efficiently copy data within a cluster/between clusters
    • Create/restore a snapshot of an HDFS directory
    • Get/set ACLs for a file or directory structure
    • Benchmark the cluster (I/O, CPU, network)
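
Several of the tasks above are driven from the Hadoop command line. A minimal sketch of how those invocations are typically shaped, built here as argument lists rather than executed; the paths and snapshot names are hypothetical examples, not part of the exam:

```python
# Hedged sketch: composing common HDFS maintenance commands as argument
# lists. All paths and names below are hypothetical examples.

def hdfs_snapshot_cmd(directory, name):
    """Create a named snapshot of an HDFS directory. The directory must
    first be made snapshottable (hdfs dfsadmin -allowSnapshot)."""
    return ["hdfs", "dfs", "-createSnapshot", directory, name]

def distcp_cmd(src, dst, num_mappers=10):
    """Copy data efficiently within or between clusters using DistCp,
    which runs the copy as a parallel MapReduce job."""
    return ["hadoop", "distcp", "-m", str(num_mappers), src, dst]

def getfacl_cmd(path, recursive=False):
    """List the ACLs of a file, or of a whole directory tree with -R."""
    cmd = ["hdfs", "dfs", "-getfacl"]
    if recursive:
        cmd.append("-R")
    cmd.append(path)
    return cmd

print(hdfs_snapshot_cmd("/data/sales", "pre-upgrade"))
```

In practice these commands are run directly in a shell on a cluster node; building them programmatically as above is just one way to see the flags side by side.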

    Troubleshoot

    Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios

    • Resolve errors/warnings in Cloudera Manager
    • Resolve performance problems/errors in cluster operation
    • Determine reason for application failure
    • Configure the Fair Scheduler to resolve application delays
RM1,252.36(+RM100.19 Tax)

A CCA Spark and Hadoop Developer has proven their core skills to ingest, transform, and process data using Apache Spark and core Cloudera Enterprise tools.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a Cloudera Enterprise cluster. See below for the full cluster configuration
  • Time Limit: 120 minutes
  • Passing Score: 70%
  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-175
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Spark and Hadoop Developer
  • Principals Cloudera
  • Audience

    There are no prerequisites for taking any Cloudera certification exam. The CCA Spark and Hadoop Developer exam (CCA-175) follows the same objectives as Cloudera Developer Training for Spark and Hadoop, and the training course is excellent preparation for the exam.

  • Prerequisites

    Data Ingest

    These are the skills needed to transfer data between external systems and your cluster, including the following:

    • Import data from a MySQL database into HDFS using Sqoop

    • Export data to a MySQL database from HDFS using Sqoop

    • Change the delimiter and file format of data during import using Sqoop

    • Ingest real-time and near-real-time streaming data into HDFS

    • Process streaming data as it is loaded onto the cluster

    • Load data into and out of HDFS using the Hadoop File System commands
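
The Sqoop tasks above share a common command shape. A minimal sketch, assuming a hypothetical MySQL JDBC URL, table, and target directory, that builds the import invocation as an argument list (not executed here):

```python
# Hedged sketch of a typical Sqoop import invocation. The JDBC URL,
# table name, and HDFS target directory are hypothetical examples.

def sqoop_import_cmd(jdbc_url, table, target_dir,
                     fields_terminated_by="\t", as_parquet=False):
    """Import a MySQL table into HDFS, optionally changing the field
    delimiter or storing the result as Parquet instead of text."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--fields-terminated-by", fields_terminated_by,
    ]
    if as_parquet:
        cmd.append("--as-parquetfile")
    return cmd

print(sqoop_import_cmd("jdbc:mysql://example-host/shop", "orders", "/data/orders"))
```

The export direction uses `sqoop export` with the same `--connect`/`--table` options and an `--export-dir` pointing at the HDFS source.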

    Transform, Stage, and Store

    Convert a set of data values in a given format stored in HDFS into new data values or a new data format and write them into HDFS.

    • Load RDD data from HDFS for use in Spark applications

    • Write the results from an RDD back into HDFS using Spark

    • Read and write files in a variety of file formats

    • Perform standard extract, transform, load (ETL) processes on data

    Data Analysis

    Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by using queries against loaded data.

    • Use metastore tables as an input source or an output sink for Spark applications

    • Understand the fundamentals of querying datasets in Spark

    • Filter data using Spark

    • Write queries that calculate aggregate statistics

    • Join disparate datasets using Spark

    • Produce ranked or sorted data
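
The query patterns listed above (filtering, aggregation, joining disparate datasets, sorted output) are plain SQL, whether issued through Spark SQL against metastore tables or any other engine. A minimal illustration using Python's built-in sqlite3 so it runs anywhere; the table names and rows are made up:

```python
import sqlite3

# Hedged illustration of the aggregate-and-join pattern. In the exam
# context the same SQL would run through Spark SQL; data here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders(order_id INTEGER, cust_id INTEGER, total REAL);
    CREATE TABLE customers(cust_id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10, 25.0), (2, 10, 75.0), (3, 11, 40.0);
    INSERT INTO customers VALUES (10, 'alice'), (11, 'bob');
""")

# Join disparate datasets, calculate aggregate statistics, sort the output.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.total) AS revenue
    FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    GROUP BY c.name
    ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('alice', 2, 100.0), ('bob', 1, 40.0)]
```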

    Configuration

    This is a practical exam and the candidate should be familiar with all aspects of generating a result, not just writing code.

    • Supply command-line options to change your application configuration, such as increasing available memory
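
A sketch of what "supplying command-line options" looks like in practice: a `spark-submit` invocation with standard configuration flags such as increased executor memory, built as an argument list. The application JAR and class name are hypothetical:

```python
# Hedged sketch of a spark-submit invocation with resource options.
# The JAR path and main class are hypothetical examples.

def spark_submit_cmd(app_jar, main_class,
                     executor_memory="4g", num_executors=4):
    """Build a spark-submit command that raises executor memory and
    sets the executor count via standard spark-submit flags."""
    return [
        "spark-submit",
        "--class", main_class,
        "--executor-memory", executor_memory,
        "--num-executors", str(num_executors),
        app_jar,
    ]

print(spark_submit_cmd("app.jar", "com.example.Main", executor_memory="8g"))
```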
RM1,252.36(+RM100.19 Tax)

A CCA Data Analyst has proven their core analyst skills to load, transform, and model Hadoop data in order to define relationships and extract meaningful results from the raw input.

You are given eight to twelve customer problems, each with a unique large data set, a CDH cluster, and 120 minutes. For each problem, you must implement a technical solution with a high degree of precision that meets all the requirements. You may use any tool or combination of tools on the cluster (see list below); you choose the tool(s) that are right for the job. You must know enough to analyze the problem and arrive at an optimal approach in the time allowed, then carry it out on a live cluster, under the time limit, while being observed by a proctor.

  • Number of Questions: 8–12 performance-based (hands-on) tasks on a CDH 5 cluster. See below for the full cluster configuration

  • Time Limit: 120 minutes

  • Passing Score: 70%

  • Language: English

Additional Info

  • Certification Certificate only
  • Price RM1327.50
  • Exam Price Exclude
  • Exam Code CCA-159
  • Duration 0.5 Day
  • CertificationInfo Cloudera Certified Associate (CCA) Data Analyst
  • Principals Cloudera
  • Audience

    Candidates for CCA Data Analyst can be SQL developers, data analysts, business intelligence specialists, developers, system architects, and database administrators. There are no prerequisites.

  • Prerequisites

    Prepare the Data

    Use Extract, Transform, Load (ETL) processes to prepare data for queries.

    • Import data from a MySQL database into HDFS using Sqoop

    • Export data to a MySQL database from HDFS using Sqoop

    • Move data between tables in the metastore

    • Transform values, columns, or file formats of incoming data before analysis

    Provide Structure to the Data

    Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.

    • Create tables using a variety of data types, delimiters, and file formats

    • Create new tables using existing tables to define the schema

    • Improve query performance by creating partitioned tables in the metastore

    • Alter tables to modify existing schema

    • Create views in order to simplify queries
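
A minimal illustration of the create-from-existing-table (CTAS) and view patterns above, using Python's built-in sqlite3 for portability. Hive and Impala add clauses that SQLite lacks (e.g. PARTITIONED BY, STORED AS PARQUET, row-format delimiters), but the DDL shape carries over; the table names and rows are made up:

```python
import sqlite3

# Hedged illustration of metastore-style DDL using sqlite3. Table names
# and data are hypothetical; Hive/Impala-specific clauses are omitted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events(ts TEXT, user TEXT, amount REAL)")
conn.execute("INSERT INTO raw_events VALUES ('2024-01-01', 'alice', 9.5)")

# Create a new table using an existing table to define the schema (CTAS).
conn.execute("CREATE TABLE events_copy AS SELECT * FROM raw_events")

# Create a view to simplify downstream queries.
conn.execute("""CREATE VIEW big_events AS
                SELECT user, amount FROM raw_events WHERE amount > 5""")
view_rows = conn.execute("SELECT * FROM big_events").fetchall()
print(view_rows)  # [('alice', 9.5)]
```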

    Data Analysis

    Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.

    • Prepare reports using SELECT commands including unions and subqueries

    • Calculate aggregate statistics, such as sums and averages, during a query

    • Create queries against multiple data sources by using join commands

    • Transform the output format of queries by using built-in functions

    • Perform queries across a group of rows using windowing functions
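
A minimal illustration of the windowing-function objective above, run through Python's built-in sqlite3 (window functions require SQLite 3.25+, which ships with recent Python versions). In Hive and Impala the same OVER clause applies; the data is made up:

```python
import sqlite3

# Hedged illustration: aggregate over a window plus an overall ranking.
# Requires SQLite 3.25+ for window-function support; data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales(region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 10), ('east', 30), ('west', 20);
""")
rows = conn.execute("""
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total,
           RANK() OVER (ORDER BY amount DESC)     AS overall_rank
    FROM sales
    ORDER BY overall_rank
""").fetchall()
print(rows)  # [('east', 30.0, 40.0, 1), ('west', 20.0, 20.0, 2), ('east', 10.0, 40.0, 3)]
```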
RM1,252.36(+RM100.19 Tax)

Cloudera University’s four-day Data Analyst Training course will teach you to apply traditional data analytics and business intelligence skills to big data. This course presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.

Advance Your Ecosystem Expertise

Apache Hive makes transformation and analysis of complex, multi-structured data scalable in Cloudera environments. Apache Impala enables real-time interactive analysis of the data stored in Hadoop using a native SQL environment. Together, they make multi-structured data accessible to analysts, database administrators, and others without Java programming expertise.

Additional Info

  • Certification Course & Certificate
  • Course Code CDAT
  • Price RM13600
  • Exam Price Exclude
  • Exam Code CDP-4001
  • Duration 4 Days
  • CertificationInfo Cloudera Certified Data Analyst
  • Principals Cloudera
  • Schedule

    16-19 Jan 2024

    26-29 Feb 2024 (Penang Date)

    7-10 May 2024

    16-19 Jul 2024 (Penang Date)

    6-9 Aug 2024

    22-25 Oct 2024

  • Audience

    This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity. Prior knowledge of Apache Hadoop is not required.

  • Prerequisites
  • At Course Completion

    Get Certified

    Upon completion of the course, attendees are encouraged to continue their study and register for the CCA Data Analyst exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.

  • Module 1 Title Apache Hadoop Fundamentals
  • Module 1 Content
    • The Motivation for Hadoop
    • Hadoop Overview
    • Data Storage: HDFS
    • Distributed Data Processing: YARN, MapReduce, and Spark
    • Data Processing and Analysis: Hive and Impala
    • Database Integration: Sqoop
    • Other Hadoop Data Tools
    • Exercise Scenario Explanation
       
  • Module 2 Title Introduction to Apache Hive and Impala
  • Module 2 Content
    • What Is Hive?
    • What Is Impala?
    • Why Use Hive and Impala?
    • Schema and Data Storage
    • Comparing Hive and Impala to Traditional Databases
    • Use Cases
  • Module 3 Title Querying with Apache Hive and Impala
  • Module 3 Content
    • Databases and Tables
    • Basic Hive and Impala Query Language Syntax
    • Data Types
    • Using Hue to Execute Queries
    • Using Beeline (Hive's Shell)
    • Using the Impala Shell
  • Module 4 Title Common Operators and Built-In Functions
  • Module 4 Content
    • Operators
    • Scalar Functions
    • Aggregate Functions
  • Module 5 Title Data Management
  • Module 5 Content
    • Data Storage
    • Creating Databases and Tables
    • Loading Data
    • Altering Databases and Tables
    • Simplifying Queries with Views
    • Storing Query Results
  • Module 6 Title Data Storage and Performance
  • Module 6 Content
    • Partitioning Tables
    • Loading Data into Partitioned Tables
    • When to Use Partitioning
    • Choosing a File Format
    • Using Avro and Parquet File Formats
  • Module 7 Title Working with Multiple Datasets
  • Module 7 Content
    • UNION and Joins
    • Handling NULL Values in Joins
    • Advanced Joins
  • Module 8 Title Analytic Functions and Windowing
  • Module 8 Content
    • Using Common Analytic Functions
    • Other Analytic Functions
    • Sliding Windows
  • Module 9 Title Complex Data
  • Module 9 Content
    • Complex Data with Hive
    • Complex Data with Impala
  • Module 10 Title Analyzing Text
  • Module 10 Content
    • Using Regular Expressions with Hive and Impala
    • Processing Text Data with SerDes in Hive
    • Sentiment Analysis and n-grams
  • Module 11 Title Apache Hive Optimization
  • Module 11 Content
    • Understanding Query Performance
    • Bucketing
    • Hive on Spark
  • Module 12 Title Apache Impala Optimization
  • Module 12 Content
    • How Impala Executes Queries
    • Improving Impala Performance
  • Module 13 Title Extending Apache Hive and Impala
  • Module 13 Content
    • Custom SerDes and File Formats in Hive
    • Data Transformation with Custom Scripts in Hive
    • User-Defined Functions
    • Parameterized Queries
  • Module 14 Title Choosing the Best Tool for the Job
  • Module 14 Content
    • Comparing Hive, Impala, and Relational Databases
    • Which to Choose?
  • Module 15 Title Conclusion
  • Module 15 Content
RM13,600.00(+RM1,088.00 Tax)

This four-day hands-on training course teaches the key concepts and knowledge developers need to use Apache Spark in developing high-performance, parallel applications on the Cloudera Data Platform (CDP).

Hands-on exercises allow students to practice writing Spark applications that integrate with CDP core components, such as Hive and Kafka. Participants will learn how to use Spark SQL to query structured data, use Spark Streaming to perform real-time processing on streaming data, and work with “big data” stored in a distributed file system.

After taking this course, participants will be prepared to face real-world challenges and build applications that make fast and relevant decisions, implementing interactive analysis applied to a wide variety of use cases, architectures, and industries.

Additional Info

  • Certification Course & Certificate
  • Course Code CDTSH
  • Price RM13600
  • Exam Price Exclude
  • Exam Code CDP-3001
  • Duration 4 Days
  • Principals Cloudera
  • Schedule

    20-23 Feb 2024

    4-7 Mar 2024

    2-5 Apr 2024

    2-5 Jul 2024

    7-10 Oct 2024

  • Audience

    This course is designed for developers and data engineers. 

  • Prerequisites

    All students are expected to have basic Linux experience and basic proficiency with either the Python or Scala programming language. Basic knowledge of SQL is helpful. Prior knowledge of Spark and Hadoop is not required.

  • At Course Completion

    Through instructor-led discussion and interactive, hands-on exercises, you will learn how to:

    • Distribute, store, and process data in a CDP cluster
    • Write, configure, and deploy Apache Spark applications
    • Use Spark interpreters and Spark applications to explore, process, and analyze distributed data
    • Query data using Spark SQL, DataFrames, and Hive tables
    • Use Spark Streaming together with Kafka to process a data stream
  • Module 1 Title Introduction to Zeppelin
  • Module 1 Content
    • Why Notebooks?
    • Zeppelin Notes
    • Demo: Apache Spark In 5 Minutes
  • Module 2 Title HDFS Introduction
  • Module 2 Content
    • HDFS Overview
    • HDFS Components and Interactions
    • Additional HDFS Interactions
    • Ozone Overview
    • Exercise: Working with HDFS
  • Module 3 Title YARN Introduction
  • Module 3 Content
    • YARN Overview
    • YARN Components and Interaction
    • Working with YARN
    • Exercise: Working with YARN
  • Module 4 Title Distributed Processing History
  • Module 4 Content
    • The Disk Years: 2000-2010
    • The Memory Years: 2010-2020
    • The GPU Years: 2020 onward
  • Module 5 Title Working with DataFrames
  • Module 5 Content
    • Introduction to DataFrames
    • Exercise: Introducing DataFrames
    • Exercise: Reading and Writing DataFrames
    • Exercise: Working with Columns
    • Exercise: Working with Complex Types
    • Exercise: Combining and Splitting DataFrames
    • Exercise: Summarizing and Grouping DataFrames
    • Exercise: Working with UDFs
    • Exercise: Working with Windows
  • Module 6 Title Introduction to Apache Hive
  • Module 6 Content
    • About Hive
  • Module 7 Title Hive and Spark Integration
  • Module 7 Content
    • Hive and Spark Integration
    • Exercise: Spark Integration with Hive
  • Module 8 Title Data Visualization with Zeppelin
  • Module 8 Content
    • Introduction to Data Visualization with Zeppelin
    • Zeppelin Analytics
    • Zeppelin Collaboration
    • Exercise: AdventureWorks
  • Module 9 Title Distributed Processing Challenges
  • Module 9 Content
    • Shuffle
    • Skew
    • Order
  • Module 10 Title Spark Distributed Processing
  • Module 10 Content
    • Spark Distributed Processing
    • Exercise: Explore Query Execution Order
  • Module 11 Title Spark Distributed Persistence
  • Module 11 Content
    • DataFrame and Dataset Persistence
    • Persistence Storage Levels
    • Viewing Persisted RDDs
    • Exercise: Persisting DataFrames
  • Module 12 Title Writing, Configuring, and Running Spark Applications
  • Module 12 Content
    • Writing a Spark Application
    • Building and Running an Application
    • Application Deployment Mode
    • The Spark Application Web UI
    • Configuring Application Properties
    • Exercise: Writing, Configuring, and Running a Spark Application
  • Module 13 Title Introduction to Structured Streaming
  • Module 13 Content
    • Introduction to Structured Streaming
    • Exercise: Processing Streaming Data
  • Module 14 Title Message Processing with Apache Kafka
  • Module 14 Content
    • What is Apache Kafka?
    • Apache Kafka Overview
    • Scaling Apache Kafka
    • Apache Kafka Cluster Architecture
    • Apache Kafka Command Line Tools
  • Module 15 Title Structured Streaming with Apache Kafka
  • Module 15 Content
    • Receiving Kafka Messages
    • Sending Kafka Messages
    • Exercise: Working with Kafka Streaming Messages
  • Module 16 Title Aggregating and Joining Streaming DataFrames
  • Module 16 Content
    • Streaming Aggregation
    • Joining Streaming DataFrames
    • Exercise: Aggregating and Joining Streaming DataFrames
  • Module 17 Title Appendix: Working with Datasets in Scala
  • Module 17 Content
    • Working with Datasets in Scala
    • Exercise: Using Datasets in Scala
RM13,600.00(+RM1,088.00 Tax)
