Training is not a commodity – not all training centres are the same. Iverson Associates Sdn Bhd is the most established, most reputable, and top professional IT training provider in Malaysia. With a large pool of experienced and certified trainers, state-of-the-art facilities, and well-designed courseware, Iverson offers superior training, a more impactful learning experience, and highly effective results.
At Iverson, our focus is on providing high-quality IT training to corporate customers, meeting their learning needs and helping them achieve their training objectives. Iverson has the flexibility to provide training solutions for a single individual or the largest corporation, in well-paced or accelerated training programmes.
Our courses continue to evolve along with fast-changing technological advances. Our instructor-led training services are available on a public and a private (in-company) basis. Some of our courses are also available as online, on-demand, and hybrid training.
This four-day workshop covers data science and machine learning workflows at scale using Apache Spark 2 and other key components of the Hadoop ecosystem. The workshop emphasizes the use of data science and machine learning methods to address real-world business challenges.
Using scenarios and datasets from a fictional technology company, students discover insights to support critical business decisions and develop data products to transform the business. The material is presented through a sequence of brief lectures, interactive demonstrations, extensive hands-on exercises, and discussions. The Apache Spark demonstrations and exercises are conducted in Python (with PySpark) and R (with sparklyr) using the Cloudera Data Science Workbench (CDSW) environment.
Available upon request
The workshop is designed for data scientists who currently use Python or R to work with smaller datasets on a single machine and who need to scale up their analyses and machine learning models to large datasets on distributed clusters. Data engineers and developers with some knowledge of data science and machine learning may also find this workshop useful.
Workshop participants should have a basic understanding of Python or R and some experience exploring and analyzing data and developing statistical or machine learning models. Knowledge of Hadoop or Spark is not required.
Cloudera Data Science Workbench Training prepares learners to complete exploratory data science and machine learning projects using Cloudera Data Science Workbench (CDSW).
Get hands-on experience
Through narrated demonstrations and hands-on exercises, learners gain familiarity with CDSW and develop the skills required to:
What to Expect
This course is designed for learners at organizations using CDSW under a Cloudera Enterprise license or a trial license. The learner must have access to a CDSW environment on a Cloudera cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.
Self-paced Online
Individuals who earn the CCA Administrator certification have demonstrated the core systems and cluster administrator skills sought by companies and organizations deploying Cloudera in the enterprise.
There are no prerequisites for taking any Cloudera certification exam; however, a background in system administration or equivalent training is strongly recommended. The CCA Administrator exam (CCA131) follows the same objectives as Cloudera Administrator Training, and the training course is an excellent part of preparation for the exam.
Demonstrate an understanding of the installation process for Cloudera Manager, CDH, and the ecosystem projects
Perform basic and advanced configuration needed to effectively administer a Hadoop cluster
Maintain and modify the cluster to support day-to-day operations in the enterprise
Enable relevant services and configure the cluster to meet goals defined by security policy; demonstrate knowledge of basic security practices
Benchmark the cluster operational metrics, test system configuration for operation and efficiency
Demonstrate ability to find the root cause of a problem, optimize inefficient execution, and resolve resource contention scenarios
A CCA Spark and Hadoop Developer has proven their core skills to ingest, transform, and process data using Apache Spark and core Cloudera Enterprise tools.
There are no prerequisites for taking any Cloudera certification exam. The CCA Spark and Hadoop Developer exam (CCA175) follows the same objectives as Cloudera Developer Training for Spark and Hadoop, and the training course is excellent preparation for the exam.
The skills to transfer data between external systems and your cluster. This includes the following:
Import data from a MySQL database into HDFS using Sqoop
Export data to a MySQL database from HDFS using Sqoop
Change the delimiter and file format of data during import using Sqoop
Ingest real-time and near-real-time streaming data into HDFS
Process streaming data as it is loaded onto the cluster
Convert a set of data values in a given format stored in HDFS into new data values or a new data format and write them into HDFS.
Load RDD data from HDFS for use in Spark applications
Write the results from an RDD back into HDFS using Spark
Read and write files in a variety of file formats
Use Spark SQL to interact with the metastore programmatically in your applications. Generate reports by using queries against loaded data.
Use metastore tables as an input source or an output sink for Spark applications
Understand the fundamentals of querying datasets in Spark
Filter data using Spark
Write queries that calculate aggregate statistics
Join disparate datasets using Spark
This is a practical exam and the candidate should be familiar with all aspects of generating a result, not just writing code.
A CCA Data Analyst has proven their core analyst skills to load, transform, and model Hadoop data in order to define relationships and extract meaningful results from the raw input.
You are given eight to twelve customer problems, each with a unique large data set, a CDH cluster, and 120 minutes. For each problem, you must implement a technical solution with a high degree of precision that meets all the requirements. You may use any tool or combination of tools on the cluster (see list below); you pick the tool(s) that are right for the job. You must possess enough knowledge to analyze the problem and arrive at an optimal approach in the time allowed. You need to know what to do, then do it on a live cluster, under a time limit, while being observed by a proctor.
Number of Questions: 8–12 performance-based (hands-on) tasks on a CDH 5 cluster. See below for full cluster configuration
Time Limit: 120 minutes
Passing Score: 70%
Language: English
Candidates for CCA Data Analyst can be SQL developers, data analysts, business intelligence specialists, developers, system architects, and database administrators. There are no prerequisites.
Use Extract, Transform, Load (ETL) processes to prepare data for queries.
Import data from a MySQL database into HDFS using Sqoop
Export data to a MySQL database from HDFS using Sqoop
Move data between tables in the metastore
Transform values, columns, or file formats of incoming data before analysis
Use Data Definition Language (DDL) statements to create or alter structures in the metastore for use by Hive and Impala.
Create tables using a variety of data types, delimiters, and file formats
Create new tables using existing tables to define the schema
Improve query performance by creating partitioned tables in the metastore
Alter tables to modify existing schema
Create views in order to simplify queries
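The DDL skills above map onto standard SQL statement shapes. The sketch below uses Python's built-in sqlite3 module only so it runs without a cluster; in Hive and Impala you would add clauses such as ROW FORMAT DELIMITED, STORED AS PARQUET, and PARTITIONED BY, which SQLite does not support. Table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table (Hive DDL would add delimiter, file-format, and
# partitioning clauses here).
cur.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
cur.execute("INSERT INTO orders VALUES (1, 'acme', 99.5), (2, 'globex', 12.0)")

# Create a new table from an existing one to define the schema (CTAS).
cur.execute("CREATE TABLE big_orders AS SELECT * FROM orders WHERE total > 50")

# Alter a table to modify the existing schema.
cur.execute("ALTER TABLE orders ADD COLUMN region TEXT")

# Create a view to simplify later queries.
cur.execute(
    "CREATE VIEW order_totals AS "
    "SELECT customer, SUM(total) AS t FROM orders GROUP BY customer")
```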
Use Query Language (QL) statements in Hive and Impala to analyze data on the cluster.
Prepare reports using SELECT commands including unions and subqueries
Calculate aggregate statistics, such as sums and averages, during a query
Create queries against multiple data sources by using join commands
Transform the output format of queries by using built-in functions
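HiveQL and Impala SQL largely follow ANSI SQL for these query patterns, so the same shapes can be sketched with Python's built-in sqlite3 module (used here only so the example runs without a cluster); the tables and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 100.0), ("north", 50.0), ("south", 75.0)])
cur.execute("CREATE TABLE regions (region TEXT, manager TEXT)")
cur.executemany("INSERT INTO regions VALUES (?, ?)",
                [("north", "pat"), ("south", "lee")])

# Aggregate statistics (sums and averages) during a query.
agg = list(cur.execute(
    "SELECT region, SUM(amount), AVG(amount) FROM sales "
    "GROUP BY region ORDER BY region"))

# Join multiple data sources.
joined = list(cur.execute(
    "SELECT s.region, r.manager, s.amount "
    "FROM sales s JOIN regions r ON s.region = r.region"))

# Report using a subquery and a union.
report = list(cur.execute(
    "SELECT region FROM sales WHERE amount > (SELECT AVG(amount) FROM sales) "
    "UNION SELECT region FROM regions WHERE manager = 'lee' "
    "ORDER BY region"))

# Transform output with a built-in function.
upper = list(cur.execute("SELECT UPPER(region) FROM regions ORDER BY region"))
```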
Cloudera University’s four-day Data Analyst Training course will teach you to apply traditional data analytics and business intelligence skills to big data. This course presents the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.
Advance Your Ecosystem Expertise
Apache Hive makes transformation and analysis of complex, multi-structured data scalable in Cloudera environments. Apache Impala enables real-time interactive analysis of the data stored in Hadoop using a native SQL environment. Together, they make multi-structured data accessible to analysts, database administrators, and others without Java programming expertise.
18-21 Jan 2021
8-11 Mar 2021
22-25 Mar 2021 (Penang)
17-20 May 2021
21-24 Jun 2021
6-9 Sep 2021
25-28 Oct 2021
6-9 Dec 2021 (Penang)
This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Some knowledge of SQL is assumed, as is basic Linux command-line familiarity. Prior knowledge of Apache Hadoop is not required.
Get Certified
Upon completion of the course, attendees are encouraged to continue their study and register for the CCA Data Analyst exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.
Cloudera University’s one-day Python training course will teach you the key language concepts and programming techniques you need so that you can concentrate on the subjects covered in Cloudera’s developer courses without also having to learn a complex programming language and a new programming paradigm on the fly.
12 Mar 2021
25 Jun 2021
10 Sep 2021
29 Oct 2021
Prior knowledge of Hadoop is not required. Since this course is intended for developers who do not yet have prerequisite Python skills, basic programming experience in at least one commonly used programming language (ideally Java, but Ruby, Perl, Scala, C, C++, PHP, or JavaScript will suffice) is assumed.
NOTE: This course does not teach Big Data concepts, nor does it cover how to use Cloudera software. Instead, it is meant as a precursor to one of our developer-focused training courses that provide those skills, such as Developer Training for Spark and Hadoop or Developer Training for Apache Spark.
Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification
Cloudera University’s three-day training course for Apache HBase enables participants to store and access massive quantities of multi-structured data and perform hundreds of thousands of operations per second.
Advance Your Ecosystem Expertise
Apache HBase is a distributed, scalable, NoSQL database built on Apache Hadoop. HBase can store data in massive tables consisting of billions of rows and millions of columns, serve data to many users and applications in real time, and provide fast, random read/write access to users and applications.
18-20 Jan 2021
17-19 May 2021
23-25 Aug 2021
22-24 Nov 2021
This course is appropriate for developers and administrators who intend to use HBase. Prior experience with databases and data modeling is helpful, but not required. Prior knowledge of Java is helpful. Prior knowledge of Hadoop is not required, but Cloudera Developer Training for Apache Hadoop provides an excellent foundation for this course.
Take your knowledge to the next level
This four-day hands-on training course delivers the key concepts and expertise participants need to ingest and process data on a Hadoop cluster using the most up-to-date tools and techniques. Employing Hadoop ecosystem projects such as Spark (including Spark Streaming and Spark SQL), Flume, Kafka, and Sqoop, this training course is the best preparation for the real-world challenges faced by Hadoop developers. With Spark, developers can write sophisticated parallel applications that support faster, better decisions and interactive analysis, applied to a wide variety of use cases, architectures, and industries.
Get hands-on experience
Through expert-led discussion and interactive, hands-on exercises, participants will learn how to:
What to expect
This course is designed for developers and engineers who have programming experience; prior knowledge of Hadoop is not required.
Get certified
Upon completion of the course, attendees are encouraged to continue their study and register for the CCA Spark and Hadoop Developer exam. Certification is a great differentiator. It helps establish you as a leader in the field, providing employers and customers with tangible evidence of your skills and expertise.
18-21 Jan 2021
8-11 Feb 2021
3-7 May 2021
21-24 Jun 2021
26-29 Jul 2021
4-7 Oct 2021
8-11 Nov 2021
Introduction to Apache Hadoop and the Hadoop Ecosystem
Apache Spark Streaming: Introduction to DStreams
Apache Spark Streaming: Processing Multiple Batches
Apache Spark Streaming: Data Sources
Take your knowledge to the next level with Cloudera’s Apache Hadoop Training and Certification
Cloudera University’s four-day administrator training course for Apache Hadoop provides participants with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster using Cloudera Manager. From installation and configuration through load balancing and tuning, Cloudera’s training course is the best preparation for the real-world challenges faced by Hadoop administrators.
22-25 Feb 2021
15-18 Mar 2021
5-8 Apr 2021
12-15 Jul 2021
16-19 Aug 2021
11-14 Oct 2021
8-11 Nov 2021
This course is best suited to systems administrators and IT managers who have basic Linux experience. Prior knowledge of Apache Hadoop is not required.