Big Data Hadoop

Gain in-depth knowledge of MapReduce, Hive, Pig, HBase, Sqoop and Flume, get an overview of data visualisation tools, and become a Big Data expert.

Admissions Open

We run regular batches. Contact us for further details.

About Course

Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. Big data can be analyzed for insights that lead to better decisions and strategic business moves. Hadoop is a popular framework for storing and processing big data.

Our Big Data Hadoop course is designed to help you understand the Hadoop software framework. The programme focuses on the key components of the Hadoop ecosystem, such as MapReduce, Hive, Pig, HBase, Sqoop and Flume.


Features & Benefits


- 32 hours of online instructor-led or classroom training
- Taught by industry experts
- Learn in-demand skills and the software used in the industry
- Real world case studies
- Placement assistance

Is it good for me?

This course will make you comfortable reviewing and working with big data. In addition to the Hadoop framework tools, you get an introduction to Spark, Scala and data visualisation tools as part of this course.

Mode of Delivery

Online instructor-led sessions and classroom training

Duration and Schedule

Duration: 8 weeks
Duration of class: 2 hours
Days: Saturdays & Sundays

Course Fees

INR 24,500 inclusive of taxes

Course Contents

1. Big Data Introduction

2. Hadoop Introduction

3. Hadoop components

4. HDFS Introduction

5. MapReduce Introduction 

1. HDFS Design

2. Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)

3. Rack Awareness

4. Read/Write from HDFS

5. HDFS Federation and High Availability (Hadoop 2.x.x)

6. Parallel Copying using DistCp

7. HDFS Command Line Interface 

1. File Read Cycle from HDFS
- DistributedFileSystem
- FSDataInputStream

2. Failure and Error Handling When a File Read Fails

3. File Write Cycle to HDFS
- FSDataOutputStream

4. Failure and Error Handling When a File Write Fails

1. JobTracker and TaskTracker

2. Hadoop Cluster Topology

3. Example of MapReduce
- Map Function
- Reduce Function

4. Java Implementation of MapReduce

5. DataFlow of MapReduce

6. Use of Combiner 

1. How MapReduce Works

2. Anatomy of MapReduce Job (MR-1)

3. Submission & Initialization of MapReduce Job 

4. Assigning & Execution of Tasks

5. Monitoring & Progress of MapReduce Job

6. Completion of Job

7. Failure Handling in a MapReduce Job
- Task Failure
- TaskTracker Failure
- JobTracker Failure

1. Limitations of the Current (Classic) Architecture

2. What are the Requirements?

3. YARN Architecture

4. Job Submission and Job Initialization

5. Task Assignment and Task Execution

6. Progress and Monitoring of the Job

Module 7: Failure Handling in YARN
- Task Failure
- Application Master Failure
- Node Manager Failure
- Resource Manager Failure

Module 8: Apache Pig
1. What is Pig?

2. Introduction to Pig Data Flow Engine

3. Pig and MapReduce in Detail

4. When Should Pig Be Used?

5. Pig and Hadoop Cluster

6. Pig Interpreter and MapReduce

7. Pig Relations and Data Types

8. PigLatin Example in Detail

9. Debugging and Generating Examples in Apache Pig

Module 9: Apache Hive
1. What is Hive?

2. Architecture of Hive

3. Hive Services

4. Hive Clients

5. How Hive Differs from Traditional RDBMS

6. Introduction to HiveQL

7. Data Types and File Formats in Hive

8. File Encoding

9. Common problems while working with Hive

Module 10: Apache Hive Advanced
1. HiveQL

2. Managed and External Tables

3. Understand Storage Formats

4. Querying Data
- Sorting and Aggregation
- MapReduce In Query
- Joins, Subqueries and Views

5. Writing User Defined Functions (UDFs)

6. Data Types and Schemas

7. HiveODBC

Module 11: HBase Introduction
1. Fundamentals of HBase

2. Usage Scenario of HBase

3. Use of HBase in Search Engine

4. HBase DataModel
- Table and Row
- Column Family and Column Qualifier
- Cell and its Versioning
- Regions and Region Server

5. HBase Designing Tables

6. HBase Data Coordinates

7. Versions and HBase Operation
- Get/Scan
- Put
- Delete

Module 12: Apache Sqoop
1. Sqoop Tutorial

2. How Does Sqoop Work?

3. Sqoop JDBC Driver and Connectors

4. Sqoop Importing Data

5. Various Options to Import Data
- Table Import
- Binary Data Import
- Speeding Up the Import
- Filtering Import
- Full Database Import

Module 13: Apache Flume
1. Data Acquisition: Apache Flume Introduction

2. Apache Flume Components

3. POSIX and HDFS File Write

4. Flume Events

5. Interceptors, Channel Selectors, Sink Processors

Module 14: Apache Oozie
1. Introduction to Oozie

2. Creating different jobs
- Workflow
- Coordinator
- Bundle

3. Creating and scheduling jobs for different components

Module 15: Introduction to Spark and Scala
Module 16: Introduction to Data Visualization tools
Module 17: Projects
Module 18: CV creation and Interview Preparation 


Career Options

· Chief Data Officer

· Data Analyst

· Big Data Visualizer

· Big Data Solutions Architect

· Big Data Engineer

· Big Data Researcher

· Database Manager

· Data Warehouse Manager

· Business Intelligence Analyst

· Business Data Analyst


Fill in the form below to enroll in our course.


69/6A Third Floor,
Rama Road Industrial Area
New Delhi, 110015


Email: skillingarena at skillingarena dot com                    
Phone: +91 955 591 3030  
