Data Science

Become an expert in gathering, collating and analysing data. Learn statistical analysis and visualisation tools. Gain in-depth knowledge of working with huge amounts of data using Hadoop framework tools.

Admissions Open

Running regular batches. Contact us to get further details.

About Course

Data science is a field that uses scientific methods, processes, algorithms and systems to extract knowledge or insights from data in various forms, whether structured or unstructured. In this respect it is similar to data mining.

Our Data Science course is designed to help you understand how to collect and analyse data. You will learn techniques for drawing useful insights from data. In addition to statistical data analysis, you will learn Hadoop framework tools and get an introduction to Spark, Scala and data visualisation tools as part of this course.


Features & Benefits


- 72 hours of online instructor-led training or classroom training
- Taught by industry experts
- Learn in-demand skills and the software used in the industry
- Real world case studies
- Placement assistance

Is it good for me?

This course will make you comfortable with reviewing and working with large amounts of data. Data scientists are in high demand. Completing this course will help you get ready for one of the most desirable IT jobs today.

Mode of Delivery

Online instructor-led sessions and classroom training

Duration and Schedule

Duration: 18 weeks
Duration of class: 2 hours
Days: Saturdays & Sundays

Course Fees

INR 55,500 inclusive of taxes

Course Contents

1. Introduction to Statistics and Business Analytics - Sample Vs. Population, Variables and Types of Data, Primary & Secondary Data, Data Collection and Sampling Techniques

2. Descriptive Statistics - Measures of Central Tendency (Mean, Median, Mode); Measures of Variation (Range, Interquartile Range, Variance & Standard Deviation, Coefficient of Variation, Dispersion); Kurtosis, Skewness, Chebyshev's Theorem; Measures of Position (Percentiles, Deciles, Quartiles)

3. Introduction to Random Variables (Discrete and Continuous Random Variables), Exploratory Data Analysis, Frequency Tables and Frequency Distributions, Types of Graphs

4. Inferential Statistics & Hypothesis Testing - Formulation of Hypothesis Statement, p-value, Type I and Type II Errors, Z-Test, t-Test, Chi-Square Test

5. Introduction to Statistical Estimation and Confidence Intervals

6. Probability Theory - Introduction to Probability Theory & Counting Rules

7. Probability Distributions - Discrete and Cumulative Probability Distribution, Sampling Distribution, Binomial Distribution, Standard Normal Distribution, Poisson Distribution

8. Chi-Square, F-Distribution, and ANOVA (One-Way and Two-Way ANOVA)

9. Correlation and OLS/Multiple Regression (Logistic and Linear Regression)

1. Statistical Analysis using Excel

2. Introduction to R software, and Statistical Analysis using R

3. Introduction to Tableau & its application in Analytics

4. Introduction to Python and its application in Analytics

Module 1: Introduction to Big Data and Hadoop (HDFS and MapReduce)

1. Big Data Introduction

2. Hadoop Introduction

3. Hadoop components

4. HDFS Introduction

5. MapReduce Introduction 

Module 2: Deep Dive in HDFS

1. HDFS Design

2. Fundamentals of HDFS (Blocks, NameNode, DataNode, Secondary NameNode)

3. Rack Awareness

4. Read/Write from HDFS

5. HDFS Federation and High Availability (Hadoop 2.x.x)

6. Parallel Copying using DistCp

7. HDFS Command Line Interface 

Module 3: HDFS File Operation Lifecycle

1. File Read Cycle from HDFS
- DistributedFileSystem
- FSDataInputStream

2. Failure and Error Handling When a File Read Fails

3. File Write Cycle to HDFS
- FSDataOutputStream

4. Failure and Error Handling When a File Write Fails

Module 4: Understanding MapReduce

1. JobTracker and TaskTracker

2. Topology of a Hadoop Cluster

3. Example of MapReduce
- Map Function
- Reduce Function

4. Java Implementation of MapReduce

5. DataFlow of MapReduce

6. Use of Combiner

Module 5: MapReduce Internals

1. How MapReduce Works

2. Anatomy of MapReduce Job (MR-1)

3. Submission & Initialization of MapReduce Job

4. Assigning & Execution of Tasks

5. Monitoring & Progress of MapReduce Job

6. Completion of Job

7. Failure Handling in a MapReduce Job
- Task Failure
- TaskTracker Failure
- JobTracker Failure  

Module 6: MapReduce-2 (YARN: Yet Another Resource Negotiator, Hadoop 2.x.x)

1. Limitation of Current Architecture (Classic)

2. What are the Requirements?

3. YARN Architecture

4. Job Submission and Job Initialization

5. Task Assignment and Task Execution

6. Progress and Monitoring of the Job

Module 7: Failure Handling in YARN
- Task Failure
- Application Master Failure
- Node Manager Failure
- Resource Manager Failure

Module 8: Apache Pig
1. What is Pig?

2. Introduction to Pig Data Flow Engine

3. Pig and MapReduce in Detail

4. When Should Pig Be Used?

5. Pig and Hadoop Cluster

6. Pig Interpreter and MapReduce

7. Pig Relations and Data Types

8. Pig Latin Example in Detail

9. Debugging and Generating Examples in Apache Pig

Module 9: Apache Hive
1. What is Hive?

2. Architecture of Hive

3. Hive Services

4. Hive Clients

5. How Hive Differs from Traditional RDBMS

6. Introduction to HiveQL

7. Data Types and File Formats in Hive

8. File Encoding

9. Common problems while working with Hive

Module 10: Apache Hive Advanced
1. HiveQL

2. Managed and External Tables

3. Understand Storage Formats

4. Querying Data
- Sorting and Aggregation
- MapReduce In Query
- Joins, SubQueries and Views

5. Writing User Defined Functions (UDFs)

6. Data Types and Schemas

7. HiveODBC

Module 11: HBase Introduction
1. Fundamentals of HBase

2. Usage Scenario of HBase

3. Use of HBase in Search Engine

4. HBase Data Model
- Table and Row
- Column Family and Column Qualifier
- Cell and its Versioning
- Regions and Region Server

5. HBase Designing Tables

6. HBase Data Coordinates

7. Versions and HBase Operation
- Get/Scan
- Put
- Delete

Module 12: Apache Sqoop
1. Sqoop Tutorial

2. How Does Sqoop Work?

3. Sqoop JDBC Driver and Connectors

4. Sqoop Importing Data

5. Various Options to Import Data
- Table Import
- Binary Data Import
- Speeding Up the Import
- Filtering Import
- Full Database Import

Module 13: Apache Flume
1. Data Acquisition: Introduction to Apache Flume

2. Apache Flume Components

3. POSIX and HDFS File Write

4. Flume Events

5. Interceptors, Channel Selectors, Sink Processors

Module 14: Apache Oozie
1. Introduction to Oozie

2. Creating different jobs
- Workflow
- Coordinator
- Bundle

3. Creating and scheduling jobs for different components

Module 15: Introduction to Spark and Scala
Module 16: Introduction to Data Visualization tools
Module 17: Projects
Module 18: CV creation and Interview Preparation


Career Options

· Chief Data Officer

· Data Scientist

· Data Analyst

· Big Data Visualizer

· Business Intelligence Analyst

· Business Data Analyst




69/6A Third Floor,
Rama Road Industrial Area
New Delhi, 110015


Email: skillingarena at skillingarena dot com                    
Phone: +91 955 591 3030  
