## Why Learn Data Science?

By reading this, you’ve already taken your first steps on the path to becoming a data scientist. Here are a few reasons to stick around!

- Data Science is fast becoming one of the most sought-after professions in India and around the world.
- More than 1.5 lakh (150,000) job openings for Data Scientists were projected for 2020, a 62% increase over 2019.
- Data is everywhere; it is a universal currency. Learning how to gain insights from data is an invaluable skill to have.

## What is Data Science?

Data is the new oil and Data Science is its combustion engine! While there are many definitions of what data science really is, we have found it best to describe it as a field revolving around five data-related operations: collection, storing, processing, describing, and modelling.

**Collection**

Data Collection is the process of gathering data (numerical, text, video, audio, etc.). It is shaped by two major factors: the question that the data scientist needs to answer and the environment that the data scientist is working in.

**Storing**

Storing data involves maintaining the collected data for use throughout the data science pipeline. Structured data is typically stored in relational databases and aggregated in data warehouses. With the advent of Big Data, Data Lakes are now used to store multimodal structured and unstructured data.
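To make the relational case concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `readings` table, its columns, and the sample rows are hypothetical and chosen only for illustration; a real project would likely use a persistent database or a warehouse.

```python
import sqlite3

# In-memory database: nothing is written to disk (illustrative only).
conn = sqlite3.connect(":memory:")

# Hypothetical table of sensor readings; structured data has a fixed schema.
conn.execute("CREATE TABLE readings (sensor TEXT, temperature REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 21.5), ("b", 19.0), ("a", 22.5)],
)
conn.commit()

# Aggregate the stored data, as a warehouse-style query might.
rows = conn.execute(
    "SELECT sensor, AVG(temperature) FROM readings "
    "GROUP BY sensor ORDER BY sensor"
).fetchall()
conn.close()

print(rows)
```

The same store-then-query pattern carries over to larger systems, although the tooling and scale differ.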

**Processing**

Data Processing is a set of three main sub-processes: Data Wrangling (extraction, transformation, and loading of the data); Data Cleaning (handling missing values, outliers, etc.); and Data Scaling, Normalization, and Standardization.
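The cleaning and scaling steps can be sketched in a few lines of plain Python. This is a minimal illustration, not a recommended pipeline: the helper names (`fill_missing`, `min_max_normalize`, `standardize`) and the sample values are assumptions, and filling gaps with the mean is only one of several possible strategies.

```python
from statistics import mean, pstdev

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values to the range [0, 1]. Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation.

    Assumes the values are not all identical.
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

raw = [10.0, None, 20.0, 30.0]            # hypothetical column with a gap
cleaned = fill_missing(raw)               # [10.0, 20.0, 20.0, 30.0]
normalized = min_max_normalize(cleaned)   # [0.0, 0.5, 0.5, 1.0]
standardized = standardize(cleaned)
```

Normalization keeps values within a fixed range, while standardization centres them around zero; which one fits depends on the model used later.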

**Describing**

Data Description has two aspects. Data Visualisation involves representing processed data using graphs, charts, diagrams, and other visual forms. Data Summarisation involves calculating summary statistics such as the mean, median, mode, standard deviation, and variance.
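The summary statistics named above are all available in Python's standard `statistics` module. The sketch below uses a small hypothetical list of scores and the population versions of variance and standard deviation; for a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` would be the usual choice.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

summary = {
    "mean": statistics.mean(scores),          # 5
    "median": statistics.median(scores),      # 4.5
    "mode": statistics.mode(scores),          # 4
    "variance": statistics.pvariance(scores), # 4
    "std_dev": statistics.pstdev(scores),     # 2.0
}

for name, value in summary.items():
    print(f"{name}: {value}")
```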

**Modelling**

Statistical Modelling involves modelling the underlying distribution of the data and the relations within it, and then making inferences on top of the model. Algorithmic Modelling involves using large volumes of data and optimization techniques to best estimate the distribution and relations of the data, e.g., Machine Learning and Deep Learning.
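As a small illustration of modelling a relation in data and then inferring from it, the sketch below fits a straight line by ordinary least squares and uses it to predict a new value. The `fit_line` helper and the hours/marks data are hypothetical; this is one simple statistical model, not a stand-in for machine learning in general.

```python
from statistics import mean

def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data that follows marks = 50 + 2 * hours exactly.
hours = [1, 2, 3, 4, 5]
marks = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, marks)  # 2.0, 50.0
predicted = intercept + slope * 6          # inference for 6 hours: 62.0
```

Real data is noisy, so the fitted line would only approximate the points, and the quality of the fit would need to be checked before trusting any prediction.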