Data Preprocessing In Machine Learning

Data preprocessing is a very important part of machine learning: whenever we build a machine learning model, there is always a data preprocessing phase to work through.

We have to process the data in the right way so that the machine learning model we build can be trained properly on the data and give results with high accuracy.

Step 1: Importing the libraries

Step 2: Importing the data

Step 3: Taking care of missing data

Step 4: Encoding Categorical data

Step 5: Splitting the dataset into the Training set and Test set

Step 6: Feature Scaling

NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object, and tools for working with these arrays.

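It is the fundamental package for scientific computing with Python. Among other things, it contains a powerful N-dimensional array object, broadcasting functions, and linear algebra, Fourier transform, and random number capabilities.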
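As a small illustration of the array object described above, the sketch below builds a two-dimensional array and applies a vectorized operation to it. The values are made up for the example.

```python
import numpy as np

# Hypothetical 2-D array: three samples (rows) with two features (columns).
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

# Column-wise mean, computed without an explicit Python loop.
column_means = data.mean(axis=0)

# Broadcasting: the 1-D vector of means is subtracted from every row.
centered = data - column_means

print(data.shape)
print(column_means)
print(centered)
```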

Matplotlib is the library that allows us to plot charts.

Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. The name is derived from "panel data" and is also a play on "Python data analysis".

What’s cool about pandas is that it takes data (from a CSV or TSV file, or a SQL database) and creates a Python object with rows and columns, called a DataFrame, that looks very similar to a table in statistical software.

In Python, a dataset is imported using the pandas library.
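A minimal sketch of this step is shown below. The CSV content, the column names, and the idea that the last column is the target are all assumptions made for illustration; with a real file you would pass its path to `pd.read_csv` instead of the in-memory text.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """Country,Age,Salary,Purchased
France,44,72000,No
Spain,27,48000,Yes
Germany,30,54000,No
"""

# pd.read_csv accepts a file path or any file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))

# Assumed layout: every column except the last is a feature,
# and the last column is the target (dependent variable).
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

print(dataset.head())
print(X.shape, y.shape)
```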


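Generally we don’t want missing data in the dataset, for the simple reason that it can cause errors when training a machine learning model. We should therefore take care of it.

There are a few common ways to handle missing data, such as removing the affected rows or replacing the missing values with the column mean, median, or most frequent value.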

Scikit-learn is a data science library that contains a large set of tools and many machine learning models. For handling missing data it provides an imputer class: older releases called it Imputer (in sklearn.preprocessing), and current releases provide SimpleImputer (in sklearn.impute) instead. Using the imputer class, missing values can be filled in automatically.
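In current scikit-learn releases the imputation class is `SimpleImputer` in `sklearn.impute`. The sketch below uses it to replace missing values with the column mean; the small array and the choice of the mean strategy are assumptions for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric features (age, salary) with missing entries as NaN.
X = np.array([[44.0, 72000.0],
              [27.0, np.nan],
              [np.nan, 54000.0],
              [35.0, 60000.0]])

# Replace each NaN with the mean of the known values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Other values of `strategy`, such as `"median"` or `"most_frequent"`, may suit columns with outliers or categorical values better.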

It is difficult for a machine learning model to compute correlations between categorical columns (the features) and the outcome (the dependent variable), so we have to turn these categories into numbers.

The two most popular techniques are integer encoding and one-hot encoding.

An ordinal encoding involves mapping each unique label to an integer value. As such, it is sometimes referred to simply as an integer encoding.

This type of encoding is really only appropriate if there is a known ordering relationship between the categories.
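The two encodings can be compared in a short sketch using scikit-learn's `OrdinalEncoder` and `OneHotEncoder`. The size and country columns are hypothetical: sizes stand in for a category with a natural order, countries for one without.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical ordered category: sizes have a natural ranking.
sizes = np.array([["small"], ["large"], ["medium"], ["small"]])
# The explicit category list fixes the order small < medium < large.
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
sizes_encoded = ordinal.fit_transform(sizes)

# Hypothetical unordered category: countries have no ranking.
countries = np.array([["France"], ["Spain"], ["Germany"], ["Spain"]])
one_hot = OneHotEncoder()
# fit_transform returns a sparse matrix by default; convert it for display.
countries_encoded = one_hot.fit_transform(countries).toarray()

print(sizes_encoded.ravel())
print(one_hot.categories_[0])
print(countries_encoded)
```

One-hot encoding creates one binary column per category, so it avoids implying an order that does not exist.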

In machine learning we usually split our data into two subsets, training data and testing data: we fit our model on the training data so that it can make predictions on the test data.


The training set contains the known output, and the model learns from this data so that it can generalize to other data later on. The test set is a held-out subset used to evaluate the model’s predictions.
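A minimal sketch of the split, using scikit-learn's `train_test_split`. The data, the 80/20 ratio, and the fixed `random_state` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 2 features each, and a binary target.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# Hold out 20% of the samples for testing; random_state fixes the shuffle
# so that the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```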

Feature scaling in machine learning is one of the most critical steps during the preprocessing of data before creating a machine learning model.

Scaling can make a difference between a weak machine learning model and a better one.

The most common techniques of feature scaling are Normalization and Standardization.
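Both techniques can be sketched with scikit-learn's `MinMaxScaler` and `StandardScaler`. Normalization here means rescaling each column to the range [0, 1] via (x - min) / (max - min), and standardization means (x - mean) / standard deviation. The age and salary values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales (age, salary).
X = np.array([[25.0, 40000.0],
              [35.0, 60000.0],
              [45.0, 80000.0]])

# Normalization: rescale each column to the range [0, 1].
X_normalized = MinMaxScaler().fit_transform(X)

# Standardization: give each column zero mean and unit variance.
X_standardized = StandardScaler().fit_transform(X)

print(X_normalized)
print(X_standardized)
```

In practice the scaler is usually fitted on the training set only and then applied to the test set with `transform`, so that no information from the test data leaks into training.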

These are the six general steps for preprocessing data before using it for machine learning.

That’s all for data preprocessing in machine learning. Stay tuned for further blogs.

Thank you

Implementation of data preprocessing on Breast Cancer dataset ~
