(Japanese below) Hi! I am Dai from Team AI. In our actual business, data pre-processing is very important.

But approaches to data pre-processing are not well organized as a resource.

Especially for beginners, it is hard to choose the right way to pre-process data.

In this study group, we will discuss which approach is best for each use case.

We will start from your problem and solve it with the power of the community.

Then we will write a useful blog post as an information resource.

Our goal is to solve data scientists' problems by organizing technical information.

Data Cleaning: This is the first step of data preprocessing. The main focus is on handling missing and noisy data, detecting and removing outliers, and minimizing duplication and computed biases in the data.
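To make the cleaning step concrete, here is a minimal sketch in plain Python. The records, the `clean` function, the median fill, and the 0 to 120 age range are all assumptions chosen for illustration; the right strategy depends on your data.

```python
from statistics import median

# Hypothetical raw records: (id, age). None marks a missing value.
rows = [(1, 25), (2, None), (3, 31), (3, 31), (4, 29), (5, 240)]

def clean(rows, min_age=0, max_age=120):
    # 1. Drop exact duplicate records while keeping the original order.
    deduped = list(dict.fromkeys(rows))

    # 2. Fill missing ages with the median of the observed ages.
    #    (Median is an assumed choice; it is less sensitive to outliers than the mean.)
    observed = [age for _, age in deduped if age is not None]
    fill = median(observed)
    filled = [(i, fill if age is None else age) for i, age in deduped]

    # 3. Remove outliers using an assumed plausible range for age.
    return [(i, age) for i, age in filled if min_age <= age <= max_age]

cleaned = clean(rows)
```

Here the duplicate of record 3 is dropped, the missing age of record 2 is filled, and record 5 is removed as an outlier.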

Data Integration: This step applies when data is gathered from multiple sources and combined into one consistent dataset. The integrated data, once cleaned, is used for analysis.
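As a rough illustration of integration, the sketch below combines two hypothetical sources that describe the same customers with different key names. The source names (`crm`, `sales`), their fields, and the left-join choice are assumptions, not a prescribed method.

```python
# Hypothetical sources with different schemas for the same customers.
crm = [
    {"customer_id": 1, "name": "Aiko"},
    {"customer_id": 2, "name": "Ben"},
]
sales = [
    {"cust": 1, "amount": 120.0},
    {"cust": 1, "amount": 80.0},
    {"cust": 3, "amount": 50.0},
]

def integrate(crm, sales):
    # Sum the sales amounts per customer key.
    totals = {}
    for s in sales:
        totals[s["cust"]] = totals.get(s["cust"], 0.0) + s["amount"]

    # Left join: keep every CRM customer; customers without sales get 0.0.
    return [
        {
            "customer_id": c["customer_id"],
            "name": c["name"],
            "total_amount": totals.get(c["customer_id"], 0.0),
        }
        for c in crm
    ]

combined = integrate(crm, sales)
```

Note that the sale for customer 3 is silently dropped by the left join. Whether that is acceptable is exactly the kind of question to settle per use case.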

Data Transformation: This step converts the raw data into the format the model requires. The options for transforming data are:

Normalization: Numerical data is rescaled into a specified range, e.g. between 0 and 1, so that features measured on different scales become comparable.

Aggregation: As the name suggests, this method combines several features into one. For example, two categories can be merged to form a new category.

Generalization: Lower-level attributes are converted into higher-level ones, for example replacing a city with its country.
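The three transformation options above can be sketched in a few lines of plain Python. The function names, the sample values, and the mapping tables are hypothetical, and min-max scaling is only one common way to normalize.

```python
def min_max_normalize(values):
    # Rescale numbers linearly into the range [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def aggregate_categories(labels, merge_map):
    # Merge several categories into one; unmapped labels stay unchanged.
    return [merge_map.get(label, label) for label in labels]

def generalize(cities, city_to_country):
    # Replace a lower-level attribute (city) with a higher-level one (country).
    return [city_to_country.get(city, "Unknown") for city in cities]

scaled = min_max_normalize([10, 20, 30])
merged = aggregate_categories(["cat", "dog", "car"], {"cat": "pet", "dog": "pet"})
countries = generalize(["Tokyo", "Osaka", "Paris"],
                       {"Tokyo": "Japan", "Osaka": "Japan", "Paris": "France"})
```

With these inputs, `scaled` is `[0.0, 0.5, 1.0]`, `merged` is `["pet", "pet", "car"]`, and `countries` is `["Japan", "Japan", "France"]`.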

Data Reduction: After the data has been transformed and scaled, duplication (i.e. redundancy) within the data is removed and the data is organized efficiently.
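One simple form of redundancy is a column that exactly repeats another. The sketch below drops such columns; the table layout (a dict of column name to values) and the function name are assumptions, and real projects often use richer techniques such as feature selection or dimensionality reduction.

```python
def drop_redundant_columns(table):
    # table maps a column name to its list of values.
    kept = {}
    seen = set()
    for name, values in table.items():
        key = tuple(values)
        if key in seen:
            # An earlier column already holds exactly these values.
            continue
        seen.add(key)
        kept[name] = values
    return kept

# Hypothetical table where "height_copy" duplicates "height_cm".
table = {
    "height_cm": [170, 180],
    "height_copy": [170, 180],
    "weight_kg": [60, 75],
}
reduced = drop_redundant_columns(table)
```

After the call, `reduced` keeps only `height_cm` and `weight_kg`.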

Sounds interesting? Please come to our "Team AI Base" in Shibuya and work with us. Are you a beginner? Don't worry. We will take care of you.

Let's have fun together building good AI.