# Data Preparation for Artificial Intelligence

## Preprocessing the Data

```python
import numpy as np
from sklearn import preprocessing
```

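• NumPy - NumPy is a general-purpose array-processing package designed to manipulate large multidimensional arrays of arbitrary records efficiently, without sacrificing too much speed for small multidimensional arrays.
• sklearn.preprocessing - This package provides many common utility functions and transformer classes for converting raw feature vectors into a representation better suited to machine learning algorithms.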

```python
# Sample data: 4 rows (samples) by 3 columns (features)
input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])
```

## Data Preprocessing Techniques

```python
# Binarize: values above the threshold become 1, all others become 0
data_binarized = preprocessing.Binarizer(threshold=0.5).transform(input_data)
print("\nBinarized data:\n", data_binarized)
```

```
[[ 1. 0. 1.]
 [ 0. 1. 1.]
 [ 0. 0. 1.]
 [ 1. 1. 0.]]
```
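To make the thresholding rule concrete, here is a minimal NumPy-only sketch of what the binarization step computes. It assumes the same `input_data` and the 0.5 threshold used above; the name `manual_binarized` is introduced here purely for illustration.

```python
import numpy as np

input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])

threshold = 0.5

# Values strictly greater than the threshold map to 1.0, the rest to 0.0
manual_binarized = (input_data > threshold).astype(float)
print(manual_binarized)
```

Note that the value 0.5 in the third row is not greater than the threshold, so it maps to 0.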

```python
# Per-feature (column-wise) mean and standard deviation before scaling
print("Mean = ", input_data.mean(axis=0))
print("Std deviation = ", input_data.std(axis=0))
```

```
Mean = [ 1.75       -1.275       2.2]
Std deviation = [ 2.71431391  4.20022321  4.69414529]
```

```python
# Remove the mean and scale each feature to unit variance
data_scaled = preprocessing.scale(input_data)
print("Mean =", data_scaled.mean(axis=0))
print("Std deviation =", data_scaled.std(axis=0))
```

```
Mean = [ 1.11022302e-16 0.00000000e+00 0.00000000e+00]
Std deviation = [ 1.             1.             1.]
```
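The scaled output above can be reproduced by hand, which shows what mean removal does. This is a minimal sketch, assuming column-wise statistics and the population standard deviation (NumPy's default, `ddof=0`), which is consistent with the unit standard deviation printed above. The names `col_mean`, `col_std`, and `manual_scaled` are illustrative.

```python
import numpy as np

input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])

# Column-wise statistics: one mean and one standard deviation per feature
col_mean = input_data.mean(axis=0)
col_std = input_data.std(axis=0)  # population standard deviation (ddof=0)

# Subtract the mean, then divide by the standard deviation
manual_scaled = (input_data - col_mean) / col_std

print("Mean =", manual_scaled.mean(axis=0))
print("Std deviation =", manual_scaled.std(axis=0))
```

The means come out as zero up to floating-point rounding, which is why a value such as `1.11022302e-16` can appear instead of an exact 0.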

```python
# Rescale each feature so that its values fall in the range [0, 1]
data_scaler_minmax = preprocessing.MinMaxScaler(feature_range=(0, 1))
data_scaled_minmax = data_scaler_minmax.fit_transform(input_data)
print("\nMin max scaled data:\n", data_scaled_minmax)
```

```
[[ 0.48648649  0.58252427  0.99122807]
 [ 0.          1.          0.81578947]
 [ 0.27027027  0.          1.        ]
 [ 1.          0.99029126  0.        ]]
```
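The min-max result can be checked with a short NumPy-only sketch. It assumes the target range is (0, 1), as in the scaler above, so each feature is mapped with (x - min) / (max - min). The names `col_min`, `col_max`, and `manual_minmax` are illustrative.

```python
import numpy as np

input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])

# Column-wise minimum and maximum of each feature
col_min = input_data.min(axis=0)
col_max = input_data.max(axis=0)

# The smallest value in a column maps to 0 and the largest maps to 1
manual_minmax = (input_data - col_min) / (col_max - col_min)
print(manual_minmax)
```

Each column therefore contains exactly one 0 and one 1 for this data, at the positions of its minimum and maximum.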

### Normalization

L1 Normalization

```python
# Normalize each row so that the absolute values sum to 1
data_normalized_l1 = preprocessing.normalize(input_data, norm='l1')
print("\nL1 normalized data:\n", data_normalized_l1)
```

```
L1 normalized data:
[[ 0.22105263  -0.2          0.57894737]
 [-0.2027027    0.32432432   0.47297297]
 [ 0.03571429  -0.56428571   0.4       ]
 [ 0.42142857   0.16428571  -0.41428571]]
```

L2 Normalization

```python
# Normalize each row so that the squared values sum to 1
data_normalized_l2 = preprocessing.normalize(input_data, norm='l2')
print("\nL2 normalized data:\n", data_normalized_l2)
```

```
L2 normalized data:
[[ 0.33946114  -0.30713151   0.88906489]
 [-0.33325106   0.53320169   0.7775858 ]
 [ 0.05156558  -0.81473612   0.57753446]
 [ 0.68706914   0.26784051  -0.6754239 ]]
```
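Unlike the scaling steps, both normalizations work per row (per sample) rather than per column. The following NumPy-only sketch reproduces the two outputs above by dividing each row by its own norm. The names `l1_norms`, `l2_norms`, `manual_l1`, and `manual_l2` are illustrative.

```python
import numpy as np

input_data = np.array([[2.1, -1.9, 5.5],
                       [-1.5, 2.4, 3.5],
                       [0.5, -7.9, 5.6],
                       [5.9, 2.3, -5.8]])

# L1 norm of each row: sum of absolute values (kept as a column for broadcasting)
l1_norms = np.abs(input_data).sum(axis=1, keepdims=True)
manual_l1 = input_data / l1_norms

# L2 norm of each row: square root of the sum of squares
l2_norms = np.sqrt((input_data ** 2).sum(axis=1, keepdims=True))
manual_l2 = input_data / l2_norms

print("L1 normalized data:\n", manual_l1)
print("L2 normalized data:\n", manual_l2)
```

After L1 normalization the absolute values in every row sum to 1. After L2 normalization the squares in every row sum to 1.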

## Labeling the Data

```python
import numpy as np
from sklearn import preprocessing
```

```python
# Sample input labels
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']
```

```python
# Create the label encoder and fit it to the sample labels
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)
```

```
LabelEncoder()
```

```python
# Encode a set of labels as integers
test_labels = ['green', 'red', 'black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)
```

```
Labels = ['green', 'red', 'black']
```

```python
print("Encoded values =", list(encoded_values))
```

```
Encoded values = [1, 2, 0]
```

```python
# Decode a set of integer codes back into labels
encoded_values = [3, 0, 4, 1]
decoded_list = encoder.inverse_transform(encoded_values)
print("\nEncoded values =", encoded_values)
```

```
Encoded values = [3, 0, 4, 1]
```

```python
print("\nDecoded labels =", list(decoded_list))
```

```
Decoded labels = ['white', 'black', 'yellow', 'green']
```
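The encoded and decoded values above follow from how the labels are mapped to integers. The sketch below is a plain-Python illustration, assuming integer codes are assigned to the distinct labels in sorted order, which matches the outputs shown above. The names `classes` and `label_to_code` are hypothetical and are not part of the encoder's API.

```python
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Distinct labels in sorted order; a label's position is its integer code
classes = sorted(set(input_labels))
label_to_code = {label: code for code, label in enumerate(classes)}
print("Classes =", classes)

# Encoding: look up each label's code
test_labels = ['green', 'red', 'black']
encoded = [label_to_code[label] for label in test_labels]
print("Encoded values =", encoded)

# Decoding: index back into the sorted class list
codes = [3, 0, 4, 1]
decoded = [classes[code] for code in codes]
print("Decoded labels =", decoded)
```

With a fitted scikit-learn encoder, the same ordered list of labels can be inspected through its `classes_` attribute.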