# AI Unsupervised Learning: Clustering

## Data Clustering Algorithms

K-Means Algorithm

The K-means clustering algorithm is one of the best-known data clustering algorithms. It assumes that the number of clusters is already known, which is why it is also called flat clustering. It is an iterative clustering algorithm that follows these steps:

• First, fix the desired number of clusters, K.
• Randomly assign each data point to one of the K clusters.
• Compute the centroid of each cluster.
• Reassign every point to its nearest centroid and recompute the centroids, repeating until the assignments no longer change.

```python
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
from sklearn.cluster import KMeans
```

```python
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples = 500, centers = 4,
                       cluster_std = 0.40, random_state = 0)
```

```python
plt.scatter(X[:, 0], X[:, 1], s = 50)
plt.show()
```

```python
kmeans = KMeans(n_clusters = 4)
```

```python
kmeans.fit(X)
y_kmeans = kmeans.predict(X)
plt.scatter(X[:, 0], X[:, 1], c = y_kmeans, s = 50, cmap = 'viridis')

centers = kmeans.cluster_centers_
```

```python
plt.scatter(centers[:, 0], centers[:, 1], c = 'black', s = 200, alpha = 0.5)
plt.show()
```

Mean Shift Algorithm

Mean shift is another powerful clustering algorithm used in unsupervised learning. It makes no assumption about the number of clusters, so it is a non-parametric algorithm. The basic steps of this algorithm are as follows:

• First, start with the data points assigned to a cluster of their own.
• Next, compute the centroids and update the locations of the new centroids.
• By repeating this process, move closer to the peak of each cluster, i.e. toward the region of higher density.
• The algorithm stops at the stage where the centroids no longer move.

```python
import numpy as np
from sklearn.cluster import MeanShift
import matplotlib.pyplot as plt
from matplotlib import style
style.use("ggplot")
```

```python
from sklearn.datasets import make_blobs
```

```python
centers = [[2, 2], [4, 5], [3, 10]]
X, _ = make_blobs(n_samples = 500, centers = centers, cluster_std = 1)
plt.scatter(X[:, 0], X[:, 1])
plt.show()
```

```python
ms = MeanShift()
ms.fit(X)
labels = ms.labels_
cluster_centers = ms.cluster_centers_
```

```python
print(cluster_centers)
n_clusters_ = len(np.unique(labels))
print("Estimated clusters:", n_clusters_)
```

Output:

```
[[ 3.23005036  3.84771893]
 [ 3.02057451  9.88928991]]
Estimated clusters: 2
```
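Note that `MeanShift` merged the two nearby blobs around [2, 2] and [4, 5] into a single cluster: with `cluster_std = 1` the automatically estimated kernel bandwidth is wide enough to bridge them. The bandwidth can be tuned explicitly with scikit-learn's `estimate_bandwidth`; the `quantile` value below is only illustrative.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

centers = [[2, 2], [4, 5], [3, 10]]
X, _ = make_blobs(n_samples = 500, centers = centers, cluster_std = 1,
                  random_state = 0)

# A smaller quantile gives a narrower kernel, which tends to split
# nearby blobs into separate clusters instead of merging them.
bandwidth = estimate_bandwidth(X, quantile = 0.1, random_state = 0)
ms = MeanShift(bandwidth = bandwidth)
ms.fit(X)
print("Bandwidth:", bandwidth)
print("Estimated clusters:", len(np.unique(ms.labels_)))
```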

```python
colors = 10 * ['r.', 'g.', 'b.', 'c.', 'k.', 'y.', 'm.']
for i in range(len(X)):
    plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize = 10)
plt.scatter(cluster_centers[:, 0], cluster_centers[:, 1],
            marker = "x", color = 'k', s = 150, linewidths = 5, zorder = 10)
plt.show()
```
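The four steps listed above can be sketched directly in NumPy. This is a simplified flat-kernel illustration with a hypothetical `mean_shift` helper; scikit-learn's `MeanShift` adds bandwidth estimation, binned seeding, and other refinements.

```python
import numpy as np

def mean_shift(X, bandwidth = 2.0, n_iters = 50, tol = 1e-3):
    # Step 1: every data point starts as its own candidate centroid.
    centroids = X.copy().astype(float)
    for _ in range(n_iters):
        new_centroids = np.empty_like(centroids)
        for i, c in enumerate(centroids):
            # Step 2: shift each candidate to the mean of all points
            # within the bandwidth (a flat kernel window).
            in_window = X[np.linalg.norm(X - c, axis = 1) <= bandwidth]
            new_centroids[i] = in_window.mean(axis = 0)
        # Step 4: stop once no centroid moves more than the tolerance.
        if np.all(np.linalg.norm(new_centroids - centroids, axis = 1) < tol):
            centroids = new_centroids
            break
        # Step 3: repeating this climbs toward regions of higher density.
        centroids = new_centroids
    # Merge candidates that converged to (nearly) the same mode.
    merged = []
    for c in centroids:
        if not any(np.linalg.norm(c - m) < bandwidth / 2 for m in merged):
            merged.append(c)
    return np.array(merged)
```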

## Measuring the Clustering Performance

A common metric is the silhouette score, which ranges from -1 to +1 and measures how similar a sample is to its own cluster compared with the nearest neighbouring cluster:

• Score of +1 − A score near +1 indicates that the sample is far away from the neighbouring cluster.
• Score of 0 − A score of 0 indicates that the sample is on, or very close to, the decision boundary between two neighbouring clusters.
• Score of -1 − A negative score indicates that the sample has been assigned to the wrong cluster.
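These thresholds can be made concrete with a small from-scratch computation of the silhouette score, s = (b − a) / max(a, b), where a is the mean intra-cluster distance and b is the mean distance to the nearest other cluster. The `silhouette` helper below is a simplified sketch (it assumes every cluster holds at least two points); scikit-learn provides this metric as `sklearn.metrics.silhouette_score`.

```python
import numpy as np

def silhouette(X, labels):
    scores = []
    for i, x in enumerate(X):
        own = labels[i]
        # a: mean distance to the other members of the sample's own cluster
        # (the distance to itself is 0, so divide by the count minus one).
        same = X[labels == own]
        a = np.sum(np.linalg.norm(same - x, axis = 1)) / (len(same) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(np.mean(np.linalg.norm(X[labels == other] - x, axis = 1))
                for other in np.unique(labels) if other != own)
        scores.append((b - a) / max(a, b))
    # Report the mean silhouette over all samples.
    return float(np.mean(scores))
```

With well-separated clusters the mean score approaches +1, while deliberately scrambled labels drive it negative.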