The specific algorithms included are as follows:
Model 1 Angle-based Outlier Detector (ABOD)
Model 2 Cluster-based Local Outlier Factor (CBLOF)
Model 3 Feature Bagging
Model 4 Histogram-based Outlier Score (HBOS)
Model 5 Isolation Forest
Model 6 K Nearest Neighbors (KNN)
Model 7 Average KNN
Model 8 Median KNN
Model 9 Local Outlier Factor (LOF)
Model 10 Minimum Covariance Determinant (MCD)
Model 11 One-class SVM (OCSVM)
Model 12 Principal Component Analysis (PCA)
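These algorithms are mainly unsupervised methods for detecting anomalies and outliers: they are fitted on the data alone, without labels.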
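To make "unsupervised" concrete, here is a minimal sketch of the idea behind Models 6 to 8 (KNN, Average KNN, Median KNN): each point is scored from its distances to its k nearest neighbours, and no labels are used. The function name `knn_outlier_scores`, the toy data and the choice of k are assumptions made for illustration; this is not the library's implementation, and it uses only the standard library so it runs without any extra packages.

```python
import math
import statistics


def knn_outlier_scores(points, k=3, method="largest"):
    """Score each point by its distances to its k nearest neighbours.

    method: "largest" (distance to the k-th neighbour), "mean", or "median".
    Higher scores mean the point is more isolated.
    """
    combiners = {
        "largest": max,
        "mean": statistics.fmean,
        "median": statistics.median,
    }
    combine = combiners[method]
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point; no labels are involved.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(combine(dists[:k]))
    return scores


# Hypothetical toy data: four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]

for method in ("largest", "mean", "median"):
    scores = knn_outlier_scores(points, k=2, method=method)
    most_anomalous = scores.index(max(scores))
    print(method, [round(s, 2) for s in scores], "most anomalous index:", most_anomalous)
```

With all three variants the far-away point receives the highest score. The variants differ only in how the k neighbour distances are combined into one number.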
A comparison of all the algorithms is also provided:
The core code is as follows:
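import matplotlib.font_manager
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# classifiers, X, xx, yy, ground_truth, outliers_fraction and n_outliers
# are assumed to be defined earlier in the script.
for i, (clf_name, clf) in enumerate(classifiers.items()):
    print()
    print(i + 1, 'fitting', clf_name)
    # fit the data and tag outliers
    clf.fit(X)
    # negate the scores so that lower values mean "more anomalous"
    scores_pred = clf.decision_function(X) * -1
    y_pred = clf.predict(X)
    threshold = stats.scoreatpercentile(scores_pred,
                                        100 * outliers_fraction)
    n_errors = (y_pred != ground_truth).sum()
    # plot the levels lines and the points
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) * -1
    Z = Z.reshape(xx.shape)
    subplot = plt.subplot(3, 4, i + 1)
    # region below the threshold (predicted outliers), shaded in blue
    subplot.contourf(xx, yy, Z, levels=np.linspace(Z.min(), threshold, 7),
                     cmap=plt.cm.Blues_r)
    # decision boundary at the threshold
    a = subplot.contour(xx, yy, Z, levels=[threshold],
                        linewidths=2, colors='red')
    # region above the threshold (predicted inliers)
    subplot.contourf(xx, yy, Z, levels=[threshold, Z.max()],
                     colors='orange')
    b = subplot.scatter(X[:-n_outliers, 0], X[:-n_outliers, 1], c='white',
                        s=20, edgecolor='k')
    c = subplot.scatter(X[-n_outliers:, 0], X[-n_outliers:, 1], c='black',
                        s=20, edgecolor='k')
    subplot.axis('tight')
    # a.collections is deprecated in recent Matplotlib releases;
    # a.legend_elements()[0][0] is a possible substitute there.
    subplot.legend(
        [a.collections[0], b, c],
        ['learned decision function', 'true inliers', 'true outliers'],
        prop=matplotlib.font_manager.FontProperties(size=10),
        loc='lower right')
    subplot.set_xlabel("%d. %s (errors: %d)" % (i + 1, clf_name, n_errors))
    subplot.set_xlim((-7, 7))
    subplot.set_ylim((-7, 7))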
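The thresholding step in the loop above can be followed on a small example. The sketch below uses made-up scores in place of `clf.decision_function(X)` and a hand-written linear-interpolation `percentile` helper in place of `stats.scoreatpercentile`; both are assumptions for illustration. One simplification: in the loop above `y_pred` comes from `clf.predict(X)`, whereas here the labels are derived directly from the threshold so that the example needs no fitted detector.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a list of numbers, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


# Hypothetical raw outlier scores (higher = more anomalous),
# standing in for clf.decision_function(X).
raw_scores = [0.2, 0.1, 0.3, 0.25, 0.15, 0.2, 2.5, 3.0]
ground_truth = [0, 0, 0, 0, 0, 0, 1, 1]  # 1 marks a true outlier
outliers_fraction = 0.25

# Negate the scores, as in the loop, so that lower = more anomalous.
scores_pred = [-s for s in raw_scores]

# The lowest `outliers_fraction` share of the negated scores falls below this value.
threshold = percentile(scores_pred, 100 * outliers_fraction)

# Simplification: label points straight from the threshold.
y_pred = [1 if s < threshold else 0 for s in scores_pred]
n_errors = sum(p != t for p, t in zip(y_pred, ground_truth))

print("threshold:", round(threshold, 3))
print("predicted labels:", y_pred)
print("errors:", n_errors)
```

Because the scores are negated first, the percentile taken at `100 * outliers_fraction` separates the most anomalous quarter of the points from the rest. This is the same boundary the red contour draws in each subplot.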