Feature Engineering: Feature Selection with the ReliefF Algorithm
Background
I have recently been organizing some feature engineering material, and when I got to feature selection algorithms I came across ReliefF. It seemed common enough that I expected it to be in sklearn, but a search of the API turned up nothing. Maybe it is too simple, or I had the name wrong. In any case, when something is unclear, GitHub is the first place to look: someone there has written an implementation of ReliefF, and the implementation itself looks correct to me. The scoring appears to be based on K nearest neighbors.
Results
Link to the original source on GitHub
However, the last line of the source has a bug: the transform method returns only a single feature, when it should return all of the top features. The original line

```python
return X[:, self.top_features[self.n_features_to_keep]]
```

should be

```python
return X[:, self.top_features[:self.n_features_to_keep]]
```
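The difference is easiest to see on a toy array (my own example, not the post's dataset): the slice has to go inside the ranking index, otherwise a single scalar index is produced and only one column comes back.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)        # 3 samples, 4 features
top_features = np.array([2, 0, 3, 1])  # feature indices ranked best-first
n_keep = 2

# Buggy version: top_features[n_keep] is a single scalar index,
# so only one column is returned.
one_col = X[:, top_features[n_keep]]
print(one_col.shape)  # (3,)

# Fixed version: slice the ranking first, then fancy-index the columns,
# keeping the n_keep best-ranked features.
kept = X[:, top_features[:n_keep]]
print(kept.shape)  # (3, 2)
```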
I tested it against the RFE feature-selection example from the sklearn documentation:
Environment setup
```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time

print("------- Runtime environment -------")
print(sys.version_info)
for module in mpl, np, pd, sklearn:
    print(module.__name__, module.__version__)
```

------- Runtime environment -------
sys.version_info(major=3, minor=7, micro=7, releaselevel='final', serial=0)
matplotlib 3.3.1
numpy 1.19.1
pandas 1.1.1
sklearn 0.23.2
Dataset
```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=5000, n_features=100, random_state=0)
n_features_to_keep = 10
```
ReliefF
```python
from sklearn.neighbors import KDTree


class ReliefF(object):
    """Feature selection using data-mined expert knowledge.

    Based on the ReliefF algorithm as introduced in:
    Kononenko, Igor et al. Overcoming the myopia of inductive learning
    algorithms with RELIEFF (1997), Applied Intelligence, 7(1), p39-55
    """

    def __init__(self, n_neighbors=100, n_features_to_keep=n_features_to_keep):
        """Sets up ReliefF to perform feature selection.

        Parameters
        ----------
        n_neighbors: int (default: 100)
            The number of neighbors to consider when assigning feature
            importance scores. More neighbors results in more accurate
            scores, but takes longer.

        Returns
        -------
        None
        """
        self.feature_scores = None
        self.top_features = None
        self.tree = None
        self.n_neighbors = n_neighbors
        self.n_features_to_keep = n_features_to_keep

    def fit(self, X, y):
        """Computes the feature importance scores from the training data.

        Parameters
        ----------
        X: array-like {n_samples, n_features}
            Training instances to compute the feature importance scores from
        y: array-like {n_samples}
            Training labels

        Returns
        -------
        None
        """
        self.feature_scores = np.zeros(X.shape[1])
        self.tree = KDTree(X)

        for source_index in range(X.shape[0]):
            distances, indices = self.tree.query(
                X[source_index].reshape(1, -1), k=self.n_neighbors + 1)

            # First match is self, so ignore it
            for neighbor_index in indices[0][1:]:
                similar_features = X[source_index] == X[neighbor_index]
                label_match = y[source_index] == y[neighbor_index]

                # If the labels match, then increment features that match
                # and decrement features that do not match.
                # Do the opposite if the labels do not match.
                if label_match:
                    self.feature_scores[similar_features] += 1.
                    self.feature_scores[~similar_features] -= 1.
                else:
                    self.feature_scores[~similar_features] += 1.
                    self.feature_scores[similar_features] -= 1.

        self.top_features = np.argsort(self.feature_scores)[::-1]

    def transform(self, X):
        """Reduces the feature set down to the top `n_features_to_keep` features.

        Parameters
        ----------
        X: array-like {n_samples, n_features}
            Feature matrix to perform feature selection on

        Returns
        -------
        X_reduced: array-like {n_samples, n_features_to_keep}
            Reduced feature matrix
        """
        return X[:, self.top_features[:self.n_features_to_keep]]


rel = ReliefF()
rel.fit(X, y)
print(rel.top_features)
```
[99 36 26 27 28 29 30 31 32 33 34 35 37 98 38 39 40 41 42 43 44 45 46 47
25 24 23 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
21 48 49 50 75 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
96 97 76 74 51 73 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69
70 71 72 0]
One caveat: the result here may indicate a problem with the method, with my understanding of it, or with the choice of test data, because after fit every value in feature_scores is identical (5000). A likely explanation is that make_friedman1 is a regression dataset: y is continuous, so the label-match test y[source_index] == y[neighbor_index] essentially never succeeds, and the continuous features never compare exactly equal either, so every neighbor pair updates all feature scores in the same way. ReliefF expects discrete class labels.
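To check that explanation, here is a quick probe (my own addition, not from the original code) counting the distinct labels make_friedman1 produces:

```python
import numpy as np
from sklearn.datasets import make_friedman1

X, y = make_friedman1(n_samples=100, n_features=10, random_state=0)

# y is a continuous regression target: every label is distinct, so an
# exact-equality label match between two different samples never fires.
print(np.unique(y).size)  # 100
```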
RFE
```python
estimator = SVR(kernel="linear")
rfe = RFE(estimator, n_features_to_select=n_features_to_keep, step=5)
rfe = rfe.fit(X, y)
print(rfe.ranking_)
```
[ 1 1 19 1 1 13 12 4 14 7 1 10 8 18 7 19 14 5 10 4 13 14 7 17
3 16 18 8 3 8 6 6 13 18 2 5 12 17 12 2 17 10 9 11 7 15 9 16
9 2 8 4 18 5 15 4 2 6 3 9 10 2 1 1 4 16 12 11 13 11 8 11
5 17 6 1 1 16 19 13 19 19 15 3 11 1 14 12 10 6 17 7 14 18 3 15
16 9 15 5]
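In the ranking above, a 1 marks a selected feature; rfe.support_ exposes the same information as a boolean mask, and transform drops the other columns. A small sketch on a scaled-down problem (my own parameters, not the post's 5000x100 run, so it finishes quickly):

```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=200, n_features=20, random_state=0)
rfe = RFE(SVR(kernel="linear"), n_features_to_select=5, step=5).fit(X, y)

# ranking_ == 1 marks the selected features; support_ is the same mask.
print((rfe.ranking_ == 1).sum())  # 5
print(rfe.support_.sum())         # 5

# transform keeps only the selected columns.
print(rfe.transform(X).shape)     # (200, 5)
```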