2. L1-based feature selection

3. Tree-based feature selection




SelectFromModel is a meta-transformer that can be used with any estimator that exposes a coef_ or feature_importances_ attribute after fitting. A feature is considered unimportant and is removed if its coef_ or feature_importances_ value is below the threshold parameter. Besides a numeric threshold, a string argument selects one of the built-in heuristics: "mean", "median", or a float multiple of either, such as "0.1*mean".

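The threshold heuristics described above can be compared side by side. The following is a minimal sketch, assuming the iris dataset and an ExtraTreesClassifier with a fixed random_state (both chosen here only for illustration); the numeric threshold 0.2 is likewise an arbitrary example value.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

# Illustrative choices (not from the original text): iris data, a seeded
# forest, and 0.2 as an arbitrary numeric threshold.
summary = {}
for threshold in ("mean", "median", "0.1*mean", 0.2):
    base = ExtraTreesClassifier(n_estimators=50, random_state=0)
    selector = SelectFromModel(base, threshold=threshold).fit(X, y)

    importances = selector.estimator_.feature_importances_
    mask = selector.get_support()  # True for features that are kept
    summary[threshold] = (selector.threshold_, importances, mask)

    print(threshold, "-> threshold_ =", round(float(selector.threshold_), 4),
          "keeps", int(mask.sum()), "of", X.shape[1], "features")
```

The resolved numeric value is available as threshold_ after fitting, which makes it easy to check what a string heuristic actually evaluated to.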



# Author: Manoj Kumar <mks542@nyu.edu>
# License: BSD 3 clause

print(__doc__)

import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Load the boston dataset.
X, y = load_boston(return_X_y=True)

# We use the base estimator LassoCV since the L1 norm promotes sparsity of features.
clf = LassoCV()

# Set a minimum threshold of 0.25
sfm = SelectFromModel(clf, threshold=0.25)
sfm.fit(X, y)
n_features = sfm.transform(X).shape[1]

# Reset the threshold till the number of features equals two.
# Note that the attribute can be set directly instead of repeatedly
# fitting the metatransformer.
while n_features > 2:
    sfm.threshold += 0.1
    X_transform = sfm.transform(X)
    n_features = X_transform.shape[1]

# Plot the selected two features from X.
plt.title(
    "Features selected from Boston using SelectFromModel with "
    "threshold %0.3f." % sfm.threshold)
feature1 = X_transform[:, 0]
feature2 = X_transform[:, 1]
plt.plot(feature1, feature2, 'r.')
plt.xlabel("Feature number 1")
plt.ylabel("Feature number 2")
plt.ylim([np.min(feature2), np.max(feature2)])
plt.show()
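Because load_boston was removed in scikit-learn 1.2, the same idea (raising threshold on an already fitted selector instead of refitting) can be tried on a dataset that still ships with the library. This is only a sketch under assumptions: load_diabetes stands in for the Boston data, and the step size of 10.0 is an arbitrary value picked because the diabetes coefficients are much larger than 0.1.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Assumption: diabetes data replaces the Boston data used in the original.
X, y = load_diabetes(return_X_y=True)

# Fit once with a low starting threshold.
sfm = SelectFromModel(LassoCV(), threshold=0.25).fit(X, y)
scores = np.abs(sfm.estimator_.coef_)  # importance used for a 1-D coef_

X_transform = sfm.transform(X)
n_features = X_transform.shape[1]

# Raise the threshold on the fitted selector until at most two features
# remain. No refit happens here: transform re-reads sfm.threshold each call.
while n_features > 2:
    sfm.threshold += 10.0  # hypothetical step size, chosen for this dataset
    X_transform = sfm.transform(X)
    n_features = X_transform.shape[1]

print("final threshold:", sfm.threshold, "features kept:", n_features)
```

Note that a fixed step can skip past exactly two features when several coefficients are close together, so the loop guarantees "at most two" rather than "exactly two".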

>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
>>> model = SelectFromModel(lsvc, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 3)

>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectFromModel
>>> X, y = load_iris(return_X_y=True)
>>> X.shape
(150, 4)
>>> clf = ExtraTreesClassifier(n_estimators=50)
>>> clf = clf.fit(X, y)
>>> clf.feature_importances_
array([ 0.04...,  0.05...,  0.4...,  0.4...])
>>> model = SelectFromModel(clf, prefit=True)
>>> X_new = model.transform(X)
>>> X_new.shape
(150, 2)



class SelectFromModel Found at: sklearn.feature_selection.from_model

class SelectFromModel(BaseEstimator, SelectorMixin, MetaEstimatorMixin):
    """Meta-transformer for selecting features based on importance weights.

    .. versionadded:: 0.17

    Parameters
    ----------
    estimator : object
        The base estimator from which the transformer is built.
        This can be both a fitted (if ``prefit`` is set to True)
        or a non-fitted estimator. The estimator must have either a
        ``feature_importances_`` or ``coef_`` attribute after fitting.

    threshold : string, float, optional default None
        The threshold value to use for feature selection. Features whose
        importance is greater or equal are kept while the others are
        discarded. If "median" (resp. "mean"), then the ``threshold`` value is
        the median (resp. the mean) of the feature importances. A scaling
        factor (e.g., "1.25*mean") may also be used. If None and if the
        estimator has a parameter penalty set to l1, either explicitly
        or implicitly (e.g, Lasso), the threshold used is 1e-5.
        Otherwise, "mean" is used by default.

    prefit : bool, default False
        Whether a prefit model is expected to be passed into the constructor
        directly or not. If True, ``transform`` must be called directly
        and SelectFromModel cannot be used with ``cross_val_score``,
        ``GridSearchCV`` and similar utilities that clone the estimator.
        Otherwise train the model using ``fit`` and then ``transform`` to do
        feature selection.

    norm_order : non-zero int, inf, -inf, default 1
        Order of the norm used to filter the vectors of coefficients below
        ``threshold`` in the case where the ``coef_`` attribute of the
        estimator is of dimension 2.

    Attributes
    ----------
    estimator_ : an estimator
        The base estimator from which the transformer is built.
        This is stored only when a non-fitted estimator is passed to the
        ``SelectFromModel``, i.e when prefit is False.

    threshold_ : float
        The threshold value used for feature selection.
    """

    def __init__(self, estimator, threshold=None, prefit=False, norm_order=1):
        self.estimator = estimator
        self.threshold = threshold
        self.prefit = prefit
        self.norm_order = norm_order

    def _get_support_mask(self):
        # SelectFromModel can directly call on transform.
        if self.prefit:
            estimator = self.estimator
        elif hasattr(self, 'estimator_'):
            estimator = self.estimator_
        else:
            raise ValueError(
                'Either fit SelectFromModel before transform or set "prefit='
                'True" and pass a fitted estimator to the constructor.')
        scores = _get_feature_importances(estimator, self.norm_order)
        threshold = _calculate_threshold(estimator, scores, self.threshold)
        return scores >= threshold

    def fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError("Since 'prefit=True', call transform directly")
        self.estimator_ = clone(self.estimator)
        self.estimator_.fit(X, y, **fit_params)
        return self

    @property
    def threshold_(self):
        scores = _get_feature_importances(self.estimator_, self.norm_order)
        return _calculate_threshold(self.estimator, scores, self.threshold)

    @if_delegate_has_method('estimator')
    def partial_fit(self, X, y=None, **fit_params):
        """Fit the SelectFromModel meta-transformer only once.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            The training input samples.

        y : array-like, shape (n_samples,)
            The target values (integers that correspond to classes in
            classification, real numbers in regression).

        **fit_params : Other estimator specific parameters

        Returns
        -------
        self : object
            Returns self.
        """
        if self.prefit:
            raise NotFittedError("Since 'prefit=True', call transform directly")
        if not hasattr(self, "estimator_"):
            self.estimator_ = clone(self.estimator)
        self.estimator_.partial_fit(X, y, **fit_params)
        return self
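The docstring's rule for threshold=None (1e-5 for L1-penalized estimators such as Lasso, "mean" otherwise) can be checked directly through the threshold_ attribute. The sketch below assumes the diabetes dataset, Lasso(alpha=1.0), and a seeded ExtraTreesRegressor; these are illustrative choices and do not appear in the source above.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Assumption: diabetes data and these two estimators are illustration only.
X, y = load_diabetes(return_X_y=True)

# Lasso is implicitly L1-penalized, so threshold=None resolves to 1e-5:
# only features whose coefficient was shrunk to (near) zero are dropped.
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
lasso_scores = np.abs(lasso_selector.estimator_.coef_)
lasso_mask = lasso_selector.get_support()

# A tree ensemble has no L1 penalty, so threshold=None falls back to "mean".
tree_selector = SelectFromModel(
    ExtraTreesRegressor(n_estimators=50, random_state=0)
).fit(X, y)
tree_scores = tree_selector.estimator_.feature_importances_
tree_mask = tree_selector.get_support()

print("Lasso default threshold_:", lasso_selector.threshold_)
print("Tree default threshold_: ", tree_selector.threshold_)
print("Lasso keeps", int(lasso_mask.sum()), "features;",
      "trees keep", int(tree_mask.sum()), "features")
```

Both selectors here are built with the default prefit=False, so fit clones and trains the estimator and stores it as estimator_, which is what threshold_ reads.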
