Feature Selection
An Introduction to Variable and Feature Selection
This paper introduces variable and feature selection, a topic that has become increasingly important in fields with high-dimensional datasets such as text processing, gene expression analysis, and combinatorial chemistry. The authors discuss the objectives of variable selection: improving prediction performance, reducing measurement and storage requirements, and gaining a better understanding of the underlying process that generated the data. The paper covers objective functions, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment, and closes with a checklist of steps for approaching a feature selection problem.
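Feature ranking, one of the techniques the paper surveys, can be illustrated with a minimal sketch (my own example, not from the paper): score each feature by the absolute Pearson correlation between that feature and the target, then sort. The dataset, coefficients, and noise level below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
# Features 0 and 2 are made relevant to the target; the rest are noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=n)

# Univariate feature ranking: score each feature by |Pearson correlation| with y.
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
ranking = np.argsort(scores)[::-1]  # best-scoring feature first
print(ranking[:2])  # the two relevant features should rank on top
```

Note that such univariate rankings are cheap but blind to feature interactions, which is exactly why the paper also discusses multivariate selection.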
Filter Methods for Feature Selection – A Comparative Study
This paper presents a comparative study of several filter methods for feature selection: ReliefF, Correlation-based Feature Selection (CFS), Fast Correlation-Based Filter (FCBF), and INTERACT. The authors applied these filters to synthetic datasets with varying numbers of relevant features, levels of noise in the output, feature interactions, and sample sizes. The goal was to determine how each method performs under different conditions and to identify the best filter to use as part of a hybrid feature selection approach.
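To make the filter idea concrete, here is a simplified Relief-style weight estimator (a stripped-down sketch in the spirit of the ReliefF algorithm the paper studies, not the paper's own implementation; the synthetic dataset is invented here). Features that differ more between an instance and its nearest miss than between the instance and its nearest hit receive higher weights.

```python
import numpy as np

def relief(X, y, n_iter=200, seed=0):
    """Minimal Relief sketch for binary classification.

    For each sampled instance, find its nearest hit (same class) and
    nearest miss (other class); reward features that separate the
    classes, penalize features that vary within a class.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Scale features to [0, 1] so per-feature differences are comparable.
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)  # L1 distances to instance i
        dists[i] = np.inf                     # never pick the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(~same, dists, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# Synthetic check: only feature 0 determines the class.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)
w = relief(X, y)
print(w)  # weight of feature 0 should dominate
```

Unlike a plain correlation filter, Relief-family methods score features in the context of their neighbors, which is why the paper includes them among filters that can cope with feature interactions.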
Penalized feature selection and classification in bioinformatics
This paper provides a review of several recently developed penalized feature selection and classification techniques for bioinformatics studies with high-dimensional input variables. The authors discuss classification objective functions, penalty functions, and computational algorithms for these embedded feature selection methods. The goal is to make researchers aware of these applicable techniques for high-dimensional bioinformatics data, which can help avoid overfitting, generate more reliable classifiers, and provide insights into underlying causal relationships.
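A canonical example of the embedded, penalized approach the paper reviews is L1-penalized (lasso) regression, where the penalty drives irrelevant coefficients exactly to zero, so selection happens during model fitting. The coordinate-descent sketch below is my own minimal illustration under standard assumptions (standardized features, squared-error loss), not code from the paper.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """L1-penalized least squares via coordinate descent (a sketch).

    Minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1. The soft-threshold
    update sets small coefficients exactly to zero, embedding feature
    selection in the fit.
    """
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ b + X[:, j] * b[j]      # partial residual excluding j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

# Synthetic check: only feature 0 carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
b = lasso_cd(X, y, lam=0.1)
print(b)  # b[0] large, remaining coefficients shrunk to (near) zero
```

The sparsity pattern of `b` is the selected feature set; tuning `lam` trades off between sparsity and fit, which connects to the paper's discussion of avoiding overfitting in high dimensions.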
Feature Engineering
This paper provides an overview of feature engineering, an essential step in the machine learning process. Feature engineering transforms raw data into meaningful features that improve the performance of machine learning models, in terms of both accuracy and interpretability. The paper discusses univariate and multivariate feature engineering techniques, including transformations, dimensionality reduction methods, and representation learning approaches, and covers feature engineering for structured, time series, and unstructured data. Because the success of machine learning often depends heavily on feature engineering, and there is no single "gold standard" set of techniques, it is important to experiment with different approaches.
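A few of the univariate transformations the paper discusses can be sketched on a toy structured dataset (the values and column names below are made up for illustration, not taken from the paper):

```python
import numpy as np

# Raw inputs: a right-skewed numeric feature and a categorical feature.
income = np.array([20_000.0, 35_000.0, 52_000.0, 410_000.0])
city = ["paris", "lyon", "paris", "nice"]

# Univariate transformation: log1p compresses the heavy right tail.
log_income = np.log1p(income)

# Standardization: zero mean, unit variance.
z = (log_income - log_income.mean()) / log_income.std()

# One-hot encoding turns the categorical feature into numeric columns.
categories = sorted(set(city))
one_hot = np.array([[1.0 if c == cat else 0.0 for cat in categories]
                    for c in city])

# Final design matrix: 1 standardized numeric column + 3 one-hot columns.
features = np.column_stack([z, one_hot])
print(features.shape)  # (4, 4)
```

Which transformations help is model- and data-dependent, which is the paper's point about experimenting rather than relying on a single "gold standard" recipe.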
