Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy
Every once in a while you read a paper with a title that leans more towards an abstract because it is so long and descriptive. That is this paper.
mRMR is a feature selection technique.
Feature selection is an important problem for pattern classification systems.
We study how to select good features according to the maximal statistical dependency criterion based on mutual information.
This is a fairly long paper, and I will only scratch the surface of it, but this is its essence. Maximal statistical dependency means selecting features that you can statistically depend on (see what I did there) to inform you about the response variable. Suppose you have 1000 tabular features. How could you possibly search the space to find the top 10 features that not only contribute to the response variable but are also not co-dependent on one another? mRMR is the answer.
we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR)
- minimal redundancy because you want to spread out the types of information you select.
- maximal relevance because you want features that carry information about the response variable.
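To make the two terms concrete, here is a minimal sketch that scores a candidate feature subset on both. It assumes discrete features and a simple count-based (plug-in) estimate of mutual information; the function names `mutual_information`, `relevance`, and `redundancy` are my own, not from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def relevance(features, target):
    """Mean mutual information between each selected feature and the target."""
    return sum(mutual_information(f, target) for f in features) / len(features)


def redundancy(features):
    """Mean mutual information over all ordered feature pairs (self-pairs included)."""
    k = len(features)
    return sum(mutual_information(f, g) for f in features for g in features) / (k * k)


# Toy data: the target is determined by two independent binary features.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 1, 0, 1, 0, 1, 0, 1]
a_copy = list(a)  # an exact duplicate of feature a
target = [2 * x + y for x, y in zip(a, b)]

print(relevance([a, a_copy], target), redundancy([a, a_copy]))  # 1.0 1.0
print(relevance([a, b], target), redundancy([a, b]))            # 1.0 0.5
```

Both subsets are equally relevant on average, but the subset containing a duplicate is twice as redundant, which is exactly the kind of wasted selection the minimal-redundancy term is meant to penalize.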
In feature selection, it has been recognized that the combinations of individually good features do not necessarily lead to good classification performance.
This motivates the paper's investigation of not only maximal relevance, but also minimal redundancy.
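A toy comparison shows why relevance alone can mislead. The sketch below is my own illustration, not the paper's code: it assumes discrete features, a count-based mutual information estimate, and a greedy search that scores each candidate by its relevance minus its mean mutual information with the features already chosen. The names `top_k_by_relevance` and `greedy_mrmr` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def top_k_by_relevance(features, target, k):
    """Baseline: rank features by I(feature; target) alone."""
    order = sorted(range(len(features)),
                   key=lambda i: -mutual_information(features[i], target))
    return order[:k]


def greedy_mrmr(features, target, k):
    """Greedy selection: relevance minus mean MI with already-chosen features."""
    relevance = [mutual_information(f, target) for f in features]
    # Start with the single most relevant feature (assumes k >= 1).
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < k:
        candidates = [i for i in range(len(features)) if i not in selected]

        def score(i):
            redundancy = sum(mutual_information(features[i], features[j])
                             for j in selected) / len(selected)
            return relevance[i] - redundancy

        selected.append(max(candidates, key=score))
    return selected


# Toy data: the class is determined by a four-level factor and a binary factor.
factor_a = [0, 0, 1, 1, 2, 2, 3, 3]
factor_b = [0, 1, 0, 1, 0, 1, 0, 1]
target = [2 * a + b for a, b in zip(factor_a, factor_b)]
features = [factor_a, list(factor_a), factor_b]  # feature 1 duplicates feature 0

print(top_k_by_relevance(features, target, 2))  # [0, 1]
print(greedy_mrmr(features, target, 2))         # [0, 2]
```

Ranking by relevance picks the four-level factor and its duplicate, which together say nothing about the binary factor. The redundancy penalty steers the second pick to the weaker but complementary feature, and only that pair fully determines the class.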
Note that mRMR as a framework is only a heuristic, and there is a balancing act in deciding how much weight to give relevance versus redundancy. The paper covers:
- a theoretical analysis of mRMR
- combining mRMR with other feature selection techniques
- comprehensive experiments comparing different feature selection techniques.
mRMR Balancing
The mRMR criterion balances the two goals by maximizing relevance to the response variable while keeping the redundancy among the selected features to a minimum.
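For reference, here is the trade-off written out as I understand it from the paper. The notation is my transcription and worth checking against the original: S is the selected feature set, c is the class (response) variable, and I(·;·) is mutual information. Relevance D and redundancy R are averages over the selected set, and the criterion maximizes their difference:

$$
D(S, c) = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c), \qquad
R(S) = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j)
$$

$$
\max_{S} \; \Phi(D, R), \qquad \Phi = D - R
$$

Since searching over all subsets is impractical, features are added one at a time. Given the set S_{m-1} of m-1 features already chosen from the full feature set X, the m-th feature is the one that maximizes

$$
\max_{x_j \in X - S_{m-1}} \left[ I(x_j; c) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I(x_j; x_i) \right]
$$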
Conclusion
The gist of the paper is a thorough analysis of mRMR and the best ways to use it. I didn't cover many of the details, so check out the paper if you are interested. I have been primarily concerned with understanding mRMR and building some intuition for its uses.