Feature selection methods have been widely studied for data classification because redundant and irrelevant features slow overall system performance and make wrong decisions more likely on large data sets. Several methods have been proposed to solve the feature selection problem for classification, but most are tailored to a particular data set. This paper therefore proposes a general approach that addresses feature selection across a wide range of data sets. The proposed algorithm analytically chooses the optimal features for classification by combining mutual information (MI) and linear correlation coefficients (LCC), thereby accounting for both linearly and nonlinearly dependent features. The algorithm builds a compact feature subset for classification, effectively discarding irrelevant features. Its performance is evaluated on three data sets with classifiers that require a rich feature set to achieve high accuracy at low computational cost. Features were selected at a significance threshold of p < 0.05, yielding 7, 5, and 6 features from the mobile, heart, and diabetes data sets, respectively. Accuracy varies with the classifier; for example, the Nearest_Neighbors classifier achieved accuracies of 0.92225, 0.88333, and 0.86250 on the mobile, heart, and diabetes data sets, respectively. Evaluation on several real-world data sets shows the proposed model to be adequate.
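The abstract does not give the algorithm's details, but the core idea of ranking features by both MI (nonlinear dependence) and LCC (linear dependence) can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a histogram-based MI estimate, the absolute Pearson correlation as the LCC, and a simple normalized sum of the two scores; all function names and the weighting scheme are illustrative.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of MI between a continuous feature x and a discrete label y."""
    edges = np.histogram_bin_edges(x, bins=bins)
    x_binned = np.digitize(x, edges[1:-1])  # bin indices 0..bins-1
    joint, _, _ = np.histogram2d(x_binned, y, bins=(bins, len(np.unique(y))))
    pxy = joint / joint.sum()               # joint distribution p(x, y)
    px = pxy.sum(axis=1, keepdims=True)     # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = pxy > 0                            # avoid log(0) on empty cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def select_features(X, y, k):
    """Rank features by normalized MI + |Pearson r| and return the top-k column indices."""
    n_features = X.shape[1]
    mi = np.array([mutual_information(X[:, j], y) for j in range(n_features)])
    lcc = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    # Normalize each criterion to [0, 1] before summing so neither dominates.
    score = mi / max(mi.max(), 1e-12) + lcc / max(lcc.max(), 1e-12)
    return np.argsort(score)[::-1][:k]

# Usage on synthetic data: only column 2 is informative about the label.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
X = rng.normal(size=(2000, 4))
X[:, 2] += 2.0 * y                          # shift class means on feature 2
selected = select_features(X, y, k=2)
```

On such data the informative column ranks first because it scores high on both criteria; a feature that is nonlinearly related to the label (e.g. quadratically) would still be caught by the MI term even when its Pearson correlation is near zero, which is the stated motivation for combining the two measures.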
All Science Journal Classification (ASJC) codes
- Media Technology
- Hardware and Architecture
- Computer Networks and Communications