Imagine working with a dataset that has hundreds of features. Intuitively, you can see how hard it would be to visualize such a dataset or to train a model on it: every additional feature adds another dimension, and high-dimensional data is difficult to plot and reason about for a more intuitive understanding of the problem.
Overfitting is another issue related to high dimensionality. Many features are correlated with one another in some fashion, so a large share of them is redundant. For example, if you have to predict the weather and both rainfall and humidity are features, you can see that the two are correlated. To avoid overfitting, you need to reduce the number of features for the sake of better prediction accuracy.
This is where dimensionality reduction techniques come into play. Dimensionality reduction simply means reducing the dimension of your feature set: finding a small set of the most impactful features among a large number of features. With this small set of principal features, you can run your prediction algorithms more easily and with better accuracy.
With fewer dimensions, the space required to store the data is reduced
Training is also faster with lower dimensions
Some algorithms, such as decision trees and SVMs, do not perform well in high dimensions, so fewer dimensions can mean better accuracy with these models
It removes the multicollinearity problem that arises from highly correlated features in the dataset
It makes the data easier to visualize: a plot is more intuitive in 2D than in 3D
There are two different dimensionality reduction techniques:
Feature Selection Methods
Feature Extraction Methods
Feature selection methods use the statistical relationship between the input variables and the output variable. They mainly look at the correlations among the features and try to select the most independent features, those with no collinearity, keeping the features with the highest importance.
Some of the common feature selection techniques are-
Filter Methods: In this approach, each feature is ranked according to some univariate metric, and the highest-ranking features are selected.
Some of the common filtering methods are-
information gain or mutual information
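As a sketch of the filter approach, the snippet below ranks features by their mutual information with the target and keeps the top two. It uses scikit-learn's `SelectKBest` and the classic Iris dataset purely for illustration; the choice of `k=2` is an arbitrary assumption.

```python
# Filter method sketch: rank each feature by a univariate metric
# (mutual information with the target) and keep the k highest-ranking ones.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest mutual information scores
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape, X_reduced.shape)  # 4 original features -> 2 selected
```

Because the metric is univariate, each feature is scored on its own, which makes filter methods fast but blind to interactions between features.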
Wrapper Methods: Wrapper methods select features based on the performance of a specific classifier. Using a greedy search, they evaluate candidate subsets of features against the evaluation criterion.
Some of the popular wrapper methods are-
recursive feature elimination
sequential feature selection algorithms
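The recursive feature elimination variant can be sketched as follows: a classifier is fitted repeatedly, and the weakest feature is dropped each round until the desired number remains. This uses scikit-learn's `RFE` with logistic regression as the wrapped estimator; both the estimator and `n_features_to_select=2` are illustrative choices.

```python
# Wrapper method sketch: recursive feature elimination (RFE) greedily
# removes the least important feature on each iteration, re-fitting the
# wrapped classifier every time.
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

rfe = RFE(estimator=LogisticRegression(max_iter=1000),
          n_features_to_select=2)
X_reduced = rfe.fit_transform(X, y)

print(rfe.support_)  # boolean mask marking the selected features
```

Since the classifier is re-trained for every candidate subset, wrapper methods are usually more accurate but far more expensive than filter methods.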
Embedded Methods: These methods perform feature selection during model training, which is why they are called embedded methods: the model carries out feature selection and training at the same time.
Some of the popular methods are-
L1 (LASSO) regularization
decision tree based feature selection
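A minimal sketch of the L1 (LASSO) approach: the regularization term drives the coefficients of uninformative features to exactly zero while the model trains, and `SelectFromModel` then keeps only the features with non-zero coefficients. The synthetic dataset and `alpha=1.0` below are assumptions for illustration.

```python
# Embedded method sketch: L1 regularization performs feature selection
# as a side effect of training the model itself.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which are actually informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=0.1, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)

# Keep only the features whose LASSO coefficient is non-zero
selector = SelectFromModel(lasso, prefit=True)
X_reduced = selector.transform(X)

print(X.shape, X_reduced.shape)  # most uninformative columns dropped
```

Unlike filter and wrapper methods, no separate selection pass is needed here; the trained model's coefficients are the selection criterion.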
Feature extraction tries to reduce the number of features by creating new features from the existing ones and then discarding the originals. The newly built set of features contains most of the crucial information in the dataset.
Some of the popular feature extraction methods are-
Quadratic Discriminant Analysis (QDA)
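To illustrate the feature-extraction idea, here is a sketch using Principal Component Analysis (PCA), one of the most common extraction methods (not listed above): it builds new features as linear combinations of the originals, keeping the directions of highest variance. The Iris dataset and `n_components=2` are illustrative choices.

```python
# Feature extraction sketch: PCA replaces the original features with a
# smaller set of new features (principal components) that capture most
# of the variance in the data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_new = pca.fit_transform(X)  # 4 original features -> 2 new components

print(X_new.shape)
print(pca.explained_variance_ratio_)  # variance captured per component
```

Note that the two new columns are not any of the original features; they are newly constructed combinations, which is what distinguishes feature extraction from feature selection.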