While working with different Machine Learning techniques for data analysis, we often deal with hundreds or thousands of variables, many of which are correlated with one another. In such cases, fitting a model to the dataset results in poor accuracy. Principal Component Analysis and Factor Analysis are techniques used to deal with such scenarios.
- What is Principal Component Analysis?
- Objectives
- Assumptions
- When to use PCA?
- How does PCA algorithm work?
- Steps of dimensionality reduction
- Applications of PCA
What is Principal Component Analysis (PCA)?
Principal Component Analysis (PCA) is an unsupervised statistical technique for dimensionality reduction. It reduces a set of variables that are correlated with each other into a smaller number of uncorrelated variables without losing the essence of the original data. It also provides an overview of the linear relationships among the input variables.
Objectives
- PCA helps in dimensionality reduction by converting a set of correlated variables into uncorrelated variables.
- It finds a sequence of linear combinations of the variables.
- PCA also serves as a tool for better visualization of high-dimensional data. We can create a heat map of the loadings to show how each variable correlates with each component.
- It is often used to deal with multicollinearity before a model is developed.
- It lets the data tell its own story by exposing the structure underlying the variables.
- The resulting components are useful in data interpretation and variable selection.
The aim of PCA is that the resulting factors should be uncorrelated with one another.
Assumptions of PCA
- Independent variables are highly correlated with each other.
- Variables are measured at the metric level (nominal variables need to be coded as numeric first).
- The data has an underlying low-dimensional structure, so that a few components can capture most of the variance.
- Independent variables are numeric in nature.
- Bartlett's test: the test should be statistically significant, so that we reject H0 in favour of H1, where:
H0: Variables are uncorrelated.
H1: Variables are correlated.
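To check this assumption in R, we can run Bartlett's test of sphericity from the psych package. Below is a quick sketch using R's built-in mtcars data purely as a stand-in for your own dataset.

```r
library(psych)

# Bartlett's test of sphericity on a correlation matrix.
# mtcars is only a stand-in; replace it with your own numeric data frame.
cortest.bartlett(cor(mtcars), n = nrow(mtcars))
# A p-value below 0.05 rejects H0 (variables uncorrelated),
# so the data is suitable for PCA / factor analysis.
```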
When to use PCA?
- Whenever we want to ensure that the variables in the data are independent of each other.
- When we want to reduce the number of variables in a dataset that has many of them.
- When we want to interpret the data and perform variable selection on it.
How does PCA Algorithm work?
Suppose we have two variables, X1 and X2, with data spread across the plane much as in regression, where a line of best fit is determined.
PCA shifts the origin to the centre of the data and takes the direction along which the variation in X1 and X2 is maximum as the new component PC1. The second component, PC2, is perpendicular to PC1 (in general, in multidimensional space), so PC1 and PC2 are not correlated with each other.
The objective, then, is to find the directions in the data along which such perpendicular components can be constructed; this is exactly what PCA does. If the dataset has many variables, the process repeats and can be written as a matrix multiplication on the N x M data matrix.
PCA then expresses the new components as linear combinations of the original variables:
PC1 = a1*X1 + a2*X2
PC2 = b1*X1 + b2*X2
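As a minimal sketch of this idea in R (using simulated data, since the article's election dataset is not provided), prcomp() returns exactly these coefficients, and the resulting components come out uncorrelated:

```r
# Two correlated variables, simulated for illustration.
set.seed(42)
X1 <- rnorm(100)
X2 <- 0.8 * X1 + rnorm(100, sd = 0.3)

pca <- prcomp(cbind(X1, X2), center = TRUE, scale. = TRUE)
pca$rotation          # columns hold the coefficients (a1, a2) and (b1, b2)
round(cor(pca$x), 3)  # PC1 and PC2 are uncorrelated (off-diagonal ~ 0)
```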
For example, suppose I want to study the variables affecting winning candidates in past elections in India: candidate education, age, criminal cases, assets, liabilities, religion, area population, educational institutes, police stations, and so on. There is bound to be a lot of correlation among these variables. Based on that correlation, I can collapse them into underlying dimensions: the analysis will obtain the first factor with the highest variance reduction, then the second factor with the next highest reduction, then the third, and so on.
Let’s walk through the steps of dimensionality reduction.
Step 1
Read the data file into the tool and prepare the dataset: compute or impute missing values, carry out outlier analysis, confirm the data is numeric, and standardize it.
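A sketch of this step in R is shown below; "election.csv" is a hypothetical file name standing in for the article's election dataset.

```r
# Read the data and prepare it for PCA.
data <- read.csv("election.csv")        # hypothetical file name
data <- na.omit(data)                   # drop (or impute) missing values
num  <- data[sapply(data, is.numeric)]  # keep numeric variables only
scaled <- as.data.frame(scale(num))     # standardize: mean 0, sd 1
summary(scaled)                         # sanity-check the standardized data
```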
Step 2
Find out what kind of correlation exists among the variables.
Construct the covariance (or, for standardized data, correlation) matrix of the data.
The correlation matrix will show that a few variables are slightly correlated, a few are highly correlated, and a few are not correlated at all.
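Continuing the sketch from Step 1, the correlation matrix can be computed and inspected as follows (for standardized data the correlation and covariance matrices coincide):

```r
# Correlation matrix of the standardized data from Step 1.
R <- cor(scaled)
round(R, 2)              # look for slightly vs. highly correlated pairs
heatmap(R, symm = TRUE)  # optional heat-map view of the correlations
```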
Step 3
How many factors need to be identified?
The answer lies in the eigenvalues: they are the basis for selecting the number of factors.
How do we compute the eigenvalues?
Decomposing the correlation matrix in PCA yields the eigenvalues and eigenvectors.
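In R, eigen() performs this decomposition of the correlation matrix computed in Step 2:

```r
# Eigendecomposition of the correlation matrix R.
e <- eigen(R)
e$values        # eigenvalues: variance explained by each component
e$vectors[, 1]  # coefficients (eigenvector) of the first component
sum(e$values)   # equals the number of variables: total standardized variance
```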
Eigenvectors
Eigenvectors are lists of coefficients that show how much each input variable contributes to each new derived variable. Scaling an eigenvector by the square root of its eigenvalue gives the factor loadings, and squaring and summing the loadings of a component returns its eigenvalue.
Taking the first eigenvector as an example:
(0.0027891)^2 + (0.241268)^2 + (0.025736)^2 + (-0.116926)^2 + (-0.161015)^2 + (0.308333)^2 + (0.281047)^2 + (0.274605)^2 + (-0.370362)^2 + (0.370362)^2 + (0.414049)^2 + (0.339159)^2 + (0.269873)^2 + (0.026549)^2 + (-0.033950)^2 + (0.128716)^2 = 1
The squared coefficients of the unit eigenvector sum to 1; scaling each coefficient by the square root of 3.506 turns it into a loading, and the squared loadings then sum to the first eigenvalue, 3.506e+00.
Performing this decomposition on the full correlation matrix identifies the eigenvalues below.
Eigenvalues:
[1] 3.506e+00 2.535e+00 1.778e+00 1.390e+00 1.077e+00 1.019e+00 9.941e-01 7.846e-01 6.767e-01 6.278e-01 4.997e-01 4.438e-01 3.505e-01 3.172e-01 5.902e-07 -3.331e-16
(The tiny negative value at the end is zero up to floating-point error.)
Eigenvalue
An eigenvalue represents the amount of variance explained by each PC, so the first eigenvalue represents the largest variance reduction. The sum of all eigenvalues equals the sum of the variances of all the input variables; this is how PCA summarizes the total variance.
Variance Summarized
The first eigenvalue marks where the maximum variance reduction takes place, the second eigenvalue the next largest reduction, and so on.
Now, to identify how many PCs to take into consideration, we use a scree plot.
Plot the eigenvalues of all the variables in the data as a scree plot.
Scree Plot
Plot the factors on the x-axis and the eigenvalues on the y-axis, as shown in the figure below.
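A scree plot can be drawn from the eigenvalues computed in Step 3, for example:

```r
# Scree plot: factor number on x, eigenvalue on y.
plot(e$values, type = "b",
     xlab = "Factors", ylab = "Eigenvalue", main = "Scree Plot")
abline(h = 1, lty = 2)  # dashed line at 1 marks the Kaiser-Guttmann cutoff
```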
Factors can be extracted based on two methods: the Kaiser-Guttmann rule and the elbow method.
Elbow Method
According to the elbow method, we look at where an elbow forms in the scree plot and pick that value. In the scree plot below, an elbow appears both at value 3 and at value 5, which makes it confusing to pick one value.
This is the disadvantage of the elbow method: a scree plot can contain multiple elbows, making it hard to settle on a single eigenvalue.
Kaiser Guttmann rule
According to the Kaiser rule, eigenvalues less than 1 should be omitted from the scree plot, and only values greater than 1 are retained. This is the most widely used rule for choosing the number of factors from a scree plot.
As per the scree plot below, 5 dimensions are considered, since the eigenvalues beyond them fall below 1.
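In code, the Kaiser-Guttmann rule is a one-liner over the eigenvalues from Step 3:

```r
# Kaiser-Guttmann rule: retain components with eigenvalue > 1.
n_factors <- sum(e$values > 1)
n_factors  # number of factors to retain (the article proceeds with 5)
```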
Step 4
Unrotated PCA – Now run the 'principal' command. It performs principal component factor analysis and extracts the uncorrelated PCs; by setting rotate = 'none' we examine the correlations without applying any rotation.
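The 'principal' command referred to here comes from R's psych package. A sketch of the call, assuming the standardized data from Step 1 and 5 retained factors:

```r
library(psych)

# Unrotated principal component factor analysis with 5 factors.
pc_unrot <- principal(scaled, nfactors = 5, rotate = "none")
pc_unrot$loadings     # correlations between each input variable and each PC
pc_unrot$communality  # variance of each variable explained by the 5 PCs
```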
As the unrotated principal component analysis figure above shows, the correlations lie between -1 and +1.
Factor loadings – Loadings measure the correlation between each input variable and the factors.
Factor loadings can be positive or negative. They are interpreted like correlation coefficients, ranging between -1 and +1; the closer a value is to 1, irrespective of sign, the stronger the correlation between the input variable and the factor.
For example, in the unrotated PCA figure above, the loadings of +0.94 and -0.938 on factor PC2 are both strong, irrespective of their signs.
Data Interpretation in PCA
- For interpretation, the loading values should be greater than 0.5.
- Loadings are interpreted like correlation coefficients, ranging between -1 and +1.
- Strongly correlated values lie closer to +1 or -1.
Interpreting Unrotated PCA
The variance in Education is 24%, meaning that Education is explained by the 5 factors PC1, PC2, PC3, PC4, and PC5 together to the extent of 24%. This shared variance captured for each variable is called communality.
What is Communality?
The proportion of common or shared variance present in a variable is called communality. In other words, it measures the percentage of variance in the original variables captured by the factor equations.
Similarly, the variance in Candidate Age is 38%, meaning Candidate Age is explained by the 5 factors PC1 through PC5 to the extent of 38%. The same reading applies to the other variables.
Looking at these observations, it is difficult to interpret the unrotated factors in a meaningful manner, so we apply varimax (orthogonal) rotation to the principal components.
Step 5
Apply orthogonal rotation to interpret the factors in a more meaningful manner.
Orthogonal Rotation
As the orthogonal rotation figure below shows, the purpose is to push all the lower loadings of PC1, PC2, PC3, PC4, and PC5 closer to zero and all the higher loadings closer to 1 by applying varimax (orthogonal) rotation.
The factors remain independent and the uniqueness of each variable stays the same, while the factor loadings become more clearly differentiated and therefore easier to interpret.
The constraint is that the communalities must remain the same after the orthogonal rotation is applied.
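With the psych package, the rotation is applied by changing the rotate argument, and we can verify that the communalities are unchanged:

```r
# Varimax (orthogonal) rotation of the same 5 components.
pc_rot <- principal(scaled, nfactors = 5, rotate = "varimax")
pc_rot$loadings  # low loadings pushed toward 0, high loadings toward 1

# Communalities are invariant under orthogonal rotation:
round(pc_rot$communality - pc_unrot$communality, 10)  # all ~ 0
```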
Fig. Rotated PCA
According to the rotated principal component analysis figure above, in rotated factor R1, Population Size, Police Stations, Educational Institutes, and Industry Size are correlated with each other.
In rotated factor R2, Rural % and Urban % are correlated with each other.
In rotated factor R3, Literacy Rate, and so on.
We have now reduced the 16 variables to 5 dimensions.
Step 6
Drop unimportant features by calculating factor scores.
Factor scores are computed from the factor equations, using the regression method or other methods.
Factor scores impute a value for each observation based on the factor loadings and inputs. In our election case study there are 5 factor scores, and they help in screening winning candidates.
We generate the scores and determine which factor is the most important, which is the second most important, and so on.
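With psych's principal(), the scores are computed by the regression method by default and stored on the fitted object:

```r
# Factor scores for every observation (regression method by default).
scores <- pc_rot$scores
head(scores)  # one column per retained factor
# These scores can be used to rank observations (here, candidates)
# on each underlying dimension.
```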
Some of the Applications of Principal Component Analysis (PCA)
- Principal Component Analysis can be used in Image compression. Image can be resized as per the requirement and patterns can be determined.
- Principal Component Analysis helps in customer profiling based on demographics as well as purchasing behaviour.
- PCA is a technique that is widely used by researchers in the food science field.
- It can also be used in the Banking field in many areas like applicants applied for loans, credit cards, etc.
- Customer Perception towards brands.
- It can also be used in the finance field to analyze stocks quantitatively, forecast portfolio returns, and model interest rates.
- PCA is also applied in Healthcare industries in multiple areas like patient insurance data where there are multiple sources of data and with a huge number of variables that are correlated to each other. Sources are like hospitals, pharmacies, etc.
Principal Component Analysis (PCA) performs well in identifying the factors that influence results in individual areas, and in our example it surfaces the correlated factors associated with a candidate's win or loss. Beyond elections, the PCA technique is used in many applications across different industries and fields.
If you found this helpful and wish to learn more, upskill with Great Learning’s Artificial Intelligence and Machine Learning course today!