SVD and PCA Explained with a Real-Life Example

Imagine you’re working with a dataset that has many features (columns), but some of those features are redundant or less useful. You want to compress the data while keeping the important patterns. That’s where PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) help.

Real-Life Analogy: Finding the Best View

Imagine you're holding a 3D object (like a toy or sculpture). You want to take a photo of it that best shows its shape and structure. You try different camera angles. Finally, you find one where the object looks most "spread out" — revealing the most about it. That perfect camera angle? That’s what PCA finds in your data — the direction(s) that show the most variation or "spread".

Let’s Start with Data (Toy Example)

Suppose we have 5 students and their scores in Math and Physics:

Student Math Physics
A 90 86
B 72 75
C 84 80
D 60 65
E 96 90

Notice: Math and Physics scores are strongly correlated; students who score well in Math also tend to score well in Physics.
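To put a number on that correlation, here is a minimal sketch using NumPy. The array name `scores` and the row order (students A to E) are just illustrative choices, not part of the original example.

```python
import numpy as np

# Scores from the table above: columns are Math and Physics, rows are students A-E.
scores = np.array([
    [90, 86],
    [72, 75],
    [84, 80],
    [60, 65],
    [96, 90],
], dtype=float)

# Pearson correlation between the Math column and the Physics column.
corr = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
print(f"Math/Physics correlation: {corr:.3f}")
```

A value close to 1 means the two columns carry almost the same information, which is exactly the redundancy PCA exploits.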

Problem:

  • We have 2 features (Math and Physics).

  • But they are largely redundant, since they rise and fall together.

  • Can we represent this data using just one feature that captures most of the info?


Step-by-Step: PCA with SVD (Conceptually)

Step 1: Center the Data

We subtract the mean from each column:

$$\tilde{X} = X - \text{mean}$$

The column means are 80.4 for Math and 79.2 for Physics, so the centered scores are:

Student Math (centered) Physics (centered)
A 9.6 6.8
B -8.4 -4.2
C 3.6 0.8
D -20.4 -14.2
E 15.6 10.8

Now the average of each column is 0.
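The centering step can be sketched in a few lines of NumPy. The names `scores`, `col_means`, and `X_centered` are illustrative assumptions, not something the example prescribes.

```python
import numpy as np

# Columns are Math and Physics, rows are students A-E.
scores = np.array([
    [90, 86],
    [72, 75],
    [84, 80],
    [60, 65],
    [96, 90],
], dtype=float)

# Mean of each column (axis=0 averages down the rows).
col_means = scores.mean(axis=0)

# Broadcasting subtracts the matching column mean from every row.
X_centered = scores - col_means

print("column means:", col_means)
print("centered data:\n", X_centered)
print("means after centering:", X_centered.mean(axis=0))
```

The last line should print values that are zero up to floating-point rounding, which is a quick sanity check that the centering worked.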

Step 2: Apply SVD

We apply SVD on the centered data matrix $\tilde{X}$:

$$\tilde{X} = U \Sigma V^T$$

This gives us:

  • $V$: the directions (principal components),

  • $\Sigma$: the importance of each direction (singular values),

  • $U \Sigma$: the new coordinates of the data in the rotated system.

In our case, the first column of $V$ gives the first principal component direction: the line that best captures the variation.
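Here is a hedged sketch of the decomposition using `numpy.linalg.svd`. Note that NumPy returns $V^T$ (called `Vt` below), so the principal directions are its rows. The variable names are illustrative, and the sign flip is only a display convention, since SVD directions are defined up to sign.

```python
import numpy as np

scores = np.array([
    [90, 86],
    [72, 75],
    [84, 80],
    [60, 65],
    [96, 90],
], dtype=float)
X_centered = scores - scores.mean(axis=0)

# Thin SVD: U is 5x2, S holds the 2 singular values, Vt is 2x2 (this is V transposed).
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# First principal direction = first row of Vt (first column of V).
pc1 = Vt[0]
if pc1.sum() < 0:
    # Flip the sign so the direction points toward higher scores (a convention only).
    pc1 = -pc1

# Coordinates of each student in the rotated system: U * Sigma.
coords = U * S

print("singular values:", S)
print("first principal direction:", pc1)
print("rotated coordinates:\n", coords)
```

Multiplying the three factors back together, `(U * S) @ Vt`, recovers the centered data, and `coords` matches `X_centered @ Vt.T`.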



Step 3: Choose Top Components

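Usually, only the first few singular values are large, so we keep the top one or two components that explain most of the variance. The share of variance explained by each component is its squared singular value divided by the sum of all squared singular values. In this example, the first principal component captures over 99% of the variance, so we can reduce the 2D data to 1D with very little loss.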
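As a rough check of how much each component matters, the explained-variance share can be computed from the squared singular values. This is a minimal sketch; the name `explained` is an illustrative choice.

```python
import numpy as np

scores = np.array([
    [90, 86],
    [72, 75],
    [84, 80],
    [60, 65],
    [96, 90],
], dtype=float)
X_centered = scores - scores.mean(axis=0)

# compute_uv=False returns only the singular values, largest first.
S = np.linalg.svd(X_centered, compute_uv=False)

# Variance along each direction is proportional to the squared singular value.
explained = S**2 / np.sum(S**2)

for i, share in enumerate(explained, start=1):
    print(f"component {i}: {share:.1%} of the variance")
```

If the first share dominates, as it does for this toy data, dropping the remaining components loses very little.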

Step 4: Project the Data

Now we project each student’s scores onto this new line:

$$Z = \tilde{X} \cdot V$$

So instead of saying:

"Student A scored 90 in Math and 86 in Physics",
we now say:
"Student A has a PCA score of about 11.8".

That’s a single number representing both subjects — simpler but still meaningful!
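The projection onto the first component, and the approximate reconstruction back to two scores, might look like the sketch below. The names `z` and `approx` are illustrative, and the sign flip is only a convention so that higher scores give a higher PCA score.

```python
import numpy as np

students = ["A", "B", "C", "D", "E"]
scores = np.array([
    [90, 86],
    [72, 75],
    [84, 80],
    [60, 65],
    [96, 90],
], dtype=float)
col_means = scores.mean(axis=0)
X_centered = scores - col_means

_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
pc1 = Vt[0]
if pc1.sum() < 0:
    pc1 = -pc1  # sign convention only; the direction itself is unchanged

# One number per student: the coordinate along the first principal direction.
z = X_centered @ pc1

# Map the 1D scores back to 2D to see how much information was kept.
approx = np.outer(z, pc1) + col_means

for name, score, row in zip(students, z, approx):
    print(f"{name}: PCA score {score:6.2f} -> approx Math {row[0]:.1f}, Physics {row[1]:.1f}")
```

Comparing `approx` with the original table gives a feel for the compression: each reconstructed score lands within a couple of points of the real one.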

