Understanding Cost Functions
In machine learning — especially in regression problems — we build models to predict values. But how do we know if our model is doing well or poorly? The answer lies in the cost function, which tells us how far our predictions are from reality.
In this post, we will walk through:
- What is a cost function?
- Different types of error measurements
- Why we needed new ones
- Which one is better in which case
What is a Cost Function?
A cost function is a formula used to measure how wrong the predictions of a model are.
It compares the predicted values ($\hat{y}$) to the actual values ($y$) from the dataset.
The goal of training a model is to minimize the cost function — i.e., make predictions as close to actual results as possible.
Step-by-Step Evolution of Error Metrics
Let’s now go step-by-step through the order in which error/cost functions evolved, and why.
Error
Formula: $e_i = \hat{y}_i - y_i$
This is the raw difference between prediction and actual.
Problem: Some errors are positive, some negative. If you add them, they cancel out!
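A tiny sketch of the cancellation problem (the numbers here are made up for illustration):

```python
# Signed errors can cancel each other out.
actual = [10, 20, 30]
predicted = [15, 20, 25]  # one prediction 5 too high, one 5 too low

errors = [p - a for p, a in zip(predicted, actual)]  # [5, 0, -5]
total_error = sum(errors)
print(total_error)  # 0 -- the model looks "perfect" despite two wrong predictions
```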
Absolute Error (AE)
Formula: $|\hat{y}_i - y_i|$
By taking the absolute value, we remove the cancellation problem.
But we’re still dealing with individual errors.
Mean Absolute Error (MAE)
Formula: $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i|$
Now, we average all the absolute errors to get an overall measure of error.
Pros: Easy to understand, doesn't penalize big errors too harshly
Cons: Not smooth mathematically (harder to use with gradient descent because the absolute value is non-differentiable at zero)
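MAE is simple to compute by hand. A minimal sketch (the function name and sample numbers are our own, reusing the toy data from above):

```python
def mae(actual, predicted):
    """Mean Absolute Error: the average of |prediction - actual|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# The +5 / -5 errors no longer cancel:
print(mae([10, 20, 30], [15, 20, 25]))  # (5 + 0 + 5) / 3 = 3.333...
```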
Squared Error (SE)
Formula: $(\hat{y}_i - y_i)^2$
We square the error to remove negatives and highlight bigger mistakes more.
E.g., error of 2 becomes 4, but error of 5 becomes 25.
Mean Squared Error (MSE)
Formula: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2$
This is one of the most commonly used cost functions.
Pros:
- Always positive
- Smooth and differentiable (perfect for gradient descent)
- Penalizes larger errors heavily

Cons:
- Sensitive to outliers (because of squaring)
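A quick sketch showing how squaring amplifies larger misses (toy numbers, our own helper name):

```python
def mse(actual, predicted):
    """Mean Squared Error: the average of (prediction - actual)^2."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

# A miss of 2 contributes 4, but a miss of 5 contributes 25:
print(mse([10, 20, 30], [12, 20, 30]))  # 2^2 / 3 = 1.333...
print(mse([10, 20, 30], [15, 20, 30]))  # 5^2 / 3 = 8.333...
```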
Root Mean Squared Error (RMSE)
Formula: $\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$
Just like MSE, but takes the square root so the final value is in the same unit as the target variable.
Pros:
- Easy to interpret (same unit as $y$)
- Keeps the benefits of MSE

Cons:
- Still sensitive to outliers
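Taking the square root brings the error back into the units of $y$. A minimal sketch (function name and sample data are our own):

```python
import math

def rmse(actual, predicted):
    """Root Mean Squared Error: sqrt of MSE, in the same units as y."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# If y is measured in dollars, this result is also in dollars:
print(rmse([10, 20, 30], [15, 20, 25]))  # sqrt(50 / 3) = 4.08...
```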
Which One Should You Use?
- Use MAE when all errors matter equally and you're okay with small optimization challenges.
- Use MSE when you want to penalize large errors more (good for gradient-based learning).
- Use RMSE when you want interpretability (in real units).
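The differences above are easiest to see side by side. A rough illustration with made-up numbers (helper names are our own): one outlier barely moves MAE, but blows up MSE, while RMSE stays in the units of $y$.

```python
import math

def mae(actual, predicted):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mse(actual, predicted):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

actual       = [10, 20, 30, 40]
clean        = [11, 19, 31, 39]   # every prediction off by 1
with_outlier = [11, 19, 31, 80]   # one prediction off by 40

print(mae(actual, clean), mae(actual, with_outlier))   # 1.0 vs 10.75
print(mse(actual, clean), mse(actual, with_outlier))   # 1.0 vs 400.75
print(math.sqrt(mse(actual, with_outlier)))            # ~20.02, back in y's units
```

The outlier grows MAE linearly (by its absolute size) but MSE quadratically, which is exactly why MSE-trained models chase outliers harder.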