Parameters vs Hyperparameters

In machine learning, the terms “parameters” and “hyperparameters” are often used side by side and sometimes confused, but they refer to distinct aspects of a model and its training process:

1. Model Parameters are the internal variables or coefficients of a machine learning model that are learned from the training data.

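They define the specific mapping between input features and output predictions. Examples include the weights and biases of a neural network and the coefficients of a linear regression.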
Model parameters are optimized and updated by the learning algorithm as it processes the data.
Learned from data: The algorithm automatically adjusts these values to minimize a loss function (e.g., error).
Internal to the model: They are part of the model’s structure and are what the model “knows” after training.
Not set manually: You don’t explicitly choose their values beforehand; the training process determines them.
Often saved with the model: When you save a trained model, its learned parameters are typically stored.
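
The points above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the toy data, the `train` function, and the names `w` and `b` are all assumptions made for the example. The parameters start at arbitrary values, gradient descent adjusts them to minimize mean squared error, and the learned values are then saved and restored as JSON.

```python
import json

# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def train(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # parameters: arbitrary starting values, then learned
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # The learning algorithm updates the parameters; nobody sets them by hand.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return {"w": w, "b": b}


params = train(xs, ys)

# Saving the trained model means saving its learned parameters.
saved = json.dumps(params)
restored = json.loads(saved)
print(params)
```

On this data the learned `w` and `b` end up very close to the slope and intercept used to generate it. Note that `learning_rate` and `epochs` are fixed before the loop runs and are never updated by it.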

2. Hyperparameters are external configuration variables whose values are set by the user before the model training process begins.

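They control the overall learning process, the model’s architecture, and its training behavior. Typical examples are the learning rate, the number of training epochs, the number of layers in a neural network, and the regularization strength.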
Hyperparameters are not learned from the data; they guide how the learning takes place.
Set manually (or tuned): You explicitly define their values. Finding the optimal values often involves a process called “hyperparameter tuning” or “hyperparameter optimization.”
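External to the learned values: They are never fitted to the data. Some configure the training algorithm (e.g., the learning rate), while others fix the model’s structure (e.g., the number of layers), but none of them are among the values the model learns during training.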
Influence training behavior: They dictate aspects like model complexity, learning speed, and regularization.
Affect model parameters: The choice of hyperparameters directly impacts the values that the model parameters ultimately learn.
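
To make the last point concrete, here is a small sketch in which a single hyperparameter changes the parameter that gets learned. It assumes a one-feature ridge regression without an intercept, where the weight `w` has a closed-form solution. The function name `fit_ridge`, the name `l2_penalty`, and the data are hypothetical choices for this example.

```python
def fit_ridge(xs, ys, l2_penalty):
    """Return the w that minimizes sum((w*x - y)^2) + l2_penalty * w^2."""
    # Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + l2_penalty).
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2_penalty)


# Hypothetical data lying roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Same data, same model; only the hyperparameter changes between fits.
learned = {}
for l2_penalty in (0.0, 1.0, 10.0, 100.0):
    learned[l2_penalty] = fit_ridge(xs, ys, l2_penalty)
    print(f"l2_penalty={l2_penalty}: learned w = {learned[l2_penalty]:.3f}")
```

The data never changes, yet a larger `l2_penalty` pulls the learned weight closer to zero. The penalty is chosen by the user; the weight is what the fit produces in response.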

In essence:

Parameters are what the model learns.
Hyperparameters control how the model learns.

You tune hyperparameters to help the model learn the best possible parameters for your specific dataset and task, ultimately aiming for strong generalization performance on unseen data.
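
A common way to do this is a grid search scored on held-out data. The sketch below is one simple variant under stated assumptions: a one-feature ridge regression without an intercept, a hypothetical train/validation split, and a hand-picked list of candidate values. The names `fit_ridge`, `mse`, and `candidates` are illustrative and do not come from any particular library.

```python
def fit_ridge(xs, ys, l2_penalty):
    """Return the w that minimizes sum((w*x - y)^2) + l2_penalty * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2_penalty)


def mse(w, xs, ys):
    """Mean squared error of the model y = w*x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical noisy data, split into a training set and a validation set.
train_x, train_y = [1.0, 2.0, 3.0], [2.6, 4.5, 6.9]
val_x, val_y = [4.0, 5.0, 6.0], [8.1, 9.8, 12.1]

candidates = [0.0, 0.1, 1.0, 10.0]  # hyperparameter values to try
results = {}
for l2_penalty in candidates:
    w = fit_ridge(train_x, train_y, l2_penalty)  # parameters come from training data only
    results[l2_penalty] = mse(w, val_x, val_y)   # candidates are scored on unseen data

# Keep the hyperparameter value with the lowest validation error.
best_penalty = min(results, key=results.get)
best_w = fit_ridge(train_x, train_y, best_penalty)
print(best_penalty, best_w)
```

The inner step learns parameters; the outer loop chooses the hyperparameter. In practice a third, untouched test set is usually kept aside to estimate final performance, because the validation score has already influenced the choice.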