How to compare different models

How to compare different classification models using log loss and how to pick the best one

Log loss is useful when we have to compare models. It evaluates models through their outputs, specifically the class probabilities they predict, rather than only the hard class labels.

* To calculate log loss, the classifier must assign a probability to each class for every sample.

* Log loss measures the uncertainty of the model's prediction for every sample, compares it with the true label, and penalises wrong, over-confident classifications.

* Log loss is defined for two or more labels, so it works for both binary and multi-class problems.

* A log loss closer to 0 means better predictions; values far from zero mean worse ones. Log loss ranges from 0 to infinity.

If there are N samples belonging to M classes:

1.) $y_{ij}$ indicates whether sample $i$ belongs to class $j$ or not (1 if it does, 0 otherwise)

2.) $p_{ij}$ indicates the probability of sample $i$ belonging to class $j$

then the log loss is defined as

$$\text{Log loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\,\log\left(p_{ij}\right)$$

The negative sign offsets the output of $\log(p_{ij})$, which is always negative: $p_{ij}$ is a probability between 0 and 1, and $\log(x)$ is negative for $0 < x < 1$.
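To make the definition concrete, here is a minimal sketch (with made-up toy labels and probabilities, not taken from this tutorial's dataset) that computes the formula directly with NumPy and checks it against scikit-learn's log_loss:

```python
import numpy as np
from sklearn.metrics import log_loss

# Toy example: N = 3 samples, M = 2 classes (labels 0 and 1)
y_true = np.array([0, 1, 1])

# p[i, j] = predicted probability that sample i belongs to class j
p = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.6, 0.4]])

# One-hot encode the true labels: y[i, j] = 1 if sample i belongs to class j, else 0
y_onehot = np.eye(2)[y_true]

# Log loss from the definition: -(1/N) * sum_i sum_j y_ij * log(p_ij)
manual = -np.mean(np.sum(y_onehot * np.log(p), axis=1))

print(manual)               # ~0.415
print(log_loss(y_true, p))  # scikit-learn gives the same value
```

Note that the third sample, which is misclassified with a fairly confident wrong probability, contributes most of the loss.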

Step 1- Importing the libraries.

```python
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np
import seaborn as sns
```

Step 2- Importing and preparing the dataset.

We will import the iris dataset directly through the seaborn library.

```python
iris = sns.load_dataset('iris')
X = iris.drop(columns='species')
y = iris['species']
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=20)
```

Step 3- Fitting the Model.

We will now fit the machine learning model.

```python
# Logistic Regression
clf_logreg = LogisticRegression()

# fit model
clf_logreg.fit(Xtrain, ytrain)
```
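Log loss is computed from the class probabilities the model outputs, not from the hard labels. As an optional check (a small sketch using the variables defined above, not part of the original steps), we can look at predict_proba on the held-out test set and score it directly with sklearn.metrics.log_loss:

```python
from sklearn.metrics import log_loss

# One probability per class (setosa, versicolor, virginica); each row sums to 1
proba_test = clf_logreg.predict_proba(Xtest)
print(proba_test[:3])

# Hold-out log loss: values closer to 0 indicate better probability estimates
print(log_loss(ytest, proba_test, labels=clf_logreg.classes_))
```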

Step 4- Calculating the log loss.

We will calculate the cross-validated log loss score.

```python
# scikit-learn's 'neg_log_loss' scorer returns the *negated* log loss,
# so the printed value is negative and values closer to 0 are better.
logloss_logreg = cross_val_score(clf_logreg, Xtrain, ytrain, scoring='neg_log_loss').mean()
print(logloss_logreg)
```
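To actually compare models, we can score a second classifier in exactly the same way and keep the one whose log loss is closer to zero. A minimal sketch, assuming an SVC with probability estimates as the second candidate (any classifier that exposes predict_proba would work):

```python
from sklearn.svm import SVC

# Second candidate model; probability=True enables probability estimates for log loss
clf_svc = SVC(probability=True, random_state=20)
logloss_svc = cross_val_score(clf_svc, Xtrain, ytrain, scoring='neg_log_loss').mean()

# 'neg_log_loss' is the negated log loss, so the value closer to 0 (the larger one) wins
scores = {'logistic_regression': logloss_logreg, 'svc': logloss_svc}
print(scores)
print('Best model by cross-validated log loss:', max(scores, key=scores.get))
```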

Figure 1: Image from the author.

In this article, we will discuss the performance metrics that we must use when we have to compare multiple machine learning models. Performance metrics are the backbone of every machine learning model: they tell us how well our model is being trained and how well it performs in evaluation.

In regression-based machine learning problems, it is common to use the correlation coefficient (R), the Root Mean Square Error (RMSE) or Mean Square Error (MSE), and the bias as performance metrics to evaluate a trained machine learning model (Singh et al., 2022). The formulas for calculating R, RMSE, and MSE are given below:

$$R = \sqrt{1 - \frac{SSE}{SST}}, \qquad RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{obs,i} - y_{sat,i}\right)^{2}}, \qquad MSE = \frac{1}{N}\sum_{i=1}^{N}\left(y_{obs,i} - y_{sat,i}\right)^{2}$$

where SSE is the sum of squares of errors, SST is the total sum of squares, $y_{obs}$ are the observed values, and $y_{sat}$ are the predicted values. However, these metrics are suited to evaluating the performance of a single machine learning model in isolation. For comparing multiple machine learning models (or comparing a model against other benchmark algorithms), we need additional performance metrics to draw a robust conclusion.
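As an illustration of these metrics, here is a small NumPy sketch (with made-up observed and predicted values, not from any dataset in this article) that computes SSE, SST, R, RMSE, MSE, and bias under one common set of conventions:

```python
import numpy as np

# Hypothetical observed and predicted values, for illustration only
y_obs = np.array([3.0, 4.5, 6.2, 8.1, 10.3])
y_pred = np.array([2.8, 4.9, 6.0, 8.4, 9.9])

sse = np.sum((y_obs - y_pred) ** 2)           # sum of squares of errors
sst = np.sum((y_obs - y_obs.mean()) ** 2)     # total sum of squares
r = np.sqrt(1 - sse / sst)                    # goodness of fit in the SSE/SST form above
mse = np.mean((y_obs - y_pred) ** 2)          # mean squared error
rmse = np.sqrt(mse)                           # root mean squared error
bias = np.mean(y_pred - y_obs)                # mean error (systematic over/under-prediction)

print(f"R = {r:.3f}, RMSE = {rmse:.3f}, MSE = {mse:.3f}, Bias = {bias:.3f}")
```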

According to a recent research article (Singh et al., 2021), we need additional performance metrics when comparing two or more machine learning models. For multi-model comparison, the authors recommend using Akaike's Information Criterion (AIC), the corrected AIC (AICc), and the Bayesian Information Criterion (BIC). All of these metrics penalise a machine learning model for having a high number of parameters, and the model with lower values of AIC, AICc, and BIC is preferred. We will briefly discuss these criteria (a detailed description can be found in the corresponding references).

1. Akaike's Information Criterion (AIC) (Akaike, 1969) [2]

$$AIC = n_{train}\cdot\ln(MSE) + 2p$$

2. Corrected AIC (AICc) (Hurvich and Tsai, 1989) [3]

$$AICc = AIC + \frac{2p(p+1)}{n_{train} - p - 1}$$

3. Bayesian Information Criterion (BIC) (Schwarz, 1978) [4]

$$BIC = n_{train}\cdot\ln(MSE) + p\cdot\ln(n_{train})$$

where $n_{train}$ is the number of training samples, MSE is the mean squared error on the training set, and $p$ is the number of parameters that the machine learning model estimates internally.

Hence, for a more robust comparison of multiple machine learning models, we can use AIC, AICc, and BIC along with R, RMSE, and bias (Singh et al., 2021).
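As a rough sketch of how these criteria can be computed in practice, the function below uses the least-squares (MSE-based) forms shown above; the observed values, predicted values, and parameter count are illustrative placeholders, and the exact variants used in the cited papers may differ slightly:

```python
import numpy as np

def information_criteria(y_obs, y_pred, p):
    """AIC, AICc and BIC in their least-squares (MSE-based) forms."""
    y_obs, y_pred = np.asarray(y_obs), np.asarray(y_pred)
    n_train = len(y_obs)
    mse = np.mean((y_obs - y_pred) ** 2)

    aic = n_train * np.log(mse) + 2 * p
    aicc = aic + (2 * p * (p + 1)) / (n_train - p - 1)
    bic = n_train * np.log(mse) + p * np.log(n_train)
    return aic, aicc, bic

# Illustrative usage: the model with the lowest AIC/AICc/BIC is preferred
aic, aicc, bic = information_criteria(
    y_obs=[3.0, 4.5, 6.2, 8.1, 10.3, 11.8],
    y_pred=[2.8, 4.9, 6.0, 8.4, 9.9, 12.1],
    p=2,
)
print(f"AIC = {aic:.2f}, AICc = {aicc:.2f}, BIC = {bic:.2f}")
```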

Figure 2: Image from the author.

References

[1]. Singh Abhilash, Kumar Gaurav, Atul Kumar Rai, and Zafar Beg “Machine learning to estimate surface roughness from satellite images,” Remote Sensing, MDPI, 13 (19), 2021, DOI: 10.3390/rs13193794.

[2]. Akaike, H. (1969), “Fitting Autoregressive Models for Prediction”. Annals of the Institute of Statistical Mathematics, 21, 243–247.

[3]. Hurvich, C.M., and Tsai, C.L. (1989), “Regression and time-series model selection in small samples”. Biometrika, 76, 297–307.

[4]. Schwarz, G. (1978), “Estimating the Dimension of a Model”. Annals of Statistics, 6, 461–464.

[5]. Singh, A., Amutha, J., Nagar, J., Sharma, S., & Lee, C. C. (2022). LT-FS-ID: Log-Transformed Feature Learning and Feature-Scaling-Based Machine Learning Algorithms to Predict the k-Barriers for Intrusion Detection Using Wireless Sensor Network. Sensors, 22(3), 1070.

Note: If you have any queries, please write to me or visit my web page.

Don’t forget to subscribe to my YouTube channel.