Evaluating Malaria Cell Classification Results and The Importance Of Precision As A Metric | by tekjanice | Jan, 2024


In this study, I built a CNN model to classify cell images as either parasitised or uninfected. The dataset was intentionally made imbalanced, with roughly one parasitised image for every nine uninfected images, to test whether a balanced dataset is crucial for maintaining high accuracy in a CNN model, an idea drawn from Buolamwini’s research on AI biases. Because the dataset is imbalanced, we can expect the model to show a classification bias towards the majority class, in this case uninfected cells. The key question is whether accuracy should be the sole metric for judging a model’s performance, particularly when the dataset is imbalanced.

Steps taken:

To develop the CNN model for malaria cell classification, I began by splitting the dataset into separate training and testing sets. The training set contains 1,386 parasitised cell images and 12,480 uninfected cell images, while the testing set contains 147 parasitised cell images and 1,300 uninfected cell images. The dataset is therefore imbalanced: about 10% of the images are infected cells and about 90% are uninfected.
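The class split can be computed directly from the image counts. The snippet below is a small sketch in plain Python; the dictionary and function names are illustrative, and only the counts come from the text.

```python
# Image counts quoted in the text; the variable names are illustrative.
train_counts = {"parasitised": 1386, "uninfected": 12480}
test_counts = {"parasitised": 147, "uninfected": 1300}


def class_shares(counts):
    """Return each class's share of the total as a fraction between 0 and 1."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


for name, counts in (("train", train_counts), ("test", test_counts)):
    shares = class_shares(counts)
    summary = ", ".join(f"{label}: {share:.1%}" for label, share in shares.items())
    print(f"{name}: {summary}")
```

Both splits come out at roughly one parasitised image in ten.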

The images were then visualised and their dimensions analysed, which revealed a variation in size that calls for standardisation. Resizing every image to a uniform size ensures they can be fed into the CNN model consistently. To improve the model’s ability to generalise, I also applied data augmentation. I tried transformations such as rotations and translations, experimenting with different values through trial and error until the model reached a high accuracy rate. These techniques let the model learn from a more diverse set of images, making it more robust to variations in new, unseen data.
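To make the idea of a translation concrete, here is a toy sketch in plain Python that shifts a tiny image stored as a list of rows. It is not the augmentation pipeline used for the model; the function names, the zero fill value, and the 3x3 example image are all assumptions made for illustration.

```python
import random


def translate(image, dx, dy, fill=0):
    """Shift a 2D image (a list of rows) by dx columns and dy rows.

    Pixels that move in from outside the frame are set to `fill`.
    """
    height = len(image)
    width = len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def random_translate(image, max_shift, rng):
    """Apply a random shift of up to max_shift pixels along each axis."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)


# Hypothetical 3x3 "image" with a single bright pixel in the centre.
tiny_image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
augmented = random_translate(tiny_image, max_shift=1, rng=random.Random(0))
```

Each call produces a slightly different view of the same cell, which is what gives the model more varied examples to learn from.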

Next, I built the model. The CNN architecture combines convolutional layers for extracting features such as edges and textures, max-pooling layers for dimensionality reduction, dense layers for pattern recognition, and dropout layers for regularisation to prevent overfitting.

I also added the EarlyStopping mechanism to terminate training when the model starts to overfit. It is triggered when the validation loss stops improving for a predefined number of epochs. In my model, I set patience = 2, so training stops if the validation loss does not improve for 2 consecutive epochs.
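The patience rule can be sketched in a few lines of plain Python. This is not the EarlyStopping callback itself, only an illustration of the logic described above; the function name and the validation-loss values are made up for the example.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to beat the best
    value seen so far for `patience` epochs in a row.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses: the best value is reached at epoch 1,
# and the next two epochs fail to improve on it.
stop_at = early_stopping_epoch([0.50, 0.40, 0.42, 0.41, 0.39], patience=2)
print("training would stop at epoch", stop_at)
```

Note that in this example the loss would have improved again at epoch 4, which is the trade-off of a small patience value.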

To train the model, I created 2 generators using flow_from_directory: train_image_gen for the training data and test_image_gen for the validation data. These generators read images from their respective directories in batches, apply the defined transformations, and provide them to the model. During training, the model is fed data from train_image_gen and validated against test_image_gen.

After training, the model is evaluated on the test set of 1,447 images (147 parasitised and 1,300 uninfected), which maintains the same imbalance ratio. This is crucial for assessing the model’s ability to predict in real-world conditions with an imbalanced dataset.

Results Evaluation:

The evaluation showed that the model achieved a high overall accuracy of 96%. However, precision for the minority class (parasitised cells) was much lower than for the majority class (uninfected cells): 76% for parasitised cells versus 99% for uninfected cells. This large gap shows the model’s tendency to favour the majority class.

To interpret this further: a precision of 76% for parasitised cells means that, of all the cells the model predicted as infected, only 76% were actually infected. The remaining 24% were uninfected cells wrongly flagged as infected, that is, false positives.
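As a quick worked example, precision is the number of true positives divided by everything the model predicted as positive. The counts below are hypothetical, chosen only so that the result matches the 76% figure; they are not the model's actual confusion matrix.

```python
def precision(true_positives, false_positives):
    """Share of positive predictions that were actually positive."""
    predicted_positive = true_positives + false_positives
    if predicted_positive == 0:
        raise ValueError("precision is undefined with no positive predictions")
    return true_positives / predicted_positive


# Hypothetical counts: of 100 cells predicted as infected, 76 really are.
example_precision = precision(true_positives=76, false_positives=24)
print(f"precision: {example_precision:.0%}")
```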

In contrast, the recall for parasitised cells was high, which indicates that the model was quite sensitive to the presence of parasites. The F1-score, which combines precision and recall, was still lower for parasitised cells than for uninfected cells, further highlighting the impact of the imbalanced dataset on the model’s performance.
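The F1-score is the harmonic mean of precision and recall, so a weak precision pulls it down even when recall is strong. In the sketch below, the 0.76 precision is the figure reported above, while the 0.95 recall is an assumed value standing in for "high"; the exact recall is not stated here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 0.76 is the reported precision; 0.95 is an assumed recall for illustration.
example_f1 = f1_score(precision=0.76, recall=0.95)
print(f"F1: {example_f1:.3f}")
```

The result sits much closer to the lower of the two inputs than a simple average would.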


The results illustrate that relying solely on accuracy can be misleading for models trained on imbalanced datasets. Although the model reached a high accuracy rate of 96%, it is less precise in identifying the less represented class (parasitised cells). In real-world applications, this could lead to misdiagnosis of infections, with costly implications for patients.
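One way to see why accuracy alone is misleading is to compare it against a trivial baseline. The sketch below uses the test-set counts given earlier and asks what accuracy a "model" would get if it labelled every single cell as uninfected; the function name is illustrative.

```python
# Test-set counts quoted earlier in the text.
test_counts = {"parasitised": 147, "uninfected": 1300}


def majority_baseline_accuracy(counts):
    """Accuracy of always predicting the most common class."""
    return max(counts.values()) / sum(counts.values())


baseline = majority_baseline_accuracy(test_counts)
print(f"always predicting 'uninfected' scores {baseline:.1%} accuracy")
```

That baseline scores just under 90% accuracy while never detecting a single infected cell, so a 96% accuracy rate is less impressive than it first appears.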

More Info:

True Positives (TP): The number of infected cells correctly identified by the model.

False Positives (FP): The number of uninfected cells incorrectly identified as infected by the model.

False Negatives (FN): The number of infected cells incorrectly identified as uninfected by the model.

True Negatives (TN): The number of uninfected cells correctly identified by the model.
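These counts can be tallied from a list of true labels and a list of predictions. The sketch below is plain Python with made-up labels; the function name and the choice of "parasitised" as the positive class are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive="parasitised"):
    """Count TP, FP, FN and TN for a binary classification task.

    FN (false negatives) are positive cases the model predicted as negative.
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


# Made-up labels for five cells.
y_true = ["parasitised", "parasitised", "uninfected", "uninfected", "uninfected"]
y_pred = ["parasitised", "uninfected", "parasitised", "uninfected", "uninfected"]
print(confusion_counts(y_true, y_pred))
```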

I believe there are 2 main causes for the low precision rate in my model:

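Imbalanced dataset: because the model is trained largely on uninfected cells, it is less adept at identifying infected cells. In addition, uninfected cells outnumber infected ones roughly nine to one, so even a small share of uninfected cells misclassified as infected produces many false positives relative to true positives, which lowers the precision rate.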

Inadequate representation in data augmentation: the augmentation techniques may not have captured enough of the variations and details of infected cells, making the model more likely to misidentify uninfected cells as infected and reducing the precision rate.
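The effect of class imbalance on precision can be illustrated with a little arithmetic. The sketch below assumes a fixed recall and a fixed false positive rate (both hypothetical values, not measured from my model) and shows how precision changes as the share of infected cells drops.

```python
def expected_precision(prevalence, recall, false_positive_rate):
    """Precision implied by the positive-class share and per-class error rates."""
    true_positive_share = prevalence * recall
    false_positive_share = (1 - prevalence) * false_positive_rate
    predicted_positive_share = true_positive_share + false_positive_share
    if predicted_positive_share == 0:
        raise ValueError("precision is undefined with no positive predictions")
    return true_positive_share / predicted_positive_share


# Hypothetical error rates, held constant across both scenarios.
RECALL = 0.95
FALSE_POSITIVE_RATE = 0.03

balanced = expected_precision(0.5, RECALL, FALSE_POSITIVE_RATE)
imbalanced = expected_precision(0.1, RECALL, FALSE_POSITIVE_RATE)
print(f"50% infected: precision {balanced:.1%}")
print(f"10% infected: precision {imbalanced:.1%}")
```

With identical per-class error rates, precision falls noticeably once infected cells make up only a tenth of the data, simply because there are far more uninfected cells available to be misclassified.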

On the other hand, I believe there is 1 main cause for the high accuracy rate despite using an imbalanced dataset:

Effectiveness of the model at predicting the majority class: because uninfected cells dominate the dataset, the model became proficient at identifying them. Since they also make up most of the evaluated images, classifying them correctly raises the overall accuracy rate.

Overall, precision is especially important when the dataset is imbalanced, particularly when false positives are common, that is, when the model wrongly identifies uninfected cells as infected. Other metrics such as recall matter too: recall measures the share of actual positive instances that the model correctly identifies, in other words how many cases of disease it catches.

Check my full code below:

By Janice Tek
