Better Multi-Modal Disease Prediction | by Mustafa Sinan Cetin | Intel Analytics Software | Nov, 2023


Using Intel’s Optimized End-to-End Reference Kit to Improve AI Processes

Breast cancer is the second most common cancer among women in the United States. Given the substantial number of patients screened, a rapid assessment of a patient’s cancer risk is essential for effective treatment planning. This requires quick analysis of all available resources, such as categorized contrast-enhanced mammography (CESM) images and radiologist notes. However, current screening programs using digital mammography are associated with an expensive workflow. Despite the implementation of double reading protocols, up to 25% of cancers remain undetected. This highlights the need for computer-aided detection systems to improve the quality and cost-efficiency of breast cancer screening.

Most research has focused on solutions built from a single model or a single modality of the data. But single-model, single-modality approaches to classification are limited by the complexity of the data domains and the difficulties associated with obtaining human data (i.e., privacy, regulation, and the cost of data collection). This study aims to improve the accuracy and efficiency of breast cancer diagnosis by using a multi-modal approach that incorporates both image and text data, as demonstrated in the Multi-Modal Disease Prediction reference kit. The goal of this reference kit is to minimize an expert’s involvement in categorizing samples as normal, benign, or malignant by developing and optimizing a decision support system that automatically categorizes the CESM images with the help of radiologist notes.

The dataset used in this study consists of 1,003 subtracted high-resolution CESM images with annotations for 326 female patients. The images were gathered from the Radiology Department of the National Cancer Institute, Cairo University. They represent each side with two views (Top Down and Angled Top View) consisting of subtracted CESM images (Figure 1). To improve computational efficiency and reduce redundancies in the images, only segmented regions that are defined by domain experts are used for fine-tuning and inference.

Figure 1. Examples of subtracted CESM images from the dataset (Source: Khaled et al., Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research, Nature, 9:122, 2022)

Expert radiologists manually annotated the images according to the standardized descriptors of the American College of Radiology Breast Imaging Reporting and Data System (ACR BIRADS) 2013 lexicon. Medical reports, written by radiologists, are provided for each case along with manual segmentation annotation for the abnormal findings in each image (Figure 2). However, annotation notes contain extraneous information, such as patient details (name, ID, and date of study) and image series. These fields were discarded as they are not pertinent to identifying and categorizing lesions. For each patient, annotation notes were merged and consolidated into a single comma-separated values (CSV) file along with the corresponding image ID.
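The consolidation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the kit's actual preprocessing code; the field names (`patient_name`, `study_date`, `finding`, etc.) and record layout are assumptions for demonstration.

```python
import csv
import io

# Hypothetical raw annotation records; the real dataset's field names differ.
raw_notes = [
    {"patient_name": "...", "patient_id": "P001", "study_date": "2020-01-01",
     "image_id": "P1_L_CC", "side": "L", "finding": "well-defined mass, benign features"},
    {"patient_name": "...", "patient_id": "P001", "study_date": "2020-01-01",
     "image_id": "P1_L_MLO", "side": "L", "finding": "no suspicious enhancement"},
]

# Fields not pertinent to identifying and categorizing lesions.
DROP_FIELDS = {"patient_name", "patient_id", "study_date"}

def consolidate(records):
    """Merge annotation notes per image, dropping patient-identifying fields."""
    merged = {}
    for rec in records:
        kept = {k: v for k, v in rec.items() if k not in DROP_FIELDS}
        image_id = kept.pop("image_id")
        note = "; ".join(f"{k}: {v}" for k, v in kept.items())
        merged.setdefault(image_id, []).append(note)
    return {img: " | ".join(notes) for img, notes in merged.items()}

# Write the consolidated notes to CSV, keyed by image ID.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["image_id", "annotation"])
for image_id, annotation in consolidate(raw_notes).items():
    writer.writerow([image_id, annotation])
print(buf.getvalue())
```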

Figure 2. Annotation examples from the dataset

To fine-tune on the CESM images and run inference, the reference kit uses the Transfer Learning Tool (TLT)-based vision workflow (TLTVW), which is optimized for image fine-tuning and inference, along with TensorFlow Hub’s ResNet-50 model, to fine-tune a new convolutional neural network on the subtracted CESM image dataset.

The reference kit uses the Hugging Face Fine-tuning and Inference Optimization workflow (HFIOW), part of the Intel Extension for Transformers toolkit, which is specifically designed for document classification tasks. This natural language processing workflow uses several libraries and tools, including Intel Neural Compressor, the Hugging Face model repository, and application programming interfaces (APIs) for ClinicalBERT models. The ClinicalBERT model, pretrained on a large English-language corpus of MIMIC-III data using a masked language modeling task, is fine-tuned with the CESM breast cancer annotation dataset to produce a new BERT model, which the reference kit uses for fine-tuning and inference on radiologist notes.

The ensemble method uses a weighted score per model and class to enhance the accuracy and reliability of the predictions. The per-class F1 scores computed for each model during the fine-tuning phase serve as the weights. The fine-tuned models are then run on the test data, and each model’s per-class prediction scores are multiplied by the corresponding F1 weights; these products are referred to as “corrected prediction scores.” The corrected prediction scores are then summed across the domains for each class, and the final prediction is the class with the highest total score (Figure 3).
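The weighting scheme can be illustrated with a small sketch. The per-class F1 weights and prediction scores below are made-up numbers for demonstration only, not values from the study.

```python
CLASSES = ["normal", "benign", "malignant"]

# Per-class F1 scores from the fine-tuning phase (illustrative values).
f1_weights = {
    "image": [0.78, 0.85, 0.74],
    "text":  [0.86, 0.80, 0.88],
}

def ensemble_predict(scores_per_model):
    """scores_per_model: {model_name: [p_normal, p_benign, p_malignant]}.
    Multiply each model's scores by its per-class F1 weights ("corrected
    prediction scores"), sum across models, and take the argmax class."""
    totals = [0.0] * len(CLASSES)
    for model, scores in scores_per_model.items():
        for i, (w, p) in enumerate(zip(f1_weights[model], scores)):
            totals[i] += w * p
    best = max(range(len(CLASSES)), key=totals.__getitem__)
    return CLASSES[best], totals

label, totals = ensemble_predict({
    "image": [0.20, 0.50, 0.30],
    "text":  [0.10, 0.30, 0.60],
})
print(label)  # -> malignant: its weighted total (0.750) beats benign (0.665)
```

Because the weights come from per-class F1, a model that is strong on one class and weak on another influences the ensemble only where it has earned it.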

Figure 3. Diagram of the components and process

Our primary objective was to assess the classification accuracy of the image, text, and ensemble methods. We conducted this evaluation using the shuffle-split cross-validation method with 100 iterations. To create our training and testing datasets, we partitioned the data, allocating 60% for training and 40% for testing in each iteration. Initially, we investigated the classification performance of the image and text domains individually. Subsequently, we employed ensemble methods to make predictions, examining the changes in classification performance for each iteration. We also reported the standard deviation between the classification algorithms, using the average classification accuracy scores from 100 runs for these three methods.
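The evaluation protocol above can be sketched with the standard library alone. The accuracy values here are placeholders standing in for a real fine-tune-and-evaluate step; only the split logic (100 iterations, fresh 60/40 partition each time) mirrors the text.

```python
import random

rng = random.Random(0)

def shuffle_split(n_samples, n_iters=100, train_frac=0.6):
    """Yield (train_idx, test_idx) pairs: each iteration reshuffles all
    samples and takes a fresh 60/40 partition."""
    n_train = int(n_samples * train_frac)
    for _ in range(n_iters):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[:n_train], idx[n_train:]

accs = []
for train_idx, test_idx in shuffle_split(n_samples=1003, n_iters=100):
    acc = 0.75 + rng.gauss(0, 0.05)  # placeholder: fine-tune and evaluate here
    accs.append(acc)

# Report the mean and standard deviation across the 100 runs.
mean = sum(accs) / len(accs)
std = (sum((a - mean) ** 2 for a in accs) / len(accs)) ** 0.5
print(f"accuracy: {mean:.3f} +/- {std:.3f}")
```

Unlike k-fold cross-validation, shuffle-split draws an independent random partition per iteration, so samples may appear in multiple test sets across runs.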

Image Classification

The Resnet_v1_50 model was used for fine-tuning in our study. This model is readily available in the TensorFlow framework and is designed to process input images of dimension 224 x 224 x 3. It is pretrained on the widely used ImageNet dataset. The dataset was loaded using Intel TLT. To enhance the model’s robustness, the data underwent shuffling and augmentation, including horizontal and vertical flips and rotation. During fine-tuning, the preexisting classification layer was removed and replaced with two fully connected layers of dimensions 1,024 and 512, respectively, followed by a final classification layer. Fine-tuning was conducted with a batch size of 24 for 30 epochs. The fp32 data type was employed for both the fine-tuning and inference stages, and the TLTVW defaults were used for all remaining parameters.
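The architecture described above can be reconstructed in Keras. This is a sketch, not the kit's code: it uses Keras's bundled `ResNet50` as a stand-in for the TensorFlow Hub resnet_v1_50 module, and passes `weights=None` only to avoid a download here, whereas the kit starts from ImageNet-pretrained weights.

```python
import tensorflow as tf

# ResNet-50 backbone without its original classification layer;
# global average pooling yields a 2048-dim feature vector per image.
backbone = tf.keras.applications.ResNet50(
    include_top=False, pooling="avg",
    input_shape=(224, 224, 3), weights=None)

# Two fully connected layers (1,024 and 512) plus a new classification head.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),  # normal / benign / malignant
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=30, batch_size=24)  # settings from the text
```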

Table 1 displays the average confusion matrices for image classification across all iterations. The resulting prediction outcomes of each iteration were saved to facilitate subsequent employment in the ensemble process.

Table 1. Average confusion matrix scores of the image model

Text Classification

We used HFIOW to evaluate the classification performance of the annotation data. This workflow uses Intel Neural Compressor and other libraries and tools, plus the Hugging Face model repository and APIs for ClinicalBERT models. It combines pretrained ClinicalBERT embeddings with a classification head to incorporate medical domain knowledge and fine-tuned features, and the saved model is used to predict disease probabilities from new annotation strings. To fine-tune the model, the final layer is removed and a new layer with three output classes (normal, benign, malignant) is added. All weights remain trainable during fine-tuning. We use a maximum input sequence length of 128 and a batch size of 64, with the fp32 data type for fine-tuning and inference over 8 epochs; all other parameters use the HFIOW defaults.

Table 2 displays the average confusion matrices for NLP classification across all iterations. The prediction results are saved for use in the ensemble process.

Table 2. Average confusion matrix scores of the NLP model


Our models were assessed for classification accuracy, and the findings reveal that the NLP model exhibited superior performance over the image model in classifying all categories except ‘benign.’ Because our approach incorporates a weighted score for each model and class, accentuating the influence of the superior model, it substantially elevates the accuracy and dependability of the prediction outcomes. Consequently, our ensemble method yielded a more resilient, higher-performing model, achieving an accuracy of 80% (Figure 4).

Figure 4. Average classification accuracy for image and text domains for each method

The Multi-Modal Disease Prediction reference kit is an Intel-optimized end-to-end solution that uses a multi-modal approach to predict breast cancer by incorporating both image and text data. The reference kit is designed to improve the quality and cost-efficiency of breast cancer screening by using a decision support system to reduce the involvement of experts in categorizing samples as normal, benign, or malignant.

Notice and Disclaimer

This reference implementation shows how to train a model to examine and evaluate a diagnostic theory and the associated performance of Intel technology solutions using very limited, non-diverse datasets to train the model. The model was not developed with any intention of clinical deployment and therefore lacks the requisite breadth and depth of quality information in its underlying datasets or the scientific rigor necessary to be considered for use in actual diagnostic applications. Accordingly, while the model may serve as a foundation for additional research and development of more robust models, Intel expressly recommends and requests that this model not be used in clinical implementations or as a diagnostic tool.

Performance varies by use, configuration, and other factors. Learn more at

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.
