Project Overview
This project tackles the challenge of building a skin cancer detection system with deep learning. Using a dataset of skin lesion images, we fine-tune two pre-trained models, EfficientNetB4 and DenseNet121, alongside a basic CNN, to classify lesions into types such as melanoma and pigmented benign keratosis. With EfficientNetB4, we obtained an accuracy of over 80%.
The resulting predictions can support doctors during diagnosis and may help catch dangerous lesions at an early stage. Whether you’re a young professional interested in healthcare or an everyday internet user wondering how AI can improve human life, this project demonstrates how machine learning can help.
Prerequisites
Before we jump into the code, here’s what you’ll need:
- An understanding of Python programming and Google Colab
- Basic knowledge of deep learning and medical images
- Familiarity with TensorFlow, Keras, NumPy, OpenCV, and Seaborn for handling data, building models, and visualizing data and model performance
- A skin cancer dataset
Once you have these in place, you will see how almost all of them come into play in the steps that follow. And do not stress if you are not a Python master: the tutorial walks you through every line of the code!
Approach
The approach involves building, training, and evaluating deep learning models on a skin cancer dataset. We use image-processing techniques and deep-learning architectures to classify skin lesions into different types. Starting from pre-trained models such as DenseNet and EfficientNet improves the accuracy of the classification system.
The major steps involve:
- Obtaining and preparing data (augmentation, resizing, normalizing)
- Training and measuring the performance of several architectures
- Visualizing performance with confusion matrices and accuracy plots
Workflow and Methodology
This project can be divided into the following basic steps:
- Data Collection: We collected a skin cancer dataset from Kaggle, labeled by lesion type.
- Data Preprocessing: To improve model performance, we applied several preprocessing techniques. First, we augmented the dataset to balance the classes. Then we resized the images and normalized their pixel values to the range 0 to 1.
- Model Selection: Three models are used in this project: a custom CNN, EfficientNetB4, and DenseNet121.
- Training and Testing: Each model was trained on the preprocessed dataset and then tested on data that was not used during training.
- Model Evaluation: Model performance is evaluated using accuracy, precision, recall, the confusion matrix, and related metrics.
The methodology includes:
- Data Preprocessing: The images are resized, normalized, and augmented to improve model performance.
- Model Training: Each model is trained for 100 epochs.
- Evaluation: Standard metrics (accuracy, precision, recall, F1-score, and confusion matrix) are applied to assess the models.
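To make the evaluation metrics concrete, here is a minimal pure-Python sketch of how a confusion matrix, accuracy, and per-class precision, recall, and F1-score relate to each other. The function names, the class list, and the toy predictions are all made up for illustration; they are not the project's code or results, and in practice a library such as scikit-learn would usually compute these.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def per_class_metrics(matrix, labels):
    """Precision, recall, and F1 for each class, derived from the confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp   # predicted as this class, but wrong
        fn = sum(matrix[i]) - tp                  # this class, but predicted as another
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy data (hypothetical): six lesions, three classes.
labels = ["melanoma", "nevus", "pigmented benign keratosis"]
y_true = ["melanoma", "melanoma", "nevus", "nevus",
          "pigmented benign keratosis", "pigmented benign keratosis"]
y_pred = ["melanoma", "nevus", "nevus", "nevus",
          "pigmented benign keratosis", "melanoma"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)
report = per_class_metrics(cm, labels)

print(cm)
print(f"accuracy = {acc:.3f}")
for label, scores in report.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

Reading the matrix row by row shows which classes get confused with each other, which a single accuracy number hides. For a medical task, recall on melanoma is often the figure to watch, since it counts how many true melanomas the model missed.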
Dataset Collection
The dataset we used had 2,500 images, which augmentation expanded to 4,500 images. The dataset was split 80/20: 80% of the data was used for training the model and 20% for validation.
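For a feel of what an 80/20 split looks like in code, here is a small sketch using only the standard library. The `split_dataset` helper, the fixed seed, and the file names are hypothetical, and the sketch assumes the split is applied to the 4,500 augmented images; the text does not say whether the project split before or after augmentation.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and split it into (train, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the 4,500 augmented images.
image_paths = [f"img_{i:04d}.jpg" for i in range(4500)]
train_paths, val_paths = split_dataset(image_paths)

print(len(train_paths), len(val_paths))  # 3600 900
```

A plain random split like this does not guarantee the same class proportions in both sets. If some lesion types are rare, a stratified split (splitting each class separately) is a common alternative.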
Data Preparation
The dataset was pre-processed by resizing the images to 128 × 128 pixels and scaling the pixel values from the range 0 to 255 down to the range 0 to 1. Data augmentation techniques were also applied to increase the variability of the dataset.
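As a rough illustration of the resize-and-normalize step, the sketch below uses plain NumPy. The helper names `resize_nearest` and `normalize`, the nearest-neighbor method, and the random stand-in image are assumptions for illustration, not the project's actual code. In practice a library routine such as OpenCV's `cv2.resize` would typically handle the resizing with better interpolation.

```python
import numpy as np


def resize_nearest(image, size=(128, 128)):
    """Resize an H x W x C image with nearest-neighbor sampling."""
    height, width = image.shape[:2]
    # For each output row/column, pick the index of the nearest source pixel.
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    return image[rows[:, None], cols]


def normalize(image):
    """Scale 8-bit pixel values from the range 0-255 to floats in the range 0-1."""
    return image.astype(np.float32) / 255.0


# Random stand-in for a 600 x 450 RGB photo (no real image is loaded here).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(450, 600, 3), dtype=np.uint8)

prepared = normalize(resize_nearest(raw))
print(prepared.shape, prepared.dtype)  # (128, 128, 3) float32
```

Giving every image the same shape lets them be stacked into one batch, and keeping values between 0 and 1 tends to make training more stable than feeding raw 0 to 255 values.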
Data Preparation Workflow
- Load the dataset from Google Drive.
- Apply rotation, flipping, contrast changes, and similar augmentations to increase the diversity of the dataset.
- Resize and preprocess the images to the input size the models expect, which standardizes the model input.
- Split the dataset into training and validation sets.
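The augmentation step in the workflow above can be sketched with a few NumPy operations. This is a simplified illustration, not the project's pipeline: the function names are hypothetical, rotation is limited to multiples of 90 degrees, the contrast range of 0.8 to 1.2 is an arbitrary choice, and the input is a random array standing in for an image already scaled to the range 0 to 1.

```python
import numpy as np


def random_flip(image, rng):
    """Flip horizontally and/or vertically, each with probability 0.5."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
    if rng.random() < 0.5:
        image = image[::-1, :]
    return image


def random_rotate90(image, rng):
    """Rotate by 0, 90, 180, or 270 degrees (a simplification of free rotation)."""
    k = int(rng.integers(0, 4))
    return np.rot90(image, k)


def random_contrast(image, rng, low=0.8, high=1.2):
    """Stretch or compress pixel values around the image mean, then clip to 0-1."""
    factor = rng.uniform(low, high)
    mean = image.mean()
    return np.clip((image - mean) * factor + mean, 0.0, 1.0)


def augment(image, rng):
    """Apply flip, rotation, and contrast change in sequence."""
    image = random_flip(image, rng)
    image = random_rotate90(image, rng)
    return random_contrast(image, rng).astype(np.float32)


rng = np.random.default_rng(0)
image = rng.random((128, 128, 3), dtype=np.float32)  # stand-in for a normalized image

# Generate four augmented variants of the same source image.
augmented = [augment(image, rng) for _ in range(4)]
print(len(augmented), augmented[0].shape)  # 4 (128, 128, 3)
```

Generating extra variants for under-represented classes is one way augmentation can balance a dataset. Keras also offers ready-made augmentation layers that apply similar transforms during training.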