- Medical Image Segmentation With UNET
- Real-Time License Plate Detection Using YOLOv8 and OCR Model
- Real-Time Human Pose Detection With YOLOv8 Models
- Customer Service Chatbot Using LLMs
- Document Summarization Using Sentencepiece Transformers
- Semantic Search Using Msmarco Distilbert Base & Faiss Vector Database
- Question Answer System Training With Distilbert Base Uncased
- Image Generation Model Fine Tuning With Diffusers Models
- Predictive Analytics on Business License Data Using Deep Learning
- Complete CNN Image Classification Models for Real Time Prediction
- Linear Regression Modeling for Soccer Player Performance Prediction in the EPL
- Insurance Pricing Forecast Using XGBoost Regressor
- Chatbots with Generative AI Models
- Nutritionist Generative AI Doctor using Gemini
- Cervical Cancer Detection Using Deep Learning
- Skin Cancer Detection Using Deep Learning
- Blood Cell Classification Using Deep Learning
- Glaucoma Detection Using Deep Learning
- Leaf Disease Detection Using Deep Learning
- Banana Leaf Disease Detection using Vision Transformer model
- Vegetable classification with Parallel CNN model
- Crop Disease Detection Using YOLOv8
- Automatic Eye Cataract Detection Using YOLOv8
- Voice Cloning Application Using RVC
- Learn to Build a Polynomial Regression Model from Scratch
- Loan Eligibility Prediction using Gradient Boosting Classifier
- BigMart Sales Prediction ML Project in Python
- Word2Vec and FastText Word Embedding with Gensim in Python
- Build Regression (Linear, Ridge, Lasso) Models in NumPy Python
- Build a Customer Churn Prediction Model using Decision Trees
- Build Regression Models in Python for House Price Prediction
- Credit Card Default Prediction Using Machine Learning Techniques
- Topic modeling using K-means clustering to group customer reviews
- NLP Project for Beginners on Text Processing and Classification
- Skip Gram Model Python Implementation for Word Embeddings
- Sentiment Analysis for Mental Health Using NLP & ML
- Time Series Analysis and Prediction of Healthcare Trends Using Gaussian Process Regression
- Build a Hybrid Recommender System in Python using LightFM
- Time Series Forecasting Using Multiple Linear Regression Model
- Build an Autoregressive and Moving Average Time Series Model
- Time Series Forecasting with ARIMA and SARIMAX Models in Python
- Build Multi-Class Text Classification Models with RNN and LSTM
- Build A Book Recommender System With TF-IDF And Clustering(Python)
- Multi-Modal Retrieval-Augmented Generation (RAG) with Text and Image Processing
- PyTorch Project to Build a GAN Model on MNIST Dataset
- Build ARCH and GARCH Models in Time Series using Python
- Human Action Recognition Using Image Preprocessing
- Time Series Analysis with Facebook Prophet Python and Cesium
- Build a Face Recognition System Using FaceNet in Python
- Build a Collaborative Filtering Recommender System in Python
- Image Segmentation using Mask R CNN with PyTorch
- Fusion Retrieval: Combining Vector Search and BM25 for Enhanced Document Retrieval
- HyDE-Powered Document Retrieval Using DeepSeek
- Graph-Enhanced Retrieval-Augmented Generation (GRAPH-RAG)
- Context Enrichment Window Around Chunks Using LlamaIndex
- Document Augmentation through Question Generation for Enhanced Retrieval
- Enhancing Document Retrieval with Contextual Overlapping Windows
- Corrective Retrieval-Augmented Generation (RAG) with Dynamic Adjustments
- Optimizing Chunk Sizes for Efficient and Accurate Document Retrieval Using HyDE Evaluation
- Sign language recognition
Project Overview
The goal of this project is to build a computer vision system that can recognize and classify hand signs from the American Sign Language (ASL) alphabet using image data. The system is designed to take grayscale images of hand gestures and predict the corresponding letter. This project is especially important in bridging the communication gap between the deaf/mute community and others who do not understand sign language.
We use a popular dataset called Sign Language MNIST, which includes thousands of labeled hand sign images. Using this data, we train various convolutional neural networks (CNNs) from scratch and also apply transfer learning using a pre-trained deep learning model (ResNet50) to improve accuracy.
This kind of system has the potential to be developed further into real-time applications, such as translating sign language into text or speech, which can be used in schools, hospitals, and public services.
Prerequisites
Before you dive into this project, it's important to have some foundational knowledge and tools ready:
- Programming Skills:
- Basic understanding of Python syntax and functions.
- Some familiarity with data structures like lists, dictionaries, and arrays.
- Mathematics and Machine Learning:
- Understanding of how neural networks work.
- Basics of model training, such as epochs, loss functions, accuracy, and overfitting.
- Image Processing:
- Knowing how image data is represented in arrays.
- Understanding grayscale vs. color images and how images are preprocessed.
- Deep Learning Libraries:
- Some experience with TensorFlow or Keras (for building and training models).
- Using libraries like matplotlib, pandas, numpy for data handling and visualization.
- Development Environment:
- Familiarity with Google Colab or Jupyter Notebook.
- Ability to install and use Python packages.
Approach
We followed a step-by-step and structured approach for the project:
- Dataset Understanding:
- We used the Sign Language MNIST dataset, which contains labeled grayscale images of hand signs for 24 letters (excluding J and Z).
- Data Preprocessing:
- The raw data was in a CSV format, with pixel values for each image.
- We reshaped and normalized the data to make it suitable for CNN models.
- Model Building:
- We started with a basic CNN model.
- Then we built improved versions by adding Dropout, Batch Normalization, and other techniques.
- We used Transfer Learning by importing the ResNet50 model and fine-tuning it for our dataset.
- Evaluation and Comparison:
- We evaluated each model using metrics like accuracy and loss.
- Confusion matrices were used to analyze which signs were commonly misclassified.
- Visualizations of training curves and prediction samples helped compare performance.
- Final Output:
- The final model was able to recognize hand signs with high accuracy, demonstrating the power of deep learning for image classification tasks.
Workflow and Methodologies
The project followed a structured step-by-step process from data handling to model evaluation:
1. Data Loading and Exploration
- Loaded the training and test datasets using Pandas.
- Extracted image pixel values and labels from CSV files.
- Reshaped image data into 28x28 grayscale image arrays.
- Visualized sample images using Matplotlib to understand data distribution.
2. Data Preprocessing
- Normalized pixel values to a 0–1 range by dividing by 255.
- Remapped the labels to a continuous 0–23 range suitable for classification (the models use sparse categorical cross-entropy, so one-hot encoding is not required).
- Split the training data into training and validation sets to monitor performance.
3. Data Augmentation
- Applied transformations like rotation, zoom, shift, and horizontal flip using ImageDataGenerator.
- Augmentation helped increase dataset variety and reduce overfitting.
4. Model Training
- Trained three CNN models with increasing complexity:
- Model 1: Basic CNN using Conv2D and MaxPooling.
- Model 2: Added Dropout and training callbacks like EarlyStopping and ReduceLROnPlateau.
- Model 3: Introduced BatchNormalization for improved training stability.
5. Transfer Learning (Model 4)
- Implemented transfer learning using ResNet50 pre-trained on ImageNet.
- Replaced the top layers with custom layers for 24-class classification.
- Froze base layers to retain learned features, fine-tuned only the top layers.
6. Model Evaluation
- Visualized training and validation accuracy/loss over epochs.
- Created confusion matrices to evaluate per-class predictions.
- Displayed actual vs. predicted images to assess real-world performance.
Data Collection and Preparation
Data Collection
The project used the Sign Language MNIST dataset, which is publicly available on Kaggle. The dataset consists of:
- Training set: 27,455 grayscale images
- Test set: 7,172 grayscale images
- Image size: 28x28 pixels
- Classes: 24 letters with labels in the 0–24 range (label 9 is unused; letters J and Z are excluded because their gestures involve motion)
Data Preparation Workflow
- Loaded data from CSV files using Pandas.
- Separated image pixel values and labels.
- Reshaped the flat pixel arrays into 28x28 image matrices.
- Normalized pixel values to improve training consistency.
- Split the training set into training and validation subsets.
- Augmented data to enhance generalization and robustness.
```python
from google.colab import drive

drive.mount('/content/drive')
```
Installing Required Python Packages
Install the Kaggle API, TensorFlow for deep learning, and Keras Tuner for hyperparameter tuning; a sketch of these install commands is shown below.
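The original install cell is not reproduced in this write-up, so the following is a minimal sketch, assuming a Google Colab environment where the `!pip` shell magic is available and package versions are left unpinned:

```python
# Minimal install sketch (assumed, not the original cell); run inside a Colab notebook
!pip install -q kaggle        # Kaggle API client, useful for downloading the dataset
!pip install -q tensorflow    # deep learning framework (usually preinstalled on Colab)
!pip install -q keras-tuner   # Keras Tuner for hyperparameter search
```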
Project Setup and Library Imports
Next, import the key libraries needed for building and training deep learning models in Google Colab. Google Drive is mounted for easy data access, pandas and numpy handle data manipulation, and TensorFlow Keras is used to create the CNN models. Pretrained architectures such as ResNet50, MobileNetV2, and DenseNet121 are imported for transfer learning. Additional tools, including train/test splitting, confusion matrices for evaluation, matplotlib for visualization, and Keras Tuner for hyperparameter tuning, support a smooth and effective model development process.
```python
from google.colab import drive
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.applications import MobileNetV2, ResNet50, DenseNet121
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
from keras_tuner import HyperModel, RandomSearch
```
Load Dataset from Google Drive
We load the training and testing data for the sign language recognition project from CSV files stored in Google Drive. The data is read into pandas DataFrames for easy handling and analysis.
```python
test_data = pd.read_csv('/content/drive/MyDrive/Badhon/Sign language recognition/sign_mnist_test.csv')
train_data = pd.read_csv('/content/drive/MyDrive/Badhon/Sign language recognition/sign_mnist_train.csv')

print(train_data.shape)
print(test_data.shape)
```
Loading, Processing, and Visualizing Training Data
This step loads the training dataset from a CSV file, extracts the labels and image pixel data, reshapes the flat pixel arrays into 28x28 grayscale images, and normalizes pixel values to the 0-1 range. It then displays the first 10 images with their corresponding labels using matplotlib for a quick visual check of the dataset.
```python
# Load the training data
train_data = pd.read_csv('/content/drive/MyDrive/Badhon/Sign language recognition/sign_mnist_train.csv')

# Extract labels and image data
labels = train_data['label'].values
images = train_data.iloc[:, 1:].values

# Reshape and normalize the images
images = images.reshape(-1, 28, 28).astype('float32') / 255.0

# Select the first 10 images to display
num_images_to_display = 10
selected_indices = np.arange(num_images_to_display)

plt.figure(figsize=(12, 6))
for i, idx in enumerate(selected_indices):
    plt.subplot(2, 5, i + 1)  # 2 rows, 5 columns
    plt.imshow(images[idx], cmap='gray')
    plt.title(f'Label: {labels[idx]}')
    plt.axis('off')
plt.tight_layout()
plt.show()
```
STEP 2:
Data preprocessing
Prepares the data for training by reshaping the image pixel values into 28x28 grayscale images with a single channel and normalizing the pixel values between 0 and 1. It separates the features (X_train, X_test) from the labels (y_train, y_test) for both training and testing datasets, making the data ready for input into a neural network model.
```python
# Preprocess data
X_train = train_data.iloc[:, 1:].values.reshape(-1, 28, 28, 1) / 255.0
y_train = train_data.iloc[:, 0].values
X_test = test_data.iloc[:, 1:].values.reshape(-1, 28, 28, 1) / 255.0
y_test = test_data.iloc[:, 0].values
```
Label Remapping for Consistent Classification
Now, remap the original labels, which range from 0 to 8 and 10 to 24 (skipping 9), into a continuous range from 0 to 23. It creates a dictionary to map old labels to new ones and applies this mapping to both training and testing labels. The process ensures the labels are sequential and suitable for classification tasks without gaps.
```python
# Remap labels to 0-23
unique_labels = sorted(np.unique(y_train))  # [0-8, 10-24]
label_mapping = {old: new for new, old in enumerate(unique_labels)}
y_train_mapped = np.array([label_mapping[label] for label in y_train])
y_test_mapped = np.array([label_mapping[label] for label in y_test])
```
Train-Validation Split
This code divides the training data into two parts: training and validation sets. It uses 80% of the data for training and 20% for validation to help monitor the model’s performance during training. The random_state=42 ensures the split is reproducible.
```python
# Split into train and validation sets
X_train, X_val, y_train_mapped, y_val_mapped = train_test_split(
    X_train, y_train_mapped, test_size=0.2, random_state=42
)
```
Data Augmentation Setup
This code sets up data augmentation to artificially expand the training dataset by applying random transformations like small rotations, shifts, and zooms. These changes help the model generalize better and reduce overfitting during training.
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1
)
```
STEP 3:
Simple CNN Model Training Function
This function train_model1() builds and trains a basic convolutional neural network (CNN) for sign language classification. It includes two Conv2D layers with ReLU activation and max-pooling, followed by a flattening layer and two dense layers. The output layer uses softmax for multi-class classification with 24 classes. The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss. Training uses augmented data and runs for 25 epochs with validation on a separate set. The function returns the trained model and its training history.
```python
def train_model1():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dense(24, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(
        datagen.flow(X_train, y_train_mapped, batch_size=32),
        epochs=25,
        validation_data=(X_val, y_val_mapped)
    )
    return model, history
```
Model 1 Training Initialization
Prints a message indicating the start of Model 1 training, then calls the train_model1() function. The function builds and trains a basic CNN on the sign language dataset and returns the trained model (model1) along with its training history (history1) for performance analysis.
print("Training Model 1...") model1, history1 = train_model1()
CNN Model 2
Improved CNN Model with Dropout and Callbacks
The function train_model2() defines an enhanced CNN model for sign language recognition. It adds a Dropout layer after the dense layer to help prevent overfitting. The data augmentation parameters remain the same for rotation, shifting, and zooming. This model uses two callbacks during training:
EarlyStopping: Stops training if validation accuracy doesn’t improve for 5 epochs and restores the best model weights.
ReduceLROnPlateau: Reduces learning rate by half if validation loss plateaus for 3 epochs, helping the model converge better.
```python
def train_model2():
    datagen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.1
    )
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(24, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    early_stopping = EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True)
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.00001)

    history = model.fit(
        datagen.flow(X_train, y_train_mapped, batch_size=32),
        epochs=25,
        validation_data=(X_val, y_val_mapped),
        callbacks=[early_stopping, reduce_lr]
    )
    return model, history
```
Model 2 Training Initialization
It displays a message that Model 2 training is starting and then runs the train_model2() function, which builds an improved CNN with dropout and callbacks. It returns the trained model (model2) and its training history (history2) for further evaluation.
print("Training Model 2...") model2, history2 = train_model2()
CNN Model 3:
CNN with Batch Normalization
This function defines and trains a more advanced Convolutional Neural Network (CNN) with batch normalization layers after each convolution. Batch normalization helps stabilize and speed up training by normalizing layer inputs. The model uses three convolutional blocks followed by a dense output layer for classifying 24 sign language letters. After compiling the model with the Adam optimizer and sparse categorical crossentropy loss, it is trained using augmented image data for 25 epochs. The function returns the trained model and its training history.
```python
def train_model3():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Conv2D(128, (3, 3), activation='relu'),
        BatchNormalization(),
        Flatten(),
        Dense(256, activation='relu'),
        Dense(24, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(
        datagen.flow(X_train, y_train_mapped, batch_size=32),
        epochs=25,
        validation_data=(X_val, y_val_mapped)
    )
    return model, history
```
Model 3 Training Initialization
This line prints a message indicating that Model 3 training has started. It then calls the train_model3() function, which builds and trains a CNN model with batch normalization layers. The result includes the trained model (model3) and its training history (history3) for performance evaluation and visualization.
print("Training Model 3...") model3, history3 = train_model3()
STEP 4:
Evaluate models on test set
Evaluating Model 1
First, evaluate Model 1 on the test dataset. It calculates the test loss and accuracy using the evaluate() function. Then, it predicts the class labels on the test images and prints the loss and accuracy. The predictions are generated using predict() and converted to class labels using argmax().
```python
test_loss1, test_acc1 = model1.evaluate(X_test, y_test_mapped)
y_pred1 = np.argmax(model1.predict(X_test), axis=1)
print(f"Model 1 - Test Loss: {test_loss1:.4f}, Test Accuracy: {test_acc1:.4f}")
```
Evaluating Model 2
We test Model 2 on the test dataset. It calculates and prints the test loss and accuracy using evaluate(). The predictions are made using predict(), and argmax() converts the predicted probabilities to class labels.
```python
test_loss2, test_acc2 = model2.evaluate(X_test, y_test_mapped)
y_pred2 = np.argmax(model2.predict(X_test), axis=1)
print(f"Model 2 - Test Loss: {test_loss2:.4f}, Test Accuracy: {test_acc2:.4f}")
```
Evaluating Model 3
Evaluates Model 3 using the test dataset. It calculates the test loss and accuracy with evaluate(), then uses predict() followed by argmax() to get the predicted class labels, and finally prints the results.
```python
test_loss3, test_acc3 = model3.evaluate(X_test, y_test_mapped)
y_pred3 = np.argmax(model3.predict(X_test), axis=1)
print(f"Model 3 - Test Loss: {test_loss3:.4f}, Test Accuracy: {test_acc3:.4f}")
```
Plotting Actual vs. Predicted Sign Letters
We defined a mapping from numeric labels to letters for the sign language classes. The function plot_predictions randomly selects 10 test images and displays them with their true and predicted labels. Correct predictions are shown in green, while incorrect ones are in red. This layout helps visually assess the model’s performance on sample test data.
```python
# Define label-to-letter mapping for visualization
letters = 'ABCDEFGHIKLMNOPQRSTUVWXY'  # 24 classes
label_to_letter = {i: letters[i] for i in range(24)}

# Function to plot actual vs. predicted images
def plot_predictions(model, y_pred, title):
    num_images = 10
    indices = np.random.choice(len(X_test), num_images, replace=False)
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.ravel()
    for i, idx in enumerate(indices):
        img = X_test[idx].reshape(28, 28)
        true_letter = label_to_letter[y_test_mapped[idx]]
        pred_letter = label_to_letter[y_pred[idx]]
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'True: {true_letter}\nPred: {pred_letter}',
                          color='green' if true_letter == pred_letter else 'red')
        axes[i].axis('off')
    plt.suptitle(title)
    plt.show()
```
Confusion Matrix Calculation
The confusion matrices for the three trained models are computed by comparing the true test labels against the predicted labels. Each matrix shows, for every sign class, how many samples were classified correctly (the diagonal) and how many were confused with other classes.
```python
cm1 = confusion_matrix(y_test_mapped, y_pred1)
cm2 = confusion_matrix(y_test_mapped, y_pred2)
cm3 = confusion_matrix(y_test_mapped, y_pred3)
```
Visualizing Confusion Matrices
Uses the Seaborn library to draw confusion matrix heatmaps for the three models side by side. The heatmaps make it easy to see which sign classes each model predicts well and which ones it commonly confuses.
```python
import seaborn as sns

fig, axes = plt.subplots(1, 3, figsize=(20, 6))
sns.heatmap(cm1, annot=True, fmt='d', ax=axes[0], cmap='Blues')
axes[0].set_title('Model 1 Confusion Matrix')
sns.heatmap(cm2, annot=True, fmt='d', ax=axes[1], cmap='Blues')
axes[1].set_title('Model 2 Confusion Matrix')
sns.heatmap(cm3, annot=True, fmt='d', ax=axes[2], cmap='Blues')
axes[2].set_title('Model 3 Confusion Matrix')
plt.show()
```
Model Prediction Visualization
Display sample test images with their true and predicted labels for each of the three models. It helps visually assess how well each model is recognizing the sign language letters by showing correct predictions in green and incorrect ones in red.
```python
plot_predictions(model1, y_pred1, 'Model 1 Predictions')
plot_predictions(model2, y_pred2, 'Model 2 Predictions')
plot_predictions(model3, y_pred3, 'Model 3 Predictions')
```
STEP 5:
Transfer Learning with ResNet50
Now, build a transfer learning model using ResNet50 pretrained on ImageNet. It freezes the base layers to keep learned features and adds custom dense layers for classifying 24 sign language gestures. The input images are resized from 28x28 grayscale to 32x32 RGB format by repeating channels and resizing. The model is compiled with the Adam optimizer and trained for 25 epochs using the processed training and validation data. This approach leverages powerful pretrained features to improve accuracy on the sign language dataset.
```python
def build_resnet_model(input_shape, num_classes):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=input_shape)

    # Freeze the base model layers
    for layer in base_model.layers:
        layer.trainable = False

    model = Sequential([
        base_model,
        GlobalAveragePooling2D(),
        Dense(256, activation='relu'),
        Dense(num_classes, activation='softmax')
    ])
    return model

input_shape = (32, 32, 3)
num_classes = 24

resnet_model = build_resnet_model(input_shape, num_classes)
resnet_model.compile(optimizer='adam',
                     loss='sparse_categorical_crossentropy',
                     metrics=['accuracy'])
resnet_model.summary()

X_train_resized = tf.image.resize(np.repeat(X_train, 3, axis=-1), (32, 32)).numpy()
X_val_resized = tf.image.resize(np.repeat(X_val, 3, axis=-1), (32, 32)).numpy()
X_test_resized = tf.image.resize(np.repeat(X_test, 3, axis=-1), (32, 32)).numpy()

# Train the model
history_resnet = resnet_model.fit(
    X_train_resized, y_train_mapped,
    epochs=25,  # Adjust the number of epochs as needed
    batch_size=32,
    validation_data=(X_val_resized, y_val_mapped)
)
```
ResNet50 Model Evaluation on Test Data
Evaluates the trained ResNet50 model on the resized test dataset and prints the test loss and accuracy. It also predicts the test labels for further analysis.
```python
test_loss4, test_acc4 = resnet_model.evaluate(X_test_resized, y_test_mapped)  # Use X_test_resized
y_pred4 = np.argmax(resnet_model.predict(X_test_resized), axis=1)  # Use X_test_resized
print(f"ResNet Model - Test Loss: {test_loss4:.4f}, Test Accuracy: {test_acc4:.4f}")
```
ResNet50 Model Accuracy and Loss Visualization
This code plots the training and validation accuracy and loss curves of the ResNet50 model over epochs, helping visualize model performance and detect overfitting or underfitting trends.
```python
plt.figure(figsize=(10, 5))
plt.plot(history_resnet.history['accuracy'])
plt.plot(history_resnet.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

plt.figure(figsize=(10, 5))
plt.plot(history_resnet.history['loss'])
plt.plot(history_resnet.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
```
ResNet50 Confusion Matrix Visualization
Generates a heatmap of the confusion matrix for the ResNet50 model, clearly showing how well each sign language class was predicted compared to the actual labels.
```python
cm4 = confusion_matrix(y_test_mapped, y_pred4)

plt.figure(figsize=(8, 6))
sns.heatmap(cm4, annot=True, fmt='d', cmap='Blues')
plt.title('ResNet50 Confusion Matrix')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()
```
ResNet50 Actual vs Predicted Sign Letters
This function visually compares true and predicted sign language letters using the ResNet50 model. It displays 10 random test images, showing predictions in green if correct and red if incorrect, providing a clear snapshot of model performance.
```python
def plot_predictions(model, X_test, y_pred, y_true, title):
    num_images = 10
    indices = np.random.choice(len(X_test), num_images, replace=False)
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.ravel()
    for i, idx in enumerate(indices):
        img = X_test[idx].reshape(32, 32, 3)  # Reshape to 32x32x3
        true_letter = label_to_letter[y_true[idx]]
        pred_letter = label_to_letter[y_pred[idx]]
        axes[i].imshow(img)
        axes[i].set_title(f'True: {true_letter}\nPred: {pred_letter}',
                          color='green' if true_letter == pred_letter else 'red')
        axes[i].axis('off')
    plt.suptitle(title)
    plt.show()

plot_predictions(resnet_model, X_test_resized, y_pred4, y_test_mapped, 'ResNet50 Predictions')
```
Accuracy Comparison of CNN and ResNet50 Models
This bar chart visualizes the test accuracy of four different models—Model 1, Model 2, Model 3, and ResNet50—enabling a quick comparison of their performance on the sign language recognition task.
```python
model_names = ['Model 1', 'Model 2', 'Model 3', 'ResNet50']
accuracies = [test_acc1, test_acc2, test_acc3, test_acc4]

plt.figure(figsize=(10, 6))
plt.bar(model_names, accuracies, color=['skyblue', 'lightcoral', 'lightgreen', 'lightsalmon'])
plt.xlabel("Models")
plt.ylabel("Accuracy")
plt.title("Accuracy Comparison of Different Models")
plt.ylim(0, 1)  # Set y-axis limit to 0-1 for accuracy
plt.show()
```
Conclusion
This project demonstrated how deep learning techniques, especially CNNs, can be used to recognize American Sign Language letters from images. We trained multiple models and compared their performance; the ResNet50 model achieved the best accuracy thanks to its depth and pretrained features.
The project builds practical understanding of image classification and shows a real-world application that could be extended into a sign language translation tool. With further work on real-time video processing, it could become a genuinely useful aid for communication.
Challenges New Coders Might Face
- Handling CSV Image Data: Understanding how to convert flat pixel values into images can be tricky for beginners.
- Input Shape Errors: Incorrect reshaping or dimension mismatches are common when feeding data into CNNs or transfer learning models.
- Overfitting: If models are not well regularized, they may seem to perform well on training data but perform poorly on data that they have never seen.
- Using Callbacks Properly: Knowing when to stop training and when to lower the learning rate requires practice and intuition.
- Transfer Learning Setup: Freezing layers, choosing the right input shape, and adding new layers on top of pre-trained models can be confusing at first.
- Plotting Confusion Matrix: Visualizing and interpreting confusion matrices may seem difficult without clear guidance.
Frequently Asked Questions (FAQs)
Q1: Why are only 24 letters used instead of 26?
A: Letters J and Z require motion, which cannot be captured in a still image, so they are excluded from the dataset.
Q2: Why do we normalize pixel values?
A: Normalization helps the model train faster and more efficiently by keeping the input values between 0 and 1.
Q3: What is the purpose of dropout layers?
A: Dropout helps prevent overfitting by randomly turning off neurons during training, forcing the model to learn more general patterns.
Q4: Why use a pre-trained model like ResNet50?
A: Pre-trained models have already learned useful image features, which helps improve performance even with small datasets.
Q5: Can this model be used in real-time with a webcam?
A: Yes, with further development. You'll need to capture frames in real time, preprocess them the same way as the training data, and feed them into the trained model for prediction; a rough sketch is shown below.
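The webcam pipeline is not part of the original project, so the following is only a minimal sketch, assuming one of the trained 28x28 grayscale models (for example model3) was saved to a hypothetical file sign_cnn.h5, that OpenCV (cv2) is installed, and that the hand appears inside an arbitrarily chosen region of interest:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

letters = 'ABCDEFGHIKLMNOPQRSTUVWXY'  # same 24-class mapping used above

# Assumption: a trained 28x28 grayscale model was saved earlier, e.g. model3.save('sign_cnn.h5')
model = load_model('sign_cnn.h5')  # hypothetical file name

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Hypothetical region of interest where the hand is expected
    roi = frame[100:400, 100:400]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(gray, (28, 28)).astype('float32') / 255.0

    # Predict the letter for the current frame
    pred = model.predict(img.reshape(1, 28, 28, 1), verbose=0)
    letter = letters[int(np.argmax(pred))]

    # Draw the ROI and predicted letter on the frame
    cv2.rectangle(frame, (100, 100), (400, 400), (0, 255, 0), 2)
    cv2.putText(frame, f'Pred: {letter}', (100, 90),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Sign language demo', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```

In practice you would also need hand detection or a fixed capture setup, since the model was trained on tightly cropped, centered hand images.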