Sign language recognition

This project detects and classifies American Sign Language (ASL) alphabets using deep learning. It uses CNN models and ResNet50 to recognize 24 hand gestures from grayscale images, helping convert sign language into readable text.

Project Overview

The goal of this project is to build a computer vision system that can recognize and classify hand signs from the American Sign Language (ASL) alphabet using image data. The system is designed to take grayscale images of hand gestures and predict the corresponding letter. This project is especially important in bridging the communication gap between the deaf/mute community and others who do not understand sign language.

We use a popular dataset called Sign Language MNIST, which includes thousands of labeled hand sign images. Using this data, we train various convolutional neural networks (CNNs) from scratch and also apply transfer learning using a pre-trained deep learning model (ResNet50) to improve accuracy.

This kind of system has the potential to be developed further into real-time applications, such as translating sign language into text or speech, which can be used in schools, hospitals, and public services.

Prerequisites

Before you dive into this project, it's important to have some foundational knowledge and tools ready:

Programming Skills:
- Basic understanding of Python syntax and functions.
- Some familiarity with data structures like lists, dictionaries, and arrays.
Mathematics and Machine Learning:
- Understanding of how neural networks work.
- Basics of model training, such as epochs, loss functions, accuracy, and overfitting.
Image Processing:
- Knowing how image data is represented in arrays.
- Understanding grayscale vs. color images and how images are preprocessed.
Deep Learning Libraries:
- Some experience with TensorFlow or Keras (for building and training models).
- Using libraries like matplotlib, pandas, numpy for data handling and visualization.
Development Environment:
- Familiarity with Google Colab or Jupyter Notebook.
- Ability to install and use Python packages.

Approach

We followed a step-by-step and structured approach for the project:

Dataset Understanding:
- We used the Sign Language MNIST dataset, which contains labeled grayscale images of hand signs for 24 letters (excluding J and Z).
Data Preprocessing:
- The raw data was in a CSV format, with pixel values for each image.
- We reshaped and normalized the data to make it suitable for CNN models.
Model Building:
- We started with a basic CNN model.
- Then we built improved versions by adding Dropout, Batch Normalization, and other techniques.
- We used Transfer Learning by importing the ResNet50 model and fine-tuning it for our dataset.
Evaluation and Comparison:
- We evaluated each model using metrics like accuracy and loss.
- Confusion matrices were used to analyze which signs were commonly misclassified.
- Visualizations of training curves and prediction samples helped compare performance.
Final Output:
- The final model was able to recognize hand signs with high accuracy, demonstrating the power of deep learning for image classification tasks.

Workflow and Methodologies

The project followed a structured step-by-step process from data handling to model evaluation:

1. Data Loading and Exploration

Loaded the training and test datasets using Pandas.
Extracted image pixel values and labels from CSV files.
Reshaped image data into 28x28 grayscale image arrays.
Visualized sample images using Matplotlib to understand data distribution.

2. Data Preprocessing

Normalized pixel values to a 0–1 range by dividing by 255.
Converted labels to one-hot encoded format for classification.
Split the training data into training and validation sets to monitor performance.

3. Data Augmentation

Applied transformations like rotation, zoom, shift, and horizontal flip using ImageDataGenerator.
Augmentation helped increase dataset variety and reduce overfitting.

4. Model Training

Trained three CNN models with increasing complexity:
- Model 1: Basic CNN using Conv2D and MaxPooling.
- Model 2: Added Dropout and training callbacks like EarlyStopping and ReduceLROnPlateau.
- Model 3: Introduced BatchNormalization for improved training stability.

5. Transfer Learning (Model 4)

Implemented transfer learning using ResNet50 pre-trained on ImageNet.
Replaced the top layers with custom layers for 26-class classification.
Froze base layers to retain learned features, fine-tuned only the top layers.

6. Model Evaluation

Visualized training and validation accuracy/loss over epochs.
Created confusion matrices to evaluate per-class predictions.
Displayed actual vs. predicted images to assess real-world performance.

Data Collection and Preparation

Data Collection

The project used the Sign Language MNIST dataset, which is publicly available on Kaggle. The dataset consists of:

Training set: 27,455 grayscale images
Test set: 7,172 grayscale images
Image size: 28x28 pixels
Classes: 0 to 25 (excluding letters J and Z due to their dynamic gestures)

Data Preparation Workflow

Loaded data from CSV files using Pandas.
Separated image pixel values and labels.
Reshaped the flat pixel arrays into 28x28 image matrices.
Normalized pixel values to improve training consistency.
Split the training set into training and validation subsets.
Augmented data to enhance generalization and robustness.

Code Explanation

STEP 1:

Mounting Google Drive in Colab

Connects Google Drive to Colab. After authorization, files can be accessed at /content/drive for reading or writing.

from google.colab import drive
drive.mount('/content/drive')

Installing Required Python Packages

Installs Kaggle API, TensorFlow for deep learning, and Keras Tuner for hyperparameter tuning.

!pip install kaggle
!pip install tensorflow
!pip install keras-tuner

Project Setup and Library Imports

Now, import key libraries needed for building and training deep learning models in Google Colab. It connects Google Drive for easy data access, uses pandas and numpy for data manipulation, and TensorFlow Keras for creating CNN models. Pretrained architectures like ResNet50, MobileNetV2, and DenseNet121 are included for transfer learning. Additional tools like train-test splitting, performance evaluation with confusion matrices, visualization using matplotlib, and hyperparameter tuning with Keras Tuner ensure a smooth and effective model development process.

from google.colab import drive
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.applications import MobileNetV2, ResNet50, DenseNet121
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
from keras_tuner import HyperModel, RandomSearch

Load Dataset from Google Drive

We load the training and testing data for the sign language recognition project from CSV files stored in Google Drive. The data is read into pandas DataFrames for easy handling and analysis.

test_data = pd.read_csv('/content/drive/MyDrive/Sign language recognition/sign_mnist_test.csv')
train_data = pd.read_csv('/content/drive/MyDrive/Sign language recognition/sign_mnist_train.csv')
train_data.shape
test_data.shape

Loading, Processing, and Visualizing Training Data

Now, load the training dataset from a CSV file, extract the labels and image pixel data, reshape the flat pixel arrays into 28x28 grayscale images, and normalize pixel values to a 0-1 range. It then displays the first 10 images with their corresponding labels using matplotlib for a quick visual check of the dataset.

# Load the training data
train_data = pd.read_csv('/content/drive/MyDrive/Sign language recognition/sign_mnist_train.csv')
# Extract labels and image data
labels = train_data['label'].values
images = train_data.iloc[:, 1:].values
# Reshape and normalize the images
images = images.reshape(-1, 28, 28).astype('float32') / 255.0
# Select the first 10 images to display
num_images_to_display = 10
selected_indices = np.arange(num_images_to_display)
plt.figure(figsize=(12, 6))
for i, idx in enumerate(selected_indices):
plt.subplot(2, 5, i + 1)  # 2 rows, 5 columns
plt.imshow(images[idx], cmap='gray')
plt.title(f'Label: {labels[idx]}')
plt.axis('off')
plt.tight_layout()
plt.show()

STEP 2:

Data preprocessing

Prepares the data for training by reshaping the image pixel values into 28x28 grayscale images with a single channel and normalizing the pixel values between 0 and 1. It separates the features (X_train, X_test) from the labels (y_train, y_test) for both training and testing datasets, making the data ready for input into a neural network model.

# Preprocess data
X_train = train_data.iloc[:, 1:].values.reshape(-1, 28, 28, 1) / 255.0
y_train = train_data.iloc[:, 0].values
X_test = test_data.iloc[:, 1:].values.reshape(-1, 28, 28, 1) / 255.0
y_test = test_data.iloc[:, 0].values

Label Remapping for Consistent Classification

Now, remap the original labels, which range from 0 to 8 and 10 to 24 (skipping 9), into a continuous range from 0 to 23. It creates a dictionary to map old labels to new ones and applies this mapping to both training and testing labels. The process ensures the labels are sequential and suitable for classification tasks without gaps.

# Remap labels to 0-23
unique_labels = sorted(np.unique(y_train))  # [0-8, 10-24]
label_mapping = {old: new for new, old in enumerate(unique_labels)}
y_train_mapped = np.array([label_mapping[label] for label in y_train])
y_test_mapped = np.array([label_mapping[label] for label in y_test])

Train-Validation Split

This code divides the training data into two parts: training and validation sets. It uses 80% of the data for training and 20% for validation to help monitor the model’s performance during training. The random_state=42 ensures the split is reproducible.

# Split into train and validation sets
X_train, X_val, y_train_mapped, y_val_mapped = train_test_split(
X_train, y_train_mapped, test_size=0.2, random_state=42
)

Data Augmentation Setup

This code sets up data augmentation to artificially expand the training dataset by applying random transformations like small rotations, shifts, and zooms. These changes help the model generalize better and reduce overfitting during training.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(
rotation_range=10,
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.1
)

STEP 3:

Simple CNN Model Training Function

This function train_model1() builds and trains a basic convolutional neural network (CNN) for sign language classification. It includes two Conv2D layers with ReLU activation and max-pooling, followed by a flattening layer and two dense layers. The output layer uses softmax for multi-class classification with 24 classes. The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss. Training uses augmented data and runs for 25 epochs with validation on a separate set. The function returns the trained model and its training history.