- Medical Image Segmentation With UNET
- Real-Time License Plate Detection Using YOLOv8 and OCR Model
- Real-Time Human Pose Detection With YOLOv8 Models
- Customer Service Chatbot Using LLMs
- Document Summarization Using Sentencepiece Transformers
- Semantic Search Using Msmarco Distilbert Base & Faiss Vector Database
- Question Answer System Training With Distilbert Base Uncased
- Image Generation Model Fine Tuning With Diffusers Models
- Predictive Analytics on Business License Data Using Deep Learning
- Complete CNN Image Classification Models for Real Time Prediction
- Linear Regression Modeling for Soccer Player Performance Prediction in the EPL
- Insurance Pricing Forecast Using XGBoost Regressor
- Chatbots with Generative AI Models
- Nutritionist Generative AI Doctor using Gemini
- Cervical Cancer Detection Using Deep Learning
- Skin Cancer Detection Using Deep Learning
- Blood Cell Classification Using Deep Learning
- Glaucoma Detection Using Deep Learning
- Leaf Disease Detection Using Deep Learning
- Banana Leaf Disease Detection using Vision Transformer model
- Vegetable classification with Parallel CNN model
- Crop Disease Detection Using YOLOv8
- Automatic Eye Cataract Detection Using YOLOv8
- Voice Cloning Application Using RVC
- Learn to Build a Polynomial Regression Model from Scratch
- Loan Eligibility Prediction using Gradient Boosting Classifier
- BigMart Sales Prediction ML Project in Python
- Word2Vec and FastText Word Embedding with Gensim in Python
- Build Regression (Linear, Ridge, Lasso) Models in NumPy Python
- Build a Customer Churn Prediction Model using Decision Trees
- Build Regression Models in Python for House Price Prediction
- Credit Card Default Prediction Using Machine Learning Techniques
- Topic modeling using K-means clustering to group customer reviews
- NLP Project for Beginners on Text Processing and Classification
- Skip Gram Model Python Implementation for Word Embeddings
- Sentiment Analysis for Mental Health Using NLP & ML
- Time Series Analysis and Prediction of Healthcare Trends Using Gaussian Process Regression
- Build a Hybrid Recommender System in Python using LightFM
- Time Series Forecasting Using Multiple Linear Regression Model
- Build an Autoregressive and Moving Average Time Series Model
- Time Series Forecasting with ARIMA and SARIMAX Models in Python
- Build Multi-Class Text Classification Models with RNN and LSTM
- Build A Book Recommender System With TF-IDF And Clustering(Python)
- Multi-Modal Retrieval-Augmented Generation (RAG) with Text and Image Processing
- PyTorch Project to Build a GAN Model on MNIST Dataset
- Build ARCH and GARCH Models in Time Series using Python
- Human Action Recognition Using Image Preprocessing
- Time Series Analysis with Facebook Prophet Python and Cesium
- Build a Face Recognition System Using FaceNet in Python
- Build a Collaborative Filtering Recommender System in Python
- Image Segmentation using Mask R CNN with PyTorch
- Fusion Retrieval: Combining Vector Search and BM25 for Enhanced Document Retrieval
- HyDE-Powered Document Retrieval Using DeepSeek
- Graph-Enhanced Retrieval-Augmented Generation (GRAPH-RAG)
- Context Enrichment Window Around Chunks Using LlamaIndex
- Document Augmentation through Question Generation for Enhanced Retrieval
- Enhancing Document Retrieval with Contextual Overlapping Windows
- Corrective Retrieval-Augmented Generation (RAG) with Dynamic Adjustments
- Optimizing Chunk Sizes for Efficient and Accurate Document Retrieval Using HyDE Evaluation
- Sign language recognition
Project Overview
The goal of this project is to build a computer vision system that can recognize and classify hand signs from the American Sign Language (ASL) alphabet using image data. The system is designed to take grayscale images of hand gestures and predict the corresponding letter. This project is especially important in bridging the communication gap between the deaf/mute community and others who do not understand sign language.
We use a popular dataset called Sign Language MNIST, which includes thousands of labeled hand sign images. Using this data, we train various convolutional neural networks (CNNs) from scratch and also apply transfer learning using a pre-trained deep learning model (ResNet50) to improve accuracy.
This kind of system has the potential to be developed further into real-time applications, such as translating sign language into text or speech, which can be used in schools, hospitals, and public services.
Prerequisites
Before you dive into this project, it's important to have some foundational knowledge and tools ready:
- Programming Skills:
- Basic understanding of Python syntax and functions.
- Some familiarity with data structures like lists, dictionaries, and arrays.
- Mathematics and Machine Learning:
- Understanding of how neural networks work.
- Basics of model training, such as epochs, loss functions, accuracy, and overfitting.
- Image Processing:
- Knowing how image data is represented in arrays.
- Understanding grayscale vs. color images and how images are preprocessed.
- Deep Learning Libraries:
- Some experience with TensorFlow or Keras (for building and training models).
- Using libraries like matplotlib, pandas, numpy for data handling and visualization.
- Development Environment:
- Familiarity with Google Colab or Jupyter Notebook.
- Ability to install and use Python packages.
Approach
We followed a step-by-step and structured approach for the project:
- Dataset Understanding:
- We used the Sign Language MNIST dataset, which contains labeled grayscale images of hand signs for 24 letters (excluding J and Z).
- Data Preprocessing:
- The raw data was in a CSV format, with pixel values for each image.
- We reshaped and normalized the data to make it suitable for CNN models.
- Model Building:
- We started with a basic CNN model.
- Then we built improved versions by adding Dropout, Batch Normalization, and other techniques.
- We used Transfer Learning by importing the ResNet50 model and fine-tuning it for our dataset.
- Evaluation and Comparison:
- We evaluated each model using metrics like accuracy and loss.
- Confusion matrices were used to analyze which signs were commonly misclassified.
- Visualizations of training curves and prediction samples helped compare performance.
- Final Output:
- The final model was able to recognize hand signs with high accuracy, demonstrating the power of deep learning for image classification tasks.
Workflow and Methodologies
The project followed a structured step-by-step process from data handling to model evaluation:
1. Data Loading and Exploration
- Loaded the training and test datasets using Pandas.
- Extracted image pixel values and labels from CSV files.
- Reshaped image data into 28x28 grayscale image arrays.
- Visualized sample images using Matplotlib to understand data distribution.
2. Data Preprocessing
- Normalized pixel values to a 0–1 range by dividing by 255.
- Remapped the labels to a continuous 0–23 range suitable for classification (the models use sparse categorical cross-entropy, so one-hot encoding is not required).
- Split the training data into training and validation sets to monitor performance.
3. Data Augmentation
- Applied transformations like rotation, zoom, shift, and horizontal flip using ImageDataGenerator.
- Augmentation helped increase dataset variety and reduce overfitting.
4. Model Training
- Trained three CNN models with increasing complexity:
- Model 1: Basic CNN using Conv2D and MaxPooling.
- Model 2: Added Dropout and training callbacks like EarlyStopping and ReduceLROnPlateau.
- Model 3: Introduced BatchNormalization for improved training stability.
5. Transfer Learning (Model 4)
- Implemented transfer learning using ResNet50 pre-trained on ImageNet.
- Replaced the top layers with custom layers for 24-class classification.
- Froze base layers to retain learned features, fine-tuned only the top layers.
6. Model Evaluation
- Visualized training and validation accuracy/loss over epochs.
- Created confusion matrices to evaluate per-class predictions.
- Displayed actual vs. predicted images to assess real-world performance.
Data Collection and Preparation
Data Collection
The project used the Sign Language MNIST dataset, which is publicly available on Kaggle. The dataset consists of:
- Training set: 27,455 grayscale images
- Test set: 7,172 grayscale images
- Image size: 28x28 pixels
- Classes: 24 letters with labels in the 0–24 range (label 9 is unused; letters J and Z are excluded because their gestures involve motion)
Data Preparation Workflow
- Loaded data from CSV files using Pandas.
- Separated image pixel values and labels.
- Reshaped the flat pixel arrays into 28x28 image matrices.
- Normalized pixel values to improve training consistency.
- Split the training set into training and validation subsets.
- Augmented data to enhance generalization and robustness.
```python
from google.colab import drive

drive.mount('/content/drive')
```
Installing Required Python Packages
Install the Kaggle API, TensorFlow for deep learning, and Keras Tuner for hyperparameter tuning; a sketch of these install commands is shown below.
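The original install cell is not reproduced in this write-up, so the following is a minimal sketch, assuming a Google Colab environment where the `!pip` shell magic is available and package versions are left unpinned:

```python
# Minimal install sketch (assumed, not the original cell); run inside a Colab notebook
!pip install -q kaggle        # Kaggle API client, useful for downloading the dataset
!pip install -q tensorflow    # deep learning framework (usually preinstalled on Colab)
!pip install -q keras-tuner   # Keras Tuner for hyperparameter search
```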
Project Setup and Library Imports
Next, import the key libraries needed for building and training deep learning models in Google Colab. Google Drive is mounted for easy data access, pandas and numpy handle data manipulation, and TensorFlow Keras is used to create the CNN models. Pretrained architectures such as ResNet50, MobileNetV2, and DenseNet121 are imported for transfer learning. Additional tools, including train/test splitting, confusion matrices for evaluation, matplotlib for visualization, and Keras Tuner for hyperparameter tuning, support a smooth and effective model development process.
```python
from google.colab import drive
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, BatchNormalization
from tensorflow.keras.applications import MobileNetV2, ResNet50, DenseNet121
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV
from keras_tuner import HyperModel, RandomSearch
```
Load Dataset from Google Drive
We load the training and testing data for the sign language recognition project from CSV files stored in Google Drive. The data is read into pandas DataFrames for easy handling and analysis.
```python
test_data = pd.read_csv('/content/drive/MyDrive/Badhon/Sign language recognition/sign_mnist_test.csv')
train_data = pd.read_csv('/content/drive/MyDrive/Badhon/Sign language recognition/sign_mnist_train.csv')

print(train_data.shape)
print(test_data.shape)
```
Loading, Processing, and Visualizing Training Data
This step loads the training dataset from a CSV file, extracts the labels and image pixel data, reshapes the flat pixel arrays into 28x28 grayscale images, and normalizes pixel values to the 0-1 range. It then displays the first 10 images with their corresponding labels using matplotlib for a quick visual check of the dataset.
```python
# Load the training data
train_data = pd.read_csv('/content/drive/MyDrive/Badhon/Sign language recognition/sign_mnist_train.csv')

# Extract labels and image data
labels = train_data['label'].values
images = train_data.iloc[:, 1:].values

# Reshape and normalize the images
images = images.reshape(-1, 28, 28).astype('float32') / 255.0

# Select the first 10 images to display
num_images_to_display = 10
selected_indices = np.arange(num_images_to_display)

plt.figure(figsize=(12, 6))
for i, idx in enumerate(selected_indices):
    plt.subplot(2, 5, i + 1)  # 2 rows, 5 columns
    plt.imshow(images[idx], cmap='gray')
    plt.title(f'Label: {labels[idx]}')
    plt.axis('off')
plt.tight_layout()
plt.show()
```
STEP 2:
Data preprocessing
Prepares the data for training by reshaping the image pixel values into 28x28 grayscale images with a single channel and normalizing the pixel values between 0 and 1. It separates the features (X_train, X_test) from the labels (y_train, y_test) for both training and testing datasets, making the data ready for input into a neural network model.
```python
# Preprocess data
X_train = train_data.iloc[:, 1:].values.reshape(-1, 28, 28, 1) / 255.0
y_train = train_data.iloc[:, 0].values
X_test = test_data.iloc[:, 1:].values.reshape(-1, 28, 28, 1) / 255.0
y_test = test_data.iloc[:, 0].values
```
Label Remapping for Consistent Classification
Now, remap the original labels, which range from 0 to 8 and 10 to 24 (skipping 9), into a continuous range from 0 to 23. It creates a dictionary to map old labels to new ones and applies this mapping to both training and testing labels. The process ensures the labels are sequential and suitable for classification tasks without gaps.
```python
# Remap labels to 0-23
unique_labels = sorted(np.unique(y_train))  # [0-8, 10-24]
label_mapping = {old: new for new, old in enumerate(unique_labels)}
y_train_mapped = np.array([label_mapping[label] for label in y_train])
y_test_mapped = np.array([label_mapping[label] for label in y_test])
```
Train-Validation Split
This code divides the training data into two parts: training and validation sets. It uses 80% of the data for training and 20% for validation to help monitor the model’s performance during training. The random_state=42 ensures the split is reproducible.
```python
# Split into train and validation sets
X_train, X_val, y_train_mapped, y_val_mapped = train_test_split(
    X_train, y_train_mapped, test_size=0.2, random_state=42
)
```
Data Augmentation Setup
This code sets up data augmentation to artificially expand the training dataset by applying random transformations like small rotations, shifts, and zooms. These changes help the model generalize better and reduce overfitting during training.
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1
)
```
STEP 3:
Simple CNN Model Training Function
This function train_model1() builds and trains a basic convolutional neural network (CNN) for sign language classification. It includes two Conv2D layers with ReLU activation and max-pooling, followed by a flattening layer and two dense layers. The output layer uses softmax for multi-class classification with 24 classes. The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss. Training uses augmented data and runs for 25 epochs with validation on a separate set. The function returns the trained model and its training history.
```python
def train_model1():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dense(24, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(
        datagen.flow(X_train, y_train_mapped, batch_size=32),
        epochs=25,
        validation_data=(X_val, y_val_mapped)
    )
    return model, history
```
Model 1 Training Initialization
Prints a message indicating the start of Model 1 training, then calls the train_model1() function. The function builds and trains a basic CNN on the sign language dataset and returns the trained model (model1) along with its training history (history1) for performance analysis.
print("Training Model 1...") model1, history1 = train_model1()
CNN Model 2
Improved CNN Model with Dropout and Callbacks
The function train_model2() defines an enhanced CNN model for sign language recognition. It adds a Dropout layer after the dense layer to help prevent overfitting. The data augmentation parameters remain the same for rotation, shifting, and zooming. This model uses two callbacks during training:
EarlyStopping: Stops training if validation accuracy doesn’t improve for 5 epochs and restores the best model weights.
ReduceLROnPlateau: Reduces learning rate by half if validation loss plateaus for 3 epochs, helping the model converge better.
```python
def train_model2():
    datagen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.1
    )
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(24, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    early_stopping = EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True)
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=0.00001)

    history = model.fit(
        datagen.flow(X_train, y_train_mapped, batch_size=32),
        epochs=25,
        validation_data=(X_val, y_val_mapped),
        callbacks=[early_stopping, reduce_lr]
    )
    return model, history
```
Model 2 Training Initialization
It displays a message that Model 2 training is starting and then runs the train_model2() function, which builds an improved CNN with dropout and callbacks. It returns the trained model (model2) and its training history (history2) for further evaluation.
print("Training Model 2...") model2, history2 = train_model2()
CNN Model 3:
CNN with Batch Normalization
This function defines and trains a more advanced Convolutional Neural Network (CNN) with batch normalization layers after each convolution. Batch normalization helps stabilize and speed up training by normalizing layer inputs. The model uses three convolutional blocks followed by a dense output layer for classifying 24 sign language letters. After compiling the model with the Adam optimizer and sparse categorical crossentropy loss, it is trained using augmented image data for 25 epochs. The function returns the trained model and its training history.
```python
def train_model3():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        BatchNormalization(),
        MaxPooling2D((2, 2)),
        Conv2D(128, (3, 3), activation='relu'),
        BatchNormalization(),
        Flatten(),
        Dense(256, activation='relu'),
        Dense(24, activation='softmax')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    history = model.fit(
        datagen.flow(X_train, y_train_mapped, batch_size=32),
        epochs=25,
        validation_data=(X_val, y_val_mapped)
    )
    return model, history
```
Model 3 Training Initialization
This line prints a message indicating that Model 3 training has started. It then calls the train_model3() function, which builds and trains a CNN model with batch normalization layers. The result includes the trained model (model3) and its training history (history3) for performance evaluation and visualization.
print("Training Model 3...") model3, history3 = train_model3()
STEP 4:
Evaluate models on test set
Evaluating Model 1
First, evaluate Model 1 on the test dataset. It calculates the test loss and accuracy using the evaluate() function. Then, it predicts the class labels on the test images and prints the loss and accuracy. The predictions are generated using predict() and converted to class labels using argmax().
```python
test_loss1, test_acc1 = model1.evaluate(X_test, y_test_mapped)
y_pred1 = np.argmax(model1.predict(X_test), axis=1)
print(f"Model 1 - Test Loss: {test_loss1:.4f}, Test Accuracy: {test_acc1:.4f}")
```
Evaluating Model 2
We test Model 2 on the test dataset. It calculates and prints the test loss and accuracy using evaluate(). The predictions are made using predict(), and argmax() converts the predicted probabilities to class labels.
```python
test_loss2, test_acc2 = model2.evaluate(X_test, y_test_mapped)
y_pred2 = np.argmax(model2.predict(X_test), axis=1)
print(f"Model 2 - Test Loss: {test_loss2:.4f}, Test Accuracy: {test_acc2:.4f}")
```
Evaluating Model 3
Evaluates Model 3 using the test dataset. It calculates the test loss and accuracy with evaluate(), then uses predict() followed by argmax() to get the predicted class labels, and finally prints the results.
```python
test_loss3, test_acc3 = model3.evaluate(X_test, y_test_mapped)
y_pred3 = np.argmax(model3.predict(X_test), axis=1)
print(f"Model 3 - Test Loss: {test_loss3:.4f}, Test Accuracy: {test_acc3:.4f}")
```
Plotting Actual vs. Predicted Sign Letters
We defined a mapping from numeric labels to letters for the sign language classes. The function plot_predictions randomly selects 10 test images and displays them with their true and predicted labels. Correct predictions are shown in green, while incorrect ones are in red. This layout helps visually assess the model’s performance on sample test data.
```python
# Define label-to-letter mapping for visualization
letters = 'ABCDEFGHIKLMNOPQRSTUVWXY'  # 24 classes
label_to_letter = {i: letters[i] for i in range(24)}

# Function to plot actual vs. predicted images
def plot_predictions(model, y_pred, title):
    num_images = 10
    indices = np.random.choice(len(X_test), num_images, replace=False)
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.ravel()
    for i, idx in enumerate(indices):
        img = X_test[idx].reshape(28, 28)
        true_letter = label_to_letter[y_test_mapped[idx]]
        pred_letter = label_to_letter[y_pred[idx]]
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(f'True: {true_letter}\nPred: {pred_letter}',
                          color='green' if true_letter == pred_letter else 'red')
        axes[i].axis('off')
    plt.suptitle(title)
    plt.show()
```
Confusion Matrix Calculation
The confusion matrices for the three trained models are computed by comparing the true test labels against the predicted labels. Each matrix shows, for every sign class, how many samples were classified correctly (the diagonal) and how many were confused with other classes.
```python
cm1 = confusion_matrix(y_test_mapped, y_pred1)
cm2 = confusion_matrix(y_test_mapped, y_pred2)
cm3 = confusion_matrix(y_test_mapped, y_pred3)
```
Visualizing Confusion Matrices
Uses the Seaborn library to draw confusion matrix heatmaps for the three models side by side. The heatmaps make it easy to see which sign classes each model predicts well and which ones it commonly confuses.
```python
import seaborn as sns

fig, axes = plt.subplots(1, 3, figsize=(20, 6))
sns.heatmap(cm1, annot=True, fmt='d', ax=axes[0], cmap='Blues')
axes[0].set_title('Model 1 Confusion Matrix')
sns.heatmap(cm2, annot=True, fmt='d', ax=axes[1], cmap='Blues')
axes[1].set_title('Model 2 Confusion Matrix')
sns.heatmap(cm3, annot=True, fmt='d', ax=axes[2], cmap='Blues')
axes[2].set_title('Model 3 Confusion Matrix')
plt.show()
```
Model Prediction Visualization
Display sample test images with their true and predicted labels for each of the three models. It helps visually assess how well each model is recognizing the sign language letters by showing correct predictions in green and incorrect ones in red.
```python
plot_predictions(model1, y_pred1, 'Model 1 Predictions')
plot_predictions(model2, y_pred2, 'Model 2 Predictions')
plot_predictions(model3, y_pred3, 'Model 3 Predictions')
```
STEP 5:
Transfer Learning with ResNet50
Now, build a transfer learning model using ResNet50 pretrained on ImageNet. It freezes the base layers to keep learned features and adds custom dense layers for classifying 24 sign language gestures. The input images are resized from 28x28 grayscale to 32x32 RGB format by repeating channels and resizing. The model is compiled with the Adam optimizer and trained for 25 epochs using the processed training and validation data. This approach leverages powerful pretrained features to improve accuracy on the sign language dataset.
```python
def build_resnet_model(input_shape, num_classes):
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=input_shape)

    # Freeze the base model layers
    for layer in base_model.layers:
        layer.trainable = False

    model = Sequential([
        base_model,
        GlobalAveragePooling2D(),
        Dense(256, activation='relu'),
        Dense(num_classes, activation='softmax')
    ])
    return model

input_shape = (32, 32, 3)
num_classes = 24

resnet_model = build_resnet_model(input_shape, num_classes)
resnet_model.compile(optimizer='adam',
                     loss='sparse_categorical_crossentropy',
                     metrics=['accuracy'])
resnet_model.summary()

X_train_resized = tf.image.resize(np.repeat(X_train, 3, axis=-1), (32, 32)).numpy()
X_val_resized = tf.image.resize(np.repeat(X_val, 3, axis=-1), (32, 32)).numpy()
X_test_resized = tf.image.resize(np.repeat(X_test, 3, axis=-1), (32, 32)).numpy()

# Train the model
history_resnet = resnet_model.fit(
    X_train_resized, y_train_mapped,
    epochs=25,  # Adjust the number of epochs as needed
    batch_size=32,
    validation_data=(X_val_resized, y_val_mapped)
)
```
ResNet50 Model Evaluation on Test Data
Evaluates the trained ResNet50 model on the resized test dataset and prints the test loss and accuracy. It also predicts the test labels for further analysis.
```python
test_loss4, test_acc4 = resnet_model.evaluate(X_test_resized, y_test_mapped)  # Use X_test_resized
y_pred4 = np.argmax(resnet_model.predict(X_test_resized), axis=1)  # Use X_test_resized
print(f"ResNet Model - Test Loss: {test_loss4:.4f}, Test Accuracy: {test_acc4:.4f}")
```
ResNet50 Model Accuracy and Loss Visualization
This code plots the training and validation accuracy and loss curves of the ResNet50 model over epochs, helping visualize model performance and detect overfitting or underfitting trends.
```python
plt.figure(figsize=(10, 5))
plt.plot(history_resnet.history['accuracy'])
plt.plot(history_resnet.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

plt.figure(figsize=(10, 5))
plt.plot(history_resnet.history['loss'])
plt.plot(history_resnet.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
```
ResNet50 Confusion Matrix Visualization
Generates a heatmap of the confusion matrix for the ResNet50 model, clearly showing how well each sign language class was predicted compared to the actual labels.
```python
cm4 = confusion_matrix(y_test_mapped, y_pred4)

plt.figure(figsize=(8, 6))
sns.heatmap(cm4, annot=True, fmt='d', cmap='Blues')
plt.title('ResNet50 Confusion Matrix')
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()
```
ResNet50 Actual vs Predicted Sign Letters
This function visually compares true and predicted sign language letters using the ResNet50 model. It displays 10 random test images, showing predictions in green if correct and red if incorrect, providing a clear snapshot of model performance.
```python
def plot_predictions(model, X_test, y_pred, y_true, title):
    num_images = 10
    indices = np.random.choice(len(X_test), num_images, replace=False)
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.ravel()
    for i, idx in enumerate(indices):
        img = X_test[idx].reshape(32, 32, 3)  # Reshape to 32x32x3
        true_letter = label_to_letter[y_true[idx]]
        pred_letter = label_to_letter[y_pred[idx]]
        axes[i].imshow(img)
        axes[i].set_title(f'True: {true_letter}\nPred: {pred_letter}',
                          color='green' if true_letter == pred_letter else 'red')
        axes[i].axis('off')
    plt.suptitle(title)
    plt.show()

plot_predictions(resnet_model, X_test_resized, y_pred4, y_test_mapped, 'ResNet50 Predictions')
```
Accuracy Comparison of CNN and ResNet50 Models
This bar chart visualizes the test accuracy of four different models—Model 1, Model 2, Model 3, and ResNet50—enabling a quick comparison of their performance on the sign language recognition task.
```python
model_names = ['Model 1', 'Model 2', 'Model 3', 'ResNet50']
accuracies = [test_acc1, test_acc2, test_acc3, test_acc4]

plt.figure(figsize=(10, 6))
plt.bar(model_names, accuracies, color=['skyblue', 'lightcoral', 'lightgreen', 'lightsalmon'])
plt.xlabel("Models")
plt.ylabel("Accuracy")
plt.title("Accuracy Comparison of Different Models")
plt.ylim(0, 1)  # Set y-axis limit to 0-1 for accuracy
plt.show()
```
Conclusion
This project demonstrated how deep learning techniques, especially CNNs, can be used to recognize American Sign Language letters from images. We trained multiple models and compared their performance; the ResNet50 model achieved the best accuracy thanks to its depth and pretrained features.
The project builds practical understanding of image classification and shows a real-world application that could be extended into a sign language translation tool. With further work on real-time video processing, it could become a genuinely useful aid for communication.
Challenges New Coders Might Face
- Handling CSV Image Data: Understanding how to convert flat pixel values into images can be tricky for beginners.
- Input Shape Errors: Incorrect reshaping or dimension mismatches are common when feeding data into CNNs or transfer learning models.
- Overfitting: If models are not well regularized, they may seem to perform well on training data but perform poorly on data that they have never seen.
- Using Callbacks Properly: Knowing when to stop training and when to lower the learning rate requires practice and intuition.
- Transfer Learning Setup: Freezing layers, choosing the right input shape, and adding new layers on top of pre-trained models can be confusing at first.
- Plotting Confusion Matrix: Visualizing and interpreting confusion matrices may seem difficult without clear guidance.
Frequently Asked Questions (FAQs)
Q1: Why are only 24 letters used instead of 26?
A: Letters J and Z require motion, which cannot be captured in a still image, so they are excluded from the dataset.
Q2: Why do we normalize pixel values?
A: Normalization helps the model train faster and more efficiently by keeping the input values between 0 and 1.
Q3: What is the purpose of dropout layers?
A: Dropout helps prevent overfitting by randomly turning off neurons during training, forcing the model to learn more general patterns.
Q4: Why use a pre-trained model like ResNet50?
A: Pre-trained models have already learned useful image features, which helps improve performance even with small datasets.
Q5: Can this model be used in real-time with a webcam?
A: Yes, with further development. You'll need to capture frames in real time, preprocess them the same way as the training data, and feed them into the trained model for prediction; a rough sketch is shown below.
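The webcam pipeline is not part of the original project, so the following is only a minimal sketch, assuming one of the trained 28x28 grayscale models (for example model3) was saved to a hypothetical file sign_cnn.h5, that OpenCV (cv2) is installed, and that the hand appears inside an arbitrarily chosen region of interest:

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

letters = 'ABCDEFGHIKLMNOPQRSTUVWXY'  # same 24-class mapping used above

# Assumption: a trained 28x28 grayscale model was saved earlier, e.g. model3.save('sign_cnn.h5')
model = load_model('sign_cnn.h5')  # hypothetical file name

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Hypothetical region of interest where the hand is expected
    roi = frame[100:400, 100:400]
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(gray, (28, 28)).astype('float32') / 255.0

    # Predict the letter for the current frame
    pred = model.predict(img.reshape(1, 28, 28, 1), verbose=0)
    letter = letters[int(np.argmax(pred))]

    # Draw the ROI and predicted letter on the frame
    cv2.rectangle(frame, (100, 100), (400, 400), (0, 255, 0), 2)
    cv2.putText(frame, f'Pred: {letter}', (100, 90),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Sign language demo', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```

In practice you would also need hand detection or a fixed capture setup, since the model was trained on tightly cropped, centered hand images.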