Project Overview
This project uses deep learning to build a human action recognition system. Actions such as sitting, standing, walking, or laughing are the distinct categories into which images are classified. The dataset contains images of different human activities, each tagged with its category.
The data preprocessing stage begins with resizing all images to 160 x 160 pixels, followed by normalization of pixel values and, if required, contrast enhancement. The categorical labels are then converted to numerical form with LabelEncoder and one-hot encoded to prepare them for the model.
The architecture is built on powerful pre-trained deep learning models such as ResNet50 and InceptionV3, which are fine-tuned for our specific task. During training, early stopping is used and the best model is selected based on validation performance. Accuracy and loss are monitored throughout the training process.
To evaluate the trained models, we predict labels for the test data and compute accuracy. We also produce confusion matrices to visualize per-class performance. The goal is a robust action recognition system that can classify human activity from images accurately.
Prerequisites
- Basic Python programming and data manipulation techniques.
- Basic knowledge of machine learning and deep learning.
- Basics of image preprocessing: resizing and normalizing.
- Basics of Keras and TensorFlow for building deep learning models.
- Experience working with Jupyter Notebooks or Google Colab.
- Data visualization with Matplotlib and Plotly.
- Familiarity with the key evaluation metrics: accuracy and the confusion matrix.
- Knowledge of how to work with pre-trained models such as ResNet50 and InceptionV3.
Approach
The approach starts with preprocessing the image data: the images are resized and normalized so that the data is uniform. The categorical action labels are label encoded and then one-hot encoded to make them compatible with the model. For the model, we use powerful pre-trained architectures such as ResNet50 and InceptionV3, which are capable of learning complex image features, and fine-tune them on the training data to improve accuracy on human action recognition. Early stopping is used during training to limit overfitting, and progress is monitored using accuracy and loss. After training, the model is evaluated on the test data using accuracy scores and a confusion matrix, which shows how well the model classifies each human action.
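The transfer-learning setup described above can be sketched in Keras as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the source names only ResNet50, InceptionV3, early stopping, and the accuracy and loss metrics. The classification head (global average pooling, dropout of 0.3), the learning rate, the patience value, and the helper names build_transfer_model and make_early_stopping are all illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50


def build_transfer_model(num_classes, input_shape=(160, 160, 3), weights="imagenet"):
    """Stack a small classification head on a frozen ResNet50 backbone."""
    base = ResNet50(include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # keep the pre-trained features fixed at first

    inputs = layers.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)  # assumed regularization, not from the source
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed value
        loss="categorical_crossentropy",  # matches one-hot encoded labels
        metrics=["accuracy"],
    )
    return model


def make_early_stopping(patience=5):
    """Stop when validation loss stops improving and keep the best weights."""
    return tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, restore_best_weights=True
    )
```

In a notebook, the model would then be trained with something like model.fit(train_images, train_labels, validation_data=(val_images, val_labels), epochs=30, callbacks=[make_early_stopping()]), where the array names and the epoch count are placeholders. Swapping ResNet50 for InceptionV3 only changes the imported backbone class.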
Workflow and Methodology
Workflow
- Data Collection and Organization: Collect and organize the dataset of images with corresponding action labels.
- Preprocessing of Data: Resize the images to 160 x 160 pixels and normalize pixel values.
- Encoding Labels: Convert the action labels to numbers with LabelEncoder and apply one-hot encoding.
- Model Construction: Build deep learning models on pre-trained architectures such as ResNet50 and InceptionV3.
- Model Training: Train the models on the preprocessed dataset, using early stopping to avoid overfitting.
- Model Evaluation: Evaluate model performance using accuracy and confusion matrices.
- Visualization of Results: Visualize the results through graphs and performance metrics.
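The evaluation step can be illustrated with a small NumPy sketch that turns predicted class probabilities into an accuracy score and a confusion matrix. The probabilities and labels below are made-up values for three classes, and the helper names are illustrative; a real run would use the model's predictions on the test set.

```python
import numpy as np


def confusion_matrix(true_ids, pred_ids, num_classes):
    """Count predictions; rows index the true class, columns the predicted class."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(matrix, (true_ids, pred_ids), 1)
    return matrix


def accuracy(true_ids, pred_ids):
    """Fraction of samples whose predicted class equals the true class."""
    return float(np.mean(np.asarray(true_ids) == np.asarray(pred_ids)))


# Hypothetical softmax outputs for six test images over three classes.
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.3, 0.5, 0.2],  # true class is 2, predicted as 1
    [0.1, 0.6, 0.3],
    [0.6, 0.3, 0.1],
])
true_ids = np.array([0, 1, 2, 2, 1, 0])
pred_ids = probs.argmax(axis=1)  # most probable class per image

cm = confusion_matrix(true_ids, pred_ids, num_classes=3)
acc = accuracy(true_ids, pred_ids)
print(cm)
print(f"accuracy = {acc:.3f}")
```

The diagonal of the matrix holds the correctly classified images, so the accuracy equals the trace divided by the total count. Off-diagonal entries show which actions are confused with each other.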
Methodology
- Use transfer learning with pre-trained models such as ResNet50 and InceptionV3 to reuse the features they learned from ImageNet.
- Fine-tune these models on the human action recognition data to improve accuracy on the specific action recognition task.
- Prepare the images by resizing, normalizing, and applying any other required preprocessing steps.
- Convert the class labels into a machine-readable format using LabelEncoder and one-hot encoding.
- Divide the data into training and validation sets, and apply early stopping during training to avoid overfitting.
- Finally, evaluate the model using accuracy and a confusion matrix for a complete assessment.
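To make the split and early-stopping steps concrete, here is a small framework-independent sketch. The 20% validation fraction, the patience of 3, and the list of validation losses are assumptions chosen for illustration. In practice the early-stopping rule would normally come from the training framework rather than be hand-written.

```python
import numpy as np


def train_val_split(num_samples, val_fraction=0.2, seed=42):
    """Return shuffled, disjoint index arrays for training and validation."""
    order = np.random.default_rng(seed).permutation(num_samples)
    num_val = int(round(num_samples * val_fraction))
    return order[num_val:], order[:num_val]


def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops at the first epoch that is `patience` epochs past the
    best validation loss seen so far; the best epoch's weights are kept.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


train_idx, val_idx = train_val_split(100)

# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [0.90, 0.70, 0.62, 0.65, 0.64, 0.66, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)
```

With these losses the rule stops at epoch 5 and keeps the model from epoch 2, the last epoch at which validation loss improved.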
Data collection
The human action dataset is available on Kaggle. A Kaggle dataset can be accessed conveniently and securely from within Google Colab once your Kaggle credentials are configured in a way that does not expose sensitive information. The setup code securely collects the Kaggle username and API key and assigns them to environment variables. This enables the Kaggle CLI command (!kaggle datasets download -d meetnagadia/human-action-recognition-har-dataset), which authenticates the user and downloads the dataset straight into Colab.
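One way to set the credentials without writing them into the notebook is sketched below. The source does not show how the values are collected, so the use of getpass and both function names are assumptions. KAGGLE_USERNAME and KAGGLE_KEY are the environment variables the Kaggle CLI reads.

```python
import os
from getpass import getpass


def configure_kaggle_credentials(username, key):
    """Expose Kaggle credentials to the Kaggle CLI through environment variables."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key


def prompt_for_kaggle_credentials():
    """Ask for the credentials interactively so they are never saved in a cell."""
    username = input("Kaggle username: ")
    key = getpass("Kaggle API key: ")  # typed input is not echoed
    configure_kaggle_credentials(username, key)
```

In a Colab cell, calling prompt_for_kaggle_credentials() once and then running the download command above lets the CLI authenticate from the environment variables.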
Data preparation workflow
- Load the dataset from source folders or CSV files with image paths and labels.
- Resize all images to a consistent shape (e.g., 160x160 pixels).
- Normalize pixel values to the range [0, 1].
- Convert categorical labels into numeric values using LabelEncoder.
- Apply one-hot encoding to the numeric labels.
- Organize data into batches for model training.
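The steps above can be sketched end to end with NumPy on a few synthetic images. This is an illustration under assumptions, not the project's pipeline: the nearest-neighbour resize stands in for whatever image library the project uses, np.unique mimics LabelEncoder's sorted class order, and all function names and the sample data are made up.

```python
import numpy as np


def resize_nearest(image, size=(160, 160)):
    """Resize an H x W x C image with nearest-neighbour sampling."""
    height, width = image.shape[:2]
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    return image[rows][:, cols]


def normalize(image):
    """Scale 8-bit pixel values to the range [0, 1]."""
    return image.astype(np.float32) / 255.0


def encode_labels(labels):
    """Map string labels to integer ids (sorted class order) and one-hot rows."""
    classes, ids = np.unique(labels, return_inverse=True)
    one_hot = np.eye(len(classes), dtype=np.float32)[ids]
    return classes, ids, one_hot


def batches(images, targets, batch_size=32):
    """Yield consecutive (images, targets) batches."""
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        yield images[start:end], targets[start:end]


# Synthetic stand-ins for loaded images of different sizes and their labels.
rng = np.random.default_rng(0)
shapes = [(200, 300), (120, 90), (160, 160), (64, 64), (500, 400)]
raw_images = [rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8) for h, w in shapes]
raw_labels = ["sitting", "walking", "laughing", "sitting", "standing"]

X = np.stack([normalize(resize_nearest(img)) for img in raw_images])
classes, y_ids, y_onehot = encode_labels(raw_labels)

for batch_x, batch_y in batches(X, y_onehot, batch_size=2):
    print(batch_x.shape, batch_y.shape)
```

Every image ends up with shape 160 x 160 x 3 regardless of its original size, which is what allows them to be stacked into a single array and batched.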