Human Action Recognition Using Image Preprocessing

This project recognizes human actions in still images using deep learning models. It uses a dataset of annotated images that show various human actions, such as sitting, standing and laughing. The main objective is to classify each image into one of a set of predefined action classes. Pre-trained models such as ResNet50 and InceptionV3 are fine-tuned to make these predictions.

Project Overview

This project uses deep learning models to build a human action recognition system. Actions such as sitting, standing, walking and laughing are the distinct categories into which the images are classified. The dataset contains images of different human activities, each tagged with its corresponding category.

The data preprocessing stage begins by resizing all images to 160 x 160 pixels, normalizing pixel values and, if required, enhancing contrast. The categorical labels are then converted to numeric form with LabelEncoder and one-hot encoded to prepare them for the model.
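
The label step can be illustrated without any framework. The sketch below is a plain-Python stand-in for what label encoding followed by one-hot encoding produces; the project itself uses scikit-learn's LabelEncoder and Keras's to_categorical, and the function name and label list here are hypothetical.

```python
def encode_labels(labels):
    """Map string labels to integer ids (sorted class order) and one-hot rows."""
    # Sorted unique classes: the same ordering LabelEncoder learns.
    classes = sorted(set(labels))
    class_to_id = {cls: i for i, cls in enumerate(classes)}
    ids = [class_to_id[label] for label in labels]
    # One row per sample, with a single 1.0 in the column of its class.
    one_hot = [[1.0 if j == i else 0.0 for j in range(len(classes))] for i in ids]
    return classes, ids, one_hot

# Hypothetical labels, for illustration only.
classes, ids, one_hot = encode_labels(["sitting", "laughing", "sitting", "running"])
```

Each one-hot row has exactly one non-zero entry, which is the format a softmax output layer with categorical cross-entropy loss expects.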

The architecture is built on powerful pre-trained deep learning models, such as ResNet50 and InceptionV3, that are fine-tuned for our specific task. Accuracy and loss are monitored throughout training, and early stopping is used so that the best model is selected based on validation performance.

To evaluate the trained models, we predict labels for the test data and compute accuracy. We also produce confusion matrices to visualize per-class performance. The aim is a robust action recognition system that classifies human activity from images accurately.
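
To make the two evaluation tools concrete, here is a minimal sketch of how a confusion matrix and accuracy relate, assuming the labels have already been converted to integer class ids. The function names and the six example predictions are hypothetical; a library routine would normally be used in practice.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count samples per (true class, predicted class) pair.

    Rows are true classes and columns are predicted classes.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def accuracy(cm):
    """Fraction of samples on the diagonal, i.e. correct predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Hypothetical integer class ids for six test images.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = accuracy(cm)
```

Off-diagonal cells show which pairs of actions the model confuses, which a single accuracy number hides.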

Prerequisites

Basic Python programming and data manipulation techniques.
Basic knowledge of machine learning and deep learning.
Basics of image preprocessing (resizing, normalizing).
Basics of Keras and TensorFlow for building deep learning models.
Experience working with Jupyter Notebooks or Google Colab.
Data visualization with Matplotlib and Plotly.
Familiarity with the main evaluation tools used here: accuracy and the confusion matrix.
Familiarity with pre-trained models such as ResNet50 and InceptionV3.

Approach

The approach starts with preprocessing the image data: the images are resized and normalized to ensure uniformity across the dataset. The categorical action labels are label encoded and then one-hot encoded to make them compatible with the model. For the model, we use powerful pre-trained architectures such as ResNet50 and InceptionV3, which can learn complex image features, and fine-tune them on the training data to improve accuracy on human action recognition. Early stopping is used during training to limit overfitting, while progress is monitored through accuracy and loss. After training, the model is evaluated on test data using accuracy scores and a confusion matrix that shows how well it classifies the various human actions.
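
The early-stopping rule mentioned above can be sketched in a few lines. This is a simplified illustration of a patience-based rule, not the Keras EarlyStopping implementation; the function name, the patience value and the loss values are assumptions made for the example.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a patience-based early-stopping rule.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. Epochs are numbered from 0.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, one per epoch.
stop_epoch, best_epoch = early_stopping_epoch(
    [0.90, 0.70, 0.65, 0.66, 0.68, 0.67, 0.60], patience=3
)
```

In this example training halts before the final, lower loss is ever seen, which is the trade-off a small patience value makes: less wasted training at the risk of stopping too soon.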

Workflow and Methodology

Workflow

Data Collection and Organization: Collect and organize the dataset of images with corresponding action labels.
Preprocessing of Data: Resize the images to 160 x 160 pixels and normalize pixel values.
Encoding Labels: Convert the action labels to numbers with LabelEncoder, then one-hot encode them.
Model Construction: Build deep learning models on pre-trained architectures such as ResNet50 and InceptionV3.
Model Training: Train the models on the preprocessed dataset, using early stopping to avoid overfitting.
Model Evaluation: Evaluate model performance using accuracy and confusion matrices.
Visualization of Results: Visualize the results through graphs and performance metrics.

Methodology

Use transfer learning with pre-trained models such as ResNet50 and InceptionV3 to reuse the features they learned on ImageNet.
Fine-tune these models on the human action recognition data to improve accuracy on this specific task.
Prepare the images by resizing, normalizing and applying any other required preprocessing.
Convert the class labels into a machine-readable format using LabelEncoder and one-hot encoding.
Split the data into training and validation sets, and apply early stopping during training to avoid overfitting.
Finally, evaluate the model with accuracy and a confusion matrix for a complete assessment.
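
The training and validation split in the list above can be sketched as follows. The project imports scikit-learn's train_test_split for this purpose; the version below is only a dependency-free stand-in, and the function name, the 20% validation fraction, the seed and the sample list are all assumptions.

```python
import random

def train_val_split(items, val_fraction=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, validation) lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical (filename, label) pairs.
samples = [("Image_{}.jpg".format(i), "sitting") for i in range(1, 11)]
train, val = train_val_split(samples, val_fraction=0.2)
```

The validation set is held out of weight updates, so the validation loss it produces is what early stopping monitors.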

Data collection

The human action dataset is available on Kaggle. A Kaggle dataset can be accessed conveniently and securely from within Google Colab once your Kaggle credentials are configured in a way that does not expose sensitive information. The setup code securely collects the user's Kaggle username and API key and assigns them to environment variables. This enables Kaggle's CLI command (!kaggle datasets download -d meetnagadia/human-action-recognition-har-dataset), which authenticates the user and downloads the dataset straight into Colab.
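
One way to set this up is sketched below. It assumes the Kaggle CLI picks up the KAGGLE_USERNAME and KAGGLE_KEY environment variables; the helper name set_kaggle_credentials is hypothetical, and the interactive prompt is shown only as a comment.

```python
import os
from getpass import getpass

def set_kaggle_credentials(username, key):
    """Expose Kaggle credentials to the CLI through environment variables."""
    os.environ["KAGGLE_USERNAME"] = username
    os.environ["KAGGLE_KEY"] = key

# In a notebook, getpass hides the typed key so it never appears in the
# saved cell output:
# set_kaggle_credentials(input("Kaggle username: "), getpass("Kaggle API key: "))
# !kaggle datasets download -d meetnagadia/human-action-recognition-har-dataset
```

Keeping the key in an environment variable rather than in a notebook cell means the notebook can be shared without leaking it.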

Data preparation workflow

Load the dataset from source folders or CSV files with image paths and labels.
Resize all images to a consistent shape (e.g., 160x160 pixels).
Normalize pixel values to the range [0, 1].
Convert categorical labels into numeric values using LabelEncoder.
Apply one-hot encoding to the numeric labels.
Organize data into batches for model training.
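
The last step, organizing data into batches, can be sketched as a small generator. This is an illustration only: the function name and the batch size are assumptions, and Keras normally handles batching itself through the batch_size argument of model.fit.

```python
def iter_batches(images, labels, batch_size=32):
    """Yield consecutive (images, labels) slices of at most batch_size items."""
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        # Slicing works for lists and NumPy arrays alike.
        yield images[start:end], labels[start:end]

# Hypothetical stand-ins: ten "images" and their labels.
batches = list(iter_batches(list(range(10)), list("abcdefghij"), batch_size=4))
```

The final batch is simply shorter when the dataset size is not a multiple of the batch size.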

Code Explanation

Step 1

Mount Google Drive

Mount your Google Drive to access and save datasets, models and other resources.

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Import Libraries for Image Processing and Modeling

This code imports OpenCV, NumPy and Matplotlib for image processing, pandas and scikit-learn for data handling and machine learning, and TensorFlow/Keras for building deep learning models. It also imports pre-trained models such as ResNet50, VGG16 and InceptionV3, along with the layers used to create and train the neural networks.

import os
import cv2
import zipfile
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.applications import ResNet50, VGG16, InceptionV3
from tensorflow.keras.applications.vgg16 import preprocess_input as vgg_preprocess
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout, Flatten
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess
from tensorflow.keras.applications.inception_v3 import preprocess_input as inception_preprocess

Check the Dataset Folder Structure

This code sets the path to the dataset folder and lists the files and subdirectories within it, which helps in understanding the dataset's folder structure before any further processing.

# Dataset folder path
data_folder = '/content/human-action-recognition-har-dataset/Human Action Recognition/'
# Check the structure
os.listdir(data_folder)

Import and Preview Training Data

The code loads the training data from a CSV file into a pandas DataFrame and shows the first few rows, giving an overview of the dataset's structure.

# Load the training CSV (image filenames and their labels)
train_df = pd.read_csv('/content/human-action-recognition-har-dataset/Human Action Recognition/Training_set.csv')
train_df.head()

Import and Preview Test Data

The code loads the test data from a CSV file into a pandas DataFrame and shows the first few rows, giving an overview of the test dataset's structure.

# Load the test CSV
test_df = pd.read_csv('/content/human-action-recognition-har-dataset/Human Action Recognition/Testing_set.csv')
test_df.head()

Visualize the Label Distribution

This code uses Plotly to draw a pie chart showing how the human activity labels are distributed in the training dataset, which makes any class imbalance easy to spot.

# Count how many training images carry each label
HAR = train_df['label'].value_counts()
# The counts and label names are passed directly, so no data frame argument is needed
fig = px.pie(values=HAR.values, names=HAR.index, title='Distribution of Human Activity')
fig.show()
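
The same balance check can also be done numerically. The sketch below computes each class's share of the dataset with the standard library; the function name and the four example labels are hypothetical.

```python
from collections import Counter

def class_shares(labels):
    """Return (class, share of dataset) pairs, largest class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(cls, n / total) for cls, n in counts.most_common()]

# Hypothetical labels, for illustration only.
shares = class_shares(["sitting", "sitting", "running", "laughing"])
```

With the real data, the call would be class_shares(train_df['label']), since the function accepts any iterable of labels.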

Display Random Image with Label

This function selects a random image from the training set, then loads and displays it with its assigned label. If the image is not found, it prints a message and skips the file.

import random

def displaying_random_images():
    # Pick a random image number and build the matching filename
    num = random.randint(1, 10000)
    imgg = "Image_{}.jpg".format(num)
    train = "/content/human-action-recognition-har-dataset/Human Action Recognition/train/"
    if os.path.exists(train + imgg):
        # Read the image with Matplotlib and display it
        testImage = plt.imread(train + imgg)
        plt.imshow(testImage)
        # Look up the label for this filename and use it as the title
        plt.title(train_df.loc[train_df['filename'] == imgg, 'label'].item())
    else:
        print("File path not found\nSkipping the file!")

displaying_random_images()

Loading and Preprocessing Images

This code iterates through the training set, loads each image, resizes it to 160 x 160, normalizes the pixel values, and collects the images and labels in two separate lists. This step prepares the training data for model training.

from PIL import Image

# Path to the train folder
train_folder = '/content/human-action-recognition-har-dataset/Human Action Recognition/train/'
img_data = []   # This will store the images
img_label = []  # This will store the labels corresponding to the images

# Loop through each row in the DataFrame
for index, row in train_df.iterrows():
    # Get the image filename and the corresponding label
    image_filename = row['filename']
    label = row['label']  # e.g., 'sitting'
    # Create the full path to the image
    image_path = os.path.join(train_folder, image_filename)
    # Open the image; converting to RGB guards against any grayscale or
    # RGBA files so that every array has three channels
    temp_img = Image.open(image_path).convert('RGB')
    # Resize the image to 160x160 pixels
    temp_img = temp_img.resize((160, 160))
    # Convert the image to a numpy array and normalize it (scale pixel values to [0, 1])
    img_data.append(np.asarray(temp_img) / 255.0)
    # Append the corresponding label
    img_label.append(label)

Transforming image data and labels to arrays

This code converts the image data list (img_data) and the label list (img_label) to NumPy arrays. The X array has shape (num_samples, 160, 160, 3), while the y array of labels has shape (num_samples,).