3D image generation in Computer Vision with implementation

Written by Aionlinecourse, Computer Vision Tutorials

In artificial intelligence, image generation is a trending topic; you may already know or use tools such as ChatGPT or Midjourney every day, and they can benefit your business in robust ways. Image generation comes in several forms, such as text-to-image, image-to-image, and artistic style transfer. In this article, we will discuss basic image generation, the types of image generation, and a coding implementation for image generation. So, let's dive deep into it.

Image generation is the process of creating new images from scratch using machine learning or artificial intelligence (AI) methods. This expertise is crucial in several industries, including computer graphics, the arts, entertainment, and more. It allows for the development of realistic or creative images, data augmentation for deep learning model training, and even the generation of images based on particular characteristics or aesthetic preferences.

Types of image generation techniques:

There are several types of image generation techniques. Here are some of the most widely used ones.

Generative Adversarial Networks (GANs): GANs consist of a generator and a discriminator network. The generator creates images, while the discriminator distinguishes between real and generated images. GANs are known for their ability to produce high-quality and diverse images.

Variational Autoencoders (VAEs): VAEs are probabilistic models that generate images by sampling from a learned latent space. They are useful for generating images with controlled attributes and have applications in image reconstruction.
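
To make this concrete, here is a minimal sketch of the reparameterization trick at the heart of a VAE, written in TensorFlow/Keras to match the implementation later in this article. The Sampling layer and the toy encoder head below are illustrative, not a full VAE:

import tensorflow as tf
from tensorflow.keras import layers

class Sampling(layers.Layer):
    # Reparameterization trick: z = mean + std * epsilon, with epsilon ~ N(0, I),
    # so sampling from the latent space stays differentiable
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

latent_dim = 2
inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(128, activation='relu')(inputs)
z_mean = layers.Dense(latent_dim)(h)       # mean of the latent distribution
z_log_var = layers.Dense(latent_dim)(h)    # log-variance of the latent distribution
z = Sampling()([z_mean, z_log_var])        # a differentiable latent sample
encoder = tf.keras.Model(inputs, [z_mean, z_log_var, z])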

Auto-Regressive Models: Auto-regressive models generate images one pixel at a time, often using recurrent neural networks (RNNs) or transformers. Pixel values are predicted sequentially based on previous pixels.
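
For intuition, here is an illustrative sketch of auto-regressive sampling; model is a hypothetical function (not defined in this article) that returns a probability distribution over the next pixel's intensity, conditioned on all pixels generated so far:

import numpy as np

# 'model' is a hypothetical function returning a probability distribution
# over the next pixel's intensity, given the pixels generated so far
def sample_image(model, height=28, width=28, num_levels=256):
    image = np.zeros((height, width), dtype=np.int64)
    for i in range(height):
        for j in range(width):
            probs = model(image, i, j)                      # distribution over num_levels values
            image[i, j] = np.random.choice(num_levels, p=probs)  # sample this pixel
    return image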

Flow-Based Models: Flow-based models transform a simple distribution into a complex data distribution, allowing for image generation. They are known for their invertibility and can be used for generative tasks.
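
As a minimal sketch of why flows are invertible, here is an affine coupling layer, a common building block of flow-based models; scale_net and shift_net stand in for small hypothetical neural networks:

import numpy as np

def coupling_forward(x1, x2, scale_net, shift_net):
    # The first half passes through unchanged; the second half is transformed
    # conditioned on the first, keeping the whole mapping exactly invertible
    y1 = x1
    y2 = x2 * np.exp(scale_net(x1)) + shift_net(x1)
    return y1, y2

def coupling_inverse(y1, y2, scale_net, shift_net):
    # Inverting the affine transform recovers the original inputs exactly
    x1 = y1
    x2 = (y2 - shift_net(y1)) * np.exp(-scale_net(y1))
    return x1, x2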


Implementation Part:

Here, we implement basic image generation using a generative adversarial network. You will get the full project code on Google Colab. First, we import the necessary libraries for our implementation.

Import libraries

import tensorflow as tf
from tensorflow.keras import layers, models, datasets
import numpy as np
import matplotlib.pyplot as plt  # used later to save sample generated images

Define the models

In this code, we define the architecture of the generator model. It takes latent_dim as an input, which represents the dimension of the input noise vector. There are two models: the generator model and the discriminator model.

# Define the generator model
def build_generator(latent_dim):
    # Create a sequential model (a linear stack of layers)
    model = models.Sequential()
    # Add a dense layer with input dimension latent_dim
    model.add(layers.Dense(7 * 7 * 256, input_dim=latent_dim))
    # Reshape the output to a 7x7x256 tensor
    model.add(layers.Reshape((7, 7, 256)))
    # Add a transposed convolutional layer with 128 filters, a 4x4 kernel, and 2x2 strides (7x7 -> 14x14)
    model.add(layers.Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same'))
    # Add batch normalization to stabilize training
    model.add(layers.BatchNormalization())
    # Add LeakyReLU activation with a small slope (alpha=0.2)
    model.add(layers.LeakyReLU(alpha=0.2))
    # Add another transposed convolutional layer with 64 filters, 4x4 kernel, and 2x2 strides (14x14 -> 28x28)
    model.add(layers.Conv2DTranspose(64, (4, 4), strides=(2, 2), padding='same'))
    # Add batch normalization
    model.add(layers.BatchNormalization())
    # Add LeakyReLU activation
    model.add(layers.LeakyReLU(alpha=0.2))
    # Add a final transposed convolutional layer with 1 filter and a 7x7 kernel;
    # tanh matches the [-1, 1] range of the normalized training images
    model.add(layers.Conv2DTranspose(1, (7, 7), activation='tanh', padding='same'))
    # Return the generator model
    return model
# Define the discriminator model
def build_discriminator(img_shape):
    # Create a sequential model
    model = models.Sequential()
    # Add a convolutional layer with 64 filters, 3x3 kernel, 2x2 strides, and input shape img_shape
    model.add(layers.Conv2D(64, (3, 3), strides=(2, 2), padding='same', input_shape=img_shape))
    # Add LeakyReLU activation
    model.add(layers.LeakyReLU(alpha=0.2))
    # Add dropout to prevent overfitting
    model.add(layers.Dropout(0.4))
    # Add another convolutional layer with 128 filters, 3x3 kernel, and 2x2 strides
    model.add(layers.Conv2D(128, (3, 3), strides=(2, 2), padding='same'))
    # Add LeakyReLU activation
    model.add(layers.LeakyReLU(alpha=0.2))
    # Add dropout
    model.add(layers.Dropout(0.4))
    # Flatten the output to a 1D vector
    model.add(layers.Flatten())
    # Add a dense layer with 1 neuron and sigmoid activation to classify real or fake
    model.add(layers.Dense(1, activation='sigmoid'))
    # Return the discriminator model
    return model

# Define the combined GAN model (generator followed by the frozen discriminator)
def build_gan(generator, discriminator):
    # Freeze the discriminator's weights while training the generator
    discriminator.trainable = False
    model = models.Sequential()
    model.add(generator)
    model.add(discriminator)
    return model


# Define hyperparameters
latent_dim = 100
img_shape = (28, 28, 1)
batch_size = 64
epochs = 100


# Load and preprocess the dataset (MNIST)
(train_images, _), (_, _) = datasets.mnist.load_data()
# Scale pixel values from [0, 255] to [-1, 1] to match the generator's tanh output
train_images = train_images / 127.5 - 1.0
train_images = np.expand_dims(train_images, axis=-1)


# Build and compile the discriminator
discriminator = build_discriminator(img_shape)
discriminator.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Build the generator and the combined GAN
# (the discriminator is compiled before being frozen, so it still updates when
# trained directly, but stays fixed inside the combined GAN model)
generator = build_generator(latent_dim)
discriminator.trainable = False
gan = build_gan(generator, discriminator)
gan.compile(loss='binary_crossentropy', optimizer='adam')
# Training loop
for epoch in range(epochs):
    # Generate random noise samples
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    # Generate fake images from noise
    generated_images = generator.predict(noise)
    # Select a random batch of real images from the dataset
    idx = np.random.randint(0, train_images.shape[0], batch_size)
    real_images = train_images[idx]
    # Labels for the real and fake images
    real_labels = np.ones((batch_size, 1))
    fake_labels = np.zeros((batch_size, 1))

    # Train the discriminator on real and fake images
    d_loss_real = discriminator.train_on_batch(real_images, real_labels)
    d_loss_fake = discriminator.train_on_batch(generated_images, fake_labels)
    # Calculate the total discriminator loss
    d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

    # Generate new random noise samples
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    # Labels for the generator (tricking the discriminator)
    valid_labels = np.ones((batch_size, 1))
    # Train the generator to fool the discriminator
    g_loss = gan.train_on_batch(noise, valid_labels)

    # Print progress
    print(f"Epoch {epoch}/{epochs}, D Loss: {d_loss[0]}, G Loss: {g_loss}")

    # Save a generated sample image at specified intervals
    if epoch % 100 == 0:
        generated_image = generator.predict(np.random.normal(0, 1, (1, latent_dim)))
        # Rescale pixel values from [-1, 1] back to [0, 1] before saving
        generated_image = 0.5 * generated_image + 0.5
        plt.imsave(f"generated_epoch_{epoch}.png", generated_image[0, :, :, 0], cmap='gray')

Challenges in Image Generation

Image generation is a complex task, and generative models face several challenges. Two common ones are mode collapse and balancing exploration and exploitation. Mode collapse happens when a GAN produces a small number of visually similar images rather than fully capturing the diversity of the training data.
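
One common mitigation, shown here as a minimal sketch against the training loop above, is one-sided label smoothing: using a real label slightly below 1.0 (the value 0.9 is a conventional choice, not part of the implementation above) keeps the discriminator from becoming overconfident and can stabilize training.

# One-sided label smoothing (sketch): replace the hard real labels in the
# training loop above with slightly softened targets
real_labels = np.full((batch_size, 1), 0.9)   # instead of np.ones((batch_size, 1))
fake_labels = np.zeros((batch_size, 1))       # fake labels stay at 0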


Recent Advancements and Future Research Directions

Recent advances in image generation show a lot of potential. Self-attention mechanisms have been used to better capture long-range dependencies in images. Researchers are also working diligently to produce more diverse and novel images, preventing mode collapse by encouraging models to explore the full space of the data distribution.

Image generation is used for many tasks, and new application areas emerge every day. In this article, we covered basic image generation, the main types of image generation techniques, and a working implementation. A single tutorial can only cover part of image generation, and there is much more to explore.