MichaelDeng03/Generative-Vision-Models

Summary

Style Transfer

style_transfer.ipynb

This project implements Neural Style Transfer, a technique for rendering a new image that combines the content of one source image with the artistic style of another. The method leverages a pre-trained deep network, SqueezeNet, as a fixed feature extractor to represent the perceptual qualities of the images. Rather than updating the network's parameters, gradient descent is applied directly to the pixels of the output image itself. This process iteratively adjusts the image to minimize a composite loss function that quantifies how far the generated image is from the desired content and style characteristics.
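To make the pixel-space optimization concrete, here is a minimal sketch (not the notebook's exact code): SqueezeNet is frozen, the output image is the only trainable tensor, and random tensors stand in for the preprocessed photographs. The layer indices, learning rate, and step count are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen SqueezeNet feature extractor: the network's weights are never trained.
cnn = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.DEFAULT).features.eval()
for p in cnn.parameters():
    p.requires_grad_(False)

def extract_features(x, layers=(3, 5, 8)):
    """Collect activations at the given (illustrative) layer indices."""
    feats = []
    for i, module in enumerate(cnn):
        x = module(x)
        if i in layers:
            feats.append(x)
    return feats

# Placeholder image; in the notebook this would be a preprocessed photograph.
content_img = torch.rand(1, 3, 224, 224)
target_feats = [f.detach() for f in extract_features(content_img)]

# The generated image itself is the only "parameter" being optimized.
img = content_img.clone().requires_grad_(True)
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(100):
    optimizer.zero_grad()
    # Simple feature-matching objective; the full method adds style and TV terms.
    loss = sum(F.mse_loss(f, t)
               for f, t in zip(extract_features(img), target_feats))
    loss.backward()
    optimizer.step()
```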

The composite loss function is a weighted sum of three distinct terms. First, the content loss measures how much the high-level feature representations of the generated image differ from those of the content image at a specific layer in the network. Second, the style loss captures texture, color, and patterns by comparing the correlations between filter activations using Gram matrices. This loss is typically calculated across multiple layers to capture stylistic features at different scales. Finally, a total variation loss is added as a regularization term to encourage spatial smoothness and reduce high-frequency noise in the resulting image.
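The three terms might look roughly like the following sketch, assuming feature maps of shape (N, C, H, W); the weights and normalization conventions shown are one common choice rather than the notebook's verbatim code.

```python
import torch

def content_loss(gen_feat, content_feat, weight=1.0):
    # Squared error between feature maps at one chosen layer.
    return weight * ((gen_feat - content_feat) ** 2).sum()

def gram_matrix(feat, normalize=True):
    # Correlations between filter activations: a (N, C, H, W) feature map
    # becomes a (N, C, C) matrix of channel-wise inner products.
    N, C, H, W = feat.shape
    f = feat.view(N, C, H * W)
    gram = torch.bmm(f, f.transpose(1, 2))
    if normalize:
        gram = gram / (C * H * W)
    return gram

def style_loss(gen_feats, style_grams, weights):
    # Summed over several layers to capture texture at multiple scales.
    return sum(w * ((gram_matrix(f) - g) ** 2).sum()
               for f, g, w in zip(gen_feats, style_grams, weights))

def tv_loss(img, weight=1e-3):
    # Penalize differences between neighboring pixels to encourage smoothness.
    dh = ((img[:, :, 1:, :] - img[:, :, :-1, :]) ** 2).sum()
    dw = ((img[:, :, :, 1:] - img[:, :, :, :-1]) ** 2).sum()
    return weight * (dh + dw)
```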

GAN

generative_adversarial_network.ipynb

This script trains a Generative Adversarial Network (GAN) in PyTorch on the MNIST dataset to generate handwritten digits. The implementation centers on two competing neural networks: a Generator and a Discriminator. The Generator learns to produce realistic images from a random noise vector, while the Discriminator learns to differentiate these synthetically generated images from real images in the training data. The script defines the adversarial loss functions for both networks using PyTorch's numerically stable binary_cross_entropy_with_logits.
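A minimal sketch of these adversarial objectives using the logits-based BCE; logits_real and logits_fake are placeholder names for the discriminator's raw scores on real and generated batches.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(logits_real, logits_fake):
    # The discriminator should score real images as 1 and fakes as 0.
    loss_real = F.binary_cross_entropy_with_logits(
        logits_real, torch.ones_like(logits_real))
    loss_fake = F.binary_cross_entropy_with_logits(
        logits_fake, torch.zeros_like(logits_fake))
    return loss_real + loss_fake

def generator_loss(logits_fake):
    # The generator wants its fakes to be scored as real (target = 1).
    return F.binary_cross_entropy_with_logits(
        logits_fake, torch.ones_like(logits_fake))
```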

The script constructs and trains two distinct types of GANs. The first is a vanilla GAN that uses simple fully-connected (dense) layers for both the generator and discriminator. The second, more advanced model is a Deep Convolutional GAN (DCGAN). This architecture uses Conv2d and MaxPool layers in the discriminator for spatial feature extraction and ConvTranspose2d layers with BatchNorm in the generator to progressively build an image from the noise input. A generalized training function orchestrates the optimization process for both GAN types. Finally, the script demonstrates that the model has learned something non-trivial about the underlying spatial structure by interpolating between random vectors in its latent space.
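The interpolation idea can be sketched as follows; G and noise_dim are placeholder names for a trained generator and its latent dimensionality, not identifiers from the script.

```python
import torch

def interpolate_latent(G, z0, z1, steps=10):
    # Linearly blend between two noise vectors and decode each blend;
    # smooth image transitions suggest the generator has learned a
    # meaningful latent structure rather than memorizing samples.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    z = (1 - alphas) * z0 + alphas * z1   # shape (steps, noise_dim)
    with torch.no_grad():
        return G(z)                       # (steps, ...) generated images

# Usage with placeholder noise vectors of dimension 96:
# z0, z1 = torch.randn(1, 96), torch.randn(1, 96)
# images = interpolate_latent(G, z0, z1, steps=12)
```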

About

Implements: 1) Style Transfer, 2) GAN
