On the optimality of generative adversarial networks : a variational perspective

Material type: Book
Publication details: Bangalore : Indian Institute of Science, 2023
Description: 479 p. : ill. (col.); e-Thesis, 248.2 MB
Dissertation: PhD; 2023; Robert Bosch Center for Cyber-Physical Systems
DDC classification: 515.2433 ASO

Includes bibliographical references and index.

PhD; 2023; Robert Bosch Center for Cyber-Physical Systems

Generative adversarial networks (GANs) are a popular generative modeling framework in which the task is to learn the underlying distribution of the data. GANs comprise a min-max game between two neural networks, the generator and the discriminator. The generator transforms noise, typically Gaussian distributed, into a desired output, typically images. The discriminator learns to distinguish between the target samples and the generator output. The objective is to learn the optimal generator, one that generates samples that perfectly confuse the discriminator. GANs are trained to minimize either a divergence or an integral probability metric (IPM) between the data and generator distributions. Common divergences include the Jensen-Shannon divergence in the standard GAN (SGAN), the chi-squared divergence in the least-squares GAN (LSGAN), and f-divergences in f-GANs. Popular IPMs include the Wasserstein-2 metric and the Sobolev metric. The choice of IPM determines the constraint class over which the discriminator is optimized, such as Lipschitz-1 functions in the Wasserstein GAN (WGAN) or functions whose gradients have bounded energy, as in the Sobolev GAN.
While GANs excel at generating realistic images, their optimization is not well understood. This thesis focuses on understanding the optimality of GANs, viewed from the perspective of variational calculus. The thesis is organized into three parts.
In Part-I, we consider the functional analysis of the discriminator in various GAN formulations. In f-GANs, the functional optimization of the loss coincides with the pointwise optimization reported in the literature. We extend the analysis to novel GAN losses via a new contrastive-learning framework called Rumi-GAN, in which the target data is split into positive and negative classes. We design novel GAN losses that allow the generator to learn the positive class while the discriminator is trained on both classes. For the WGAN IPM, we propose a novel variant of the gradient-norm penalty and show, by means of Euler-Lagrange analysis, that the optimal discriminator solves the Poisson partial differential equation (PDE).
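The abstract does not spell out the proposed penalty, so the following is only a minimal PyTorch-style sketch of the familiar gradient-norm penalty of WGAN-GP, shown to make the setup concrete; the thesis's variant and its Euler-Lagrange analysis differ in the details.

    # Minimal sketch (PyTorch): a WGAN critic loss with the standard gradient-norm
    # penalty of WGAN-GP. The thesis proposes a variant of this penalty; its exact
    # form is not given in the abstract, so this is shown only for orientation.
    import torch

    def critic_loss(D, real, fake, lam=10.0):
        fake = fake.detach()                         # critic step: do not backprop into G
        loss = D(fake).mean() - D(real).mean()       # maximize D(real) - D(fake), i.e. minimize its negative
        # Gradient-norm penalty on points interpolated between real and fake samples.
        eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
        x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
        grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
        penalty = ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
        return loss + lam * penalty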
We solve the PDE via two approaches: one involving Fourier-series approximations and the other involving radial basis function (RBF) expansions. We derive approximation bounds for the Fourier-series discriminator in high-dimensional spaces and implement the discriminator by means of a novel Fourier-series network architecture. The proposed approach outperforms baseline GANs on synthetic Gaussian learning tasks.
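The architecture of the Fourier-series network is not described in the abstract; purely as an illustration of the idea, a discriminator can be parameterized as a truncated Fourier expansion with fixed frequencies and trainable coefficients, for example:

    # Minimal, hypothetical sketch (PyTorch): a discriminator written as a truncated
    # Fourier series, D(x) = sum_k a_k cos(<w_k, x>) + b_k sin(<w_k, x>), with fixed
    # random frequencies w_k and trainable coefficients. The thesis's actual
    # Fourier-series network architecture may differ.
    import torch
    import torch.nn as nn

    class FourierSeriesDiscriminator(nn.Module):
        def __init__(self, dim, num_terms=256, bandwidth=1.0):
            super().__init__()
            self.register_buffer("freqs", bandwidth * torch.randn(num_terms, dim))
            self.coeffs = nn.Linear(2 * num_terms, 1)   # trainable a_k, b_k (plus a bias)

        def forward(self, x):
            proj = x @ self.freqs.t()                                    # <w_k, x>
            feats = torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)
            return self.coeffs(feats).squeeze(-1)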
We extend the approach to image generation by means of latent-space matching in Wasserstein autoencoders (WAEs). The RBF formulation results in a charge-field interpretation of the optimal discriminator. We also present generalizations to higher-order gradient penalties for the LSGAN and WGAN losses, and show that the optimal discriminator can be implemented by means of a polyharmonic spline interpolator, giving rise to the name PolyGANs. PolyGANs, implemented by means of an RBF discriminator whose weights and centers are evaluated in closed form, result in superior convergence of the generator.
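To illustrate what a closed-form RBF discriminator can look like, the sketch below fits a polyharmonic-spline interpolator with the samples themselves as centers and ±1 targets; the kernel order, targets, and regularization are illustrative choices, not the thesis's exact construction.

    # Minimal, hypothetical sketch (NumPy): a discriminator realized as a
    # polyharmonic-spline RBF interpolator whose weights are obtained in closed
    # form by solving a linear system. Centers, targets, kernel order, and the
    # small ridge term are illustrative choices only.
    import numpy as np

    def fit_rbf_discriminator(real, fake, order=3):
        centers = np.concatenate([real, fake], axis=0)           # RBF centers at the samples
        targets = np.concatenate([np.ones(len(real)),            # +1 on data samples
                                  -np.ones(len(fake))])          # -1 on generator samples
        r = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
        phi = r ** order                                          # polyharmonic kernel r^k (odd k)
        weights = np.linalg.solve(phi + 1e-8 * np.eye(len(phi)), targets)

        def D(x):
            d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1)
            return (d ** order) @ weights
        return D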
In Part-II, we tackle the issue of choosing the input distribution of the generator. We introduce Spider GANs, a generalization of image-to-image translation GANs, wherein providing the generator with data coming from a closely related/“friendly neighborhood” source dataset accelerates and stabilizes training, even in scenarios where there is no visual similarity between the source and target datasets. Spider GANs can be cascaded, resulting in state-of-the-art performance when trained with StyleGAN architectures on small, high-resolution datasets, in merely one-fifth of the training time. To identify “friendly neighbors” of a target dataset, we propose the “signed Inception distance” (SID), which employs the PolyGAN discriminator to quantify the proximity between datasets.
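As a purely illustrative sketch of the mechanism (not of the Spider GAN architecture itself), a generator update in this setting looks like a standard GAN step in which a batch from the source dataset replaces the noise batch:

    # Minimal, hypothetical sketch (PyTorch): a generator update in which images from
    # a "friendly neighborhood" source dataset replace the Gaussian noise input.
    # The loss function, optimizer, and architectures are placeholders.
    def generator_step(G, D, opt_G, source_batch, gan_loss):
        fake = G(source_batch)        # source images stand in for the noise vector z
        loss = gan_loss(D(fake))      # any standard generator loss can be used here
        opt_G.zero_grad()
        loss.backward()
        opt_G.step()
        return loss.item()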
In Part-III, we extend the analysis performed in Part-I to GAN generators. In divergence-minimizing GANs, the optimal generator matches the score (the gradient of the log-density) of its push-forward distribution with that of the data distribution, linking GANs to score-based Langevin diffusion. In IPM-GANs, the optimal generator performs flow matching on the gradient field of the discriminator, which establishes an equivalence between the score-matching and flow-matching frameworks. We present implementations of flow-matching GANs, and develop an active-contour-based technique to train the generator in SnakeGANs. Finally, we leverage the gradient field of the discriminator to evolve particles in a Langevin-flow setting, and show that the proposed discriminator-guided Langevin diffusion accelerates baseline score-matching diffusion without the need for noise conditioning.
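The precise sampler is not given in the abstract; the following is only a minimal sketch of what a discriminator-guided Langevin-type update could look like, with the gradient field of a trained discriminator D driving the particles, and with the step size and noise scale as illustrative choices:

    # Minimal, hypothetical sketch (PyTorch): Langevin-style particle evolution driven
    # by the gradient field of a trained discriminator D. Step size and noise scale
    # are illustrative; the thesis's sampler may differ in its details.
    import torch

    def discriminator_guided_langevin(D, x, steps=100, eta=1e-2):
        for _ in range(steps):
            x = x.detach().requires_grad_(True)
            grad = torch.autograd.grad(D(x).sum(), x)[0]    # gradient field of D
            x = x + eta * grad + (2.0 * eta) ** 0.5 * torch.randn_like(x)
        return x.detach()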
