On the optimality of generative adversarial networks : a variational perspective
Material type: Book
| Item type | Current library | Call number | URL | Status | Date due | Barcode |
|---|---|---|---|---|---|---|
| Book | JRD Tata Memorial Library | 515.2433 ASO | Link to resource | Available | | ET00233 |
Browsing JRD Tata Memorial Library shelves:

- 515.24 SHI: A Primer on Number Sequences : Mathematical Marvels
- 515.243 3 P12 (MA): Fourier analysis : an introduction
- 515.2433 ANT: First course in harmonic analysis
- 515.2433 ASO: On the optimality of generative adversarial networks : a variational perspective
- 515.2433 JAI: Some Aspects of Weighted Kernel Functions on Planar Domains
- 515.2433 KAT: Introduction to harmonic analysis
- 515.2433 KAT/I: Introduction to harmonic analysis
Includes bibliographical references and index.
PhD; 2023; Robert Bosch Center for Cyber-Physical Systems
Generative adversarial networks are a popular generative modeling framework, where the
task is to learn the underlying distribution of data. GANs comprise a min-max game between
two neural networks, the generator and the discriminator. The generator transforms noise,
typically Gaussian distributed, into a desired output, typically images. The discriminator
learns to distinguish between the target samples and the generator output. The objective
is to learn the optimal generator, namely, one that generates samples that perfectly confuse
the discriminator. GANs are trained to minimize either a divergence or an integral
probability metric (IPM) between the data and generator distributions. Common divergences
include the Jensen-Shannon divergence in the standard GAN (SGAN), the Pearson chi-squared
divergence in the least-squares GAN (LSGAN), and f-divergences in f-GANs. Popular IPMs
include the Wasserstein-1 metric and the Sobolev metric. The choice of the IPM induces a
constraint class over which the discriminator is optimized, such as 1-Lipschitz functions in
the Wasserstein GAN (WGAN), or functions whose gradients have bounded energy, as in the
Sobolev GAN. While GANs excel at generating realistic images, their optimization
is not well understood. This thesis focuses on understanding the optimality of GANs from
the perspective of variational calculus. The thesis is organized into three parts.
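For reference, the two families of training objectives mentioned above can be written compactly. The notation is ours, not the thesis's: p_d is the data distribution, p_g the generator's push-forward distribution, p_z the noise prior, and F the discriminator constraint class.

```latex
% Standard GAN (SGAN) min-max objective:
\[
\min_{G}\,\max_{D}\; \mathbb{E}_{x\sim p_d}\left[\log D(x)\right]
  + \mathbb{E}_{z\sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
\]
% IPM-GAN objective: the discriminator is the optimizer over the constraint
% class F; choosing F as the 1-Lipschitz functions recovers Wasserstein-1.
\[
d_{\mathcal{F}}(p_d, p_g) = \sup_{f \in \mathcal{F}}\;
  \mathbb{E}_{x\sim p_d}\left[f(x)\right] - \mathbb{E}_{x\sim p_g}\left[f(x)\right]
\]
```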
In Part-I, we consider the functional analysis of the discriminator in various GAN formulations. In f-GANs, the functional optimization of the loss coincides with the pointwise
optimization reported in the literature. We extend the analysis to novel GAN losses
via a new contrastive-learning framework called Rumi-GAN, in which the target data is split into positive and negative classes. We design novel GAN losses that allow the
generator to learn the positive class while the discriminator is trained on both classes. For the
WGAN IPM, we propose a novel variant of the gradient-norm penalty and show, by means
of Euler-Lagrange analysis, that the optimal discriminator solves the Poisson partial differential equation (PDE). We solve the PDE via two approaches: one involving Fourier-series
approximations and the other involving radial basis function (RBF) expansions. We derive
approximation bounds for the Fourier-series discriminator in high-dimensional spaces, and
implement the discriminator by means of a novel Fourier-series network architecture. The
proposed approach outperforms baseline GANs on synthetic Gaussian learning tasks. We
extend the approach to image generation by means of latent-space matching in Wasserstein
autoencoders (WAE). The RBF formulation results in a charge-field interpretation of the
optimal discriminator. We also present generalizations to higher-order gradient penalties
for the LSGAN and WGAN losses, and show that the optimal discriminator can be implemented by means of a polyharmonic spline interpolator, giving rise to the name PolyGANs.
PolyGANs, implemented through an RBF discriminator whose weights and centers are
evaluated in closed form, result in superior convergence of the generator.
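As a sketch of the kind of Euler-Lagrange computation described above (our notation; the penalty weight λ is illustrative, and the exact functional and constants used in the thesis may differ): regularizing the IPM loss with a gradient-norm penalty gives

```latex
% Discriminator functional: IPM loss plus a gradient-norm penalty.
\[
J[D] = \mathbb{E}_{x\sim p_g}\left[D(x)\right]
     - \mathbb{E}_{x\sim p_d}\left[D(x)\right]
     + \lambda \int \lVert \nabla D(x) \rVert^{2}\,\mathrm{d}x
\]
% Setting the first variation to zero yields the Euler-Lagrange condition
% (p_g - p_d) - 2\lambda \nabla^{2} D = 0, i.e., a Poisson PDE:
\[
\nabla^{2} D^{*}(x) = \frac{p_g(x) - p_d(x)}{2\lambda}
\]
```

A Poisson equation of this form can be solved by convolving the right-hand side with the fundamental solution of the Laplacian, which is radial; this is where RBF expansions, the charge-field interpretation, and, for higher-order penalties, polyharmonic splines arise naturally.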
In Part-II, we tackle the issue of choosing the input distribution of the generator. We
introduce Spider GANs, a generalization of image-to-image translation GANs, wherein
providing the generator with data coming from a closely related/“friendly neighborhood”
source dataset accelerates and stabilizes training, even in scenarios where there is no visual
similarity between the source and target datasets. Spider GANs can be cascaded, resulting
in state-of-the-art performance when trained with StyleGAN architectures on small, high-resolution datasets, in merely one-fifth of the training time. To identify “friendly neighbors”
of a target dataset, we propose the “signed Inception distance” (SID), which employs the
PolyGAN discriminator to quantify the proximity between datasets.
In Part-III, we extend the analysis performed in Part-I to GAN generators. In divergence-minimizing GANs, the optimal generator matches the score (the gradient of the log-density) of its push-forward distribution with the score of the data distribution, linking GANs to score-based Langevin diffusion. In IPM-GANs, the optimal generator performs flow matching on the gradient field of the discriminator, thereby establishing an equivalence between
the score-matching and flow-matching frameworks. We present implementations of flow-matching GANs, and develop an active-contour-based technique to train the generator in
SnakeGANs. Finally, we leverage the gradient field of the discriminator to evolve particles in
a Langevin-flow setting, and show that the proposed discriminator-guided Langevin diffusion
accelerates baseline score-matching diffusion without the need for noise conditioning.
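Purely as an illustration of this last idea, here is a minimal PyTorch-style sketch of one discriminator-guided Langevin update; the function names and step size are hypothetical and not taken from the thesis.

```python
import torch

def langevin_step(x, discriminator, step_size=1e-2):
    """One discriminator-guided Langevin update (illustrative sketch).

    Particles drift along the gradient field of a trained discriminator
    and receive a Gaussian perturbation, mirroring the Langevin-flow
    setting described above.
    """
    x = x.detach().requires_grad_(True)
    potential = discriminator(x).sum()            # scalar potential over the batch
    drift = torch.autograd.grad(potential, x)[0]  # gradient field of the discriminator
    noise = torch.randn_like(x)
    return (x + step_size * drift
            + (2.0 * step_size) ** 0.5 * noise).detach()
```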