Deep Learning with Differentiable Rasterization for Vector Graphics Analysis

Reddy, Pradyumna

Files

phd_thesis.pdf (126.21 MB)

Date

2023-11

Authors

Reddy, Pradyumna

Abstract

Images and 2D geometric representations are two different ways of representing visual information. Vector graphics, or 2D geometric representations, are widely used to represent patterns, fonts, logos, digital artworks, and graphic designs. In a 2D geometric representation, a shape is represented using geometric objects, such as points, lines, and curves, and the relationships between these objects. Unlike pixel-based image representations, geometric representations enjoy several advantages: they are compact, easy to resize to an arbitrary resolution without loss of visual quality, and amenable to stylization by simply adjusting parameters such as size, color, or stroke type. As a result, they are favored by graphic artists and designers. Advances in deep learning have resulted in unprecedented progress in many classical image manipulation tasks. In the context of learning-based methods, while a vast body of work has focused on algorithms for raster images, only a handful of options exist for 2D geometric representations. The goal of this thesis is to develop deep learning-based methods for the analysis of visual information using 2D geometric representations. Our key hypothesis is that using image-domain statistics for optimizing geometric representation attributes enables accurate capture and manipulation of the visual information. In this thesis, we explore three methods with this insight as our point of departure. First, we tackle the problem of editing and expanding patterns that are encoded as flat images. We present a Differentiable Compositing function that enables propagating gradients between a raster point pattern image and the parameters of individual elements in the pattern. We use these gradients to decompose raster pattern images into layered graphical objects, thereby facilitating accurate editing of the patterns. We also show how the Differentiable Compositing function, along with a perceptual style loss, can be used to expand a pattern.
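As a rough illustration of how gradients can flow from a raster image back to the parameters of individual pattern elements, the following is a minimal sketch and not the thesis's actual Differentiable Compositing function. It assumes soft disk elements whose coverage is a sigmoid of the distance to the edge, back-to-front "over" blending on a white canvas, and central finite differences as a stand-in for automatic differentiation. All function names and parameters here (`soft_disk`, `composite`, `loss`, `numeric_grad`, `sharpness`) are hypothetical.

```python
import numpy as np

def soft_disk(cx, cy, r, size=32, sharpness=1.0):
    """Soft coverage mask of a disk: sigmoid of the signed distance to its edge."""
    ys, xs = np.mgrid[0:size, 0:size] + 0.5  # pixel centres
    dist = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)
    return 1.0 / (1.0 + np.exp(sharpness * (dist - r)))

def composite(elements, size=32):
    """Back-to-front 'over' compositing of grayscale disks on a white canvas.

    Each element is (cx, cy, radius, gray); the layering is an assumption.
    """
    canvas = np.ones((size, size))
    for cx, cy, r, gray in elements:
        alpha = soft_disk(cx, cy, r, size)
        canvas = alpha * gray + (1.0 - alpha) * canvas
    return canvas

def loss(elements, target):
    """Mean squared error between the composited image and a target image."""
    return float(np.mean((composite(elements, target.shape[0]) - target) ** 2))

def numeric_grad(elements, target, eps=1e-4):
    """Central finite differences of the loss w.r.t. every element parameter."""
    params = np.array(elements, dtype=float)
    grad = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        hi, lo = params.copy(), params.copy()
        hi[idx] += eps
        lo[idx] -= eps
        grad[idx] = (loss(hi, target) - loss(lo, target)) / (2 * eps)
    return grad

# Fit one disk to a target raster by plain gradient descent.
target = composite([(20.0, 12.0, 6.0, 0.2)])
guess = np.array([[14.0, 16.0, 4.0, 0.5]])
print("loss before:", loss(guess, target))
for _ in range(50):
    guess = guess - 1.0 * numeric_grad(guess, target)
print("loss after: ", loss(guess, target))
```

Because the masks are smooth, small changes to an element's position, radius, or gray level change the image smoothly, which is what lets an image-space loss drive the per-element parameters. A practical system would use an automatic differentiation framework in place of finite differences.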
Next, current neural network models specialized for vector graphics representation require explicit supervision from vector graphics data at training time. This is not ideal because large-scale, high-quality vector graphics datasets are difficult to obtain. We introduce Im2Vec, a neural network architecture that can estimate complex vector graphics with varying topologies. We leverage the Differentiable Compositing function to robustly train the network in an end-to-end manner with only indirect supervision from readily available raster training images (i.e., with no vector counterparts). Finally, fonts represented in a vector format cannot benefit from recent network architectures for neural representations; alternatively, rasterizing the vectors into images with a fixed resolution suffers from loss of data fidelity. We propose multi-implicits to represent complex fonts as a superposition of a set of permutation-invariant neural implicit functions, without compromising font-specific features such as edges and corners. For all three tasks, we show the utility of our insight by evaluating each method extensively and showing superior performance over baseline methods.
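The idea of combining several implicit functions in an order-independent way while keeping sharp corners can be illustrated with a small sketch. The abstract does not say how the implicit functions are combined, so the pointwise maximum used here is purely an assumption, and analytic half-plane distance functions stand in for learned networks. The names `half_plane`, `combine`, and `rasterize` are hypothetical.

```python
import numpy as np

def half_plane(nx, ny, offset):
    """Signed distance to a half-plane: negative inside, positive outside."""
    norm = np.hypot(nx, ny)
    return lambda x, y: (nx * x + ny * y - offset) / norm

def combine(implicits, x, y):
    """Pointwise maximum (assumed combination rule).

    The maximum is permutation invariant and yields the intersection of the
    inside regions, so two smooth functions can meet in a sharp corner.
    """
    return np.max(np.stack([f(x, y) for f in implicits]), axis=0)

def rasterize(implicits, size):
    """Sample the combined implicit at pixel centres of a size x size grid on [0, 1]^2."""
    coords = (np.arange(size) + 0.5) / size
    x, y = np.meshgrid(coords, coords)
    return combine(implicits, x, y) <= 0.0

# A right-angle corner: the region x <= 0.5 and y <= 0.5.
corner = [half_plane(1.0, 0.0, 0.5), half_plane(0.0, 1.0, 0.5)]
low_res = rasterize(corner, 8)
high_res = rasterize(corner, 64)
print(low_res.mean(), high_res.mean())  # covered fraction at each resolution
```

Since the shape is defined by continuous functions and not by a fixed pixel grid, it can be sampled at any resolution, and the corner stays sharp because it arises from the combination of the functions and not from any single smooth one.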