2023
Browsing 2023 by Issue Date
Now showing 1 - 20 of 23
Item: Path from Photorealism to Perceptual Realism (2022-09). Zhong, Fangcheng.
Photorealism in computer graphics --- rendering images that appear as realistic as photographs --- has matured to the point that it is now widely used in industry. With emerging 3D display technologies, the next big challenge in graphics is to achieve Perceptual Realism --- producing virtual imagery that is perceptually indistinguishable from real-world 3D scenes. Such a significant upgrade in the level of realism offers highly immersive and engaging experiences that have the potential to revolutionise numerous aspects of life and society, including entertainment, social connections, education, business, scientific research, engineering, and design. While perceptual realism puts strict requirements on the quality of reproduction, the virtual scene does not have to be identical in light distributions to its physical counterpart to be perceptually realistic, provided that it is visually indistinguishable to human eyes. Due to the limitations of human vision, a significant improvement in perceptual realism can, in principle, be achieved by fulfilling the essential visual requirements with sufficient quality, without having to reconstruct the physically accurate distribution of light. In this dissertation, we start by discussing the capabilities and limits of the human visual system, which serves as a basis for the analysis of the essential visual requirements for perceptual realism. Next, we introduce a Perceptually Realistic Graphics (PRG) pipeline consisting of the acquisition, representation, and reproduction of the plenoptic function of a 3D scene. Finally, we demonstrate that taking advantage of the limits and mechanisms of the human visual system can significantly improve this pipeline. Specifically, we present three approaches to push the quality of virtual imagery towards perceptual realism. First, we introduce DiCE, a real-time rendering algorithm that exploits the binocular fusion mechanism of the human visual system to boost the perceived local contrast of stereoscopic displays. The method was inspired by an established model of binocular contrast fusion. To optimise the experience of binocular fusion, we proposed and empirically validated a rivalry-prediction model that better controls rivalry. Next, we introduce Dark Stereo, another real-time rendering algorithm that facilitates depth perception from binocular depth cues for stereoscopic displays, especially those under low luminance. The algorithm was designed based on a proposed model of stereo constancy that predicts the precision of binocular depth cues for a given contrast and luminance. Both DiCE and Dark Stereo have been experimentally demonstrated to be effective in improving realism. Their real-time performance also makes them readily integrable into any existing VR rendering pipeline. Nonetheless, improving rendering alone is not sufficient to meet all the visual requirements for perceptual realism. The overall fidelity of a typical stereoscopic VR display is still confined by its limited dynamic range, low spatial resolution, optical aberrations, and vergence-accommodation conflicts. To push the limits of the overall fidelity, we present a High-Dynamic-Range Multi-Focal Stereo display (HDRMFS display) with an end-to-end imaging and rendering system. The system can visually reproduce real-world 3D objects with high resolution, accurate colour, a wide dynamic range and contrast, and most depth cues, including binocular disparity and focal depth cues, and permits a direct comparison between real and virtual scenes. It is the first work that achieves a close perceptual match between a physical 3D object and its virtual counterpart. The fidelity of reproduction has been confirmed by a Visual Turing Test (VTT) where naive participants failed to discern any difference between the real and virtual objects in more than half of the trials. The test provides insights to better understand the conditions necessary to achieve perceptual realism. In the long term, we foresee this system as a crucial step in the development of perceptually realistic graphics, not only for the unprecedented quality it achieves but also as a fundamental approach that can effectively identify bottlenecks and direct future studies for perceptually realistic graphics.
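The Visual Turing Test result above (participants unable to discern real from virtual in more than half of the trials) is the kind of two-alternative discrimination data that is commonly checked against chance with a binomial test. The snippet below is a generic illustration of that analysis, assuming SciPy is available; the trial counts are invented and do not come from the thesis.

```python
# Hypothetical check of a Visual Turing Test outcome against chance (50%).
# Counts are illustrative only, not data from the thesis.
from scipy.stats import binomtest

n_trials = 200    # total real-vs-virtual discrimination trials (made up)
n_correct = 96    # trials where the observer picked the real object (made up)

# If observers cannot tell real from virtual, correct responses should sit near 50%.
result = binomtest(n_correct, n_trials, p=0.5, alternative="two-sided")
print(f"correct rate: {n_correct / n_trials:.2%}, p-value vs. chance: {result.pvalue:.3f}")
# A large p-value means performance is statistically indistinguishable from guessing.
```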
Item: Optimization of Photogrammetric 3D Assets (Università degli Studi di Milano, 2023). Maggiordomo, Andrea.
The photogrammetric 3D reconstruction pipeline has emerged in recent years as the prominent technology for a variety of use cases, from heritage documentation to the authoring of high-quality photo-realistic assets for interactive graphics applications. However, the reconstruction process is prone to introducing modeling artifacts, and models generated with such tools often require careful post-processing before being ready for use in downstream applications. In this thesis, we address the problem of optimizing 3D models from this particular data source under several quality metrics. We begin by providing an analysis of their distinguishing features, focusing on the defects stemming from the automatic reconstruction process, and then we propose robust methods to optimize high-resolution 3D models under several objectives, addressing some of the identified recurring defects. First, we introduce a method to reduce the fragmentation of photo-reconstructed texture atlases, producing a more compact UV-map encoding with significantly fewer seams while minimizing the resampling artifacts introduced when synthesizing the new texture images. Next, we design a method to address texturing defects. Our technique combines recent advancements in image inpainting achieved by modern CNN-based models with the definition of a local texture-space inpainting domain that is semantically coherent, overcoming limitations of alternative texture inpainting strategies that operate in screen-space or global texture-space. Finally, we propose a method to efficiently represent densely tessellated 3D data by adopting a recently introduced efficient representation of displacement-mapped surfaces that benefits from GPU hardware support and does not require explicit parametrization of the 3D surface. Our approach generates a compact representation of high-resolution textured models, augmented with a texture map that is optimized and tailored to the displaced surface.
Item: Computational Modeling and Design of Nonlinear Mechanical Systems and Materials (2023). Tang, Pengbin.
Nonlinear mechanical systems and materials are broadly used in diverse fields. However, their modeling and design are nontrivial as they require a complete understanding of their internal nonlinearities and other phenomena. To enable their efficient design, we must first introduce computational models to accurately characterize their complex behavior. Furthermore, new inverse design techniques are also required to capture how the behavior changes when we change the design parameters of nonlinear mechanical systems and materials. Therefore, in this thesis, we introduce three novel methods for computational modeling and design of nonlinear mechanical systems and materials. In the first article, we address the design problem of nonlinear mechanical systems exhibiting stable periodic motions in response to a periodic force. We present a computational method that utilizes a frequency-domain approach for dynamical simulation and powerful sensitivity analysis for design optimization to design compliant mechanical systems with large-amplitude oscillations. Our method is versatile and can be applied to various types of compliant mechanical systems. We validate its effectiveness by fabricating and evaluating several physical prototypes. Next, we focus on the computational modeling and mechanical characterization of contact-dominated nonlinear materials, particularly Discrete Interlocking Materials (DIM), which are generalized chainmail fabrics made of quasi-rigid interlocking elements. Unlike conventional elastic materials for which deformation and restoring forces are directly coupled, the mechanics of DIM are governed by contacts between individual elements that give rise to anisotropic kinematic deformation constraints. To replicate the biphasic behavior of DIM without simulating expensive microscale structures, we introduce an efficient anisotropic strain-limiting method based on second-order cone programming (SOCP). Additionally, to comprehensively characterize strong anisotropy, complex coupling, and other nonlinear phenomena of DIM, we introduce a novel homogenization approach for distilling macroscale deformation limits from microscale simulations and develop a data-driven macromechanical model for simulating DIM with homogenized deformation constraints.
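To give a flavour of how a strain-limiting projection can be posed as a conic program, the toy sketch below uses cvxpy to project node positions so that no edge exceeds a length limit; each norm constraint is handled by the solver as a second-order cone constraint. The chain, limit, and pinning are invented for the example and do not reproduce the thesis's anisotropic DIM formulation.

```python
# Toy strain-limiting projection with cvxpy (illustration only; the thesis's
# DIM formulation is anisotropic and considerably more involved).
import numpy as np
import cvxpy as cp

# A small chain of 4 nodes in 2D that is currently over-stretched.
x_curr = np.array([[0.0, 0.0], [1.3, 0.0], [2.6, 0.0], [3.9, 0.0]])
edges = [(0, 1), (1, 2), (2, 3)]
limit = 1.1  # made-up per-edge length limit (110% of a unit rest length)

x = cp.Variable(x_curr.shape)
objective = cp.Minimize(cp.sum_squares(x - x_curr))              # stay close to current positions
constraints = [cp.norm(x[j] - x[i]) <= limit for i, j in edges]  # second-order cone constraints
constraints.append(x[0] == x_curr[0])                            # pin the first node
cp.Problem(objective, constraints).solve()
print(np.round(x.value, 3))
```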
Item: Machine Learning Supported Interactive Visualization of Hybrid 3D and 2D Data for the Example of Plant Cell Lineage Specification (2023). Hong, Jiayi.
As computer graphics technologies develop, spatial data can be better visualized in the 3D environment so that viewers can observe 3D shapes and positions clearly. Meanwhile, 2D abstract visualizations can present summarized information, visualize additional data, and control the 3D view. Combining these two parts in one interface can assist people in finishing complicated tasks, especially in scientific domains, though there is a lack of design guidelines for the interaction. Generally, experts need to analyze large amounts of scientific data to finish challenging tasks. For example, in the biological field, biologists need to build the hierarchy tree for an embryo with more than 200 cells. In this case, manual work can be time-consuming and tedious, and machine learning algorithms have the potential to alleviate some of the tedious manual processes to serve as the basis for experts. These predictions, however, contain hierarchical and multi-layer information, and it is essential to visualize them sequentially and progressively so that experts can control their viewing pace and validation. Also, 3D and 2D representations, together with machine learning predictions, need to be visually and interactively connected in the system. In this thesis, we worked on the cell lineage problem for plant embryos as an example to investigate a visualization system and its interaction design that makes use of combinations of 3D and 2D representations as well as visualizations for machine learning. We first investigated the 3D selection interaction techniques for the plant embryo. The cells in a plant embryo are tightly packed together, without any space in between. Traditional techniques can hardly deal with such an occlusion problem. We conducted a study to evaluate three different selection techniques and found that the combination of the Explosion Selection technique and the List Selection technique works well for accessing and observing plant cells in an embryo. These techniques can also be extended to other similar densely packed 3D data. Second, we explored the visualization and interaction design to combine the 3D visualizations of a plant embryo with its associated 2D hierarchy tree. We designed a system with such combinations for biologists to examine the plant cells and record the development history in the hierarchy tree. We support the hierarchy building in two directions, both constructing the history top-down using the lasso selection in a 3D environment and bottom-up as the traditional workflow does in the hierarchy tree. We also added a neural network model to give predictions about the assignments for biologists to start with. We conducted an evaluation with biologists, which showed that both 3D and 2D representations help with making decisions, and the tool can inspire insights. One main drawback was that the performance of the machine learning model was not ideal. Thus, to assist the process and enhance the model performance, in an improved version of our system, we trained five different ML models and visualized the predictions and their associated uncertainty. We performed a study, and the results indicated that our designed ML representations are easy to understand and that the new tool can effectively improve the efficiency of assigning the cell lineage.
Item: Self-Supervised Shape and Appearance Modeling via Neural Differentiable Graphics (2023). Henzler, Philipp.
Inferring 3D shape and appearance from natural images is a fundamental challenge in computer vision. Despite recent progress using deep learning methods, a key limitation is the availability of annotated training data, as acquisition is often very challenging and expensive, especially at a large scale. This thesis proposes to incorporate physical priors into neural networks that allow for self-supervised learning. As a result, easy-to-access unlabeled data can be used for model training. In particular, novel algorithms in the context of 3D reconstruction and texture/material synthesis are introduced, where only image data is available as a supervisory signal. First, a method that learns to reason about 3D shape and appearance solely from unstructured 2D images, achieved via differentiable rendering in an adversarial fashion, is proposed. As shown next, learning from videos significantly improves 3D reconstruction quality. To this end, a novel ray-conditioned warp embedding is proposed that aggregates pixel-wise features from multiple source images. Addressing the challenging task of disentangling shape and appearance, a method that enables 3D texture synthesis independent of shape or resolution is presented first. For this purpose, 3D noise fields of different scales are transformed into stationary textures. The method is able to produce 3D textures, despite only requiring 2D textures for training. Lastly, the surface characteristics of textures under different illumination conditions are modeled in the form of material parameters. Therefore, a self-supervised approach is proposed that has no access to material parameters but only to flash images. Similar to the previous method, random noise fields are reshaped to material parameters, which are conditioned to replicate the visual appearance of the input under matching light.

Item: Efficient and Expressive Microfacet Models (Charles University, 2023). Atanasov, Asen.
In realistic appearance modeling, rough surfaces that have microscopic details are described using so-called microfacet models. These include analytical models that statistically define a physically-based microsurface. Such models are extensively used in practice because they are inexpensive to compute and offer considerable flexibility in terms of appearance control. Also, small but visible surface features can easily be added to them through the use of a normal map. However, there are still areas in which this general type of model can be improved: important features like anisotropy control sometimes lack analytic solutions, and the efficient rendering of normal maps requires accurate and general filtering algorithms. We advance the state of the art with regard to such models in these areas: we derive analytic anisotropic models, reformulate the filtering problem and propose an efficient filtering algorithm based on a novel filtering data structure. Specifically, we derive a general result in microfacet theory: given an arbitrary microsurface defined via standard microfacet statistics, we show how to construct the statistics of its linearly transformed counterparts. This leads to a simple closed-form expression for anisotropic variations of a given surface that generalizes previous work by supporting all microfacet distributions and all invertible tangential linear transformations. As a consequence, our approach allows transferring macrosurface deformations to the microsurface, so as to render its corresponding complex anisotropic appearance. Furthermore, we analyze the filtering of the combined effect of a microfacet BRDF and a normal map. We show that the filtering problem can be expressed as an Integral Histogram (IH) evaluation. Due to the high memory usage of IHs, we develop the Inverse Bin Map (IBM): a form of an IH that is very compact and fast to build. Based on the IBM, we present a highly memory-efficient technique for filtering normal maps that is targeted at the accurate rendering of glints, but in contrast with previous approaches also offers roughness control.
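For orientation on the kind of result described above, the classical special case of constructing an anisotropic microfacet distribution from an isotropic one is a diagonal stretch in slope space: with roughness scalings $\alpha_x, \alpha_y$, the slope distribution transforms as below. This is textbook background (the usual slope-space notation), not the thesis's general result, which covers arbitrary invertible tangential linear transformations and all microfacet distributions.

$$ P^{22}_{\alpha_x,\alpha_y}(\tilde{x}_m, \tilde{y}_m) \;=\; \frac{1}{\alpha_x\,\alpha_y}\, P^{22}\!\left(\frac{\tilde{x}_m}{\alpha_x}, \frac{\tilde{y}_m}{\alpha_y}\right) $$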
Item: Recognition and representation of curve and surface primitives in digital models via the Hough transform (2023-01-12). Romanengo, Chiara.
Curve and surface primitives have an important role in conveying an object shape, and their recognition finds significant applications in manufacturing, art, design and medicine. When 3D models are acquired by scanning real objects, the resulting geometry does not explicitly encode these curves and surfaces, especially in the presence of noise or missing data. Knowledge of the parts that compose a 3D model then allows the reconstruction of the model itself. The problem of recognising curves and surfaces and providing a mathematical representation of them can be addressed using the Hough transform technique (HT), which in the literature is mainly used to recognise curves in the plane and planes in space. Only in the last few years has it been explored for the fitting of space curves and been extended to different families of surfaces. Such a technique is robust to noise, does not suffer from missing parts and benefits from the flexibility of the template curve or surface. For these reasons, our approach is inspired by a generalisation of the Hough transform defined for algebraic curves. In this thesis, we present the methods we implemented and the results we obtained about the recognition, extraction, and representation of feature parts that compose a 3D model (both meshes and point clouds). Specifically, we first study the recognition of plane curves, simple and compound, expressed both in implicit and parametric form, with a focus on applications to cultural heritage and geometric motifs. Then, we analyse the extension of the method to space curves, concentrating on the improvement of the model through the insertion of the recognised curves directly on its surface. To overcome the limitation of knowing in advance the family of curves to be used with the HT, we introduce a piece-wise curve approximation using specific parametric, low-degree polynomial curves. Finally, we analyse how to recognise simple and complex geometric surface primitives on both pre-segmented and entire point clouds, and we show a comparison with state-of-the-art approaches on two benchmarks specifically created to evaluate both existing methods and ours.
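As a reminder of the basic machinery the thesis generalises, the classical Hough transform votes in a discretised parameter space and keeps the cells with the most votes. The sketch below shows the textbook version for circles of known radius in a 2D point set; the parameter ranges, bin counts and test data are arbitrary choices for the example, and the thesis's formulation for algebraic curves and surfaces is considerably more general.

```python
# Textbook Hough transform for circles of known radius (illustration only).
import numpy as np

def hough_circles_fixed_radius(points, radius, xlim, ylim, bins=200):
    """Vote for circle centres (a, b) such that (x-a)^2 + (y-b)^2 = radius^2."""
    acc = np.zeros((bins, bins))
    a_edges = np.linspace(*xlim, bins + 1)
    b_edges = np.linspace(*ylim, bins + 1)
    thetas = np.linspace(0.0, 2.0 * np.pi, 180, endpoint=False)
    for x, y in points:
        # Each data point votes for all centres lying on a circle around it.
        a = x - radius * np.cos(thetas)
        b = y - radius * np.sin(thetas)
        ia = np.clip(np.searchsorted(a_edges, a) - 1, 0, bins - 1)
        ib = np.clip(np.searchsorted(b_edges, b) - 1, 0, bins - 1)
        np.add.at(acc, (ia, ib), 1)
    ia, ib = np.unravel_index(np.argmax(acc), acc.shape)
    centre = (0.5 * (a_edges[ia] + a_edges[ia + 1]),
              0.5 * (b_edges[ib] + b_edges[ib + 1]))
    return centre

# Noisy samples from a circle centred at (1.0, -0.5) with radius 2.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 400)
pts = np.stack([1.0 + 2.0 * np.cos(t), -0.5 + 2.0 * np.sin(t)], axis=1)
pts += rng.normal(scale=0.02, size=pts.shape)
print(hough_circles_fixed_radius(pts, radius=2.0, xlim=(-4, 4), ylim=(-4, 4)))
```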
Item: Fully Controllable Data Generation For Realtime Face Capture (ETH Zurich, 2023-01-31). Chandran, Prashanth.
Data-driven realtime face capture has gained considerable momentum in the last few years thanks to deep neural networks that leverage specialized datasets to speed up the acquisition of face geometry and appearance. However, generalizing such neural solutions to generic in-the-wild face capture remains a challenge due to the lack of, or a means to generate, a high-quality in-the-wild face database with all forms of ground truth (geometry, appearance, environment maps, etc.). In this thesis, we recognize this data bottleneck and propose a comprehensive framework for controllable, high-quality, in-the-wild data generation that can support present and future applications in face capture. We approach this problem in four stages, starting with the building of a high-quality 3D face database consisting of a few hundred subjects in a studio setting. This database will serve as a strong prior for 3D face geometry and appearance for several methods discussed in this thesis. To build this 3D database and to automate the registration of scans to a template mesh, we propose the first deep facial landmark detector capable of operating on 4K resolution imagery while also achieving state-of-the-art performance on several in-the-wild benchmarks. Our second stage leverages the proposed 3D face database to build powerful nonlinear 3D morphable models for static geometry modelling and synthesis. We propose the first semantic deep face model that combines the semantic interpretability of traditional 3D morphable models with the nonlinear expressivity of neural networks. We later extend this semantic deep face model with a novel transformer-based architecture and propose the Shape Transformer for representing and manipulating face shapes irrespective of their mesh connectivity. The third stage of our data generation pipeline involves extending the approaches for static geometry synthesis to support facial deformations across time so as to synthesize dynamic performances. To synthesize facial performances, we propose two parallel approaches, one involving performance retargeting and another based on a data-driven 4D (3D + time) morphable model. We propose a local anatomically constrained facial performance retargeting technique that uses only a handful of blendshapes (20 shapes) to achieve production-quality results. This retargeting technique can readily be used to create novel animations for any given actor via animation transfer. Our second contribution for generating facial performances is a transformer-based 4D autoencoder that encodes a sequence of expression blend weights into a learned performance latent space. Novel performances can then be generated at inference time by sampling this learned latent space. The fourth and final stage of our data generation pipeline involves the creation of photorealistic imagery that can go along with the facial geometry and animations synthesized thus far. We propose a hybrid rendering approach that leverages state-of-the-art techniques for ray-traced skin rendering and a pretrained 2D generative model for photorealistic and consistent inpainting of the skin renders. Our hybrid rendering technique allows for the creation of an infinite number of training samples where the user has full control over the facial geometry, appearance, lighting and viewpoint. The techniques presented in this thesis will serve as the foundation for creating large-scale photorealistic in-the-wild face datasets to support the next generation of realtime face capture.
Item: Visual and multimodal perception in immersive environments (2023-02-22). Malpica Mallo, Sandra.
Throughout this thesis, we use virtual reality (VR) as a tool to better understand human visual perception and attentional behavior. We leverage the intrinsic properties provided by VR in order to build user studies tailored to a set of different topics: VR provides increased control over sensory information when compared to traditional media, as well as more natural interactions with the environment and an increased sense of realism. These qualities, together with the feeling of presence and immersion, increase the ecological validity of user studies conducted in VR. Furthermore, it allows researchers to explore scenarios closer to the real world in a safe and reproducible way. By increasing the available knowledge about visual perception, we aim to provide visual computing researchers with more tools to overcome current limitations in the field, whether caused by hardware or software. Understanding human visual perception and attentional behavior is a challenging task: measuring such high-level cognitive processes is often not feasible, more so without medical-grade devices (which are commonly invasive for the user). For this reason, we settle on measuring observable data, both qualitative and quantitative. This data is further processed to obtain information about human behavior and create high-level guidelines or models when possible. We present the contributions of this thesis around two topics: visual perception of realistic stimuli and multimodal perception in immersive environments. The first one is devoted to visual appearance and has two separate contributions. First, we have created a learning-based appearance similarity metric by means of large-scale crowdsourced user studies and a deep learning model which correlates with human perception. Additionally, we study how low-level, asemantic visual features can be used to alter time perception in virtual reality, manifesting the interplay between visual and temporal perception at interval-timing scales (several seconds to several minutes). Regarding the second topic, multimodal perception, we have first compiled an in-depth study of the state of the art on the use of different sensory modalities (visual, auditory, haptic, etc.) in immersive environments. Additionally, we have analyzed a crossmodal suppressive effect in virtual reality, where auditory cues can significantly degrade visual performance. Finally, we have shown how temporal synchronization is key to correctly perceiving multimodal events and enhancing their realism, even when visual quality is degraded. Ultimately, this thesis aims to increase the understanding of human behavior in immersive environments. This knowledge can not only benefit cognitive science researchers, but also computer graphics researchers, especially those in the field of VR, who will be able to use our findings to create better user experiences.

Item: Modeling and simulating virtual terrains (2023-03-21). Paris, Axel.
This PhD thesis, entitled "Modeling and simulating virtual terrains", is related to digital content creation and geological simulations, in the context of virtual terrains. Real terrains exhibit landforms of different scales (namely microscale, mesoscale, and macroscale), formed by multiple interconnected physical processes operating at various temporal and spatial scales. On a computer, landforms are usually represented by elevation models, but features such as arches and caves require a volumetric representation. The increasing need for realism and larger worlds brings new challenges that existing techniques do not meet. This thesis is organized into two parts. First, we observe that several macroscale landforms, such as desert landscapes made of sand dunes and meandering rivers, simply cannot be modeled by existing techniques. Thus, we develop new simulations, inspired by research in geomorphology, to generate these landforms. We particularly focus on the plausibility of our results and user control, which is a key requirement in Computer Graphics. In the second part, we address the modeling and generation of volumetric landforms in virtual terrains. Existing models are often based on voxels and have a high memory impact, which forbids their use at a large scale. Instead, we develop a new model based on signed distance functions for representing volumetric landforms, such as arches, overhangs and caves, with a low memory footprint. We show that this representation is well suited to generating volumetric landforms across a range of scales (microscale, mesoscale, and macroscale).
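To make the signed-distance-function idea above concrete, the snippet below sketches the standard building blocks: primitive SDFs combined with a smooth union and a subtraction, which is how arches, overhangs and caves are typically composed from simple shapes. The primitives, the polynomial smooth minimum, and the toy scene are generic graphics folklore used for illustration, not the thesis's specific operators.

```python
# Generic SDF building blocks (illustration; not the thesis's terrain model).
import numpy as np

def sd_sphere(p, centre, r):
    return np.linalg.norm(p - centre, axis=-1) - r

def sd_plane(p, height=0.0):
    # Signed distance to a horizontal ground plane at z = height.
    return p[..., 2] - height

def smooth_union(d1, d2, k=0.5):
    # Polynomial smooth minimum: blends two distance fields over a width k.
    h = np.clip(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0)
    return d2 * (1.0 - h) + d1 * h - k * h * (1.0 - h)

def terrain_with_overhang(p):
    ground = sd_plane(p)                                      # flat ground
    boulder = sd_sphere(p, np.array([0.0, 0.0, 0.6]), 0.8)    # a blob above it
    solid = smooth_union(ground, boulder, k=0.4)              # merge smoothly
    cavity = sd_sphere(p, np.array([0.4, 0.0, 0.3]), 0.45)
    return np.maximum(solid, -cavity)                         # carve a cave (subtraction)

# Query the field at a few sample points; negative values are inside the terrain.
pts = np.array([[0.0, 0.0, 0.2], [0.4, 0.0, 0.3], [0.0, 0.0, 2.0]])
print(terrain_with_overhang(pts))
```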
Item: Progressive Shape Reconstruction from Raw 3D Point Clouds (Université Côte d'Azur, 2023-03-27). ZHAO, Tong.
With the enthusiasm for digital twins in the fourth industrial revolution, surface reconstruction from raw 3D point clouds is increasingly needed while facing multifaceted challenges. Advanced data acquisition devices make it possible to obtain 3D point clouds with multi-scale features. Users expect controllable surface reconstruction approaches with meaningful criteria such as preservation of sharp features or satisfactory accuracy-complexity tradeoffs. This thesis addresses these issues by contributing several approaches to the problem of surface reconstruction from defect-laden point clouds. We first propose the notion of a progressive discrete domain for global implicit reconstruction approaches, which refines and optimizes a discrete 3D domain in accordance with both the input and output, and with user-defined criteria. Based on such a domain discretization, we devise a progressive primitive-aware surface reconstruction approach with the capacity to refine the implicit function and its representation, in which the most ill-posed parts of the reconstruction problem are postponed to later stages of the reconstruction, and where the fine geometric details are resolved after discovering the topology. Secondly, we contribute a deep learning-based approach that learns to detect and consolidate sharp feature points on raw 3D point clouds, whose results can be taken as an additional input to consolidate sharp features for the previous reconstruction approach. Finally, we contribute a coarse-to-fine piecewise smooth surface reconstruction approach that proceeds by clustering quadric error metrics. This approach outputs a simplified reconstructed surface mesh, whose vertices are located on sharp features and whose connectivity is solved by a binary problem solver. In summary, this thesis seeks effective surface reconstruction from a global and progressive perspective. By combining multiple priors and designing meaningful criteria, the contributed approaches can deal with various defects and multi-scale features.
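For reference, the quadric error metrics mentioned in the last contribution are the classical Garland-Heckbert plane quadrics: each supporting plane with unit normal $\mathbf{n}_i$ and offset $d_i$ contributes a quadratic form, and the error of placing a vertex at $\mathbf{v}$ is the sum of squared distances to the accumulated planes. This is the standard definition, recalled here for orientation rather than as the thesis's specific clustering formulation.

$$ Q = \sum_i \begin{pmatrix}\mathbf{n}_i\\ d_i\end{pmatrix}\begin{pmatrix}\mathbf{n}_i^{\mathsf T} & d_i\end{pmatrix}, \qquad E(\mathbf{v}) = \begin{pmatrix}\mathbf{v}\\ 1\end{pmatrix}^{\!\mathsf T} Q \begin{pmatrix}\mathbf{v}\\ 1\end{pmatrix} = \sum_i \bigl(\mathbf{n}_i^{\mathsf T}\mathbf{v} + d_i\bigr)^2 $$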
Item: Inverse Shape Design with Parametric Representations: Kirchhoff Rods and Parametric Surface Models (Institute of Science and Technology Austria, 2023-05). Hafner, Christian.
Inverse design problems in fabrication-aware shape optimization are typically solved on discrete representations such as polygonal meshes. This thesis argues that there are benefits to treating these problems in the same domain as human designers, namely, the parametric one. One reason is that discretizing a parametric model usually removes the capability of making further manual changes to the design, because the human intent is captured by the shape parameters. Beyond this, knowledge about a design problem can sometimes reveal a structure that is present in a smooth representation, but is fundamentally altered by discretizing. In this case, working in the parametric domain may even simplify the optimization task. We present two lines of research that explore both of these aspects of fabrication-aware shape optimization on parametric representations. The first project studies the design of plane elastic curves and Kirchhoff rods, which are common mathematical models for describing the deformation of thin elastic rods such as beams, ribbons, cables, and hair. Our main contribution is a characterization of all curved shapes that can be attained by bending and twisting elastic rods having a stiffness that is allowed to vary across the length. Elements like these can be manufactured using digital fabrication devices such as 3D printers and digital cutters, and have applications in free-form architecture and soft robotics. We show that the family of curved shapes that can be produced this way admits a geometric description that is concise and computationally convenient. In the case of plane curves, the geometric description is intuitive enough to allow a designer to determine whether a curved shape is physically achievable by visual inspection alone. We also present shape optimization algorithms that convert a user-defined curve in the plane or in three dimensions into the geometry of an elastic rod that will naturally deform to follow this curve when its endpoints are attached to a support structure. Implemented in an interactive software design tool, the rod geometry is generated in real time as the user edits a curve, enabling fast prototyping. The second project tackles the problem of general-purpose shape optimization on CAD models using a novel variant of the extended finite element method (XFEM). Our goal is to decouple the simulation mesh from the CAD model, so that no geometry-dependent meshing or remeshing needs to be performed when the CAD parameters change during optimization. This is achieved by discretizing the embedding space of the CAD model, and using a new high-accuracy numerical integration method to enable XFEM on free-form elements bounded by the parametric surface patches of the model. Our simulation is differentiable from the CAD parameters to the simulation output, which enables us to use off-the-shelf gradient-based optimization procedures. The result is a method that fits seamlessly into the CAD workflow because it works on the same representation as the designer, enabling the alternation of manual editing and fabrication-aware optimization at will.
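As background for the rod-design result above, a plane elastic curve with arc-length parametrisation $\gamma(s)$, curvature $\kappa(s)$, natural curvature $\bar\kappa(s)$, and a bending stiffness $K(s)$ that varies along the length is commonly modelled by the Euler-Bernoulli bending energy below; equilibria of functionals of this type are what the characterization concerns. The formula is the standard textbook model, stated as orientation rather than the thesis's full Kirchhoff-rod formulation (which also accounts for twist).

$$ E[\gamma] \;=\; \frac{1}{2}\int_0^L K(s)\,\bigl(\kappa(s) - \bar\kappa(s)\bigr)^2 \,\mathrm{d}s $$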
Item: Data-centric Design and Training of Deep Neural Networks with Multiple Data Modalities for Vision-based Perception Systems (University of the Basque Country, 2023-06-12). Aranjuelo, Nerea.
The advances in computer vision and machine learning have revolutionized the ability to build systems that process and interpret digital data, enabling them to mimic human perception and paving the way for a wide range of applications. In recent years, both disciplines have made significant progress, fueled by advances in deep learning techniques. Deep learning is a discipline that uses deep neural networks (DNNs) to teach machines to recognize patterns and make predictions based on data. Deep learning-based perception systems are increasingly prevalent in diverse fields, where humans and machines collaborate to combine their strengths. These fields include automotive, industrial, and medical applications, where enhancing safety, supporting diagnosis, and automating repetitive tasks are among the goals. However, data are one of the key factors behind the success of deep learning algorithms. Data dependency strongly limits the creation and success of a new DNN. The availability of quality data for solving a specific problem is essential but hard to obtain, even impracticable, in most developments. Data-centric artificial intelligence emphasizes the importance of using high-quality data that effectively conveys what a model must learn. Motivated by the challenges and necessity of data, this thesis formulates and validates five hypotheses on the acquisition and impact of data in DNN design and training. Specifically, we investigate and propose different methodologies to obtain suitable data for training DNNs in problems with limited access to large-scale data sources. We explore two potential solutions for obtaining data, which rely on synthetic data generation. Firstly, we investigate the process of generating synthetic training data using 3D graphics-based models and the impact of different design choices on the accuracy of obtained DNNs. Beyond that, we propose a methodology to automate the data generation process and generate varied annotated data by replicating a 3D custom environment given an input configuration file. Secondly, we propose a generative adversarial network (GAN) that generates annotated images using both limited annotated data and unannotated in-the-wild data. Typically, limited annotated datasets have accurate annotations but lack realism and variability, which can be compensated for by the in-the-wild data. We analyze the suitability of the data generated with our GAN-based method for DNN training. This thesis also presents a data-oriented DNN design, as data can present very different properties depending on their source. We differentiate sources based on the sensor modality used to obtain the data (e.g., camera, LiDAR) or the data generation domain (e.g., real, synthetic). On the one hand, we redesign an image-oriented object detection DNN architecture to process point clouds from the LiDAR sensor and optionally incorporate information from RGB images. On the other hand, we adapt a DNN to learn from both real and synthetic images while minimizing the domain gap of learned features from data. We have validated our formulated hypotheses in various unresolved computer vision problems that are critical for numerous real-world vision-based systems. Our findings demonstrate that synthetic data generated using 3D models and environments are suitable for DNN training. However, we also highlight that the design choices during the generation process, such as lighting and camera distortion, significantly affect the accuracy of the resulting DNN. Additionally, we show that a 3D simulation environment can assist in designing better sensor setups for a target task. Furthermore, we demonstrate that GANs offer an alternative means of generating training data by exploiting labeled and existing unlabeled data to generate new samples that are suitable for DNN training without a simulation environment. Finally, we show that adapting DNN design and training to data modality and source can increase model accuracy. More specifically, we demonstrate that modifying a predefined architecture designed for images to accommodate the peculiarities of point clouds results in state-of-the-art performance in 3D object detection. The DNN can be designed to handle data from a single modality or leverage data from different sources. Furthermore, when training with real and synthetic data, considering their domain gap and designing a DNN architecture accordingly improves model accuracy.
Item: Neural Mesh Reconstruction (Simon Fraser University, 2023-06-16). Chen, Zhiqin.
Deep learning has revolutionized the field of 3D shape reconstruction, unlocking new possibilities and achieving superior performance compared to traditional methods. However, despite being the dominant 3D shape representation in real-world applications, polygon meshes have been severely underutilized as a representation for output shapes in neural 3D reconstruction methods. One key reason is that triangle tessellations are irregular, which poses challenges for generating them using neural networks. Therefore, it is imperative to develop algorithms that leverage the power of deep learning while generating output shapes in polygon mesh formats for seamless integration into real-world applications. In this thesis, we propose several data-driven approaches to reconstruct explicit meshes from diverse types of input data, aiming to address this challenge. Drawing inspiration from classical data structures and algorithms in computer graphics, we develop representations that effectively encode meshes within neural networks. First, we introduce BSP-Net. Inspired by a classical data structure, Binary Space Partitioning (BSP), we represent a 3D shape as a union of convex primitives, where each convex primitive is obtained by intersecting half-spaces. This 3-layer BSP-tree representation allows a shape to be stored in a 3-layer multilayer perceptron (MLP) as a neural implicit, while an exact polygon mesh can be extracted from the MLP weights by parsing the underlying BSP-tree. BSP-Net is the first deep neural network that is able to produce compact and watertight polygon meshes natively, and the generated meshes are capable of representing sharp geometric features. We demonstrate its effectiveness in the task of single-view 3D reconstruction. Next, we introduce a series of works that reconstruct explicit meshes by storing meshes in regular grid structures. We present Neural Marching Cubes (NMC), a data-driven algorithm for reconstructing meshes from discretized implicit fields. NMC is built upon Marching Cubes (MC), but it learns the vertex positions and local mesh topologies from example training meshes, thereby avoiding topological errors and achieving better reconstruction of geometric features, especially sharp features such as edges and corners, compared to MC and its variants. In our subsequent work, Neural Dual Contouring (NDC), we replace the MC meshing algorithm with Dual Contouring (DC) with slight modifications, so that our algorithm can reconstruct meshes from both signed inputs, such as signed distance fields or binary voxels, and unsigned inputs, such as unsigned distance fields or point clouds, with high accuracy and fast inference speed in a unified framework. Furthermore, inspired by the volume rendering algorithm in Neural Radiance Fields (NeRF), we introduce differentiable rendering to NDC to arrive at MobileNeRF, a NeRF-based method for reconstructing objects and scenes as triangle meshes with view-dependent textures from multi-view images. MobileNeRF is the first NeRF-based method that is able to run on mobile phones and AR/VR platforms thanks to the explicit mesh representation, demonstrating its efficiency and compatibility on common devices.
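The union-of-convexes representation described above (each convex an intersection of half-spaces) has a very small evaluation core, sketched generically below: a point is inside a convex if it lies behind every one of its planes, and inside the shape if it lies inside any convex. The plane and grouping tensors here are random placeholders for illustration; in BSP-Net they are decoded from a learned latent code.

```python
# Generic evaluation of a union-of-convexes (BSP-style) implicit shape.
# Planes and grouping are random placeholders, not learned BSP-Net weights.
import numpy as np

rng = np.random.default_rng(1)
P, C = 64, 8                              # number of half-space planes, convexes
planes = rng.normal(size=(P, 4))          # each row: (a, b, c, d) for a*x + b*y + c*z + d
membership = rng.random((P, C)) < 0.2     # which planes bound which convex

def inside(points):
    """Boolean inside/outside test for an (N, 3) array of query points."""
    homog = np.concatenate([points, np.ones((len(points), 1))], axis=1)   # (N, 4)
    signed = homog @ planes.T                                             # (N, P)
    behind = signed <= 0.0                                                # behind each plane
    # A point is inside convex c if it is behind *all* planes assigned to c ...
    in_convex = np.stack([behind[:, membership[:, c]].all(axis=1) for c in range(C)], axis=1)
    # ... and inside the shape if it is inside *any* convex (union).
    return in_convex.any(axis=1)

print(inside(rng.normal(size=(5, 3))))
```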
Item: Processing Freehand Vector Sketches (University of British Columbia, 2023-06-22). Liu, Chenxi.
Freehand sketching is a fast and intuitive way for artists to communicate visual ideas, and is often the first step of creating visual content, ranging from industrial design to cartoon production. As drawing tablets and touch displays become increasingly common among professionals, a growing number of sketches are created and stored digitally in vector graphics format. This trend motivates a series of downstream sketch-based applications, performing tasks including drawing colorization, 3D model creation, editing, and posing. Even when stored digitally in vector format, hand-drawn sketches, often containing overdrawn strokes and inaccurate junctions, are different from the clean vector sketches required by these applications, which results in tedious and time-consuming manual cleanup tasks. In this thesis, we analyze the human perceptual cues that influence these two tasks: grouping overdrawn strokes that depict a single intended curve and connecting unintended gaps between strokes. Guided by these cues, we develop three methods for these two tasks. We first introduce StrokeAggregator, a method that automatically groups strokes in the input vector sketch and then replaces each group by the best corresponding fitting curve—a procedure we call sketch consolidation. We then present a method that detects and resolves unintended gaps in a consolidated vector line drawing using learned local classifiers and global cues. Finally, we propose StripMaker, a consolidation method that jointly considers local perception cues from the first method and connectivities detected by the second method. We further integrate observations about temporal and contextual information present in drawings, resulting in a method with superior consolidation performance and potential for better user interactivity. Together, this work identifies important factors in humans' perception of freehand sketches and provides automatic tools that narrow the gap between the raw freehand vector sketches directly created by artists and the requirements of downstream computational applications.
Item: New tools for modelling sensor data (2023-06-30). López Ruiz, Alfonso.
The objective of this thesis is to develop a framework capable of handling multiple data sources by correcting and fusing them to monitor, predict, and optimize real-world processes. The scope is not limited to images but also covers the reconstruction of 3D point clouds integrating visible, multispectral, thermal and hyperspectral data. However, working with real-world data is tedious as it involves multiple steps that must be performed manually, such as collecting data, marking control points or annotating points. An alternative is to generate synthetic data from realistic scenarios, thereby avoiding the acquisition of prohibitively expensive technology and efficiently constructing large datasets. In addition, models in virtual scenarios can carry semantic annotations and materials, among other properties. Unlike manual annotations, synthetic datasets do not introduce spurious information that could mislead the algorithms that will use them. Remotely sensed images, albeit showing notable radiometric changes, can be fused by optimizing the correlation among them. This thesis exploits the Enhanced Correlation Coefficient image-matching algorithm to overlap visible, multispectral and thermal data. Then, multispectral and thermal data are projected into a dense RGB point cloud reconstructed with photogrammetry. By projecting rather than directly reconstructing, the aim is to achieve geometrically accurate and dense point clouds from low-resolution imagery. In addition, this methodology is notably more efficient than GPU-based photogrammetry in commercial software. Radiometric data is ensured to be correct by identifying the occlusion of points as well as by minimizing the dissimilarity of aggregated data from the starting samples. Hyperspectral data is, on the other hand, projected over 2.5D point clouds with a pipeline adapted to push-broom scanning. The hyperspectral swaths are geometrically corrected and overlapped to compose an orthomosaic, which is then projected over a voxelized point cloud. Due to the large volume of the resulting hypercube, it is compressed following a stack-based representation in the radiometric dimension. The real-time rendering of the compressed hypercube is enabled by iteratively constructing an image over a few frames, thus reducing the overhead of single frames. In contrast, the generation of synthetic data is focused on LiDAR technology. The baseline of this simulation is the indexing of scenarios with a high level of detail in state-of-the-art ray-tracing data structures that help to rapidly solve ray-triangle intersections. From here, random and systematic errors are introduced, such as outliers, jittering of rays and return losses, among others. In addition, the construction of large LiDAR datasets is supported by the procedural generation of scenes that can be enriched with semantic annotations and materials. Airborne and terrestrial scans are parameterized to be fed with datasheets from commercial sensors. The airborne scans integrate several scan geometries, whereas the intensity of returns is estimated with BRDF databases that collect samples from a gonio-photometer. In addition, the simulated LiDAR can operate at different wavelengths, including bathymetric ones, and emulates several returns. This thesis is concluded by showing the benefits of fused data and synthetic datasets with three case studies. The LiDAR simulation is employed to optimize scanning plans in buildings by using local searches to determine optimal scan locations while minimizing the number of required scans with the help of genetic algorithms. These metaheuristics are guided by four objective functions that evaluate the accuracy, coverage, detail, and overlapping of the LiDAR scans. Then, thermal infrared point clouds and orthorectified maps are used to locate buried remains and reconstruct the structure of a poorly conserved archaeological site, highlighting the potential of remotely sensed data to support the preservation of cultural heritage. Finally, hyperspectral data is corrected and transformed to train a convolutional neural network in pursuit of classifying different grapevine varieties.
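The Enhanced Correlation Coefficient alignment mentioned above is available in OpenCV as cv2.findTransformECC, and a minimal, generic usage looks like the sketch below. The file names, the affine motion model and the termination criteria are placeholder choices for the example, not the thesis's multimodal pipeline (which also copes with the large radiometric differences between modalities).

```python
# Minimal ECC alignment of one band to a reference image with OpenCV
# (generic illustration; file names and parameters are placeholders).
import cv2
import numpy as np

reference = cv2.imread("rgb_band.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
moving = cv2.imread("thermal_band.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

warp = np.eye(2, 3, dtype=np.float32)                                # initial affine warp
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)

# Maximizes the enhanced correlation coefficient between the two images.
cc, warp = cv2.findTransformECC(reference, moving, warp, cv2.MOTION_AFFINE,
                                criteria, None, 5)

aligned = cv2.warpAffine(moving, warp, (reference.shape[1], reference.shape[0]),
                         flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP)
print("ECC after convergence:", cc)
```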
Item: Latent Disentanglement for the Analysis and Generation of Digital Human Shapes (2023-06-30). Foti, Simone.
Analysing and generating digital human shapes is crucial for a wide variety of applications ranging from movie production to healthcare. The most common approaches for the analysis and generation of digital human shapes involve the creation of statistical shape models. At the heart of these techniques is the definition of a mapping between shapes and a low-dimensional representation. However, making these representations interpretable is still an open challenge. This thesis explores latent disentanglement as a powerful technique to make the latent space of statistical shape models based on geometric deep learning more structured and interpretable. In particular, it introduces two novel techniques to disentangle the latent representation of variational autoencoders and generative adversarial networks with respect to the local shape attributes characterising the identity of the generated body and head meshes. This work was inspired by a shape completion framework that was proposed as a viable alternative to intraoperative registration in minimally invasive surgery of the liver. In addition, one of these methods for latent disentanglement was also applied to plastic surgery, where it was shown to improve the diagnosis of craniofacial syndromes and aid surgical planning.
Item: Neural Networks for Digital Materials and Radiance Encoding (2023-07). Rodriguez-Pardo, Carlos.
Realistic virtual scenes are becoming increasingly prevalent in our society, with a wide range of applications in areas such as manufacturing, architecture, fashion design, and entertainment, including movies, video games, and augmented and virtual reality. Generating realistic images of such scenes requires highly accurate illumination, geometry, and material models, which can be time-consuming and challenging to obtain. Traditionally, such models have often been created manually by skilled artists, but this process can be prohibitively time-consuming and costly. Alternatively, real-world examples can be captured, but this approach presents additional challenges in terms of accuracy and scalability. Moreover, while realism and accuracy are crucial in such processes, rendering efficiency is also a key requirement, so that lifelike images can be generated with the speed required in many real-world applications. One of the most significant challenges in this regard is the acquisition and representation of materials, which are a critical component of our visual world and, by extension, of virtual representations of it. However, existing approaches for material acquisition and representation are limited in terms of efficiency and accuracy, which limits their real-world impact. To address these challenges, data-driven approaches that leverage machine learning may provide viable solutions. Nevertheless, designing and training machine learning models that meet all these competing requirements remains a challenging task, requiring careful consideration of trade-offs between quality and efficiency. In this thesis, we propose novel learning-based solutions to address several key challenges in physically-based rendering and material digitization. Our approach leverages various forms of neural networks to introduce innovative algorithms for radiance encoding, digital material generation, editing, and estimation. First, we present a visual attribute transfer framework for digital materials that can effectively generalize to new illumination conditions and geometric distortions. We showcase a use-case of this method for high-resolution material acquisition using a custom device. Additionally, we propose a generative model capable of synthesizing tileable textures from a single input image, which helps improve the quality of material rendering. Building upon recent work in neural fields, we also introduce a material representation that accurately encodes material reflectance while offering powerful editing and propagation capabilities. In addition to reflectance, we present a novel method for global illumination encoding that leverages carefully designed generative models to achieve significantly faster sampling than previous work. Finally, we propose two innovative methods for low-cost material digitization. With flatbed scanners as our capture device, we present a generative model that can provide high-resolution material reflectance estimations using a single image as input, while introducing an uncertainty quantification algorithm that increases its reliability and efficiency. Additionally, we present a novel method for digitizing fabric mechanical properties using depth images as input, which we extend with a perceptually-validated drape similarity metric. Overall, the contributions of this thesis represent significant advances in the fields of radiance encoding and digital material acquisition and editing, enhancing the quality, scalability, and efficiency of physically-based rendering pipelines.

Item: Physically Based Modeling of Micro-Appearance (University of Bonn, 2023-07-11). Huang, Weizhen.
This dissertation addresses the challenges of creating photorealistic images by focusing on generating and rendering microscale details and irregularities, because a lack of such imperfections is usually what distinguishes a synthetic, computer-generated image from a photograph. In Chapter 3, we model the fluid flow on soap bubbles, which demonstrate iridescent beauty due to their micrometer-scale thickness. Instead of approximating the variation in film thickness with random noise textures, this work incorporates the underlying mechanics that drive such fluid flow, namely the Navier-Stokes equations, which include factors such as surfactant concentration, Marangoni surface tension, and evaporation. We address challenges such as the singularity at the poles in spherical coordinates and the need for extremely small step sizes in a stiff system to simulate a wide range of dynamic effects. As a result, our approach produces soap bubble renderings that match real-world footage. Chapter 4 explores hair rendering. Existing models based on the Marschner model split the scattering function into a longitudinal and an azimuthal component. While this separation benefits importance sampling, it lacks a physical grounding and does not match measurements. We propose a novel physically based hair scattering model, representing hair as cylinders with microfacet roughness. We reveal that the focused highlight in the forward-scattering direction observed in measurements is a result of the rough cylindrical geometry itself. Additionally, our model naturally extends to elliptical hair fibers. A closely related topic, feather rendering, is discussed in Chapter 5. Unlike human hair, feathers possess unique substructures, such as barbs and barbules with irregular cross-sections; the existing pipeline of modeling feathers with hair shaders therefore fails to accurately describe their appearance. We propose a model that directly accounts for these multi-scale geometries by representing feathers as collections of barb primitives and incorporating the contributions of barbule cross-sections using a normal distribution function. We demonstrate the effectiveness of our model on rock dove neck feathers, showing close alignment with measurements and photographs.
Item: Hierarchical Gradient Domain Vector Field Processing (Johns Hopkins University, 2023-07-31). Lee, Sing Chun.
Vector fields are a fundamental mathematical construct for describing flow-field-related problems in science and engineering. To solve these types of problems effectively on a discrete surface, various vector field representations have been proposed using finite-dimensional bases, a discrete connection, and an operator approach. Furthermore, for computational efficiency, the quadratic Dirichlet energy is preferred for measuring the smoothness of the vector field in the gradient domain. However, while quadratic energy gives a simple linear system, it does not support real-time vector field processing on a high-resolution mesh without extensive GPU parallelization. To this end, this dissertation describes an efficient hierarchical solver for vector field processing. Our method extends the successful multigrid design for interactive signal processing on meshes with an induced vector field prolongation, combining it with novel speedup techniques. We formulate a general way of extending scalar field prolongation to vector fields. Focusing on triangle meshes, our convergence study finds that a standard multigrid does not achieve fast convergence due to the poorly-conditioned system matrix. We observe similar performance for standard single-level iterative methods such as the Jacobi, Gauss-Seidel, and conjugate gradient methods. Therefore, we compare three speedup techniques -- successive over-relaxation, smoothed prolongation, and Krylov subspace updates -- and incorporate them into our solver. Finally, we demonstrate our solver on useful applications such as logarithmic map computation and discuss applications to other hierarchies such as texture grids, followed by the conclusion and future work.
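For orientation, the single-level iterative baselines mentioned above (Jacobi and its relaxed variants) have a very small core, sketched below for a generic sparse symmetric positive-definite system. The matrix here is a toy 1D Laplacian standing in for a Dirichlet-type smoothness system, not the vector-field operator on a mesh, and the damping factor is an arbitrary choice for the example.

```python
# Damped Jacobi iteration on a toy sparse SPD system (illustration only; the
# dissertation applies such smoothers within a multigrid hierarchy on meshes).
import numpy as np
import scipy.sparse as sp

n = 64
# 1D Laplacian with Dirichlet boundaries as a stand-in for a smoothness energy system.
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

def damped_jacobi(A, b, x0, omega=0.8, iters=200):
    d = A.diagonal()
    x = x0.copy()
    for _ in range(iters):
        # x <- x + omega * D^{-1} (b - A x)
        x = x + omega * (b - A @ x) / d
    return x

x = damped_jacobi(A, b, np.zeros(n))
print("residual norm:", np.linalg.norm(b - A @ x))
```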