41-Issue 7
Browsing 41-Issue 7 by Title
Now showing 1 - 20 of 57
Item Abstract Painting Synthesis via Decremental optimization (The Eurographics Association and John Wiley & Sons Ltd., 2022) Yan, Ming; Pu, Yuanyuan; Zhao, Pengzheng; Xu, Dan; Wu, Hao; Yang, Qiuxia; Wang, Ruxin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Existing stroke-based painting synthesis methods usually fail to achieve good results with limited strokes because they use semantically irrelevant metrics to calculate the similarity between the painting and photo domains. Hence, it is hard to see meaningful semantic information in the painting. This paper proposes a painting synthesis method that uses a CLIP (Contrastive Language-Image Pre-training) model to build a semantically aware metric, so that cross-domain semantic similarity is explicitly involved. To ensure the convergence of the objective function, we design a new strategy called decremental optimization. Specifically, we define a painting as a set of strokes and use a neural renderer to obtain a rasterized painting by optimizing the stroke control parameters through a CLIP-based loss. The optimization process is initialized with an excessive number of brush strokes, and the number of strokes is then gradually reduced to generate paintings of varying levels of abstraction. Experiments show that our method can obtain vivid paintings, and the results are better than those of the compared stroke-based painting synthesis methods when the number of strokes is limited.
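As an illustration of the decremental strategy described in this abstract, the following Python sketch optimizes a pool of stroke parameters against a similarity loss and prunes the pool between stages. It is a hypothetical sketch, not the authors' code: `render_strokes` and `clip_similarity` are trivial stand-ins for the paper's neural renderer and CLIP-based metric.

```python
# Hypothetical sketch of decremental stroke optimization (not the authors' code).
# `render_strokes` and `clip_similarity` are trivial stand-ins for the neural
# renderer and the CLIP-based metric described in the abstract.
import torch

def render_strokes(params):                    # placeholder differentiable "renderer"
    return torch.sigmoid(params).mean(dim=0)   # fake 1D canvas from stroke parameters

def clip_similarity(canvas, target):           # placeholder for the CLIP-based similarity
    return -torch.nn.functional.mse_loss(canvas, target)

def paint(target, n_init=256, n_final=16, steps_per_stage=100, drop_ratio=0.5):
    params = torch.randn(n_init, target.numel(), requires_grad=True)
    while True:
        opt = torch.optim.Adam([params], lr=1e-2)
        for _ in range(steps_per_stage):       # optimize strokes at the current stroke count
            opt.zero_grad()
            loss = -clip_similarity(render_strokes(params), target)
            loss.backward()
            opt.step()
        if params.shape[0] <= n_final:
            return params.detach()
        # decremental step: keep only the strokes that contribute most (here: largest magnitude)
        keep = max(n_final, int(params.shape[0] * drop_ratio))
        idx = params.detach().abs().mean(dim=1).topk(keep).indices
        params = params.detach()[idx].clone().requires_grad_(True)

strokes = paint(torch.rand(64))                # toy 1D "photo" as the target
```
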
Item BareSkinNet: De-makeup and De-lighting via 3D Face Reconstruction (The Eurographics Association and John Wiley & Sons Ltd., 2022) Yang, Xingchao; Taketomi, Takafumi; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
We propose BareSkinNet, a novel method that simultaneously removes makeup and lighting influences from a face image. Our method leverages a 3D morphable model and does not require a reference clean face image or a specified lighting condition. By incorporating 3D face reconstruction, we can easily obtain 3D geometry and coarse 3D textures. Using this information, we can infer normalized 3D face texture maps (diffuse, normal, roughness, and specular) with an image-translation network. Consequently, the reconstructed 3D face textures, free of undesirable information, significantly benefit subsequent processes such as re-lighting or re-makeup. In experiments, we show that BareSkinNet outperforms state-of-the-art makeup removal methods. In addition, our method is remarkably helpful in removing makeup to generate consistent high-fidelity texture maps, which makes it extendable to many realistic face generation applications. It can also automatically build graphics assets of face makeup images before and after, together with the corresponding 3D data. This will assist artists in accelerating their work, such as 3D makeup avatar creation.

Item Classifier Guided Temporal Supersampling for Real-time Rendering (The Eurographics Association and John Wiley & Sons Ltd., 2022) Guo, Yu-Xiao; Chen, Guojun; Dong, Yue; Tong, Xin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
We present a learning-based temporal supersampling algorithm for real-time rendering. Different from existing learning-based approaches that adopt end-to-end training of a 'black-box' neural network, we design a 'white-box' solution that first classifies the pixels into different categories and then generates the supersampling result based on the classification. Our key observation is that the core problem in temporal supersampling for rendering is to distinguish pixels affected by occlusion, aliasing, or shading changes. Samples from these pixels exhibit similar temporal radiance changes but require different composition strategies to produce the correct supersampling result. Based on this observation, our method first classifies the pixels into several classes. Based on the classification results, it then blends the current frame with the warped previous frame via a learned weight map to obtain the supersampling result. We design compact neural networks for each step and develop dedicated loss functions for pixels belonging to different classes. Compared to existing learning-based methods, our classifier-based supersampling scheme incurs lower computational and memory cost for real-time supersampling and generates visually compelling temporal supersampling results with fewer flickering artifacts. We evaluate the performance and generality of our method on several rendered game sequences, and it can upsample rendered frames from 1080p to 2160p in just 13.39 ms on a single Nvidia 3090 GPU.
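The classify-then-blend idea above can be pictured with the following NumPy sketch (illustrative only): a toy per-pixel classifier stands in for the learned classification network, and a fixed per-class weight stands in for the learned weight map.

```python
# Minimal NumPy sketch of the classify-then-blend idea (illustrative only).
# A toy per-pixel classifier stands in for the learned classification network,
# and a fixed per-class weight stands in for the learned weight map.
import numpy as np

HISTORY_WEIGHT = np.array([0.9, 0.1, 0.5])   # e.g. 0: stable, 1: occlusion/aliasing, 2: shading change

def classify_pixels(current, warped_prev, threshold=0.1):
    """Toy classifier: label pixels by how much they changed between frames."""
    diff = np.abs(current - warped_prev).mean(axis=-1)
    labels = np.zeros(diff.shape, dtype=np.int64)   # class 0: temporally stable
    labels[diff > threshold] = 2                    # class 2: moderate change
    labels[diff > 3 * threshold] = 1                # class 1: large change (reject history)
    return labels

def temporal_blend(current, warped_prev):
    labels = classify_pixels(current, warped_prev)
    w = HISTORY_WEIGHT[labels][..., None]           # per-pixel weight for the warped history
    return w * warped_prev + (1.0 - w) * current

frame, history = np.random.rand(4, 4, 3), np.random.rand(4, 4, 3)
print(temporal_blend(frame, history).shape)         # (4, 4, 3)
```
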
Item Color-mapped Noise Vector Fields for Generating Procedural Micro-patterns (The Eurographics Association and John Wiley & Sons Ltd., 2022) Grenier, Charline; Sauvage, Basile; Dischler, Jean-Michel; Thery, Sylvain; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Stochastic micro-patterns successfully enhance the realism of virtual scenes. Procedural models using noise combined with transfer functions are extremely efficient. However, most patterns produced today employ 1D transfer functions, which assign color, transparency, or other material attributes based solely on a single scalar noise value. Multi-dimensional transfer functions have received widespread attention in other fields, such as scientific volume rendering, but their potential has not yet been well explored for modeling micro-patterns in procedural texturing. We propose a new procedural model for stochastic patterns, defined as the composition of a bi-dimensional transfer function (a.k.a. color map) with a stochastic vector field. Our model is versatile, as it encompasses several existing procedural noises, including Gaussian noise and phasor noise. It also generates a much larger gamut of patterns, including locally structured patterns, which are notoriously difficult to reproduce. We leverage the Gaussian assumption and a tiling-and-blending algorithm to provide real-time generation and filtering. A key contribution is a real-time approximation of the second-order statistics over an arbitrary pixel footprint, which additionally enables the filtering of procedural normal maps. We exhibit a wide variety of results, including Gaussian patterns, profiled waves, and concentric and non-concentric patterns.
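A rough illustration of the model under simplifying assumptions follows: the `gaussian_like_noise` field below is a naive sum-of-cosines stand-in for the paper's Gaussian/phasor noise, and a random 2D look-up table plays the role of the bi-dimensional transfer function.

```python
# Rough illustration of the model (simplifying assumptions): a naive
# sum-of-cosines field stands in for the paper's Gaussian/phasor noise, and a
# random 2D look-up table plays the role of the bi-dimensional transfer function.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_like_noise(x, y, n_waves=32):
    """Approximately Gaussian scalar field built from random cosines."""
    freq = rng.normal(0.0, 4.0, size=(n_waves, 2))
    phase = rng.uniform(0.0, 2 * np.pi, size=n_waves)
    waves = np.cos(x[..., None] * freq[:, 0] + y[..., None] * freq[:, 1] + phase)
    return waves.sum(axis=-1) / np.sqrt(n_waves)

def color_mapped_pattern(res=256, lut_size=64):
    xs, ys = np.meshgrid(np.linspace(0, 1, res), np.linspace(0, 1, res))
    n1, n2 = gaussian_like_noise(xs, ys), gaussian_like_noise(xs, ys)  # stochastic vector field
    lut = rng.random((lut_size, lut_size, 3))                          # 2D transfer function (color map)
    u = np.clip(((n1 + 3) / 6 * (lut_size - 1)).astype(int), 0, lut_size - 1)
    v = np.clip(((n2 + 3) / 6 * (lut_size - 1)).astype(int), 0, lut_size - 1)
    return lut[u, v]                                                   # RGB micro-pattern

print(color_mapped_pattern().shape)   # (256, 256, 3)
```
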
Item Contrastive Semantic-Guided Image Smoothing Network (The Eurographics Association and John Wiley & Sons Ltd., 2022) Wang, Jie; Wang, Yongzhen; Feng, Yidan; Gong, Lina; Yan, Xuefeng; Xie, Haoran; Wang, Fu Lee; Wei, Mingqiang; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Image smoothing is a fundamental low-level vision task that aims to preserve the salient structures of an image while removing insignificant details. Deep learning has been explored in image smoothing to deal with the complex entanglement of semantic structures and trivial details. However, current methods neglect two important facts in smoothing: 1) naive pixel-level regression supervised by a limited amount of high-quality smoothing ground truth can lead to domain shift and cause generalization problems on real-world images; 2) texture appearance is closely related to object semantics, so image smoothing requires awareness of semantic differences in order to apply adaptive smoothing strengths. To address these issues, we propose a novel Contrastive Semantic-Guided Image Smoothing Network (CSGIS-Net) that combines a contrastive prior and a semantic prior to facilitate robust image smoothing. The supervision signal is augmented by leveraging undesired smoothing effects as negative teachers and by incorporating segmentation tasks to encourage semantic distinctiveness. To realize the proposed network, we also enrich the original VOC dataset with texture enhancement and smoothing labels, namely VOC-smooth, which is the first to bridge image smoothing and semantic segmentation. Extensive experiments demonstrate that the proposed CSGIS-Net outperforms state-of-the-art algorithms by a large margin. Code and dataset are available at https://github.com/wangjie6866/CSGIS-Net.

Item Depth-Aware Shadow Removal (The Eurographics Association and John Wiley & Sons Ltd., 2022) Fu, Yanping; Gai, Zhenyu; Zhao, Haifeng; Zhang, Shaojie; Shan, Ying; Wu, Yang; Tang, Jin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Shadow removal from a single image is an ill-posed problem because shadow generation is affected by complex interactions of geometry, albedo, and illumination. Most recent deep learning-based methods try to directly estimate the mapping between non-shadow and shadow image pairs to predict the shadow-free image. However, they are not very effective for shadow images with complex shadows or messy backgrounds. In this paper, we propose a novel end-to-end depth-aware shadow removal method that does not use depth images; instead, it estimates depth information from RGB images and leverages the depth feature as guidance to enhance shadow removal and refinement. The proposed framework consists of three components: depth prediction, shadow removal, and boundary refinement. First, the depth prediction module is used to predict the corresponding depth map of the input shadow image. Then, we propose a new generative adversarial network (GAN) method integrated with depth information to remove shadows in the RGB image. Finally, we propose an effective boundary refinement framework that uses depth cues to alleviate artifacts around boundaries after shadow removal. We conduct experiments on several public datasets and real-world shadow images. The experimental results demonstrate the efficiency of the proposed method and its superior performance against state-of-the-art methods.

Item DiffusionPointLabel: Annotated Point Cloud Generation with Diffusion Model (The Eurographics Association and John Wiley & Sons Ltd., 2022) Li, Tingting; Fu, Yunfei; Han, Xiaoguang; Liang, Hui; Zhang, Jian Jun; Chang, Jian; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Point cloud generation aims to synthesize point clouds that do not exist in the supervised dataset. Generating a point cloud with given semantic labels remains an under-explored problem. This paper proposes a formulation called DiffusionPointLabel, which performs point-label pair generation based on a DDPM (Denoising Diffusion Probabilistic Model). Specifically, we use a point cloud diffusion generative model and aggregate the intermediate features of its generator. On top of this, we propose a Feature Interpreter that transforms intermediate features into semantic labels. Furthermore, we employ an uncertainty measure to filter out unqualified point-label pairs, yielding a higher-quality generated point cloud dataset. Coupling these two designs enables us to automatically generate annotated point clouds, especially when supervised point-label pairs are scarce. Our method extends the application of point cloud generation models and surpasses state-of-the-art models.
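The uncertainty-based filtering could look roughly like the following sketch (an illustration, not the authors' implementation), where `probs` stands in for the per-point class probabilities produced by the Feature Interpreter.

```python
# Illustration of uncertainty-based filtering (not the authors' implementation).
# `probs` stands in for the per-point class probabilities produced by the
# Feature Interpreter.
import numpy as np

def entropy(probs, eps=1e-12):
    """Per-point predictive entropy used as an uncertainty measure."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)

def filter_point_label_pair(points, probs, max_mean_entropy=0.5):
    """Keep a generated shape only if its average label uncertainty is low."""
    if entropy(probs).mean() > max_mean_entropy:
        return None                        # discard an unreliable point-label pair
    return points, probs.argmax(axis=-1)   # points with their hard labels

pts = np.random.rand(2048, 3)
p = np.random.dirichlet(np.ones(4) * 5.0, size=2048)   # fake per-point class probabilities
print(filter_point_label_pair(pts, p) is None)
```
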
Item A Drone Video Clip Dataset and its Applications in Automated Cinematography (The Eurographics Association and John Wiley & Sons Ltd., 2022) Ashtari, Amirsaman; Jung, Raehyuk; Li, Mingxiao; Noh, Junyong; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Drones have become popular video capture tools. Drone videos in the wild are first captured and then edited by humans to contain aesthetically pleasing camera motions and scenes. Edited drone videos therefore carry extremely useful information for cinematography and for applications such as camera path planning to capture aesthetically pleasing shots. To design intelligent camera path planners, learning drone camera motions from these edited videos is essential. However, this first requires filtering drone clips and extracting their camera motions from edited videos that commonly contain both drone and non-drone content. Moreover, existing video search engines return the whole edited video as a semantic search result and cannot return only the drone clips inside an edited video. To address this problem, we propose the first approach that can automatically retrieve drone clips from an unlabeled video collection using high-level search queries, such as "drone clips captured outdoor in daytime from rural places". The retrieved clips also come with camera motions, camera views, and a 3D reconstruction of the scene, which can help develop intelligent camera path planners. Training our approach required numerous examples of edited drone videos, so we introduce the first large-scale dataset composed of edited drone videos. This dataset is also used for training and validating our drone video filtering algorithm. Both quantitative and qualitative evaluations have confirmed the validity of our method.

Item Effective Eyebrow Matting with Domain Adaptation (The Eurographics Association and John Wiley & Sons Ltd., 2022) Wang, Luyuan; Zhang, Hanyuan; Xiao, Qinjie; Xu, Hao; Shen, Chunhua; Jin, Xiaogang; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
We present the first synthetic eyebrow matting datasets and a domain-adaptation eyebrow matting network that learns domain-robust feature representations from synthetic eyebrow matting data and unlabeled in-the-wild images with adversarial learning. Unlike existing matting methods, which may suffer from the lack of ground-truth matting datasets that are typically labor-intensive to annotate or, even worse, impossible to obtain, we train the matting network in a semi-supervised manner using synthetic matting datasets instead of ground-truth matting data while still achieving high-quality results. Specifically, we first generate a large-scale synthetic eyebrow matting dataset by rendering avatars, and we collect a real-world eyebrow image dataset while maximizing data diversity. Then, we use the synthetic eyebrow dataset to train a multi-task network, which consists of a regression task that estimates the eyebrow alpha mattes and an adversarial task that adapts the learned features from synthetic data to real data. As a result, our method can successfully train an eyebrow matting network using synthetic data without the need to label any real data. It accurately extracts eyebrow alpha mattes from in-the-wild images without any additional prior and achieves state-of-the-art eyebrow matting performance. Extensive experiments demonstrate the superior performance of our method with both qualitative and quantitative results.
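A schematic of the two training signals is sketched below, with toy linear layers in place of the real networks (illustrative only, not the authors' code). In practice the adversarial part would use alternating updates or a gradient-reversal layer rather than the plain negation shown here.

```python
# Schematic of the two training signals with toy linear layers in place of the
# real networks (illustrative only). In practice the adversarial part would use
# alternating updates or a gradient-reversal layer rather than plain negation.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Linear(16, 8)        # shared feature extractor (stand-in)
matting_head = nn.Linear(8, 1)    # regresses the eyebrow alpha matte
domain_disc = nn.Linear(8, 1)     # predicts synthetic (0) vs. real (1) features

syn_img, syn_alpha = torch.randn(4, 16), torch.rand(4, 1)   # labelled synthetic batch
real_img = torch.randn(4, 16)                               # unlabelled real batch

syn_feat, real_feat = encoder(syn_img), encoder(real_img)

# task 1: supervised alpha-matte regression on synthetic data
loss_alpha = F.l1_loss(torch.sigmoid(matting_head(syn_feat)), syn_alpha)

# task 2: adversarial domain adaptation; the encoder is trained so that the
# discriminator cannot tell synthetic features from real ones
d_syn, d_real = domain_disc(syn_feat), domain_disc(real_feat)
loss_disc = F.binary_cross_entropy_with_logits(d_syn, torch.zeros_like(d_syn)) + \
            F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))

loss_encoder = loss_alpha - loss_disc   # encoder objective: good mattes, confused discriminator
print(loss_encoder.item())
```
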
Item Efficient and Stable Simulation of Inextensible Cosserat Rods by a Compact Representation (The Eurographics Association and John Wiley & Sons Ltd., 2022) Zhao, Chongyao; Lin, Jinkeng; Wang, Tianyu; Bao, Hujun; Huang, Jin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Piecewise linear inextensible Cosserat rods are usually represented by the Cartesian coordinates of their vertices and quaternions on the segments. Such representations use excessive degrees of freedom (DOFs) and need many additional constraints, which causes unnecessary numerical difficulties and computational burden in simulation. We propose a simple yet compact representation that exactly matches the intrinsic DOFs and naturally satisfies all such constraints. Specifically, viewing a rod as a chain of rigid segments, we encode its shape as the Cartesian coordinates of its root vertex and an axis-angle representation of the material frame on each segment. Under our representation, the Hessian of the implicit time-stepping has a special non-zero pattern. Exploiting this structure, we can solve the associated linear equations in nearly linear complexity. Furthermore, we carefully design a preconditioner, which is proved to be always symmetric positive-definite and accelerates the PCG solver by one to two orders of magnitude compared with the widely used block-diagonal one. Compared with other technical choices, including Super-Helices, a specially designed compact representation for inextensible Cosserat rods, our method achieves better performance and stability, and it can simulate an inextensible Cosserat rod with hundreds of vertices and tens of collisions in real time under relatively large time steps.
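The compact state can be pictured as in the following sketch, which uses a hypothetical layout (not the authors' code): one root position plus an axis-angle vector per rigid segment, from which vertex positions are reconstructed by chaining frames, so segment lengths hold by construction.

```python
# Hypothetical layout of the compact state (not the authors' code): one root
# position plus an axis-angle vector per rigid segment. Vertex positions are
# reconstructed by chaining the segment frames, so segment lengths (and hence
# inextensibility) hold by construction.
import numpy as np

def rotation_from_axis_angle(v):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    k = v / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def rod_vertices(root, axis_angles, segment_length):
    """Reconstruct vertex positions from the compact DOFs (3 + 3n for n segments)."""
    verts = [np.asarray(root, dtype=float)]
    frame = np.eye(3)
    for aa in axis_angles:
        frame = frame @ rotation_from_axis_angle(aa)             # material frame of this segment
        verts.append(verts[-1] + segment_length * frame[:, 2])   # local z-axis taken as the tangent
    return np.stack(verts)

state = {"root": [0.0, 0.0, 0.0], "axis_angles": 0.1 * np.random.randn(10, 3)}
print(rod_vertices(state["root"], state["axis_angles"], 0.05).shape)   # (11, 3)
```
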
Item Efficient Direct Isosurface Rasterization of Scalar Volumes (The Eurographics Association and John Wiley & Sons Ltd., 2022) Kreskowski, Adrian; Rendle, Gareth; Froehlich, Bernd; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
In this paper we propose a novel and efficient rasterization-based approach for the direct rendering of isosurfaces. Our method exploits the capabilities of task and mesh shader pipelines to identify subvolumes containing potentially visible isosurface geometry and to efficiently extract primitives, which are consumed on the fly by the rasterizer. As a result, our approach requires little preprocessing and negligible additional memory. Direct isosurface rasterization is competitive in terms of rendering performance when compared with ray-marching-based approaches, and significantly outperforms them at increasing resolutions in most situations. Since our approach is entirely rasterization-based, it affords straightforward integration into existing rendering pipelines while allowing the use of modern graphics hardware features, such as multi-view stereo for efficient rendering of stereoscopic image pairs in geometry-bound applications. Direct isosurface rasterization is suitable for applications where the isosurface geometry is highly variable, such as interactive analysis scenarios for static and dynamic data sets that require frequent isovalue adjustment.

Item Efficient Texture Parameterization Driven by Perceptual-Loss-on-Screen (The Eurographics Association and John Wiley & Sons Ltd., 2022) Sun, Haoran; Wang, Shiyi; Wu, Wenhai; Jin, Yao; Bao, Hujun; Huang, Jin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Texture mapping is a ubiquitous technique to enrich the visual appearance of a mesh: it maps the desired signal (e.g. diffuse color) defined on the mesh to a texture image discretized into pixels through a bijective parameterization. To achieve high visual quality, a large number of pixels is generally required, which imposes a heavy burden on storage, memory, and transmission. We propose to use a perceptual model and a rendering procedure to measure the loss introduced by this discretization, and then optimize a parameterization to improve efficiency, i.e. to use fewer pixels under a comparable perceptual loss. The general perceptual model and rendering procedure can be very complicated, and the anisotropy rooted in the square shape of pixels makes the problem more difficult to solve. We adopt a two-stage strategy and use Bayesian optimization in the triangle-wise stage. With our carefully designed weighting scheme, the mesh-wise optimization can take the triangle-wise perceptual loss into consideration under a global conforming requirement. Compared with many parameterizations that are designed manually, driven by interpolation error, or driven by isotropic energy, ours can use significantly fewer pixels at a comparable perceptual loss, or vice versa.

Item EL-GAN: Edge-Enhanced Generative Adversarial Network for Layout-to-Image Generation (The Eurographics Association and John Wiley & Sons Ltd., 2022) Gao, Lin; Wu, Lei; Meng, Xiangxu; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Although some progress has been made in the layout-to-image generation of complex scenes with multiple objects, object-level generation still suffers from distortion and poor recognizability. We argue that this is caused by the lack of feature encodings for edge information during image generation. To address these limitations, we propose a novel edge-enhanced Generative Adversarial Network for layout-to-image generation (termed EL-GAN). The feature encodings of edge information are learned from the multi-level features output by the generator and iteratively optimized along the generator's pipeline. Two new components are included at each generator level to enable multi-scale learning. The first is the edge generation module (EGM), which is responsible for converting the multi-level features output by the generator into images of different scales and extracting their edge maps. The second is the edge fusion module (EFM), which integrates the feature encodings refined from the edge maps into the subsequent image generation process by modulating the parameters in the normalization layers. Meanwhile, the discriminator is fed with frequency-sensitive image features, which greatly enhances the generation quality of the image's high-frequency edge contours and low-frequency regions. Extensive experiments show that EL-GAN outperforms the state-of-the-art methods on the COCO-Stuff and Visual Genome datasets. Our source code is available at https://github.com/Azure616/EL-GAN.

Item Exploring Contextual Relationships in 3D Cloud Points by Semantic Knowledge Mining (The Eurographics Association and John Wiley & Sons Ltd., 2022) Chen, Lianggangxu; Lu, Jiale; Cai, Yiqing; Wang, Changbo; He, Gaoqi; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
3D scene graph generation (SGG) aims to predict the classes of objects and predicates simultaneously in one 3D point cloud scene with instance segmentation. Since the underlying semantics of 3D point clouds lie in spatial information, recent approaches to the 3D SGG task usually have difficulty understanding global contextual semantic relationships and neglect the intrinsic 3D visual structures. To build a global scope of semantic relationships, we first propose two types of Semantic Clue (SC), at the entity level and the path level, respectively. SCs can be extracted from the training set and modeled as co-occurrence probabilities between entities. A novel Semantic Clue aware Graph Convolution Network (SC-GCN) is then designed to explicitly model each SC, whose messages are passed in their specific neighbor patterns. To construct interactions between the 3D visual and semantic modalities, a visual-language transformer (VLT) module is proposed to jointly learn the correlation between 3D visual features and class label embeddings. Systematic experiments on the 3D semantic scene graph (3DSSG) dataset show that our full method achieves state-of-the-art performance.
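The entity-level Semantic Clue could, for example, be estimated as class co-occurrence probabilities over the training scenes, as in this toy sketch (scene contents and names are made up).

```python
# Toy sketch of extracting the entity-level Semantic Clue as class co-occurrence
# probabilities from training scenes (scene contents and names are made up).
from collections import Counter
from itertools import combinations

train_scenes = [                       # each scene = set of object classes present
    {"chair", "table", "lamp"},
    {"chair", "table"},
    {"sofa", "lamp", "table"},
]

pair_counts, class_counts = Counter(), Counter()
for scene in train_scenes:
    class_counts.update(scene)
    pair_counts.update(frozenset(p) for p in combinations(sorted(scene), 2))

def cooccurrence_prob(a, b):
    """P(b present | a present), estimated from the training scenes."""
    if class_counts[a] == 0:
        return 0.0
    return pair_counts[frozenset((a, b))] / class_counts[a]

print(cooccurrence_prob("chair", "table"))   # 1.0 in this toy training set
```
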
Item Eye-Tracking-Based Prediction of User Experience in VR Locomotion Using Machine Learning (The Eurographics Association and John Wiley & Sons Ltd., 2022) Gao, Hong; Kasneci, Enkelejda; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
VR locomotion is one of the most important design features of VR applications and is widely studied. When evaluating locomotion techniques, user experience is usually the first consideration, as it provides direct insights into the usability of the locomotion technique and users' thoughts about it. In the literature, user experience is typically measured with post-hoc questionnaires or surveys, while users' behavioral (i.e., eye-tracking) data during locomotion, which can reveal deeper subconscious thoughts of users, has rarely been considered and thus remains to be explored. To this end, we investigate the feasibility of classifying users experiencing VR locomotion into L-UE and H-UE (i.e., low- and high-user-experience groups) based on eye-tracking data alone. To collect data, we conducted a user study in which participants navigated a virtual environment using five locomotion techniques while their eye-tracking data was recorded. A standard questionnaire assessing the usability and the participants' perception of the locomotion technique was used to establish the ground truth of the user experience. We trained our machine learning models on eye-tracking features extracted from the time-series data using a sliding-window approach. The best random forest model achieved an average accuracy of over 0.7 in 50 runs. Moreover, the SHapley Additive exPlanations (SHAP) approach uncovered the underlying relationships between eye-tracking features and user experience, and these findings were further supported by the statistical results. Our research provides a viable tool for assessing user experience with VR locomotion, which can further drive the improvement of locomotion techniques. Moreover, it benefits not only VR locomotion, but also VR systems whose design needs to be improved to provide a good user experience.
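A classification set-up of this kind might be prototyped as below: hand-crafted statistics over sliding windows of an eye-tracking signal feed a random forest. The data and feature choices here are synthetic placeholders, not the study's actual feature set.

```python
# Prototype of the classification set-up: statistics over sliding windows of an
# eye-tracking signal feed a random forest that predicts low vs. high user
# experience. Data and features here are synthetic placeholders, not the
# study's actual feature set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def window_features(signal, window=120, step=60):
    """Mean/std/min/max per sliding window (e.g. of pupil diameter or gaze velocity)."""
    feats = []
    for start in range(0, len(signal) - window + 1, step):
        w = signal[start:start + window]
        feats.append([w.mean(), w.std(), w.min(), w.max()])
    return np.array(feats)

rng = np.random.default_rng(42)
X, y = [], []
for label in (0, 1):                      # 0 = low UX, 1 = high UX (toy labels)
    for _ in range(20):
        series = rng.normal(loc=0.3 * label, scale=1.0, size=600)   # fake eye-tracking trace
        X.append(window_features(series).mean(axis=0))              # aggregate per trial
        y.append(label)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())
```
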
Item Fine-Grained Memory Profiling of GPGPU Kernels (The Eurographics Association and John Wiley & Sons Ltd., 2022) Buelow, Max von; Guthe, Stefan; Fellner, Dieter W.; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Memory performance is a crucial bottleneck in many GPGPU applications, making optimizations for hardware and software mandatory. While hardware vendors already use highly efficient caching architectures, software engineers usually have to organize their data accordingly in order to make efficient use of these caches, which requires deep knowledge of the actual hardware. In this paper we present a novel technique for fine-grained memory profiling that simulates the whole pipeline of memory flow and accumulates profiling values in such a way that the user retains information about the relevant region of the GPU program, by showing these values separately for each allocation. Our memory simulator outperforms state-of-the-art memory models of NVIDIA architectures by a factor of 2.4 for the L1 cache and 1.3 for the L2 cache in terms of accuracy. Additionally, we find our fine-grained memory profiling technique to be a useful tool for memory optimizations, which we successfully demonstrate for ray tracing and machine learning applications.

Item Fine-Grained Scene Graph Generation with Overlap Region and Geometrical Center (The Eurographics Association and John Wiley & Sons Ltd., 2022) Zhao, Yong Qiang; Jin, Zhi; Zhao, Hai Yan; Zhang, Feng; Tao, Zheng Wei; Dou, Cheng Feng; Xu, Xin Hai; Liu, Dong Hong; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Scene graph generation refers to the task of identifying the objects and, specifically, the relationships between the objects in an image. Existing scene graph generation methods generally use the bounding-box region features of objects to identify the relationships between them. However, we argue that the overlap region features of two objects may play an important role in fine-grained relationship identification; in fact, some fine-grained relationships can only be obtained from the overlap region features of the two objects. We therefore propose a Multi-Branch Feature Combination (MFC) module and an Overlap Region Transformer (ORT) module to comprehensively capture the visual features contained in the overlap regions of two objects. Concretely, the MFC module uses deconvolution and multi-branch dilated convolution to obtain high-resolution, multi-receptive-field features in the overlap regions, while the ORT module uses a vision transformer to obtain the self-attention of the overlap regions. The joint use of these two modules combines the local connectivity of convolution with the global connectivity of attention. We also design a Geometrical Center Augmented (GCA) module to obtain the relative position of the geometric centers of the two objects, preventing the problem that relying on the scale of the overlap region alone cannot accurately capture the relationship between the objects. Experiments show that our model ORGC (Overlap Region and Geometrical Center), which combines the MFC, ORT, and GCA modules, enhances the performance of fine-grained relation identification. On the Visual Genome dataset, our model outperforms the current state-of-the-art model by 4.4% on the R@50 evaluation metric, reaching a state-of-the-art result of 33.88.
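The two geometric cues discussed above can be computed directly from a pair of bounding boxes, as in this small sketch (coordinates are illustrative).

```python
# Small sketch of the two geometric cues: the overlap (intersection) region of
# two object bounding boxes and the relative offset of their geometric centres.
# Boxes are (x1, y1, x2, y2); the coordinates below are illustrative.
def overlap_region(box_a, box_b):
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if x1 >= x2 or y1 >= y2:
        return None                      # no overlap -> no overlap-region features
    return (x1, y1, x2, y2)

def center_offset(box_a, box_b):
    ca = ((box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2)
    cb = ((box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2)
    return (cb[0] - ca[0], cb[1] - ca[1])   # relative position of the geometric centres

person, bike = (10, 10, 60, 120), (30, 60, 110, 130)
print(overlap_region(person, bike))   # (30, 60, 60, 120): region to crop features from
print(center_offset(person, bike))    # (35.0, 30.0)
```
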
Item Generative Deformable Radiance Fields for Disentangled Image Synthesis of Topology-Varying Objects (The Eurographics Association and John Wiley & Sons Ltd., 2022) Wang, Ziyu; Deng, Yu; Yang, Jiaolong; Yu, Jingyi; Tong, Xin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
3D-aware generative models have demonstrated superb performance in generating 3D neural radiance fields (NeRF) from collections of monocular 2D images, even for topology-varying object categories. However, these methods still lack the capability to separately control the shape and appearance of the objects in the generated radiance fields. In this paper, we propose a generative model for synthesizing radiance fields of topology-varying objects with disentangled shape and appearance variations. Our method generates deformable radiance fields, which build dense correspondences between the density fields of the objects and encode their appearances in a shared template field. The disentanglement is achieved in an unsupervised manner without introducing extra labels into previous 3D-aware GAN training. We also develop an effective image inversion scheme for reconstructing the radiance field of an object in a real monocular image and manipulating its shape and appearance. Experiments show that our method can successfully learn the generative model from unstructured monocular images and disentangle shape and appearance well for objects (e.g., chairs) with large topological variance. The model trained on synthetic data can faithfully reconstruct the real object in a given single image and achieve high-quality texture and shape editing results.

Item Implicit Neural Deformation for Sparse-View Face Reconstruction (The Eurographics Association and John Wiley & Sons Ltd., 2022) Li, Moran; Huang, Haibin; Zheng, Yi; Li, Mengtian; Sang, Nong; Ma, Chongyang; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
In this work, we present a new method for 3D face reconstruction from sparse-view RGB images. Unlike previous methods that are built upon 3D morphable models (3DMMs) with limited details, we leverage an implicit representation to encode rich geometric features. Our overall pipeline consists of two major components: a geometry network, which learns a deformable neural signed distance function (SDF) as the 3D face representation, and a rendering network, which learns to render on-surface points of the neural SDF to match the input images via self-supervised optimization. To handle in-the-wild sparse-view input of the same target with different expressions at test time, we propose a residual latent code to effectively expand the shape space of the learned implicit face representation, as well as a novel view-switch loss to enforce consistency among different views. Our experimental results on several benchmark datasets demonstrate that our approach outperforms alternative baselines and achieves superior face reconstruction results compared to state-of-the-art methods.

Item Joint Hand and Object Pose Estimation from a Single RGB Image using High-level 2D Constraints (The Eurographics Association and John Wiley & Sons Ltd., 2022) Song, Hao-Xuan; Mu, Tai-Jiang; Martin, Ralph R.; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Joint pose estimation of human hands and objects from a single RGB image is an important topic for AR/VR, robot manipulation, etc. It is common practice to determine both poses directly from the image; some recent methods attempt to improve the initial poses using a variety of contact-based approaches. However, few methods take the real physical constraints conveyed by the image into consideration, leading to less realistic results than the initial estimates. To overcome this problem, we make use of a set of high-level 2D features that can be directly extracted from the image in a new pipeline that combines contact approaches and these constraints during optimization. Our pipeline achieves better results than direct regression or contact-based optimization: the poses are closer to the ground truth and provide high-quality contact.