PG2023 Short Papers and Posters
Browsing PG2023 Short Papers and Posters by Title
Now showing 1 - 20 of 26
Item Automatic Vector Caricature via Face Parametrization (The Eurographics Association, 2023)
Madono, Koki; Hold-Geoffroy, Yannick; Li, Yijun; Ito, Daichi; Echevarria, Jose; Smith, Cameron; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Automatic caricature generation is a challenging task that aims to emphasize the subject's facial characteristics while preserving their identity. Due to the complexity of the task, caricatures have traditionally been created only by trained artists. Recent developments in deep learning have achieved promising results in capturing artistic styles. Despite this success, current methods still struggle to accurately capture the whimsical aspect of caricatures while preserving identity. In this work, we propose Parametric Caricature, the first parametric-based caricature generation method that yields vectorized and animatable caricatures. We devise several hundred parameters to encode facial traits, which our method predicts directly instead of estimating a raster caricature like previous methods. To guide the attention of the method, we segment the different parts of the face and retrieve the most similar parts from an artist-made database of caricatures. Our method produces visually appealing caricatures that are better suited for use as avatars than those of existing methods, as demonstrated by our user study.

Item Avatar Emotion Recognition using Non-verbal Communication (The Eurographics Association, 2023)
Bazargani, Jalal Safari; Sadeghi-Niaraki, Abolghasem; Choi, Soo-Mi; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Among the sources of information about emotions, body movements, known as "kinesics" in non-verbal communication, have received limited attention. This research gap suggests the need to investigate suitable body-movement-based approaches for making communication in virtual environments more realistic. Therefore, this study proposes an automated emotion recognition approach suitable for use in virtual environments. The study consists of two pipelines for emotion recognition. For the first pipeline, upper-body keypoint-based recognition, the HEROES video dataset was employed to train a bidirectional long short-term memory model on upper-body keypoints, capable of predicting four discrete emotions (boredom, disgust, happiness, and interest) with an accuracy of 84%. For the second pipeline, wrist-movement-based recognition, a random forest model was trained on 17 features computed from the acceleration of wrist movements along each axis. The model achieved an accuracy of 63% in distinguishing three discrete emotions: sadness, neutrality, and happiness. The findings suggest that the proposed approach is a noticeable step toward automated emotion recognition without using any additional sensors beyond the head-mounted display (HMD).

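The upper-body pipeline described above is, at its core, a sequence classifier over pose keypoints. A minimal PyTorch sketch of such a model follows; the number of keypoints, hidden size, and use of the final time step are illustrative assumptions rather than the authors' exact architecture.

```python
import torch
import torch.nn as nn

class KeypointEmotionBiLSTM(nn.Module):
    """Bidirectional LSTM over sequences of upper-body keypoints, classifying
    four discrete emotions (boredom, disgust, happiness, interest). Layer sizes
    and the keypoint count are illustrative assumptions."""
    def __init__(self, num_keypoints=13, hidden=128, num_emotions=4):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_keypoints * 2,   # (x, y) per keypoint
                            hidden_size=hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_emotions)

    def forward(self, keypoints):                  # keypoints: (B, T, num_keypoints * 2)
        out, _ = self.lstm(keypoints)
        return self.classifier(out[:, -1])         # classify from the last time step

model = KeypointEmotionBiLSTM()
clip = torch.randn(8, 60, 26)                      # 8 clips, 60 frames, 13 (x, y) keypoints each
logits = model(clip)                               # (8, 4) emotion scores
```
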
Item Color3d: Photorealistic Texture Mapping for 3D Mesh (The Eurographics Association, 2023)
Zhao, Chenxi; Fan, Chuanmao; Mohadikar, Payal; Duan, Ye; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
3D reconstruction plays a significant role in various fields, including medical imaging, architecture, and forensic science, in both research and industry. The quality of color is one of the criteria that determine reconstruction performance. However, colors predicted by deep learning often suffer from low quality and a lack of detail. While traditional texture mapping methods can provide superior color, they are restricted by mesh quality. In this study, we propose Color3D, a comprehensive procedure that applies photorealistic colors to a reconstructed mesh, accommodating both static objects and animations. The necessary inputs include multiview RGB images, depth images, camera poses, and camera intrinsics. Compared to traditional methods, our approach replaces face colors taken directly from the texture map with vertex colors from the multiview images; the color of each face is obtained by interpolating the vertex colors of its triangle. Our method can generate high-quality color for different objects, and its performance remains strong even when the input mesh is not perfect.

Item Combining Transformer and CNN for Super-Resolution of Animal Fiber Microscopy Images (The Eurographics Association, 2023)
Li, Jiagen; Ji, Yatu; Lu, Min; Wang, Li; Dai, Lingjie; Xu, Xuanxuan; Wu, Nier; Liu, Na; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
The images of cashmere and wool fibers used for scientific research in the textile field are mostly acquired manually under an optical microscope. However, due to interference from microscope quality, the shooting environment, focal length selection, acquisition technique, and other factors, the obtained photographs tend to have low resolution, making it difficult to discern the fine fiber texture and scale details. To address these problems, a lightweight super-resolution reconstruction algorithm with multi-scale hierarchical screening is proposed. Specifically, a hybrid module incorporating a Swin Transformer and enhanced channel attention is first proposed to extract global features and identify the important locations among them. In addition, a multi-scale hierarchical screening and filtering module based on the residual model splits the channels so that the model can adaptively weight features, amplifying the feature information in high-frequency regions. Finally, a global average pooling attention module integrates and weights the high-frequency features again to enhance details such as edges and textures. Extensive experiments show that, compared with other state-of-the-art algorithms, the proposed method significantly improves image quality on the fiber dataset and is effective at all scales on five public datasets; it occupies less memory than SwinIR and improves PSNR and SSIM while using fewer parameters than the lightweight ESRT.

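The "enhanced channel attention" mentioned above follows the familiar pattern of pooling per-channel statistics and learning per-channel rescaling weights. The squeeze-and-excitation-style sketch below is a generic stand-in offered for illustration, not a reproduction of the paper's exact module.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention: global average pooling
    gathers per-channel statistics, a small bottleneck predicts per-channel
    weights, and the feature map is rescaled accordingly."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())

    def forward(self, x):                 # x: (B, C, H, W)
        return x * self.fc(self.pool(x))  # rescale each channel by its learned weight

feat = torch.randn(1, 64, 48, 48)
print(ChannelAttention(64)(feat).shape)   # torch.Size([1, 64, 48, 48])
```
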
Item DASKEL: An Interactive Choreographical System with Labanotation-Skeleton Translation (The Eurographics Association, 2023)
Luo, Siyuan; Yu, Borou; Wang, Zeyu; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
We propose DASKEL, a real-time interactive choreography system with bidirectional human skeleton-Labanotation conversion. DASKEL fuses dance notation (DA) with human skeleton data (SKEL). Our approach connects dance data represented in Labanotation with motion capture skeleton data in BVH format, facilitating seamless bidirectional conversion between the two formats. Moreover, DASKEL introduces a numerical representation for the symbols used in Labanotation and supports their intuitive visualization, which augments its practicality and applicability. Previous methods for converting between Labanotation and human skeleton data only support the upper body, whereas our approach generalizes bidirectional conversion to the whole body. To generate more accurate and human-like dance postures, we integrate kinematic methods with physics-based simulation, resulting in more natural character animations generated from dance notations.

Item Detection of Impurities in Wool Based on Improved YOLOv8 (The Eurographics Association, 2023)
Liu, Yang; Ji, Yatu; Ren, Qing Dao Er Ji; Shi, Bao; Zhuang, Xufei; Yao, Miaomiao; Li, Xiaomei; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
In the current production of wool products, the cleaning of raw wool has been automated. However, detecting whether the washed and dried wool still contains excessive impurities requires manual inspection, which greatly reduces production efficiency. To solve the problem of detecting wool impurities, we propose an improved model based on YOLOv8. Our work applies several techniques to address low-resource model training and incorporates a small-object detection block into the network structure. The proposed model achieves an accuracy of 84.3% on a self-built dataset and also achieves good results on the VisDrone2019 dataset.

Item Emotion-based Interaction Technique Using User's Voice and Facial Expressions in Virtual and Augmented Reality (The Eurographics Association, 2023)
Ko, Beom-Seok; Kang, Ho-San; Lee, Kyuhong; Braunschweiler, Manuel; Zünd, Fabio; Sumner, Robert W.; Choi, Soo-Mi; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
This paper presents a novel interaction approach based on a user's emotions within augmented reality (AR) and virtual reality (VR) environments to achieve immersive interaction with virtual intelligent characters. To identify the user's emotions from voice, the Google Speech-to-Text API is used to transcribe speech, and the RoBERTa language model is then used to classify emotions. In the AR environment, the intelligent character can change the styles and properties of objects based on the user's recognized emotions during a dialog. In the VR environment, the movement of the user's eyes and lower face is tracked with the VIVE Pro Eye and Facial Tracker, and EmotionNet is used for emotion recognition; the virtual environment can then be changed based on the user's recognized emotions. Our findings present an interesting approach for integrating emotionally intelligent characters in AR/VR using generative AI and facial expression recognition.

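The voice branch described above is a two-step pipeline: transcribe speech, then classify the transcript's emotion with a RoBERTa-based model. A minimal sketch using the Hugging Face transformers library is shown below; the transcription is assumed to have already happened, and the checkpoint name is an example, not necessarily the model the authors used.

```python
from transformers import pipeline

# Emotion classification of an already-transcribed utterance. The checkpoint
# below is a publicly available DistilRoBERTa emotion model used here only as
# an illustrative placeholder.
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base")

transcript = "I really love how this room looks right now!"   # hypothetical speech-to-text output
pred = emotion_classifier(transcript)[0]                      # top-1 prediction: {"label": ..., "score": ...}
print(pred["label"], round(pred["score"], 3))                 # e.g. "joy" with its confidence
```
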
Item Feature-Sized Sampling for Vector Line Art (The Eurographics Association, 2023)
Ohrhallinger, Stefan; Parakkat, Amal Dev; Memari, Pooran; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
By introducing a first-of-its-kind quantifiable sampling algorithm based on feature size, we present a fresh perspective on the practical aspects of planar curve sampling. Following in the footsteps of ε-sampling, which was originally proposed in the context of curve reconstruction to offer provable topological guarantees [ABE98] under quantifiable bounds, we propose an arbitrarily precise ε-sampling algorithm for smooth planar curves (with a prior bound on the minimum feature size of the curve). This paper not only introduces the first such algorithm providing user control and quantifiable precision, but also highlights the importance of such a sampling process in two key contexts: 1) to conduct a first study comparing theoretical sampling conditions with practical sampling requirements for reconstruction guarantees, which can further be used to analyse the upper bounds of ε for various reconstruction algorithms with or without proofs; and 2) as a feature-aware sampling of vector line art that can be used for applications such as coloring and meshing.

Item Generalizable Dynamic Radiance Fields For Talking Head Synthesis With Few-shot (The Eurographics Association, 2023)
Dang, Rujing; Wang, Shaohui; Wang, Haoqian; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Audio-driven talking head generation has wide applications in virtual games, hosts, online meetings, etc. Recently, great achievements have been made in synthesizing talking heads based on neural radiance fields. However, existing few-shot talking head synthesis methods still suffer from inaccurate deformation and a lack of visual consistency. Therefore, we propose a Generalizable Dynamic Radiance Field (GDRF), which can rapidly generalize to unseen identities from only a few shots. We introduce a warping module with 3D constraints that acts in feature volume space, is identity-adaptive, and exhibits excellent shape-shifting abilities. Our method can generate more accurately deformed and view-consistent target images than previous methods. Furthermore, we map the audio signal to 3DMM parameters using an LSTM network, which helps capture long-term context and generate more continuous and natural video. Extensive experiments demonstrate the superiority of our proposed method.

Item Hand Shadow Art: A Differentiable Rendering Perspective (The Eurographics Association, 2023)
Gangopadhyay, Aalok; Singh, Prajwal; Tiwari, Ashish; Raman, Shanmuganathan; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Shadow art is an exciting form of sculptural art that produces captivating artistic effects through the 2D shadows cast by 3D shapes. Hand shadows, also known as shadow puppetry or shadowgraphy, involve creating various shapes and figures with the hands and fingers to cast meaningful shadows on a wall. In this work, we propose a differentiable rendering-based approach to deform hand models so that they cast a shadow consistent with a desired target image and the associated lighting configuration. We showcase the results of shadows cast by a pair of hands and the interpolation of hand poses between two desired shadow images. We believe this work will be a useful tool for the graphics community.

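The core idea above, optimizing a shape through a differentiable renderer so that its projected shadow matches a target silhouette, can be illustrated with a tiny self-contained toy. The sketch below splat-renders an orthographically projected point cloud and nudges the points toward a disc-shaped target; the point cloud, Gaussian splatting, and disc target are simplified assumptions and not the paper's articulated hand model, renderer, or lighting setup.

```python
import torch

# Toy analogue of differentiable shadow matching: project 3D points onto a
# plane, splat Gaussians to form a soft "shadow" image, and optimize the point
# positions by gradient descent so the shadow matches a target silhouette.
res = 64
ys, xs = torch.meshgrid(torch.linspace(-1, 1, res),
                        torch.linspace(-1, 1, res), indexing="ij")

def render_shadow(points, sigma=0.05):
    px, py = points[:, 0], points[:, 1]            # drop z: orthographic projection
    d2 = (xs[None] - px[:, None, None]) ** 2 + (ys[None] - py[:, None, None]) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2)).sum(0).clamp(max=1.0)

target = (xs ** 2 + ys ** 2 < 0.25).float()        # hypothetical target: a filled disc
points = (0.5 * torch.randn(200, 3)).requires_grad_()
opt = torch.optim.Adam([points], lr=0.02)
for _ in range(300):
    opt.zero_grad()
    loss = ((render_shadow(points) - target) ** 2).mean()
    loss.backward()
    opt.step()
print("silhouette loss after optimization:", loss.item())
```
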
Item Local Positional Encoding for Multi-Layer Perceptrons (The Eurographics Association, 2023)
Fujieda, Shin; Yoshimura, Atsushi; Harada, Takahiro; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
A multi-layer perceptron (MLP) is a type of neural network with a long research history that has recently been studied actively in the computer vision and graphics fields. One well-known limitation of an MLP is its difficulty in expressing high-frequency signals from low-dimensional inputs. Several studies on input encodings improve the reconstruction quality of an MLP by pre-processing the input data. This paper proposes a novel input encoding method, local positional encoding, which is an extension of positional and grid encodings. Our proposed method combines these two encoding techniques so that a small MLP learns high-frequency signals by using positional encoding with fewer frequencies within each cell of a lower-resolution grid, thereby accounting for the local position and scale in each grid cell. We demonstrate the effectiveness of our proposed method by applying it to common 2D and 3D regression tasks, where it shows higher-quality results than positional and grid encodings, and results comparable to hierarchical variants of grid encoding, such as multi-resolution grid encoding, with an equivalent memory footprint.

Item Multi-scale Monocular Panorama Depth Estimation (The Eurographics Association, 2023)
Mohadikar, Payal; Fan, Chuanmao; Zhao, Chenxi; Duan, Ye; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Panorama images are widely used for scene depth estimation as they provide a comprehensive scene representation. Existing deep-learning monocular panorama depth estimation networks produce inconsistent, discontinuous, and poor-quality depth maps. To overcome this, we propose a novel multi-scale monocular panorama depth estimation framework. We use a coarse-to-fine depth estimation approach, where multi-scale tangent perspective images, projected from 360° images, are given to coarse and fine encoder-decoder networks to produce multi-scale perspective depth maps, which are merged to obtain low- and high-resolution 360° depth maps. The coarse branch extracts holistic features that guide the features extracted by the fine branch through a Multi-Scale Feature Fusion (MSFF) module at the network bottleneck. Experiments on the Stanford2D3D benchmark dataset show that our model outperforms existing methods, producing consistent, smooth, structurally detailed, and accurate depth maps.

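The first step of the panorama pipeline above, projecting tangent perspective views from a 360° image, amounts to casting pinhole-camera rays and resampling the equirectangular panorama along them. The sketch below shows one such projection with nearest-neighbor lookup; the field of view, output size, and sampling scheme are simplifying assumptions, not the paper's exact projection setup.

```python
import numpy as np

def tangent_view_from_equirect(pano, fov_deg=90, out_size=256, lon0=0.0, lat0=0.0):
    """Resample one tangent (perspective) view from an equirectangular panorama.
    pano: (H, W, 3) array; lon0/lat0: viewing direction in radians."""
    H, W = pano.shape[:2]
    f = 0.5 * out_size / np.tan(np.radians(fov_deg) / 2)        # pinhole focal length
    u, v = np.meshgrid(np.arange(out_size) - out_size / 2 + 0.5,
                       np.arange(out_size) - out_size / 2 + 0.5)
    dirs = np.stack([u / f, -v / f, np.ones_like(u)], axis=-1)   # rays: x right, y up, z forward
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    # rotate rays so the camera looks toward (lon0, lat0)
    Ry = np.array([[np.cos(lon0), 0, np.sin(lon0)], [0, 1, 0], [-np.sin(lon0), 0, np.cos(lon0)]])
    Rx = np.array([[1, 0, 0], [0, np.cos(lat0), -np.sin(lat0)], [0, np.sin(lat0), np.cos(lat0)]])
    dirs = dirs @ (Ry @ Rx).T
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])                 # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1, 1))                # latitude in [-pi/2, pi/2]
    px = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    py = ((0.5 - lat / np.pi) * H).astype(int).clip(0, H - 1)
    return pano[py, px]

pano = np.random.rand(512, 1024, 3)                              # placeholder panorama
view = tangent_view_from_equirect(pano)                          # (256, 256, 3) perspective crop
```
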
Item Multi-Stage Degradation and Content Embedding Fusion for Blind Super-Resolution (The Eurographics Association, 2023)
Zhang, Haiyang; Jiang, Mengyu; Liu, Liang; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
To achieve promising results on blind image super-resolution (SR), some Unsupervised Degradation Prediction (UDP) methods narrow the domain gap between the degradation embedding space and the SR feature space by fusing the degradation embedding with an additional content embedding before multi-stage SR. However, fusing these two embeddings before multi-stage SR is inflexible, because the domain gap varies at each SR stage. To address this issue, we propose Multi-Stage Degradation and Content Embedding Fusion (MDCF), which adaptively fuses the degradation embedding with the content embedding at each SR stage rather than before multi-stage SR. Based on the MDCF, we introduce a novel UDP method, called MDCFnet, which contains an additional Dual-Path Local and Global encoder (DPLG) to extract the degradation embedding and the content embedding separately. Specifically, DPLG diversifies receptive fields to enrich the degradation embedding and combines local and global features to optimize the content embedding. Extensive experiments on real images and several benchmarks demonstrate that the proposed MDCFnet outperforms existing UDP methods and achieves competitive PSNR and SSIM even compared with state-of-the-art SKP methods.

Item Pacific Graphics 2023 - Short Papers and Posters: Frontmatter (The Eurographics Association, 2023)
Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.

Item Progressive Graph Matching Network for Correspondences (The Eurographics Association, 2023)
Feng, Huihang; Liu, Lupeng; Xiao, Jun; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
This paper presents a progressive graph matching network, abbreviated PGMNet. The method is more explainable and can match features from easy to hard. PGMNet contains two major blocks: a Sinkformer module and a guided attention module. First, we use Sinkformers to obtain the similarity matrix, which can be seen as an assignment matrix between two sets of feature keypoints. Matches with the highest scores in both rows and columns are selected as pre-matched correspondences. These pre-matched correspondences can be leveraged to guide the update and matching of ambiguous features. The matching quality is progressively improved as the transformer blocks go deeper, as visualized in Figure 1. Experiments show that our method achieves better results than typical attention-based methods.

Item Reconstructing Baseball Pitching Motions from Video (The Eurographics Association, 2023)
Kim, Jiwon; Kim, Dongkwon; Yu, Ri; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Baseball is one of the most loved sports in the world. In a baseball game, the pitcher's control is a key factor in determining the outcome of the game. A large amount of video footage of baseball games exists, and learning baseball pitching motions from video is possible thanks to pose estimation techniques. However, reconstructing pitching motions using pose estimators is challenging: motion blur inevitably occurs because the pitcher throws the ball into the strike zone as fast as possible. To tackle this problem, we propose a framework that uses physics simulation and deep reinforcement learning to reconstruct baseball pitching motions from the imperfect poses estimated from video. We set a target point and design rewards that encourage the character to throw the ball to the target point. Consequently, we can reconstruct plausible pitching motions.

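The reward design mentioned in the pitching entry above, encouraging the simulated character to hit a chosen target point, can be illustrated with a simple distance-based term. The Gaussian form, the sigma value, and the 2D strike-zone coordinates below are assumptions for illustration, not the paper's actual reward function.

```python
import numpy as np

def pitch_reward(ball_hit_pos, target_pos, sigma=0.2):
    """Illustrative reward: 1.0 for hitting the target point exactly,
    decaying smoothly with the miss distance (in meters, hypothetically)."""
    miss = np.linalg.norm(np.asarray(ball_hit_pos) - np.asarray(target_pos))
    return float(np.exp(-(miss ** 2) / (2 * sigma ** 2)))

print(pitch_reward([0.05, -0.10], [0.0, 0.0]))   # small miss: high reward (~0.86)
print(pitch_reward([0.60, 0.40], [0.0, 0.0]))    # far off target: reward near 0
```
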
Item Revisiting Visualization Evaluation Using EEG and Visualization Literacy Assessment Test (The Eurographics Association, 2023)
Yim, Soobin; Jung, Chanyoung; Yoon, Chanyoung; Yoo, Sangbong; Choi, Seongwon; Jang, Yun; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
EEG (electroencephalogram) signals can provide a quantitative measure of human cognitive load, making them an effective tool for evaluating visualization. However, the suitability of EEG for visualization evaluation has not been verified in previous studies. This paper investigates the feasibility of utilizing EEG data in visualization evaluation by comparison with previous experiments. We trained and evaluated an individual CNN model for each subject using the EEG data. Our study demonstrates that EEG-based visualization evaluation provides a better estimate of the difficulties experienced by subjects during visualization tasks than the accuracy and response-time measures used in previous studies.

Item A Simple Stochastic Regularization Technique for Avoiding Overfitting in Low Resource Image Classification (The Eurographics Association, 2023)
Ji, Ya Tu; Wang, Bai Lun; Ren, Qing Dao Er Ji; Shi, Bao; Wu, Nier E.; Lu, Min; Liu, Na; Zhuang, Xu Fei; Xu, Xuan Xuan; Wang, Li; Dai, Ling Jie; Yao, Miao Miao; Li, Xiao Mei; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Drop-type techniques, which effectively regulate the co-adaptation and prediction ability of neural network units, are widely used in model parameter optimization to reduce overfitting. However, low-resource image classification suffers from serious overfitting, and data sparsity weakens or even nullifies the effectiveness of most regularization methods. Inspired by the value iteration strategy, this paper explores a drop-type method based on Metcalfe's law, named Metcalfe-Drop. The experimental results indicate that using Metcalfe-Drop as a basis for determining parameter sharing is more effective than randomly controlling neurons with a fixed probability. Our code is available at https://gitee.com/giteetu/metcalfe-drop.git.

Item Sketch-to-Architecture: Generative AI-aided Architectural Design (The Eurographics Association, 2023)
Li, Pengzhi; Li, Baijuan; Li, Zhiheng; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Recently, the development of large-scale models has paved the way for various interdisciplinary research, including architecture. Using generative AI, we present a novel workflow in which AI models generate conceptual floorplans and 3D models from simple sketches, enabling rapid ideation and controlled generation of architectural renderings based on textual descriptions. Our work demonstrates the potential of generative AI in the architectural design process, pointing toward a new direction for computer-aided architectural design.

Item SketchBodyNet: A Sketch-Driven Multi-faceted Decoder Network for 3D Human Reconstruction (The Eurographics Association, 2023)
Wang, Fei; Tang, Kongzhang; Wu, Hefeng; Zhao, Baoquan; Cai, Hao; Zhou, Teng; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Reconstructing 3D human shapes from 2D images has received increasing attention recently due to its fundamental support for many high-level 3D applications. Compared with natural images, freehand sketches are much more flexible for depicting various shapes, providing a promising and valuable way to approach 3D human reconstruction. However, such a task is highly challenging. The sparse, abstract characteristics of sketches add severe difficulties, such as arbitrariness, inaccuracy, and lack of image detail, to the already badly ill-posed problem of 2D-to-3D reconstruction. Although current methods have achieved great success in reconstructing 3D human bodies from a single-view image, they do not work well on freehand sketches. In this paper, we propose a novel sketch-driven multi-faceted decoder network, termed SketchBodyNet, to address this task. Specifically, the network consists of a backbone and three separate attention decoder branches, where a multi-head self-attention module is exploited in each decoder to obtain enhanced features, followed by a multi-layer perceptron. The multi-faceted decoders predict the camera, shape, and pose parameters, respectively, which are then associated with the SMPL model to reconstruct the corresponding 3D human mesh. During training, existing 3D meshes are projected via the camera parameters into 2D synthetic sketches with joints, which are combined with the freehand sketches to optimize the model. To verify our method, we collect a large-scale dataset of about 26k freehand sketches and their corresponding 3D meshes, containing various poses of human bodies drawn from 14 different angles. Extensive experimental results demonstrate that SketchBodyNet achieves superior performance in reconstructing 3D human meshes from freehand sketches.

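As an illustration of the multi-faceted decoder described above, the sketch below wires three attention-plus-MLP branches that regress camera, shape, and pose vectors from shared backbone tokens. The token count, feature width, and output dimensions (weak-perspective camera: 3, SMPL shape: 10, SMPL pose: 24 joints x 3 = 72) are common choices assumed here, not necessarily the authors' exact configuration.

```python
import torch
import torch.nn as nn

class AttnHead(nn.Module):
    """One decoder branch: multi-head self-attention over backbone tokens,
    then an MLP regressing a parameter vector."""
    def __init__(self, dim, out_dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, out_dim))

    def forward(self, tokens):                    # tokens: (B, N, dim)
        x, _ = self.attn(tokens, tokens, tokens)  # self-attention for enhanced features
        x = self.norm(x + tokens).mean(dim=1)     # residual, normalize, pool over tokens
        return self.mlp(x)

class SketchDecoders(nn.Module):
    """Three separate branches predicting camera, shape, and pose parameters
    from shared backbone features (illustrative sketch only)."""
    def __init__(self, dim=256):
        super().__init__()
        self.cam_head = AttnHead(dim, 3)
        self.shape_head = AttnHead(dim, 10)
        self.pose_head = AttnHead(dim, 72)

    def forward(self, feats):
        return self.cam_head(feats), self.shape_head(feats), self.pose_head(feats)

feats = torch.randn(2, 49, 256)                   # e.g. a 7x7 backbone feature map flattened to tokens
cam, shape, pose = SketchDecoders()(feats)
print(cam.shape, shape.shape, pose.shape)         # (2, 3) (2, 10) (2, 72)
```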