Context-based Style Transfer of Tokenized Gestures

dc.contributor.author: Kuriyama, Shigeru (en_US)
dc.contributor.author: Mukai, Tomohiko (en_US)
dc.contributor.author: Taketomi, Takafumi (en_US)
dc.contributor.author: Mukasa, Tomoyuki (en_US)
dc.contributor.editor: Dominik L. Michels (en_US)
dc.contributor.editor: Soeren Pirk (en_US)
dc.date.accessioned: 2022-08-10T15:20:04Z
dc.date.available: 2022-08-10T15:20:04Z
dc.date.issued: 2022
dc.description.abstract: Gestural animations in the amusement and entertainment fields often require rich expressions; however, it is still challenging to synthesize characteristic gestures automatically. Although style transfer based on a neural network model is a potential solution, existing methods mainly focus on cyclic motions such as gaits and require re-training when new motion styles are added. Moreover, their per-pose transformation cannot account for time-dependent features, so motion styles of different periods and timings are difficult to transfer. This limitation is fatal for gestural motions, which require complicated time alignment due to the variety of exaggerated or intentionally performed behaviors. This study introduces a context-based style transfer of gestural motions with neural networks to ensure stable conversion even for exaggerated, dynamically complicated gestures. We present a model based on a vision transformer that transfers the content and style features of gestures by time-segmenting them into tokens in a latent space. We extend this model to yield the probability of swapping gesture tokens for style transfer. A transformer model is well suited to semantically consistent matching among gesture tokens, owing to their correlation with spoken words. The compact architecture of our network model requires only a small number of parameters and low computational cost, making it suitable for real-time applications on an ordinary device. We introduce loss functions based on the restoration error of identically and cyclically transferred gesture tokens, together with content and style similarity losses evaluated by splicing features inside the transformer. This design of losses enables unsupervised and zero-shot learning, which provides scalability for motion data. We comparatively evaluated our style transfer method, mainly focusing on expressive gestures, using our dataset captured for various scenarios and styles, and introduced new error metrics tailored to gestures. Our experiments showed that our method outperforms existing methods in the numerical accuracy and stability of style transfer. (en_US)
dc.description.number: 8
dc.description.sectionheaders: Learning
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 41
dc.identifier.doi: 10.1111/cgf.14645
dc.identifier.issn: 1467-8659
dc.identifier.pages: 305-315 (11 pages)
dc.identifier.uri: https://doi.org/10.1111/cgf.14645
dc.identifier.uri: https://diglib.eg.org:443/handle/10.1111/cgf14645
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd. (en_US)
dc.subject: CCS Concepts: Computing methodologies --> Motion processing
dc.subject: Computing methodologies
dc.subject: Motion processing
dc.title: Context-based Style Transfer of Tokenized Gestures (en_US)
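
The following is a minimal, illustrative PyTorch sketch of the pipeline described in the abstract above: a gesture sequence is time-segmented into tokens, embedded into a latent space, encoded by a transformer, and each content token is softly swapped with style tokens according to attention-derived swap probabilities, trained with identity- and cycle-restoration losses. All class names, dimensions, and hyperparameters here are assumptions for illustration, not the authors' implementation; the content/style similarity losses evaluated by splicing features inside the transformer are omitted.

# Minimal sketch; module names, dimensions, and losses are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GestureTokenizer(nn.Module):
    """Time-segments a pose sequence and embeds each segment as one token."""
    def __init__(self, pose_dim, segment_len, d_model):
        super().__init__()
        self.segment_len = segment_len
        self.embed = nn.Linear(pose_dim * segment_len, d_model)

    def forward(self, motion):
        # motion: (batch, frames, pose_dim); frames divisible by segment_len
        b, t, d = motion.shape
        segments = motion.reshape(b, t // self.segment_len, self.segment_len * d)
        return self.embed(segments)  # (batch, n_tokens, d_model)

class TokenSwapTransfer(nn.Module):
    """Transformer that mixes content tokens with style tokens via soft swap
    probabilities (cross-attention weights), then decodes back to poses."""
    def __init__(self, pose_dim=63, segment_len=16, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.pose_dim, self.segment_len = pose_dim, segment_len
        self.tokenizer = GestureTokenizer(pose_dim, segment_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.swap_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decode = nn.Linear(d_model, pose_dim * segment_len)

    def forward(self, content, style):
        c = self.encoder(self.tokenizer(content))  # content gesture tokens
        s = self.encoder(self.tokenizer(style))    # style gesture tokens
        # Soft "token swap": each content token is replaced by a mixture of
        # style tokens weighted by its attention (swap) probabilities.
        swapped, _ = self.swap_attn(query=c, key=s, value=s)
        out = self.decode(swapped)
        b, n, _ = out.shape
        return out.reshape(b, n * self.segment_len, self.pose_dim)

def restoration_losses(model, content, style):
    """Identity- and cycle-restoration losses enabling unsupervised training."""
    identity = F.l1_loss(model(content, content), content)  # same-style transfer
    stylized = model(content, style)
    cycle = F.l1_loss(model(stylized, content), content)    # transfer back
    return identity + cycle

if __name__ == "__main__":
    model = TokenSwapTransfer()
    content = torch.randn(2, 64, 63)  # 2 clips, 64 frames, 63-D poses
    style = torch.randn(2, 64, 63)
    loss = restoration_losses(model, content, style)
    loss.backward()

This sketch only shows the overall data flow and the unsupervised restoration losses; in the paper, the swap probabilities and the additional content/style similarity terms are what make the transfer context-aware and zero-shot for new styles.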
Files
Original bundle
Name: v41i8pp305-315.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format