Learning a Generative Model for Multi-Step Human-Object Interactions from Videos

dc.contributor.author: Wang, He
dc.contributor.author: Pirk, Sören
dc.contributor.author: Yumer, Ersin
dc.contributor.author: Kim, Vladimir
dc.contributor.author: Sener, Ozan
dc.contributor.author: Sridhar, Srinath
dc.contributor.author: Guibas, Leonidas
dc.contributor.editor: Alliez, Pierre and Pellacini, Fabio
dc.date.accessioned: 2019-05-05T17:41:35Z
dc.date.available: 2019-05-05T17:41:35Z
dc.date.issued: 2019
dc.description.abstract: Creating dynamic virtual environments consisting of humans interacting with objects is a fundamental problem in computer graphics. While it is well accepted that agent interactions play an essential role in synthesizing such scenes, most extant techniques focus exclusively on static scenes and leave the dynamic component out. In this paper, we present a generative model to synthesize plausible multi-step dynamic human-object interactions. Generating multi-step interactions is challenging, since the space of such interactions is exponential in the number of objects, activities, and time steps. We propose to handle this combinatorial complexity by learning a lower-dimensional space of plausible human-object interactions. We use action plots to represent interactions as a sequence of discrete actions along with the participating objects and their states. To build action plots, we present an automatic method that uses state-of-the-art computer vision techniques on RGB videos to detect individual objects and their states, extract the involved hands, and recognize the actions performed. The action plots are built by observing videos of everyday activities and are used to train a generative model based on a Recurrent Neural Network (RNN). The network learns the causal dependencies and constraints between individual actions and can be used to generate novel and diverse multi-step human-object interactions. Our representation and generative model allow new capabilities in a variety of applications such as interaction prediction, animation synthesis, and motion planning for a real robotic agent.
dc.description.number: 2
dc.description.sectionheaders: Learning to Animate
dc.description.seriesinformation: Computer Graphics Forum
dc.description.volume: 38
dc.identifier.doi: 10.1111/cgf.13644
dc.identifier.issn: 1467-8659
dc.identifier.pages: 367-378
dc.identifier.uri: https://doi.org/10.1111/cgf.13644
dc.identifier.uri: https://diglib.eg.org:443/handle/10.1111/cgf13644
dc.publisher: The Eurographics Association and John Wiley & Sons Ltd.
dc.title: Learning a Generative Model for Multi-Step Human-Object Interactions from Videos
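
The abstract above describes an RNN-based generative model trained on action plots, i.e. sequences of discrete (action, object, state) steps, from which novel multi-step interactions are sampled. The following is a minimal PyTorch sketch of that general idea, not the authors' implementation: the class name, the integer tokenization of action-plot steps, and all sizes and hyperparameters are illustrative assumptions, since this record does not specify them.

    # Illustrative sketch only: an autoregressive RNN over action-plot tokens,
    # where each token is assumed to encode one (action, object, state) step.
    # Vocabulary size and layer widths are placeholder assumptions.
    import torch
    import torch.nn as nn

    class ActionPlotRNN(nn.Module):
        def __init__(self, vocab_size=64, embed_dim=32, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens, hidden=None):
            # tokens: (batch, seq_len) integer-encoded action-plot steps
            out, hidden = self.rnn(self.embed(tokens), hidden)
            return self.head(out), hidden  # logits over the next step

    def sample_plot(model, start_token, max_steps=20):
        """Autoregressively sample one multi-step interaction sequence."""
        tokens = torch.tensor([[start_token]])
        hidden = None
        plot = [start_token]
        for _ in range(max_steps):
            logits, hidden = model(tokens, hidden)
            probs = torch.softmax(logits[:, -1], dim=-1)
            nxt = torch.multinomial(probs, 1)  # stochastic choice -> diversity
            plot.append(nxt.item())
            tokens = nxt
        return plot

Trained with a standard next-token cross-entropy loss on plots extracted from videos, sampling in this fashion would yield action sequences that respect the causal ordering constraints the network has learned, which is the capability the abstract claims for the paper's model.
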
Files
Original bundle (3 files):
- v38i2pp367-378.pdf (11.77 MB, Adobe Portable Document Format)
- suppmat.pdf (1.91 MB, Adobe Portable Document Format)
- suppvideo.mp4 (250.01 MB, unknown data format)