WICED 2020

Norrköping, Sweden, May 25-29, 2020 (Virtual)
Morning Session
User-Adaptive Rotational Snap-Cutting for Streamed 360º Videos
Miguel Fabian Romero Rondón, Lucile Sassatelli, Ramon Aparicio Pardo, and Frédéric Precioso
Designing an Adaptive Assisting Interface for Learning Virtual Filmmaking
Qiu-Jie Wu, Chih-Hsuan Kuo, Hui-Yin Wu, and Tsai-Yen Li
How Good is Good Enough? The Challenge of Evaluating Subjective Quality of AI-Edited Video Coverage of Live Events
Miruna Radut, Michael Evans, Kristie To, Tamsin Nooney, and Graeme Phillipson
Exploring the Impact of 360º Movie Cuts in Users' Attention
Carlos Marañes, Diego Gutierrez, and Ana Serrano
Afternoon Session
CineFilter: Unsupervised Filtering for Real Time Autonomous Camera Systems
Sudheer Achary, K. L. Bhanu Moorthy, Ashar Javed, Nikitha Shravan, Vineet Gandhi, and Anoop M. Namboodiri
GAZED - Gaze-guided Cinematic Editing of Wide-Angle Monocular Video Recordings
K. L. Bhanu Moorthy, Moneish Kumar, Ramanathan Subramanian, and Vineet Gandhi
Joint Attention for Automated Video Editing
Hui-Yin Wu, Trevor Santarra, Michael Leece, Rolando Vargas, and Arnav Jhala

BibTeX (WICED 2020)
@inproceedings{10.2312:wiced.20201126,
  booktitle = {Workshop on Intelligent Cinematography and Editing},
  editor = {Christie, Marc and Wu, Hui-Yin and Li, Tsai-Yen and Gandhi, Vineet},
  title = {{Designing an Adaptive Assisting Interface for Learning Virtual Filmmaking}},
  author = {Wu, Qiu-Jie and Kuo, Chih-Hsuan and Wu, Hui-Yin and Li, Tsai-Yen},
  year = {2020},
  publisher = {The Eurographics Association},
  ISSN = {2411-9733},
  ISBN = {978-3-03868-127-4},
  DOI = {10.2312/wiced.20201126}
}
@inproceedings{10.2312:wiced.20201125,
  booktitle = {Workshop on Intelligent Cinematography and Editing},
  editor = {Christie, Marc and Wu, Hui-Yin and Li, Tsai-Yen and Gandhi, Vineet},
  title = {{User-Adaptive Rotational Snap-Cutting for Streamed 360º Videos}},
  author = {Rondón, Miguel Fabian Romero and Sassatelli, Lucile and Pardo, Ramon Aparicio and Precioso, Frédéric},
  year = {2020},
  publisher = {The Eurographics Association},
  ISSN = {2411-9733},
  ISBN = {978-3-03868-127-4},
  DOI = {10.2312/wiced.20201125}
}
@inproceedings{10.2312:wiced.20201130,
  booktitle = {Workshop on Intelligent Cinematography and Editing},
  editor = {Christie, Marc and Wu, Hui-Yin and Li, Tsai-Yen and Gandhi, Vineet},
  title = {{GAZED - Gaze-guided Cinematic Editing of Wide-Angle Monocular Video Recordings}},
  author = {Moorthy, K. L. Bhanu and Kumar, Moneish and Subramanian, Ramanathan and Gandhi, Vineet},
  year = {2020},
  publisher = {The Eurographics Association},
  ISSN = {2411-9733},
  ISBN = {978-3-03868-127-4},
  DOI = {10.2312/wiced.20201130}
}
@inproceedings{10.2312:wiced.20201127,
  booktitle = {Workshop on Intelligent Cinematography and Editing},
  editor = {Christie, Marc and Wu, Hui-Yin and Li, Tsai-Yen and Gandhi, Vineet},
  title = {{How Good is Good Enough? The Challenge of Evaluating Subjective Quality of AI-Edited Video Coverage of Live Events}},
  author = {Radut, Miruna and Evans, Michael and To, Kristie and Nooney, Tamsin and Phillipson, Graeme},
  year = {2020},
  publisher = {The Eurographics Association},
  ISSN = {2411-9733},
  ISBN = {978-3-03868-127-4},
  DOI = {10.2312/wiced.20201127}
}
@inproceedings{10.2312:wiced.20201129,
  booktitle = {Workshop on Intelligent Cinematography and Editing},
  editor = {Christie, Marc and Wu, Hui-Yin and Li, Tsai-Yen and Gandhi, Vineet},
  title = {{CineFilter: Unsupervised Filtering for Real Time Autonomous Camera Systems}},
  author = {Achary, Sudheer and Moorthy, K. L. Bhanu and Javed, Ashar and Shravan, Nikitha and Gandhi, Vineet and Namboodiri, Anoop M.},
  year = {2020},
  publisher = {The Eurographics Association},
  ISSN = {2411-9733},
  ISBN = {978-3-03868-127-4},
  DOI = {10.2312/wiced.20201129}
}
@inproceedings{10.2312:wiced.20201128,
  booktitle = {Workshop on Intelligent Cinematography and Editing},
  editor = {Christie, Marc and Wu, Hui-Yin and Li, Tsai-Yen and Gandhi, Vineet},
  title = {{Exploring the Impact of 360º Movie Cuts in Users' Attention}},
  author = {Marañes, Carlos and Gutierrez, Diego and Serrano, Ana},
  year = {2020},
  publisher = {The Eurographics Association},
  ISSN = {2411-9733},
  ISBN = {978-3-03868-127-4},
  DOI = {10.2312/wiced.20201128}
}
@inproceedings{10.2312:wiced.20201131,
  booktitle = {Workshop on Intelligent Cinematography and Editing},
  editor = {Christie, Marc and Wu, Hui-Yin and Li, Tsai-Yen and Gandhi, Vineet},
  title = {{Joint Attention for Automated Video Editing}},
  author = {Wu, Hui-Yin and Santarra, Trevor and Leece, Michael and Vargas, Rolando and Jhala, Arnav},
  year = {2020},
  publisher = {The Eurographics Association},
  ISSN = {2411-9733},
  ISBN = {978-3-03868-127-4},
  DOI = {10.2312/wiced.20201131}
}

Recent Submissions

  • Item
    WICED 2020: Frontmatter
    (The Eurographics Association, 2020) Christie, Marc; Wu, Hui-Yin; Li, Tsai-Yen; Gandhi, Vineet
  • Item
    Designing an Adaptive Assisting Interface for Learning Virtual Filmmaking
    (The Eurographics Association, 2020) Wu, Qiu-Jie; Kuo, Chih-Hsuan; Wu, Hui-Yin; Li, Tsai-Yen
    In this paper, we present an adaptive assisting interface for learning virtual filmmaking. The design of the system is based on scaffolding theory: it provides timely guidance to the user in the form of visual and audio messages that are adapted to each person's skill level and performance. The system was developed on an existing virtual filmmaking setup. We conducted a study with 24 participants, who were asked to operate the film set with or without our adaptive assisting interface. Results suggest that our system can provide users with a better learning experience and greater knowledge gains.
  • Item
    User-Adaptive Rotational Snap-Cutting for Streamed 360º Videos
    (The Eurographics Association, 2020) Rondón, Miguel Fabian Romero; Sassatelli, Lucile; Pardo, Ramon Aparicio; Precioso, Frédéric
    Designing editing cuts for cinematic Virtual Reality (VR) has been under active investigation. Recently, a connection has been made between cuts in VR and adaptive streaming logics for 360º videos, with the introduction of rotational snap-cuts. Snap-cuts can benefit the user's experience both by improving the streamed quality in the field of view (FoV) and by ensuring the user sees elements important for the plot. However, snap-cuts should not be too frequent and may be avoided when they do not benefit the streamed quality. We formulate the dynamic decision problem of snap-change triggering as a model-free Reinforcement Learning problem. We compute the optimal cut-triggering decisions offline with dynamic programming and investigate the possible gains in quality of experience compared with baselines. We design Imitation Learning-based dynamic triggering strategies and show that, knowing only the user's past motion and the video content, it is possible to outperform both control conditions (no cuts and all cuts).
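    The offline optimum mentioned above can be illustrated with a small dynamic program. The sketch below is not the authors' formulation: it assumes hypothetical per-segment quality-of-experience values for triggering or skipping a snap-cut, plus a minimum gap between cuts, and searches for the best triggering schedule.

# Toy dynamic program for snap-cut triggering under a minimum-gap constraint.
# qoe_no_cut[t] / qoe_cut[t] are hypothetical per-segment quality-of-experience
# estimates (the real system would derive these from streamed quality and the
# user's predicted motion); min_gap keeps cuts from being too frequent.

def optimal_cut_schedule(qoe_no_cut, qoe_cut, min_gap=2):
    n = len(qoe_no_cut)
    # value[t][g]: best total QoE from segment t onward, where g is the number
    # of segments since the last cut (capped at min_gap).
    value = [[0.0] * (min_gap + 1) for _ in range(n + 1)]
    action = [[False] * (min_gap + 1) for _ in range(n)]
    for t in range(n - 1, -1, -1):
        for g in range(min_gap + 1):
            best = qoe_no_cut[t] + value[t + 1][min(g + 1, min_gap)]
            cut_here = False
            if g == min_gap:  # a new cut is only allowed after min_gap segments
                cut = qoe_cut[t] + value[t + 1][0]
                if cut > best:
                    best, cut_here = cut, True
            value[t][g], action[t][g] = best, cut_here
    # Recover the cut schedule from the stored decisions.
    schedule, g = [], min_gap
    for t in range(n):
        cut = action[t][g]
        schedule.append(cut)
        g = 0 if cut else min(g + 1, min_gap)
    return value[0][min_gap], schedule

# e.g. four video segments: cut only where the quality gain justifies it
total_qoe, cuts = optimal_cut_schedule([0.6, 0.4, 0.7, 0.3], [0.9, 0.8, 0.9, 0.4])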
  • Item
    GAZED - Gaze-guided Cinematic Editing of Wide-Angle Monocular Video Recordings
    (The Eurographics Association, 2020) Moorthy, K. L. Bhanu; Kumar, Moneish; Subramanian, Ramanathan; Gandhi, Vineet
    We present GAZED, eye GAZe-guided EDiting for videos captured by a solitary, static, wide-angle and high-resolution camera. Eye gaze has been effectively employed in computational applications as a cue to capture interesting scene content; we employ gaze as a proxy to select shots for inclusion in the edited video. Given the original video, scene content and user eye-gaze tracks are combined to generate an edited video comprising cinematically valid actor shots and shot transitions, yielding an aesthetic and vivid representation of the original narrative. We model cinematic video editing as an energy minimization problem over shot selection, whose constraints capture cinematographic editing conventions. Gazed scene locations primarily determine the shots constituting the edited video. The effectiveness of GAZED against multiple competing methods is demonstrated via a psychophysical study involving 12 users and 12 performance videos.
    Professional video recordings of stage performances are typically created by employing skilled camera operators, who record the performance from multiple viewpoints. These multi-camera feeds, termed rushes, are then edited together to portray an eloquent story intended to maximize viewer engagement. Generating professional edits of stage performances is both difficult and challenging. Firstly, maneuvering cameras during a live performance is difficult even for experts, as there is no option of a retake upon error, and camera viewpoints are limited because the use of large supporting equipment (trolley, crane, etc.) is infeasible. Secondly, manual video editing is an extremely slow and tedious process that leverages the experience of skilled editors. Overall, the need for (i) a professional camera crew, (ii) multiple cameras and supporting equipment, and (iii) expert editors escalates the process complexity and costs. Consequently, most production houses employ a large field-of-view static camera, placed far enough away to capture the entire stage. This approach is widespread as it is simple to implement and captures the entire scene. Such static visualizations are apt for archival purposes; however, they are often unsuccessful at captivating attention when presented to the target audience. While conveying the overall context, the distant camera feed fails to bring out vivid scene details like close-up faces, character emotions and actions, and ensuing interactions, which are critical for cinematic storytelling.
    GAZED denotes an end-to-end pipeline to generate an aesthetically edited video from a single static, wide-angle stage recording. It is inspired by prior work [GRG14], which describes how a plural camera crew can be replaced by a single high-resolution static camera, with multiple virtual camera shots or rushes generated by simulating several virtual pan/tilt/zoom cameras that focus on actors and actions within the original recording. In this work, we demonstrate that the multiple rushes can be automatically edited by leveraging user eye-gaze information, by modeling (virtual) shot selection as a discrete optimization problem. Eye gaze represents an inherent guiding factor for video editing, as eyes are sensitive to interesting scene events [RKH*09, SSSM14] that need to be vividly presented in the edited video. The objective critical for video editing, and the key contribution of our work, is to decide which shot (or rush) should be selected for each frame of the edited video.
    The shot selection problem is modeled as an optimization that incorporates gaze information along with other cost terms that model cinematic editing principles. Gazed scene locations are used to define gaze potentials, which measure the importance of the different shots to choose from. Gaze potentials are then combined with other terms that model cinematic principles such as avoiding jump cuts (which produce jarring shot transitions), rhythm (the pace of shot transitions), and avoiding transient shots. The optimization is solved using dynamic programming. See [MKSG20] for the detailed full article.
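    The shot-selection optimization described above can be illustrated, in simplified form, as a Viterbi-style dynamic program: pick one rush per frame so that a (here hypothetical) gaze potential is maximized while a single switching penalty stands in for the cinematic transition costs. This is a sketch of the general technique, not the GAZED implementation.

# gaze_potential[t][s]: hypothetical importance of shot s at frame t (e.g. how much
# recorded gaze falls inside that virtual camera's crop); switch_cost is a single
# stand-in for the paper's transition terms (jump cuts, rhythm, transient shots).

def select_shots(gaze_potential, switch_cost=0.3):
    n_frames, n_shots = len(gaze_potential), len(gaze_potential[0])
    score = [list(gaze_potential[0])]   # best cumulative score ending in each shot
    back = []                           # backpointers for path recovery
    for t in range(1, n_frames):
        row, ptr = [], []
        for s in range(n_shots):
            # best previous shot, paying switch_cost whenever the shot changes
            cands = [score[-1][p] - (switch_cost if p != s else 0.0)
                     for p in range(n_shots)]
            best_prev = max(range(n_shots), key=lambda p: cands[p])
            row.append(cands[best_prev] + gaze_potential[t][s])
            ptr.append(best_prev)
        score.append(row)
        back.append(ptr)
    # Trace back the best shot index per frame.
    path = [max(range(n_shots), key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# e.g. three frames, two candidate shots: the edit follows where the gaze mass is
print(select_shots([[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]))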
  • Item
    How Good is Good Enough? The Challenge of Evaluating Subjective Quality of AI-Edited Video Coverage of Live Events
    (The Eurographics Association, 2020) Radut, Miruna; Evans, Michael; To, Kristie; Nooney, Tamsin; Phillipson, Graeme
    This paper reports on recent and ongoing work to develop empirical methods for assessing the subjective quality of artificial intelligence (AI)-produced multicamera video. We have developed a prototype software system that produces coverage of panel performances, using a variety of didactic and machine learning techniques to intelligently crop and cut between feeds from an array of static, unmanned cameras. Evaluating the subjective quality rendered by the software's decisions regarding when and to what to cut represents an important and interesting challenge, due to the technical behaviour of the system, the large number of potential quality risks, and the need to mitigate the effects of content specificity.
  • Item
    CineFilter: Unsupervised Filtering for Real Time Autonomous Camera Systems
    (The Eurographics Association, 2020) Achary, Sudheer; Moorthy, K. L. Bhanu; Javed, Ashar; Shravan, Nikitha; Gandhi, Vineet; Namboodiri, Anoop M.
    Autonomous camera systems are often subjected to an optimization operation to smooth and stabilize rough trajectory estimates. Most common filtering techniques do reduce the irregularities in the data; however, they fail to mimic the behavior of a human cameraman. Global filtering methods modeling human camera operators have been successful; however, they are limited to offline settings. In this paper, we propose two online filtering methods, called CineFilters, which produce smooth camera trajectories motivated by cinematographic principles. The first filter (CineConvex) uses a sliding window-based convex optimization formulation, and the second (CineCNN) is a CNN-based encoder-decoder model. We evaluate the proposed filters in two different settings, namely a basketball dataset and a stage performance dataset. Our models outperform previous methods and baselines on quantitative metrics. The CineConvex and CineCNN filters operate at about 250 fps and 1000 fps, respectively, with a minor latency (half a second), making them apt for a variety of real-time applications.
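    The sliding-window convex idea behind CineConvex can be illustrated with a toy one-dimensional smoother: at every frame, refit a least-squares trade-off between data fidelity and second-difference smoothness over the last few raw trajectory samples and commit only the newest value. The window size and weight below are arbitrary, and this is not the authors' objective.

import numpy as np

def sliding_window_smooth(raw, window=30, lam=50.0):
    """Online-style smoothing of a 1D camera trajectory: for each new frame, solve
    min_x ||x - y||^2 + lam * ||D2 x||^2 over the last `window` raw samples y
    (D2 = second differences), a convex problem with a closed-form solution."""
    raw = np.asarray(raw, dtype=float)
    out = []
    for t in range(len(raw)):
        y = raw[max(0, t - window + 1): t + 1]
        m = len(y)
        if m < 3:                      # not enough samples to define D2 yet
            out.append(y[-1])
            continue
        D2 = np.zeros((m - 2, m))      # second-difference operator
        for i in range(m - 2):
            D2[i, i], D2[i, i + 1], D2[i, i + 2] = 1.0, -2.0, 1.0
        A = np.eye(m) + lam * (D2.T @ D2)
        x = np.linalg.solve(A, y)
        out.append(x[-1])              # commit only the newest smoothed frame
    return np.array(out)

# e.g. smooth a jittery per-frame pan estimate (in pixels) from a tracker
smoothed = sliding_window_smooth(np.cumsum(np.random.randn(200)))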
  • Item
    Exploring the Impact of 360º Movie Cuts in Users' Attention
    (The Eurographics Association, 2020) Marañes, Carlos; Gutierrez, Diego; Serrano, Ana
    Virtual Reality (VR) has become more relevant since the first devices for personal use became available on the market. New content has emerged for this medium with different purposes, such as education, training, and entertainment. However, the production workflow of cinematic VR content is still in an experimental phase. The main reason is that there is controversy among content creators about how to tell a story effectively. Unlike traditional filmmaking, which has been in development for more than 100 years, movie editing in VR has brought new challenges to be addressed. Viewers now have partial control of the camera and can watch every degree of the 360º that surrounds them, with the possibility of losing important aspects of the scene that are key to understanding the narrative of the movie. Directors can decide how to edit the film by combining the different shots. Nevertheless, depending on the scene before and after the cut, viewers' behavior may be influenced. To address this issue, we analyze users' behavior through cuts in a professional movie, where the narrative plays an important role, and derive new insights that could potentially influence VR content creation, informing content creators about the impact of different cuts on viewers' behavior.
  • Item
    Joint Attention for Automated Video Editing
    (The Eurographics Association, 2020) Wu, Hui-Yin; Santarra, Trevor; Leece, Michael; Vargas, Rolando; Jhala, Arnav
    Joint attention refers to the shared focal points of attention for occupants in a space. In this work, we introduce a computational definition of joint attention for the automated editing of meetings in multi-camera environments from the AMI corpus. Using extracted head pose and individual headset amplitude as features, we developed three editing methods: (1) a naive audio-based method that selects the camera using only the headset input, (2) a rule-based edit that selects cameras at a fixed pacing using pose data, and (3) an editing algorithm that uses joint attention learned with an LSTM (long short-term memory) network from both pose and audio data, trained on expert edits. The methods are evaluated qualitatively against the human edit, and quantitatively in a user study with 22 participants. Results indicate that the LSTM-trained joint attention produces edits that are comparable to the expert edit, offering a wider range of camera views than the audio-based method, while being more generalizable than the rule-based method.
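    The naive audio-based edit in method (1) can be sketched as follows, assuming one close-up camera and one headset amplitude track per participant, sampled per frame; a minimum shot length keeps the edit from cutting on every brief utterance. Names and thresholds are illustrative, not taken from the paper.

# amplitudes[t][k]: headset audio amplitude of participant k at frame t.
# Cut to the loudest participant's camera, but hold each shot for at least
# min_shot frames so the edit does not bounce between speakers.

def audio_based_edit(amplitudes, min_shot=25):
    chosen = []                # selected camera index per frame
    current, held = None, 0
    for frame in amplitudes:
        loudest = max(range(len(frame)), key=lambda k: frame[k])
        if current is None or (loudest != current and held >= min_shot):
            current, held = loudest, 0
        chosen.append(current)
        held += 1
    return chosen

# e.g. a 4-person meeting at 25 fps: switch to a new speaker at most once per second
edit = audio_based_edit([[0.1, 0.8, 0.0, 0.2], [0.2, 0.7, 0.1, 0.1]], min_shot=25)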