

FPVSum Dataset

This repository provides the First-Person Video Summarization (FPVSum) dataset proposed in our ECCV 2018 paper "Summarizing First-Person Videos from Third Persons' Points of Views."

Description

FPVSum is a benchmark for evaluating video summarization algorithms designed specifically for first-person videos. It contains 14 categories of first-person videos captured by head- or body-mounted devices, together with their corresponding human-annotated frame-level importance scores, which are used for training and evaluation in our paper.

File description

The FPVSum files are organized as listed below. We provide video lists and scripts for obtaining the selected first-person videos from YouTube. In addition, both the frame-level importance scores and the user interface used for annotation are provided for public use.

fpvsum/
 ⊢ crawler/        # scripts to obtain video data
 ⊢ annotation/     # ground-truth scores of videos
 ⊢ UI/             # user interface for labeling

Videos

FPVSum consists of 98 first-person videos from YouTube, grouped into 14 categories. While collecting the videos, we found that many first-person videos on YouTube are not raw footage but edited versions containing obvious frame discontinuities, point-of-view transitions, and unrelated content. We therefore carefully selected only continuous first-person videos (i.e., with no transitions within or between points of view) and excluded those with unrelated content.

How to get the videos

Prerequisites

Run the scripts
To download the videos in the list, run the following commands in the crawler folder:

mkdir video_dir

chmod +x fetch_fpvsum_videos.sh

./fetch_fpvsum_videos.sh video_dir FPVSum_videolist.csv

Ground-Truth Scores

We invited more than 15 volunteers to annotate the videos. For each video, annotators were asked to use our annotation interface to produce a summary containing its most important content and highlight segments. We observed that most annotators lose concentration when assigning scores to very long videos. Thus, for each video category we selected about 35% of the videos to be annotated with ground-truth scores, and the remaining videos are treated as unlabeled.
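The per-category labeled/unlabeled split described above can be sketched as follows. This is a hypothetical illustration only: the README does not say how the roughly 35% of videos were chosen, so random selection with a fixed seed is an assumption, and the input format (`videos_by_category`, a mapping from category name to a list of video IDs) and the function name `split_labeled_unlabeled` are made up for this example.

```python
import random
from typing import Dict, List, Tuple


def split_labeled_unlabeled(
    videos_by_category: Dict[str, List[str]],
    labeled_fraction: float = 0.35,
    seed: int = 0,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split each category's videos into a labeled and an unlabeled subset.

    Selection is random (seeded for reproducibility); this is an assumption,
    since the README does not specify how the ~35% were selected.
    """
    rng = random.Random(seed)
    labeled: Dict[str, List[str]] = {}
    unlabeled: Dict[str, List[str]] = {}
    for category, videos in videos_by_category.items():
        shuffled = list(videos)
        rng.shuffle(shuffled)
        # Round to the nearest whole video, keeping at least one labeled
        # video whenever the category is non-empty.
        k = max(1, round(labeled_fraction * len(shuffled))) if shuffled else 0
        labeled[category] = sorted(shuffled[:k])
        unlabeled[category] = sorted(shuffled[k:])
    return labeled, unlabeled
```

Every video ends up in exactly one of the two subsets, so the unlabeled videos remain available for training setups that exploit unlabeled data.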

The details of our annotation process are as follows:

  • Annotators are required to select highlight/non-highlight segments in each video. They first watch each video once in full, and then start the labeling process.
  • Annotators are asked to select the parts of the video they consider interesting or important (i.e., mark those parts in red using the interface). A marked part may be of any length.
  • Annotators are encouraged to produce a summary covering 10% to 20% of the full video length.
  • Each frame receives an importance score equal to the number of annotators who marked it. Finally, the frames ranked in the top 15% of all video frames are selected as highlights.
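The scoring rule above can be sketched in Python as follows. This is a minimal illustration, not the code used to build the released annotations: it assumes each annotator's red-marked parts are stored as hypothetical `(start_frame, end_frame)` pairs with an exclusive end, and the helper names (`segments_to_mask`, `summary_ratio`, `importance_scores`, `top_fraction_highlights`) are made up for this example. How ties between equally scored frames were broken is not stated, so breaking them by frame index is also an assumption.

```python
from typing import List, Sequence, Tuple

# Hypothetical segment format: (start_frame, end_frame), end exclusive.
Segment = Tuple[int, int]


def segments_to_mask(segments: Sequence[Segment], num_frames: int) -> List[int]:
    """Turn one annotator's marked segments into a 0/1 per-frame mask."""
    mask = [0] * num_frames
    for start, end in segments:
        for frame in range(max(0, start), min(end, num_frames)):
            mask[frame] = 1
    return mask


def summary_ratio(mask: Sequence[int]) -> float:
    """Fraction of the video this annotator marked (guideline: 0.10 to 0.20)."""
    return sum(mask) / len(mask)


def importance_scores(masks: Sequence[Sequence[int]]) -> List[int]:
    """Per-frame importance = number of annotators who marked that frame."""
    return [sum(column) for column in zip(*masks)]


def top_fraction_highlights(scores: Sequence[int], fraction: float = 0.15) -> List[bool]:
    """Label the top `fraction` of frames, ranked by score, as highlights.

    Ties are broken by frame index (earlier frames first), an assumption.
    """
    k = int(round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda f: (-scores[f], f))
    chosen = set(ranked[:k])
    return [frame in chosen for frame in range(len(scores))]
```

For example, in a 20-frame video where three annotators mark frames 2 to 4, frames 3 to 5, and frames 4 to 5 plus 10 to 11, frame 4 gets a score of 3 and frames 3 and 5 get a score of 2, so those three frames form the top 15% (3 of 20 frames).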

User Interface for Human Annotation

We provide a user interface (written in MATLAB) for labeling video sequences. The interface plays each video without its audio track, so that annotators select highlights based on visual content only. Annotators can use the interface to move forward and backward through the video and modify their annotations at any time.

References

Hsuan-I Ho, Wei-Chen Chiu, and Yu-Chiang Frank Wang: Summarizing First-Person Videos from Third Persons’ Points of Views. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)
