Three-Dimensional Scene Estimation from Monocular Videos with Applications in Video Analysis

Title: Three-Dimensional Scene Estimation from Monocular Videos with Applications in Video Analysis.
Name(s): Donate, Arturo, author
Liu, Xiuwen, professor directing thesis
Patrangenaru, Victor, university representative
Kumar, Piyush, committee member
Tyson, Gary, committee member
Department of Computer Science, degree granting department
Florida State University, degree granting institution
Type of Resource: text
Genre: Text
Issuance: monographic
Date Issued: 2011
Publisher: Florida State University
Place of Publication: Tallahassee, Florida
Physical Form: computer
online resource
Extent: 1 online resource
Language(s): English
Abstract/Description: This research explores the idea of extracting three-dimensional features from video clips in order to aid various video analysis and mining tasks. Although video analysis problems are well established in the literature, the use of three-dimensional information in them is scarce due to the inherent difficulties of building such a system. When the only input to the system is a video stream with no previous knowledge of the scene or camera (a typical scenario in video analysis), extracting meaningful and accurate 3D representations becomes a very difficult task. However, several recently proposed methods have shown some progress towards this goal by applying techniques from related areas, including simultaneous localization and mapping (SLAM), structure from motion (SFM), and 3D reconstruction. In this research, I present two main contributions towards solving this problem. First, I propose a method capable of generating a three-dimensional representation of a scene as observed by a monocular video, using no previous information. The method exploits the movement of the camera while robustly tracking features over time in order to obtain multiple views of a scene and perform 3D reconstruction. This system performs automatic camera calibration, estimates the three-dimensional structure of the scene, and tracks the scene across time while refining its results as new frames are obtained. Additionally, the system can track a scene even in the presence of moving people, a situation that limits most SLAM and SFM approaches available in the literature. Second, I present a method for extracting the three-dimensional pose and motion of a person in a video. The method extends previously published work on two-dimensional human pose estimation by incorporating a human motion model and expands the two-dimensional pose into three dimensions using several heuristics.
Together, these methods yield an intrinsic 3D representation of the static background and the people in a scene, which can be used to solve various video analysis tasks. To demonstrate the feasibility of my proposed method, I show how it can be used to solve a selection of such tasks. First, I show how a three-dimensional point cloud of the scene can be used along with robust feature tracking to detect shot boundaries in the video. Next, I present an automatic approach to stereoscopic video conversion using no prior knowledge of the input video. Finally, I illustrate how a three-dimensional human model can be combined with simple linear classifiers to perform human action recognition with strong classification results.
Identifier: FSU_migr_etd-0714 (IID)
Submitted Note: A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Degree Awarded: Summer Semester, 2011.
Date of Defense: June 24, 2011.
Keywords: 3D, structure from motion, video analysis, shot boundary detection, stereoscopic, activity recognition, image processing
Bibliography Note: Includes bibliographical references.
Advisory Committee: Xiuwen Liu, Professor Directing Thesis; Victor Patrangenaru, University Representative; Piyush Kumar, Committee Member; Gary Tyson, Committee Member.
Subject(s): Computer science
Persistent Link to This Record: http://purl.flvc.org/fsu/fd/FSU_migr_etd-0714
Owner Institution: FSU

Donate, A. (2011). Three-Dimensional Scene Estimation from Monocular Videos with Applications in Video Analysis. Retrieved from http://purl.flvc.org/fsu/fd/FSU_migr_etd-0714