Learning Latent Temporal Manifolds for Recognition and Prediction of Multiple Actions in Streaming Videos using Deep Networks

Abstract Details

2015, Doctor of Philosophy (Ph.D.), University of Dayton, Electrical Engineering.
Recognizing multiple types of actions appearing in a continuous temporal order in a streaming video is key to many applications, ranging from real-time surveillance to egocentric motion analysis for human-computer interaction. Current state-of-the-art algorithms focus either on holistic video representation or on finding a specific activity in video sequences. Their major drawback is that they are suited only to applications such as unconstrained video search on the web, and they require the complete sequence before reporting which actions are present. In this dissertation, we propose an algorithm that detects and recognizes multiple actions in a streaming sequence at every instant. The approach recognizes the type of action being performed and also provides a percentage of completion of that action at every instant in real time. The system is invariant to the number of frames and to the speed at which the action is performed. Beyond these benefits, the proposed model can also predict the motion descriptors at future instants corresponding to the action present. Since human motion is inherently continuous, the algorithm presented in this dissertation computes novel motion descriptors based on the dense optical flow at every instant and evaluates their variation along the temporal domain using deep learning techniques. For each action type, we compute a non-linear transformation from the motion descriptor space into a latent temporal space using stacked autoencoders, where this transformation is learned from that action's training patterns. The latent features thus obtained form a temporal manifold, and transitions along it are modeled using Conditional Restricted Boltzmann Machines (CRBMs). Using these trained autoencoders and CRBMs for every action type, we can, at each instant, make an inference into multiple latent temporal action manifolds from a set of streaming input frames.
Our model achieved a high per-frame recognition accuracy of 93% and predicted future action instances with 84% accuracy on the KTH dataset. It was also tested on the UCF Sports dataset, achieving 84% per-frame recognition accuracy and around 69% predictive accuracy. We therefore believe the proposed model can benefit applications in human-computer interaction, gaming, and IP surveillance, where action classification using temporal manifolds and its predictive capability are crucial.
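The per-action-type pipeline described in the abstract (per-frame motion descriptor, stacked-autoencoder encoding onto a latent temporal manifold, and a CRBM modeling transitions along it) can be sketched as follows. This is a minimal illustration and not the dissertation's implementation: the weights are random placeholders standing in for trained parameters, and the descriptor dimension, layer sizes, and CRBM temporal order are assumed values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StackedAutoencoder:
    """Encoder half of a stacked autoencoder: maps a motion descriptor
    to latent temporal-manifold coordinates. Weights are random
    placeholders standing in for layers trained on one action type."""
    def __init__(self, layer_sizes):
        self.weights = [rng.standard_normal((m, n)) * 0.1
                        for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.biases = [np.zeros(n) for n in layer_sizes[1:]]

    def encode(self, x):
        for W, b in zip(self.weights, self.biases):
            x = sigmoid(x @ W + b)
        return x

class CRBM:
    """Conditional RBM sketch: predicts the next latent vector from a
    short history of past latent vectors via one mean-field pass."""
    def __init__(self, n_visible, n_hidden, order):
        self.W = rng.standard_normal((n_visible, n_hidden)) * 0.1   # visible-hidden
        self.A = rng.standard_normal((order * n_visible, n_visible)) * 0.1  # autoregressive
        self.B = rng.standard_normal((order * n_visible, n_hidden)) * 0.1   # past -> hidden
        self.order = order

    def predict_next(self, history):
        past = np.concatenate(history[-self.order:])   # stack the last `order` latents
        h = sigmoid(past @ self.B)                     # mean-field hidden activations
        return sigmoid(past @ self.A + h @ self.W.T)   # predicted next latent vector

# One (autoencoder, CRBM) pair would exist per action type; here, one pair.
descriptor_dim, latent_dim, order = 64, 16, 3          # assumed sizes for illustration
ae = StackedAutoencoder([descriptor_dim, 32, latent_dim])
crbm = CRBM(latent_dim, n_hidden=24, order=order)

stream = rng.standard_normal((10, descriptor_dim))     # toy per-frame motion descriptors
latents = [ae.encode(frame) for frame in stream]       # points on the latent manifold
next_latent = crbm.predict_next(latents)               # predicted future latent point
print(next_latent.shape)  # (16,)
```

At inference time, each action type's (autoencoder, CRBM) pair would score the incoming frames, and the per-frame label would follow from whichever manifold best explains the observed latent trajectory.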
Kimberly Kendricks (Committee Member)
Keigo Hirakawa (Committee Member)
Raul Ordonez (Committee Member)
Vijayan Asari (Committee Chair)
162 p.

Recommended Citations

Citations

  • Nair, B. M. (2015). Learning Latent Temporal Manifolds for Recognition and Prediction of Multiple Actions in Streaming Videos using Deep Networks [Doctoral dissertation, University of Dayton]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1429532297

    APA Style (7th edition)

  • Nair, Binu. Learning Latent Temporal Manifolds for Recognition and Prediction of Multiple Actions in Streaming Videos using Deep Networks. 2015. University of Dayton, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=dayton1429532297.

    MLA Style (8th edition)

  • Nair, Binu. "Learning Latent Temporal Manifolds for Recognition and Prediction of Multiple Actions in Streaming Videos using Deep Networks." Doctoral dissertation, University of Dayton, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1429532297

    Chicago Manual of Style (17th edition)