2D Skeleton-Based Action Recognition via Two-Branch Stacked LSTM-RNNs

01 Journal publication
Avola Danilo, Cascio Marco, Cinque Luigi, Foresti Gian Luca, Massaroni Cristiano, Rodolà Emanuele
ISSN: 1520-9210

Action recognition in video sequences is an interesting field for many computer vision applications, including behaviour analysis, event recognition, and video surveillance. In this work, a method based on 2D skeletons and two-branch stacked Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) cells is proposed. Unlike 3D skeletons, usually generated by RGB-D cameras, the 2D skeletons adopted in this work are reconstructed starting from RGB video streams, therefore allowing the use of the proposed approach in both indoor and outdoor environments. Moreover, any case of missing skeletal data is managed by exploiting 3D-Convolutional Neural Networks (3D-CNNs). Comparative experiments with several key works on the KTH and Weizmann datasets show that the method described in this paper outperforms the current state-of-the-art. Additional experiments on the UCF Sports and IXMAS datasets demonstrate the effectiveness of our method in the presence of noisy data and perspective changes, respectively. Further investigations on UCF Sports, HMDB51, UCF101, and Kinetics400 highlight how the combination between the proposed two-branch stacked LSTM and the 3D-CNN-based network can manage missing skeleton information, greatly improving the overall accuracy. Moreover, additional tests on the KTH and UCF Sports datasets also show the robustness of our approach in the presence of partial body occlusions. Finally, comparisons on the UT-Kinect and NTU-RGB+D datasets show that the accuracy of the proposed method is fully comparable to that of works based on 3D skeletons.
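As a rough illustration of the kind of architecture the abstract describes, and not the authors' exact implementation, the sketch below builds a two-branch network in PyTorch: each branch is a stacked LSTM over a different view of the 2D skeleton sequence (here, raw joint coordinates and frame-to-frame joint motion), and the two branch outputs are fused for action classification. All layer sizes, the choice of input features, and the fusion strategy are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

class TwoBranchStackedLSTM(nn.Module):
    """Illustrative two-branch stacked LSTM over 2D skeleton sequences.

    Branch A processes raw joint coordinates per frame; branch B processes
    frame-to-frame joint displacements (motion). Layer sizes and the fusion
    scheme are assumptions, not the paper's configuration.
    """

    def __init__(self, num_joints=18, hidden_size=128, num_layers=3, num_classes=6):
        super().__init__()
        in_features = num_joints * 2          # (x, y) coordinates per joint
        self.pos_branch = nn.LSTM(in_features, hidden_size,
                                  num_layers=num_layers, batch_first=True)
        self.mot_branch = nn.LSTM(in_features, hidden_size,
                                  num_layers=num_layers, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, skel):                  # skel: (batch, frames, num_joints * 2)
        motion = skel[:, 1:] - skel[:, :-1]   # joint displacements between frames
        _, (h_pos, _) = self.pos_branch(skel)
        _, (h_mot, _) = self.mot_branch(motion)
        fused = torch.cat([h_pos[-1], h_mot[-1]], dim=1)  # last-layer hidden states
        return self.classifier(fused)


# Usage example: a batch of 4 clips, 30 frames each, 18 joints with (x, y) coords.
model = TwoBranchStackedLSTM()
clips = torch.randn(4, 30, 18 * 2)
logits = model(clips)                          # -> (4, num_classes)
```

In the paper's pipeline, a 3D-CNN-based network is used when skeletal data are missing; how the two predictions are combined is not specified in the abstract, so that step is omitted here.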
