Multimodal Convolutional Neural Networks for Human Activity Recognition

Gadzicki, Konrad; Khamsehashari, Razieh; Zetzsche, Christoph

Abstract:

We investigated multimodal fusion with convolutional neural networks (CNNs) for activity recognition. Of the many possible modalities, we focused on RGB video, optical flow video, and skeleton data. Our work here uses the “NTU RGB+D” dataset, in preparation for a later application to a large-scale database (project “EASE”). We implemented a late fusion approach by combining the output layers of state-of-the-art CNNs. In addition to the fused CNN architecture, we investigated the performance of the individual CNNs in unimodal mode and improved on the skeleton classification performance reported in the literature for this dataset.
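
To illustrate what combining the output layers of several unimodal networks can look like, here is a minimal sketch of late fusion. It is not the authors' implementation: the abstract does not say which combination rule was used, so the weighted average of softmax outputs below is an assumption, and the function names (`softmax`, `late_fusion`) and all score values are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(logits_per_modality, weights=None):
    """Fuse per-modality class scores by a weighted average of softmax outputs.

    Assumption: averaging is one common late fusion rule; the abstract does
    not state which rule the paper uses.
    """
    names = list(logits_per_modality)
    if weights is None:
        # Default to equal weight for every modality.
        weights = {name: 1.0 / len(names) for name in names}
    num_classes = len(logits_per_modality[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(logits_per_modality[name])
        for c in range(num_classes):
            fused[c] += weights[name] * probs[c]
    return fused


# Hypothetical output-layer scores for 3 activity classes from three unimodal CNNs.
scores = {
    "rgb": [2.0, 1.0, 0.1],
    "optical_flow": [0.5, 2.5, 0.3],
    "skeleton": [0.2, 3.0, 0.1],
}
fused = late_fusion(scores)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

With these made-up scores, the RGB stream on its own would pick class 0, while the fused prediction is class 1 because the other two modalities agree on it. Each network is trained and evaluated independently, and only the final class scores are merged, which is what distinguishes late fusion from merging intermediate feature maps.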

PDF URL: https://seafile.zfn.uni-bremen.de/d/f4c2feaf3c3640baa987/files/?p=/submitted_papers/Gadzicki-Khamsehashari-Zetzsche-2018-IROS-BDSR_camera-ready_2018-09-20.pdf

Reference:

Multimodal Convolutional Neural Networks for Human Activity Recognition (Konrad Gadzicki, Razieh Khamsehashari, Christoph Zetzsche), In IROS 2018: Workshop on Latest Advances in Big Activity Data Sources for Robotics & New Challenges, 2018.

Bibtex Entry:

@INPROCEEDINGS{gadzicki2018iros,
     author = {Gadzicki, Konrad and Khamsehashari, Razieh and Zetzsche, Christoph},
   keywords = {ownpub, EASE-H3},
      title = {Multimodal Convolutional Neural Networks for Human Activity Recognition},
  booktitle = {IROS 2018: Workshop on Latest Advances in Big Activity Data Sources for Robotics \& New Challenges},
       year = {2018},
   location = {Madrid},
        url = {https://seafile.zfn.uni-bremen.de/d/f4c2feaf3c3640baa987/files/?p=/submitted_papers/Gadzicki-Khamsehashari-Zetzsche-2018-IROS-BDSR_camera-ready_2018-09-20.pdf},
   abstract = {We investigated multimodal fusion with convolutional
neural networks (CNNs) for activity recognition. Of the many
possible modalities, we focused on RGB video, optical flow
video, and skeleton data. Our work here uses the “NTU RGB+D”
dataset, in preparation for a later application to a
large-scale database (project “EASE”). We implemented a late
fusion approach by combining the output layers of
state-of-the-art CNNs. In addition to the fused CNN
architecture, we investigated the performance of the
individual CNNs in unimodal mode and improved on the skeleton
classification performance reported in the literature for
this dataset.},
}