This project forked from gtatiya/deep-multi-sensory-object-categorization

Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration

Home Page: https://www.eecs.tufts.edu/~gtatiya/

Abstract:

When identifying an object and its properties, humans use features from multiple sensory modalities produced when manipulating the object. Motivated by this cognitive process, we propose a deep learning methodology for object category recognition which uses visual, auditory, and haptic sensory data coupled with exploratory behaviors (e.g., grasping, lifting, pushing). In our method, as the robot performs an action on an object, it uses a Tensor-Train Gated Recurrent Unit (TT-GRU) network to process its visual data, and Convolutional Neural Networks to process haptic and auditory data. We propose a novel strategy to train a single neural network that takes video, audio, and haptic data as input, and demonstrate that it performs better than separate neural networks for each sensory modality. The proposed method was evaluated on a dataset in which the robot explored 100 different objects, each belonging to one of 20 categories. While visual information was the dominant modality for most categories, adding the haptic and auditory networks further improves the robot's category recognition accuracy. Our approach outperforms the previously published baseline for the dataset, which used hand-crafted features for each modality. We also show that a robot does not need the sensory data from the entire interaction, but instead can make a good prediction early on during behavior execution.

Gyan Tatiya and Jivko Sinapov, "Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration", IEEE International Conference on Robotics and Automation (ICRA), Montreal, Canada, May 20-24, 2019.

Poster:

[Poster image]

BibTeX:

@inproceedings{tatiya2019deep,
  title={Deep Multi-Sensory Object Category Recognition Using Interactive Behavioral Exploration},
  author={Tatiya, Gyan and Sinapov, Jivko},
  booktitle={2019 International Conference on Robotics and Automation (ICRA)},
  pages={7872--7878},
  year={2019},
  organization={IEEE}
}

Development Environment

For our research, we used a 64-bit Ubuntu 16.04 computer with 16 GB RAM, an Intel Core i7-7700 CPU (3.20 GHz x 8 cores), and an NVIDIA GeForce GTX 1060 GPU (3 GB RAM, 1280 CUDA cores). Data pre-processing was performed using Python 3.5.4, and the neural networks were implemented in the widely used deep learning framework TensorFlow 1.7 with GPU support (cuDNN 7, CUDA 9).

Dependencies

Python 3.5.4 is used for development, and the following packages are required to run the code:

pip install tensorflow-gpu==1.8.0
pip install keras==2.2.0
pip install matplotlib==2.2.2
pip install numpy==1.14.5
pip install googledrivedownloader

Dataset

Video Classification

Video Network Hyper-parameters

The input dimension at each time step is 120 x 90 x 3 = 32,400, which is factorized as 2 x 90 x 90 x 2; the hidden layer is chosen to be 6 x 6 x 6 x 6 = 1,296, and the Tensor-Train ranks are [1, 4, 4, 4, 1]. A fully-connected layer for such a mapping would require 41,990,400 (32,400 x 1,296) parameters, whereas the input-to-hidden layer in TT-GRU requires only 17,472.
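The parameter counts above can be checked with a few lines of arithmetic. The sketch below is not taken from the repository: the helper names are hypothetical, biases are ignored, and it assumes that the three GRU gate blocks share one Tensor-Train layer by tripling the first output factor (6 becomes 18). That assumption is one parameterization that reproduces the quoted 17,472 figure.

```python
from functools import reduce
from operator import mul


def dense_params(input_dim, hidden_dim):
    """Weights in a fully-connected input-to-hidden mapping (bias excluded)."""
    return input_dim * hidden_dim


def tt_params(in_factors, out_factors, ranks):
    """Weights in a Tensor-Train layer: sum of r[k] * m[k] * n[k] * r[k+1] over cores."""
    assert len(in_factors) == len(out_factors) == len(ranks) - 1
    return sum(
        ranks[k] * m * n * ranks[k + 1]
        for k, (m, n) in enumerate(zip(in_factors, out_factors))
    )


in_factors = [2, 90, 90, 2]   # 120 x 90 x 3 = 32,400 input values per frame
out_factors = [6, 6, 6, 6]    # 1,296 hidden units
ranks = [1, 4, 4, 4, 1]

input_dim = reduce(mul, in_factors)
hidden_dim = reduce(mul, out_factors)

# ASSUMPTION: the three GRU gate blocks are stacked on the first output factor.
gru_out_factors = [3 * out_factors[0]] + out_factors[1:]

dense_count = dense_params(input_dim, hidden_dim)
tt_single_count = tt_params(in_factors, out_factors, ranks)
tt_gru_count = tt_params(in_factors, gru_out_factors, ranks)

print(dense_count)      # 41990400
print(tt_single_count)  # 17376
print(tt_gru_count)     # 17472
```

Under these assumptions the TT input-to-hidden mapping uses roughly 2,400 times fewer weights than the dense mapping.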

We applied dropout of 0.25 to both the input-to-hidden and hidden-to-hidden mappings in the plain GRU as well as its TT modification, and ridge (L2) regularization of 0.01 to the single-layer classifier. The model was implemented in Keras with TensorFlow as the backend. We trained for 150 epochs using Adam, a gradient-based stochastic optimization algorithm, with a learning rate of 1 x 10^{-4}.
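For readers unfamiliar with Adam, the following is a minimal pure-Python sketch of a single update step. It is illustrative only and is not the Keras implementation used in this project: the function name `adam_step` is hypothetical, and the values of `beta1`, `beta2`, and `eps` are the defaults from the original Adam paper, which are an assumption here because only the learning rate of 1 x 10^{-4} is stated above.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count.

    Returns the updated parameters and the updated first and second moment estimates.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # running mean of gradients
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # running mean of squared gradients
        m_hat = m_i / (1.0 - beta1 ** t)             # bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# One step on a single weight with gradient 0.5, starting from zero moments.
params, m, v = adam_step([1.0], [0.5], [0.0], [0.0], t=1)
print(params)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly one learning rate (about 1 x 10^{-4}) regardless of the gradient's magnitude.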

Sound Classification

Sound Network Architecture

Sound Network Hyper-parameters

We applied dropout with a probability of 0.5 to the fully connected layers and the first convolutional layer. We trained the network using Adam optimization with a learning rate of 1 x 10^{-4} and stopped training after 300 iterations.

Haptic Classification

Haptic Network Architecture

Haptic Network Hyper-parameters

To reduce over-fitting during training, we applied dropout with a probability of 0.5 after the fully connected layer. The network was trained for 400 epochs using Adam optimization with a learning rate of 1 x 10^{-4}.
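As a rough illustration of what dropout with a probability of 0.5 does to a layer's activations, here is a pure-Python sketch. It assumes the inverted variant, in which kept units are rescaled during training so that nothing changes at inference time. The function name and the sample activations are hypothetical and are not taken from the project code.

```python
import random


def dropout(values, rate=0.5, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate` and rescale the rest."""
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [0.2, 1.5, -0.7, 3.0]

# Training: each activation is either dropped (0.0) or doubled (divided by 0.5).
train_out = dropout(activations, rate=0.5, rng=random.Random(0))

# Inference: activations pass through unchanged.
eval_out = dropout(activations, rate=0.5, training=False)

print(train_out)
print(eval_out)
```

Rescaling by 1 / (1 - rate) keeps the expected activation the same in training and inference, which is why no correction is needed when the network is evaluated.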

Multimodal Classification

Multimodal Network Hyper-parameters

This network was trained for 300 epochs using Adam optimization with a learning rate of 1 x 10^{-4}.

Results

Accuracy curve for all the interactions and sensory modalities

Recognition F-score for each category, behavior, and sensory modality: (v)isual, (a)uditory, (h)aptic, and (m)ultimodal

Alternate Configurations

Alternate network architectures for the video, sound, haptic, and multimodal networks are discussed in Alternate Configurations.
