Coder Social home page Coder Social logo

sdml's Introduction

SDML

Peng Hu*, Liangli Zhen*, Dezhong Peng, Pei Liu, Scalable deep multimodal learning for cross-modal retrieval[C], International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2019: 635-644. (*denotes equal contribution, PyTorch Code)

Abstract

Cross-modal retrieval takes one type of data as the query to retrieve relevant data of another type. Most of existing cross-modal retrieval approaches were proposed to learn a common subspace in a joint manner, where the data from all modalities have to be involved during the whole training process. For these approaches, the optimal parameters of different modality-specific transformations are dependent on each other and the whole model has to be retrained when handling samples from new modalities. In this paper, we present a novel cross-modal retrieval method, called Scalable Deep Multimodal Learning (SDML). It proposes to predefine a common subspace, in which the between-class variation is maximized while the within-class variation is minimized. Then, it trains $m$ modality-specific networks for $m$ modalities (one network for each modality) to transform the multimodal data into the predefined common subspace to achieve multimodal learning. Unlike many of the existing methods, our method can train different modality-specific networks independently and thus be scalable to the number of modalities. To the best of our knowledge, the proposed SDML could be one of the first works to independently project data of an unfixed number of modalities into a predefined common subspace. Comprehensive experimental results on four widely-used benchmark datasets demonstrate that the proposed method is effective and efficient in multimodal learning and outperforms the state-of-the-art methods in cross-modal retrieval.

Framework

The general framework of the proposed SDML method. The m modality-specific neural networks (one network for each modality) can be trained separately since they do not share any trainable parameters. MAN

Results

Performance comparison in terms of mAP scores on the PKU XMedia dataset.

Performance comparison in terms of mAP scores on the Wikipedia dataset.

Performance comparison in terms of mAP scores on the MS-COCO dataset.

Citing SDML

If you find SDML useful in your research, please consider citing:

@inproceedings{hu2019deep,
  author = {Hu, Peng and Zhen, Liangli and Peng, Dezhong and Liu, Pei},
  title = {Scalable Deep Multimodal Learning for Cross-Modal Retrieval},
  booktitle = {Proceedings of the 42Nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
  series = {SIGIR'19},
  year = {2019},
  isbn = {978-1-4503-6172-9},
  location = {Paris, France},
  pages = {635--644},
  numpages = {10},
  url = {http://doi.acm.org/10.1145/3331184.3331213},
  doi = {10.1145/3331184.3331213},
  acmid = {3331213},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {cross-modal retrieval, multimodal learning, representation learning},
} 

sdml's People

Contributors

penghu-cs avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.