clp-vision's Introduction

David Schlangen, 2019-04-07

clp-vision: Code for Working with Image Corpora

This repository collects code for working with image corpora (preprocessing, extracting features, etc.) used by the CompLing-Potsdam group (formerly "dialogue systems group Bielefeld"). It started as a refactoring of the code from our ACL 2016 paper.

A collection of notebooks that make use of the joint preprocessed format created here can be found in the semantics with pictures repository.

Citation

If you make use of any material in here, please cite

David Schlangen, Natural Language Semantics with Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics, Proceedings of the International Conference on Computational Semantics (IWCS), Gothenburg, May 2019.

or

Schlangen D, Zarrieß S, Kennington C. Resolving References to Objects in Photographs using the Words-As-Classifiers Model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). Berlin; 2016.

clp-vision's People

Contributors

davidschlangen, dependabot[bot], nilinykh, pkhdipraja, rutescher, soerenetler

clp-vision's Issues

treatment of filenames in ADE is inconsistent

Problem: For ADE20k, our usual way of denoting an image through an image_id doesn't work. First, the images are inside of a nested structure, which cannot be predicted from the image id. E.g., ADE_train_00000994.jpg is to be found in training/a/abbey/. Second, the same image ID may occur in training/ and in testing/.

(Actually, I am no longer sure whether the image ID, which ultimately comes from index_ade20k.mat, is the number in the filename. But I am fairly certain that the image_id we used was not unique without the split.)

So even knowing the split (which could be encoded into the image_id, by adding a constant number so that everything beyond that is from split B) is not enough to get the image; for that you need to know the category as well.

This problem surfaces in two places. During extraction, going through a dataframe like ade_objdf, as it is at the moment, is not enough, because it doesn't have the image category and the split. (Actually, it does have the split, as that is encoded into the image id.) So to get the image, one would need to load a different structure that maps image id + split to the fully qualified path.

It is also a problem for our usual encoding of the image in the feature file, where we have only three numerical fields. This constraint makes it necessary to encode the split info (which minimally is needed to disambiguate the image_id) into the image id.
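The constant-offset encoding mentioned above could look roughly like the following. This is only a sketch: the names `ADE_SPLITS`, `SPLIT_OFFSET`, `encode_image_id` and `decode_image_id` are hypothetical and not part of the repository, and the offset value is an assumption that would have to exceed the largest raw image id.

```python
# Split names as used in the issue text; the order defines the offset multiple.
ADE_SPLITS = ("training", "testing")

# Hypothetical constant: must be larger than any raw ADE image id.
SPLIT_OFFSET = 1_000_000


def encode_image_id(raw_id, split):
    """Fold the split into the numeric image id by adding a constant offset."""
    if not 0 <= raw_id < SPLIT_OFFSET:
        raise ValueError("raw_id %r does not fit below the offset" % raw_id)
    return ADE_SPLITS.index(split) * SPLIT_OFFSET + raw_id


def decode_image_id(image_id):
    """Recover (raw_id, split) from an encoded image id."""
    split_index, raw_id = divmod(image_id, SPLIT_OFFSET)
    return raw_id, ADE_SPLITS[split_index]
```

The encoded id stays an unsigned integer, so it still fits into the three numerical fields of the feature file; the category is not recoverable from it and has to come from elsewhere.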

Possible solution:

  • Add the split and category fields to the ADE dataframes, so that during extraction only the one dataframe needs to be consulted, as for all other corpora. (Rather than loading another dataframe with the mapping between id+split and full filename.)
  • Encode the split into the image id, in the feature file. Then, when one wants to go from feature row to the corresponding image, e.g. for visualisation of the image, one will need to have this mapping available. But that seems ok, since that is a special case and then the mapping dataframe can be explicitly loaded.

(But the API of get_image_filename should be cleaned up in any case, and split and category should be made into keyword arguments that are passed along to get_ade_filename.)
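As a rough illustration of the proposed clean-up, split and category could be passed as keyword arguments, as sketched below. The parameter lists are hypothetical and do not reflect the current signatures in the repository, and the directory layout `<root>/<split>/<first letter>/<category>/<filename>` is an assumption generalised from the single example given above (training/a/abbey/).

```python
import os


def get_ade_filename(ade_root, filename, *, split, category):
    """Build the path of an ADE20k image from its split and category.

    Assumed layout: <root>/<split>/<first letter of category>/<category>/<filename>.
    """
    return os.path.join(ade_root, split, category[0], category, filename)


def get_image_filename(corpus, root, filename, **kwargs):
    """Dispatch on the corpus; extra keyword arguments go to the ADE helper."""
    if corpus == "ade":
        return get_ade_filename(root, filename, **kwargs)
    # Other corpora are assumed here to keep their images in one flat directory.
    return os.path.join(root, filename)


# A row as it might look once split and category are added to the ADE dataframe.
row = {"filename": "ADE_train_00000994.jpg", "split": "training", "category": "abbey"}
path = get_image_filename("ade", "ADE20K", row["filename"],
                          split=row["split"], category=row["category"])
```

With the two fields present in each row, extraction only needs the one dataframe and no separate mapping from id + split to filename.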

`mscoco_image_filename()` in `utils.py` cannot handle splits

At the moment, the function only looks in train2014. So when it gets an image_id for an image in val2014, it looks for it in train2014, and fails. (Thankfully.)

Three possible solutions:

  • Make split a keyword argument (to get_image_filename() first). Advantage is that this is general. Disadvantage is that the information about the split isn't present in the feature file, where there are only three fields for each image (corpus code, image id, region id).
  • Make a test first, and if image_id is not in train, try in val. Advantage: API doesn't change. Disadvantage: For each image in val, there would be one unnecessary file test.
  • Encode information about the split in some other way, directly on the image id. E.g., 0001 is train, but 0001.5 is val. Advantage: Compatible with format of feature file. Disadvantage: turns image_id into a float, where it used to be an unsigned int.

These solutions are not mutually exclusive, at least as far as this function is concerned.
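The first two options can be combined in one function, roughly as sketched here. The signature is hypothetical and not the one currently in utils.py, and the sketch assumes the images are stored as `<root>/<split>/COCO_<split>_<12-digit id>.jpg`, which is the naming of the 2014 release but may not match the local setup.

```python
import os


def mscoco_image_filename(coco_root, image_id, splits=("train2014", "val2014")):
    """Return the path of an MSCOCO image, trying each split in turn.

    Passing a single split, e.g. splits=("val2014",), corresponds to the
    keyword-argument solution; the default corresponds to the fallback test.
    """
    for split in splits:
        name = "COCO_%s_%012d.jpg" % (split, image_id)
        path = os.path.join(coco_root, split, name)
        if os.path.isfile(path):
            return path
    raise FileNotFoundError(
        "image %d not found in any of %s" % (image_id, ", ".join(splits)))
```

Callers that only know the three fields of the feature file can rely on the default, at the cost of one extra file test per image in val2014; callers that know the split can pass it explicitly.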

use annoy to get nearest neighbours for image similarity as well

At the moment, the image similarity code uses a homegrown and not very good way of keeping the nearest neighbour indexing manageable. (In Preproc/sim_preproc.py.)

This should be using annoy, just as I later did in Preproc/CapEmbed/embed_captions_local.ipynb.

This, however, would mean changing all notebooks in sempix that make use of image similarities, which may be quite a few.
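For orientation, a minimal sketch of nearest-neighbour lookup with annoy is given below. It is not the code from sim_preproc.py or from the caption-embedding notebook; the helper names, the choice of the angular metric and the number of trees are assumptions made for illustration.

```python
def build_index(vectors, n_trees=10, metric="angular", seed=0):
    """Build an annoy index over a list of equal-length feature vectors."""
    # Imported here so that the module loads without the third-party
    # package (pip install annoy) being installed.
    from annoy import AnnoyIndex

    index = AnnoyIndex(len(vectors[0]), metric)
    index.set_seed(seed)  # makes tree construction reproducible
    for i, vec in enumerate(vectors):
        index.add_item(i, vec)
    index.build(n_trees)
    return index


def nearest_neighbours(index, item, k):
    """Return up to k approximate nearest neighbours of an indexed item."""
    # The item itself is normally returned first, so ask for one more
    # candidate and filter it out.
    candidates = index.get_nns_by_item(item, k + 1)
    return [c for c in candidates if c != item][:k]
```

A built index can be written to disk with `index.save(path)` and loaded again later, so notebooks would not have to recompute it.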
