clp-research / clp-vision

Our code for working with language and vision corpora.

clp-vision's Introduction

David Schlangen, 2019-04-07

`clp-vision`: Code for Working with Image Corpora

This repository collects code for working with (i.e., preprocessing, extracting features from, etc.) image corpora used by the CompLing-Potsdam group (formerly "dialogue systems group Bielefeld"). It started as a refactoring of the code from our ACL 2016 paper.

A collection of notebooks that make use of the joint preprocessed format created here can be found in the semantics with pictures repository.

Citation

If you make use of any material in here, please cite

David Schlangen, Natural Language Semantics with Pictures: Some Language & Vision Datasets and Potential Uses for Computational Semantics, Proceedings of the International Conference on Computational Semantics (IWCS), 2019, Gothenburg, May

Schlangen D, Zarrieß S, Kennington C. Resolving References to Objects in Photographs using the Words-As-Classifiers Model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016). Berlin; 2016.

clp-vision's Issues

treatment of filenames in ADE is inconsistent

Problem: For ADE20k, our usual way of denoting an image through an image_id doesn't work. First, the images are inside of a nested structure, which cannot be predicted from the image id. E.g., ADE_train_00000994.jpg is to be found in training/a/abbey/. Second, the same image ID may occur in training/ and in testing/.

(Actually, I'm no longer sure whether the imageID, which ultimately comes from index_ade20k.mat, is the number in the filename. But I am fairly certain that the image_id that we used was non-unique without the split.)

So even knowing the split (which could be encoded into the image_id, by adding a constant number so that everything beyond that is from split B) is not enough to get the image; for that you need to know the category as well.

This problem surfaces in two places. During extraction, going through a dataframe like ade_objdf, as it is at the moment, is not enough, because it doesn't have the image category. (It does have the split, as that is encoded into the image id.) So to get the image, one would need to load a different structure that maps image id + split to the fully qualified path.

It is also a problem for our usual encoding of the image in the feature file, where we have only three numerical fields. This constraint makes it necessary to encode the split info (which minimally is needed to disambiguate the image_id) into the image id.

Possible solution:

- Add the fields to the ADE dataframes, so that during extraction only the one dataframe needs to be consulted, in the same way as for all other corpora as well. (Rather than loading another dataframe with the mapping between id+split and full filename.)
- Encode the split into the image id, in the feature file. Then, when one wants to go from feature row to the corresponding image, e.g. for visualisation of the image, one will need to have this mapping available. But that seems ok, since that is a special case and then the mapping dataframe can be explicitly loaded.

(But the API of get_image_filename should be cleaned up in any case, and split and category should be made into keyword arguments that are passed along to get_ade_filename.)
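
The encoding idea discussed above (adding a constant to the image id to mark the split, plus a lookup from encoded id to full path) could look roughly like the sketch below. Everything in it is an assumption made for illustration: `SPLIT_OFFSET`, the function names, and the example records are hypothetical and are not taken from the repository; only the `training/a/abbey/` layout and the first filename come from the issue text.

```python
from pathlib import PurePosixPath

# Hypothetical constant: encoded ids at or above this value are treated as
# coming from the testing split. It must exceed the largest raw image id.
SPLIT_OFFSET = 1_000_000


def encode_image_id(raw_id, split):
    """Fold the split into a single unsigned integer image id."""
    if not 0 <= raw_id < SPLIT_OFFSET:
        raise ValueError("raw_id out of range for the chosen offset")
    if split == "training":
        return raw_id
    if split == "testing":
        return raw_id + SPLIT_OFFSET
    raise ValueError(f"unknown split: {split!r}")


def decode_image_id(image_id):
    """Recover (raw_id, split) from an encoded image id."""
    if image_id >= SPLIT_OFFSET:
        return image_id - SPLIT_OFFSET, "testing"
    return image_id, "training"


def build_path_index(records):
    """Map encoded image id -> relative image path.

    `records` is an iterable of (raw_id, split, category, filename) tuples.
    The first-letter directory level follows the example in the issue text.
    """
    index = {}
    for raw_id, split, category, filename in records:
        path = PurePosixPath(split) / category[0] / category / filename
        index[encode_image_id(raw_id, split)] = str(path)
    return index


# Illustrative records only; the second entry is made up to show an id clash.
records = [
    (994, "training", "abbey", "ADE_train_00000994.jpg"),
    (994, "testing", "abbey", "example_test_image.jpg"),
]
index = build_path_index(records)
```

With such an index, the feature file keeps its three numerical fields, and the path lookup is only needed in the special case of going back from a feature row to the image.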

Incorrect JSON value for some visgen bbdfs

Line 88 in ExtractFeats/extract.py should be obj_id instead of object_id to match the output of preproc.py.

`mscoco_image_filename()` in `utils.py` cannot handle splits

At the moment, the function only looks in train2014. So when it gets an image_id for an image in val2014, it looks for it in train2014, and fails. (Thankfully.)

Three possible solutions:

1. Make split a keyword argument (to get_image_filename() first). Advantage is that this is general. Disadvantage is that the information about the split isn't present in the feature file, where there are only three fields for each image (corpus code, image id, region id).
2. Make a test first, and if image_id is not in train, try in val. Advantage: API doesn't change. Disadvantage: For each image in val, there would be one unnecessary file test.
3. Encode information about the split in some other way, directly on the image id. E.g., 0001 is train, but 0001.5 is val. Advantage: Compatible with format of feature file. Disadvantage: turns image_id into a float, where it used to be an unsigned int.

These solutions are not mutually exclusive, at least as far as this function is concerned.
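
Since the options can be combined, one way to picture the first two together is sketched below: an optional split keyword, with a fallback that tries train2014 and then val2014 when no split is given. This is not the repository's implementation; the function name, the `base_dir` and `exists` parameters, and the one-directory-per-split layout are assumptions. The filename pattern is the standard MSCOCO 2014 naming.

```python
import os

# MSCOCO 2014 images are named COCO_<split>_<12-digit zero-padded id>.jpg
COCO_TEMPLATE = "COCO_{split}_{image_id:012d}.jpg"
DEFAULT_SPLITS = ("train2014", "val2014")


def mscoco_image_filename_sketch(image_id, base_dir, split=None,
                                 exists=os.path.isfile):
    """Return the path of an MSCOCO image, optionally restricted to a split.

    If `split` is None, each known split is tried in turn, which costs one
    extra file test for images that live in val2014. `exists` is injectable
    so the lookup can be exercised without the images on disk.
    """
    candidates = (split,) if split is not None else DEFAULT_SPLITS
    for candidate in candidates:
        filename = COCO_TEMPLATE.format(split=candidate, image_id=image_id)
        path = os.path.join(base_dir, candidate, filename)
        if exists(path):
            return path
    raise FileNotFoundError(
        f"image {image_id} not found in {', '.join(candidates)}")
```

Callers that know the split can pass it and skip the extra file test; callers that only have the three fields from the feature file can omit it and rely on the fallback.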

the wac training files should use paths from config

mod02_refcoco_vgg.py, for example, has

    preproc_path = dsgv_home + '/Preproc/PreprocOut/'
    feats_path = dsgv_home + '/ExtractFeats/ExtractOut/'

But these paths should come from the config file.
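
As a rough sketch of what reading these paths from a config file might look like, the snippet below uses Python's standard `configparser`. The section name `paths`, the option names, and the example directories are hypothetical placeholders, not the layout of the repository's actual config file.

```python
import configparser

# Hypothetical config contents; in practice these would live in the
# project's config file and be loaded with config.read(<path to file>).
EXAMPLE_CFG = """
[paths]
preproc_path = /data/Preproc/PreprocOut
feats_path = /data/ExtractFeats/ExtractOut
"""


def load_paths(config_text):
    """Return (preproc_path, feats_path) from config text."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    section = config["paths"]  # raises KeyError if the section is missing
    return section["preproc_path"], section["feats_path"]


preproc_path, feats_path = load_paths(EXAMPLE_CFG)
```

The training scripts would then use `preproc_path` and `feats_path` as returned here instead of building them from `dsgv_home`.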

make dask a requirement

dask should be in requirements.txt

use annoy to get nearest neighbours for image similarity as well

At the moment, the image similarity uses a homegrown, not very great way to keep the nearest neighbour indexing manageable. (In Preproc/sim_preproc.py.)

This should be using annoy, just as I later did in Preproc/CapEmbed/embed_captions_local.ipynb.

This, however, would require changes in all notebooks in sempix that make use of image similarities, which may be quite a few.
