
denotationgraph's Introduction

DenotationGraph

To generate a new denotation graph from a set of image captions, edit run.sh and set graph_name to the name of your corpus. If necessary, also set corpus_dir to the directory that contains your corpus directory (e.g. /home/data/graph_corpora if your image caption data is in /home/data/graph_corpora/new_captions). Otherwise, simply put the directory containing your caption data in corpora.

Your graph_name should match the name of the directory containing your caption data (e.g. mpe_test_corpus). To start graph generation, your caption directory should contain one file, [graph_name].spell. This file is tab-delimited, and each line contains one caption ID followed by the corresponding caption. See corpora/mpe_test_corpus/mpe_test_corpus.spell for an example. The graph generation process assumes that caption IDs are formatted as image_id#caption_idx. Denotational similarities are computed based on shared images, so this information is important to the graph similarity computations.
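Before starting a long graph-generation run, it can help to check that the .spell file matches the format described above. The following is a minimal sketch, not part of the DenotationGraph code: the function names are hypothetical, and it assumes that caption_idx is a non-negative integer and that image IDs contain no whitespace or '#' characters.

```python
import re
from pathlib import Path

# Assumption: IDs look like "image_id#caption_idx" with an integer index,
# e.g. "1234.jpg#0". Adjust the pattern if your IDs differ.
CAPTION_ID = re.compile(r"^[^#\s]+#\d+$")


def check_spell_lines(lines):
    """Return (line_number, problem) pairs for lines that break the format."""
    problems = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append((n, "expected 2 tab-separated fields"))
            continue
        caption_id, caption = fields
        if not CAPTION_ID.match(caption_id):
            problems.append((n, "caption ID is not image_id#caption_idx"))
        elif not caption.strip():
            problems.append((n, "empty caption"))
    return problems


def check_spell_file(path):
    """Run the same checks on a [graph_name].spell file on disk."""
    with Path(path).open(encoding="utf-8") as f:
        return check_spell_lines(f)
```

For example, check_spell_file("corpora/mpe_test_corpus/mpe_test_corpus.spell") returns an empty list if every line passes these checks.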

The other file you will probably need is the list of images, one image ID per line, that are in the train split of your captions. Name this file img_train.lst (see corpora/mpe_test_corpus/img_train.lst for an example). The graph generation process only computes denotational similarities over the images specified in this file. If you intend to compute denotational similarities based on all of your captions, simply include all of the image IDs in this file.
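If you want similarities computed over all of your captions, the image list can be derived from the caption IDs. This is a minimal sketch with hypothetical function names. It assumes the image ID is everything before the last '#' in each caption ID.

```python
def image_ids_from_spell_lines(lines):
    """Collect unique image IDs from .spell lines, in first-seen order."""
    seen = {}
    for line in lines:
        caption_id = line.split("\t", 1)[0].strip()
        if "#" not in caption_id:
            continue  # skip blank or malformed lines
        # Assumption: the image ID is the part before the last '#'.
        image_id = caption_id.rsplit("#", 1)[0]
        seen.setdefault(image_id, None)
    return list(seen)


def write_img_train_list(spell_path, out_path):
    """Write one image ID per line and return how many were written."""
    with open(spell_path, encoding="utf-8") as f:
        ids = image_ids_from_spell_lines(f)
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(image_id + "\n" for image_id in ids)
    return len(ids)
```

To restrict the computation to a real train split, filter the returned IDs against your own split definition before writing img_train.lst.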

Possible issues

Make sure the memory allocated for Java (line 39 in run.sh) is appropriate for your machine. If you have memory issues when generating the graph, try commenting out line 39 and using lines 41-42 instead.

Also check that the number of cores allocated for parsing (line 6 in run.sh) is appropriate.

In some cases, especially for large corpora, the coref ("entity") step of graph preprocessing can be extremely slow. If this is the case, replace line 29 with line 30. This step should not affect the computed denotational probabilities.

Reading the output

The preprocessed (chunked, parsed, POS-tagged) files will be located in corpora/graph_name/. The graph files will be located in corpora/graph_name/graph/. The denotational similarity files will be located in corpora/graph_name/graph/train/ (assuming that you defined the train images). The format of these files is described in preprocessing/corpora/notes.txt.
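After a run, a quick check that the expected output directories exist can catch a silently failed step. The sketch below uses only the paths named above. The function names are hypothetical, and it assumes the default corpora directory (pass your corpus_dir as corpora_root if you changed it).

```python
from pathlib import Path


def expected_output_dirs(graph_name, corpora_root="corpora"):
    """Map each output kind to the directory where it should appear."""
    corpus = Path(corpora_root) / graph_name
    return {
        "preprocessed": corpus,
        "graph": corpus / "graph",
        "train": corpus / "graph" / "train",
    }


def missing_output_dirs(graph_name, corpora_root="corpora"):
    """Return the names of expected output directories that do not exist."""
    dirs = expected_output_dirs(graph_name, corpora_root)
    return [name for name, path in dirs.items() if not path.is_dir()]
```

An empty list from missing_output_dirs("mpe_test_corpus") means all three directories are present. It does not verify the contents of the files inside them.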

denotationgraph's People

Contributors

aylai


denotationgraph's Issues

NullPointerException when generating graph for MS-COCO

I am trying to generate a denotation graph for the training split of MS-COCO, but got an exception:

Exception in thread "main" java.lang.NullPointerException
    at rewriteRules.SplitXOrY.applyRule(SplitXOrY.java:42)
    at structure.GraphGenerator.<init>(GraphGenerator.java:71)
    at structure.GraphGenerator.main(GraphGenerator.java:157)

No other errors are reported before this exception.
