acleanbib's Introduction

acl-cleaner

A Python tool for converting BibTeX entries to the canonical ACL Anthology format.

Usage

Installation

Clone the repository and run python3 script.py [BIBTEX] [--OUTPUT]

You can then run the tool by calling acleanbib [BIBTEX] [--OUTPUT]

Requirements

  1. click
  2. bibtexparser
  3. pandas
  4. tabulate

Licence

Authors

acleanbib was written by Olamilekan Wahab.

acleanbib's Issues

write to STDOUT by default

By default, I think the program should read from STDIN and write to STDOUT. You can do this by using argparse.FileType and setting the defaults to sys.stdin and sys.stdout. Right now, for example, the program seems to create a file named cleaned_bib.bib, which is hard to discover.
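
A minimal sketch of that suggestion, assuming an argparse-based entry point (the project currently lists click as a requirement, so this is illustrative only). The names build_parser, infile, and outfile are hypothetical.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose input and output default to STDIN and STDOUT."""
    parser = argparse.ArgumentParser(prog="acleanbib")
    # Both positionals are optional (nargs="?"); when omitted, the standard
    # streams are used instead of opening a file.
    parser.add_argument("infile", nargs="?",
                        type=argparse.FileType("r"), default=sys.stdin)
    parser.add_argument("outfile", nargs="?",
                        type=argparse.FileType("w"), default=sys.stdout)
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    # Placeholder for the real cleaning step: copy input to output unchanged.
    args.outfile.write(args.infile.read())
```

With this shape, calling main() with no arguments reads from STDIN and writes to STDOUT, and no file is created unless the user names one.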

not working on examples?

I ran the example by typing

python3 script.py citation-220873086.bib

It prints a message to STDOUT saying that it doesn't match the first paper. Why not? That one is in the Anthology and should return the following bibtex:

@inproceedings{mueller-schuetze-2011-improved,
    title = "Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes",
    author = "Mueller, Thomas  and
      Schuetze, Hinrich",
    booktitle = "Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2011",
    address = "Portland, Oregon, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/P11-2092",
    pages = "524--528",
}

Add --concise

It would be awesome to have a --concise option that would

  • Remove the publisher line from @inproceedings
  • Remove abstracts
  • Use a conference short code ("In Proc. ACL") instead of the default long booktitle ("In the Proceedings of the Nth Annual Meeting...")

For the third item, you can use venues.yaml. We could expand that file to have both short and long names so that you could do a fuzzy match on them. This feature alone would really sell the script, I think.
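
A rough sketch of what --concise could do to a single entry, assuming entries are plain dicts with an ENTRYTYPE key (the shape bibtexparser 1.x produces). VENUE_SHORT_NAMES and make_concise are hypothetical names; the table stands in for data that would come from venues.yaml, and a simple case-insensitive substring test stands in for the fuzzy match.

```python
# Hypothetical long-name -> short-code table; the real data would be
# loaded from venues.yaml once it carries both short and long names.
VENUE_SHORT_NAMES = {
    "Annual Meeting of the Association for Computational Linguistics": "ACL",
}


def make_concise(entry: dict) -> dict:
    """Return a copy of entry with the three --concise rules applied."""
    # Rule 2: drop abstracts from every entry type.
    concise = {k: v for k, v in entry.items() if k != "abstract"}
    if concise.get("ENTRYTYPE") == "inproceedings":
        # Rule 1: drop the publisher line from @inproceedings.
        concise.pop("publisher", None)
        # Rule 3: swap the long booktitle for a short code when a known
        # venue name occurs in it (substring test, not a true fuzzy match).
        booktitle = concise.get("booktitle", "")
        for long_name, short in VENUE_SHORT_NAMES.items():
            if long_name.lower() in booktitle.lower():
                concise["booktitle"] = f"Proc. {short}"
                break
    return concise
```

The function returns a new dict, so the original entry stays available if the full form is needed later.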

general issues

Hi @Olamyy,

Thanks, it's starting to take shape! I think we're getting close to something we could distribute and use, but there are a few more issues. Here are some. Perhaps we could create separate issues for these, and then you could create PRs against my fork that you could comment on?

  • --show-output should be -v|--verbose. It should write to STDERR via a logger. Instead of printing a fancy table, it would be easier to read output like this:

     Matched key $KEY to Anthology paper ID $ID
    

    where $KEY and $ID are variables defined by the cite key and the match

  • --output should also allow -o. You shouldn't also need --write_to_output. --output should be an open file stream for outputting, either a file or sys.stdout, and you just print to it, whatever it is.

  • The input file stream should default to sys.stdin, so that any of the following use cases work:

     cat in.bib | ./script.py > out.bib
     ./script.py in.bib > out.bib
     ./script.py in.bib out.bib
    
  • --concise should default to false. It should be a plain flag: its presence alone turns it on, with no argument (such as 1) required.
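
The options in the list above could be sketched roughly as follows, assuming argparse and the standard logging module. The function names parse_args, configure_logging, and report_match are hypothetical. This sketch takes the output only through -o/--output, so the positional-output form from the third use case is not covered.

```python
import argparse
import logging
import sys

logger = logging.getLogger("acleanbib")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="acleanbib")
    # Input defaults to STDIN so the tool works in a pipe.
    parser.add_argument("infile", nargs="?",
                        type=argparse.FileType("r"), default=sys.stdin)
    # -o/--output is an open stream: a named file, or STDOUT by default.
    parser.add_argument("-o", "--output",
                        type=argparse.FileType("w"), default=sys.stdout)
    # store_true flags default to False and take no argument.
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--concise", action="store_true")
    return parser.parse_args(argv)


def configure_logging(verbose: bool) -> None:
    # Diagnostics go to STDERR so they never mix with the BibTeX on STDOUT.
    logging.basicConfig(stream=sys.stderr, format="%(message)s",
                        level=logging.INFO if verbose else logging.WARNING)


def report_match(key: str, anthology_id: str) -> None:
    # Only shown when -v/--verbose raised the level to INFO.
    logger.info("Matched key %s to Anthology paper ID %s", key, anthology_id)
```

Because the match report goes through the logger, redirecting STDOUT to a file still yields a clean .bib file even with -v.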

Finally, when rewriting, you are losing author formatting. e.g.,

    author = "Erk, Katrin  and
      Smith, Noah A.",

gets output as

  author    = "Erk, Katrin  and
Smith, Noah A.",
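
One way to keep the Anthology layout is to build the author field by hand rather than relying on a generic writer. This is only a sketch; format_author_field is a hypothetical helper, and the indentation widths are read off the example above.

```python
def format_author_field(authors: list[str], indent: str = "    ") -> str:
    """Format authors one per line, joined by '  and', Anthology style."""
    # Continuation lines are indented two spaces deeper than the field name.
    separator = "  and\n" + indent + "  "
    return f'{indent}author = "{separator.join(authors)}",'


print(format_author_field(["Erk, Katrin", "Smith, Noah A."]))
```

For the two authors shown, this reproduces the first form above, with the second author indented six spaces.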
