app-template's Introduction

GeoDeepDive Application Template

A template and common set of conventions for building applications to extract information from published and pre-processed documents in the GeoDeepDive infrastructure. For general information about this infrastructure, see the GeoDeepDive website.

Check out the wiki for more information on getting started.

License

CC-BY 4.0 International for application execution on GDD infrastructure applied to open access documents

CC-BY-NC 4.0 International for application execution on GDD infrastructure applied to select non-open access documents

app-template's People

Contributors

cambro, iross, jczaplew, jonhusson

app-template's Issues

Example dataset issue

The example dataset in Getting Started appears to be corrupted. After downloading, I am unable to unzip the file (e.g. I get an "End-of-central-directory signature not found" error).
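One way to tell a truncated or non-zip download apart from a problem with the local unzip tool is to inspect the file with Python's standard `zipfile` module. This is a minimal sketch, not part of the app-template; the function name `check_zip` is hypothetical.

```python
import zipfile
from pathlib import Path


def check_zip(path):
    """Return (ok, message) describing whether `path` is a readable zip archive."""
    path = Path(path)
    if not path.is_file():
        return False, "file not found"
    if not zipfile.is_zipfile(path):
        # No valid end-of-central-directory record: usually a truncated
        # download or a non-zip payload (e.g. an HTML error page).
        return False, "not a zip archive (possibly truncated download)"
    with zipfile.ZipFile(path) as archive:
        # testzip() returns None when every member passes its CRC check.
        bad_member = archive.testzip()
    if bad_member is not None:
        return False, f"corrupt member: {bad_member}"
    return True, "ok"


if __name__ == "__main__":
    print(check_zip("example_input.zip"))
```

If the check reports that the file is not a zip archive, comparing its size against the size reported by the server may show whether the download was cut short.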

contact method

I am working on an application I would like to develop using the app-template. However, I don't see a contact email or method.

Remove example data from repo

Because the workflow involves downloading the generated subset from our servers, I think it makes sense to do this for the example data as well. That way, we can gitignore all files in the input directory.

testing application data set and steps

This needs to be added. Currently, running the test application gives this:

cursor.execute("SELECT * FROM stringed_instruments_sentences_nlp352;")
psycopg2.ProgrammingError: relation "stringed_instruments_sentences_nlp352" does not exist
LINE 1: SELECT * FROM stringed_instruments_sentences_nlp352;

The reason is that the tables defined by the steps to import the testing set are not correct. I would use a single data set for all of the steps involved here. So, let's make this data from "Getting Started":

curl -o example_input.zip https://geodeepdive.org/dev_subsets/example_input.zip
unzip -j example_input.zip -d ./input
rm example_input.zip

be the same data that is used by the example application, and perhaps get rid of the odd table names etc. in the Postgres database.

Create list of available software

What software will be available on the GeoDeepDive infrastructure when an application is run? Add this to the "Running on GeoDeepDive Infrastructure" section of the README.

GDD articles

**Collection of articles, presentations, etc. about GeoDeepDive**

  • working compilation - not yet sorted, formatted, or curated
  • Is there any reason to maintain a central list of conference presentations, poster sessions, etc.?

Article: Computers read the fossil record
Callaway, Ewen
Nature, Jul 2, 2015, Vol.523(7558), pp.115-116
url: http://www.nature.com/news/computers-read-the-fossil-record-1.17868
doi: 10.1038/523115a

Conference Proceeding: Stromatolite distribution in space and time; a machine-reading assisted quantitative analysis
Wilcots, Julia ; Husson, Jon ; Peters, Shanan E.
Geological Society of America, 2015 annual meeting & exposition, 2015, Vol.47(7), pp.365
URL: https://gsa.confex.com/gsa/2015AM/webprogram/Paper269475.html
Contents/Summary: Stromatolites are products of microbial interactions between carbonate sediments

GeoDeepDive: statistical inference using familiar data-processing languages
Zhang, Ce ; Govindaraju, Vidhya ; Borchardt, Jackson ; Foltz, Tim ; Ré, Christopher ; Peters, Shanan
Proceedings of the 2013 ACM SIGMOD International Conference on management of data, 22 June 2013, pp.993-996
URL: https://doi.org/10.1145/2463676.2463680

Newspaper Article: Scientists search 3 million publications to unlock sea change secret
Lumb, David
Engadget, Mar 30, 2017
url: https://www.engadget.com/2017/03/30/scientists-search-3-million-publications-to-unlock-sea-change-se/

Article: A Machine Reading System for Assembling Synthetic Paleontological Databases
Shanan E. Peters, Ce Zhang, Miron Livny, Christopher Ré
PLOS ONE
Published: December 1, 2014
url: https://doi.org/10.1371/journal.pone.0113523

Newspaper Article: Welcome to the Data Driven World
By Joseph Marks
Nextgov.com (Online), Apr 1, 2013
url: http://www.govexec.com/magazine/features/2013/04/welcome-data-driven-world/62196/

The Paleobiology Database application programming interface
Shanan E. Peters and Michael McClennen
Paleobiology, 42(1), 1-7.
doi: 10.1017/pab.2015.39

Article: GeoDeepDive: Bringing dark data to light
Leandra Marshall
AGU Blogosphere: GeoSpace
posted May 31, 2016
url: http://blogs.agu.org/geospace/2016/05/31/geodeepdive-bringing-dark-data-light/

Articles and theses that reference GeoDeepDive:
Article: Horses in the Cloud: big data exploration and mining of fossil and extant Equus (Mammalia: Equidae)
Macfadden, Bruce ; Guralnick, Robert
Paleobiology, Feb 2017, Vol.43(1), pp.1-14
URL: https://doi.org/10.1017/pab.2016.42

Comment from Aimee: I have a longer list of articles and theses that pertain to, or reference, the computing aspects of GeoDeepDive (most are by Christopher Ré, or he is an advisor/committee member). Is there any interest in keeping a running list of these?

App creation tutorial

A thorough walkthrough of the app creation process would be incredibly valuable. A wiki page would be ideal.

enhance input readme

It would be good to provide clear descriptions of the source of each file product. For example, fonts.txt has this:

Information: Fonttype/Formatting recognition via a custom script. Utilizes the output of the Cuneiform OCR process.

It would be useful to supply the exact pathway from document to product for each file or file group.

Config updates

  • Add ability to specify entire dictionaries in the config file
  • Specify language
  • Email address

Python 3.x support?

I'd like to use a library (tensorflow_hub) that requires Python 3.5. Is there a way that I can do this? Python is a "supported language", but it's not clear what versions are supported.

credentials or credentials.yml? One or other but not both

The instructions do not agree with the expectations of setup.sh. Line 19 in setup.sh reads:

eval $(parse_yaml credentials)

But the wiki instructions say "edit the file credentials.yml". I updated setup.sh locally to be consistent with the wiki. It should probably be updated here too, but I wanted to check before doing so. Also, .gitignore expects credentials.yml, so I changed line 19 to:

eval $(parse_yaml credentials.yml)

terms fuzzy matching

Ideally, some amount of fuzzy matching of terms would be applied automatically to improve recall. Examples include pluralizations (stromatolite, stromatolites) and hyphenations/slashes (stromatolitic-thrombolitic, strom/throm).

A fancier option might include a flag for explicitly not doing this.
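As a rough illustration of what such matching could look like, the sketch below builds a regular expression per term that tolerates a simple plural suffix and treats hyphens, slashes, and spaces inside the term as interchangeable. The function name `term_pattern` and its `fuzzy` flag are hypothetical and not part of the app-template. Abbreviated forms such as strom/throm are not covered here and would need an explicit synonym list.

```python
import re


def term_pattern(term, fuzzy=True):
    """Compile a case-insensitive regex for `term`.

    With fuzzy=True, an optional plural suffix ("s" or "es") is accepted and
    hyphens, slashes, and whitespace between parts of the term are treated
    as interchangeable. With fuzzy=False, only the exact term matches.
    """
    term = term.strip()
    if fuzzy:
        parts = [re.escape(p) for p in re.split(r"[-/\s]+", term) if p]
        body = r"[-/\s]+".join(parts) + r"(?:e?s)?"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


if __name__ == "__main__":
    sentence = "Abundant stromatolites occur in a stromatolitic/thrombolitic reef."
    for term in ["stromatolite", "stromatolitic-thrombolitic"]:
        print(term, bool(term_pattern(term).search(sentence)))
```

Passing `fuzzy=False` gives the opt-out behaviour mentioned above: the term is then matched only as written.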

Require docid-sentid output

I am not sure of the best way to handle this in the directions, code, etc., but we need to require application users to output a CSV in the /output folder that contains the docid-sentid pairs that were used in the generation of their output. From this file, we can generate a reference report for a given run of the application.
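A minimal sketch of how an application might record these pairs is shown below. The file name `provenance.csv`, the column names `docid` and `sentid`, and the function `write_provenance` are assumptions made for illustration; the issue does not specify them. The sketch also assumes that sentence IDs are integers.

```python
import csv
from pathlib import Path


def write_provenance(pairs, output_dir="output", filename="provenance.csv"):
    """Write the unique (docid, sentid) pairs used by a run to a CSV file.

    `pairs` is any iterable of (docid, sentid) tuples. Duplicates are dropped
    and rows are sorted so that repeated runs produce comparable files.
    """
    out_path = Path(output_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    unique_pairs = sorted({(str(docid), int(sentid)) for docid, sentid in pairs})
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["docid", "sentid"])  # header row (hypothetical names)
        writer.writerows(unique_pairs)
    return out_path


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    used = [("doc_b", 1), ("doc_a", 2), ("doc_a", 1), ("doc_a", 2)]
    print(write_provenance(used))
```

Collecting the pairs as the application reads each sentence, and writing them once at the end of the run, keeps the file consistent with the output it describes.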
