Text Analytics Pipeline (TAP)

Home Page: https://heta-io.github.io/tap/

License: Apache License 2.0

Languages: Scala 88.49%, CSS 11.08%, JavaScript 0.43%

Topics: academic-writing-analytics, text-analytics, play-framework, graphql, text-analytics-pipeline, natural-language-processing

tap's Introduction


TAP provides Text Analytics services via a GraphQL API. It is written in Scala, and uses the Play Framework, Sangria GraphQL and GraphiQL. TAP currently provides back-end analytics services for:

  • AcaWriter, a web application for Academic Writing Analytics
  • GoingOK, a web application for collecting and analysing personal reflective writing
  • Metacognition Discovery
  • A variety of Jupyter Notebooks used for tutoring in Text Analytics

If you are using TAP as a backend for something, please share by editing this page and opening a pull request.

Documentation

Refer to our current documentation here. For more background on how TAP is being used, see the main HETA webpage

Development

TAP was originally created by Andrew Gibson at the Connected Intelligence Centre (CIC), University of Technology Sydney (UTS). Development is ongoing through HETA and the founding ATN universities.

We welcome contributions to TAP. If you are interested in contributing, take a look at the developer section of the docs and the current Issues, or contact one of the current maintainers.

A big thank you to our contributors.

tap's People

Contributors

andrewresearch, josephharfouch, violentcrumble


tap's Issues

Add a health endpoint

Need a health endpoint that responds with HTTP status 200 and a basic JSON body:

{
  "message": "ok"
}
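
A minimal sketch of such an endpoint, for illustration only. It uses the JDK's built-in HttpServer as a stand-in so that the example is self-contained; in TAP itself this would presumably be a Play controller action plus a route. The object name HealthEndpoint and the /health path are assumptions, not part of the issue.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.InetSocketAddress
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for a health endpoint; not TAP's actual implementation.
object HealthEndpoint {
  // The JSON body requested in the issue.
  val Body: String = """{"message": "ok"}"""

  // Always answers with status 200 and the JSON body above.
  private val handler: HttpHandler = new HttpHandler {
    override def handle(exchange: HttpExchange): Unit = {
      val bytes = Body.getBytes(StandardCharsets.UTF_8)
      exchange.getResponseHeaders.add("Content-Type", "application/json")
      exchange.sendResponseHeaders(200, bytes.length.toLong)
      val out = exchange.getResponseBody
      try out.write(bytes) finally out.close()
    }
  }

  // Start a server on the given port; port 0 asks the OS for any free port.
  // The "/health" path is an assumption made for this sketch.
  def start(port: Int): HttpServer = {
    val server = HttpServer.create(new InetSocketAddress(port), 0)
    server.createContext("/health", handler)
    server.start()
    server
  }
}
```

Calling HealthEndpoint.start(0) returns the running server; its bound port is available from server.getAddress.getPort, and server.stop(0) shuts it down.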

Shift model loading and NLP startup to application start

Rather than waiting for the first query, ensure that Factorie has loaded all models at application start time.
Also ensure the language is loaded for LanguageTool.

It would be useful to be able to query the results of this startup process. This information needs to be written to whatever gets used for logging system analytics and metadata.
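
One way to picture the eager startup and its queryable result is sketched below. All names (LoadState, StartupLoader, the loader labels) are hypothetical, and the loader functions are placeholders for the real Factorie and LanguageTool initialisation, whose APIs are not shown here.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical outcome of loading one resource at startup.
sealed trait LoadState
case object Loaded extends LoadState
final case class Failed(reason: String) extends LoadState

// Hypothetical helper; the loaders passed in stand in for model loading.
object StartupLoader {
  // Run every named loader once, eagerly, and record how each one went.
  // The resulting map is what a status query or a log entry could report.
  def loadAll(loaders: Seq[(String, () => Any)]): Map[String, LoadState] =
    loaders.map { case (name, load) =>
      val state: LoadState = Try(load()) match {
        case Success(_) => Loaded
        case Failure(e) => Failed(e.getMessage)
      }
      name -> state
    }.toMap
}
```

Calling loadAll once during application start forces every loader to run before the first query arrives, and the returned map can be kept around for later inspection.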

Generate a warning message for unexpected analytics results

For example, HTML-encoded text prevents the sentence parser from processing sentences, so no results are returned. In this case, it would be helpful to return a sensible message stating that the query appeared to be encoded rather than in raw text form.
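
A rough heuristic for this check might look like the sketch below. The object name InputCheck, the regular expressions and the wording of the warning are all assumptions; the patterns can produce false positives (for example on text such as "x<y and z>w"), so this is a starting point rather than a reliable detector.

```scala
// Hypothetical helper; not part of TAP.
object InputCheck {
  // Named or numeric character references such as &amp; &lt; &#39; &#x27;
  private val Entity = """&(#\d+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);""".r
  // Anything shaped like an opening or closing tag, e.g. <p> or </div>
  private val Tag = """</?[a-zA-Z][^<>]*>""".r

  // Return a warning when the text looks HTML encoded, None otherwise.
  def encodingWarning(text: String): Option[String] = {
    val hits = Entity.findAllIn(text).size + Tag.findAllIn(text).size
    if (hits > 0)
      Some(s"Input appears to be HTML encoded ($hits markup fragments found); please submit raw text.")
    else
      None
  }
}
```

The Option result lets a caller attach the message to an otherwise empty analytics response without changing the response for ordinary text.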

Add dl4j to enable ML capabilities

In particular, we need a common way of creating word and sentence vectors, so ideally we'll create a model or models ahead of time, and then TAP can process new text using those existing models.

Create documentation for TAP output

Can we have documentation on which features TAP outputs, please? We will need information on what each module does and whether the output is at sentence or word level, since the modules work at different levels. Also, the TAP Athanor tags, for example, are different from the XIP parser analytical tags. Can we have documentation on what they mean?

Fix dependencies in token annotations

Need to find the correct Factorie variables for representing the dependencies (as close to Universal Dependencies (UD) as possible) and ensure that these are correctly mapped to the token annotations.

This also needs documenting, as at the moment it could be a significant source of confusion (i.e. what is parent and child? What if there are multiple children?)
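
To make the parent and child question concrete, here is one possible convention, sketched with hypothetical types. DepToken, its field names and the use of -1 for the root are assumptions for illustration; they are not TAP's current token annotation and say nothing about how Factorie represents a parse. Each token stores only the index of its parent (its head), and the children of a token are derived from that, so multiple children are naturally a list.

```scala
// Hypothetical token shape: each token points at exactly one parent (head).
// parent = -1 marks the root of the sentence (an assumption for this sketch).
final case class DepToken(idx: Int, term: String, parent: Int, depType: String)

object Dependencies {
  // Derive, for every token that has children, the sorted indices of those children.
  def children(tokens: Seq[DepToken]): Map[Int, Seq[Int]] =
    tokens
      .filter(_.parent >= 0)
      .groupBy(_.parent)
      .map { case (parentIdx, kids) => parentIdx -> kids.map(_.idx).sorted }
}

object DependenciesExample {
  // "The black cat sleeps" with UD-style relation labels.
  val sentence: Seq[DepToken] = Seq(
    DepToken(0, "The", 2, "det"),
    DepToken(1, "black", 2, "amod"),
    DepToken(2, "cat", 3, "nsubj"),
    DepToken(3, "sleeps", -1, "root")
  )
}
```

Here "cat" (index 2) has two children, "The" and "black", while "sleeps" has one child and no parent.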

Add NER to token annotations

Need to add named-entity recognition (NER) to token annotations, but care needs to be taken that it is not too time-consuming to produce.

We may need to add some input variables so that this can be an option?

Need an academic section splitter

We need to be able to load a text file that has common academic section headings (such as introduction, methodology/methods, results, discussion, conclusion) and retrieve the appropriate sections.

This would need to be able to pull text between major sections and strip out sub-headings.

It would also need to indicate a confidence level (i.e. if the sections were easily identifiable and therefore high confidence, or if there were complications which reduce the confidence of getting a good section).

It would be good for this code to be generalisable, so that 'profiles' could be created to suit different academic publishing formats.
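
The requirements above could be sketched roughly as follows. Everything here is hypothetical: the type names, the Generic profile, the rule that a heading sits alone on its own line, the pattern used to spot numbered sub-headings, and the confidence measure (the fraction of a profile's headings that were found). Text before the first recognised heading is discarded in this sketch.

```scala
final case class Section(heading: String, text: String)
final case class SplitResult(sections: Seq[Section], confidence: Double)

// A profile maps each canonical section name to the heading variants it accepts.
final case class Profile(name: String, headings: Seq[(String, Set[String])])

object SectionSplitter {
  // Hypothetical default profile based on the headings named in the issue.
  val Generic: Profile = Profile("generic", Seq(
    "introduction" -> Set("introduction"),
    "methods"      -> Set("methods", "method", "methodology"),
    "results"      -> Set("results"),
    "discussion"   -> Set("discussion"),
    "conclusion"   -> Set("conclusion", "conclusions")
  ))

  // Numbered sub-headings such as "2.1 Participants" (an assumed convention).
  private val SubHeading = """^\d+(\.\d+)+\.?\s+\S.*$""".r

  // Map a line to a canonical heading, ignoring case, leading numbering and a trailing ':' or '.'.
  private def canonical(line: String, profile: Profile): Option[String] = {
    val key = line.trim.toLowerCase
      .replaceAll("""^\d+\.?\s+""", "")
      .replaceAll("""[:.]$""", "")
    profile.headings.collectFirst { case (name, variants) if variants(key) => name }
  }

  def split(text: String, profile: Profile = Generic): SplitResult = {
    type Open = (String, Vector[String]) // heading and body lines of the section being built
    def close(o: Open): Section = Section(o._1, o._2.mkString("\n").trim)

    val start = (Vector.empty[Section], Option.empty[Open])
    val (done, last) = text.split("\n").foldLeft(start) { case ((acc, open), line) =>
      canonical(line, profile) match {
        // A major heading closes the current section and opens a new one.
        case Some(h) => (acc ++ open.map(close).toList, Some((h, Vector.empty[String])))
        // Sub-headings are stripped from the output.
        case None if SubHeading.pattern.matcher(line.trim).matches() => (acc, open)
        // Any other line is body text of the open section, if there is one.
        case None => (acc, open.map { case (h, body) => (h, body :+ line) })
      }
    }
    val sections = done ++ last.map(close).toList
    val found = sections.map(_.heading).distinct.size
    SplitResult(sections, found.toDouble / profile.headings.size)
  }
}
```

A different publishing format would be handled by passing another Profile to split, which keeps the splitting logic itself unchanged.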

Create stub files to enable commencement of work on tap documentation

The project uses Paradox and sbt-site to create documentation as part of the build process.

The documentation is hosted on GitHub Pages.

We need the current project to be updated with stub documentation files so that structured documentation can be built with sbt-site, and GitHub Pages updated with the stub pages to reflect the structure of the documentation.
