heta-io / tap

Text Analytics Pipeline (TAP)

Home Page: https://heta-io.github.io/tap/

License: Apache License 2.0

Scala 88.49% CSS 11.08% JavaScript 0.43%

academic-writing-analytics text-analytics play-framework graphql text-analytics-pipeline natural-language-processing

tap's Introduction

Text Analytics Pipeline (TAP)

TAP provides Text Analytics services via a GraphQL API. It is written in Scala, and uses the Play Framework, Sangria GraphQL and GraphiQL. TAP currently provides back-end analytics services for:

AcaWriter, a web application for Academic Writing Analytics
GoingOK, a web application for collecting and analysing personal reflective writing
Metacognition Discovery
A variety of Jupyter Notebooks used for tutoring in Text Analytics

If you are using TAP as a backend for something, please share by editing this page and opening a pull request.

Documentation

Refer to our current documentation here. For more background on how TAP is being used, see the main HETA webpage.

Development

TAP was originally created by Andrew Gibson at the Connected Intelligence Centre (CIC), University of Technology Sydney (UTS). Development is ongoing through HETA and the founding ATN universities.

We welcome contributions to TAP. If you are interested in contributing, take a look at the developer section of the docs and the current Issues, or contact one of the current maintainers.

A big thank you to our contributors.

tap's Issues

Implement text/metrics

From @andrewresearch on May 26, 2017 1:59

Can include sentence length

Copied from original issue: uts-cic/tap-api#6
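
As an illustration of the sentence length metric mentioned in this issue, here is a minimal Python sketch. It is not TAP's implementation: the function name `sentence_lengths` is hypothetical, and the regular-expression splitter is a naive stand-in for a proper sentence parser.

```python
import re


def sentence_lengths(text):
    """Return the number of whitespace-separated words in each sentence."""
    # Naive assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]
```

Calling `sentence_lengths` on a two-sentence string returns a list with one word count per sentence. Abbreviations such as "e.g." would be split incorrectly, which is why a real parser is preferable.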

Add a health endpoint

Need a health endpoint that responds with status 200 and basic JSON:

{
  "message": "ok"
}
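
The behaviour requested above can be sketched with Python's standard library. This only illustrates the response contract and is not how TAP's Play application would implement it. The `/health` path, the port and all function names are assumptions, since the issue names none of them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response():
    """Return the (status, body) pair described in the issue."""
    return 200, json.dumps({"message": "ok"})


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "/health" is a hypothetical path; the issue does not specify one.
        if self.path == "/health":
            status, body = health_response()
        else:
            status, body = 404, json.dumps({"message": "not found"})
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the sketch server; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Running `serve()` and requesting the `/health` path on the chosen port would return the JSON body shown above with status 200.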

Add endpoint for raw athanor

Shift model loading and NLP startup to application start

Rather than wait for first query, ensure that Factorie has loaded all models at application start time.
Also ensure the language is loaded for LanguageTool.

It would be useful to be able to query the results of this startup process. Need to write this information to whatever gets used for logging system analytics and metadata.
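
The idea of loading eagerly and keeping a queryable record of the outcome can be sketched as follows. Factorie and LanguageTool are JVM libraries, so this Python sketch uses placeholder callables in their place. The function `load_at_startup` and the report layout are assumptions, not part of TAP.

```python
import time


def load_at_startup(loaders):
    """Run each loader once and record its status and duration in seconds."""
    report = {}
    for name, loader in loaders.items():
        started = time.perf_counter()
        try:
            loader()  # placeholder for loading a model or language resource
            status = "loaded"
        except Exception as error:
            status = f"failed: {error}"
        report[name] = {
            "status": status,
            "seconds": time.perf_counter() - started,
        }
    return report
```

The returned dictionary could be written to the logging system or exposed through a status query, so that a failed or slow load is visible before the first request arrives.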

Add moves unit and integration tests

This can be copied from the athanor server, but we also need to include basic connection integration tests.

Add vocabulary unit tests

Need to identify temporal markers

Implement text/expressions

From @andrewresearch on May 26, 2017 1:58

Need to implement expression analysis and attach to endpoint

Copied from original issue: uts-cic/tap-api#5

Implement sentence length detection for journalism writing

From @andrewresearch on May 24, 2017 4:13

Copied from original issue: uts-cic/tap-api#4

Create python notebook for annotations example

Create python notebook for vocabulary example

Implement section level sentiment analysis

This is different than the affective expressions, but could contribute to them.

Generate a warning message when analytics are unexpected

For example, HTML-encoded text prevents the sentence parser from processing sentences, so no results are returned. In this case, it would be helpful to return a sensible message stating that the query appeared to be encoded rather than in raw text form.
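
A possible check for this case is sketched below in Python. It is a heuristic only: the function name `encoding_warning`, the patterns and the wording of the message are all assumptions, and TAP may detect encoded input differently.

```python
import re

# Heuristic patterns for HTML tags and character entities.
_TAG = re.compile(r"</?[a-zA-Z][^>]*>")
_ENTITY = re.compile(r"&(?:[a-zA-Z]+|#\d+|#x[0-9a-fA-F]+);")


def encoding_warning(text):
    """Return a warning if the text looks HTML encoded, otherwise None."""
    if _TAG.search(text) or _ENTITY.search(text):
        return ("The query text appears to be HTML encoded rather than raw "
                "text, so the analytics may return no results.")
    return None
```

The warning could be attached to the query response alongside the empty results, so the caller can tell an encoding problem from a genuinely empty analysis.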

Add spelling unit tests

Create python notebook for spelling example

Add annotations unit tests

Travis CI Integration - fix dependencies

From @andrewresearch on July 13, 2017 2:6

Allow for nlytx-commons dependency for travis CI build.

Copied from original issue: uts-cic/tap-api#18

Automate blue-green deployment of releases to AWS ECS

We need to automate the deployment of releases to ECS.

Suggested approach is blue/green deployment

Code is available on the awslabs github page

Create python notebook for metrics example

Add dl4j to enable ML capabilities

In particular, we need a common way of creating word and sentence vectors, so ideally we would build a model or models in advance, and TAP would then process new text using those existing models.

Add syllables unit tests

Add custom HTTP exceptions for malformed GraphQL queries

Create documentation for TAP output

Can we have documentation on what features are outputted from TAP please? We will need information on what each module does and if the output is at sentence or word level, since the modules work at different levels. Also, the TAP Athanor tags, for example, are different from the XIP parser analytical tags - can we have doc on what they mean?

Users need to be able to initiate analysis on selected S3 corpora

Fix dependencies in token annotations

Need to find the correct Factorie variables for representing the dependencies (as close to UD as possible) and ensure that these are correctly mapped to the token annotations.

This also needs documenting, as at the moment it could be a significant source of confusion (i.e. what is parent and child? What if there are multiple children?)

Implement text/complexity

Implement monitor (stats)

Create demo scala notebook template

Create demo python notebook template

Create python notebook for expressions example

Implement text/syllables

Monitoring metrics for TAP

Need to be able to monitor TAP performance and retrieve near-real-time stats.

Update scala-docs and test with sbt

From @andrewresearch on July 13, 2017 8:3

Copied from original issue: uts-cic/tap-api#22

Add query time to analytics results

Implement affective expressions

Test issue

Add NER to token annotations

Need to add NER to token annotations, but care needs to be taken that it is not too time consuming to produce.

We may need to set some additional input variables, so that this can be an option?

Implement text/spelling

Implement AWA specific GraphQL endpoint to provide for additional AWA specific queries

As AWA may need to interact with TAP differently from other services, a dedicated AWA GraphQL endpoint would allow AWA-specific schema additions. It may also be possible to manage performance better by collecting separate stats for AWA and general requests.

Separate parsing results in query

From @andrewresearch on August 31, 2017 0:51

Copied from original issue: uts-cic/tap-graphql#5

Add metrics unit tests

Add expressions unit tests

Need an academic section splitter

We need to be able to load a text file that has common academic section headings (such as introduction, methodology/methods, results, discussion, conclusion) and retrieve the appropriate sections.

This would need to be able to pull text between major sections and strip out sub-headings.

It would also need to indicate a confidence level (i.e. if the sections were easily identifiable and therefore high confidence, or if there were complications which reduce the confidence of getting a good section).

It would be good for this code to be generalisable, so that 'profiles' could be created to suit different academic publishing formats.
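
The requirements above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a proposed implementation. `ACADEMIC_PROFILE` and `split_sections` are hypothetical names, a line with multi-level numbering such as "2.1" is assumed to be a sub-heading, and the confidence is a crude ratio of sections found to sections expected.

```python
import re

# Hypothetical profile: canonical section name -> accepted heading texts.
ACADEMIC_PROFILE = {
    "introduction": ["introduction"],
    "methods": ["methodology", "methods"],
    "results": ["results"],
    "discussion": ["discussion"],
    "conclusion": ["conclusion", "conclusions"],
}

# Leading numbering such as "1.", "2" or "2.1" followed by whitespace.
_NUMBERING = re.compile(r"^\s*(\d+(?:\.\d+)*)\.?\s+")


def split_sections(text, profile=ACADEMIC_PROFILE):
    """Return (sections, confidence) for the headings listed in profile."""
    lookup = {alias: name for name, aliases in profile.items() for alias in aliases}
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        numbering = _NUMBERING.match(line)
        if numbering and "." in numbering.group(1):
            # Assumed convention: multi-level numbering marks a sub-heading.
            continue
        title = _NUMBERING.sub("", line).rstrip(":").strip().lower()
        if title in lookup:
            current = lookup[title]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    joined = {name: " ".join(lines) for name, lines in sections.items()}
    # Crude confidence: share of the profile's sections that were found.
    confidence = len(joined) / len(profile) if profile else 0.0
    return joined, confidence
```

A different publishing format would be supported by passing another profile dictionary. A more realistic confidence measure would also weigh ambiguous headings and unexpected section order, which this sketch ignores.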

Create stub files to enable commencement of work on tap documentation

The project is using paradox and sbt-site to create documentation as part of the build process.

The documentation is stored on GitHub pages.

We need the current project to be updated with stub documentation files in order to build structured documentation with sbt-site, and the GitHub pages updated with the stub pages to reflect the structure of the documentation.

This could solve issues #17 and #31.
