Serverless Stanford Named Entity Recognizer

This project enables you to deploy the Stanford Named Entity Recognizer (NER) to a "serverless" environment based on AWS Lambda and API Gateway.

Why?

The general advantages of serverless computing include cost, scalability and productivity. Specifically, these translate to:

  • The ability to analyse text in virtually any environment - most notably from the browser
  • Processing a large number of texts concurrently - potentially thousands
  • Ease and speed of iteration - just deploy with one command after making changes to your models or label interpretation logic

How?

Getting started

  1. Make sure you have one of the following installed on your machine:

    • Docker

    Or

    • Node.js, a JDK and Maven

  2. Sign up for an AWS account

  3. Configure your AWS credentials for deployment with the Serverless Framework. If you are working with Docker, make sure they are set as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

  4. Install dependencies:

    • With Docker:

      docker build -t sner .
      

      Or

    • With Node/JDK/Maven: install the Serverless dependencies by running the following command in the project root directory:

       npm install
      

Deploying to AWS

With Docker:

docker run --rm -it -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY sner npm run deploy -- --stage=dev

Or

With Node/JDK/Maven:

npm run deploy -- --stage=dev

You should see your POST and GET endpoints displayed after a successful deployment, e.g.

...
endpoints:
  POST - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
  GET - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
...

Trying it out

You can try the GET endpoint by appending the query parameter "text" to it, set to the text you wish to analyse, e.g.

https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities?text=Stanford University is located in Silicon Valley and was founded in November 1885

Response:

{
  "ORGANIZATION": [
    {
      "name": "Stanford University",
      "count": 1
    }
  ],
  "LOCATION": [
    {
      "name": "Silicon Valley",
      "count": 1
    }
  ],
  "DATE": [
    {
      "name": "November 1885",
      "count": 1
    }
  ]
}
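When the GET endpoint is called from code rather than typed into a browser address bar, the text has to be percent-encoded before it goes into the query string. The following is a minimal Python sketch using only the standard library. `build_entities_url` is a hypothetical helper name, and the endpoint is the placeholder from the example deployment output, not a real URL.

```python
from urllib.parse import quote, urlencode


def build_entities_url(endpoint: str, text: str) -> str:
    """Return the GET URL for `endpoint` with `text` percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than the default "+"
    query = urlencode({"text": text}, quote_via=quote)
    return f"{endpoint}?{query}"


# Placeholder endpoint copied from the example deployment output
endpoint = "https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities"
print(build_entities_url(endpoint, "Stanford University is located in Silicon Valley"))
```

Characters such as `&` or `#` in the text would otherwise be read as part of the URL structure, which is why the encoding step matters for arbitrary input.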

Example payload for the POST endpoint:

{
  "text": "Stanford University is located in Silicon Valley and was founded in November 1885"
}
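A possible way to send this payload from Python with the standard library is sketched below. The function names are hypothetical, and the `Content-Type: application/json` header is an assumption about what the endpoint expects; the README only specifies the JSON body.

```python
import json
from urllib.request import Request, urlopen


def build_post_request(endpoint: str, text: str) -> Request:
    """Build a POST request carrying the {"text": ...} payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        # Assumed header: the README does not state the required content type
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_entities(endpoint: str, text: str) -> dict:
    """Send the request and decode the JSON response into a dict."""
    with urlopen(build_post_request(endpoint, text)) as response:
        return json.load(response)
```

Calling `extract_entities` with your deployed POST endpoint and the example text should return the same structure as the GET response shown earlier.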

What?

Label interpretation logic

The "business logic" lives in the EntityExtractor class and processes text in the following way:

  1. Finds labels associated with each word in a string using the CoreNLP library
  2. Filters the labels to leave only those corresponding to named entities
  3. Extracts the names, types and number of times each entity occurs in the text from the remaining labels
  4. Groups the entity names and counts by their types
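The steps above can be illustrated with a short Python sketch. It is not the project's Java implementation: step 1 is replaced by hand-labelled (word, label) pairs instead of a CoreNLP call, `group_entities` is a hypothetical name, and the sketch assumes that consecutive tokens with the same label form one entity and that "O" is the label for non-entity tokens, as in Stanford NER output.

```python
from collections import Counter
from itertools import groupby

BACKGROUND_LABEL = "O"  # label given to tokens that are not part of an entity


def group_entities(labelled_tokens):
    """Turn (word, label) pairs into {type: [{"name": ..., "count": ...}]}."""
    counts = Counter()
    # Steps 2 and 3: walk runs of consecutive tokens sharing a label,
    # skip the background runs and count each entity name per type
    for label, run in groupby(labelled_tokens, key=lambda pair: pair[1]):
        if label == BACKGROUND_LABEL:
            continue
        name = " ".join(word for word, _ in run)
        counts[(label, name)] += 1
    # Step 4: group the names and counts by entity type
    grouped = {}
    for (label, name), count in counts.items():
        grouped.setdefault(label, []).append({"name": name, "count": count})
    return grouped


# Hand-labelled stand-in for the labelling step (step 1)
tokens = [
    ("Stanford", "ORGANIZATION"), ("University", "ORGANIZATION"),
    ("is", "O"), ("located", "O"), ("in", "O"),
    ("Silicon", "LOCATION"), ("Valley", "LOCATION"),
    ("and", "O"), ("was", "O"), ("founded", "O"), ("in", "O"),
    ("November", "DATE"), ("1885", "DATE"),
]
print(group_entities(tokens))
```

For this input the result has the same structure as the example response in "Trying it out". One limitation of the run-based approach is that two adjacent entities of the same type are merged into a single name.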

Configuration

The pom.xml and serverless.yml files contain most of the important settings in this project.

  • Select the models you wish to use in the pom.xml <properties> and <build> sections:
<project>
  <!--...-->
  <properties>
    <!--...-->
    <ner.model1>english.all.3class.distsim</ner.model1>
    <ner.model2>english.conll.4class.distsim</ner.model2>
    <ner.model3>english.muc.7class.distsim</ner.model3>
    <!--...-->
  </properties>
  <!--...-->
  <build>
    <plugins>
      <!--...-->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <!--...-->
        <configuration>
          <!--...-->
          <filters>
            <filter>
              <!-- This minimises the output jar file size so that it stays within the Lambda limits (https://docs.aws.amazon.com/lambda/latest/dg/limits.html) by including only your selected models -->
              <includes>
                <include>${ner.prefix}${ner.model1}.*</include>
                <include>${ner.prefix}${ner.model2}.*</include>
                <include>${ner.prefix}${ner.model3}.*</include>
              </includes>
            </filter>
          </filters>
        </configuration>
        <!--...-->
      </plugin>
    <!--...-->
    </plugins>
  </build>
  <!--...-->
</project>

  • Update the CoreNLP library version in the pom.xml <properties> section:
<properties>
    <nlp.version>3.9.1</nlp.version>
    <!--...-->
</properties>
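To see what the `<include>` patterns in the shade plugin filter expand to, the property substitution can be imitated in a few lines of Python. This only mimics placeholder replacement, it is not how Maven itself is invoked. `resolve_placeholders` is a hypothetical helper, and the value used for `ner.prefix` is an assumption, since that property is not shown in the excerpt above.

```python
import re


def resolve_placeholders(pattern: str, properties: dict) -> str:
    """Replace ${name} placeholders with values from `properties`."""
    # Unknown placeholders are left untouched
    return re.sub(
        r"\$\{([^}]+)\}",
        lambda match: properties.get(match.group(1), match.group(0)),
        pattern,
    )


properties = {
    # ASSUMED value: ner.prefix is not shown in the pom.xml excerpt
    "ner.prefix": "edu/stanford/nlp/models/ner/",
    "ner.model1": "english.all.3class.distsim",
}
print(resolve_placeholders("${ner.prefix}${ner.model1}.*", properties))
```

Under that assumption, each include pattern selects the files for one model, so removing a model property and its `<include>` line is what keeps that model out of the output jar.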

Contributors

jabrythehutt
