Serverless Stanford Named Entity Recognizer

This project enables you to deploy the Stanford Named Entity Recognizer (NER) to a "serverless" environment based on AWS Lambda and API Gateway.

Why?

The general advantages of serverless computing include cost, scalability and productivity. Specifically, these translate to:

- The ability to analyse text in virtually any environment - most notably from the browser
- Processing a large number of texts concurrently - potentially thousands
- Ease and speed of iteration - just deploy with one command after making changes to your models or label interpretation logic

How?

Getting started

Make sure you have either of the following installed on your machine:
- Docker, or
- Node >= 8, JDK >= 8 and Maven
Sign up for an AWS account.
Configure your AWS credentials for deployment with the Serverless framework. If you are working with Docker, make sure they are also set as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Install dependencies:
- With Docker, build the image:
```
docker build -t sner .
```
- Or, with Node/JDK/Maven, install the Serverless dependencies by running the following command in the project root directory:
```
npm install
```

Deploying to AWS

With Docker:

```
docker run --rm -it -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY sner npm run deploy -- --stage=dev
```

With Node/JDK/Maven:

```
npm run deploy -- --stage=dev
```

After a successful deployment, you should see your POST and GET endpoints listed, e.g.:

```
...
endpoints:
  POST - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
  GET - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
...
```

Trying it out

To try the GET endpoint, append the query parameter "text" containing the text you wish to analyse, e.g.:

https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities?text=Stanford University is located in Silicon Valley and was founded in November 1885

Browsers URL-encode the spaces for you when you paste this into the address bar; other clients, such as curl, need the text encoded explicitly (e.g. spaces as %20).

Response:

```
{
  "ORGANIZATION": [
    {
      "name": "Stanford University",
      "count": 1
    }
  ],
  "LOCATION": [
    {
      "name": "Silicon Valley",
      "count": 1
    }
  ],
  "DATE": [
    {
      "name": "November 1885",
      "count": 1
    }
  ]
}
```

Example payload for the POST endpoint:

```
{
  "text": "Stanford University is located in Silicon Valley and was founded in November 1885"
}
```
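Outside the browser, both endpoints can be called from any HTTP client. The following is a minimal Java 11+ sketch using the JDK's built-in java.net.http client. The SnerClient class and its methods are hypothetical, ENDPOINT is the placeholder URL from the deployment output, and sending a `Content-Type: application/json` header is an assumption rather than something the project documents.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client sketch for the sner endpoints; requires Java 11+ (java.net.http).
class SnerClient {
    // Placeholder: replace with the endpoint printed by `npm run deploy`.
    static final String ENDPOINT =
        "https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities";

    // Builds the GET URL with the text URL-encoded as the "text" query parameter.
    static String getUrl(String endpoint, String text) {
        // URLEncoder uses form encoding ('+' for a space, "%2B" for a literal '+'),
        // so every remaining '+' is a space; %20 is the safer query-string form.
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        return endpoint + "?text=" + encoded;
    }

    // Builds the POST payload {"text": "..."}, escaping the characters JSON requires.
    static String postBody(String text) {
        StringBuilder sb = new StringBuilder("{\"text\": \"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters use JSON's four-digit hex escape.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append("\"}").toString();
    }

    // Sends the text to the POST endpoint and returns the raw JSON response body.
    static String post(String endpoint, String text) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(postBody(text)))
            .build();
        return HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
    }
}
```

For example, after replacing ENDPOINT with your own URL, `SnerClient.post(SnerClient.ENDPOINT, "Stanford University is located in Silicon Valley")` returns the raw JSON response as a string. The payload is escaped by hand only to avoid pulling in a JSON library for a single-field body.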

What?

Label interpretation logic

The "business logic" lives in the EntityExtractor class and processes text in the following way:

- Finds labels associated with each word in a string using the CoreNLP library
- Filters the labels to leave only those corresponding to named entities
- Extracts the names, types and number of times each entity occurs in the text from the remaining labels
- Groups the entity names and counts by their types
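The filtering, extraction and grouping steps can be illustrated with a small, dependency-free sketch. It is not the project's EntityExtractor: it assumes the CoreNLP step has already produced one label per token (with "O" for tokens outside any entity, the background label used by the Stanford NER classifiers) and that consecutive tokens sharing a label form one entity. The EntityGrouping and Token names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the filter/extract/group steps, NOT the project's EntityExtractor.
// Assumes CoreNLP has already labelled each token, with "O" for non-entity tokens,
// and that consecutive tokens with the same label belong to one entity.
class EntityGrouping {
    static final class Token {
        final String word;
        final String label;
        Token(String word, String label) { this.word = word; this.label = label; }
    }

    // Returns type -> (entity name -> occurrence count), in order of first appearance.
    static Map<String, Map<String, Integer>> group(List<Token> tokens) {
        Map<String, Map<String, Integer>> byType = new LinkedHashMap<>();
        StringBuilder name = new StringBuilder();
        String currentType = null;
        // Iterate one step past the end so an entity ending on the last token is flushed.
        for (int i = 0; i <= tokens.size(); i++) {
            Token t = i < tokens.size() ? tokens.get(i) : null;
            String label = (t == null) ? "O" : t.label;
            if (currentType != null && !label.equals(currentType)) {
                // The current entity has ended: count it under its type.
                byType.computeIfAbsent(currentType, k -> new LinkedHashMap<>())
                      .merge(name.toString(), 1, Integer::sum);
                name.setLength(0);
                currentType = null;
            }
            if (!label.equals("O")) {          // filter out non-entity tokens
                if (currentType == null) {
                    currentType = label;       // start a new entity
                } else {
                    name.append(' ');          // continue the current entity
                }
                name.append(t.word);
            }
        }
        return byType;
    }
}
```

The nested name-to-count maps mirror the JSON response shown earlier, where each type holds a list of objects with "name" and "count" fields. Like any scheme that merges runs of identical labels, this sketch joins two adjacent entities of the same type into one.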

Configuration

The pom.xml and serverless.yml files contain most of the important settings in this project.

Select the models you wish to use in the pom.xml <properties> and <build> sections:

```
<project>
  <!--...-->
  <properties>
    <!--...-->
    <ner.model1>english.all.3class.distsim</ner.model1>
    <ner.model2>english.conll.4class.distsim</ner.model2>
    <ner.model3>english.muc.7class.distsim</ner.model3>
    <!--...-->
  </properties>
  <!--...-->
  <build>
    <plugins>
      <!--...-->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <!--...-->
        <configuration>
          <!--...-->
          <filters>
            <filter>
              <!-- This minimises the output jar file size to stay within the Lambda limits
                   (https://docs.aws.amazon.com/lambda/latest/dg/limits.html)
                   by only including your selected models -->
              <includes>
                <include>${ner.prefix}${ner.model1}.*</include>
                <include>${ner.prefix}${ner.model2}.*</include>
                <include>${ner.prefix}${ner.model3}.*</include>
              </includes>
            </filter>
          </filters>
        </configuration>
        <!--...-->
      </plugin>
      <!--...-->
    </plugins>
  </build>
  <!--...-->
</project>
```

Update the CoreNLP library version in the pom.xml <properties> section:

```
<properties>
  <nlp.version>3.9.1</nlp.version>
  <!--...-->
</properties>
```

- Change the AWS Lambda function name, memory size and region in the serverless.yml file.
- Configure your endpoints in the serverless.yml file.

andy-wagner / sner
