johnsnowlabs / langtest

Deliver safe & effective language models

Home Page: http://langtest.org/

License: Apache License 2.0

Python 99.97% Makefile 0.01% CSS 0.01% Batchfile 0.01% Shell 0.01%
benchmarks ethics-in-ai large-language-models ml-safety ml-testing mlops model-assessment nlp responsible-ai llm-test

langtest's Introduction

John Snow Labs: State-of-the-art NLP in Python

The John Snow Labs library provides a simple & unified Python API for delivering enterprise-grade natural language processing solutions:

  1. 15,000+ free NLP models in 250+ languages with one line of code. Production-grade, scalable, trainable, and 100% open source.
  2. Open-source libraries for Responsible AI (NLP Test), Explainable AI (NLP Display), and No-Code AI (NLP Lab).
  3. 1,000+ healthcare NLP models and 1,000+ legal & finance NLP models with a John Snow Labs license subscription.

Homepage: https://www.johnsnowlabs.com/

Docs & Demos: https://nlp.johnsnowlabs.com/

Features

Powered by John Snow Labs Enterprise-Grade Ecosystem:

  • 🚀 Spark-NLP: State of the art NLP at scale!
  • 🤖 NLU: 1 line of code to conquer NLP!
  • 🕶 Visual NLP: Empower your NLP with a set of eyes!
  • 💊 Healthcare NLP: Heal the world with NLP!
  • ⚖ Legal NLP: Bring justice with NLP!
  • 💲 Finance NLP: Understand Financial Markets with NLP!
  • 🎨 NLP-Display: Visualize and Explain NLP!
  • 📊 NLP-Test: Deliver Reliable, Safe and Effective Models!
  • 🔬 NLP-Lab: No-Code Tool to Annotate & Train new Models!

Installation

! pip install johnsnowlabs

from johnsnowlabs import nlp
nlp.load('emotion').predict('Wow that was easy!')

See the documentation for more details.

Usage

These are examples of getting things done with one line of code. See the General Concepts Documentation for building custom pipelines.

# Example of Named Entity Recognition
nlp.load('ner').predict("Dr. John Snow is a British physician born in 1813")

Returns :

entities    entities_class    entities_confidence
John Snow   PERSON            0.9746
British     NORP              0.9928
1813        DATE              0.5841

# Example of Question Answering 
nlp.load('answer_question').predict("What is the capital of France")

Returns :

text                            answer
What is the capital of France   Paris

# Example of Sentiment classification
nlp.load('sentiment').predict("Well this was easy!")

Returns :

text                  sentiment_class   sentiment_confidence
Well this was easy!   pos               0.999901

nlp.load('ner').viz('Bill goes to New York')

Returns:
(NER visualization rendered here.)

For a full overview see the 1-liners Reference and the Workshop.

Use Licensed Products

To use John Snow Labs' paid products like Healthcare NLP, Visual NLP, Legal NLP, or Finance NLP, get a license key and then call nlp.install() to use it:

! pip install johnsnowlabs
# Install paid libraries via a browser login to connect to your account
from johnsnowlabs import nlp
nlp.install()
# Start a licensed session
nlp.start()
nlp.load('en.med_ner.oncology_wip').predict("Woman is on chemotherapy, carboplatin 300 mg/m2.")


# Visualize entity resolution to ICD-10-CM codes
nlp.load('en.resolve.icd10cm.augmented')\
    .viz('Patient with history of prior tobacco use, nausea, nose bleeding and chronic renal insufficiency.')

Returns:
(entity resolution visualization rendered here.)

# Temporal Relationship Extraction & Visualization
nlp.load('relation.temporal_events')\
    .viz('The patient developed cancer after a mercury poisoning in 1999 ')

Returns:
(relation visualization rendered here.)

Helpful Resources

Take a look at the official Johnsnowlabs page: https://nlp.johnsnowlabs.com for user documentation and examples.

Resource Description
General Concepts General concepts in the Johnsnowlabs library
Overview of 1-liners Most commonly used models and their results
Overview of 1-liners for healthcare Most commonly used healthcare models and their results
Overview of all 1-liner Notebooks 100+ tutorials on using the 1-liners on text datasets for various problems, drawing on sources such as Twitter, Chinese news, crypto news headlines, airline traffic communication, and product reviews
Connect with us on Slack Problems, questions, or suggestions? We have a very active and helpful community of 2000+ AI enthusiasts putting Johnsnowlabs products to good use
Discussion Forum Want a more in-depth discussion with the community? Post a thread in our discussion forum
Github Issues Report a bug
Custom Installation Custom installations, air-gap mode, and other alternatives
The nlp.load(<Model>) function Load any model or pipeline in one line of code
The nlp.load(<Model>).predict(data) function Predict on strings, lists of strings, NumPy arrays, Pandas, Modin, and Spark DataFrames
The nlp.load(<train.Model>).fit(data) function Train a text classifier for 2-class, N-class, multi-N-class, Named Entity Recognition, or Part of Speech tagging
The nlp.load(<Model>).viz(data) function Visualize the results of Word Embedding Similarity Matrix, Named Entity Recognizers, Dependency Trees & Parts of Speech, Entity Resolution, Entity Linking, or Entity Status Assertion
The nlp.load(<Model>).viz_streamlit(data) function Display an interactive GUI to explore and test every model and feature in the Johnsnowlabs 1-liner repertoire in one click

License

This library is licensed under the Apache 2.0 license. John Snow Labs' paid products are subject to this End User License Agreement.
By calling nlp.install() to add them to your environment, you agree to its terms and conditions.

langtest's People

Contributors

agsfer, alierenak, alytarik, arkajyotichakraborty, arshaannazir, chakravarthik27, gadde5300, julesbelveze, luca-martial, mauro-nievoff, prikshit7766, rakshitkhajuria, sugatoray, vkocaman


langtest's Issues

NERModelHandler Roadmap

Description

Create a NERModelHandler class that establishes a common way for inference and training on NER models from different libraries. This includes:

  • Wrapping NER inference pipelines for Spark NLP, transformers and spaCy
  • Standardizing output formats for all pipeline predictions
  • Wrapping training process for Spark NLP, transformers and spaCy models

This issue will be used to track the sub-tasks required to launch and maintain this class.

Tasks

  • Ensure class supports robustness testing with Spark NLP
  • Ensure class supports bias testing with Spark NLP
  • Ensure class supports noisy label testing/fixing with Spark NLP
  • Ensure class supports robustness testing with transformers
  • Ensure class supports bias testing with transformers
  • Ensure class supports noisy label testing/fixing with transformers
  • Ensure class supports robustness testing with spaCy
  • Ensure class supports bias testing with spaCy
  • Ensure class supports noisy label testing/fixing with spaCy
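The wrapping and output-standardization goals above can be sketched as a thin adapter: one class exposing a single predict() signature, with a backend-specific converter mapping raw output into a common (token, label) format. All names below are illustrative, not the final API; the stand-in backend merely mimics the shape of a transformers token-classification pipeline.

```python
class NERModelHandler:
    """Wrap backend-specific NER pipelines behind one predict() signature."""

    def __init__(self, predict_fn, to_standard):
        self._predict = predict_fn        # backend inference callable
        self._to_standard = to_standard   # raw output -> [(token, label)] pairs

    def predict(self, text):
        # every backend ends up producing the same standardized format
        return self._to_standard(self._predict(text))


# Stand-in backend whose raw output mimics a transformers
# token-classification pipeline (a list of dicts per token).
fake_pipeline = lambda text: [{"word": w, "entity": "O"} for w in text.split()]
handler = NERModelHandler(
    fake_pipeline,
    lambda raw: [(d["word"], d["entity"]) for d in raw],
)
result = handler.predict("John lives here")
```

A spaCy or Spark NLP backend would plug in the same way: its own predict callable plus its own converter, leaving every caller of predict() untouched.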

Features Backlog

Parked Ideas 🚗

  • Testing social stereotypes for MLMs (paper)
  • Adding support for cloud provider models to be tested
  • Support an installation for airgapped environments
  • Generate a spec of what data the model expects and can safely run on (for example, if a model has only been validated on women aged 18 and up, it should not be used on people outside that demographic group) - see https://ianwhitestone.work/hello-great-expectations/
  • Toxicity tests (swear words, offensive answers)
  • Data leakage tests (PHI)
  • Adversarial attacks tests
  • Freshness tests (replace _2023_name)
  • Runtime tests
  • Question answering
  • Text generation
  • Summarization
  • Paraphrasing
  • Translation

Refactor token filtering in robustness_testing

Token filtering was created to delete extra tokens added by perturbations, so that token lengths match when comparing predictions from NER models. There are other ways to achieve this, such as making the metrics themselves ignore token length differences.

See slides for more details on possible approaches.
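One such metric-side alternative can be sketched with the standard library's difflib: align the original and perturbed token sequences and score labels only on matched tokens, so added tokens never need deleting up front. The helper below is illustrative, not langtest code.

```python
from difflib import SequenceMatcher

def aligned_label_pairs(orig_tokens, orig_labels, pert_tokens, pert_labels):
    """Yield (orig_label, pert_label) pairs for tokens present in both sequences."""
    matcher = SequenceMatcher(a=orig_tokens, b=pert_tokens, autojunk=False)
    pairs = []
    # matching blocks are maximal runs of identical tokens in both sequences
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            pairs.append((orig_labels[a + k], pert_labels[b + k]))
    return pairs

# "beautiful" is an extra token in the perturbed sentence; it is simply
# never paired, so no filtering step is needed before computing metrics.
pairs = aligned_label_pairs(
    ["John", "lives", "in", "Paris"],
    ["B-PER", "O", "O", "B-LOC"],
    ["John", "lives", "in", "beautiful", "Paris"],
    ["B-PER", "O", "O", "O", "B-LOC"],
)
```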

NERDataHandler Roadmap

Description

Create a NERDataHandler class that establishes a common CoNLL data structure for all libraries to process labeled NER data. This includes:

  • Write and read methods
  • Storing docs indexes
  • Easily filtering
  • Converting inputs to match external library requirements (including direct dataset download from HF datasets)

This issue will be used to track the sub-tasks required to launch and maintain this class.
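The read side of such a handler can be sketched as a small parser turning CoNLL-style lines into (tokens, labels) sentence pairs. The column layout assumed here (token first, label last) and the function name are illustrative; real CoNLL variants differ, which is exactly what the class would need to normalize.

```python
def read_conll(lines):
    """Parse CoNLL-style lines into a list of (tokens, labels) sentence pairs."""
    sentences, tokens, labels = [], [], []
    for line in lines:
        line = line.strip()
        if line.startswith("-DOCSTART-"):
            continue  # document boundary marker, not a token
        if not line:  # blank line ends a sentence
            if tokens:
                sentences.append((tokens, labels))
                tokens, labels = [], []
            continue
        cols = line.split()
        tokens.append(cols[0])    # assume token in the first column
        labels.append(cols[-1])   # assume NER label in the last column
    if tokens:  # flush a trailing sentence without a final blank line
        sentences.append((tokens, labels))
    return sentences

sents = read_conll([
    "-DOCSTART- -X- O O",
    "",
    "John NNP B-PER",
    "lives VBZ O",
    "",
])
```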

Tasks

  • Ensure class supports robustness testing/fixing with Spark NLP
  • Ensure class supports bias testing with Spark NLP
  • Ensure class supports noisy label testing/fixing with Spark NLP
  • Ensure class supports robustness testing/fixing with transformers
  • Ensure class supports bias testing with transformers
  • Ensure class supports noisy label testing/fixing with transformers
  • Ensure class supports robustness testing/fixing with spaCy
  • Ensure class supports bias testing with spaCy
  • Ensure class supports noisy label testing/fixing with spaCy

Fill out design sheet

The sheet can be found in the Development channel: nlptest features.xlsx

Please fill out the Design tab

Noisy Labels Testing Roadmap

Adaptations

  • Adapt handler classes for noisy labels testing in Spark NLP
  • Adapt handler classes for noisy labels testing in transformers - notebook
  • Adapt handler classes for noisy labels testing in spaCy

Improvements

  • #20
  • #31
  • Improve scoring using sentence label quality score
  • Add type checking to main function
  • Supporting classification, assertion, relation extraction tasks

Bug Fixes

🎉

Reformat report method output

.report() should print this:

test factory   test type   pass count   fail count   pass rate   minimum pass rate   pass
Perturbation   uppercase   34           16           68%         75%                 False
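A hypothetical sketch of how report() could assemble that row; the field names follow the table above, but the helper itself is not langtest's implementation.

```python
def build_report(results):
    """Build report rows from raw test results (list of dicts)."""
    rows = []
    for r in results:
        pass_rate = r["pass_count"] / (r["pass_count"] + r["fail_count"])
        rows.append({
            "test factory": r["factory"],
            "test type": r["type"],
            "pass count": r["pass_count"],
            "fail count": r["fail_count"],
            "pass rate": f"{pass_rate:.0%}",
            "minimum pass rate": f"{r['min_pass_rate']:.0%}",
            # a test passes only when it meets its minimum pass rate
            "pass": pass_rate >= r["min_pass_rate"],
        })
    return rows

row = build_report([{"factory": "Perturbation", "type": "uppercase",
                     "pass_count": 34, "fail_count": 16,
                     "min_pass_rate": 0.75}])[0]
```

With 34 passes out of 50 the pass rate is 68%, below the 75% minimum, so the row reports pass = False, matching the example output.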

Noisy Labels Fixing Roadmap

Adaptations

  • Adapt handler classes for noisy labels testing in Spark NLP
  • Adapt handler classes for noisy labels testing in transformers
  • Adapt handler classes for noisy labels testing in spaCy

Improvements

  • #32
  • Prettify UI dropdown, groupby sentences like in ALAB UI
  • Add type checking to main function
  • Supporting classification, assertion, relation extraction tasks

Bug Fixes

  • Fix UI jupyter lab compatibility

Robustness fixing should accept a simple dictionary where keys are perturbation names and values are proportions to apply to all entities for that perturbation

Currently we are passing perturbations as single params:

augment_robustness(conll_path = 'data.conll',
                   uppercase = {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
                   lowercase = {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05})

We should change this to a new parameter that accepts a perturbation map that looks like this:

detailed_proportions = {
   "uppercase": {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
   "lowercase": {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
   "title": {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
   "add_punctuation": {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
}

augment_robustness(conll_path = 'data.conll',
                   entity_perturbation_map = detailed_proportions)

We should also accept a simpler version of this in another parameter:

proportions= {
   "uppercase": 0.05,
   "lowercase":  0.05}

augment_robustness(conll_path = 'data.conll',
                   perturbation_map = proportions)
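Internally, the simpler proportions form could be expanded into the detailed entity map by applying one proportion uniformly to every entity label. The expansion helper below is hypothetical; only augment_robustness and the two parameter names come from this issue.

```python
def expand_perturbation_map(proportions, entity_labels):
    """Expand {perturbation: prob} into {perturbation: {label: prob}}."""
    return {
        perturbation: {label: prob for label in entity_labels}
        for perturbation, prob in proportions.items()
    }

detailed = expand_perturbation_map(
    {"uppercase": 0.05, "lowercase": 0.05},
    ["PROBLEM", "TEST", "TREATMENT"],
)
```

This keeps one code path: the simple perturbation_map can be normalized into an entity_perturbation_map before augmentation runs.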

Robustness Fixing Roadmap

Adaptations

  • #10
  • #12
  • Adapt NERDataHandler class for robustness fixing in transformers
  • Adapt NERDataHandler class for robustness fixing in spaCy

Improvements

  • #19
  • #29
  • Add an optimization algorithm for all augmentations so that we get as close as possible to 100% augmentation coverage for all tests (use a method similar to the new noise_proportion method in robustness testing: sample after augmenting everything)
  • Track if change occurred in every sentence for every augmentation
  • Apply entity swapping to all entities in sentence (instead of 1)
  • Add type checking to main function
  • Supporting classification, assertion, relation extraction tasks
  • Become compatible with many types of DOCSTART

Bug Fixes

🎉

Bias Testing Roadmap

Adaptations

  • Pick and implement final gender classification method
  • Replace NerDLMetrics with sklearn classification_report
  • Adapt handler classes for bias testing in Spark NLP
  • Adapt handler classes for bias testing in transformers
  • Adapt handler classes for bias testing in spaCy

Improvements

  • #30
  • Add type checking to main function
  • Supporting classification, assertion, relation extraction tasks

Bug Fixes

🎉

Fixes Backlog

Parked Ideas 🚗

  • Removing transformers dependency if possible

Improve sampling method for `noise_prob` param by replacing with new `noise_proportion` param in robustness testing

noise_proportion = 0.5

# step 1: apply perturbation to all samples
1000 sentences -> apply contraction

# step 2: sample as many successfully augmented sentences as possible to reach noise_proportion
# we don't mind if some of the sampled sentences were already contracted
50 samples successfully contracted (augmented) + 100 already contracted

50 augmented + 50 more via random.sample(n=50) from the remaining 1000 - 50 sentences


noise_proportion = 0.5

contraction -> 5 samples to augment + 5 original samples -> f1 score: 0.60

uppercase -> 500 samples to augment + 500 original samples -> f1 score: 0.75

samples to augment == 0 -> "No samples to apply {test_name}, skipping this test."

samples to augment < 50  -> "Low number of samples ({n_samples}) to apply {test_name} to."
                            "F1-Score may not be representative of true perturbation effect."

total sentences -> 1000
noise_prob -> 0.5
Tests with low augmentation coverage -> add/strip punctuation, accent conversion, entity swapping, add contraction
For these, the perturbation can only be applied to some samples, not all:
add_punctuation -> sentence already has punctuation -> skip
Not all sentences can be contracted -> "is not" -> "isn't"
1000 sentences -> noise prob 0.5 -> we only try to apply augmentation to around 500, because of the noise prob
1000 -> 500 (added with noise prob) + 500 (searched for contraction)
Among those 500 samples -> only 25 contraction augmentations succeed
While testing the perturbation -> the perturbation set contains 500 original sentences and 25 augmented samples
Problem: we know the 500 original sentences will be predicted correctly.
Total noise samples will be 500 + 25, but only 25 of them were actually changed.
This causes an inflated F1 -> it looks like the model has no problem with this perturbation.
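The two-step scheme above (perturb everything first, then sample up to the noise proportion from successfully perturbed sentences, topping up with unperturbed originals) can be sketched as follows. Function and parameter names are illustrative, not langtest's API.

```python
import random

def sample_with_noise_proportion(sentences, perturb, noise_proportion, seed=0):
    """Build a test set whose perturbed share approaches noise_proportion."""
    rng = random.Random(seed)
    # step 1: try the perturbation on every sentence
    perturbed = [(s, perturb(s)) for s in sentences]
    changed = [p for s, p in perturbed if p != s]     # successfully augmented
    unchanged = [s for s, p in perturbed if p == s]   # perturbation had no effect
    target = int(len(sentences) * noise_proportion)
    # step 2: take as many successfully perturbed sentences as possible,
    # then fill the remainder with unperturbed originals
    take = min(target, len(changed))
    return changed[:take] + rng.sample(unchanged, target - take)

out = sample_with_noise_proportion(
    ["is not easy", "plain text", "it is not", "no change here"],
    lambda s: s.replace("is not", "isn't"),  # a contraction-style perturbation
    0.5,
)
```

Because changed sentences are preferred, the sampled set avoids being dominated by untouched originals, which is exactly the inflated-F1 failure mode described above.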

Documentation Roadmap

Description

This is the roadmap for all things related to knowledge translation and documentation. This includes:

  • tutorial notebooks
  • readme instructions
  • blogposts
  • docs

Tasks

Privacy Attack Testing Roadmap

Description

We need to build mechanisms to test data for different categories of privacy attacks:

  • Membership inference attack: An adversary predicts whether a known subject was present in the training data used to train the synthetic data model.

  • Re-identification attack: The adversary explores the probability of some features being re-identified using synthetic data and matching to the training data.

  • Attribute inference attack: The adversary predicts the value of sensitive features using synthetic data.

Main article discussing mechanisms.
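As a minimal illustration of the membership inference category (not taken from the article), the simplest baseline flags a record as a training-set member when the model's confidence on it exceeds a threshold tuned on held-out data. The function and threshold here are purely illustrative.

```python
def membership_inference(confidences, threshold=0.9):
    """Predict membership from per-record model confidence scores.

    High confidence suggests the model has seen the record during
    training; the threshold would be calibrated on known non-members.
    """
    return [c >= threshold for c in confidences]

preds = membership_inference([0.99, 0.42, 0.95, 0.60])
```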

Tasks

🕐
