explosion / prodigy-recipes

🍳 Recipes for Prodigy, our fully scriptable annotation tool

Home Page: https://prodi.gy


Prodigy Recipes

This repository contains a collection of recipes for Prodigy, our scriptable annotation tool for text, images and other data. To use this repo, you'll need a license for Prodigy; see this page for more details. For questions and bug reports, please use the Prodigy Support Forum. If you've found a mistake or bug, feel free to submit a pull request.

✨ Important note: The recipes in this repository aren't 100% identical to the built-in recipes shipped with Prodigy. They've been edited to include comments and more information, and some of them have been simplified to make it easier to follow what's going on and to use them as the basis for a custom recipe.

📋 Usage

Once Prodigy is installed, you should be able to run the prodigy command from your terminal, either directly or via python -m:

python -m prodigy

The prodigy command lists the built-in recipes. To use a custom recipe script, simply pass the path to the file using the -F argument:

python -m prodigy ner.teach your_dataset en_core_web_sm ./data.jsonl --label PERSON -F prodigy-recipes/ner/ner_teach.py

You can also use the --help flag for an overview of the available arguments of a recipe, e.g. prodigy ner.teach -F ner_teach.py --help.
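The example command above reads its input from ./data.jsonl. As a minimal sketch, assuming newline-delimited JSON input where each line is an object with a "text" key, a small file for experimenting could be created like this. The write_jsonl helper and the example sentences are made up for illustration and are not part of Prodigy.

```python
import json
from pathlib import Path


def write_jsonl(path, records):
    """Write one JSON object per line (hypothetical helper, not part of Prodigy)."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Made-up example texts; replace them with your own raw data.
examples = [
    {"text": "Ada Lovelace wrote about the Analytical Engine."},
    {"text": "Grace Hopper worked on the Harvard Mark I."},
]

write_jsonl("data.jsonl", examples)
```

The resulting file can then be passed as the source argument in place of ./data.jsonl.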

Some things to try

You can edit the code in the recipe script to customize how Prodigy behaves.

  • Try replacing prefer_uncertain() with prefer_high_scores().
  • Try writing a custom sorting function. It just needs to be a generator that yields a sequence of example dicts, given a sequence of (score, example) tuples.
  • Try adding a filter that drops some questions from the stream. For instance, try writing a filter that only asks you questions where the entity is two words long.
  • Try customizing the update() callback, to include extra logging or extra functionality.
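The second and third suggestions can be sketched in plain Python without importing Prodigy. This is only an illustration under stated assumptions: prefer_near_threshold and only_two_word_entities are hypothetical names, the threshold logic is a deliberately simple stand-in rather than how the built-in sorters work, and each example is assumed to carry a "text" string plus "spans" with character offsets "start" and "end".

```python
def prefer_near_threshold(scored_stream, threshold=0.5, margin=0.2):
    """Custom sorter sketch: takes (score, example) tuples, yields example dicts.

    Keeps only examples whose score lies within `margin` of `threshold`.
    """
    for score, example in scored_stream:
        if abs(score - threshold) <= margin:
            yield example


def only_two_word_entities(stream):
    """Filter sketch: keep questions where every suggested entity has two words."""
    for example in stream:
        text = example["text"]
        spans = example.get("spans", [])
        # Assumption: spans store character offsets into the example text.
        if spans and all(len(text[s["start"]:s["end"]].split()) == 2 for s in spans):
            yield example


# Made-up scored examples standing in for a model's predictions.
scored = [
    (0.95, {"text": "Barack Obama spoke.", "spans": [{"start": 0, "end": 12, "label": "PERSON"}]}),
    (0.55, {"text": "Angela Merkel spoke.", "spans": [{"start": 0, "end": 13, "label": "PERSON"}]}),
    (0.45, {"text": "Madonna sang.", "spans": [{"start": 0, "end": 7, "label": "PERSON"}]}),
]

questions = list(only_two_word_entities(prefer_near_threshold(scored)))
print([eg["text"] for eg in questions])  # prints ['Angela Merkel spoke.']
```

Both functions are generators, so they can be chained lazily over a stream of any length.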

🍳 Recipes

Named Entity Recognition

| Recipe | Description |
| --- | --- |
| ner.teach | Collect the best possible training data for a named entity recognition model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. |
| ner.match | Suggest phrases that match a given patterns file, and mark whether they are examples of the entity you're interested in. The patterns file can include exact strings or token patterns for use with spaCy's Matcher. |
| ner.manual | Mark spans manually by token. Requires only a tokenizer and no entity recognizer, and doesn't do any active learning. Optionally, pre-highlight spans based on patterns. |
| ner.fuzzy_manual | Like ner.manual but use FuzzyMatcher from spaczz library to pre-highlight candidates. |
| ner.manual.bert | Use BERT word piece tokenizer for efficient manual NER annotation for transformer models. |
| ner.correct | Create gold-standard data by correcting a model's predictions manually. This recipe used to be called ner.make_gold. |
| ner.silver-to-gold | Take an existing "silver" dataset with binary accept/reject annotations, merge the annotations to find the best possible analysis given the constraints defined in the annotations, and manually edit it to create a perfect and complete "gold" dataset. |
| ner.eval_ab | Evaluate two NER models by comparing their predictions and building an evaluation set from the stream. |
| ner_fuzzy_manual | Mark spans manually by token with suggestions from spaczz fuzzy matcher pre-highlighted. |

Text Classification

| Recipe | Description |
| --- | --- |
| textcat.manual | Manually annotate categories that apply to a text. Supports annotation tasks with single and multiple labels. Multiple labels can optionally be flagged as exclusive. |
| textcat.correct | Correct the textcat model's predictions manually. Predictions above the acceptance threshold will be automatically preselected (0.5 by default). Prodigy will infer whether the categories should be mutually exclusive based on the component configuration. |
| textcat.teach | Collect the best possible training data for a text classification model with the model in the loop. Based on your annotations, Prodigy will decide which questions to ask next. |
| textcat.custom-model | Use active learning-powered text classification with a custom model. To demonstrate how it works, this demo recipe uses a simple dummy model that "predicts" random scores. But you can swap it out for any model of your choice, for example a text classification model implementation using PyTorch, TensorFlow or scikit-learn. |
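To illustrate the idea behind textcat.custom-model, here is a minimal sketch of a dummy model that "predicts" random scores. It is not the code from the recipe itself: the DummyModel class, its seed parameter and the n_updates counter are assumptions made for this example. The model scores a stream by yielding (score, example) tuples and exposes an update method that receives the annotated answers.

```python
import random


class DummyModel:
    """Hypothetical stand-in for a real text classifier."""

    def __init__(self, labels, seed=0):
        self.labels = list(labels)
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.n_updates = 0

    def __call__(self, stream):
        """Yield (score, example) tuples with a random label and score."""
        for example in stream:
            task = dict(example)  # copy so the input is not modified
            task["label"] = self.rng.choice(self.labels)
            yield self.rng.random(), task

    def update(self, answers):
        """A real model would learn here; this one only counts the answers."""
        self.n_updates += len(answers)


model = DummyModel(["POSITIVE", "NEGATIVE"])
scored = list(model([{"text": "Great tool"}, {"text": "Too slow"}]))
model.update([dict(task, answer="accept") for _, task in scored])
```

Swapping in a real model would mean replacing the random calls with actual predictions and doing a training step in update.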

Terminology

| Recipe | Description |
| --- | --- |
| terms.teach | Bootstrap a terminology list with word vectors and seed terms. Prodigy will suggest similar terms based on the word vectors, and update the target vector accordingly. |

Image

| Recipe | Description |
| --- | --- |
| image.manual | Manually annotate images by drawing rectangular bounding boxes or polygon shapes on the image. |
| image-caption | Annotate images with captions, pre-populate captions with an image captioning model implemented in PyTorch, and perform error analysis. |
| image.frozenmodel | Model-in-the-loop manual annotation using TensorFlow's Object Detection API. |
| image.servingmodel | Model-in-the-loop manual annotation using TensorFlow's Object Detection API. This uses TensorFlow Serving. |
| image.trainmodel | Model-in-the-loop manual annotation and training using TensorFlow's Object Detection API. |

Other

| Recipe | Description |
| --- | --- |
| mark | Click through pre-prepared examples, with no model in the loop. |
| choice | Annotate data with multiple-choice options. The annotated examples will have an additional property "accept": [] mapping to the ID(s) of the selected option(s). |
| question_answering | Annotate question/answer pairs with a custom HTML interface. |
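The "accept" property described for the choice recipe can be illustrated with a short sketch. It assumes each task carries an "options" list of dicts with "id" and "text" keys, and that annotated examples come back with an "answer" field alongside "accept". The add_options and count_choices helpers and the example data are hypothetical.

```python
from collections import Counter

# Made-up options; each has an ID that ends up in the "accept" list.
OPTIONS = [
    {"id": "POSITIVE", "text": "Positive"},
    {"id": "NEGATIVE", "text": "Negative"},
    {"id": "NEUTRAL", "text": "Neutral"},
]


def add_options(stream, options):
    """Attach the same multiple-choice options to every incoming example."""
    for example in stream:
        task = dict(example)
        task["options"] = options
        yield task


def count_choices(annotations):
    """Count how often each option ID was selected in accepted examples."""
    counts = Counter()
    for eg in annotations:
        if eg.get("answer") == "accept":
            counts.update(eg.get("accept", []))
    return counts


tasks = list(add_options([{"text": "Great tool"}, {"text": "Too slow"}], OPTIONS))

# Simulated annotations, as they might look after a session.
annotated = [
    dict(tasks[0], accept=["POSITIVE"], answer="accept"),
    dict(tasks[1], accept=["NEGATIVE", "NEUTRAL"], answer="accept"),
]
choice_counts = count_choices(annotated)
```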

Community recipes

| Recipe | Author | Description |
| --- | --- | --- |
| phrases.teach | @kabirkhan | Now part of sense2vec. |
| phrases.to-patterns | @kabirkhan | Now part of sense2vec. |
| records.link | @kabirkhan | Link records across multiple datasets using the dedupe library. |

Tutorial recipes

These recipes have made an appearance in one of our tutorials.

| Recipe | Description |
| --- | --- |
| span-and-textcat | Do both spancat and textcat annotations at the same time. Great for chatbots! |
| terms.from-ner | Generate terms from previous NER annotations. |
| audio-with-transcript | Handles both manual audio annotation and transcription. |
| progress | Demo of an update callback that tracks annotation speed. |
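As a rough idea of what an update callback that tracks annotation speed could look like, here is a minimal sketch. It is not the code of the progress recipe: make_speed_tracker and its clock parameter are hypothetical, and returning the rate is done here only to make the example easy to check. The callback receives a batch of answers, keeps a running total and reports annotations per minute.

```python
import time


def make_speed_tracker(clock=time.monotonic):
    """Return an update callback that reports annotations per minute."""
    start = clock()
    total = 0

    def update(answers):
        nonlocal total
        total += len(answers)
        elapsed = clock() - start  # seconds since the tracker was created
        rate = total / elapsed * 60 if elapsed > 0 else 0.0
        print(f"{total} annotations, {rate:.1f} per minute")
        return rate

    return update


# Demo with a fake clock (0s, 30s, 60s) so the numbers are deterministic.
fake_times = iter([0.0, 30.0, 60.0])
update = make_speed_tracker(clock=lambda: next(fake_times))
first_rate = update([{"answer": "accept"}] * 5)
second_rate = update([{"answer": "reject"}] * 5)
```

In a real session the default clock would be used, and the callback would be returned from the recipe under the "update" key.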

📚 Example Datasets and Patterns

To make it even easier to get started, we've also included a few example-datasets, both raw data and data containing annotations created with Prodigy. For examples of token-based match patterns to use with recipes like ner.teach or ner.match, see the example-patterns directory.
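For orientation, a patterns file is a JSONL file where each line has a "label" and a "pattern". The sketch below writes a tiny one containing both an exact string and a token-based pattern. The file name person_patterns.jsonl and the entries are made up for illustration; the files in the example-patterns directory are the better reference.

```python
import json

# Made-up patterns: a string is an exact match, a list of dicts is a token
# pattern in the style of spaCy's Matcher (one dict per token).
patterns = [
    {"label": "PERSON", "pattern": "Ada Lovelace"},
    {"label": "PERSON", "pattern": [{"lower": "grace"}, {"lower": "hopper"}]},
]

with open("person_patterns.jsonl", "w", encoding="utf8") as f:
    for entry in patterns:
        f.write(json.dumps(entry) + "\n")
```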

Contributors

abhijit-2592, christopher-delphai, dependabot[bot], honnibal, ines, jette16, kabirkhan, koaning, magdaaniol, stefan-it, svlandeg, wesslen, zbenmo
