
jwinman91 / ai-ner


An AI-powered but model-agnostic named-entity recognition toolkit.

License: Apache License 2.0



AI-Name-Entity-Recognizer (AI-NER): Text Editing with Language Models

This repository edits input text using a language model. Users can apply the editing prompts and models defined in configuration files to modify the input text.

Currently, the editing prompts are written to recognize named entities such as names or locations in free text and to replace all occurrences with a placeholder defined in the prompt config file.
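
The replacement step can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used in this repository: the function name replace_entities, the entity strings, and the placeholders <NAME> and <LOCATION> are all hypothetical, and in AI-NER the entities would come from the configured model rather than from a hard-coded dictionary.

```python
import re


def replace_entities(text: str, entities: dict) -> str:
    """Replace every occurrence of each entity with its placeholder.

    `entities` maps an entity string (as a model might return it) to the
    placeholder that should replace it. Both are hypothetical inputs.
    """
    if not entities:
        return text
    # Try longer entities first so "Anna Schmidt" wins over "Anna".
    ordered = sorted(entities, key=len, reverse=True)
    alternatives = "|".join(re.escape(entity) for entity in ordered)
    # The lookarounds keep "Anna" from matching inside "Annabell".
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    # A single pass avoids re-scanning text that was already replaced.
    return pattern.sub(lambda match: entities[match.group(0)], text)


example = "Anna Schmidt lives in Berlin. Anna likes Berlin."
found = {"Anna Schmidt": "<NAME>", "Anna": "<NAME>", "Berlin": "<LOCATION>"}
anonymized = replace_entities(example, found)
print(anonymized)
```

Doing the substitution in one pass, with the longest entities first, keeps a short entity from clobbering part of a longer one.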

This project aims to stay model-agnostic (i.e. it can be used with a model of the user's choice) and thereby avoid vendor lock-in.

This software works like a smart editor: for example, it can anonymize names in a single text or replace named entities across a batch of emails.


Installation

To use the AI-NER, follow these steps:

  1. Clone the repository:
     git clone https://github.com/jWinman91/ai-extractor.git
     cd ai-extractor
  2. Install the required dependencies:
     pip install -r requirements.txt
  3. Download a model of your choice into the models directory. I recommend the following models from Hugging Face for German text:

Each model can be downloaded with wget, e.g.:
wget https://huggingface.co/TheBloke/SauerkrautLM-7B-v1-mistral-GGUF/resolve/main/sauerkrautlm-7b-v1-mistral.Q4_0.gguf
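
If wget is not available, the same download can be sketched with the Python standard library. The helper names model_destination and download_model are hypothetical and not part of this repository; the sketch assumes the model should land in a local models directory under the file name taken from the URL.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def model_destination(url: str, models_dir: str = "models") -> Path:
    """Return the local path a model URL would be saved to."""
    filename = Path(urlparse(url).path).name
    if not filename:
        raise ValueError(f"URL has no file name: {url}")
    return Path(models_dir) / filename


def download_model(url: str, models_dir: str = "models") -> Path:
    """Download the model file unless it is already present."""
    destination = model_destination(url, models_dir)
    destination.parent.mkdir(parents=True, exist_ok=True)
    if not destination.exists():
        # Large GGUF files can take a while; there is no progress output here.
        urlretrieve(url, str(destination))
    return destination
```

Calling download_model with the wget URL above would store the file as models/sauerkrautlm-7b-v1-mistral.Q4_0.gguf and skip the download on later runs.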

Configuration

To use this repository, several configuration options need to be set for the model as well as for the NER tasks that extract named entities. These are set in two types of configuration files, config_model and config_prompt.

  • config_model sets all configurations necessary for the respective model.
  • config_prompt sets the configuration for the NER tasks (e.g. which model to use and what to replace each identified named entity with).
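
The relationship between the two config types can be sketched as follows. This is a hypothetical illustration only: the actual file schema is not documented here, so the class names ModelConfig and PromptConfig, the fields name, settings, model and placeholder, and the example values are all assumptions. The only thing taken from the description above is that a prompt config selects a model and defines a replacement.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    """Hypothetical stand-in for the contents of one config_model file."""
    name: str
    settings: dict = field(default_factory=dict)


@dataclass
class PromptConfig:
    """Hypothetical stand-in for the contents of one config_prompt file."""
    model: str        # name of the model config this task should use
    placeholder: str  # text that replaces each identified entity


def resolve_model(prompt: PromptConfig, models: dict[str, ModelConfig]) -> ModelConfig:
    """Look up the model config that a prompt config refers to."""
    if prompt.model not in models:
        known = ", ".join(sorted(models)) or "none"
        raise ValueError(f"Unknown model '{prompt.model}' (known: {known})")
    return models[prompt.model]


# Example values are made up for illustration.
models = {"german_mistral": ModelConfig("german_mistral", {"temperature": 0.0})}
prompt = PromptConfig(model="german_mistral", placeholder="<NAME>")
selected = resolve_model(prompt, models)
```

Keeping the model settings separate from the task settings is what lets one prompt config be pointed at a different model without touching the task definition.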

TODO

Usage

After setting the configuration and downloading one (or more) of the models, you can simply use AI-NER by running:

python main.py $PATH_TO_INPUT $PATH_TO_OUTPUT
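
For a whole folder of texts, the command above can be wrapped in a small script. This is a sketch under assumptions, not a feature of the repository: it assumes that main.py takes one input file and one output file per call, that inputs are .txt files, and that each output should keep the input's file name. The helpers build_commands and run_batch are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(input_dir: str, output_dir: str, script: str = "main.py") -> list:
    """Build one `python main.py INPUT OUTPUT` command per .txt file."""
    commands = []
    for input_path in sorted(Path(input_dir).glob("*.txt")):
        output_path = Path(output_dir) / input_path.name
        commands.append([sys.executable, script, str(input_path), str(output_path)])
    return commands


def run_batch(input_dir: str, output_dir: str) -> None:
    """Run AI-NER once per input file, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for command in build_commands(input_dir, output_dir):
        # check=True raises CalledProcessError if main.py exits non-zero.
        subprocess.run(command, check=True)
```

Under these assumptions, run_batch("data/input", "data/output") would process every .txt file in data/input; the output directory name is only an example.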

Example

An example text file, a self-written email in German, is included at data/input/email_example_de.txt. There are also pre-defined config_model and config_prompt files. By running AI-NER with the anonymize_example_email.yaml prompt configuration and the german_mistral.yaml and flair.yaml model configurations, we can anonymize certain entities in the example email.

Below are images of the email before and after running python main.py with the anonymize_emails-NER.yaml config file.

[Images: Email before, Email after]

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • NLTK - Natural Language Toolkit used for sentence tokenization.
  • Hugging Face - Framework for working with state-of-the-art natural language processing models.
