Comments (7)

omri374 commented on May 17, 2024

Hi @Idomingog, Presidio only supports spaCy and Stanza as NLP engines. However, you can use Flair as an additional recognizer within Presidio. In this scenario, Presidio would call the Flair recognizer the same way it calls any other recognizer and extract the results coming from Flair.

See some details here:

from presidio-research.

Idomingog commented on May 17, 2024

Thanks for your advice. I have added a new Flair recognizer to my project.

omri374 commented on May 17, 2024

Thanks! If you'd like to add this as a sample to the main repo, this would be a great contribution and I would be happy to help with reviewing or anything else.

Idomingog commented on May 17, 2024

Hi,

I'm very happy to share the class, but it's not "finished". It works for my needs, but there are a couple of things to improve.

  • The code doesn't accept other languages; it is set up for Spanish only.
  • I couldn't work out how to manage the Flair entity tags [MISC, ORG, PER, LOC]. I don't know how to obtain them; I only found the raw ones such as E-LOC, I-LOC, S-LOC, etc. I also tried defining them as a list, but with neither option could I use them in the analyzer.analyze(text, language, entities, score_threshold) call. Selecting entities doesn't work.
  • I'm not sure this is the best way to load the model.
  • There are surely other issues that I haven't noticed.

Best regards.

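from typing import List

from flair.data import Sentence
from flair.models import SequenceTagger
from presidio_analyzer import (
    AnalysisExplanation,
    AnalyzerEngine,
    EntityRecognizer,
    RecognizerResult,
)
from presidio_analyzer.nlp_engine import NlpArtifacts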

class EsFlairRecognizer(EntityRecognizer):

    def __init__(
        self,
        supported_language: str = "es",
        supported_entities: List[str] = None,
        ner_strength: float = 0.85,
        name: str = "esflairRecognizer",
        version: str = "0.1",
        model: SequenceTagger = None,
    ):
        self.ner_strength = ner_strength
        # Use the model passed in, if any; otherwise load the Spanish NER model.
        # The model has to be set before get_supported_entities() is called below.
        self.model = (
            model
            if model is not None
            else SequenceTagger.load("flair/ner-spanish-large")
        )

        super().__init__(
            supported_entities=supported_entities or self.get_supported_entities(),
            supported_language=supported_language,
            name=name,
            version=version,
        )
    
    def get_supported_entities(self) -> List[str]:
        """
        Supported Entities by flair/ner-spanish-large model.

        :return: List of the supported entities.
        """
        return self.model.tag_dictionary.get_items() #['E-LOC', 'I-LOC', 'S-LOC', ....]   
    
    def load(self) -> None:
        """No loading is required."""
        pass


    
    def analyze(
        self, text: str, entities: List[str] = None, nlp_artifacts: NlpArtifacts = None
    ) -> List[RecognizerResult]:
        """
        Analyze text using Flair.

        :param text: The text for analysis.
        :param entities: Not working properly for this recognizer.
        :param nlp_artifacts: Not used by this recognizer.
        :return: The list of Presidio RecognizerResult constructed from the recognized
            Flair detections.
        """
        # Wrap the whole text in a single Flair Sentence and tag it in place.
        sentence = Sentence(text)
        self.model.predict(sentence)

        return [
            self._convert_to_recognizer_result(categorized_entity)
            for categorized_entity in sentence.get_spans("ner")
        ]


    def _convert_to_recognizer_result(
        self, categorized_entity
    ) -> RecognizerResult:

        entity_type = categorized_entity.tag
        
        explanation = EsFlairRecognizer._build_explanation(
            original_score=round(categorized_entity.score, 2),
            entity_type=entity_type
        )
        
        flair_results = RecognizerResult(
            entity_type=entity_type,
            start=categorized_entity.start_pos,
            end=categorized_entity.end_pos,
            score=round(categorized_entity.score, 2),
            analysis_explanation=explanation
        )
        
        return flair_results


    @staticmethod
    def _build_explanation(
        original_score: float, entity_type: str
    ) -> AnalysisExplanation:
        """
        Create explanation for why this result was detected.

        :param original_score: Score given by this recognizer
        :param entity_type: Entity type assigned by Flair, used in the explanation text
        :return: The AnalysisExplanation for this result
        """
        explanation = AnalysisExplanation(
            # Name of the recognizer class, i.e. "EsFlairRecognizer".
            recognizer=EsFlairRecognizer.__name__,
            original_score=original_score,
            textual_explanation=f"Identified as {entity_type} by Flair Recognizer",
        )

        return explanation
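
One possible way to approach the second point in the list above (raw tags such as E-LOC or S-LOC not matching what is passed as `entities`) is to strip the BIOES prefix and map the remaining label to the entity name the caller asks for. This is only a minimal sketch in plain Python, not the code from the PR: the `FLAIR_TO_PRESIDIO` mapping, the target names (PERSON, LOCATION, ORGANIZATION) and the helper names `base_label`, `supported_entities` and `is_requested` are all assumptions made for illustration.

```python
from typing import List, Optional

# Hypothetical mapping from Flair's CoNLL-style labels to the entity names
# a caller might pass in `entities`. Adjust to whatever names you use.
FLAIR_TO_PRESIDIO = {
    "PER": "PERSON",
    "LOC": "LOCATION",
    "ORG": "ORGANIZATION",
    "MISC": "MISC",
}


def base_label(tag: str) -> str:
    """Strip a BIOES prefix such as 'S-' or 'E-' from a tag, if present."""
    prefix, sep, label = tag.partition("-")
    if sep and prefix in {"B", "I", "E", "S"}:
        return label
    return tag


def supported_entities(tag_items: List[str]) -> List[str]:
    """Collapse raw tag-dictionary items into a sorted list of entity names."""
    names = set()
    for tag in tag_items:
        label = base_label(tag)
        # Items such as 'O' or '<unk>' have no mapping and are skipped.
        if label in FLAIR_TO_PRESIDIO:
            names.add(FLAIR_TO_PRESIDIO[label])
    return sorted(names)


def is_requested(tag: str, entities: Optional[List[str]]) -> bool:
    """Return True if the tag maps to one of the requested entities.

    An empty or missing `entities` list is treated as "no filter".
    """
    name = FLAIR_TO_PRESIDIO.get(base_label(tag))
    if name is None:
        return False
    return not entities or name in entities
```

With helpers like these, `get_supported_entities()` could return `supported_entities(...)` over the tag dictionary items, and `analyze()` could skip spans for which `is_requested(span_tag, entities)` is false. Whether this fits the recognizer as finally submitted is not something this thread confirms.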

omri374 commented on May 17, 2024

This is great! I do have some suggestions for generalizing this code. If you create a PR on the Presidio repo, I'd be happy to provide specific comments and improvements. Would that work?

If you do, consider putting it under the samples folder, as we would rather not have flair (and torch) as a dependency for Presidio at this point in time.

Idomingog commented on May 17, 2024

Hi, I just put an example with the code in the samples folder and opened a PR.
Thanks for your help.

omri374 commented on May 17, 2024

Thanks!! I took a quick look and it looks great. We'll do a more formal review in the next few days.
