anujsahani01 / classification-project

Intent and Entity Extraction and Classification from audio files

Description:

The objective of this project is to build a system that extracts intents and entities from audio files. The system converts speech to text, incorporating speaker recognition, and uses pretrained models from the 🤗 (Hugging Face) Hub for intent and entity classification. The end result is a JSON file containing the audio data along with the identified intents and classified entities.

Project Overview:

  • Speaker Recognition: This stage uses the pyannote/voice-activity-detection pipeline. It works in two steps: it first extracts features from the audio signal, then applies a classifier to those features. As a voice activity detection pipeline, its output marks the time regions that contain speech; it does not by itself assign an identity to each speaker, which would need a speaker diarization or speaker identification model.

  • Speech to Text Conversion: The jonatasgrosman/wav2vec2-large-xlsr-53-english model converts speech to text. It is a version of Facebook's wav2vec2 XLSR-53 model fine-tuned for English speech recognition. Its processor combines a feature extractor and a tokenizer: the feature extractor turns the raw audio (sampled at 16 kHz) into numpy arrays or tensors for the model, and the tokenizer decodes the model's predictions into text.

  • Intent Recognition: The qanastek/XLMRoberta-Alexa-Intents-Classification model classifies the transcribed text into predefined intent categories. It is an XLM-RoBERTa model trained on text labeled with intents, so its prediction indicates the purpose or goal behind the spoken words.

  • Entity Recognition and Classification: The huggingface-course/bert-finetuned-ner model identifies and classifies entities in the transcribed text. It is a BERT model fine-tuned on text with labeled entities, and it tags spans such as person names, organizations, and locations.
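
The first two stages could be chained roughly as follows. This is a minimal sketch, not the notebook's actual code: the helper names (`load_speech_models`, `speech_regions`, `transcribe_regions`) are hypothetical, the pyannote pipeline is assumed to need a Hugging Face access token (the keyword for passing it differs between pyannote.audio versions), and the waveform is assumed to be a 1-D mono numpy array sampled at 16 kHz.

```python
from typing import Dict, List, Tuple

SAMPLE_RATE = 16_000  # wav2vec2 XLSR checkpoints expect 16 kHz mono audio


def load_speech_models(hf_token: str):
    """Load the VAD and ASR models (weights are downloaded on first use)."""
    # Imported lazily so the helpers below can be used without these packages.
    from pyannote.audio import Pipeline
    from transformers import pipeline

    # Assumption: this pyannote.audio version accepts `use_auth_token`.
    vad = Pipeline.from_pretrained(
        "pyannote/voice-activity-detection", use_auth_token=hf_token
    )
    asr = pipeline(
        "automatic-speech-recognition",
        model="jonatasgrosman/wav2vec2-large-xlsr-53-english",
    )
    return vad, asr


def speech_regions(vad, audio_path: str) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of the regions that contain speech."""
    annotation = vad(audio_path)
    return [(seg.start, seg.end) for seg in annotation.get_timeline().support()]


def transcribe_regions(asr, waveform, regions, sample_rate: int = SAMPLE_RATE) -> List[Dict]:
    """Transcribe each speech region of a mono waveform separately."""
    results = []
    for start, end in regions:
        # Convert the region boundaries from seconds to sample indices.
        chunk = waveform[int(start * sample_rate): int(end * sample_rate)]
        out = asr({"raw": chunk, "sampling_rate": sample_rate})
        results.append({"start": start, "end": end, "text": out["text"]})
    return results
```

Passing the models in as arguments keeps the helpers easy to try out with stand-in callables before downloading any weights.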

The following 🤗 models are used for these tasks:

  • pyannote/voice-activity-detection: Extracts features from the audio signal and applies a classifier to them to find the regions that contain speech, which is the basis for the speaker recognition stage.

  • jonatasgrosman/wav2vec2-large-xlsr-53-english: Performs speech-to-text conversion. It is based on Facebook's wav2vec2 model, and its processor combines a feature extractor, which converts audio into numpy arrays or tensors, with a tokenizer, which decodes the model's predictions into text.

  • qanastek/XLMRoberta-Alexa-Intents-Classification: Trained on a large corpus of text labeled with intents. It classifies the transcribed text into intent categories that indicate the purpose behind the spoken words.

  • huggingface-course/bert-finetuned-ner: Trained on text with labeled entities. It identifies and categorizes the entities present in the transcribed text.
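
For the two text models, a sketch along these lines may help. It assumes the models are loaded through the `transformers` `pipeline` helper; the function names and the `min_score` threshold are illustrative choices, not something taken from the project.

```python
from typing import Any, Dict


def load_text_models():
    """Load the intent classifier and the NER tagger from the Hugging Face Hub."""
    # Imported lazily so analyze_text can be exercised without transformers installed.
    from transformers import pipeline

    intent_clf = pipeline(
        "text-classification",
        model="qanastek/XLMRoberta-Alexa-Intents-Classification",
    )
    ner = pipeline(
        "token-classification",
        model="huggingface-course/bert-finetuned-ner",
        aggregation_strategy="simple",  # merge word pieces into whole entity spans
    )
    return intent_clf, ner


def analyze_text(text: str, intent_clf, ner, min_score: float = 0.5) -> Dict[str, Any]:
    """Classify the intent of `text` and collect its entities.

    `min_score` is a hypothetical cut-off for dropping low-confidence entities.
    """
    top = intent_clf(text)[0]  # the pipeline returns a list with the best label first
    entities = [
        {"text": e["word"], "label": e["entity_group"], "score": float(e["score"])}
        for e in ner(text)
        if e["score"] >= min_score
    ]
    return {
        "text": text,
        "intent": top["label"],
        "intent_score": float(top["score"]),
        "entities": entities,
    }
```

A typical call would be `analyze_text(transcript, *load_text_models())`, where `transcript` is the text produced by the speech-to-text stage.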

Combining these pretrained models stage by stage lets the system extract intents and entities from audio files. The resulting JSON file gives a structured representation of the audio data with its associated intents and entities, which enables further analysis in domains such as voice assistants, call center analytics, and automated transcription services.
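
The exact layout of the JSON file is not described here, so the following is only one possible shape. The field names (`audio_file`, `segments`, `intent`, `entities`) and the helper functions are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def build_report(audio_file: str, segments: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Bundle the per-segment results under the name of the source audio file."""
    return {"audio_file": audio_file, "segments": segments}


def report_to_json(report: Dict[str, Any]) -> str:
    """Serialize the report; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(report, indent=2, ensure_ascii=False)


def save_report(report: Dict[str, Any], out_path: str) -> None:
    """Write the report to disk as UTF-8 encoded JSON."""
    Path(out_path).write_text(report_to_json(report), encoding="utf-8")


# Hypothetical example of a single analyzed segment.
example = build_report(
    "call_001.wav",
    [
        {
            "start": 0.5,
            "end": 3.2,
            "text": "wake me up at seven in london",
            "intent": "alarm_set",
            "entities": [{"text": "london", "label": "LOC"}],
        }
    ],
)
```

Calling `save_report(example, "call_001.json")` would write the structure to a file that downstream tools can load with `json.load`.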

Feedback

If you have any feedback, please reach out to me at: LinkedIn

Author: @anujsahani01
