
klaam's Introduction

Motivation

Machine learning has proven its importance in many fields, such as computer vision, NLP, reinforcement learning, and adversarial learning. Unfortunately, little work has been done to make machine learning accessible to Arabic-speaking people.

Goal

Our goal is to enrich Arabic content by creating open-source projects and to open the community's eyes to the significance of machine learning. We want to create interactive applications that allow Arabic-speaking novices to learn more about machine learning and appreciate its advances.

Challenges

The Arabic language has many complicated features compared to other languages. First, Arabic is written right to left. Second, it contains many letters that most foreigners cannot pronounce, like ض ، غ ، ح ، خ، ظ. Moreover, Arabic contains special characters called diacritics, which help readers pronounce words correctly. For instance, the statement السَّلامُ عَلَيْكُمْ وَرَحْمَةُ اللَّهِ وَبَرَكَاتُهُ contains such characters after most of the letters. Diacritics follow special rules that determine which one is assigned to a given character; these rules constitute an entire field called النَّحْوُ الْعَرَبِيُّ. Compared to English, the letters of an Arabic word are mostly connected, as in اللغة, and writing them disconnected, as in ا ل ل غ ة, makes them difficult to read. Finally, there are as many as half a billion Arabic speakers, which has resulted in many dialects across different countries.

Procedure

Our procedure is general and can be applied to many language models, not just Arabic. This standardized approach consists of multiple steps, starting from training on Colab and ending with porting the models to the web.

Models

Name - Description
Arabic Diacritization - Simple RNN model ported from Shakkala
Arabic2English Translation - seq2seq with Attention
Arabic Poem Generation - CharRNN model with multinomial distribution
Arabic Words Embedding - N-Grams model ported from Aravec
Arabic Sentiment Classification - RNN with Bidirectional layer
Arabic Image Captioning - Encoder-Decoder architecture with attention
Arabic Word Similarity - Embedding layers using cosine similarity
Arabic Digits Classification - Basic RNN model with classification head
Arabic Speech Recognition - Basic signal processing and classification
Arabic Object Detection - SSD Object detection model
Arabic Poems Meter Classification - Bidirectional GRU
Arabic Font Classification - CNN
Arabic Text Detection - Optical Character Recognition (OCR)

Datasets

Name - Description
Arabic Digits - 70,000 images (28x28) converted to binary from Digits
Arabic Letters - 16,759 images (32x32) converted to binary from Letters
Arabic Poems - 146,604 poems scraped from aldiwan
Arabic Translation - 100,000 parallel Arabic-to-English translations ported from OpenSubtitles
Product Reviews - 1,648 product reviews ported from Large Arabic Resources For Sentiment Analysis
Image Captions - 30,000 image paths with captions extracted and translated from COCO 2014
Arabic Wiki - 4,670,509 words cleaned and processed from Wikipedia Monolingual Corpora
Arabic Poem Meters - 55,440 verses with their associated meters collected from aldiwan
Arabic Fonts - 516 images (100x100) for two classes

Tools

To make models easily accessible to contributors, developers, and novice users, we use two approaches:

Google Colab

Google Colaboratory is a free service offered by Google for research purposes. The interface of a Colab notebook is very similar to Jupyter notebooks, with slight differences. Google offers three hardware accelerators for speeding up training: CPU, GPU, and TPU. We use the GPU most of the time because it is easy to work with and achieves good results in a reasonable time. Check this great tutorial on Medium.
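For example, a quick sanity check (assuming a TensorFlow 2.x runtime) confirms that the notebook is actually attached to a GPU before training:

import tensorflow as tf

# Lists the GPUs visible to TensorFlow; an empty list means the runtime is
# CPU-only and the accelerator should be changed in the notebook settings.
gpus = tf.config.list_physical_devices("GPU")
print("GPU available:", len(gpus) > 0, gpus)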

TensorFlow.js

TensorFlow.js is part of the TensorFlow ecosystem that supports training and inference of machine learning models in the browser. Please follow these steps if you want to port models to the web (a minimal Python sketch follows the steps):

  1. Use Keras to train the model, then save it with model.save('keras.h5')

  2. Install the TensorFlow.js converter using pip install tensorflowjs

  3. Run the converter: tensorflowjs_converter --input_format keras keras.h5 model/

  4. The model directory will contain model.json along with weight files such as group1-shard1of1

  5. Finally, you can load the model in the browser using TensorFlow.js
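Below is a minimal sketch of steps 1 to 3 in Python, assuming TensorFlow 2.x and the tensorflowjs pip package are installed; the toy model and file names are placeholders.

import tensorflow as tf
import tensorflowjs as tfjs

# 1. Build (and normally train) a Keras model, then save it as keras.h5.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.save("keras.h5")

# 2-3. Convert directly from Python; this writes model.json and the
# group1-shard* weight files into model/, equivalent to running the
# tensorflowjs_converter CLI shown above.
tfjs.converters.save_keras_model(model, "model/")

In the browser, the converted model can then be loaded with tf.loadLayersModel('model/model.json').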

Check this tutorial that I made for the complete procedure.

Website

We developed many models that run directly in the browser. Using TensorFlow.js, the models run on the client's GPU. Since the webpage is static, there are no privacy or security concerns. You can visit the website here. Here is the main interface of the website:

The models added so far

Poems Generation

English Translation

Words Embedding

Sentiment Classification

Image Captioning

Diacritization

Contribution

Check the CONTRIBUTING.md for a detailed explanation of how to contribute.

Resources

As a start, we will use GitHub to host the website, models, datasets, and other content. Unfortunately, there is a limitation on storage space that will haunt us in the future. Please let us know what you suggest on this matter.

Contributors

Thanks goes to these wonderful people (emoji key):


MagedSaeed

🎨 🤔 📦

March Works

🤔

Mahmoud Aslan

🤔 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Citation

@inproceedings{alyafeai-al-shaibani-2020-arbml,
    title = "{ARBML}: Democritizing {A}rabic Natural Language Processing Tools",
    author = "Alyafeai, Zaid  and
      Al-Shaibani, Maged",
    booktitle = "Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.nlposs-1.2",
    pages = "8--13",
}

klaam's People

Contributors

ahmed-ashraf-marzouk, ma7dev, magedsaeed, mustafa0x, zaidalyafeai


klaam's Issues

Improving the classification model

The classification model needs improvement. The accuracy on the test set is around 62% on the five classes. Here is the model used

import torch.nn as nn
from transformers import Wav2Vec2Model, Wav2Vec2PreTrainedModel


class Wav2Vec2ClassificationModel(Wav2Vec2PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        self.wav2vec2 = Wav2Vec2Model(config)

        # Each 1024-dim encoder frame is projected to inner_dim, then the
        # flattened (inner_dim * feature_size) vector is mapped to 5 classes.
        self.inner_dim = 128
        self.feature_size = 999  # number of encoder frames (20 s of 16 kHz audio)

        self.tanh = nn.Tanh()
        self.linear1 = nn.Linear(1024, self.inner_dim)
        self.linear2 = nn.Linear(self.inner_dim * self.feature_size, 5)
        self.init_weights()

    def freeze_feature_extractor(self):
        self.wav2vec2.feature_extractor._freeze_parameters()

    def forward(
        self,
        input_values,
        attention_mask=None,
        output_attentions=None,
        output_hidden_states=None,
        return_dict=None,
        labels=None,  # accepted but unused; the loss is computed outside forward()
    ):
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        outputs = self.wav2vec2(
            input_values,
            attention_mask=attention_mask,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
        )
        x = self.linear1(outputs[0])  # (batch, frames, inner_dim)
        x = self.tanh(x)
        x = self.linear2(x.view(-1, self.inner_dim * self.feature_size))
        return {"logits": x}
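For reference, a hedged usage sketch (not from the repo): the head above expects exactly feature_size = 999 encoder frames, which for this model corresponds to 20 seconds of 16 kHz audio (320,000 samples), and the loss is computed outside forward() since labels are ignored there.

import torch
import torch.nn.functional as F

# Assumption: the base checkpoint is the XLSR model used in the training scripts.
model = Wav2Vec2ClassificationModel.from_pretrained("facebook/wav2vec2-large-xlsr-53")
model.freeze_feature_extractor()

waveform = torch.randn(2, 20 * 16_000)   # two dummy 20-second clips at 16 kHz
labels = torch.tensor([0, 3])            # five dialect classes, indices 0-4

logits = model(waveform)["logits"]       # shape: (2, 5)
loss = F.cross_entropy(logits, labels)   # cross-entropy computed externally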

argparse.ArgumentError appears when trying to train the module

I tried to train the module with both scripts that are in the readme file, and both resulted in argparse.ArgumentError
I tried running:

python run_mgb3.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --output_dir=/path/to/output \
    --cache_dir=/path/to/cache/ \
    --freeze_feature_extractor \
    --num_train_epochs="50" \
    --per_device_train_batch_size="32" \
    --preprocessing_num_workers="1" \
    --learning_rate="3e-5" \
    --warmup_steps="20" \
    --evaluation_strategy="steps"\
    --save_steps="100" \
    --eval_steps="100" \
    --save_total_limit="1" \
    --logging_steps="100" \
    --do_eval \
    --do_train \

and also

python run_common_voice.py \
    --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
    --dataset_config_name="ar" \
    --output_dir=/path/to/output/ \
    --cache_dir=/path/to/cache \
    --overwrite_output_dir \
    --num_train_epochs="1" \
    --per_device_train_batch_size="32" \
    --per_device_eval_batch_size="32" \
    --evaluation_strategy="steps" \
    --learning_rate="3e-4" \
    --warmup_steps="500" \
    --fp16 \
    --freeze_feature_extractor \
    --save_steps="10" \
    --eval_steps="10" \
    --save_total_limit="1" \
    --logging_steps="10" \
    --group_by_length \
    --feat_proj_dropout="0.0" \
    --layerdrop="0.1" \
    --gradient_checkpointing \
    --do_train --do_eval \
    --max_train_samples 100 --max_val_samples 100

and both resulted in this error:

2022-04-24 19:02:16.824403: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-04-24 19:02:16.824670: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_mgb3.py", line 523, in
main()
File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_mgb3.py", line 263, in main
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 71, in init
self._add_dataclass_arguments(dtype)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 166, in _add_dataclass_arguments
self._parse_dataclass_field(parser, field)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\transformers\hf_argparser.py", line 137, in _parse_dataclass_field
parser.add_argument(field_name, **kwargs)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1440, in add_argument
return self._add_action(action)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1805, in _add_action
self._optionals._add_action(action)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1642, in _add_action
action = super(_ArgumentGroup, self)._add_action(action)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1454, in _add_action
self._check_conflict(action)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1591, in _check_conflict
conflict_handler(action, confl_optionals)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3312.0_x64__qbz5n2kfra8p0\lib\argparse.py", line 1600, in handle_conflict_error
raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument --gradient_checkpointing: conflicting option string: --gradient_checkpointing

I reproduced the error by running it on another machine and still got it.
Any suggestions on how to fix it?
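For what it's worth, here is a minimal reproduction of the underlying argparse behaviour (an assumption about the cause: HfArgumentParser registers one CLI flag per dataclass field, so the error occurs when two of the dataclasses, e.g. the script's own arguments and a newer transformers TrainingArguments, both declare gradient_checkpointing):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--gradient_checkpointing", action="store_true")
# Registering the same option a second time raises the error from the traceback:
# argparse.ArgumentError: argument --gradient_checkpointing:
#   conflicting option string: --gradient_checkpointing
parser.add_argument("--gradient_checkpointing", action="store_true")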

How to capture voice from audio device

Hi,
I had a look at how to get text from an audio file, but could not figure out how to extract the voice directly from the audio device while speaking, i.e. without saving the voice to a wave file.
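One hedged option (not part of klaam) is to record straight into a NumPy array with the third-party sounddevice package instead of saving a wave file first; how the array is then fed to the recognizer depends on klaam's API (if it only accepts file paths, the array can still be kept in memory via an io.BytesIO buffer and soundfile).

import sounddevice as sd

DURATION = 5          # seconds to record
SAMPLE_RATE = 16_000  # wav2vec2-based models expect 16 kHz mono audio

# Record from the default input device into a float32 NumPy array.
recording = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
sd.wait()                     # block until the recording is finished
speech = recording.squeeze()  # 1-D array, ready for preprocessing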

Why doesn't the Arabic TTS work for some Arabic test samples?

Thanks for this awesome work.
I was using this notebook https://github.com/ARBML/klaam/blob/main/notebooks/demo.ipynb to test some samples.

This is the code I tried:

from klaam import TextToSpeech
from IPython.display import Audio

root_path = "./"
prepare_tts_model_path = "./cfgs/FastSpeech2/config/Arabic/preprocess.yaml"
model_config_path = "./cfgs/FastSpeech2/config/Arabic/model.yaml"
train_config_path = "./cfgs/FastSpeech2/config/Arabic/train.yaml"
vocoder_config_path = "./cfgs/FastSpeech2/model_config/hifigan/config.json"
speaker_pre_trained_path = "./data/model_weights/hifigan/generator_universal.pth.tar"

model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path,root_path)

text = 'وہ ابو بکر کو صلاہ کی رہنمائی کیا جاتا ہے ہمارے لئے یہ ایک بڑی سوال ہے۔ بہت سوال ہے۔ یہ ایک قیمتی سوال ہے جو کسی کو صلاہ کی رہنمائی کیا جاتا ہے جب وہ زندگی ہے اور وہ مسجد میں ہے اور وہ کماند ہے اور وہ کہتا ہے اللہ اور اس کی رسول کو کوئی باقر سے درمی نہیں دے اور جب وہ ابو بکر کو نہیں جانتے ہیں اور امر کو بھی جانتا ہے۔'
model.synthesize(text)
Audio("sample.wav")

and I get this error:


Downloading...
From: https://drive.google.com/uc?id=1J7ZP_q-6mryXUhZ-8j9-RIItz2nJGOIX
To: /content/klaam/model.pth.tar
100%|██████████| 418M/418M [00:06<00:00, 62.4MB/s]
Removing weight norm...
skipped
['b', 'r']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[<ipython-input-6-a135184f6c06>](https://localhost:8080/#) in <module>
     12 
     13 text = 'وہ ابو بکر کو صلاہ کی رہنمائی کیا جاتا ہے ہمارے لئے یہ ایک بڑی سوال ہے۔ بہت سوال ہے۔ یہ ایک قیمتی سوال ہے جو کسی کو صلاہ کی رہنمائی کیا جاتا ہے جب وہ زندگی ہے اور وہ مسجد میں ہے اور وہ کماند ہے اور وہ کہتا ہے اللہ اور اس کی رسول کو کوئی باقر سے درمی نہیں دے اور جب وہ ابو بکر کو نہیں جانتے ہیں اور امر کو بھی جانتا ہے۔'
---> 14 model.synthesize(text)
     15 Audio("sample.wav")

3 frames
[/content/klaam/klaam/external/FastSpeech2/phonetise/phonetise_arabic.py](https://localhost:8080/#) in phonetise(text)
    612                 for pronunciation in pronunciations:
    613                     stressIndex = findStressIndex(pronunciation)
--> 614                     if stressIndex < len(pronunciation) and stressIndex != -1:
    615                         pronunciation[stressIndex] += "'"
    616                     else:

TypeError: '<' not supported between instances of 'str' and 'int'


Instead of throwing errors, I expected the Arabic TTS to discard unknown characters automatically, like this TTS does: https://tts.readthedocs.io/en/latest/

Can you suggest any Arabic text preprocessing technique that I can apply before calling model.synthesize(text), so that the model doesn't throw errors like TypeError: '<' not supported between instances of 'str' and 'int' for such samples? Thanks in advance.
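A rough pre-processing sketch (an assumption, not part of klaam): keep only standard Arabic letters and diacritics before calling model.synthesize(text), so characters outside the phonetiser's vocabulary (including the Urdu-specific letters in the sample above) are dropped instead of causing errors.

import re

# U+0621-U+064A: Arabic letters, U+064B-U+0652: harakat (diacritics).
NON_ARABIC = re.compile(r"[^\u0621-\u0652\s]")

def keep_arabic(text: str) -> str:
    cleaned = NON_ARABIC.sub(" ", text)
    return re.sub(r"\s+", " ", cleaned).strip()

raw_text = "السَّلامُ عَلَيْكُمْ! (hello) 123"
print(keep_arabic(raw_text))  # -> السَّلامُ عَلَيْكُمْ
# then call model.synthesize(keep_arabic(your_text)) as in the snippet above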

assert batch_size * group_size < len(dataset) AssertionError when I train the model

hello everyone,

@zaidalyafeai @mustafa0x @elgeish @MagedSaeed

I tried to train the model on my dataset and this error came out. Could you please help me?

Traceback (most recent call last):
File "/content/drive/MyDrive/FastSpeech2/train.py", line 198, in
main(args, configs)
File "/content/drive/MyDrive/FastSpeech2/train.py", line 32, in main
assert batch_size * group_size < len(dataset)
AssertionError

thank you
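For context, a sketch of what the assertion checks (the numbers below are hypothetical; the real batch_size and group_size come from the train config): FastSpeech2's training loop needs the dataset to contain more than batch_size * group_size samples, so with a small custom dataset the usual fix is to lower batch_size or group_size, or to add more samples.

# Hypothetical values for illustration; read yours from the train config.
batch_size = 16
group_size = 4        # train.py roughly groups this many batches together
dataset_len = 50      # e.g. a small custom dataset

# This is the check at train.py line 32 that fails:
assert batch_size * group_size < dataset_len, (
    f"need more than {batch_size * group_size} samples, got {dataset_len}"
)
# 16 * 4 = 64 >= 50, so the assertion fails for this example.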

Missing file

Hello Mustafa & Ziad,

I have checked your awesome work, which is really helpful to me, but I have a question please ,,I am new to this field, so could you please share with me a good reference to understand the difference between hifi-GAN and Mel-GAN?, I have checked a lot of references over the internet, but they were not that helpful!
Also I have another question related to the vocoder and speaker used, when I tried different combinations I have listened and was able to know that the vocoder HiFi-GAN and the speaker universal is the best combination,,but when I tried combination LJSpeech & HIFI-GAN , I received error that the file generator_LJSpeech.pth.tar does not exist, and when I checked the files and the code, I can see the code points to this directoryFastSpeech2/hifigan/generator_LJSpeech.pth.tar but , this file does not exist "generator_LJSpeech.pth.tar"

Error opening training file, File contains data in an unknown format.

Hi Ziad,
I tried running this script, which is available in the readme file, to train the MSA model:

python run_common_voice.py --model_name_or_path="facebook/wav2vec2-large-xlsr-53" --dataset_config_name="ar" --output_dir=/path/to/output/ --cache_dir=/path/to/cache --overwrite_output_dir="yes" --num_train_epochs="1" --per_device_train_batch_size="32" --per_device_eval_batch_size="32" --evaluation_strategy="steps" --learning_rate="3e-4" --warmup_steps="500" --fp16="no" --freeze_feature_extractor="yes" --save_steps="10" --eval_steps="10" --save_total_limit="1" --logging_steps="10" --group_by_length="no" --feat_proj_dropout="0.0" --layerdrop="0.1" --do_train="yes" --do_eval="yes" --max_train_samples 100 --max_val_samples 100

And I got this message:

Traceback (most recent call last):
File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 511, in
main()
File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 400, in main
train_dataset = train_dataset.map(
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 1955, in map
return self._map_single(
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 520, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 487, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\fingerprint.py", line 458, in wrapper
out = func(self, *args, **kwargs)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 2320, in map_single
example = apply_function_on_filtered_inputs(example, i, offset=offset)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 2220, in apply_function_on_filtered_inputs
processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\datasets\arrow_dataset.py", line 1915, in decorated
result = f(decorated_item, *args, **kwargs)
File "C:\Users\user\PycharmProjects\pythonProject1\klaam\run_common_voice.py", line 394, in speech_file_to_array_fn
speech_array, sampling_rate = torchaudio.load(batch["path"])
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torchaudio\backend\soundfile_backend.py", line 197, in load
with soundfile.SoundFile(filepath, "r") as file
:
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 629, in init
self._file = self._open(file, mode_int, closefd)
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 1183, in _open
_error_check(_snd.sf_error(file_ptr),
File "C:\Users\user\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\soundfile.py", line 1357, in error_check
raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening '/path/to/cache\downloads\extracted\31455a499a0212b1751dd0c1547b0d360037f6a8c0a69178647a45a577d0ff67\cv-corpus-6.1-2020-12-11/ar/clips/common_voice_ar_19225971.mp3': File contains data in an unknown format
.

I think the reason behind it is that the training files are in .mp3 format instead of .wav.
Any suggestions on how I can tackle this problem?
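One possible workaround (an assumption, not from the repo): convert the Common Voice .mp3 clips to 16 kHz mono .wav files before training, e.g. with the third-party pydub package (which requires ffmpeg), so that the soundfile backend can open them; alternatively, switching torchaudio to a backend with mp3 support would also avoid the conversion.

from pathlib import Path
from pydub import AudioSegment

clips_dir = Path("cv-corpus-6.1-2020-12-11/ar/clips")  # hypothetical location
out_dir = Path("clips_wav")
out_dir.mkdir(exist_ok=True)

for mp3_path in clips_dir.glob("*.mp3"):
    audio = AudioSegment.from_mp3(str(mp3_path))
    audio = audio.set_frame_rate(16_000).set_channels(1)
    audio.export(str(out_dir / (mp3_path.stem + ".wav")), format="wav")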

Add license

Maybe it's better to add a license to the repo?

ASR transcribe() works only for the first 8 seconds

transcribe works for the first 8 seconds of the audio only

meaning if the text should've been:

........ السلام عليكم ورحمة الله وبركاته ......
the ASR outputs:
........ السلام عليكم ورحمة

assuming the 8 second mark is right after ورحمة
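A rough workaround sketch, under two assumptions: the recognizer exposes transcribe(path) on a 16 kHz wav file as in the demo notebook, and the cutoff comes from how much audio is fed to the model at once. Split the recording into fixed-length chunks, transcribe each, and join the text (a smarter split on silence boundaries would avoid cutting words in half):

import soundfile as sf

def transcribe_long(model, wav_path, chunk_seconds=8):
    """Transcribe a long recording chunk by chunk and join the results."""
    speech, sr = sf.read(wav_path)
    step = int(chunk_seconds * sr)
    texts = []
    for start in range(0, len(speech), step):
        sf.write("chunk.wav", speech[start:start + step], sr)  # temp file
        texts.append(model.transcribe("chunk.wav"))
    return " ".join(texts)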

Speech Recognition Error

Speech Recognition

OSError: Can't load config for 'Zaid/wav2vec2-large-xlsr-dialect-classification'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Zaid/wav2vec2-large-xlsr-dialect-classification' is the correct path to a directory containing a config.json file

[bug] conda: command not found

Installing in a Python pip environment that was not set up via conda throws an error.

logs :

Setting conda init...
./install.sh: line 40: conda: command not found
./install.sh: line 41: /etc/profile.d/conda.sh: No such file or directory
Setting environment... (envs/environment.yml)
./install.sh: line 48: conda: command not found
Activating environment... (klaam)
./install.sh: line 51: conda: command not found
Upgrading pip...
./install.sh: line 55: python: command not found
Updating poetry config...
./install.sh: line 59: python: command not found
Installing dependencies using poetry...
./install.sh: line 62: python: command not found

[Error] Module installation error while running in Colab.

I have tested the klaam/notebook/demo.ipynb file on Google Colab. It raised an error about missing modules.

Error Message:
klaamissue

When I install the missing modules manually using !pip install <module-name> it works well.
So, I think there is a problem with the !pip install -r requirements.txt.

I printed the output of the installation in a separate file:
klaamissue2

After some search, I wasn't able to solve the problem. I will be thankful if you can advise.

Timestamps

Thank you for this work -- شكرا! I tested this briefly and found the results to be quite good. Is there any way to get time-stamped results? (My use case is forced alignment)

[Proposal] Codebase refactoring

To organize the code and introduce testing and continuous integration, it would be beneficial to refactor the entire codebase.

TL;DR

  • Re-organizing the codebase to follow best practices and to introduce testing and continuous integration.
  • Separating the code so the logic can be imported as a separate package, scripts to hold the scripts used for training/inference, notebooks to hold demos and simple scripts written as notebooks, and tests to test the logic
  • Adding GitHub Actions to test the build and the package logic, auto-generate docs, and publish the package to PyPI
  • Moving from a pip and requirements.txt setup to conda for environment management and poetry for package management. This will ease development as the project scales.

Codebase refactoring

Mapping

FastSpeech2/* → moved to klaam/external/FastSpeech2/*
dialect_speech_corpus → moved to klaam/speech_corpus/dialect.py
egy_speech_corpus → moved to klaam/speech_corpus/egy.py
mor_speech_corpus → moved to klaam/speech_corpus/mor.py
samples → moved to samples
.gitignore → moved to .gitignore
LICENSE → moved to LICENSE
README.md → moved to README.md
audio_utils.py → moved to klaam/utils/audio.py
demo.ipynb → moved to notebooks/demo.ipynb
demo_with_mic.ipynb → moved to notebooks/demo_with_mix.ipynb
inference.ipynb → moved to notebooks/inference.ipynb
klaam.py → moved to klaam/run.py
klaam_logo.PNG → moved to misc/klaam_logo.png
models.py → moved to klaam/models/wav2vec.py
processors.py → moved to klaam/processors/custom_wave2vec.py
requirements.txt → removed
run.sh → moved to scripts/run.sh
run_classifier.py → moved to scripts/run_classifier.py
run_common_voice.py → moved to scripts/run_common_voice.py
run_mgb3.py → moved to scripts/run_mgb3.py
run_mgb5.py → moved to scripts/run_mgb5.py
sample_run.sh → moved to scripts/sample_run.sh
utils.py → moved to klaam/utils/utils.py
Added: docs, tests, .github, output, environment.yml, install.sh, mypy.ini, pyproject.toml, pytest.ini, ckpts
Tree Structure

.github/                      GitHub stuff (e.g. issue templates, GitHub Actions workflows, etc.)
    workflows/
        build.yml             to test building of the package
        publish.yml           to publish the package to PyPI
        tests.yml             to run tests
        docs.yml              to generate documentation
klaam/                        the logic for the package
    utils/
        audio.py
        utils.py
    models/
        wav2vec.py
    processors/
        wave2vec.py
    external/
        FastSpeech2/*
    speech_corpus/
        dialect.py
        egy.py
        mor.py
    run.py
notebooks/
    demo.ipynb
    demo_with_mix.ipynb
    inference.ipynb
scripts/                      scripts used to train/evaluate or anything external to the package logic
    run.sh
    run_classifier.py
    run_common_voice.py
    run_mgb3.py
    run_mgb5.py
    sample_run.sh
tests/                        tests for the logic within klaam
    test_*.py
    conftest.py
misc/
    klaam_logo.png
samples/
    demo.wav
ckpts/                        checkpoints of pre-trained models that were downloaded
docs/                         documentation files
output/
environment.yml               conda environment definition
install.sh                    installation script to set up the conda environment and install dependencies using poetry
mypy.ini                      mypy configuration
pyproject.toml                package definition and list of dependencies to be installed
pytest.ini                    pytest configuration
LICENSE
README.md
.gitignore

Environment/dependencies packages

  • conda is used to manage the environment and install essential libraries that are big/core to the package, e.g. TensorFlow, PyTorch, cudatoolkit, etc.
  • poetry is used to manage dependencies and setup the package
  • pytest is used to enable unit/integration testing of the codebase

Commands

  • poetry add PACKAGE - to add a package (this will append it to pyproject.toml)
    • If the package installation fails and you can't find another way to add it, install it using conda and add it to environment.yml manually (leave a comment next to the line)
    • Check the web for the right channels when installing packages with conda
  • poetry install - to install the package (package_name)
  • pytest tests - to run all tests manually
  • pytest tests/TEST_PATH - to run a specific test file (check the pytest documentation for more information)

Edit - added the following sections: env/dep packages and commands

A question

Hello everyone,

Does your implementation use FastSpeech (2s) or not?

I just want to make sure. Thank you for your work.

Error

When I run the training/final step, I get this error. Can you advise?
^CTraceback (most recent call last):
File "train.py", line 198, in
main(args, configs)
File "train.py", line 93, in main
nn.utils.clip_grad_norm_(model.parameters(), grad_clip_thresh)
File "/home/layan/.local/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 36, in clip_grad_norm_
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type).to(device) for p in parameters]), norm_type)
File "/home/layan/.local/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 36, in
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type).to(device) for p in parameters]), norm_type)
File "/home/layan/.local/lib/python3.6/site-packages/torch/functional.py", line 1293, in norm
return _VF.norm(input, p, dim=_dim, keepdim=keepdim) # type: ignore
File "/home/layan/.local/lib/python3.6/site-packages/torch/_VF.py", line 25, in getattr
def getattr(self, attr):


Error loading model

404 Client Error: Not Found for url: https://huggingface.co/Zaid/wav2vec2-large-xlsr-53-arabic-egyptian/resolve/main/tf_model.h5

OSError: Can't load weights for 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian'. Make sure that:

  • 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian' is a correct model identifier listed on 'https://huggingface.co/models'

  • or 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian' is the correct path to a directory containing a file named one of pytorch_model.bin, tf_model.h5, model.ckpt.

These 2 errors appear when I run it; however, I modified the code from:
if lang == 'egy':
    model_dir = 'Zaid/wav2vec2-large-xlsr-53-arabic-egyptian'
elif lang == 'msa':
    model_dir = 'elgeish/wav2vec2-large-xlsr-53-arabic'

to:

if lang == "egy":
    model_dir = Wav2Vec2ForCTC.from_pretrained("Zaid/wav2vec2-large-xlsr-53-arabic-egyptian")
elif lang == "msa":
    model_dir = Wav2Vec2ForCTC.from_pretrained("elgeish/wav2vec2-large-xlsr-53-arabic")
    self.bw = True

as it's written on the Hugging Face site, but it's still not working. Thanks in advance.

Sampling rate modifications

Hello @zaidalyafeai

For our bachelor thesis, a friend and I started working on dialect classification a while ago. We recently came across your repo, and you are working with the same corpus as we did. We want to investigate how the length of the provided samples influences the trained classifier when using wav2vec-xlsr as the base model.

After some investigation of your code, we were wondering why you read only the first 20 seconds of each file. Is this not somewhat counterproductive, as we lose a lot of training data through that?

import librosa
import numpy as np
import soundfile as sf

def speech_file_to_array_fn(batch):
    start = 0
    stop = 20            # only the first 20 seconds of each file are read
    srate = 16_000
    speech_array, sampling_rate = sf.read(batch["file"], start=start * srate, stop=stop * srate)
    batch["speech"] = librosa.resample(np.asarray(speech_array), sampling_rate, srate)
    batch["sampling_rate"] = srate
    batch["parent"] = batch["label"]
    return batch

Did you preprocess your data by cutting it into smaller pieces so that each is at most 20 seconds long? Or is it possible to read in the whole files and generate batches according to the length of each file? The whole thing is not quite straightforward to implement.

FileNotFoundError: [Errno 2] No such file or directory: 'model.pth.tar'

Inference worked a few days ago but is not working anymore because of broken Google Drive weight file links.

Access denied with the following error:

Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses. 

You may still be able to access the file from the browser:

 https://drive.google.com/uc?id=1J7ZP_q-6mryXUhZ-8j9-RIItz2nJGOIX 

FileNotFoundError Traceback (most recent call last)
in
49 speaker_pre_trained_path = "./klaam/data/model_weights/hifigan/generator_universal.pth.tar"
50
---> 51 ar_model = TextToSpeech(prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path,root_path)
52
53

5 frames
/content/./klaam/klaam/run.py in init(self, prepare_tts_model_path, model_config_path, train_config_path, vocoder_config_path, speaker_pre_trained_path, root_path)
67 self.vocoder_config_path = vocoder_config_path
68 self.speaker_pre_trained_path = speaker_pre_trained_path
---> 69 self.model, self.vocoder, self.configs = prepare_tts_model(
70 self.configs, self.vocoder_config_path, self.speaker_pre_trained_path
71 )

/content/./klaam/klaam/external/FastSpeech2/inference.py in prepare_tts_model(configs, vocoder_config_path, speaker_pre_trained_path)
57
58 # Get model
---> 59 model = get_model_inference(configs, DEVICE, train=False)
60
61 # Load vocoder

/content/./klaam/klaam/external/FastSpeech2/utils/model.py in get_model_inference(configs, device, train)
42 if not os.path.exists(ckpt_path):
43 gdown.download(url, ckpt_path, quiet=False)
---> 44 ckpt = torch.load(ckpt_path, map_location=torch.device("cpu"))
45 model.load_state_dict(ckpt["model"])
46

/usr/local/lib/python3.8/dist-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
697 pickle_load_args['encoding'] = 'utf-8'
698
--> 699 with _open_file_like(f, 'rb') as opened_file:
700 if _is_zipfile(opened_file):
701 # The zipfile reader is going to advance the current file position.

/usr/local/lib/python3.8/dist-packages/torch/serialization.py in _open_file_like(name_or_buffer, mode)
228 def _open_file_like(name_or_buffer, mode):
229 if _is_path(name_or_buffer):
--> 230 return _open_file(name_or_buffer, mode)
231 else:
232 if 'w' in mode:

/usr/local/lib/python3.8/dist-packages/torch/serialization.py in init(self, name, mode)
209 class _open_file(_opener):
210 def init(self, name, mode):
--> 211 super(_open_file, self).init(open(name, mode))
212
213 def exit(self, *args):

FileNotFoundError: [Errno 2] No such file or directory: 'model.pth.tar'

Functionality to split/align audio segments for training

The audio in two of the datasets we are using (MGB3 and MGB5) comes in long sequences of tens of minutes. This is impractical for training on any GPU. Longer audio sequences will result in out-of-memory errors on GPUs even with a small batch size.

The solution is to split the audio into smaller audio segments of 15 to 30 seconds depending on the hardware used (GPU memory to a large extent).

This issue is to track adding a functionality to split the audio into smaller chunks that can fit into a GPU.
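A minimal sketch of one way to do this (an assumption, not the repo's implementation): use librosa's silence detection to find natural break points, then write out segments capped at roughly 20 seconds each.

import librosa
import soundfile as sf

def split_long_audio(path, out_prefix, max_seconds=20, top_db=30):
    speech, sr = librosa.load(path, sr=16_000)
    intervals = librosa.effects.split(speech, top_db=top_db)  # non-silent spans
    max_len = int(max_seconds * sr)
    idx = 0
    for start, end in intervals:
        for s in range(start, end, max_len):  # cap each piece at max_len samples
            sf.write(f"{out_prefix}_{idx:04d}.wav",
                     speech[s:min(s + max_len, end)], sr)
            idx += 1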

ASR outputs "v" instead of "ث"

I'm not sure where the problem occurs exactly, but this is the only letter affected: ث is always transcribed as "v".

Perhaps check the vocabulary files
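A quick check sketch (assuming the Egyptian model id mentioned elsewhere in this repo is still available): print the CTC vocabulary to see which symbol ث maps to. If the vocabulary stores a Buckwalter-style transliteration, ث is written as "v", which would explain the output.

from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("Zaid/wav2vec2-large-xlsr-53-arabic-egyptian")
vocab = processor.tokenizer.get_vocab()
print(sorted(vocab.keys()))
print("'v' in vocab:", "v" in vocab, "| 'ث' in vocab:", "ث" in vocab)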
