Coder Social home page Coder Social logo

biddwan09 / malaya Goto Github PK

View Code? Open in Web Editor NEW

This project forked from mesolitica/malaya

0.0 1.0 0.0 90.59 MB

Natural-Language-Toolkit for bahasa Malaysia, https://malaya.readthedocs.io/

License: MIT License

Python 25.85% Jupyter Notebook 73.10% Shell 0.01% JavaScript 0.10% HTML 0.94% Dockerfile 0.01%

malaya's Introduction

logo

Pypi version Python3 version MIT License Documentation total stats download stats / month total stats download stats / month


Malaya is a Natural-Language-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Documentation

Proper documentation is available at https://malaya.readthedocs.io/

Installing from the PyPI

CPU version

$ pip install malaya

GPU version

$ pip install malaya-gpu

Only Python 3.6.x and above and Tensorflow 1.10 and above but not 2.0 are supported.

Features

  • Augmentation

    Augment any text using dictionary of synonym, Wordvector or Transformer-Bahasa.

  • Dependency Parsing

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Emotion Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Entities Recognition

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Generator

    Generate any texts given a context using T5-Bahasa, GPT2-Bahasa or Transformer-Bahasa.

  • Keyword Extraction

    Provide RAKE, TextRank and Attention Mechanism hybrid with Transformer-Bahasa.

  • Language Detection

    using Fast-text and Sparse Deep learning Model to classify Malay (formal and social media), Indonesia (formal and social media), Rojak language and Manglish.

  • Normalizer

    using local Malaysia NLP researches hybrid with Transformer-Bahasa to normalize any bahasa texts.

  • Num2Word

    Convert from numbers to cardinal or ordinal representation.

  • Paraphrase

    Provide Abstractive Paraphrase using T5-Bahasa and Transformer-Bahasa.

  • Part-of-Speech Recognition

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Relevancy Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Sentiment Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Similarity

    Using deep Encoder, Doc2Vec, BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa and ALXLNET-base-bahasa to build deep semantic similarity models.

  • Spell Correction

    Using local Malaysia NLP researches hybrid with Transformer-Bahasa to auto-correct any bahasa words.

  • Stemmer

    Using BPE LSTM Seq2Seq with attention state-of-art to do Bahasa stemming.

  • Subjectivity Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Summarization

    Provide Abstractive T5-Bahasa also Extractive interface using Transformer-Bahasa, skip-thought, LDA, LSA and Doc2Vec.

  • Topic Modelling

    Provide Transformer-Bahasa, LDA2Vec, LDA, NMF and LSA interface for easy topic modelling with topics visualization.

  • Toxicity Analysis

    Transfer learning on BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa.

  • Transformer

    Provide easy interface to load BERT-base-bahasa, Tiny-BERT-bahasa, Albert-base-bahasa, Albert-tiny-bahasa, XLNET-base-bahasa, ALXLNET-base-bahasa, ELECTRA-base-bahasa and ELECTRA-small-bahasa.

  • Word2Num

    Convert from cardinal or ordinal representation to numbers.

  • Word2Vec

    Provide pretrained bahasa wikipedia and bahasa news Word2Vec, with easy interface and visualization.

  • Zero-shot classification

    Provide Zero-shot classification interface using Transformer-Bahasa to recognize texts without any labeled training data.

Pretrained Models

Malaya also released Bahasa pretrained models, simply check at Malaya/pretrained-model

Or can try use huggingface ๐Ÿค— Transformers library, https://huggingface.co/models?filter=malay

References

If you use our software for research, please cite:

@misc{Malaya, Natural-Language-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow,
  author = {Husein, Zolkepli},
  title = {Malaya},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya}}
}

Acknowledgement

Thanks to Im Big, LigBlou, Mesolitica and KeyReply for sponsoring AWS, GCP and private cloud to train Malaya models.

logo

Contributing

Thank you for contributing this library, really helps a lot. Feel free to contact me to suggest me anything or want to contribute other kind of forms, we accept everything, not just code!

logo

License

License

malaya's People

Contributors

huseinzol05 avatar biranchi2018 avatar hemoely avatar heromeng avatar khursani8 avatar lantip avatar leowmjw avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.