
artefactory / nlpretext


All the go-to functions you need to handle NLP use cases, integrated in NLPretext

Home Page: https://artefactory.github.io/NLPretext/

License: Apache License 2.0

Python 98.90% Makefile 0.94% Dockerfile 0.16%

nlpretext's People

Contributors

amaleelhamri, benoitgoujon, brianlz, bruce-at-artefact, cedric-magnan, dependabot[bot], griseau, hugovasselin, julesbertrand, kaislaribi, louisrdsc, pymousse, rafaelleaygalenq, rdoume, sacha-lasry, tkumar19088, wil2210


nlpretext's Issues

Spacy v2.1 Compatibility

spaCy just released v2.1, which includes language-model pretraining (BERT/ULMFiT-like) and 3x faster tokenization.

Check that there are no breaking changes, and build an API for pretraining.
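One simple starting point for the compatibility check is a version guard, so that code paths relying on v2.1 features can be switched on only when they are available. This is a minimal sketch, assuming Python 3.8+ (`importlib.metadata`); the helper names `major_minor` and `spacy_at_least` are hypothetical and not part of NLPretext or spaCy. It does not detect breaking changes by itself; it only tells you which version you are running against.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def major_minor(version_string: str) -> Tuple[int, int]:
    """Hypothetical helper: parse '2.1.0' -> (2, 1), '3.0.0rc1' -> (3, 0)."""
    # Only the first two dot-separated components are used, so
    # pre-release suffixes on the patch number are ignored.
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def spacy_at_least(required: Tuple[int, int] = (2, 1)) -> bool:
    """Hypothetical guard: True if the installed spaCy is >= `required`."""
    try:
        installed = version("spacy")
    except PackageNotFoundError:
        return False  # spaCy is not installed at all
    return major_minor(installed) >= required
```

Comparing `(major, minor)` tuples keeps the check independent of patch releases, which is usually what matters when a minor release adds features such as pretraining.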

Re-write gensim's MALLET wrapper

  • malletmodel2ldamodel is crap
  • Create an environment variable: os.environ['MALLET_PATH'] = '/path/to/mallet/subfolder'

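import os
from gensim.models.wrappers import LdaMallet  # gensim < 4.0 only; the wrappers were removed in 4.0
os.environ['MALLET_PATH'] = '/home/shared/Allianz_topic_forcasting/bin/lib'
# "$MALLET_PATH" is not expanded inside a Python string literal, so build the path explicitly
path2mallet = os.path.join(os.environ['MALLET_PATH'], 'mallet-2.0.8', 'bin', 'mallet')
# corpus (bag-of-words) and id2word (gensim Dictionary) must already exist at this point
ldamallet = LdaMallet(path2mallet, corpus=corpus, num_topics=15, id2word=id2word, prefix='save')
ldamallet.save('monmodele')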

Be sure to specify the prefix, then copy ALL the files carrying that prefix, plus the model file.
Transfer them to the other PC. On that PC, make sure to create the environment variable, then call

LdaMallet.load('/newpath') with all the files in that location.

Warning: the script being executed must be placed alongside the model and its components.

Line 317: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/wrappers/ldamallet.py

(put at the top of the file) from gensim.models.wrappers import LdaMallet
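The "copy ALL the files with the prefix, plus the model" step is easy to get wrong by hand. The sketch below gathers them into one directory that can then be moved to the other PC. It assumes that the MALLET wrapper names its working files as the prefix followed by a suffix (for example `savecorpus.txt` with `prefix='save'`), and that gensim's `save()` may write extra `<model>.<something>` side files next to the model. The helper `copy_mallet_artifacts` is hypothetical and not part of gensim.

```python
import glob
import os
import shutil
from typing import List


def copy_mallet_artifacts(prefix: str, model_path: str, dest_dir: str) -> List[str]:
    """Hypothetical helper: copy every file starting with `prefix`, the saved
    model file, and any '<model_path>.*' side files into `dest_dir`."""
    os.makedirs(dest_dir, exist_ok=True)
    patterns = [prefix + "*", model_path, model_path + ".*"]
    copied: List[str] = []
    seen = set()
    for pattern in patterns:
        for src in sorted(glob.glob(pattern)):
            # Skip directories and avoid copying a file twice if patterns overlap.
            if os.path.isfile(src) and src not in seen:
                seen.add(src)
                dst = os.path.join(dest_dir, os.path.basename(src))
                shutil.copy2(src, dst)  # copy2 also preserves timestamps
                copied.append(dst)
    return copied
```

For example, `copy_mallet_artifacts('save', 'monmodele', 'to_transfer')` would collect the files produced by the snippet above into `to_transfer/`. The environment variable still has to be recreated on the target machine before loading.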

Text Generation

(= understand the question and how to answer it)
==> BERT, GPT-2, Dialogflow

Main topic issue. Might be split into several issues if needed.

Text Translation

Main topic issue. Might be split into several issues if needed.

Data loader

Try loading as UTF-8; on a Unicode decoding error, fall back to chardet.
Add an encoding parameter, plus a parameter to enable or disable chardet.
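A minimal sketch of that loader, assuming the third-party `chardet` package for detection; `chardet.detect(raw_bytes)` returns a dict whose `"encoding"` entry may be `None` when detection fails. The function name `load_text` and its `encoding` / `use_chardet` parameters are hypothetical, chosen to mirror the two parameters described above.

```python
def load_text(path: str, encoding: str = "utf-8", use_chardet: bool = True) -> str:
    """Hypothetical loader: decode with `encoding`, optionally falling back to chardet."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        if not use_chardet:
            raise  # caller disabled detection: surface the original error
        import chardet  # third-party; only imported when the fallback is needed
        detected = chardet.detect(raw)["encoding"]
        if detected is None:
            raise  # detection failed: re-raise the original decoding error
        return raw.decode(detected)
```

Reading bytes once and decoding in memory avoids reopening the file for the second attempt, and importing `chardet` lazily keeps it optional for users who always pass a known encoding.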

Topic Modelling

  • Code/Library
  • Notebook Implementation
  • Notebook Grid Search (& Interpretation)
  • Add Java + Mallet to the Dockerfile
  • Mallet train function
  • Mallet load function
  • README installation
  • README Theory
  • README Interpretation
  • pyLDAvis for Mallet

Add a Corpus/Document abstraction

The idea of this abstraction is to create a Document class that holds both the raw text and the preprocessing results.
The Corpus abstraction should be taken literally: a collection of documents.
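A minimal sketch of the idea, assuming each preprocessing step is a plain `str -> str` function and that steps are chained on the latest result. The class layout and the `steps`, `text`, `apply` and `from_texts` names are hypothetical, not NLPretext's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class Document:
    """Raw text plus the ordered results of the preprocessing applied to it."""
    raw_text: str
    # (step name, output) pairs, in the order the steps were applied.
    steps: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Latest processed version, or the raw text if nothing was applied yet.
        return self.steps[-1][1] if self.steps else self.raw_text

    def apply(self, name: str, fn: Callable[[str], str]) -> str:
        # Chain on the latest version; raw_text itself is never modified.
        result = fn(self.text)
        self.steps.append((name, result))
        return result


@dataclass
class Corpus:
    """Literally a collection of Document objects."""
    documents: List[Document] = field(default_factory=list)

    @classmethod
    def from_texts(cls, texts: List[str]) -> "Corpus":
        return cls([Document(t) for t in texts])

    def apply(self, name: str, fn: Callable[[str], str]) -> None:
        # Run the same preprocessing step on every document.
        for doc in self.documents:
            doc.apply(name, fn)

    def __iter__(self) -> Iterator[Document]:
        return iter(self.documents)

    def __len__(self) -> int:
        return len(self.documents)
```

Keeping every intermediate result alongside the raw text makes it possible to inspect or roll back a single step without reprocessing the whole corpus, at the cost of extra memory per document.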
