grammar_ru's Introduction

Grammar_ru

Initially, grammar_ru was envisioned as a set of tools to correct grammar and style errors in Russian texts.

Since the beginning, a lot has changed.

Grammar_ru is now a module that manages a language-independent representation of texts in tabular format.

  • Texts are tokenized and split into sentences, placed in pandas DataFrames, and stored in zip files along with a table of contents (a toc file) that contains metadata about each dataframe.
  • We call these zip files corpora.
  • Each word, sentence and paragraph receives a unique ID within the corpus.
  • Relations can be stored in a corpus to link fragments of texts (e.g. to record that a chapter of a translation and a chapter of the original text are in fact the same chapter).
  • Grammar_ru also allows you to apply featurizers such as pymorphy, snowball and slovnet.
  • Grammar_ru contains useful components to further convert such datasets into torch tensors (based on the Training Grounds framework).
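
The tabular representation described above can be illustrated with a minimal sketch. It does not use grammar_ru's own API: the function names (`text_to_frame`, `write_corpus`), the column names, the regex-based splitting and the CSV/`toc.csv` layout are all assumptions made for illustration, and the real module may store and name things differently.

```python
import io
import re
import zipfile

import pandas as pd

ID_COLUMNS = ["word_id", "sentence_id", "paragraph_id"]


def text_to_frame(text: str) -> pd.DataFrame:
    """Hypothetical helper: one row per token, with IDs local to this text."""
    rows = []
    word_id = 0
    sentence_id = 0
    paragraphs = [p for p in text.split("\n") if p.strip()]
    for paragraph_id, paragraph in enumerate(paragraphs):
        # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            # Naive tokenization: runs of word characters or single punctuation marks.
            for word in re.findall(r"\w+|[^\w\s]", sentence):
                rows.append({
                    "word_id": word_id,
                    "sentence_id": sentence_id,
                    "paragraph_id": paragraph_id,
                    "word": word,
                })
                word_id += 1
            sentence_id += 1
    return pd.DataFrame(rows, columns=ID_COLUMNS + ["word"])


def write_corpus(target, texts: dict) -> pd.DataFrame:
    """Hypothetical helper: write one CSV per text plus a toc.csv into a zip."""
    toc_rows = []
    offsets = {column: 0 for column in ID_COLUMNS}
    with zipfile.ZipFile(target, "w") as archive:
        for name, text in texts.items():
            frame = text_to_frame(text)
            if frame.empty:
                continue
            # Shift local IDs so that they are unique across the whole corpus.
            for column in ID_COLUMNS:
                frame[column] += offsets[column]
                offsets[column] = int(frame[column].max()) + 1
            archive.writestr(f"{name}.csv", frame.to_csv(index=False))
            toc_rows.append({
                "name": name,
                "words": len(frame),
                "sentences": int(frame["sentence_id"].nunique()),
            })
        toc = pd.DataFrame(toc_rows)
        archive.writestr("toc.csv", toc.to_csv(index=False))
    return toc


if __name__ == "__main__":
    buffer = io.BytesIO()  # an in-memory zip; a file path works as well
    toc = write_corpus(buffer, {
        "original": "Hello world. Bye!\nSecond paragraph.",
        "translation": "Another text.",
    })
    print(toc)
```

Because every row carries word, sentence and paragraph IDs that are unique across the archive, a relation between two fragments could be stored as a further table of ID pairs; that part is not shown here.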

Aside from grammar_ru, the repository contains app_grammar_ru, a not-yet-working Docker app that actually checks Russian texts for errors. The app is intended to use existing Python solutions (pyenchant) as well as ML models trained in the grammar_ru paradigm.

Finally, a creative articulator (ca) project is also temporarily hosted in this repo.

grammar_ru's People

Contributors

alexjackalope, bepuro, dabdya, dr1mzz, glebkochergin, holyprapor, invis166, itsanastasiaminina, mixailkys, okulovsky, saddance, sergeypishchulov, squirrelwithavocado, yffins
