glish's Introduction

Glish

Watch the video about this project: https://youtu.be/sRbcw2sGkJw

Interactive Demo Translator tool here: https://paralogical.dev/glish/

Goal: Make a version of English in which every word has exactly one syllable.

Inputs:

  • Words by frequency, so that monosyllabification can be optimized for more common words: inputs/word_frequency.txt
  • Words with pronunciations, split by syllables (syllabified CMU Dict). Note: a word may have multiple valid pronunciations, but all of them are American English.
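
To make the second input more concrete, here is a minimal sketch of turning one syllabified, CMU-style pronunciation into IPA syllables. It is not the project's syllablize.ts: the function name `toIpaSyllables`, the assumption that syllables are separated by " - ", and the deliberately partial ARPABET table are all illustrative choices.

```typescript
// Partial ARPABET → IPA table (illustrative; the full CMU set has more symbols).
const ARPABET_TO_IPA = new Map<string, string>([
  ["AA", "ɑ"], ["AE", "æ"], ["AH", "ʌ"], ["AO", "ɔ"], ["AW", "aʊ"],
  ["EH", "ɛ"], ["ER", "ɝ"], ["IH", "ɪ"], ["IY", "i"], ["OW", "oʊ"],
  ["B", "b"], ["D", "d"], ["K", "k"], ["L", "l"], ["N", "n"],
  ["S", "s"], ["T", "t"], ["W", "w"],
]);

// Convert one pronunciation such as "W AO1 - T ER0" into IPA syllables.
// Assumption: syllables are separated by " - " and phonemes by spaces.
function toIpaSyllables(entry: string): string[] {
  return entry
    .trim()
    .split(/\s+-\s+/)
    .map((syllable) =>
      syllable
        .split(/\s+/)
        .map((symbol) => {
          // Unstressed AH is conventionally transcribed as schwa.
          if (symbol === "AH0") return "ə";
          // Vowels carry a trailing stress digit (0, 1 or 2); drop it.
          const bare = symbol.replace(/[0-2]$/, "");
          const ipa = ARPABET_TO_IPA.get(bare);
          if (ipa === undefined) {
            throw new Error(`Unknown ARPABET symbol: ${symbol}`);
          }
          return ipa;
        })
        .join("")
    );
}
```

For example, `toIpaSyllables("W AO1 - T ER0")` yields two syllables, one for each half of "water". A real converter would need the complete phoneme table and a decision about how to handle words with several pronunciations.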

Stages:

  • syllablize.ts → converts the CMU dict to a JSON mapping of word → IPA, split by syllables.
  • main.ts → loads the IPA syllables and generates a new monosyllabic version of every word.
  • sonorityGraph.ts → data structure that helps generate new syllables that follow sonority sequencing.
  • respellIPA.ts → converts IPA back into a "readable" Latin-alphabet spelling.
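
As a rough illustration of what "following sonority sequencing" means, the sketch below checks whether a candidate syllable rises in sonority up to its vowel and falls afterwards. This is not the sonorityGraph.ts data structure; the `SONORITY` scale, the phoneme classes, and the function names are simplified assumptions made for the example.

```typescript
// Hypothetical, coarse sonority scale (higher = more sonorous).
const VOWEL = 6;
const SONORITY = new Map<string, number>();
const addClass = (level: number, sounds: string[]): void => {
  for (const sound of sounds) SONORITY.set(sound, level);
};
addClass(1, ["p", "b", "t", "d", "k", "g"]); // stops
addClass(2, ["f", "v", "s", "z", "ʃ", "θ"]); // fricatives
addClass(3, ["m", "n", "ŋ"]); // nasals
addClass(4, ["l", "ɹ"]); // liquids
addClass(5, ["w", "j"]); // glides
addClass(VOWEL, ["ɑ", "æ", "ʌ", "ə", "ɛ", "ɪ", "i", "ɔ", "u"]); // vowels

// True when every value is strictly greater than the one before it.
function strictlyRising(levels: number[]): boolean {
  let previous = -Infinity;
  for (const level of levels) {
    if (level <= previous) return false;
    previous = level;
  }
  return true;
}

// Strict sonority sequencing: the onset rises toward a contiguous vowel
// nucleus, and the coda falls away from it.
function followsSonoritySequencing(phonemes: string[]): boolean {
  const levels: number[] = [];
  for (const phoneme of phonemes) {
    const level = SONORITY.get(phoneme);
    if (level === undefined) return false; // unknown sound
    levels.push(level);
  }
  const first = levels.indexOf(VOWEL);
  const last = levels.lastIndexOf(VOWEL);
  if (first === -1) return false; // no vowel, so no nucleus
  const nucleus = levels.slice(first, last + 1);
  if (!nucleus.every((level) => level === VOWEL)) return false;
  const onset = levels.slice(0, first);
  const codaReversed = levels.slice(last + 1).reverse();
  return strictlyRising(onset) && strictlyRising(codaReversed);
}
```

Under this strict version, "plænt" passes while "lpæ" does not. Real English also allows clusters that break the strict rule, such as /s/ followed by a stop in "stop", so an actual generator would need explicit exceptions for those.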

To run the code that generates the Glish language mapping:

  • ts-node syllablize.ts generates outputs/syllablizedIPA.json, the syllableGraph, and a big list of randomly generated syllables.
  • ts-node main.ts generates outputs/monosyllabic.json and the other monosyllabic results.

To run the UI:

  • cd ui
  • npm run dev

glish's Issues

Excessive spaces when using "Copy monosyllabic" button

"The quick brown fox jumps over the lazy dog" is an English language sentence that contains all the letters of the alphabet. The phrase is commonly used for touch typing practice, testing typewriters and computer keyboards, displaying examples of fonts, and other applications involving text where the use of all letters in the alphabet is desired.

... becomes ...

"The quick brown fox jumps vohdh the bleelz dog" is an glish gwuj stes that ruktntz all the legtz of the feft. The phrase is ahmk used for touch geyept strahrs, tengst preyept and turkst krawrjdz, spahrngspst plizmdz of fonts, and udhd klarnz vlahrng text where the use of all legtz in the feft is deyerdzdzd.

Firefox 120.0.1 on Linux.

pneumonoultramicroscopicsilicovolcanoconiosis is unknown

"pneumonoultramicroscopicsilicovolcanoconiosis" is a famous word online for being long lol, so I guess the glish translator should recognize it?

also multiple weird stuff:
"bookstore" is awrftsk?
"syllable" is libs, which is already a word, and is already an internet slang word for library (mostly used by devs)
"title" is fleyetst, that sounds like german not english 😅
"issue" is shigsht
a coupla more weird stuff that ill post later

most weird of all:
"sonority" is red, so it's unknown?

Show output as IPA?

I struggle with the pronunciation of Glish especially with long words like pawrpstst or stweyedhd but also with words that have an ambiguous pronunciation for my written-english-to-sounds-translator in my head like zawng or veevd.

While it probably wouldn't make me able to pronounce pawrpstst, it would massively help if I had a way to see the pronunciation in an unambiguous alphabet like IPA.

Furthermore, from the video:

"Frankly, I think this [IPA] is what the alphabet really should be."

Is it possible to add other languages?

Nice project!

Could you provide some guidance, for extending this approach for other languages?

I want to add support of my native language. What kind of data should I collect and how should it be processed to get it working? If it's not too hard, I would love to contribute.

Suggestion: Add a way for users to generate new Glish for unknown words

Suggested behavior: if we type an unknown word in the Glish translator, clicking the red word in the translated area would generate Glish for that word using the 4 steps described in the video, and save it.

However we should also check that it is indeed an English word:

Step 0) Is it English? (Try finding the word in an English dictionary.) -> Return if false.
Step 1) Is it already monosyllabic? Then save it.
Step 2) Try generating from a syllable, then save it.
Step 3) Try generating from any syllable, then save it.
Step 4) Try generating from any sounds, then save it. -> If this fails, then you have hit the limit of all available sound combinations.
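
The steps above could be wired together roughly as follows. This is only a sketch of the suggestion, not code from the project: `generateForUnknownWord`, `GlishDeps`, and the idea of passing the generators in as an ordered list of functions are hypothetical, and the check that a result is not already used by another word is an added assumption.

```typescript
// A strategy returns a candidate monosyllable, or undefined if it has none.
type Strategy = (word: string) => string | undefined;

interface GlishDeps {
  isEnglish: (word: string) => boolean; // step 0: dictionary lookup
  strategies: Strategy[]; // steps 1 to 4, tried in order
  saved: Map<string, string>; // word → Glish, updated on success
}

function generateForUnknownWord(
  word: string,
  deps: GlishDeps
): string | undefined {
  const key = word.toLowerCase();

  // Reuse an earlier result if this word was already generated.
  const existing = deps.saved.get(key);
  if (existing !== undefined) return existing;

  // Step 0: give up early on non-English input.
  if (!deps.isEnglish(key)) return undefined;

  // Assumption: each Glish word may be assigned to only one English word.
  const taken = new Set(deps.saved.values());

  // Steps 1 to 4: the first strategy with an unused candidate wins.
  for (const strategy of deps.strategies) {
    const candidate = strategy(key);
    if (candidate !== undefined && !taken.has(candidate)) {
      deps.saved.set(key, candidate);
      return candidate;
    }
  }

  // Every strategy failed: no sound combination is left for this word.
  return undefined;
}
```

Keeping the strategies as plain functions means the ordering from the list above stays visible in one place, and the UI only needs to handle the `undefined` case by leaving the word marked as unknown.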

Numbers are stripped from the converted text

Example

Input: "Hello, 1, 2, 3, world"
Expected Output: "lhuh, 1, 2, 3, world"
Actual Output: "lhuh, world"

I think it would be better if numbers were left alone in the converted text, especially when it comes to copying the text.
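
One way to get that behavior, sketched under assumptions: replace only alphabetic runs and leave every other character (digits, punctuation, spacing) exactly where it was. The function `translateText` and the `lookup` map below are hypothetical stand-ins for the translator's real code and data, and leaving unknown words unchanged is a simplification.

```typescript
// Translate only alphabetic words; everything else passes through untouched.
function translateText(input: string, lookup: Map<string, string>): string {
  return input.replace(/[A-Za-z']+/g, (word) => {
    const glish = lookup.get(word.toLowerCase());
    // Unknown words are returned unchanged in this sketch.
    return glish !== undefined ? glish : word;
  });
}

// Hypothetical one-entry dictionary, using the mapping from the example above.
const demoLookup = new Map<string, string>([["hello", "lhuh"]]);
const demoOutput = translateText("Hello, 1, 2, 3, world", demoLookup);
```

Because the text between words is never split apart and rejoined, numbers and the original spacing survive as typed.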
