
ukitinu / markov-words


Word generation using Markov chains and customisable data

License: The Unlicense

Shell 1.35% Java 98.59% Batchfile 0.06%
markov-chain word-generator picocli command-line-tool java17 graalvm



Markov Words

Random word generator based on Markov chains, with "trainable" datasets.

This project was born out of my desire to have a reliable way to create words that sound kind of similar to others, where others could be a language, a dialect, the names of all the characters of Tolkien's Legendarium and so on.

I also took the opportunity to try out some of the new Java 17 features, Picocli, a CLI library for Java, and GraalVM, a JVM perfectly suited for Java CLIs.

The result is this little programme, which allows the users (via CLI) to define their own "dictionaries", each one with its own "alphabet" and its own n-grams, and to use them to generate words with a mechanism based on Markov chains.

Installation & requirements

Download the latest release for your platform. There are two "types" of release, jar and nat-img.

  • The jar versions may be used via the bash/batch script provided in the archive. They require Java 17 to run. Despite the name, I think that the script provided in the linux-jar version should run on macOS too, as it is basic shell, but I have no macOS environment to check.
  • The nat-img version (native image) contains an executable built with GraalVM. I tested it with Java 17 and Java 11 and had no issues. As I had some trouble creating versions for all platforms (see here), there is only a Linux (Ubuntu) native image.

The executable version is significantly faster: for simple tasks it is approximately 100 times faster (basically instantaneous), while it is "only" 10 times as fast during more computationally heavy tasks.

The first time the programme runs, it will create the default properties file and exit.

On Windows (I don't know if it also applies to macOS) it is necessary to enable case-sensitive file names for the data directory. I followed this guide, but it didn't work (I got a "request not supported" error or something like that). This step is necessary if you want to create new dictionaries on Windows.

Quick guide

Use mkw --help to see the list of available commands. How to call each of them is explained by the CLI itself (use mkw help command); what follows here is more about the "big picture".

The user can create dictionaries, each with a name and an alphabet. An alphabet is the set of symbols that will make up the dictionary's words. Most characters are allowed, apart from control codes and the underscore, which symbolises the end of a word and is a "reserved" character.
After creation, texts can be read into the dictionary. This gradually builds up the dictionary's set of n-grams (where n goes from 1 to 3; more than that would be useless and would probably kill the filesystem), which will later be used to write words.

The importance of a dictionary's alphabet is that, whenever a text is read, every character that is not in the alphabet is evaluated as a word end (_) and won't appear later during word generation.
It is possible to delete a dictionary and restore it later if the deletion is not permanent.
It is also possible to update name and/or description of a dictionary, to list the available dictionaries or to get the info about one.
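
To make the reading step more concrete, here is a minimal sketch of how n-grams of length 1 to 3 could be counted from a text, with every character outside the alphabet treated as a word end (_). This is not the project's actual code: the class name NGramCounter, the method count, and the choice to collapse consecutive word ends into one are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class NGramCounter {
    /** Reserved symbol marking the end of a word. */
    static final char WORD_END = '_';

    /**
     * Counts every n-gram of length 1 to maxN found in the text.
     * Characters outside the alphabet become WORD_END; runs of WORD_END
     * are collapsed and leading ones dropped (an assumption of this sketch).
     */
    static Map<String, Integer> count(String text, Set<Character> alphabet, int maxN) {
        StringBuilder normalised = new StringBuilder();
        for (char c : text.toCharArray()) {
            char mapped = alphabet.contains(c) ? c : WORD_END;
            int len = normalised.length();
            boolean repeatedEnd = mapped == WORD_END
                    && (len == 0 || normalised.charAt(len - 1) == WORD_END);
            if (!repeatedEnd) {
                normalised.append(mapped);
            }
        }
        // Close the last word if the text did not end with a separator.
        int len = normalised.length();
        if (len > 0 && normalised.charAt(len - 1) != WORD_END) {
            normalised.append(WORD_END);
        }

        Map<String, Integer> counts = new TreeMap<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= normalised.length(); i++) {
                counts.merge(normalised.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The comma and the space are not in the alphabet, so they act as one word end.
        System.out.println(count("ab, ba", Set.of('a', 'b'), 3));
    }
}
```

With the alphabet {a, b}, the text "ab, ba" is seen as "ab_ba_", so the punctuation never shows up in the n-grams and cannot appear in generated words.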

I expect that most of these commands will go unused, apart from list to check the available dictionaries, create to create new ones, read to improve their accuracy, and write to generate words.
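
For readers curious about what write does conceptually, the following is a rough sketch of Markov-chain word generation, assuming an order-2 model: the next character is drawn at random, weighted by how often it followed the previous two characters in the training words. The names WordWriter, transitions and write, the padding with two word-end symbols, and the maxLength cap are hypothetical and do not come from the project.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class WordWriter {
    /** Reserved symbol marking the end (and, in this sketch, the start) of a word. */
    static final char WORD_END = '_';

    /** For each two-character context, counts the characters seen right after it. */
    static Map<String, Map<Character, Integer>> transitions(List<String> words) {
        Map<String, Map<Character, Integer>> table = new HashMap<>();
        for (String word : words) {
            String padded = "" + WORD_END + WORD_END + word + WORD_END;
            for (int i = 0; i + 2 < padded.length(); i++) {
                table.computeIfAbsent(padded.substring(i, i + 2), k -> new HashMap<>())
                     .merge(padded.charAt(i + 2), 1, Integer::sum);
            }
        }
        return table;
    }

    /** Generates one word by walking the chain until a word end or maxLength is reached. */
    static String write(Map<String, Map<Character, Integer>> table, Random random, int maxLength) {
        StringBuilder word = new StringBuilder();
        String context = "" + WORD_END + WORD_END;
        while (word.length() < maxLength) {
            Map<Character, Integer> followers = table.get(context);
            if (followers == null) {
                break; // context never seen during training
            }
            int total = followers.values().stream().mapToInt(Integer::intValue).sum();
            int pick = random.nextInt(total);
            char next = WORD_END;
            // Weighted choice: subtract counts until the pick falls inside an entry.
            for (Map.Entry<Character, Integer> e : followers.entrySet()) {
                pick -= e.getValue();
                if (pick < 0) {
                    next = e.getKey();
                    break;
                }
            }
            if (next == WORD_END) {
                break;
            }
            word.append(next);
            context = "" + context.charAt(1) + next;
        }
        return word.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<Character, Integer>> table =
                transitions(List.of("frodo", "bilbo", "boromir", "faramir"));
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(write(table, random, 12));
        }
    }
}
```

The more text a dictionary has read, the more varied the follower counts become, which is why reading more texts makes the generated words resemble the source more closely.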

Pre-built dictionaries

Some sample dictionaries can be found in samples. They can be used immediately after being extracted to one's own data directory.
