
ukitinu / markov-words

Word generation using Markov chains and customisable data

License: The Unlicense

Shell 1.35% Java 98.59% Batchfile 0.06%
markov-chain word-generator picocli command-line-tool java17 graalvm

markov-words's People

Contributors

github-actions[bot], ukitinu

Forkers

truebat

markov-words's Issues

Logging

Logs should go to a file only; I think this is the only sensible option for a command-line tool.

Every command, with options, and every exception should be logged. I can't think of anything else of note at the moment.
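
A minimal sketch of file-only logging with java.util.logging, in case it helps to make the idea concrete. The class name FileOnlyLog, its method names and the log file location are hypothetical; the project may well use a different logging library.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical helper, not part of markov-words.
final class FileOnlyLog {
    private FileOnlyLog() {}

    /** Builds a logger that writes to logFile and never to the console. */
    static Logger create(String name, Path logFile) throws IOException {
        Path parent = logFile.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Logger logger = Logger.getLogger(name);
        // Detach from the root logger, whose default handler prints to stderr.
        logger.setUseParentHandlers(false);
        // Note: FileHandler treats '%' in the path as a pattern escape.
        FileHandler handler = new FileHandler(logFile.toString(), true); // append mode
        handler.setFormatter(new SimpleFormatter());
        logger.addHandler(handler);
        logger.setLevel(Level.INFO);
        return logger;
    }

    /** Logs a command together with its options. */
    static void logCommand(Logger logger, String... args) {
        logger.info("command: " + String.join(" ", args));
    }

    /** Logs an exception with its stack trace. */
    static void logFailure(Logger logger, Exception e) {
        logger.log(Level.SEVERE, "command failed", e);
    }
}
```

Calling logCommand once at the entry point and logFailure in a top-level catch block would cover both items above without touching the individual commands.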

Input validation

Things to validate:

  • dictionary name (only English letters, digits and dashes);
  • no underscores in alphabets;
  • think about any other characters I may want to exclude from alphabets;
  • dictionary description

... anything else?
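
The first two checks could be sketched as below. DictValidator and its method names are hypothetical, and rejecting whitespace and control characters in alphabets is only an assumption about what "other characters" might end up excluded.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of markov-words.
final class DictValidator {
    // Only English letters, digits and dashes, at least one character.
    private static final Pattern NAME = Pattern.compile("[A-Za-z0-9-]+");

    private DictValidator() {}

    static boolean isValidName(String name) {
        return name != null && NAME.matcher(name).matches();
    }

    static boolean isValidAlphabet(String alphabet) {
        if (alphabet == null || alphabet.isEmpty()) {
            return false;
        }
        if (alphabet.indexOf('_') >= 0) {
            return false; // no underscores in alphabets
        }
        // Assumption: whitespace and control characters are excluded as well.
        return alphabet.chars()
                .noneMatch(c -> Character.isWhitespace(c) || Character.isISOControl(c));
    }
}
```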

Commands to implement

The syntax I'd like to have is markov-words command options... (using picocli subcommands).

  • read dict text/file: processes text to a dictionary.
  • list: lists existing dictionaries, options to show deleted too, or only those.
  • info dict: shows info about the dictionary.
  • delete dict: deletes a dictionary, option to delete permanently.
  • restore dict: restores a deleted dictionary.
  • rename dict new_name: renames a dictionary.
  • create dict alphabet: creates a new dictionary with the given alphabet.
  • write dict num=1 depth=3: generates random words from the given dictionary with the given options.

Word generation improvements

A few ideas on how to improve word generation.

Add word length data

At the moment, a word ends as soon as a word end char is written, regardless of its length. While this is a consequence of the Markov property, it may also lead to the generation of silly words, and it is the reason for the write.max_length property. It could be a good idea to store data on word length (still not sure how). Its usage should be optional.

Chain words

Allow the user to chain multiple words: don't stop the word generation at the first word end char created, but at the n-th, with n user-defined. This should be almost free, as n-gram generation was already coded to support it.
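
As a rough illustration of stopping at the n-th word end, here is a sketch that is independent of the real generator. ChainedWords, the nextChar function, the use of '_' as word end char and the space used to join words are all assumptions made for the example.

```java
import java.util.function.Function;

// Hypothetical sketch, not the project's generator.
final class ChainedWords {
    static final char WORD_END = '_'; // assumed word end char

    private ChainedWords() {}

    /**
     * Generates up to `words` words joined by spaces.
     * nextChar maps the text produced so far to the next character;
     * maxLength caps the total output length.
     */
    static String generate(Function<String, Character> nextChar, int words, int maxLength) {
        if (words < 1) {
            throw new IllegalArgumentException("words must be >= 1");
        }
        StringBuilder out = new StringBuilder();
        int ends = 0;
        while (ends < words && out.length() < maxLength) {
            char c = nextChar.apply(out.toString());
            if (c == WORD_END) {
                ends++;
                if (ends < words) {
                    out.append(' '); // separator before the next chained word
                }
            } else {
                out.append(c);
            }
        }
        return out.toString().trim();
    }
}
```

With words set to 1 this reduces to the current behaviour of stopping at the first word end char.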

Unit separator in gram files

A filename cannot contain the unit separator char (U+001F), so another solution has to be found.
Maybe I could go back to underscore as WORD_END and disallow most punctuation from alphabets (or at least that symbol)?

word_end 1-gram bug

In every dictionary I created, _.dat always has a _1; in its list of grams, which may cause the generation of empty words.
Although improbable, it may happen, especially in small dictionaries, so it should be fixed.
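
One possible guard, sketched here at generation time rather than at read time: when picking the first character of a word, drop the word end char from the candidates before the weighted choice. StartGuard, the map of counts and '_' as word end char are assumptions for illustration; the actual gram file handling may call for a different fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, not the project's code.
final class StartGuard {
    static final char WORD_END = '_'; // assumed word end char

    private StartGuard() {}

    /** Returns a copy of the successor counts without the word end entry. */
    static Map<Character, Integer> withoutWordEnd(Map<Character, Integer> counts) {
        Map<Character, Integer> copy = new LinkedHashMap<>(counts);
        copy.remove(WORD_END);
        return copy;
    }

    /** Picks a character with probability proportional to its count. */
    static char pick(Map<Character, Integer> counts, Random random) {
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        if (total <= 0) {
            throw new IllegalArgumentException("no candidates");
        }
        int roll = random.nextInt(total);
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            roll -= e.getValue();
            if (roll < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("unreachable when counts are positive");
    }
}
```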

Release files

I should check how to automate the creation of artefacts to download.

The artefacts are:

  • a .tar.gz containing the release jar, the properties file and a simple bash script mkw that executes the jar (java -jar...) for ease of use;
  • as above, but in .zip format and with a batch instead of bash;
  • a Linux native executable as in #20.

External configuration file

The config file should allow setting the following values:

  • logging directory;
  • data path;
  • max depth;

... anything else?
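
A minimal sketch of how these three values might be read with java.util.Properties. The class MkwConfig, the keys log.dir, data.path and max_depth, and all default values are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch; key names and defaults are assumptions.
final class MkwConfig {
    final Path logDir;
    final Path dataPath;
    final int maxDepth;

    private MkwConfig(Path logDir, Path dataPath, int maxDepth) {
        this.logDir = logDir;
        this.dataPath = dataPath;
        this.maxDepth = maxDepth;
    }

    /** Reads the file if it exists; otherwise falls back to the defaults. */
    static MkwConfig load(Path file) throws IOException {
        Properties props = new Properties();
        if (Files.isRegularFile(file)) {
            try (Reader reader = Files.newBufferedReader(file)) {
                props.load(reader);
            }
        }
        return from(props);
    }

    static MkwConfig from(Properties props) {
        Path logDir = Path.of(props.getProperty("log.dir", "logs"));
        Path dataPath = Path.of(props.getProperty("data.path", "data"));
        // parseInt throws NumberFormatException on a non-numeric value.
        int maxDepth = Integer.parseInt(props.getProperty("max_depth", "5").trim());
        if (maxDepth < 1) {
            throw new IllegalArgumentException("max_depth must be >= 1");
        }
        return new MkwConfig(logDir, dataPath, maxDepth);
    }
}
```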

FileRepo on the road to become a God Class

At the moment it's not showing, but with some test changes, PMD gave the following warning on FileRepo: Possible God Class (WMC=47, ATFD=17, TCC=22.794%).

It may be worth looking into reordering/splitting it.

Sample dictionaries

Throwing out a few ideas of sample dictionaries to provide.

  • Names from a book/series (Tolkien maybe?).
  • Names from a game (I was thinking about Morrowind, and all its racial names).

... more?

Add Windows and macOS native images

For macOS, the problem is that native-image does not support the --static option. Instead of being ignored, as I would expect, the option stops the workflow, and I have no environment in which to test whether the lack of static linking invalidates the executable.

For Windows, there is also a problem with the workflow. The step Get release name needs to be fixed, and I neither know nor care enough about batch to fix it.
