Coder Social home page Coder Social logo

crossword-generator's Introduction

MCTS Crossword Generator

This package provides a pure Python implementation for generating crosswords using Monte Carlo Tree Search (MCTS).

  • A good overview about the project can be found in this blog post.
  • The pip package can be found on PyPI

screenshot filled crossword screenshot filled crossword

Quickstart

A: Install package:

  1. Create and activate a virtual environment based on Python >= 3.8
  2. Install crossword_generator package:
pip install crossword-generator

B: Generate crossword with default settings:

You can generate a crossword without providing any arguments. This will fill a 4x5 layout without any black squares using words from an English dictionary.

To do so, activate your virtual environment and chose one of the following (equivalent) options:

  1. Call application directly:
crossword
  1. Execute package:
python -m crossword_generator
  1. Run the main function in a python shell or your own script:
>>> from crossword_generator import generate_crossword
>>> generate_crossword()

For the next examples I assume you are using the first option to interact with the package.

Examples

  • To get started and see which input formats are required, you can download some english (comma-separated) or german (semicolon-separated) sample data.
  • Let's assume you have downloaded the sample files into a directory called "crossword_input" inside your working directory.

A: Use your own layouts

  • In order to use your own layouts, you will need to set argument path_to_layout to a CSV file on your local machine.
  • the CSV file must have an index column and a header row
  • potential letters are marked with "_" (underscore)
  • black squares are marked with "" (empty)

Fill an empty 5x12 layout:

crossword --path_to_layout "crossword_input/layout_5_12_empty.csv"

Fill a prefilled 5x12 layout:

crossword --path_to_layout "crossword_input/layout_5_12_prefilled.csv"

Fill an entire NYT-style 15x15 layout:

crossword --path_to_layout "crossword_input/layout_15_15_empty.csv"

Of course, you can also provide arguments from within your code:

generate_crossword(
  path_to_layout="crossword_input/layout_15_15_empty.csv",
)

B: Use your own words

  • In order to use your own set of words, you will need to set argument path_to_words to a CSV file (or pattern of CSV files) on your local machine.
  • the CSV file(s) must contain a column named "answer" with the relevant words

Fill an empty 5x12 layout with words from a list

crossword --path_to_layout "crossword_input/layout_5_12_empty.csv" --path_to_words "crossword_input/sample_words.csv"

Again, you can do the same from within your code:

generate_crossword(
  path_to_layout="crossword_input/layout_5_12_empty.csv",
  path_to_words="crossword_input/sample_words.csv",
)

C: Other arguments you might want to play with:

  • num_rows & num_cols [int]
    • number of rows / columns the layout should have
    • will only be considered if path_to_layout is not specified
  • max_num_words [int]
    • limits the number of words to improve runtime
  • max_mcts_iterations [int]
    • sets the maximum number of MCTS iterations
    • can be increased to get a better solution or decreased to improve runtime
  • random_seed [int]
    • change the seed to obtain different filled crosswords
  • output_path [str]
    • if provided, save the final grid and a summary as CSV files into the provided directory

Modules

  • optimizer.py
    • script that contains the main function generate_crossword()
  • layout_handler.py
    • Provides the layout that will later be filled with words
    • NewLayoutHandler: creates a new layout from scratch
    • ExistingLayoutHandler: reads an existing layout from a CSV file
  • word_handler.py
    • Provides the words that will later be filled into the layout
    • DictionaryWordHandler: get words from NLTK corpus
    • FileWordHandler: read words from CSV files
  • state.py
    • Entry: class that represents the current state of one entry of the crossword
    • CrosswordState: class that represents the current state of whole crossword
  • tree_search.py
    • TreeNode: represents one node of the MCTS tree
    • MCTS: represents the whole MCTS tree and provides all necessary functionalities such as
      • Selection
      • Expansion
      • Simulation / Rollout
      • Backpropagation

References & Dependencies

  • The MCTS implementation in tree_search.py is based on the algorithm provided by pbsinclair42, which I adapted in several ways:
    • Convert from 2-player to 1-player domain
    • Adjust reward function + exploration term
    • Add additional methods to analyze the game tree
    • Use PEP 8 code style
  • Have a look at pyproject.toml for a list of all required and optional dependencies
  • Python >= 3.8
  • Required packages
    • nltk>=3.5
    • pandas>=1.4.0
    • numpy>=1.22.0
    • tqdm>=4.41.0

Future work

  • Add a python module that creates questions for given answers using NLP techniques
  • Add a graphical user interface (GUI)

crossword-generator's People

Contributors

jonas-schumacher avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.