Lovecraft Public Domain

Public Domain short stories by HP Lovecraft in Markua format.

This repository contains the original stories which Lovecraft wrote as an adult. It begins in 1917 with “The Tomb” and ends in 1935 with his last original work “The Haunter of the Dark.” The book is ordered chronologically by the date the story was written. Because Lovecraft was a terrible businessman and left no heirs to his intellectual property, all of his works are already in the public domain.

Read the Book

You can read the book online at Leanpub.com. It is available for free, but you can choose to pay for it if you want to support the project. The same page offers downloads in various formats (PDF, EPUB, HTML).

If you want to compile the book yourself, you can clone this repository and run the ops/run_full.sh script. You will need Docker (or Podman) installed on your system. Note that the layout, styling, and formatting of the book are handled by the Leanpub platform, so a locally compiled version may not look as good and may be missing some parts of the content.

Contents

  • manuscript: Contains the stories in Markua format.
    • Book.md: The main file that includes all the stories to be included in the book.
  • raw_converts: Contains the stories in raw text format, as extracted from various sources.
  • ops: Contains a bunch of scripts to assist with the conversion process.

How to Contribute

Proofreading, fixing typos, and improving the formatting are all welcome. To contribute, please fork the repository, make your changes, and submit a pull request.

If you want to include a new story, please make sure it is in the public domain.

Technical Flow

For those interested in the approach taken to convert the stories from public websites to a published book, here is a brief overview:

  1. Extract the raw text from the public domain websites using simple scripting, removing any HTML tags.
    • This is done with a bunch of shell scripts, being creative with wget, cat, head, tail, sed, awk, etc.
  2. Convert the raw text to Markua format.
    • This is mostly done manually, after the extraction.
  3. Categorize the stories into different files based on themes.
    • Define categories and subcategories.
    • AI-assisted categorization using OpenAI's GPT-4.
  4. Compile the book using Leanpub's platform (pulled automatically from this very repository).
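
As an illustration of step 1, here is a minimal sketch of stripping HTML tags from a downloaded page. It is not the code used in this repository: the project relies on shell scripts (wget, sed, awk, and friends), while this sketch swaps in Python's standard-library html.parser for readability. The names TextExtractor and html_to_text are hypothetical, and the list of block-level tags is an assumption that would need tuning per source website.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and drops tags, scripts, and styles."""

    SKIPPED = {"script", "style"}  # content that is never story text
    BLOCK = {"p", "br", "div", "h1", "h2", "h3", "li"}  # assumed paragraph breaks

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            # Source line wraps are not paragraph breaks: fold all whitespace.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html: str) -> str:
    """Return the text of an HTML page, one paragraph per block element."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    # Separate the surviving paragraphs with a single blank line.
    return "\n\n".join(line for line in lines if line)


if __name__ == "__main__":
    page = "<h1>The Tomb</h1><p>In relating the\ncircumstances &amp; events.</p>"
    print(html_to_text(page))
```

The output of such a step still needs the manual clean-up and Markua conversion described in step 2; the sketch only removes markup and normalizes whitespace.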

The process is not fully automated, and there is a lot of manual work involved in cleaning up the text, formatting it, and categorizing the stories. This is because the quality of the text extracted from the public domain websites is not always perfect, and the stories are often split into multiple parts. Furthermore, the stories are not always in the correct order, and the metadata is often missing. Lovecraft had a habit of being expressive with his language, and the stories are often filled with archaic words, complex sentences, and creative formatting. These are extremely difficult to parse and convert to a structured format automatically.

Changes

  • version 1: Initial release with most of Lovecraft's public domain stories, structured in a single book.
  • current WIP: Expanding the book to include more stories and improve the structure (ordering stories by theme).
