ReadItToMe

Why use ReadItToMe rather than a screen reader? I built this tool for two major use cases.

  1. Reading research papers and large web content in a smart way (skipping ads, menus, and similar clutter).
  2. Reading large forums and summarizing the findings, consensus, insights, etc.

In these cases, it blows a standard screen reader out of the water.

Features

  • Support for OpenAI models (GPT-3/GPT-4)
  • Support for Anthropic models (Claude-2/3)
  • Support for Ollama models (Mistral, Llama2, etc.)

Optional CLI usage

Specify a URL

py main.py --url "https://example.com/page"

Specify a fixed filename to reuse the same file (by default, one file is generated per web page)

py main.py --fixed-filename "summary.mp3"

Specify a 'playlist': a file with multiple URLs, one per line, to process. Can be combined with --silent and --download-only to set up a playlist for later listening.

py main.py --playlist \your\directory\playlist.txt
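
For illustration, reading such a playlist file could look roughly like the sketch below. The function name read_playlist is hypothetical, and skipping blank lines is an assumption; this is not necessarily how main.py does it.

```python
from pathlib import Path


def read_playlist(path):
    """Return the URLs listed in a playlist file, one per line."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Assumption: blank lines are ignored. The README only says
        # "one URL per line" and does not cover this case.
        if line:
            urls.append(line)
    return urls
```

Each returned URL would then be summarized and turned into its own audio file.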

Save the AI-generated summaries for later viewing

py main.py --save-summaries \output\dir

Flags

  • --silent (don't vocalize the actions being performed)
  • --download-only (only download the audio files without playing them back; useful for bulk-creating a playlist)

Example (Download a playlist for use in a media player)

py main.py --playlist C:\git\HNplaylist.txt --download-only --silent
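
To make the option set concrete, here is a minimal argparse sketch that accepts the same options and flags listed above. It only mirrors the documented command line; the help strings and the build_parser name are my own, and the actual parsing code in main.py may differ.

```python
import argparse


def build_parser():
    """Build a parser for the options documented in this README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--url", help="web page to read aloud")
    parser.add_argument("--fixed-filename", help="reuse one audio file name")
    parser.add_argument("--playlist", help="text file with one URL per line")
    parser.add_argument("--save-summaries", help="directory for saved summaries")
    parser.add_argument("--silent", action="store_true",
                        help="don't vocalize the actions being performed")
    parser.add_argument("--download-only", action="store_true",
                        help="generate audio files without playing them")
    return parser


# Parse the example command line from above, passed as an explicit list.
args = build_parser().parse_args(
    ["--playlist", r"C:\git\HNplaylist.txt", "--download-only", "--silent"]
)
print(args.playlist, args.download_only, args.silent)
```

Note that argparse exposes --download-only as args.download_only and --fixed-filename as args.fixed_filename.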

Setup

  • Copy or rename config.example.json to config.json
  • Add your keys for the models. An OpenAI key is required for OpenAI's natural text-to-speech, which is the main feature of this app (other platforms may be supported in the future)
  • Add your output directory. This is where audio files generated for playback will be stored
  • Add your selected model and model type for text summarization (openai, claude, ollama)
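
A quick sanity check of the finished config.json could look like the sketch below. The key names (openai_api_key, output_dir, model, model_type) are placeholders I made up for illustration; use the names that actually appear in config.example.json.

```python
import json
from pathlib import Path

# Hypothetical key names, for illustration only.
# The real ones are defined in config.example.json.
REQUIRED_KEYS = ("openai_api_key", "output_dir", "model", "model_type")
SUPPORTED_MODEL_TYPES = ("openai", "claude", "ollama")


def load_config(path="config.json"):
    """Load the config file and fail early if a setup step was skipped."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    # Treat absent and empty values the same way: both mean "not filled in".
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config is missing values for: {', '.join(missing)}")
    if config["model_type"] not in SUPPORTED_MODEL_TYPES:
        raise ValueError(f"unsupported model type: {config['model_type']!r}")
    return config
```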

Technical Decisions

Disclaimer: I'm not a daily Python coder, but ironically the core implementation was written in Python through experimentation, then ported to C# with Claude 3.0 and hand fixes.

  • Opted to use Pygame for audio playback in Python, as it provided the most seamless user experience (other approaches required a convoluted FFmpeg setup on Windows)
  • Opted for OpenAI's voices. I personally enjoy how natural they sound, including the vocal mannerisms.

Practical Notes

  • MAX_RESPONSE_TOKENS has a very strong effect on how thorough or concise the summary is. At 720, you'll get a reasonable, detailed overview if the story is brief. I personally use 3072, since I use the tool for long stories and Hacker News threads. Raise it if you prefer a deeper dive, up to the limits of your model. Of course, this directly affects cost per query.
  • In general, the tool needs models with 16k+ context sizes to be useful (GPT-3.5-turbo, GPT-4-Turbo, Claude-2, Claude-3)
  • Not all Ollama models support large context sizes.
  • In practice, Mistral was passable, but most small/medium models (7B or less) did poorly or required tweaking to deliver useful summaries. YMMV!
  • Claude-3 and GPT-4 did exceptionally well thanks to their large context sizes and recall quality
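
To see how MAX_RESPONSE_TOKENS and context size interact, here is a back-of-the-envelope sketch. It assumes roughly 4 characters per token, a common rule of thumb for English text rather than a real tokenizer, and the function names are hypothetical. The tool itself may not perform any such check.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(text, context_size=16_384, max_response_tokens=3072):
    """True if the estimated prompt plus the reserved response fits."""
    return estimate_tokens(text) + max_response_tokens <= context_size


# A page of about 60,000 characters is roughly 15k tokens by this estimate.
page = "a" * 60_000
print(fits_in_context(page, max_response_tokens=3072))
print(fits_in_context(page, max_response_tokens=720))
```

The larger the response budget, the less room remains for the page itself, which is part of why small-context models struggle with long threads.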

Roadmap

  • Chromium- and Firefox-based plugins (investigating)
  • Better-tested support for specific local models (ollama, oobabooga, or anything that supports the OpenAI API)
  • Support for multiple audio generation models
