phoenix10.1's Introduction

Phoenix10.1

Logo of Phoenix10.1 - it is a Phoenix in the sky, disney style

Phoenix10.1 is software that generates personalized, pre-recorded internet radio broadcasts hosted by a text-to-speech radio jockey.

Here's a demo to understand what it sounds like!

What can it do?

This radio jockey is capable of playing your favorite songs, including tracks from your preferred artist, genre, or Billboard chart. It can automatically discover and play fascinating clips from your preferred podcasts, provide weather updates, and deliver daily news.

For a more authentic radio experience, it brightens up your day with fictional company ads, conducts daily QnA with the audience, and shares interesting "On this day..." facts.

Installation

It is recommended to use Python 3.10 or newer to run the code.

Quick Start

If you're using a Debian-based distribution, you can install all dependencies using install.sh:

sh install.sh

Manual

This software requires ffmpeg and espeak. To install them on macOS:

brew install ffmpeg espeak

To install them on Linux (Debian-based):

sudo apt-get install ffmpeg espeak

Windows users can set up ffmpeg by following this guide from Stack Exchange, and espeak by following this tutorial from Stack Overflow.
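
Before moving on, it can help to confirm that both tools are reachable from your shell. The snippet below is a minimal sketch, not part of Phoenix10.1 itself: the helper name `missing_tools` is hypothetical, and it assumes the executables are named `ffmpeg` and `espeak` on your system (the names may differ on some Windows installs).

```python
import shutil


def missing_tools(tools=("ffmpeg", "espeak")):
    """Return the names of the tools that cannot be found on PATH."""
    # shutil.which returns None when no matching executable is found.
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("ffmpeg and espeak were both found.")
```

If a tool is reported as missing even though it is installed, its directory is probably not on your PATH.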

To install the Python dependencies, use:

pip3 install -r ./requirements.txt

The software also requires installing punkt from nltk. In a Python shell, use the following code to install punkt:

import nltk
nltk.download("punkt")

Text-to-speech (TTS) is generated with Coqui-ai's vits model. We recommend running a generic tts command in your shell once, since this prompts TTS to download the vits model automatically:

tts --text "I am excited to demo Phoenix ten point one" --model_name tts_models/en/vctk/vits --speaker_idx p267 --out_path temp.wav

The vits model requires around 150 MB of storage.

Creating your radio broadcast

To create your own radio, start by updating the default schema in ./data/schema.json.

Each action in schema.json is a list with two elements: the first names the action and the second gives the characteristic of that action. The available actions are:

  • up
    • routine to start the radio broadcast
    • characteristic value is ignored
  • music
    • fetches and streams music
    • characteristic should contain list of song names
  • local-music
    • streams music using locally stored songs
    • characteristic should contain either:
      • list of paths to the audio files
      • list of lists of the format [album_path, num_of_songs]
  • music-artist
    • fetches and streams music (based on the artist names)
    • characteristic should contain list of lists of the format [artist_name, num_of_songs]
  • music-genre
  • music-billboard
    • fetches and streams music from Billboard charts
    • characteristic should contain list of lists of the format [chart_name, num_of_songs]
  • podcast
    • fetches an interesting clip from a podcast
    • characteristic should be a list of the format [podcast_rss_link, max_clip_duration_in_mins]
  • weather
    • broadcasts the weather
    • characteristic should contain city name. Use null to fetch weather using your IP address.
  • news
    • broadcasts the news using rss feeds. The rss feeds can be updated in ./data/rss.json.
    • characteristic should be a list of the format [category, num_of_news_items]
  • fun
    • broadcasts an "On this day..." fact
    • characteristic is ignored
  • end
    • routine to end the broadcast
    • characteristic is ignored
  • no-ads
    • removes fictional advertisements from the broadcast. This action should come before up
    • characteristic value is ignored
  • no-qna
    • removes the daily QnA from the broadcast. This action should come before up
    • characteristic value is ignored
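
As an illustration of the shape described above, here is a small sketch that builds a schema in Python, checks it, and prints it as JSON. It assumes the top level of schema.json is a list of [action, characteristic] pairs. The helper `check_schema`, the song title, and the use of null for ignored characteristics are placeholders chosen for this example, not something the project prescribes.

```python
import json

# Action names taken from the list above.
KNOWN_ACTIONS = {
    "up", "music", "local-music", "music-artist", "music-genre",
    "music-billboard", "podcast", "weather", "news", "fun", "end",
    "no-ads", "no-qna",
}


def check_schema(schema):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for index, entry in enumerate(schema):
        if not isinstance(entry, list) or len(entry) != 2:
            problems.append(f"entry {index}: expected [action, characteristic]")
            continue
        action = entry[0]
        if action not in KNOWN_ACTIONS:
            problems.append(f"entry {index}: unknown action {action!r}")
    return problems


# Hypothetical broadcast: no ads, one song, weather by IP address, a fun fact.
example_schema = [
    ["no-ads", None],         # must come before "up"
    ["up", None],             # characteristic is ignored
    ["music", ["Hey Jude"]],  # list of song names (illustrative title)
    ["weather", None],        # null fetches weather using your IP address
    ["fun", None],
    ["end", None],
]

if __name__ == "__main__":
    print(check_schema(example_schema))
    print(json.dumps(example_schema, indent=2))
```

The check only validates the outer shape and the action names. It does not verify that each characteristic matches the format its action expects.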

Run

Once schema.json is configured, run the software using:

python3 radio.py

Your entire broadcast is stored in a radio.mp3 file.

TTS configuration

You can modify the voice of the radio jockey, the name of your radio station/host, and the volume of the background music by editing the ./config.json file. To experiment with different voices, you can use Coqui-ai's vits model with the following command:

tts-server --model_name tts_models/en/vctk/vits

For advice on selecting the best voices, check out this discussion.

The volume of the background music can be adjusted between 0.1 and 2. A value of 0.1 will turn off the background music, while a value of 2 doubles its volume.

Contributing

We always welcome and greatly appreciate contributions! You can contribute in various ways, like by reporting and fixing bugs or suggesting and implementing new features. To start contributing, you can either submit a pull request or open an issue.

If you're submitting a pull request, please make sure to run pylint first. Although it's not mandatory, running the unit tests on your code is highly encouraged.

To run the unit tests, use this command (from the root directory):

python3 -m coverage run --omit "*/site-packages/*" -m unittest

We also recommend using mutation testing with mutmut. To execute mutmut, run this command (once again from the root directory):

mutmut run --paths-to-mutate ./radio.py --tests-dir ./tests/ --runner 'python3 -m unittest'

Bear in mind that mutation testing is a costly way to evaluate your test suite and can take several hours, so only use it when proposing a major change.

License

The code is open-sourced under the MIT License.

Acknowledgements

All software stands on the shoulders of giants, and this project is no different!

  • The authors would like to thank Coqui-ai and their work on TTS (licensed under Mozilla Public License 2.0).
  • The logic to generate random identities is from rig and the names database (fnames.txt, lnames.txt, locdata.txt) is from the US Census database.
  • The dataset in ./data/genres.csv is curated from the Million Song Dataset.
    • The Million Song Dataset was created under a grant from the National Science Foundation, project IIS-0713334. The original data was contributed by The Echo Nest, as part of an NSF-sponsored GOALI collaboration. Subsequent donations from SecondHandSongs.com, musiXmatch.com, and last.fm, as well as further donations from The Echo Nest, are gratefully acknowledged.
  • The background music is Woke up this Morning Theme by Lobo Loco and is licensed under an Attribution-ShareAlike 4.0 International License.
  • The questions are from icebreakers which is licensed under MIT License. Some of the responses were manually curated from character.ai. From their TOS:
    • As to a user interacting with a Character created by another user or by Character AI, the user who elicits the Generations from a Character owns all rights in those Generations and grants to Character AI a nonexclusive, worldwide, royalty-free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use any Generations.
  • The fictional advertisement and intros/outros were curated using the gpt-3.5-turbo model. We followed the Sharing & Publication Policy of OpenAI and acknowledge that we have reviewed, edited, and revised the language of the content to our preference. We take ultimate responsibility for the content generated.
  • The logo was generated using OpenAI's DALL-E 2.

phoenix10.1's Issues

Add support for multiple voices

Many playtesters have asked for this feature. The current voice is quite a nice human voice, but choices are personal. This feature seems quite useful: integrate different voices from coqui-ai's vits model.
Helpful: coqui-ai/TTS#1891

Woo! Excellent!

Donno how to chat with you, Parth. I just wanted to say that your improvements are fantastic. The tracks in my personal collection are named properly now, by the DJ. Woo!

-Gene

Fix the metadata mismatch

Sometimes, there can be a disparity between the song title/artist introduced by the radio host and the actual song that plays afterwards. This is due to the fact that song fetch and metadata fetch have separate methods. As a result, the search rankings outputted during the fetch actions differ, leading to the mismatch.

The song fetch logic (in the music method) relies on the song name and, potentially, an artist name. However, the metadata fetch depends solely on the song name.

To address this issue, we could potentially use the --add-metadata option in youtube-dl. If this method fails to bring in the necessary metadata, we would then fall back to the older method. I believe that this approach would significantly improve the matching accuracy.

It would also be nice to check if yt-mdl can automatically handle this for us within the music method.

Improve Logging

The logging system we currently have is quite difficult to read and debug. To address this issue, our first step would be to eliminate the unnecessary noise generated by Coqui-ai's TTS and FFmpeg.

However, it may be challenging to suppress TTS logs as they currently rely on print statements rather than the logging module. To solve this problem, we could use a method similar to the one outlined here: https://stackoverflow.com/a/45669280/7543474 - to remove any excess print statements.
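
One possible way to do this (not necessarily the approach in the linked answer) is to redirect standard output while the noisy call runs. The sketch below uses only the standard library. `call_quietly` and `noisy_synthesis` are hypothetical names, and `noisy_synthesis` merely stands in for a TTS call that uses print.

```python
import contextlib
import io


def call_quietly(func, *args, **kwargs):
    """Call func while capturing what it prints; return (result, captured_text)."""
    buffer = io.StringIO()
    # Everything written to sys.stdout inside this block goes to the buffer.
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def noisy_synthesis(text):
    """Hypothetical stand-in for a library call that prints progress output."""
    print(" > Processing:", text)
    return len(text)


if __name__ == "__main__":
    result, captured = call_quietly(noisy_synthesis, "hello")
    print("result:", result)
    print("suppressed output:", repr(captured))
```

Note that `contextlib.redirect_stdout` only intercepts Python-level writes to `sys.stdout`. Output written by subprocesses such as FFmpeg, or written to stderr, would need to be handled separately.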

While we don't want to overwhelm the user with too many logs, we still want them to have a clear understanding of the overall flow. Ideally, they should be aware of what is being generated and in what order.

Allow selection by generic genre types

In addition to selecting songs by title, or by artist and title together, it might be simpler for the end user to specify a genre and a number of songs. This would add a bit of randomness to the process and make listening more of a surprise. To avoid being bombarded with the same songs over and over again, previously played songs would need to be recorded in a log of some sort for lookup. It could add some variety, especially if it were possible to specify a song, then several songs in a genre, before returning to a specific song or two.
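
The "avoid repeats" part of this idea can be sketched as follows. This is only an illustration under stated assumptions: the function name `pick_unplayed` is hypothetical, the played-song log is modelled as an in-memory set rather than a file, and how candidate songs for a genre are obtained is left out entirely.

```python
import random


def pick_unplayed(candidates, played, count, rng=random):
    """Pick up to `count` songs that are not in `played`, then record them."""
    fresh = [song for song in candidates if song not in played]
    # Never ask for more songs than are available.
    chosen = rng.sample(fresh, min(count, len(fresh)))
    played.update(chosen)
    return chosen


if __name__ == "__main__":
    played_log = {"Song A"}
    genre_songs = ["Song A", "Song B", "Song C", "Song D"]
    print(pick_unplayed(genre_songs, played_log, 2))
    print(sorted(played_log))
```

Persisting the set between runs (for example as a JSON file) would give the lookup log the issue describes.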

TTS ERROR

I am getting an error while trying to install TTS on Windows.

Adjust volume from schema

One of the playtesters had the following to say when asked:

I am curious, was the voice in the demo too hard to hear? Can you point out some specific issues with it, so that we can fix it.

It was a combination of my personal hearing issues, the lack of inflection on the bot's voice (though with that accent it kind of worked for it a bit because it seemed so cool and laid back in the demo,) and possibly the background music being a slight bit too loud for the voice and they sort of blended together a bit for me. After a while I simply could not focus on it.

Adjusting volume seems quite useful.

Bug: KeyError: 'itunes_author' when using podcast action

When we use the podcast action, at times we might get an error like this:

Traceback (most recent call last):
  File "radio.py", line 967, in <module>
    dialogue.flow()
  File "radio.py", line 739, in flow
    speech = self.podcast_dialogue(rss_feed)
  File "radio.py", line 489, in podcast_dialogue
    "You know, I love listening to podcasts. "
KeyError: 'itunes_author'

This happens because no 'itunes_author' metadata was provided by the RSS feed. We need a fallback condition here.
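
A fallback could look roughly like the sketch below. It is not the project's actual fix: the helper name `podcast_author`, the default wording, and the secondary `author` key are assumptions made for illustration, and the feed metadata is modelled as a plain dictionary.

```python
def podcast_author(feed_metadata, default="the hosts"):
    """Return the podcast author, with a fallback when the key is missing or empty."""
    # dict.get returns None instead of raising KeyError for a missing key.
    author = feed_metadata.get("itunes_author") or feed_metadata.get("author")
    return author if author else default


if __name__ == "__main__":
    print(podcast_author({"itunes_author": "Jane Doe"}))
    print(podcast_author({}))
```

With this in place, the dialogue text can always be formatted, even when the RSS feed omits the author.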

Customizing "ads"

It would be wonderful to be able to customize the fictional ads. I could see this as being linked to a to-do list or reminders lists, allowing someone to advertise important things to themselves. Alternatively, it could be advertising recently learned facts for greater knowledge absorption.

Add support for Artists

At the moment, only song names are supported in schema. A nice iteration would be to add support for artist names - kind of like Pandora.

Add support for local songs

There may be times when users have their songs stored locally. In these cases, they may not need to use youtube-dl to retrieve the songs again. It would be a good idea to add support for locally stored songs.

Ideally, this feature should allow for both the specific path name of the song and a directory name for retrieving a set of random songs. Perhaps an intelligent fetch could automatically detect artist names as well?
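
The two input forms could be handled along these lines. This is a sketch under assumptions, not the project's implementation: `resolve_local_songs` is a hypothetical name, the set of audio suffixes is illustrative, and only the top level of the directory is scanned.

```python
import random
from pathlib import Path

# Illustrative set of extensions treated as audio files.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}


def resolve_local_songs(entry, rng=random):
    """Resolve a file path, or a [directory, count] pair, to a list of Paths."""
    if isinstance(entry, str):
        # A single, specific song.
        return [Path(entry)]
    directory, count = entry
    songs = sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )
    # Random selection, capped at the number of songs available.
    return rng.sample(songs, min(count, len(songs)))


if __name__ == "__main__":
    print(resolve_local_songs("my_song.mp3"))
```

Detecting artist names automatically would need tag-reading support on top of this, which the sketch does not attempt.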
