Software to automatically generate talks, presentations for PowerPoint and/or Keynote. Their main purpose is for the improvisational comedy format "Improvised TED talk", where the actors have to present an unseen presentation. This software can be extended to be used for any sort of presentation including for example pecha kucha, etc.
For a demo of this generator, please visit the online demo page, a platform created by Shaun Furragia to give easier access to this talk generator.
# Run the setup script from the command line
source setup.sh
Our program relies on certain APIs that require authentication in order to use it. Create a file named .env
(don't forget the period) in your project directory.
# Reddit Authentication
REDDIT_CLIENT_ID=
REDDIT_CLIENT_SECRET=
REDDIT_USER_AGENT="" #use quotes here
# Wikihow Authentication
WIKIHOW_USERNAME=
WIKIHOW_PASSWORD=
# OPTIONAL: If you want to save to Amazon S3, define your params here
AWS_TALK_BUCKET_KEY=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=
Get your Reddit authentication keys by following these steps.
- Create a Reddit account
- Go to your App preferences
- Pressing "create app"
- Filling in a name and other details
- name: 'talk-generator', 'script', description: 'talk-generator', about url: 'https://github.com/korymath/talk-generator', redirect url: 'https://github.com/korymath/talk-generator'
- Open
.env
- Fill in the
REDDIT_CLIENT_ID
above using the id under the name of the app, theREDDIT_CLIENT_SECRET
using the text next tosecret
in the app card.
The REDDIT_USERAGENT
can be set to "python:https://github.com/korymath/talk-generator:v0.0.1 by /u/REDDIT_USERNAME)" and replace the REDDIT_USERNAME with your Reddit username.
You can create this file by following the next steps:
- Create a Wikihow account.
- Open
.env
- Fill in
WIKIHOW_USERNAME
with your username, andWIKIHOW_PASSWORD
with your password.
We require several NTLK packages, which can be downloaded by running the following code in to the Python console:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')
From the Reddit API documentation, it sounds like the 401 error is given when your client id/secret are incorrect. Are you using the correct values from the app page?
- Add parser for BeautifulSoup
sublime venv/lib/python3.6/site-packages/PyDictionary/utils.py
# change: return BeautifulSoup(requests.get(url).text)
# to: return BeautifulSoup(requests.get(url).text, 'lxml')
pip
might complain when installing the python-pptx
dependency due to a missing the lxml
dependency.
If this is not resolved automatically, visit this page.
On that page, select the right lxml
version for your platform and Python version (e.g. cp37
= Python 3.7).
In case installing the dependencies complain about Visual C++ version while resolving the python-pptx
dependency,
consider installing a version of Visual Studio.
If pip
complains about a missing mysql.h
, you need to pip install wheel
,
go to mysql wheel download to download the wheel
and run pip install mysqlclient-1.3.8-cp36-cp36m-win_amd64.whl
slaps the hood of the container Yep this bad boy runs on Docker.
Build the image, and tag it as talkgen.
docker build -t talkgen .
Run the image tagged as talkgen. The container /output directory maps to your current working directory.
docker run --env-file .env -v ``pwd``/output:/output talkgen run.py --open_ppt false
Reasonable defaults have been provided. To override, simply pass the command-line parameter. Here we are overriding the the topic and number of slides.
docker run --env-file .env -v ``pwd``/output:/output talkgen run.py --topic 'climate change' --num_slides 12 --open_ppt false
- be sure that open_ppt is false when running as a docker process.
python run.py --topic cat --num_slides 10
Argument | Description |
---|---|
topic |
The topic of the generator. This works best if it is a common, well-known noun |
num_slides |
The number of slides in the generated presentation (default: 10) |
schema |
The presentation schema to use when generating the presentation. Currently, only two modes are implemented, being default and test (for testing during development) |
presenter |
The name that will be present on the first slide. Leave blank for an automatically generated name |
output_folder |
The folder to output the generated presentations (default: ./output/ ) |
save_ppt |
If this flag is true(default), the generated powerpoint will be saved on the computer in the output_folder |
open_ppt |
If this flag is true (default), the generated powerpoint will automatically open after generating |
Run the generator as a microservice at 0.0.0.0:5687.
sh python run_web.py
You can then hit http://0.0.0.0:5687?topic=sometopic
. This will kick the main.py off.
In this section, we discuss the many parts of this software.
-
data/powerpoint/template.pptx
: This Powerpoint file contains the powerpoint presentation to start from. The interesting part of this file is when opening the model view, as you can edit the slide templates and their placeholders. -
slide_templates.py
: This Python module is responsible for filling in thetemplate.pptx
with values. There are also functions present which you can give as arguments functions that generates content when given apresentation_context
dictionary. It will then generate the content, and if certain conditions (e.g. originality) are satisfied, it will create and add a slide to the presentation.
-
PresentationSchema: This class controls the parameters of the powerpoint generator. It contains information about which slide generators to use, which slide topic generator and a dictionary
max_allowed_tags
that contains information about how many times slide generators with certain tags are allowed to generate in one presentation. We might add different presentation schemas for different types of presentation generators in the future. The PresentationSchema class can be found inpresentation_schema.py
. -
SlideGenerator: This class contains information about how to generate a particular type of slide. It holds a generating function for this purpose, as well as meta-data. For example, it contains a
weight_function
to calculate the chance of being used for a certain slide number, a name and tags for the generator, the allowed number of elements that have already been used in the presentation and the number of retries the slide generator is allowed to make in case it fails to generate a slide. The SlideGenerator class can be found inpresentation_schema.py
. -
SlideTopicGenerator: This is type of class that has a
generate_seed(slide_nr)
function, which generates a seed for the given slide number, which tends to be based on the topic of the presentation (as entered by the user). This slideseed
will then be given to a Slide Generator as an inspiration point for the content of the slide. There are several types of topic generators, such as a slide topic generator that just returns the presentation topic, one that gives synonyms and one that makes little side tracks using related concepts onConceptNet
-
presentation_context
: This is an object that is created by the Presentation Schema, containing information about the topic and presenter of the presentation, the used content and theseed
for the current slide
We wrote our own custom templated text generation language to easily generate texts. They're mostly based on Python's str.format
and Tracery, but come with some extra functionalities (see also language_util
)
The template files themselves are stored in /data/text-templates/*
On construction, this object is given the name of a file that contains a text template on a new line, usually in a .txt
file. Similar to the build-in str.format
function, these text templates can contain named variables between curly brackets {variable_name}
. Usually, the presentation_context
dictionary is used as an argument. This means that in these text generators {seed}
will be the variable containing the slide topic seed. This dictionary can also be extended before generating the text, such that more, custom variables are also possible.
A difference is that our custom language also provides some functions that can be easily called within the template. If a function returns None, or the variable is not present in the given dictionary, the text generator will keep retrying to generate until no templates are left. An example of such a function is {seed.plural.title}
, which will pluralise the seed, and then apply title casing.
Listed below are some possible functions in our text generation language. The up-to-date list of function can be found in text_generator.py
.
Function | Description |
---|---|
title |
Converts the string to title casing |
lower |
Converts the string to lower casing |
upper |
Converts the string to upper casing |
dashes |
Replaces the spaces of the string with dashes |
a |
Adds the "a" article, or "an" if the word starts with a vowel (except u) |
ing |
Converts a verb to the present participle |
plural |
Converts a noun to plural |
singular |
Converts a noun to singular |
synonym |
Converts the noun to a random synonym |
2_to_1_pronouns |
Changes 2nd person pronouns to 1st person pronouns in a sentence (e.g. you->my) |
wikihow_action |
Retrieves a random action based on the string using Wikihow |
get_last_noun_and_article |
Extracts the last noun from a sentence |
conceptnet_location |
Retrieves a location related to the string using Conceptnet |
is_noun |
Checks if the string can be a noun |
is_verb |
Checks if the string can be a verb |
Allows the same things TemplatedTextGenerator
does, but using a Tracery grammar. This means that the file is saved as a JSON file, and that local variables can be declared, for easily creating a large possibility space of possible texts.
-
random_util
: This module helps with dealing better with randomness. It has a function to deal with picking from a list with weighted elements, as well aschoice_optional(list)
, which is likerandom.choice
, except it returns None if the list is empty -
generator_util
: This module provides lots of utilities for creating generators. Since some content generators return (image or textual) lists, there are functions for converting them to normal single output generators. There are also methods for converting methods that only take a stringseed
as input to one that takes apresentation_context
, namelycreate_seeded_generator(generator)
. There are also more exotic generators such as weighted generators and walking generators. -
language_util
: Contains many language functionalities, such as converting to singular/plural, changing tense, checking part of speech, getting synonyms etc. -
scraper_util
: Provides some common functionalities for the page scrapers. -
os_util
: Contains some methods dealing with the operating system, such as saving and checking files. -
cache_util
: Contains a hashable dictionary class, which is necessary for caching certain functions.
There are a lot of different services providing content to our generator. Usually, the content scrapers below are used in run.py
to craft a real concent generator used in the slides generators.
chart.py
: Generates random powerpoint charts using text templates and random math functions.conceptnet.py
: Explores the graph of related concepts to certain seedsgoodreads.py
: Used for retrieving quotes related to a seedgoogle_images.py
: Used for retrieving relevant images for certain seeds (e.g. as background image)inspirobot.py
: Used for retrieving nonsensical quote imagesreddit.py
: Used for scraping reddit images, as there are many interesting subreddits to scrape images from.shitpostbot.py
: Used for retrieving "interesting"/weird imageswikihow.py
: Used for finding related actions to a certain seed.
Sometimes, certain content providers return a default image when no image is found for that url (usually when an image got deleted). These types of images are stored in our repository in data/images/prohibited/*
. This folder gets automatically scanned, and all images in the generated presentation are compared to images from this folder, to ensure that none gets added to the final presentation.
There are a lot of tests present in this repository. These .py
files are prefixed with test_
, and use the unittest
module. They can easily be run all together when using PyCharm by right clicking on talk-generator
and pressing Run 'Unittests in talk-generator'
source setup.sh
pytest
Test coverage is automatically handled by codecov
.
Tests are automatically run with CircleCI based on the .yml
file in the .circleci
directory.
This Talk Generator
is made by Kory Mathewson
and Thomas Winters,
with help from Shaun Farrugia, Piotr Mirowski and Julian Faid.
MIT License. Copyright (c) 2018 Kory Mathewson and Thomas Winters.