Comments (6)
If you're already successfully grabbing an image, then a hacky way to put it together would be to write a small script that saves the image to the external images folder and ensures it's the only image there. Then set pycasso to external mode and run it once your script has finished.
Play around with the EXIF info or file names of the images to get the title/artist on the page working, if you want that.
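The staging step could be sketched like this (the folder path is an assumption; point it at whatever your pycasso config uses as the external images directory):

```python
import shutil
from pathlib import Path

def stage_external_image(src_path, external_dir="images/external"):
    """Copy one image into the external images folder, removing any others
    so it is the only candidate pycasso can pick up. The default path here
    is hypothetical, not pycasso's actual config value."""
    external = Path(external_dir)
    external.mkdir(parents=True, exist_ok=True)
    for old in external.glob("*"):
        if old.is_file():
            old.unlink()  # clear out previous images first
    dest = external / Path(src_path).name
    shutil.copy(src_path, dest)
    return dest
```

After calling this, running pycasso in external mode should render exactly that image.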
from pycasso.
Cool idea. It's definitely possible. A few notes:
The system would have to be powered at all times, whereas the idea behind pycasso is to run and complete as quickly as possible, without staying on.
The solution that makes sense to me is:
1. Create/use a separate program or script that listens on your microphone and, when it hears something, calls pycasso.
2. Update pycasso to take a command line argument (and an optional Python argument if called from a Python script) that overrides the random prompt with the one provided. This shouldn't be hard: just add an argument to be parsed and, based on its value, interrupt prompt generation with the argument.
I can handle 2 under this issue and create the functionality to override the prompt, as I think this is in scope for this project and is functionality others could use. As for 1, the microphone service, you'd have to find a different project/library/program that allows this (probably loads of them for Pis) and modify it so it calls pycasso with args. That's something you'd have to do yourself, as my setup has no mic, so I'd have no way to test it even if I wanted to.
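Point 2 could look something like this (a sketch, not pycasso's actual CLI; the `--prompt` flag name and the fallback prompt list are assumptions):

```python
import argparse
import random

def get_prompt(argv=None):
    """Return the prompt to render: the one passed on the command line
    if present, otherwise a randomly generated one, standing in for
    pycasso's existing random prompt generation."""
    parser = argparse.ArgumentParser(description="Render an image on the e-ink frame")
    parser.add_argument("--prompt", help="Override the random prompt with this text")
    args = parser.parse_args(argv)
    if args.prompt:
        return args.prompt  # caller-supplied prompt wins
    # Fallback: placeholder for the normal random prompt path
    return random.choice(["A lighthouse in a storm", "A fox in a snowy forest"])
```

A microphone listener would then just shell out with something like `python run.py --prompt "a cat playing chess"` once it has a transcript.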
Thank you for responding so quickly! I have literally never worked with a Raspberry Pi (or hardware at all), so for starters I'm going to try to replicate your project. I already have a program that lets me dictate a prompt through my computer's microphone, transcribe it with OpenAI's "whisper" model, and send the prompt to DALL-E to generate an image. In my mind, the Raspberry Pi would literally just run this program. How difficult would that be, and what would it entail? You can see and run the code in my GitHub repo.
I'd just want a button connected via GPIO to the Raspberry Pi that, when held, allows you to dictate a prompt (also a red LED light indicator when it's held). When you let go of the button, it transcribes and sends it to DALLE for an image.
Another idea is to connect a small thumb-keyboard that will allow me to type a manual prompt instead of using my voice.
Any advice you have for me is greatly appreciated!
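The hold-to-dictate behaviour can be sketched as a small hardware-free state machine; with gpiozero you would wire a `Button` to it via `button.when_pressed = ptt.press` and `button.when_released = ptt.release` (the wiring and callback names are assumptions, not working driver code):

```python
class PushToTalk:
    """Hold-to-dictate: start recording (and light the LED) on press,
    stop and hand the audio off for transcription on release."""

    def __init__(self, start_recording, stop_recording, set_led):
        self._start = start_recording
        self._stop = stop_recording
        self._led = set_led
        self.recording = False

    def press(self):
        if not self.recording:
            self.recording = True
            self._led(True)   # red LED on while the button is held
            self._start()

    def release(self):
        if self.recording:
            self.recording = False
            self._led(False)
            self._stop()      # e.g. transcribe with Whisper, then call DALL-E
```

Keeping the hardware callbacks injected like this means the logic can be tested on a laptop before the Pi even arrives.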
I'm gonna try that once my Raspberry Pi 4 arrives. I imagine having a button that triggers the microphone is just a matter of connecting it and configuring the drivers?
In case you want to see my code and test this out with your own, this is what I have so far. You're the first to see it since you inspired me; it's not on GitHub yet and I haven't shared it with anyone.
Credits to https://github.com/DevMiser/AI_Art_Frame for also inspiring me and helping me get started. A lot of this base code is his.
- The Raspberry Pi has a "wake word" of "Art Frame" that, when called, listens for your image prompt. The wake word is powered by Picovoice.ai's pvporcupine Python library.
- The image is generated via DALL-E and rendered on the Inky display.
- The first GPIO button (A) re-renders the same image with a text overlay showing the prompt (this was inspired by you, and a lot of your code helped me out with this piece).
- The second GPIO button re-generates an image with the same prompt (in case you didn't like the first one DALL-E spat out).
- The third GPIO button shows a QR code that points to the image URL so you can download the image on your phone.
Please let me know your thoughts and how I can improve this!
import datetime
import io
import openai
import os
import pvcobra
import pvleopard
import pvporcupine
import pyaudio
import random
import socket
import struct
import schedule
import sys
import threading
import time
import traceback
import urllib.request
import qrcode
from colorama import Fore, Style
from gpiozero import Button
from inky.auto import auto
from pathlib import Path
from openai import OpenAI
from PIL import Image,ImageDraw,ImageFont,ImageOps,ImageEnhance
from pvleopard import *
from pvrecorder import PvRecorder
from threading import Thread, Event
from time import sleep
import RPi.GPIO as GPIO
FONT_PATH = "Arial.ttf"
GPIO.setwarnings(False)
GPIO.setmode(GPIO.BCM)
led1_pin=4
GPIO.setup(led1_pin, GPIO.OUT)
GPIO.output(led1_pin, GPIO.LOW)
audio_stream = None
cobra = None
pa = None
porcupine = None
recorder = None
wav_file = None
display = auto()
openai.api_key = "OPENAIAPIKEYHERE"
pv_access_key = "PICOVOICEAPIKEY HERE"
client = OpenAI(api_key=openai.api_key)
# Globals for the current image and prompt (assigned at module level,
# so no `global` statement is needed here)
prompt_full = ""  # Initialize the prompt variable
last_image_url = None
Clear_list = ["Clear",
              "Clear the screen",
              "Clear the display",
              "Clear the canvas",
              "Delete",
              "Clean",
              "Clean the screen",
              "Clean the display",
              "Clean the canvas",
              "Wipe",
              "Wipe the screen",
              "Wipe the display",
              "Wipe the canvas",
              "Erase",
              "Erase the screen",
              "Erase the display",
              "Erase the canvas",
              "Blank screen",
              "Blank Display"]
def clean_screen():
    cycles = 2
    colours = (display.RED, display.BLACK, display.WHITE, display.CLEAN)
    colour_names = (display.colour, "red", "black", "white", "clean")
    img = Image.new("P", (display.WIDTH, display.HEIGHT))
    for i in range(cycles):
        print("Cleaning cycle %i\n" % (i + 1))
        for j, c in enumerate(colours):
            print("- updating with %s" % colour_names[j + 1])
            display.set_border(c)
            for x in range(display.WIDTH):
                for y in range(display.HEIGHT):
                    img.putpixel((x, y), c)
            display.set_image(img)
            display.show()
            time.sleep(1)
        print("\n")
    print("Cleaning complete")
def current_time():
    time_now = datetime.datetime.now()
    formatted_time = time_now.strftime("%m-%d-%Y %I:%M %p\n")
    print("The current date and time is:", formatted_time)
def dall_e3(prompt):
    global last_image_url  # other handlers (e.g. the QR button) read this
    try:
        response = client.images.generate(
            model="dall-e-3",
            prompt=prompt,
            size="1024x1024",
            quality="hd",
            style="vivid",
            n=1,
        )
        last_image_url = response.data[0].url
        return last_image_url
    except ConnectionResetError:
        print("ConnectionResetError")
        current_time()
def detect_silence():
    cobra = pvcobra.create(access_key=pv_access_key)
    silence_pa = pyaudio.PyAudio()
    cobra_audio_stream = silence_pa.open(
        rate=cobra.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=cobra.frame_length)
    last_voice_time = time.time()
    while True:
        cobra_pcm = cobra_audio_stream.read(cobra.frame_length)
        cobra_pcm = struct.unpack_from("h" * cobra.frame_length, cobra_pcm)
        if cobra.process(cobra_pcm) > 0.2:
            last_voice_time = time.time()
        else:
            silence_duration = time.time() - last_voice_time
            if silence_duration > 1.3:
                print("End of request detected\n")
                GPIO.output(led1_pin, GPIO.LOW)
                cobra_audio_stream.stop_stream()  # was a bare attribute, never called
                cobra_audio_stream.close()
                cobra.delete()
                last_voice_time = None
                break
def fade_leds(event):
    pwm1 = GPIO.PWM(led1_pin, 200)
    event.clear()
    while not event.is_set():
        pwm1.start(0)
        for dc in range(0, 101, 5):
            pwm1.ChangeDutyCycle(dc)
            time.sleep(0.05)
        time.sleep(0.75)
        for dc in range(100, -1, -5):
            pwm1.ChangeDutyCycle(dc)
            time.sleep(0.05)
        time.sleep(0.75)
def listen():
    cobra = pvcobra.create(access_key=pv_access_key)
    listen_pa = pyaudio.PyAudio()
    listen_audio_stream = listen_pa.open(
        rate=cobra.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=cobra.frame_length)
    print("Listening...")
    while True:
        listen_pcm = listen_audio_stream.read(cobra.frame_length)
        listen_pcm = struct.unpack_from("h" * cobra.frame_length, listen_pcm)
        if cobra.process(listen_pcm) > 0.3:
            print("Voice detected")
            listen_audio_stream.stop_stream()  # was a bare attribute, never called
            listen_audio_stream.close()
            cobra.delete()
            break
def refresh():
    print("\nThe screen refreshes every day at midnight to help prevent burn-in\n")
    current_time()
    clean_screen()
    sleep(5)
    print("\nRe-rendering")
    display.set_image(img_resized)
    # display.set_border(display.BLACK)
    display.show()
    print("\nDone")
def refresh_schedule(event2):
    schedule.every().day.at("00:00").do(refresh)
    event2.clear()
    while not event2.is_set():
        schedule.run_pending()
        sleep(1)
def wake_word():
    porcupine = pvporcupine.create(
        keywords=["computer", "jarvis", "Art-Frame"],
        access_key=pv_access_key,
        sensitivities=[0.1, 0.1, 0.1],  # 0 to 1.0 - higher reduces the miss rate at the cost of increased false alarms
    )
    # Temporarily silence stderr while the audio stream opens
    devnull = os.open(os.devnull, os.O_WRONLY)
    old_stderr = os.dup(2)
    sys.stderr.flush()
    os.dup2(devnull, 2)
    os.close(devnull)
    wake_pa = pyaudio.PyAudio()
    porcupine_audio_stream = wake_pa.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length)
    Detect = True
    while Detect:
        porcupine_pcm = porcupine_audio_stream.read(porcupine.frame_length)
        porcupine_pcm = struct.unpack_from("h" * porcupine.frame_length, porcupine_pcm)
        porcupine_keyword_index = porcupine.process(porcupine_pcm)
        if porcupine_keyword_index >= 0:
            GPIO.output(led1_pin, GPIO.HIGH)
            print(Fore.GREEN + "\nWake word detected\n")
            current_time()
            print("What would you like me to render?\n")
            porcupine_audio_stream.stop_stream()  # was a bare attribute, never called
            porcupine_audio_stream.close()
            porcupine.delete()
            os.dup2(old_stderr, 2)
            os.close(old_stderr)
            Detect = False
def add_text_overlay(image, prompt, text_color=(255, 255, 255)):
    width, height = image.size
    overlay_height = int(height * 0.1)
    overlay = Image.new('RGBA', (width, overlay_height), (0, 0, 0, 128))
    draw = ImageDraw.Draw(overlay)
    script_dir = os.path.dirname(__file__)
    absolute_font_path = os.path.join(script_dir, FONT_PATH)
    if not os.path.isfile(absolute_font_path):
        print("Error: Arial.ttf not found.")
        return image
    max_text_width = width * 0.8
    capitalized_prompt = prompt.title().strip()
    if not capitalized_prompt:
        capitalized_prompt = "(No Text Entered)"
    # Word wrapping: shrink the font size until the wrapped text fits the overlay
    font_size = 36
    while True:
        lines = []
        current_line = ""
        for word in capitalized_prompt.split():
            test_line = current_line + " " + word
            test_width = draw.textlength(test_line, font=ImageFont.truetype(absolute_font_path, font_size))
            if test_width > max_text_width:
                lines.append(current_line.strip())
                current_line = word
            else:
                current_line = test_line
        lines.append(current_line.strip())
        if len(lines) * font_size <= overlay_height:
            break  # text fits within the overlay at this font size
        font_size -= 4
    # Draw the wrapped text, centred line by line, with the final font size
    y = (overlay_height - len(lines) * font_size) // 2
    for line in lines:
        font = ImageFont.truetype(absolute_font_path, font_size)
        text_width = draw.textlength(line, font=font)
        x = int((width - text_width) / 2)
        draw.text((x, y), line, fill=text_color, font=font)
        y += font_size
    image.paste(overlay, (0, height - overlay_height), overlay)
    return image
def generate_qr_code(url):
    qr = qrcode.QRCode(
        version=1,
        error_correction=qrcode.constants.ERROR_CORRECT_L,
        box_size=10,
        border=4,
    )
    qr.add_data(url)
    qr.make(fit=True)
    qr_img = qr.make_image(fill_color="black", back_color="white")
    # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is its replacement
    qr_img = qr_img.resize((display.WIDTH, display.HEIGHT), Image.LANCZOS)
    return qr_img
# Assuming Button C is wired to GPIO 16
button_c = Button(16)
displaying_qr = False  # State to track what is currently displayed

def toggle_qr_display():
    global displaying_qr, img_resized
    if displaying_qr:
        # Display the original image if the QR code is currently shown
        display.set_image(img_resized)
        displaying_qr = False
    else:
        # Display the QR code if the original image is currently shown
        if last_image_url:
            qr_img = generate_qr_code(last_image_url)
            display.set_image(qr_img)
            displaying_qr = True
        else:
            print("No image URL available to generate QR code.")
    display.show()

button_c.when_pressed = toggle_qr_display
def regenerate_image():
    global img, img_resized, prompt_full
    if prompt_full:
        print("Regenerating image with prompt:", prompt_full)
        image_url = dall_e3(prompt_full)
        try:
            raw_data = urllib.request.urlopen(image_url).read()
            img = Image.open(io.BytesIO(raw_data))
            img_resized = img.resize((600, 448), Image.LANCZOS)  # ANTIALIAS was removed in Pillow 10
            display.set_image(img_resized)
            display.show()
            print("Done regenerating and displaying the new image.")
            images_dir = Path('/home/evan/Documents/Python/AI_Art_Frame/images')
            images_dir.mkdir(exist_ok=True)
            filename = images_dir / 'generated_image.png'
            counter = 1
            while filename.exists():
                counter += 1
                filename = images_dir / f'generated_image_{counter}.png'
            img.save(filename)
            print("Image saved as:", filename)
        except Exception as e:
            print("Error downloading or displaying the new image:", str(e))
display_text_overlay = False

def toggle_overlay():
    global img, img_resized, display_text_overlay
    display_text_overlay = not display_text_overlay
    if display_text_overlay:
        img_with_overlay = add_text_overlay(img.copy(), prompt_full)  # Use the current image
        img_resized_with_overlay = img_with_overlay.resize((600, 448), Image.LANCZOS)
        display.set_image(img_resized_with_overlay)
        print("Rendering image with text overlay...")
    else:
        print("Rendering original image...")
        display.set_image(img_resized)
    display.show()
    print("Done")

button_a = Button(5)
button_a.when_pressed = toggle_overlay
button_b = Button(6)  # Define button B
button_b.when_pressed = regenerate_image
class Recorder(Thread):
    def __init__(self):
        super().__init__()
        self._pcm = list()
        self._is_recording = False
        self._stop = False

    def is_recording(self):
        return self._is_recording

    def run(self):
        self._is_recording = True
        recorder = PvRecorder(device_index=-1, frame_length=512)
        recorder.start()
        while not self._stop:
            self._pcm.extend(recorder.read())
        recorder.stop()
        self._is_recording = False

    def stop(self):
        self._stop = True
        while self._is_recording:
            pass
        return self._pcm
try:
    o = create(
        access_key=pv_access_key,
        enable_automatic_punctuation=False,
    )
    event = threading.Event()
    event2 = threading.Event()
    button_pressed = False  # Track button A so a long hold only toggles once

    def check_button():
        global display_text_overlay
        global button_pressed
        while True:
            if button_a.is_pressed:
                if not button_pressed:
                    button_pressed = True
                    # toggle_overlay() flips display_text_overlay itself;
                    # toggling it here as well would cancel the change out
                    toggle_overlay()
            else:
                button_pressed = False
            time.sleep(0.1)

    button_thread = threading.Thread(target=check_button)
    button_thread.daemon = True
    button_thread.start()
    while True:
        wake_word()
        event2.set()
        recorder = Recorder()
        recorder.start()
        listen()
        detect_silence()
        transcript, words = o.process(recorder.stop())
        t_fade = threading.Thread(target=fade_leds, args=(event,))
        t_fade.start()
        if transcript not in Clear_list:
            current_time()
            prompt_full = transcript
            print("You requested:", prompt_full)
            print("\nCreating...")
            image_url = dall_e3(prompt_full)
            raw_data = urllib.request.urlopen(image_url).read()
            img = Image.open(io.BytesIO(raw_data))
            images_dir = Path('/home/evan/Documents/Python/AI_Art_Frame/images')
            images_dir.mkdir(exist_ok=True)
            filename = images_dir / 'generated_image.png'
            counter = 1
            while filename.exists():
                counter += 1
                filename = images_dir / f'generated_image_{counter}.png'
            img.save(filename)
            print("\nRendering original image...")
            img_resized = img.resize((600, 448), Image.LANCZOS)
            display.set_image(img_resized)
            display.show()
            print("\nDone")
            event.set()
            GPIO.output(led1_pin, GPIO.LOW)
            sleep(2)
            t_refresh = threading.Thread(target=refresh_schedule, args=(event2,))
            t_refresh.start()
        else:
            print("Clearing the display...")
            clean_screen()
            event.set()
            event2.set()
            GPIO.output(led1_pin, GPIO.LOW)
            print("\nDone")
# ConnectionResetError must come before the bare Exception handler,
# or it would never be reached
except ConnectionResetError:
    print("Reset Error")
    current_time()
except KeyboardInterrupt:
    exit()
except Exception as e:
    print("Error:", e)
I think your code looks good (take this with a grain of salt because I'm no professional). Thanks for sharing it with me.
My recommendation is to do what I did and throw it up on GitHub, maybe even do a little write-up for some internet points and post it in the Raspberry Pi subreddit. If you do put it on GitHub, the only advice I'd give is to abstract your keys out into another file so you can test easily and avoid accidentally committing your actual keys.
I will leave this issue open as it's overall a good idea, and code-wise there are quite a few good ideas that could be folded into pycasso. However, it would take a bit of work, and without the hardware it's harder for me to test.
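One way to keep keys out of the source is a small loader that prefers environment variables and falls back to a git-ignored JSON file; the file name and key names below are assumptions for illustration, not this project's layout:

```python
import json
import os

def load_key(name, path="keys.json"):
    """Fetch an API key by name: environment variable first, then a
    git-ignored JSON file (add keys.json to .gitignore). Returns None
    if the key is nowhere to be found."""
    value = os.environ.get(name)
    if value:
        return value
    if os.path.isfile(path):
        with open(path) as f:
            return json.load(f).get(name)
    return None
```

The script would then do something like `openai.api_key = load_key("OPENAI_API_KEY")` instead of hard-coding the string.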