
Idea for prompt about pycasso (OPEN, 6 comments)

EA914 commented on May 28, 2024
Idea for prompt

from pycasso.

Comments (6)

jezs00 commented on May 28, 2024

If you're already successfully grabbing an image, a hacky way to put it together would be to write a small script that saves the image to the external images folder and ensures it's the only image there. Then set pycasso to external mode and run it once your own script has finished.

Play around with ext info or names in the image files to get the title/artist on the page working if you want that.
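The staging step could be a short helper along these lines (a minimal sketch; the external-folder layout and the helper's name are assumptions, not part of pycasso itself - point it at whatever folder your pycasso config uses):

```python
import shutil
from pathlib import Path


def stage_for_pycasso(image_path, external_dir):
    # Clear the external images folder, then copy in the single image that
    # pycasso (running in external mode) should display next.
    external_dir = Path(external_dir)
    external_dir.mkdir(parents=True, exist_ok=True)
    for old in external_dir.iterdir():
        if old.is_file():
            old.unlink()  # ensure the new image is the only one there
    dest = external_dir / Path(image_path).name
    shutil.copy(str(image_path), dest)
    return dest
```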


jezs00 commented on May 28, 2024

Cool idea. It's definitely possible. A few notes:

The system would have to stay powered at all times, whereas the idea behind pycasso is to run and complete as quickly as possible without staying on.

The solution that makes sense to me is:

  1. Create/use a separate program or script that listens on your microphone and, when it hears something, calls pycasso.
  2. pycasso needs to be updated to take a command line argument (and an optional Python arg if called from a Python script) that overrides the random prompt with one provided to it. This shouldn't be hard: just add an argument to be parsed and, based on its value, interrupt prompt generation with the supplied prompt.

I can handle 2 under this issue and create the functionality to override the prompt, as I think this is in scope for this project and is functionality others could use. As for 1, the microphone service, you'd have to use a different project/library/program that provides this (there are probably loads of them for Pis) and modify it so it calls pycasso with args. That's something you'd have to do yourself, as my implementation has no mic, so I'd have no way to test it.
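A minimal sketch of what 2 could look like (the `--prompt` flag name and the helper are assumptions, not pycasso's actual CLI):

```python
import argparse


def parse_args(argv=None):
    # Hypothetical --prompt flag that overrides random prompt generation.
    parser = argparse.ArgumentParser(description="pycasso (sketch)")
    parser.add_argument("--prompt", type=str, default=None,
                        help="use this prompt instead of a randomly generated one")
    return parser.parse_args(argv)


def choose_prompt(args, random_prompt_fn):
    # If --prompt was supplied it wins; otherwise fall back to the
    # existing random prompt generation.
    return args.prompt if args.prompt is not None else random_prompt_fn()
```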


EA914 commented on May 28, 2024

Thank you for responding so quickly! I have literally never worked with a Raspberry Pi (or hardware at all), so I think for starters I'm going to try to replicate your project. I already have a program that allows me to dictate a prompt through my computer's microphone, transcribe it with OpenAI's "whisper" model, and send the prompt to DALL-E to generate an image. In my mind, the Raspberry Pi would literally just run this program. How difficult would that be and what would it entail? You can see and run the code in my Github repo.

I'd just want a button connected via GPIO to the Raspberry Pi that, while held, lets me dictate a prompt (plus a red LED indicator while it's held). When I let go of the button, it transcribes the audio and sends it to DALL-E for an image.
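That hold-to-dictate flow can be modeled independently of the hardware; a sketch (the class and callback names are made up, and the actual gpiozero Button/LED wiring and Whisper call are left out):

```python
class PushToTalk:
    """Hold-to-dictate state machine: record while the button is held,
    light the LED, and transcribe on release."""

    def __init__(self, transcribe):
        self.transcribe = transcribe  # e.g. a Whisper call in the real thing
        self.recording = False
        self.led_on = False
        self.frames = []

    def press(self):
        # Button held: start a fresh recording and turn the LED on.
        self.recording = True
        self.led_on = True
        self.frames = []

    def feed(self, frame):
        # Audio callback: buffer frames only while recording.
        if self.recording:
            self.frames.append(frame)

    def release(self):
        # Button released: stop, LED off, hand the audio to the transcriber.
        self.recording = False
        self.led_on = False
        return self.transcribe(self.frames)
```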

Another idea is to connect a small thumb-keyboard that will allow me to type a manual prompt instead of using my voice.

Any advice you have for me is greatly appreciated!


EA914 commented on May 28, 2024

I'm gonna try that once my Raspberry Pi 4 arrives. I imagine having a button that triggers the microphone is just a matter of connecting it and configuring the drivers?


EA914 commented on May 28, 2024

In case you wanted to see my code and test it out with your own, this is what I have so far. You're the first to see it since you inspired me; it's not on GitHub yet and I haven't shared it with anyone.

Credits to https://github.com/DevMiser/AI_Art_Frame for also inspiring me and helping me get started. A lot of this base code is his.

  1. The Raspberry Pi has a "wake word" of "Art Frame" that, when called, listens for your image prompt. The wake word is powered by Picovoice.ai's pvporcupine Python library
  2. The image is generated via DALL-E and rendered on the Inky display
  3. The first GPIO button (A) will render the same image with a text overlay showing the prompt (this was inspired by you, and a lot of your code helped me out with this piece)
  4. The second GPIO button will re-generate an image with the same prompt (in case you didn't like the first one DALL-E spat out)
  5. The third GPIO button will show a QR code that points to the image URL so you can download the image on your phone

Please let me know your thoughts and how I can improve this!

import datetime
import io
import openai
import os
import pvcobra
import pvleopard
import pvporcupine
import pyaudio
import random
import socket
import struct
import schedule
import sys
import threading
import time
import traceback
import urllib.request
import qrcode

from colorama import Fore, Style
from gpiozero import Button
from inky.auto import auto
from pathlib import Path
from openai import OpenAI
from PIL import Image,ImageDraw,ImageFont,ImageOps,ImageEnhance
from pvleopard import *
from pvrecorder import PvRecorder
from threading import Thread, Event
from time import sleep

import RPi.GPIO as GPIO
FONT_PATH = "Arial.ttf"
GPIO.setwarnings(False)
GPIO.setmode(GPIO.BCM)
led1_pin=4
GPIO.setup(led1_pin, GPIO.OUT)
GPIO.output(led1_pin, GPIO.LOW)

audio_stream = None
cobra = None
pa = None
porcupine = None
recorder = None
wav_file = None

display = auto()

openai.api_key = "OPENAIAPIKEYHERE"
pv_access_key= "PICOVOICEAPIKEY HERE"

client = OpenAI(api_key=openai.api_key)

# Module-level state shared with the GPIO callbacks (functions below use `global` to update these)
prompt_full = ""  # current prompt text
last_image_url = None  # URL of the most recently generated image

Clear_list = ["Clear",
	"Clear the screen",
	"Clear the display",
	"Clear the canvas",
	"Delete",
	"Clean",
	"Clean the screen",
	"Clean the display",
	"Clean the canvas",
	"Wipe",
	"Wipe the screen",
	"Wipe the display",
	"Wipe the canvas",
	"Erase",
	"Erase the screen",
	"Erase the display",
	"Erase the canvas",
	"Blank screen",
	"Blank Display"]

def clean_screen():
	cycles = 2
	colours = (display.RED, display.BLACK, display.WHITE, display.CLEAN)
	colour_names = (display.colour, "red", "black", "white", "clean")
	img = Image.new("P", (display.WIDTH, display.HEIGHT))
	for i in range(cycles):
		print("Cleaning cycle %i\n" % (i + 1))
		for j, c in enumerate(colours):
			print("- updating with %s" % colour_names[j+1])
			display.set_border(c)
			for x in range(display.WIDTH):
				for y in range(display.HEIGHT):
					img.putpixel((x, y), c)
			display.set_image(img)
			display.show()
			time.sleep(1)
		print("\n")
	print("Cleaning complete")

def current_time():

	time_now = datetime.datetime.now()
	formatted_time = time_now.strftime("%m-%d-%Y %I:%M %p\n")
	print("The current date and time is:", formatted_time)	 

def dall_e3(prompt):
	global last_image_url  # update the module-level URL so the QR-code button can use it
	try:
		response = client.images.generate(
			model="dall-e-3",
			prompt=prompt,
			size="1024x1024",
			quality="hd",
			style="vivid",
			n=1,
		)
		last_image_url = response.data[0].url
		return last_image_url
	except ConnectionResetError:
		print("ConnectionResetError")
		current_time()

def detect_silence():

	cobra = pvcobra.create(access_key=pv_access_key)
	silence_pa = pyaudio.PyAudio()
	cobra_audio_stream = silence_pa.open(
					rate=cobra.sample_rate,
					channels=1,
					format=pyaudio.paInt16,
					input=True,
					frames_per_buffer=cobra.frame_length)
	last_voice_time = time.time()
	while True:
		cobra_pcm = cobra_audio_stream.read(cobra.frame_length)
		cobra_pcm = struct.unpack_from("h" * cobra.frame_length, cobra_pcm)			
		if cobra.process(cobra_pcm) > 0.2:
			last_voice_time = time.time()
		else:
			silence_duration = time.time() - last_voice_time
			if silence_duration > 1.3:
				print("End of request detected\n")
				GPIO.output(led1_pin, GPIO.LOW)
				cobra_audio_stream.stop_stream()  # note the parentheses - without them the stream is never stopped
				cobra_audio_stream.close()
				cobra.delete()
				last_voice_time=None
				break

def fade_leds(event):

	pwm1 = GPIO.PWM(led1_pin, 200)

	event.clear()

	while not event.is_set():
		pwm1.start(0)
		for dc in range(0, 101, 5):
			pwm1.ChangeDutyCycle(dc)  
			time.sleep(0.05)
		time.sleep(0.75)
		for dc in range(100, -1, -5):
			pwm1.ChangeDutyCycle(dc)				
			time.sleep(0.05)
		time.sleep(0.75)
		
def listen():

	cobra = pvcobra.create(access_key=pv_access_key)
	listen_pa = pyaudio.PyAudio()
	listen_audio_stream = listen_pa.open(
				rate=cobra.sample_rate,
				channels=1,
				format=pyaudio.paInt16,
				input=True,
				frames_per_buffer=cobra.frame_length)
	print("Listening...")
	while True:
		listen_pcm = listen_audio_stream.read(cobra.frame_length)
		listen_pcm = struct.unpack_from("h" * cobra.frame_length, listen_pcm)
		if cobra.process(listen_pcm) > 0.3:
			print("Voice detected")
			listen_audio_stream.stop_stream()
			listen_audio_stream.close()
			cobra.delete()
			break

def refresh():

	print("\nThe screen refreshes every day at midnight to help prevent burn-in\n")
	current_time()
	clean_screen()
	sleep(5)
	print("\nRe-rendering")
	display.set_image(img_resized)
#	 display.set_border(display.BLACK)
	display.show()
	print("\nDone")
	
def refresh_schedule(event2):

	schedule.every().day.at("00:00").do(refresh)
	event2.clear()
	while not event2.is_set():						 
		schedule.run_pending()
		sleep(1)
		
def wake_word():

	porcupine = pvporcupine.create(keywords=["computer", "jarvis", "Art-Frame"],
							access_key=pv_access_key,									 
							sensitivities=[0.1, 0.1, 0.1], # from 0 to 1.0 - higher reduces the miss rate at the cost of increased false alarms
									   )
	devnull = os.open(os.devnull, os.O_WRONLY)
	old_stderr = os.dup(2)
	sys.stderr.flush()
	os.dup2(devnull, 2)
	os.close(devnull) 
	wake_pa = pyaudio.PyAudio()
	porcupine_audio_stream = wake_pa.open(
					rate=porcupine.sample_rate,
					channels=1,
					format=pyaudio.paInt16,
					input=True,
					frames_per_buffer=porcupine.frame_length)
	Detect = True
	while Detect:
		porcupine_pcm = porcupine_audio_stream.read(porcupine.frame_length)
		porcupine_pcm = struct.unpack_from("h" * porcupine.frame_length, porcupine_pcm)
		porcupine_keyword_index = porcupine.process(porcupine_pcm)
		if porcupine_keyword_index >= 0:
			GPIO.output(led1_pin, GPIO.HIGH)
			print(Fore.GREEN + "\nWake word detected\n")
			current_time()
			print("What would you like me to render?\n")
			porcupine_audio_stream.stop_stream()
			porcupine_audio_stream.close()
			porcupine.delete()			 
			os.dup2(old_stderr, 2)
			os.close(old_stderr)
			Detect = False

def add_text_overlay(image, prompt, text_color=(255, 255, 255)):
	width, height = image.size
	overlay_height = int(height * 0.1)	  
	overlay = Image.new('RGBA', (width, overlay_height), (0, 0, 0, 128)) 
	draw = ImageDraw.Draw(overlay)

	script_dir = os.path.dirname(__file__) 
	absolute_font_path = os.path.join(script_dir, FONT_PATH)

	if not os.path.isfile(absolute_font_path):
		print("Error: Arial.ttf not found.")
		return image

	max_text_width = width * 0.8 

	capitalized_prompt = prompt.title().strip() 
	if not capitalized_prompt:	  
		capitalized_prompt = "(No Text Entered)" 

	# Word Wrapping Logic 
	lines = [] 
	current_line = ""

	# Initial font size
	font_size = 36	  

	while True:		# Loop to find the optimal font size
		lines = []	  # Reset lines for each font size iteration
		current_line = ""
		for word in capitalized_prompt.split():
			test_line = current_line + " " + word
			test_width = draw.textlength(test_line, font=ImageFont.truetype(absolute_font_path, font_size)) 
			if test_width > max_text_width:
				lines.append(current_line.strip())
				current_line = word
			else:  
				current_line = test_line
		lines.append(current_line.strip()) 

		# If text fits within overlay, we have the right font size
		if len(lines) * font_size <= overlay_height: 
			break

		# Otherwise, reduce font size
		font_size -= 4	  

	# Draw the wrapped text (with the final font size)
	y = (overlay_height - len(lines) * font_size) // 2 

	for line in lines:
		text_width = draw.textlength(line, font=ImageFont.truetype(absolute_font_path, font_size))
		x = int((width - text_width) / 2)  
		draw.text((x, y), line, fill=text_color, font=ImageFont.truetype(absolute_font_path, font_size))
		y += font_size

	image.paste(overlay, (0, height - overlay_height), overlay)
	return image


def generate_qr_code(url):
	qr = qrcode.QRCode(
		version=1,
		error_correction=qrcode.constants.ERROR_CORRECT_L,
		box_size=10,
		border=4,
	)
	qr.add_data(url)
	qr.make(fit=True)
	qr_img = qr.make_image(fill_color="black", back_color="white")
	qr_img = qr_img.resize((display.WIDTH, display.HEIGHT), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10
	return qr_img

# Assuming Button C is wired to GPIO 16
button_c = Button(16)
displaying_qr = False  # State to track what is currently displayed


def toggle_qr_display():
	global displaying_qr, img_resized
	if displaying_qr:
		# Display the original image if QR code is currently displayed
		display.set_image(img_resized)
		displaying_qr = False
	else:
		# Display the QR code if the original image is currently displayed
		if last_image_url:
			qr_img = generate_qr_code(last_image_url)
			display.set_image(qr_img)
			displaying_qr = True
		else:
			print("No image URL available to generate QR code.")
	display.show()

button_c.when_pressed = toggle_qr_display

def regenerate_image():
	global img, img_resized, prompt_full
	if prompt_full:
		print("Regenerating image with prompt:", prompt_full)
		image_url = dall_e3(prompt_full)
		try:
			raw_data = urllib.request.urlopen(image_url).read()
			img = Image.open(io.BytesIO(raw_data))
			img_resized = img.resize((600, 448), Image.LANCZOS)
			display.set_image(img_resized)
			display.show()
			print("Done regenerating and displaying the new image.")
			
			images_dir = Path('/home/evan/Documents/Python/AI_Art_Frame/images')
			images_dir.mkdir(exist_ok=True)
			filename = images_dir / 'generated_image.png'
			counter = 1
			while filename.exists():
				counter += 1
				filename = images_dir / f'generated_image_{counter}.png'
			img.save(filename)
			
			print("Image saved as:", filename)
		except Exception as e:
			print("Error downloading or displaying the new image:", str(e))

display_text_overlay = False
def toggle_overlay():
	global img, img_resized, display_text_overlay
	display_text_overlay = not display_text_overlay
	if display_text_overlay:
		img_with_overlay = add_text_overlay(img.copy(), prompt_full)  # Use the current image
		img_resized_with_overlay = img_with_overlay.resize((600, 448), Image.LANCZOS)
		display.set_image(img_resized_with_overlay)
		print("Rendering image with text overlay...")
	else:
		print("Rendering original image...")
		display.set_image(img_resized)
	display.show()
	print("Done")

button_a = Button(5)
button_a.when_pressed = toggle_overlay

button_b = Button(6)  # Define button B
button_b.when_pressed = regenerate_image

class Recorder(Thread):

	def __init__(self):
		super().__init__()
		self._pcm = list()
		self._is_recording = False
		self._stop = False

	def is_recording(self):
		return self._is_recording

	def run(self):
		self._is_recording = True

		recorder = PvRecorder(device_index=-1, frame_length=512)
		recorder.start()

		while not self._stop:
			self._pcm.extend(recorder.read())
		recorder.stop()

		self._is_recording = False

	def stop(self):
		self._stop = True
		while self._is_recording:
			pass

		return self._pcm
try:
	o = create(
		access_key=pv_access_key,
		enable_automatic_punctuation=False,
	)

	event = threading.Event()
	event2 = threading.Event()
	

	# Button A is already handled via button_a.when_pressed = toggle_overlay above.
	# The old polling thread duplicated that handler (each press would have toggled
	# the overlay twice) and read button_pressed before it was ever assigned, so it
	# has been dropped.

	while True:
		wake_word()
		event2.set()
		recorder = Recorder()
		recorder.start()
		listen()
		detect_silence()
		transcript, words = o.process(recorder.stop())
		t_fade = threading.Thread(target=fade_leds, args=(event,))
		t_fade.start()

		if transcript not in Clear_list:
			current_time()
			prompt_full = transcript
			print("You requested:", prompt_full)
			print("\nCreating...")
			image_url = dall_e3(prompt_full)
			raw_data = urllib.request.urlopen(image_url).read()
			img = Image.open(io.BytesIO(raw_data))

			images_dir = Path('/home/evan/Documents/Python/AI_Art_Frame/images')
			images_dir.mkdir(exist_ok=True)
			filename = images_dir / 'generated_image.png'
			counter = 1
			while filename.exists():
				counter += 1
				filename = images_dir / f'generated_image_{counter}.png'
			img.save(filename)

			print("\nRendering original image...")
			img_resized = img.resize((600, 448), Image.LANCZOS)
			display.set_image(img_resized)
			display.show()
			print("\nDone")

			event.set()
			GPIO.output(led1_pin, GPIO.LOW)
			sleep(2)
			t_refresh = threading.Thread(target=refresh_schedule, args=(event2,))
			t_refresh.start()

		else:
			print("Clearing the display...")
			clean_screen()
			event.set()
			event2.set()
			GPIO.output(led1_pin, GPIO.LOW)
			print("\nDone")



except ConnectionResetError:
	print("Reset Error")
	current_time()

except KeyboardInterrupt:
	exit()

except Exception as e:
	# ConnectionResetError must be caught before this broader handler,
	# otherwise the except Exception clause would swallow it first.
	print("Error:", e)


jezs00 commented on May 28, 2024

I think your code looks good (take this with a grain of salt because I'm no professional). Thanks for sharing it with me.

My recommendation is to do what I did and throw it up on GitHub, maybe even do a little write-up for some internet points and pop it up in the Raspberry Pi subreddit. If you're going to put it on GitHub, the only advice I'd give is to abstract your keys out into another file so you can test easily and avoid accidentally committing your actual keys.
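One way to do that key abstraction (the file name and JSON shape here are arbitrary choices, not a convention from either project; environment variables work just as well):

```python
import json
import os


def load_keys(path="keys.json"):
    # Read API keys from a gitignored keys.json if present, otherwise fall
    # back to environment variables so the repo never contains real keys.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {
        "openai": os.environ.get("OPENAI_API_KEY", ""),
        "picovoice": os.environ.get("PICOVOICE_ACCESS_KEY", ""),
    }
```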

I will leave this issue open as it's overall a good idea, and code-wise there are quite a few good ideas here that could be thrown into pycasso - however it would take a bit of work, and without the hardware it's harder for me to test.

