meta-vision-api's Introduction

Meta Vision

Meta Glasses GPT4 Vision API Implementation

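This is a hacky way to integrate GPT4 Vision into the Meta Rayban Smart Glasses using voice commands. The glasses send a photo over Messenger to an alternative account, a bookmarklet running on messenger.com picks up the new image, and a local server forwards its URL to GPT4 Vision.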

Example Demonstration

Requirements:

a) Meta Rayban Smart Glasses

b) OpenAI API key

c) Alternative Facebook/Messenger account

d) bun

Setup

Get the server up and running:

  1. Add a .env file containing your OpenAI API key (see env.example for the expected format)

  2. Run bun install

  3. Run bun run dev

  4. The server should now be up and running on port 3103
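
To check that the server responds before wiring up Messenger, you can send it the same kind of request the bookmarklet sends. This is only a rough sketch: the endpoint and the { imageUrl } body are taken from the bookmarklet code further down, while the helper names, the example image URL, and the assumption that the server accepts a JSON content type are mine. Note that a successful call will trigger a real OpenAI request.

```javascript
// Sketch: build the same POST request the bookmarklet sends, for a manual smoke test.
// The endpoint and the { imageUrl } body shape come from the bookmarklet code;
// buildVisionRequest and smokeTest are hypothetical helper names.
const VISION_ENDPOINT = "http://localhost:3103/api/gpt-4-vision";

function buildVisionRequest(imageUrl) {
  return {
    url: VISION_ENDPOINT,
    options: {
      method: "POST",
      // Assumption: the server accepts a JSON content type.
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ imageUrl }),
    },
  };
}

// Sends the request and resolves to the HTTP status code.
async function smokeTest(imageUrl) {
  const { url, options } = buildVisionRequest(imageUrl);
  const res = await fetch(url, options);
  return res.status;
}

// Usage (with the server running, placeholder image URL):
// smokeTest("https://example.com/photo.jpg").then((status) => console.log(status));
```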

Add the Messenger Chat Observer:

WARNING: Bookmarklets are a slightly obscure and very hacky way to execute arbitrary JavaScript in your browser. Before running one, MAKE SURE to check the code you're executing. The bookmarklet code is documented below in the section titled: Bookmarklet Code Breakdown

  1. Log in to messenger.com with an alternative Messenger/Facebook account (make sure it is friends with your main account, the one that's logged into your Meta View app)

  2. Copy the code from bookmarklet.js and create a new bookmark in your browser with the code as its URL (alternatively, import it as a bookmark)

  3. Click the newly created bookmark

  4. Upon success, a dialog should appear with the message: Added Messenger Chat Observer
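
If pasting the raw code into the bookmark URL field misbehaves (some browsers flatten multi-line text, which lets a // comment swallow the code after it), one option is to percent-encode the source first. A minimal sketch, assuming you have the contents of bookmarklet.js as a string; toBookmarkletUrl is a hypothetical helper, not part of this project:

```javascript
// Sketch: turn bookmarklet source text into a string usable as a bookmark URL.
function toBookmarkletUrl(source) {
  // Drop an existing "javascript:" prefix so it is not encoded twice.
  const body = source.trim().replace(/^javascript:\s*/i, "");
  // Percent-encode the body so newlines survive as %0A; the browser decodes
  // a javascript: URL before running it.
  return "javascript:" + encodeURIComponent(body);
}
```

Paste the returned string as the bookmark's URL in place of the raw file contents.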

Test the integration:

  1. In the Meta View app, make sure the Messenger connection is linked to the appropriate main account

  2. You: Hey Meta, send a photo to *name of alternative account*

  3. Meta: Send a photo to *name of alternative account*

  4. You: Yes

  5. Once the new photo has been received and sent to GPT4 Vision, the server should display the following logs:

GPT4 Vision Request
Sending request to GPT4 Vision
Request Successful
Saving data
Reading stored data
Creating new data file.
Writing new data
  6. Open ./public/data.json to check that the new data was added successfully
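
The last four log lines suggest a read, create-if-missing, then write flow. The sketch below shows what such a flow could look like; it is not the project's actual code. The saveEntry helper is hypothetical, and the assumption that data.json holds a JSON array of entries is mine, since the file layout isn't documented here.

```javascript
const fs = require("node:fs");

// Hypothetical helper: append one entry to a JSON file holding an array.
// Assumption: the file stores an array of entries; the real layout of
// ./public/data.json may differ.
function saveEntry(filePath, entry) {
  let entries = [];
  if (fs.existsSync(filePath)) {
    // Corresponds to "Reading stored data".
    entries = JSON.parse(fs.readFileSync(filePath, "utf8"));
  }
  // If the file was missing, the write below creates it
  // ("Creating new data file.").
  entries.push(entry);
  // Corresponds to "Writing new data".
  fs.writeFileSync(filePath, JSON.stringify(entries, null, 2));
  return entries.length;
}
```

Calling saveEntry twice against the same path leaves a two-element array in the file, which is the kind of thing to look for when you open data.json.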

ENJOY!

Bookmarklet Code Breakdown:

javascript: (function (s) {
  // This is a bookmarklet that you can either import as a bookmark,
  // OR copy in full and paste as the URL when making a new bookmark,
  // OR paste into the dev console.

  // It observes the open chat for any new photo messages sent in Messenger and forwards the image URL to this project's REST API.

  const messages = document.getElementsByClassName("x78zum5 xdt5ytf x1iyjqo2 xs83m0k x1xzczws x6ikm8r x1rife3k x1n2onr6 xh8yej3")[1].childNodes[2];

  // The line above finds the messages container within messenger.com for the selected chat.

  // However, these obfuscated class names are subject to change, so this is likely to break in the near future.

  messages.removeEventListener("DOMNodeInserted", null);

  // Note: passing null as the listener removes nothing, so running the bookmarklet twice registers a second listener.

  // Using DOMNodeInserted is bad practice: it is a deprecated mutation event and browsers are dropping support for it.

  // A MutationObserver should replace DOMNodeInserted.
  messages.addEventListener("DOMNodeInserted", async (event) => {
    const imgSrc = event?.target?.getElementsByTagName("img")[1]?.src;
    if (imgSrc) {
      const res = await fetch("http://localhost:3103/api/gpt-4-vision", {
        method: "POST",

        // Facebook's image URLs contain many query parameters that must be preserved exactly in order to view the image
        body: JSON.stringify({ imageUrl: imgSrc }),
        mode: "no-cors",
        headers: {
          "Content-Type": "application/json",
        },
      });
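      // With mode "no-cors" the response is opaque, so its body cannot be read
      // and res.json() would reject. Log the response type instead.
      console.log(res.type);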
    }
  });
  alert("Added Messenger Chat Observer");
})();
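
Since the comments above point to MutationObserver as the replacement for DOMNodeInserted, here is a rough sketch of how the same "watch for new photos" logic might look with it. It keeps the original lookup (the second img inside an inserted node); extractImageUrl and observeMessages are hypothetical names, and this has not been tested against messenger.com.

```javascript
// Mirrors the original lookup: the second <img> inside an inserted node.
function extractImageUrl(node) {
  // Text nodes and similar have no getElementsByTagName.
  if (!node || typeof node.getElementsByTagName !== "function") return null;
  return node.getElementsByTagName("img")[1]?.src ?? null;
}

// Calls onImage(url) for every added node that contains a photo.
// Returns the observer so the caller can disconnect() it later.
function observeMessages(container, onImage) {
  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      for (const node of mutation.addedNodes) {
        const url = extractImageUrl(node);
        if (url) onImage(url);
      }
    }
  });
  // subtree: true reports insertions at any depth, similar to how
  // DOMNodeInserted bubbles up from nested elements.
  observer.observe(container, { childList: true, subtree: true });
  return observer;
}
```

In the bookmarklet, this would be called as observeMessages(messages, (url) => { ... }) with the existing fetch call inside the callback. Keeping the returned observer around also makes it possible to call disconnect() before re-registering, which avoids duplicate listeners.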

by Devon Crebbin

Please reach out if there are any issues or feature requests :)

Hopefully the Meta Reality Labs team will provide an SDK in the future so these types of integrations can be ✨productionised✨
