microsoft / codex-babylon

Web app that uses Codex and BabylonJS to turn natural language into 3D objects and instructions

License: MIT License

JavaScript 8.35% HTML 2.79% Shell 0.62% CSS 1.97% TypeScript 86.27%

codex-babylon's Introduction

Codex Babylon Prototype

This project converts natural language into 3D assets using BabylonJS and OpenAI's Codex:

Codex Babylon GIF

The project consists of a React web application frontend and an Express backend.

Statement of Purpose

This repository aims to grow the understanding of using Codex in applications by providing an example implementation and references to support the Microsoft Build conference in 2022. It is not intended to be a released product. Therefore, this repository is not a venue for discussing the OpenAI API or BabylonJS, or for requesting new features.

Requirements

Running the App

  1. Clone the repo: git clone https://github.com/microsoft/Codex-Babylon and open the Codex-Babylon folder.

  2. Create a .env file in the root directory of the project, copying the contents of the .env.example file.

  3. In .env, provide the following configuration:

    Config Name             Description
    OPENAI_API_KEY          Your OpenAI API key.
    OPENAI_ORGANIZATION_ID  Your OpenAI organization ID. If you have multiple
                            organizations, update your default organization to
                            the one that has access to Codex engines before
                            getting the organization ID.
    OPENAI_ENGINE_ID        The OpenAI engine ID that provides access to a
                            model, for example code-davinci-002 or
                            code-cushman-001. See the FAQ below for checking
                            available engines.
    SERVER_PORT             The port for the server code. Defaults to 1200.
    CLIENT_PORT             The port for the web app. Defaults to 3000.
  4. Run npm install to install the project's dependencies.

  5. Run npm run start to serve the backend and launch the web application.

Using the App

The app consists of a basic text box to enter natural language commands, and a 3D scene to display the results. Enter commands into the text box and press enter to see the results. Note that conversation context is maintained between commands, so subsequent commands can refer back to previous ones.

Example commands:

Create a cube

Make it red and make it spin

Put a teal sphere above it and another below it

Make the bottom sphere change colors when the cursor hovers over it

Debugging

To debug the web application, use the VS Code debugger.

To debug the code generated by Codex, the current debugging experience is basic:

  • Observe logs in your browser dev tools (F12) to debug issues evaluating generated code
  • Observe logs in your console to debug issues between the Express server, Codex, and the client

Understand the Code

The server and client code is under src/.

Client (src/client)

  • index.tsx is the entry to bootstrap the React web application.
  • index.html is the barebones main view of the app. It uses Bootstrap for basic styling.

Server (src/server)

  • app.ts is the main entry point for the app. It sets up Express to serve RESTful APIs after being transpiled into JavaScript (output: dist/server/app.js).
  • model.ts manages interaction with the Codex API. It uses isomorphic-fetch to make POST calls that convert natural language to code. It also includes helper methods for engineering the prompt that is sent to Codex (see "Prompt Engineering" below).

Prompt Engineering

Generative models like Codex are trained on the simple task of guessing the next token in a sequence. A good practice to coax the kind of tokens (code) you want from Codex is to include context and example interactions in a prompt - this practice is called few-shot prompt engineering. These examples are sent to the model with every API call, along with your natural language query. Codex then "guesses" the next tokens in the sequence (the code that satisfies the natural language).

This project currently contains "contexts" - examples of what we expect from the model in the src/server/contexts folder. A context consists of a description to the model of what will be in the prompt along with examples of Natural Language and the code it should produce. See snippet of context1 from the contexts folder:

/* This document contains natural language commands and the BabylonJS code needed to accomplish them */

state = {};

/* Make a cube */
state.cube = BABYLON.MeshBuilder.CreateBox("cube", {size: 1}, scene);

/* Move the cube up */
state.cube.position.y += 1;

/* Move it to the left */
state.cube.position.x -= 1;

As you can see, the first line gives a description of the prompt (explaining to Codex that it should take natural language commands and produce BabylonJS code). It then shows a single line of contextual code, establishing the existence of a state object for Codex to use. Finally, it gives several examples of natural language and code to give Codex a sense of the kind of code it should write. These examples store new Babylon objects on the state object mentioned above. They also establish a kind of conversational interaction with the model, where a natural language command might refer to something created on a past turn ("Move it to..."). These examples help nudge the model to produce this kind of code on future turns.

The project also includes a Context class (see Context.ts) that offers several helpers for loading contexts and creating prompts. As a user interacts with the experience, we update the context to include past commands and responses. On subsequent conversation turns, this gives the model the relevant context to do things like pronoun resolution (e.g. of "it" in "make it red").
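A minimal sketch of how such a context-plus-history prompt might be assembled (the class and method names here are illustrative, not the actual Context.ts API):

```typescript
// Illustrative sketch only - names do not match the real Context.ts API.
interface Interaction {
  command: string; // the user's natural language
  code: string;    // the code Codex produced for it
}

class PromptContext {
  private interactions: Interaction[] = [];

  constructor(private baseContext: string) {}

  // Record a completed turn so later turns can resolve references like "it".
  add(command: string, code: string): void {
    this.interactions.push({ command, code });
  }

  // Full prompt: base context, then past turns, then the new command.
  buildPrompt(command: string): string {
    const history = this.interactions
      .map((i) => `/* ${i.command} */\n${i.code}`)
      .join("\n\n");
    return [this.baseContext, history, `/* ${command} */`]
      .filter((part) => part.length > 0)
      .join("\n\n");
  }
}
```

On each turn the server would call buildPrompt with the new command, send the result to Codex, and then add the command and the returned code back into the context for the next turn.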

Currently a single ongoing context is maintained on the server. This can be reset with the "Reset" button in the app. The single context means that the app is currently not multi-tenanted, and that multiple browser instances will reuse the same context. Note that prompts to Codex models can only be so long - as the prompt exceeds a certain token limit, the Context class will shorten the prompt from the beginning.
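The shortening behavior can be sketched roughly as follows. A real implementation would count tokens with the model's tokenizer; the four-characters-per-token ratio below is only a crude assumption:

```typescript
// Drop the oldest interactions until the prompt fits a token budget.
// Token counts are approximated at ~4 characters per token (an assumption).
function trimPrompt(
  base: string,
  interactions: string[],
  maxTokens: number
): string[] {
  const approxTokens = (s: string) => Math.ceil(s.length / 4);
  const kept = [...interactions];
  let total =
    approxTokens(base) + kept.reduce((n, i) => n + approxTokens(i), 0);
  // Forget from the beginning (oldest turns) until we fit the budget.
  while (kept.length > 0 && total > maxTokens) {
    const oldest = kept.shift();
    if (oldest === undefined) break;
    total -= approxTokens(oldest);
  }
  return kept;
}
```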

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

FAQ

What OpenAI engines are available to me?

You might have access to different OpenAI engines per OpenAI organization. To check which engines are available to you, query the List engines API. See the following commands:

  • Shell
curl https://api.openai.com/v1/engines \
  -H 'Authorization: Bearer YOUR_API_KEY' \
  -H 'OpenAI-Organization: YOUR_ORG_ID'
  • Windows Command Prompt (cmd)
curl --ssl-no-revoke https://api.openai.com/v1/engines --header OpenAI-Organization:YOUR_ORG_ID --oauth2-bearer YOUR_API_KEY

Can I run the sample on Azure?

The sample code can currently be used with Codex on OpenAI’s API. In the coming months, the sample will be updated so you can also use it with the Azure OpenAI Service.

codex-babylon's People

Contributors

awharrison-28, craigomatic, dependabot[bot], jennifermarsman, keijik, microsoft-github-operations[bot], microsoftopensource, ryanvolum, vivihung


codex-babylon's Issues

Introduce Undo/Redo Semantics and Implementation

Currently the user experience in the app involves multiple back-to-back NL -> Code interactions. While the code often accomplishes what the user set out to do, it also makes mistakes. Frequently, backing out of these mistakes (e.g. deleting the resources created, resetting a changed animation, moving a mesh back) is difficult. Instead, we should introduce semantics for undoing and redoing a change.

There are a couple of ways we can do this:

  • Create some kind of checkpoint of a scene (scenes may be serializable?), maintaining them in a stack. The stack is popped and pushed onto (probably maintain a queue for this) as users undo/redo
  • Maintain a history of the code written since interaction started. This is easier if we start with an empty scene, but we may also want to consider the scenario in which someone starts with an existing scene and how we represent the original scene versus the scripted code evaluated beyond it
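The second option (replaying a history of evaluated code) might look something like this sketch, where evaluate and rebuildScene stand in for however the app actually runs generated code against a fresh scene:

```typescript
// Sketch of the code-history approach: keep every evaluated snippet in a
// stack, and rebuild the scene from scratch on undo.
class CodeHistory {
  private applied: string[] = [];
  private undone: string[] = [];

  run(code: string, evaluate: (code: string) => void): void {
    evaluate(code);
    this.applied.push(code);
    this.undone = []; // a new command invalidates the redo stack
  }

  undo(rebuildScene: (snippets: string[]) => void): void {
    const last = this.applied.pop();
    if (last === undefined) return;
    this.undone.push(last);
    rebuildScene(this.applied); // replay everything except the undone step
  }

  redo(evaluate: (code: string) => void): void {
    const next = this.undone.pop();
    if (next === undefined) return;
    evaluate(next);
    this.applied.push(next);
  }
}
```

This only works cleanly when starting from an empty scene; a pre-existing scene would need to be represented separately, as the issue notes.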

The readme is slightly confusing regarding the endpoint/model

I'm finally getting back to looking at this project and noticed some changes to the .env and the readme. Specifically, under Running the App it states:

  1. Create a .env file in the root directory of the project, copying the contents of the .env.example file
    i. The endpoint for OpenAI models is https://api.openai.com/v1/completions
    ii. The model name can be from a list of available off-the-shelf engines or a fine-tuned model. See the OpenAI API Reference for how to get a list of available models and fine-tunes from OpenAI.

This seems to imply that I should use https://api.openai.com/v1/completions as the ENDPOINT value with one of the models returned from https://beta.openai.com/docs/api-reference/engines/list as the MODEL value.

When I do this, I get an http 400 status code.

If I set the ENDPOINT to something in this pattern it all works as expected:

ENDPOINT=https://api.openai.com/v1/engines/<model name>/completions

Intervals/Timeouts keep running after scene is reset

Babylex uses setTimeout/setInterval to repeatedly do something in the scene. As an example, if we ask to make a cube spin, it sets an interval:

image

When we reset the scene, we delete the state bag, but the intervals/timeouts that were set continue to fire, leading to repeated errors visible in the console (in the above case, once every 10ms). These probably also impact performance of the app.

image

Reset should also reset the scene

Currently the reset button calls the backend to reset the prompt (basically resetting the "conversation"). Beyond this, it should also reset the scene to a blank scene (as created in #1)

Add deeper documentation to README

The README should go deeper into the prompt engineering and code organization. It should also give more complex examples of what works out of box.

Support Multi-Tenancy

Currently the server maintains a single variable - an ongoing prompt, such that it can only serve an experience to a single user in a single tab.

We should consider introducing multi-tenancy - allowing one server to serve multiple users, or multiple tabs for a single user. This will require the introduction of IDs for users and sessions. We could start with just session GUIDs for simplicity.
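A session-GUID approach could be sketched like this; the shape of the per-session state (here just an interaction history) is an assumption:

```typescript
import { randomUUID } from "crypto";

// Per-session state, keyed by a GUID the client sends with each request.
const sessions = new Map<string, string[]>();

function getOrCreateSession(sessionId?: string): {
  id: string;
  history: string[];
} {
  const id = sessionId ?? randomUUID();
  if (!sessions.has(id)) {
    sessions.set(id, []); // fresh interaction history for a new session
  }
  return { id, history: sessions.get(id)! };
}
```

The client would store the returned ID (e.g. in memory or localStorage) and send it with every subsequent request, so two tabs get two independent contexts.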

Introduce Dialog Exporting feature

As users interact with the model, they effectively maintain an ongoing dialog. Being able to export these dialogs can help with debugging, reproducing interesting behavior, and assembling a training corpus for fine-tuning.

Beyond just exporting the "dialog" (NL -> Code conversation), we should also consider exporting the model parameters used for a given session.

A fast follow to this feature would likely be a dialog importing tool and the ability to replay an existing dialog.

Improve the code rendering view

Currently we just print the generated code over the Babylon scene in white text. We should consider creating a better code view (likely using Monaco or another code rendering tool), and should make the view minimizable, as many users won't want to see the code.

Move to new codex model

Given the availability of a new version of the davinci codex model, we should move over to it

Setup feedback

  • Setup:
    • It would be nice to homogenize the locations of the OpenAI credentials across apps. microsoft/NL-CLI stores them in ~/.config/openaiapirc; whatever convention is chosen should still make it possible to set endpoints on a per-app basis.
    • Install and launch go smoothly. It launches in Safari, but there's no response to "Create a cube", likewise in Edge. Looking at the console in Safari I see "TypeError: URL is not valid or contains user credentials." Screenshot below

Screen Shot 2022-04-13 at 10 38 11 AM

Notes on WSL2 Bash: Error when running simple command

Environment

  • Windows machine on WSL2
  • Using Bash terminal on WSL2
  • Using anaconda environment setup with python 3.10
  • Using NodeJS version v8.10.0
  • Using Edge browser

Setup

Step 3 Running the App -- absolutely love seeing the links to where we can find values to fill out the config; would love to see this on other samples too!
Step 4 Running npm install -- ran into an issue because I had NodeJS on my Windows machine but not in Bash, so I had to follow the steps outlined here: https://pakstech.com/blog/npm-no-such-file/

Executing Example

Was able to start up the Babylex app, and can see the UI in the browser, but when using the example Create a cube command, saw the following console logs and errors in the inspect tool:
image

App is Crashing when given incorrect OpenAI API configuration

image

On running the Node.js app, the app builds successfully without an error, but gives the above error when querying with an incorrect OpenAI API configuration in the .env file. This includes an incorrect API key, organisation ID, or model name. Giving it a model name that exists but that you don't have access to causes the error as well; the app simply crashes and there is no indication of an error.

Some error handling and UI elements to surface the error could be added instead of letting the app crash.

Create prompt class to remove core prompting logic from code

As with the Minecraft project, we should break prompting logic out into its own class that maintains prompt state. The most complex part of this will be maintaining an array of interactions (command -> code objects) that can be undone, and that can be shortened per model constraints.

Detect and handle offensive prompts/completions

Currently, it's possible to coax offensive content from the model. Though I've never seen the model proactively produce offensive language, prompts like "Make an array of offensive terms" produce unsavory outcomes. We should use the content filter API that's part of the OpenAI service to detect offensive prompts and completions (calling it with the full interaction) and handle them when found - in this case, a message like that in the OAI playground should suffice:
image

When offensive prompts/completions are detected, we should also not append them onto the context.

UI/UX refactor

This PR addresses a few UI/UX kinks that we should resolve:

  • Currently the text bar is quite small, and it's difficult to see text - especially when demoing. We should make the text bar quite a bit larger for visibility.
  • Turn autocomplete off on input, the suggestions often cover code
  • As per #6, we should reconsider improving the code rendering view, and perhaps make it collapsible/scrollable. Currently, with the newest model, Codex will often write a ton of code that cascades off the screen and takes up a large part of the scene. Improving this view would improve the UI/UX substantially.

Create corpus of examples for fine-tuning

Currently, our examples to the model are passed in our prompt. This is neither efficient nor sustainable - we quickly run out of prompt space (especially with the cushman model), latency and cost are higher, and we're limited in how many examples we can give. We should instead resort to re-training (fine-tuning) the model on a corpus of command -> code examples. This doc walks through the different types of examples we should aspire to pull together.

Examples should cover a wide range of the Babylon API and the kind of code we expect the model to produce. They should use the state bag for declaring any variables, functions, etc. and should follow best Babylon practices for efficient code
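As a sketch, exported dialog turns could be converted into the JSONL format that OpenAI's (2022-era) fine-tuning endpoint expects, one {"prompt", "completion"} object per line; the comment-delimited prompt shape here mirrors the contexts in this repo, and the turn shape is an assumption:

```typescript
interface DialogTurn {
  command: string; // natural language the user typed
  code: string;    // code the model produced (after human review)
}

// One JSON object per line, in the completions fine-tune format.
function toFineTuneJsonl(turns: DialogTurn[]): string {
  return turns
    .map((t) =>
      JSON.stringify({
        prompt: `/* ${t.command} */\n`,
        completion: ` ${t.code}\n`, // leading space per fine-tuning guidance
      })
    )
    .join("\n");
}
```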

Adopt a Web Framework

The app is currently bare bones JS, HTML and CSS. If it grows, it will be helpful to use a UI framework like React or Angular. We also might consider making this a native application using React Native and/or Electron.

Server crashes on WSL

Babylex client launches, but server crashes.

Error excerpt is here - full log is attached

[serve:*server] [nodemon] starting `node ./dist/server/app.js`
[serve:*server] node:events:505
[serve:*server]       throw er; // Unhandled 'error' event
[serve:*server]       ^
[serve:*server]
[serve:*server] Error: listen EACCES: permission denied 0.0.0.0:1018
[serve:*server]     at Server.setupListenHandle [as _listen2] (node:net:1355:21)
[serve:*server]     at listenInCluster (node:net:1420:12)
[serve:*server]     at Server.listen (node:net:1508:7)
[serve:*server]     at Function.listen (/home/gojira/Babylex/node_modules/express/lib/application.js:618:24)
[serve:*server]     at Object.<anonymous> (/home/gojira/Babylex/dist/server/app.js:95:5)
[serve:*server]     at Module._compile (node:internal/modules/cjs/loader:1099:14)
[serve:*server]     at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
[serve:*server]     at Module.load (node:internal/modules/cjs/loader:975:32)
[serve:*server]     at Function.Module._load (node:internal/modules/cjs/loader:822:12)
[serve:*server]     at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
[serve:*server] Emitted 'error' event on Server instance at:
[serve:*server]     at emitErrorNT (node:net:1399:8)
[serve:*server]     at processTicksAndRejections (node:internal/process/task_queues:83:21) {
[serve:*server]   code: 'EACCES',
[serve:*server]   errno: -13,
[serve:*server]   syscall: 'listen',
[serve:*server]   address: '0.0.0.0',
[serve:*server]   port: 1018
[serve:*server] }
[serve:*server]
[serve:*server] Node.js v17.8.0
[serve:*server] [nodemon] app crashed - waiting for file changes before starting...
$ uname -a
Linux keijik 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ nvm ls
->      v17.8.0
         system
default -> node (-> v17.8.0)
iojs -> N/A (default)
unstable -> N/A (default)
node -> stable (-> v17.8.0) (default)
stable -> 17.8 (-> v17.8.0) (default)
lts/* -> lts/gallium (-> N/A)
lts/argon -> v4.9.1 (-> N/A)
lts/boron -> v6.17.1 (-> N/A)
lts/carbon -> v8.17.0 (-> N/A)
lts/dubnium -> v10.24.1 (-> N/A)
lts/erbium -> v12.22.12 (-> N/A)
lts/fermium -> v14.19.1 (-> N/A)
lts/gallium -> v16.14.2 (-> N/A)

app-error.log

Clean up index.html

In its early no-framework state, the index.html is jam packed with scripts, html and css. Short of adopting a framework, we should at least clean up these assets

Fix high severity vulnerability

image

# npm audit report

async  <2.6.4
Severity: high
Prototype Pollution in async - https://github.com/advisories/GHSA-fwr7-v2mv-hh25
fix available via `npm audit fix`
node_modules/async

1 high severity vulnerability

To address all issues, run:
  npm audit fix

Add "Statement of Purpose" section in ReadMe

Please add the following statement in ReadMe to set the right expectation to users.

Statement of Purpose
This repository aims to grow the understanding of using codex in applications by providing an example of implementation and references to support the Microsoft Build conference in 2022. It is not intended to be a released product. Therefore, this repository is not for discussing OpenAI API or requesting new features.

Had trouble with npm install

Looks specific to the npm and node coming from the Ubuntu 18.04 package repo (I am using WSL2):

$ npm --version
3.5.2
$ node --version
v8.10.0

Got this error:
npm ERR! typeerror Error: Missing required argument #1
npm ERR! typeerror at andLogAndFinish (/usr/share/npm/lib/fetch-package-metadata.js:31:3)
npm ERR! typeerror at fetchPackageMetadata (/usr/share/npm/lib/fetch-package-metadata.js:51:22)
npm ERR! typeerror at resolveWithNewModule (/usr/share/npm/lib/install/deps.js:456:12)
npm ERR! typeerror at /usr/share/npm/lib/install/deps.js:457:7
npm ERR! typeerror at /usr/share/npm/node_modules/iferr/index.js:13:50
npm ERR! typeerror at /usr/share/npm/lib/fetch-package-metadata.js:37:12
npm ERR! typeerror at addRequestedAndFinish (/usr/share/npm/lib/fetch-package-metadata.js:82:5)
npm ERR! typeerror at returnAndAddMetadata (/usr/share/npm/lib/fetch-package-metadata.js:117:7)
npm ERR! typeerror at pickVersionFromRegistryDocument (/usr/share/npm/lib/fetch-package-metadata.js:134:20)
npm ERR! typeerror at /usr/share/npm/node_modules/iferr/index.js:13:50
npm ERR! typeerror This is an error with npm itself. Please report this error at:
npm ERR! typeerror http://github.com/npm/npm/issues

After some searching, found this solution to update versions:
sudo npm install -g n
sudo n latest
sudo npm install -g npm
npm i

Not sure if you are planning on a Troubleshooting or FAQ, but I might not be the only one to hit this.

Experiment with existing asset libraries

There are 1P and 3P 3D asset libraries of objects, materials, lights, etc. Current prototyping has been focused purely on primitives. We should experiment with pre-built assets as well.

This will likely involve using Codex for Retrieve and Generate (RAG) with a flow something like:

User -> "Create a spinning avocado"

  1. Codex generates code to search for avocado asset
  2. Run the code, determining which asset we want to use
  3. Have Codex generate code to render the asset and make it spin

Detect produced URLs

The model sometimes outputs URLs when asked to make textures or images. We should detect these and either flag them or filter them (e.g. turn "https://..." to [IMAGE_URL])
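A minimal sketch of the filtering approach, run over generated code before it is evaluated:

```typescript
// Replace anything that looks like a URL with a placeholder token before
// the generated code reaches eval. The regex is a rough heuristic.
function redactUrls(code: string): string {
  return code.replace(/https?:\/\/[^\s"')]+/g, "[IMAGE_URL]");
}
```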

Create ability for prompt to "forget" older interactions

Given our strategy for appending interactions onto a prompt, our prompts sometimes get too large for the Codex model we're using, causing errors from the Codex APIs. We should engineer a deliberate way to shorten prompts by forgetting the oldest interactions in them. This depends on #20, where we're designing a prompt class that is aware of interactions (instead of just being a string).

Support multiple babylon sessions with multiple code generation guesses

Currently we call Codex for a single "guess" (code representing the inputted natural language). We should consider asking Codex to make multiple guesses - and rendering them all simultaneously in the UI. This will increase the likelihood that a user achieves the behavior they want. It could also be used as a tool for tagging correct/incorrect guesses as future fine-tuning data.

This feature would likely require substantial engineering, and tasks like tagging, duplicating one view across the others, etc. It may merit its own research effort.
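As a starting point, the completions API can already return several candidates per call via its n parameter; a sketch of the request body follows (the parameter values are assumptions, not the app's current settings):

```typescript
// Build a completions request asking for several candidate code guesses.
function buildCompletionRequest(prompt: string, guesses: number) {
  return {
    prompt,
    n: guesses,       // number of completions to generate per call
    temperature: 0.7, // some sampling diversity so the guesses differ (assumed)
    max_tokens: 300,  // assumed budget for generated code
    stop: ["/*"],     // stop before the model invents the next command (assumed)
  };
}
```

The harder work, as noted above, is rendering each candidate in its own scene and letting the user pick or tag them.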
