
transitive-bullshit / openopenai

506 stars · 10 watchers · 44 forks · 686 KB

Self-hosted version of OpenAI’s new stateful Assistants API

License: MIT License

Shell 0.05% JavaScript 0.32% TypeScript 99.63%
assistants gpts openai openai-api self-hosted

openopenai's People

Contributors

evgyk · jamiew · transitive-bullshit


openopenai's Issues

Add support for different knowledge retrieval methods

This is for the built-in retrieval tool.

Currently, the knowledge retrieval implementation is very naive: it simply returns the full contents of every attached file (source).

It also only supports text file types like text/plain and markdown, as no preprocessing or conversions are done at the moment.

It shouldn't be too hard to add support for more legit knowledge retrieval approaches, which would require:

  • processForFileAssistant - File ingestion pre-processing for files marked with purpose: 'assistants'

    • converting non-text files to a common format like markdown (this is probably the hardest step to do well across all of the most common file types)
    • chunking files
    • embedding chunks
    • storing embeddings to an external vector store; make sure to store the file_id each chunk comes from for filtering purposes
  • retrievalTool - Performs knowledge retrieval for a given query on a set of file_ids for RAG.

    • embed query
    • semantic search over vector store filtering by the given file_ids

Integrations here with LangChain and/or LlamaIndex would be great for their flexibility, but we could also KISS and roll our own using https://github.com/dexaai/dexter.
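A minimal sketch of what those two pieces might look like, assuming an in-memory vector store and a stubbed embedding function (the function names here are illustrative, not the repo's actual identifiers):

```ts
interface Chunk {
  file_id: string
  text: string
  embedding: number[]
}

// In-memory stand-in for an external vector store (Pinecone, pgvector, etc.).
const vectorStore: Chunk[] = []

// Placeholder embedding function; a real implementation would call an
// embedding model (e.g. via dexter). This stub only keeps the sketch runnable.
async function embed(text: string): Promise<number[]> {
  const vec = new Array(64).fill(0)
  for (let i = 0; i < text.length; i++) {
    vec[i % 64] += text.charCodeAt(i) / 1000
  }
  return vec
}

// File ingestion pre-processing for files with purpose: 'assistants'.
// Naive fixed-size chunking; a real version should first convert non-text
// files to a common format and chunk on semantic boundaries with overlap.
async function processFileForAssistant(file_id: string, text: string) {
  const chunkSize = 1000
  for (let i = 0; i < text.length; i += chunkSize) {
    const chunkText = text.slice(i, i + chunkSize)
    vectorStore.push({
      file_id, // stored so retrieval can filter by file_id
      text: chunkText,
      embedding: await embed(chunkText)
    })
  }
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Knowledge retrieval for a given query over a set of file_ids.
async function retrievalTool(query: string, file_ids: string[], topK = 5) {
  const queryEmbedding = await embed(query)
  return vectorStore
    .filter((chunk) => file_ids.includes(chunk.file_id))
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```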

Error 'slow down' on localhost e2e test

Just installed, and kicked off dist/server and dist/runner.

Both start listening with no complaints.

Kick off a test with: OPENAI_API_BASE_URL='http://127.0.0.1:3000' npx tsx e2e

and get the error below on the runner.

Note the only thing I can see wrong is that the response shows server: cloudflare, even though I have S3 configured with an s3:// address.

According to this thread on the OpenAI forums, this can sometimes be caused by a missing "Authorization" header:
https://community.openai.com/t/getting-hit-with-429-slow-down-error/482704/16

```
Runner started for queue "openopenai" listening for "thread-run" jobs
Processing thread-run job "clrmgtt100005iwpwdp54i3mb" for run "clrmgtt100005iwpwdp54i3mb"
Job "clrmgtt100005iwpwdp54i3mb" run "clrmgtt100005iwpwdp54i3mb": >>> chat completion call {
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    {
      role: 'user',
      content: 'What is the weather in San Francisco today?'
    }
  ],
  model: 'gpt-4-1106-preview',
  tools: [ { type: 'function', function: [Object] } ],
  tool_choice: 'auto'
}
Error job "clrmgtt100005iwpwdp54i3mb" run "clrmgtt100005iwpwdp54i3mb": APIError: 429 "slow down"
    at <anonymous> (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/openai-fetch/src/fetch-api.ts:43:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at fn (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/ky/source/core/Ky.ts:55:14)
    at Promise.result.<computed> (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/ky/source/core/Ky.ts:86:27)
    at OpenAIClient.createChatCompletion (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/openai-fetch/src/openai-client.ts:73:45)
    at ChatModel.runModel (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/@[email protected]/node_modules/@dexaai/dexter/src/model/chat.ts:53:24)
    at ChatModel.run (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/@[email protected]/node_modules/@dexaai/dexter/src/model/model.ts:132:24)
    at Worker.Worker.connection (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/src/runner/index.ts:270:21)
    at async Worker.processJob (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/bullmq/dist/cjs/classes/worker.js:350:28)
    at async Worker.retryIfFailed (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/bullmq/dist/cjs/classes/worker.js:537:24) {
  status: 429,
  headers: {
    'alt-svc': 'h3=":443"; ma=86400',
    'cf-ray': '8489bcf009a4efd2-PDX',
    connection: 'keep-alive',
    'content-length': '22',
    'content-type': 'application/json',
    date: 'Sat, 20 Jan 2024 19:31:29 GMT',
    server: 'cloudflare',
    'set-cookie': '__cf_bm=NIGNPPboVXD8fiY_rJBrhjexBEiM7x8_Q4Nkuof9gS4-1705779089-1-ATL5Ajgw+3zdT405fvUxZgi5nnc3+jPQ/+W+uEv1Tk8yuymYc0CDzU2F25WJ0zNVlrF4eVSTlCTvoHOZeG7ZH/A=; path=/; expires=Sat, 20-Jan-24 20:01:29 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None',
    vary: 'Accept-Encoding'
  },
  error: 'slow down',
  code: undefined,
  param: undefined,
  type: undefined
}
```
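To test the missing-Authorization hypothesis, a standalone probe (hypothetical; not part of the repo) could hit the upstream endpoint once with and once without the header and compare status codes:

```ts
// Hypothetical probe: is the 429 "slow down" triggered by a missing
// Authorization header? Hit the upstream API directly, with and without
// the header, and compare the resulting status codes.
const url = 'https://api.openai.com/v1/chat/completions'

async function probe(withAuth: boolean): Promise<void> {
  const headers: Record<string, string> = {
    'content-type': 'application/json'
  }
  if (withAuth) {
    headers.authorization = `Bearer ${process.env.OPENAI_API_KEY}`
  }

  const res = await fetch(url, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      model: 'gpt-4-1106-preview',
      messages: [{ role: 'user', content: 'ping' }]
    })
  })
  console.log(withAuth ? 'with auth:' : 'without auth:', res.status)
}

await probe(true)
await probe(false)
```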

Add streaming support for runs

This isn't supported in the official OpenAI API yet, but it was mentioned at the OpenAI dev day that it will be coming soon, possibly via websocket and/or webhook support.

See this related issue in the OpenAI developer community.

The toughest part of this is that the runner is completely decoupled from the HTTP server, as it should be, so it can process thread runs in an async task queue. The runner is responsible for making the chat completion calls, which are streamable, so we'd have to either:

  • do some plumbing to connect the runner's execution to the result of the createRun or createThreadAndRun operations, and then pipe the chat completion calls into this stream (a rough sketch of this option follows the list)
  • or we could move the run implementation to not be handled by an async task queue, but rather live within createRun / createThreadAndRun
    • this approach would be quite a bit simpler, but I have a feeling it's the wrong approach long-term, as runs conceptually lend themselves to being decoupled from the HTTP call. This also makes the most sense from a sandboxing perspective, and it keeps the HTTP server lightweight, with no long-running HTTP responses
  • or move to a websocket and/or webhook approach, which is fine in and of itself, but has the huge downside of being completely different from the current SSE streaming that the chat completion API has embraced. Thinking about building apps that would potentially have to support both of these streaming approaches would make me a really sad panda
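For the first option, a minimal sketch, assuming the Redis instance already required for BullMQ is reused as a pub/sub bridge between the runner and the HTTP server (the channel naming and [DONE] sentinel here are made up):

```ts
import Redis from 'ioredis'

// Runner side: publish each streamed chat completion chunk on a
// per-run channel as it arrives.
async function publishRunChunks(
  runId: string,
  chunks: AsyncIterable<string>
): Promise<void> {
  const pub = new Redis()
  for await (const chunk of chunks) {
    await pub.publish(`run-stream:${runId}`, chunk)
  }
  // Sentinel so the HTTP side knows the run's stream is finished.
  await pub.publish(`run-stream:${runId}`, '[DONE]')
  pub.disconnect()
}

// HTTP server side: subscribe to the run's channel and forward each
// chunk to the client (e.g. as server-sent events).
function subscribeRunStream(
  runId: string,
  onChunk: (chunk: string) => void,
  onDone: () => void
): void {
  const sub = new Redis()
  sub.subscribe(`run-stream:${runId}`)
  sub.on('message', (_channel: string, message: string) => {
    if (message === '[DONE]') {
      sub.disconnect()
      onDone()
    } else {
      onChunk(message)
    }
  })
}
```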

Add support for custom models

Currently, the model is hard-coded to use the OpenAI chat completion API, but it wouldn't be very difficult to support custom LLMs or external model providers.

The only real constraint is that the custom models need to support function calling (ideally parallel tool calling) using OpenAI's tool_calls format; a hypothetical provider interface is sketched below.
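For illustration, the provider seam might look something like this (the type and interface names are hypothetical; only the tool_calls shape mirrors OpenAI's format):

```ts
// Hypothetical provider abstraction; any backend that can emit
// OpenAI-style tool_calls could be plugged in behind this interface.
interface ToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string }
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string | null
  tool_calls?: ToolCall[]
}

interface ChatCompletionProvider {
  createChatCompletion(params: {
    model: string
    messages: ChatMessage[]
    tools?: Array<{ type: 'function'; function: object }>
    tool_choice?: 'auto' | 'none'
  }): Promise<{ message: ChatMessage }>
}
```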

Will consider implementing this depending on how much love this issue gets.

Integration with LangChain OpenGPTs

OpenGPTs is an awesome OSS project by the LangChain team that has a decent amount of overlap with this project.

The main difference between the two is that this project is intended to have 100% API compatibility with the official OpenAI Assistants API, whereas OpenGPTs is based loosely on the functionality of OpenAI GPTs.

Seeing as we're all playing around in similar sandboxes, I figured it made sense to open an issue to see if there's any interest.
