
transitive-bullshit / openopenai

506 stars · 10 watchers · 44 forks · 686 KB

Self-hosted version of OpenAI’s new stateful Assistants API

License: MIT License

Shell 0.05% JavaScript 0.32% TypeScript 99.63%
assistants gpts openai openai-api self-hosted

openopenai's People

Contributors

evgyk · jamiew · transitive-bullshit


openopenai's Issues

Add support for different knowledge retrieval methods

This is for the built-in retrieval tool.

Currently, the knowledge retrieval implementation is very naive: it simply returns the full contents of every attached file (source).

It also only supports text file types like text/plain and markdown, as no preprocessing or conversions are done at the moment.

It shouldn't be too hard to add support for more legit knowledge retrieval approaches, which would require:

  • processForFileAssistant - File ingestion pre-processing for files marked with purpose: 'assistants'

    • converting non-text files to a common format like markdown (this is probably the hardest step to do well across all of the most common file types)
    • chunking files
    • embedding chunks
    • storing embeddings to an external vector store; make sure to store the file_id each chunk comes from for filtering purposes
  • retrievalTool - Performs knowledge retrieval for a given query on a set of file_ids for RAG.

    • embed query
    • semantic search over vector store filtering by the given file_ids

Integrations here with LangChain and/or LlamaIndex would be great for their flexibility, but we could also KISS and roll our own using https://github.com/dexaai/dexter.
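A minimal sketch of what those two pieces might look like, assuming an in-memory vector store and a stubbed embedding function (the function names here are illustrative, not the repo's actual identifiers):

```ts
interface Chunk {
  file_id: string
  text: string
  embedding: number[]
}

// In-memory stand-in for an external vector store (Pinecone, pgvector, etc.).
const vectorStore: Chunk[] = []

// Placeholder embedding function; a real implementation would call an
// embedding model (e.g. via dexter). This stub only keeps the sketch runnable.
async function embed(text: string): Promise<number[]> {
  const vec = new Array(64).fill(0)
  for (let i = 0; i < text.length; i++) {
    vec[i % 64] += text.charCodeAt(i) / 1000
  }
  return vec
}

// File ingestion pre-processing for files with purpose: 'assistants'.
// Naive fixed-size chunking; a real version should first convert non-text
// files to a common format and chunk on semantic boundaries with overlap.
async function processFileForAssistant(file_id: string, text: string) {
  const chunkSize = 1000
  for (let i = 0; i < text.length; i += chunkSize) {
    const chunkText = text.slice(i, i + chunkSize)
    vectorStore.push({
      file_id, // stored so retrieval can filter by file_id
      text: chunkText,
      embedding: await embed(chunkText)
    })
  }
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Knowledge retrieval for a given query over a set of file_ids.
async function retrievalTool(query: string, file_ids: string[], topK = 5) {
  const queryEmbedding = await embed(query)
  return vectorStore
    .filter((chunk) => file_ids.includes(chunk.file_id))
    .map((chunk) => ({
      ...chunk,
      score: cosineSimilarity(queryEmbedding, chunk.embedding)
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
}
```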

Error 'slow down' on localhost e2e test

Just installed, and kicked off dist/server and dist/runner.

Both start listening with no complaints.

Kick off a test with: OPENAI_API_BASE_URL='http://127.0.0.1:3000' npx tsx e2e

and get the error below on the runner.

Note the only thing I can see wrong is that the response shows server: cloudflare, even though I have S3 configured with an s3:// address.

According to this thread on the OpenAI forums, this can sometimes be caused by a missing "Authorization" header:
https://community.openai.com/t/getting-hit-with-429-slow-down-error/482704/16

```
Runner started for queue "openopenai" listening for "thread-run" jobs
Processing thread-run job "clrmgtt100005iwpwdp54i3mb" for run "clrmgtt100005iwpwdp54i3mb"
Job "clrmgtt100005iwpwdp54i3mb" run "clrmgtt100005iwpwdp54i3mb": >>> chat completion call {
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    {
      role: 'user',
      content: 'What is the weather in San Francisco today?'
    }
  ],
  model: 'gpt-4-1106-preview',
  tools: [ { type: 'function', function: [Object] } ],
  tool_choice: 'auto'
}
Error job "clrmgtt100005iwpwdp54i3mb" run "clrmgtt100005iwpwdp54i3mb": APIError: 429 "slow down"
    at <anonymous> (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/openai-fetch/src/fetch-api.ts:43:14)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at fn (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/ky/source/core/Ky.ts:55:14)
    at Promise.result.<computed> (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/ky/source/core/Ky.ts:86:27)
    at OpenAIClient.createChatCompletion (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/openai-fetch/src/openai-client.ts:73:45)
    at ChatModel.runModel (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/@[email protected]/node_modules/@dexaai/dexter/src/model/chat.ts:53:24)
    at ChatModel.run (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/@[email protected]/node_modules/@dexaai/dexter/src/model/model.ts:132:24)
    at Worker.Worker.connection (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/src/runner/index.ts:270:21)
    at async Worker.processJob (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/bullmq/dist/cjs/classes/worker.js:350:28)
    at async Worker.retryIfFailed (/Users/chuckjewell/git/123/repos/ai_dev/OpenOpenAI/node_modules/.pnpm/[email protected]/node_modules/bullmq/dist/cjs/classes/worker.js:537:24) {
  status: 429,
  headers: {
    'alt-svc': 'h3=":443"; ma=86400',
    'cf-ray': '8489bcf009a4efd2-PDX',
    connection: 'keep-alive',
    'content-length': '22',
    'content-type': 'application/json',
    date: 'Sat, 20 Jan 2024 19:31:29 GMT',
    server: 'cloudflare',
    'set-cookie': '__cf_bm=NIGNPPboVXD8fiY_rJBrhjexBEiM7x8_Q4Nkuof9gS4-1705779089-1-ATL5Ajgw+3zdT405fvUxZgi5nnc3+jPQ/+W+uEv1Tk8yuymYc0CDzU2F25WJ0zNVlrF4eVSTlCTvoHOZeG7ZH/A=; path=/; expires=Sat, 20-Jan-24 20:01:29 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None',
    vary: 'Accept-Encoding'
  },
  error: 'slow down',
  code: undefined,
  param: undefined,
  type: undefined
}
```
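To test the missing-Authorization hypothesis, a standalone probe (hypothetical; not part of the repo) could hit the upstream endpoint once with and once without the header and compare status codes:

```ts
// Hypothetical probe: is the 429 "slow down" triggered by a missing
// Authorization header? Hit the upstream API directly, with and without
// the header, and compare the resulting status codes.
const url = 'https://api.openai.com/v1/chat/completions'

async function probe(withAuth: boolean): Promise<void> {
  const headers: Record<string, string> = {
    'content-type': 'application/json'
  }
  if (withAuth) {
    headers.authorization = `Bearer ${process.env.OPENAI_API_KEY}`
  }

  const res = await fetch(url, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      model: 'gpt-4-1106-preview',
      messages: [{ role: 'user', content: 'ping' }]
    })
  })
  console.log(withAuth ? 'with auth:' : 'without auth:', res.status)
}

await probe(true)
await probe(false)
```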

Add streaming support for runs

This isn't supported in the official OpenAI API yet, but it was mentioned at the OpenAI dev day that it will be coming soon, possibly via websocket and/or webhook support.

See this related issue in the OpenAI developer community.

The toughest part of this is that the runner is completely decoupled from the HTTP server, as it should be, so it can process thread runs in an async task queue. The runner is responsible for making the chat completion calls, which are streamable, so we'd have to either:

  • do some plumbing to connect the runner's execution to the result of the createRun or createThreadAndRun operations, and then pipe the chat completion calls into this stream (a rough sketch of this option follows the list)
  • or we could move the run implementation to not be handled by an async task queue, but rather live within createRun / createThreadAndRun
    • this approach would be quite a bit simpler, but I have a feeling it's the wrong approach long-term, as runs conceptually lend themselves to being decoupled from the HTTP call. This also makes the most sense from a sandboxing perspective, and it keeps the HTTP server lightweight, with no long-running HTTP responses
  • or move to a websocket and/or webhook approach, which is fine in and of itself, but has the huge downside of being completely different from the current SSE streaming that the chat completion API has embraced. Thinking about building apps that would potentially have to support both of these streaming approaches would make me a really sad panda
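For the first option, a minimal sketch, assuming the Redis instance already required for BullMQ is reused as a pub/sub bridge between the runner and the HTTP server (the channel naming and [DONE] sentinel here are made up):

```ts
import Redis from 'ioredis'

// Runner side: publish each streamed chat completion chunk on a
// per-run channel as it arrives.
async function publishRunChunks(
  runId: string,
  chunks: AsyncIterable<string>
): Promise<void> {
  const pub = new Redis()
  for await (const chunk of chunks) {
    await pub.publish(`run-stream:${runId}`, chunk)
  }
  // Sentinel so the HTTP side knows the run's stream is finished.
  await pub.publish(`run-stream:${runId}`, '[DONE]')
  pub.disconnect()
}

// HTTP server side: subscribe to the run's channel and forward each
// chunk to the client (e.g. as server-sent events).
function subscribeRunStream(
  runId: string,
  onChunk: (chunk: string) => void,
  onDone: () => void
): void {
  const sub = new Redis()
  sub.subscribe(`run-stream:${runId}`)
  sub.on('message', (_channel: string, message: string) => {
    if (message === '[DONE]') {
      sub.disconnect()
      onDone()
    } else {
      onChunk(message)
    }
  })
}
```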

Add support for custom models

Currently, the model is hard-coded to use the OpenAI chat completion API, but it wouldn't be very difficult to support custom LLMs or external model providers.

The only real constraint is that the custom models need to support function calling (ideally parallel tool calling) using OpenAI's tool_calls format; a hypothetical provider interface is sketched below.
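For illustration, the provider seam might look something like this (the type and interface names are hypothetical; only the tool_calls shape mirrors OpenAI's format):

```ts
// Hypothetical provider abstraction; any backend that can emit
// OpenAI-style tool_calls could be plugged in behind this interface.
interface ToolCall {
  id: string
  type: 'function'
  function: { name: string; arguments: string }
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool'
  content: string | null
  tool_calls?: ToolCall[]
}

interface ChatCompletionProvider {
  createChatCompletion(params: {
    model: string
    messages: ChatMessage[]
    tools?: Array<{ type: 'function'; function: object }>
    tool_choice?: 'auto' | 'none'
  }): Promise<{ message: ChatMessage }>
}
```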

Will consider implementing this depending on how much love this issue gets.

Integration with LangChain OpenGPTs

OpenGPTs is an awesome OSS project by the LangChain team that has a decent amount of overlap with this project.

The main difference between the two is that this project is intended to have 100% API compatibility with the official OpenAI Assistants API, whereas OpenGPTs is based loosely on the functionality of OpenAI GPTs.

Seeing as we're all playing around in similar sandboxes, I figured it made sense to open an issue to see if there's any interest.
