
Comments (13)

deter3 commented on June 15, 2024

Thanks for the explanation @yuliu-eric . Unfortunately, I don't think there's much we can do to help here without a reproducible script. Can you try to compose one that demonstrates the issue?

I will test tonight and let you know the result.

from anthropic-sdk-python.

yuliu-eric commented on June 15, 2024

Thank you for taking the effort to look into this! I'll try to provide a minimal working snippet later to reproduce the problem.


rattrayalex commented on June 15, 2024

Are you reinstantiating the client on every request?

If so, could you move it outside your handler, like so?

chat = Anthropic(api_key=api_key)

@app.get("/")
async def my_handler():
  completion = chat.completions.create(
    model="claude-2",
    max_tokens_to_sample=20,
    prompt=f"{HUMAN_PROMPT} what is {passed_in_number} {AI_PROMPT}",
  )
  return completion

If the API key changes per request, you might consider a with block:

@app.get("/")
async def my_handler():
  with Anthropic(api_key=api_key) as client:
    completion = client.completions.create(
      model="claude-2",
      max_tokens_to_sample=20,
      prompt=f"{HUMAN_PROMPT} what is {passed_in_number} {AI_PROMPT}",
    )
    return completion

Do either of those work for you?


yuliu-eric commented on June 15, 2024

In summary, I've tried all the approaches mentioned above, but unfortunately I still get the same memory issue. It would be great if you could help reproduce it on your side.


RobertCraigie commented on June 15, 2024

@yuliu-eric can you share a minimal reproduction? I can't fully reproduce what you're seeing.

Running this script:

import psutil
import anthropic
from anthropic import Anthropic

base_url = "http://127.0.0.1:4010"  # local mock server


for _ in range(1000):
    print(f"total mem usage: {psutil.virtual_memory().percent}")

    prompt = "\n\nHuman: This is a test prompt. Return some random text\n\nAssistant:"
    client = Anthropic(base_url=base_url)
    client.completions.create(
        prompt=prompt,
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-instant-1.2",
        max_tokens_to_sample=100,
    )

This results in total memory usage increasing from 54.7% to 55.5%, and eventually fails with this error:

OSError: [Errno 24] Too many open files
Exception ignored in: <function Anthropic.__del__ at 0x107569dc8>
Traceback (most recent call last):
  File "/Users/robert/stainless/stainless/dist/anthropic-python/src/anthropic/_client.py", line 224, in __del__
  File "/Users/robert/stainless/stainless/dist/anthropic-python/src/anthropic/_base_client.py", line 691, in close
AttributeError: 'Anthropic' object has no attribute '_client'

However, this would be considered a bug in the user code: you shouldn't be instantiating a client in a loop like this, and if you really need to, you should call client.close() or use the context manager mentioned previously.

For example:

    client.completions.create(
        prompt=prompt,
        stop_sequences=[anthropic.HUMAN_PROMPT],
        model="claude-instant-1.2",
        max_tokens_to_sample=100,
    )
    client.close()

This results in memory usage steadily increasing from 56.3% to a peak of 58%, where it then sharply drops back down to 56.3%. This is presumably because the GC hasn't been able to free the created client instances until the script is nearly finished running.
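The resource behaviour described above can be sketched without the SDK at all. The FakeClient class below is a hypothetical stand-in that just counts open "connections"; it is not the real Anthropic class:

```python
class FakeClient:
    open_connections = 0  # class-wide count of live connections

    def __init__(self):
        # Stands in for the connection pool a real client would open.
        FakeClient.open_connections += 1
        self._closed = False

    def close(self):
        if not self._closed:
            FakeClient.open_connections -= 1
            self._closed = True

# Anti-pattern from the script above: a new client per iteration, never closed.
for _ in range(20):
    client = FakeClient()
leaked = FakeClient.open_connections  # 20 connections still held

# Fix: close each client explicitly (or share one instance for the whole loop).
FakeClient.open_connections = 0
for _ in range(20):
    client = FakeClient()
    client.close()
released = FakeClient.open_connections  # back to 0
```

The point is the same as in the thread: either close each client deterministically or reuse a single shared instance.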


rattrayalex commented on June 15, 2024

@yuliu-eric could you make a modification to the script above that reproduces the error?


yuliu-eric commented on June 15, 2024

@RobertCraigie @rattrayalex
Thank you so much for the effort!

However, I think there might be a misunderstanding here. Let me rephrase; our scenario is as follows:

  1. we have a Python server (let's call it S) that communicates with Anthropic using the Python SDK
  2. we have a client (let's call it C) that continuously sends 20 concurrent requests to S; S in turn calls Anthropic through the SDK, processes the results, and returns data back to C

We observed that server S's memory kept increasing in this setup, but there was no problem when we replaced the Anthropic SDK calls with plain HTTP calls in S.

I have attached a diagram to help demonstrate the problem.

[Diagram: Screenshot 2023-09-14 at 11:01:45 AM]

Btw, I've tried to add client.close() in our python server code after each call to Anthropic, but got the same memory issue.


rattrayalex commented on June 15, 2024

Thanks for the explanation @yuliu-eric . Unfortunately, I don't think there's much we can do to help here without a reproducible script. Can you try to compose one that demonstrates the issue?


qxcv commented on June 15, 2024

FYI I'm also getting a resource leak when using the Anthropic client: the open file count climbed monotonically due to connections to *.googleusercontent.com. In our case it turned out to be caused by code like this (slightly cursed code, sorry):

from contextlib import closing

import httpx
from anthropic import Anthropic

def make_prediction():
    limits = httpx.Limits(max_connections=1, max_keepalive_connections=1)
    with closing(Anthropic(connection_pool_limits=limits)) as anthropic:
        completion = anthropic.with_options(max_retries=12).completions.create(
            model=some_model_name,
            prompt=some_prompt,
            temperature=temperature,
            max_tokens_to_sample=max_length_tokens,
        )
    return completion.completion

We didn't realize that with_options was creating a new client each time. It seems like those clients weren't getting GC'd at all so they never cleaned up their resources. If we take out the unnecessary .with_options() (and move max_retries into the Anthropic() constructor), then everything works fine (no more leak of connections), even though we're creating and closing a new client on each function call. Perhaps this will be useful for tracking down where the leak is? I didn't notice any obvious circular refs in .with_options()/.copy(), but I also only spent a minute on this.

(Also IMO using __del__ to clean up open files is a massive footgun and the client should probably be a context manager instead; I've had to deal with so much deadlocked multiprocessing code where things break because they rely on __del__ to be called as soon as the object falls out of scope and in the right order relative to other finalizers, which often does not happen. PyTorch dataloaders are the absolute worst offender for this.)


RobertCraigie commented on June 15, 2024

Yes, I agree that __del__ is not an ideal way to clean up, but our intention with it is to help protect against people forgetting to call .close() themselves when not using a context manager.

the client should probably be a context manager instead;

The client does support being used as a context manager, did this not work for you or were you unaware?

I've also just realised that we don't document the client as a context manager, or .close(), in the README, but we probably should!
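For reference, the context-manager behaviour being discussed can be sketched with a minimal stand-in (MiniClient is hypothetical, not the SDK class): __exit__ delegates to close(), so resources are released even if the block raises.

```python
class MiniClient:
    """Hypothetical stand-in showing the context-manager protocol."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real client would release its connection pool here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # don't swallow exceptions raised in the block

with MiniClient() as client:
    in_block = client.closed   # still open inside the block
after_block = client.closed    # __exit__ has called close()
```

This is the same guarantee a `with Anthropic(...) as client:` block relies on, without needing __del__ to fire.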

We didn't realize that with_options was creating a new client each time. It seems like those clients weren't getting GC'd at all so they never cleaned up their resources. If we take out the unnecessary .with_options() (and move max_retries into the Anthropic() constructor), then everything works fine (no more leak of connections), even though we're creating and closing a new client on each function call. Perhaps this will be useful for tracking down where the leak is? I didn't notice any obvious circular refs in .with_options()/.copy(), but I also only spent a minute on this.

This is helpful context, thanks!

It is very surprising that those clients are not being cleaned up... will look into it.

Creating a new client on every with_options() call is not ideal, and we'll look into improving this. It should be noted, though, that due to how httpx works, passing certain options such as connection_pool_limits requires that a new httpx client is created. In your example, however, max_retries wouldn't require a new client.
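The instance-multiplication effect described here can be illustrated with a hypothetical stand-in (OptClient is not the real SDK class): each with_options() call returns a fresh copy, while setting the option once at construction keeps a single instance.

```python
class OptClient:
    """Hypothetical stand-in mimicking copy-on-override clients."""

    instances = 0

    def __init__(self, max_retries: int = 2):
        OptClient.instances += 1
        self.max_retries = max_retries

    def with_options(self, **overrides) -> "OptClient":
        # Mirrors the copy behaviour described in the thread.
        merged = {"max_retries": self.max_retries, **overrides}
        return OptClient(**merged)

base = OptClient()
for _ in range(5):
    base.with_options(max_retries=12)   # each call creates another copy
copied = OptClient.instances            # 1 base + 5 copies = 6

OptClient.instances = 0
configured = OptClient(max_retries=12)  # option set once at construction
for _ in range(5):
    pass  # each request would just use `configured` directly
single = OptClient.instances            # 1
```

If the copies are never closed or collected, each one can pin its own resources, which matches the leak reported above.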

completion = anthropic.with_options(max_retries=12).completions.create(

Out of curiosity, is there a reason you set max_retries to 12? Have you run into instability in the SDK / API?


qxcv commented on June 15, 2024

Yes I agree that __del__ is not an ideal way to cleanup but our intention with it is to help protect against people forgetting to call .close() themselves when not using a context manager
The client does support being used as a context manager, did this not work for you or were you unaware?

Thanks for clarifying, I totally missed the context manager support. I took a quick look at the Anthropic class in vscode and didn't see the context manager methods, but I should have checked the base class as well. Using it as a context manager works great 👍

Creating a new client on every with_options() call is not ideal, we'll look into improving this but it should be noted that due to the nature of how httpx works, passing certain options such as connection_pool_limits will require that a new httpx instance is created. However in your example max_retries wouldn't require a new client.

FWIW our use case didn't truly need to make use of this method & it was easy to move the extra option up to the place where the client is constructed. If it wasn't for the GC issue then I don't think anyone would have noticed (efficiency is not much of a concern for the batch job we're running).

Out of curiosity is there a reason you set max_retries to 12? Have you ran into instability in the SDK / API?

No, we're just doing something hacky to parallelize our requests (using a pool of 10 threads), which leads to lots of rate limit issues. If we did things properly with asyncio then this would not have been a problem.
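As a sketch of the asyncio alternative mentioned here, a semaphore can bound concurrency so requests don't slam the rate limit; fake_completion below is a dummy coroutine standing in for an AsyncAnthropic call, not real SDK code.

```python
import asyncio

MAX_CONCURRENCY = 10  # roughly equivalent to the 10-thread pool above

async def fake_completion(prompt: str, tracker: dict) -> str:
    # Stand-in for an async SDK completions call.
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulate network latency
    tracker["active"] -= 1
    return f"echo: {prompt}"

async def bounded(sem: asyncio.Semaphore, prompt: str, tracker: dict) -> str:
    # The semaphore caps how many requests are in flight at once.
    async with sem:
        return await fake_completion(prompt, tracker)

async def main(n: int) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    tracker = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(bounded(sem, f"prompt {i}", tracker) for i in range(n))
    )
    assert tracker["peak"] <= MAX_CONCURRENCY  # concurrency stayed bounded
    return results

results = asyncio.run(main(40))
```

With a real AsyncAnthropic client you would construct it once, share it across all the bounded tasks, and close it when the batch finishes.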


rattrayalex commented on June 15, 2024

I don't think we've been able to repro a memory leak, so I'm going to close this, but we do plan to investigate resource leaks more deeply and we'd still appreciate a code snippet that can reproduce an issue!


tamird commented on June 15, 2024

This might've been fixed by 5e51ebd.

