arcee-ai / arcee-python



The Arcee client for executing domain-adapted language model routines

Home Page: https://www.arcee.ai

Python 100.00%

ai llm llm-inference llm-training llmops

arcee-python's Introduction

Arcee Client Docs

The Arcee client for executing domain-adapted language model routines

Installation

pip install arcee-py

Authenticating

Get your Arcee API key from app.arcee.ai.

In bash:

export ARCEE_API_KEY=********

In a notebook:

import os
os.environ["ARCEE_API_KEY"] = "********"
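Both approaches simply set the ARCEE_API_KEY environment variable, so a quick check before calling the client can turn a confusing failed request into a clear error. This is a minimal sketch; the helper name require_arcee_api_key is hypothetical and not part of arcee-py.

```python
import os


def require_arcee_api_key() -> str:
    """Return ARCEE_API_KEY from the environment, or fail early with a clear message.

    Hypothetical helper, not part of arcee-py.
    """
    key = os.environ.get("ARCEE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ARCEE_API_KEY is not set; export it in bash or set os.environ in your notebook")
    return key
```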

Upload Context

Upload context for your domain-adapted language model (DALM) to draw from.

import arcee
arcee.upload_doc("pubmed", doc_name="doc1", doc_text="whoa")
# or
# arcee.upload_docs("pubmed", docs=[{"doc_name": "doc1", "doc_text": "foo"}, {"doc_name": "doc2", "doc_text": "bar"}])
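upload_docs takes a list of dicts with doc_name and doc_text keys, so a folder of plain-text files can be turned into that list with the standard library. This is a sketch under the assumption that each .txt file is one document named after its file stem; the helper docs_from_directory is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def docs_from_directory(directory: str, pattern: str = "*.txt") -> List[Dict[str, str]]:
    """Build the [{"doc_name": ..., "doc_text": ...}] list that upload_docs accepts.

    Hypothetical helper: one document per matching file, named after the file stem.
    """
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            docs.append({"doc_name": path.stem, "doc_text": path.read_text(encoding="utf-8")})
    return docs
```

The result can then be passed straight through, e.g. arcee.upload_docs("pubmed", docs=docs_from_directory("docs")).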

Train DALM

Train a DALM with the context you have uploaded.

import arcee
dalm = arcee.train_dalm("medical_dalm", context="pubmed")
# Wait for training to complete
arcee.get_dalm_status("medical_dalm")

The DALM training procedure trains your model in context and stands up an index for your model to draw from.
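"Wait for training to complete" can be automated by polling the status. This sketch assumes get_dalm_status returns a dict with a "status" key (the Training Failed issue on this page shows one such value, 'Training failed (RuntimeError)'); the wording of a successful status is not documented here, so the finish check is passed in by the caller. The helper wait_for_training is hypothetical.

```python
import time
from typing import Any, Callable, Dict


def wait_for_training(
    get_status: Callable[[], Dict[str, Any]],
    is_finished: Callable[[str], bool],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 6 * 60 * 60,
) -> Dict[str, Any]:
    """Poll get_status() until is_finished(status["status"]) is true or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if is_finished(str(status.get("status", ""))):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Training did not finish in time; last status: {status}")
        time.sleep(poll_seconds)
```

A possible call, with an assumed finish check: wait_for_training(lambda: arcee.get_dalm_status("medical_dalm"), lambda s: "complete" in s.lower() or "failed" in s.lower()).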

DALM Generation

import arcee
med_dalm = arcee.get_dalm("medical_dalm")
med_dalm.generate("What are the components of Scopolamine?")

DALM Retrieval

Retrieve documents for a given query, to view them or to plug them into a different LLM.

import arcee
med_dalm = arcee.get_dalm("medical_dalm")
med_dalm.retrieve("my query")
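To plug retrieved documents into a different LLM, a common pattern is to number the passages and place them ahead of the question. This sketch assumes you have already pulled the passage text out of the retrieve() result (its field names are not documented here); build_grounded_prompt and its prompt wording are hypothetical.

```python
from typing import Iterable


def build_grounded_prompt(query: str, passages: Iterable[str], max_passages: int = 5) -> str:
    """Number non-empty retrieved passages and place them ahead of the question."""
    selected = [p.strip() for p in passages if p and p.strip()][:max_passages]
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(selected, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```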

Using the Arcee CLI

You can also train and use your Domain-Adapted Language Model (DALM) from the Arcee CLI. After installation, follow these steps:

Upload Context

Upload a single file as context for your DALM:

arcee upload context pubmed --file doc1

Upload all files in a directory:

arcee upload context pubmed --directory docs

Upload any combination of files and directories:

arcee upload context pubmed --directory some_docs --file doc1 --directory more_docs --file doc2

Note: The upload command ensures only valid and unique files are uploaded.
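How the CLI decides which files are valid and unique is not described here. As an illustration only, one common way to drop duplicates is to hash file contents; this sketch assumes "valid" means an existing, non-empty file, and unique_files is a hypothetical helper, not the CLI's implementation.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def unique_files(paths: Iterable[Path]) -> List[Path]:
    """Keep existing, non-empty files, dropping any whose bytes duplicate an earlier file."""
    seen = set()
    kept = []
    for path in paths:
        if not path.is_file() or path.stat().st_size == 0:
            continue  # skip missing paths, directories, and empty files
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(path)
    return kept
```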

Train your DALM:

Train your DALM on any uploaded context:

arcee train medical_dalm --context pubmed
# wait for training to complete...

DALM Generation:

Generate a text completion from your DALM:

arcee generate medical_dalm --query "Can AI-driven music therapy contribute to the rehabilitation of patients with disorders of consciousness?"

DALM Retrieval:

Retrieve documents for a given query, to view them or to plug them into a different LLM:

arcee retrieve medical_dalm --query "Can AI-driven music therapy contribute to the rehabilitation of patients with disorders of consciousness?"

Contributing

We use invoke to manage this repo. You don't need to use it, but it simplifies the workflow.

Set up the repo

git clone https://github.com/arcee-ai/arcee-python && cd arcee-python
# optionally set up a virtual environment (recommended)
python -m venv .venv && source .venv/bin/activate
# install repo
pip install invoke
inv install

Format, lint, test

inv format  # run black and ruff
inv lint    # black check, ruff check, mypy
inv test    # pytest

Publishing

We publish from this repo by creating a new release/tag in GitHub. On release, a GitHub Action publishes arcee-py at the __version__ set in arcee/__init__.py.

You must therefore increase that version before releasing; otherwise the publish step fails.

To create a new release:

1. Open a PR that increases the __version__ of arcee-py. You can edit it manually or run inv uv.
2. Create a new release whose name is the __version__ of arcee-py.
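What inv uv does internally is not shown here. As a sketch of the idea, assuming __version__ is a plain MAJOR.MINOR.PATCH string, bumping the patch component of arcee/__init__.py could look like this; bump_patch_version is a hypothetical helper, not the repo's task.

```python
import re

# Matches a line such as: __version__ = "0.1.9"  (plain MAJOR.MINOR.PATCH assumed)
VERSION_RE = re.compile(r'^__version__\s*=\s*["\'](\d+)\.(\d+)\.(\d+)["\']', re.MULTILINE)


def bump_patch_version(init_source: str) -> str:
    """Return init_source with the patch component of __version__ incremented."""
    match = VERSION_RE.search(init_source)
    if match is None:
        raise ValueError("no MAJOR.MINOR.PATCH __version__ found")
    major, minor, patch = (int(part) for part in match.groups())
    # Rewrites the matched assignment with double quotes, leaving the rest of the file untouched
    new_line = f'__version__ = "{major}.{minor}.{patch + 1}"'
    return init_source[: match.start()] + new_line + init_source[match.end():]
```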

Manual release [not recommended]

We do not recommend this. If you must create a manual release, make the version number an alpha or beta release, then run inv build && inv publish.

arcee-python's People

Contributors

Stargazers

Watchers

Forkers

ericliclair beddows rachittshah

arcee-python's Issues

Set up openapi spec

https://github.com/openapi-generators/openapi-python-client

Proposal: DALM connectors for popular frameworks

While exploring and building with prominent frameworks, I've been wondering how DALMs would integrate with them.
The Arcee client helps execute DALM routines in its own setting, but should we consider implementing and maintaining connectors for prominent frameworks such as LangChain, semantic-kernel, and DSPy?

People experiment with several models and frameworks and try out what suits them best. Offering connectors could drive active adoption.

What's proposed?

We could implement the connectors either in arcee-python or by opening PRs against these frameworks. IMO the former is easier to maintain while we wait for the connectors to be merged into the main repositories.

What's expected?

How these connectors would be developed, structured, and maintained is open for discussion, as is which frameworks warrant a connector at all.
With an out-of-the-box (OOTB) connector, one could use Arcee's client to build applications on top of DALMs.

A basic example usage for Microsoft's semantic-kernel

# install arcee-client and semantic-kernel like,
# !pip install arcee-python semantic-kernel

import arcee_python as arcee
import semantic_kernel as sk

# import dalms
from arcee_python.connectors.semantic_kernel import ArceeTextCompletion # dalm
# or
from semantic_kernel.connectors.ai.arcee_ai import ArceeTextCompletion # dalm

kernel = sk.Kernel()

# Prepare Arcee service using credentials stored in the `.env` file
api_key, org_id = arcee.settings_from_dot_env() # config
# or
api_key, org_id = sk.arceeai_settings_from_dot_env() # config

kernel.add_text_completion_service(
    "arcee", ArceeTextCompletion("DPT-PubMed-7b", api_key, org_id)
)

# Wrap your prompt in a function
prompt = kernel.create_semantic_function(
    """
    Can AI-driven music therapy contribute to the rehabilitation of patients with disorders of consciousness?
    """.strip()
)

# Run your prompt
print(prompt())
# => Based on the provided context, AI-driven music therapy has the potential to contribute to the rehabilitation of patients with disorders of consciousness. The use of AI agents in robotic therapy has already shown promising results in stroke rehabilitation, indicating that AI can assist in enhancing motor functions. Additionally, evidence-based neurorehabilitation interventions that incorporate principles of activity-dependent plasticity and motor learning have been developed, which can be further enhanced by AI-driven music therapy. However, it is important to note that the specific effectiveness and implementation of AI-driven music therapy in the rehabilitation of patients with disorders of consciousness would require further research and clinical trials.

A basic example usage for langchain

# install arcee-client and langchain like,
# !pip install arcee-python langchain

import arcee_python as arcee

# import dalms
from arcee_python.connectors.langchain import ArceeAI # dalm
# or
from langchain.llms import ArceeAI

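# ===== use as single model =====
api_key, org_id = arcee.settings_from_dot_env()  # proposed config helper, as in the semantic-kernel example
llm = ArceeAI("DPT-PubMed-7b", api_key=api_key, org_id=org_id)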
prompt = "Can AI-driven music therapy contribute to the rehabilitation of patients with disorders of consciousness?"

print(llm(prompt))
# => Based on the provided context, AI-driven music therapy has the potential to contribute to the rehabilitation of patients with disorders of consciousness. The use of AI agents in robotic therapy has already shown promising results in stroke rehabilitation, indicating that AI can assist in enhancing motor functions. Additionally, evidence-based neurorehabilitation interventions that incorporate principles of activity-dependent plasticity and motor learning have been developed, which can be further enhanced by AI-driven music therapy. However, it is important to note that the specific effectiveness and implementation of AI-driven music therapy in the rehabilitation of patients with disorders of consciousness would require further research and clinical trials.

# ===== run in chain =====
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["disease"],
    template="Can AI-driven music therapy contribute to the rehabilitation of patients with {disease}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run("disorders of consciousness"))
# => Based on the provided context, AI-driven music therapy has the potential to contribute to the rehabilitation of patients with disorders of consciousness. The use of AI agents in robotic therapy has already shown promising results in stroke rehabilitation, indicating that AI can assist in enhancing motor functions. Additionally, evidence-based neurorehabilitation interventions that incorporate principles of activity-dependent plasticity and motor learning have been developed, which can be further enhanced by AI-driven music therapy. However, it is important to note that the specific effectiveness and implementation of AI-driven music therapy in the rehabilitation of patients with disorders of consciousness would require further research and clinical trials.

@Jacobsolawetz @Ben-Epstein your thoughts on this? 🤔

Resources:

semantic-kernel

Base connector client

langchain

Base model

Set a User-Agent header like arcee-py/{version} on all API requests
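One way to do this is to merge the header in wherever the client builds its request headers (for example alongside X-Token in make_request), with the version taken from arcee.__version__. This is a sketch; with_user_agent is a hypothetical helper name.

```python
from typing import Dict, Optional


def with_user_agent(version: str, headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of headers with User-Agent set to arcee-py/{version}; the input is not mutated."""
    merged = dict(headers or {})
    merged["User-Agent"] = f"arcee-py/{version}"
    return merged
```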

Hatchling backwards incompatibility issue

I'm hitting the same issue described in arcee-ai/DALM#86 while trying to pip install.

Repo setup

- typing
- tests
- lint/type check and test CI
- auto deployment on release

Error uploading document(s): Memory Limit Exceeded

I tried uploading a 3 GB doc and got this error:

Error uploading document(s): Memory Limit Exceeded. When uploading context_data_6fab4be1-6939-4b95-bd93-09d87c23cba0.csv (3226.388792991638 MB). Try increasing chunk size.
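How the upload chunk size is configured is not documented here. As a client-side workaround, assuming the CSV's rows are independent documents, the file can be streamed into smaller parts without loading it into memory, and the resulting folder uploaded with arcee upload context <name> --directory <parts_dir>. split_csv is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import List


def split_csv(source: str, out_dir: str, rows_per_file: int = 100_000) -> List[Path]:
    """Stream a large CSV into numbered part files, repeating the header row in each part."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    handle = None
    writer = None
    with open(source, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return parts  # empty file: nothing to split
        try:
            for i, row in enumerate(reader):
                if i % rows_per_file == 0:
                    # Start a new part file every rows_per_file data rows
                    if handle is not None:
                        handle.close()
                    part = out / f"{Path(source).stem}_part{len(parts) + 1}.csv"
                    handle = open(part, "w", newline="", encoding="utf-8")
                    writer = csv.writer(handle)
                    writer.writerow(header)
                    parts.append(part)
                writer.writerow(row)
        finally:
            if handle is not None:
                handle.close()
    return parts
```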

Bugfix: pass filters as a dict with Arcee SDK

When using Arcee with LangChain, filters are passed as dicts inside model_kwargs:

from langchain.llms.arcee import Arcee

arcee = Arcee(
    model="DALM-PubMed",
    model_kwargs={
        "size": 10,  # The number of documents to inform the generation
        "filters": [
            {
                "field_name": "document",
                "filter_type": "fuzzy_search",
                "value": "neuroscience"
            }
        ]
    }
)

TypeError: Object of type DALMFilter is not JSON serializable

The above exception was the direct cause of the following exception:

Exception                                 Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/langchain/llms/arcee.py in _call(self, prompt, stop, run_manager, **kwargs)
    145             return self._client.generate(prompt=prompt, **kwargs)
    146         except Exception as e:
--> 147             raise Exception(f"Failed to generate text: {e}") from e
Exception: Failed to generate text: Object of type DALMFilter is not JSON serializable

Possible issues:

This is likely because the DALMFilter class does not have a method to convert its instances to a JSON serializable format.

Possible fixes:

Add handling that converts DALMFilter instances into a JSON-serializable format before the request is sent.

from pydantic import BaseModel

# FilterType is the client's own enum of supported filter types ('fuzzy_search', 'strict_search')


class DALMFilter(BaseModel):
    """Filters available for a DALM retrieve/generation query

    Arguments:
        field_name: The field to filter on. Can be 'document' or 'name' to filter on your document's raw text or title.
            Any other field will be presumed to be a metadata field you included when uploading your context data.
        filter_type: Currently 'fuzzy_search' and 'strict_search' are supported. More to come soon!
            'fuzzy_search' means a fuzzy search on the provided field will be performed. The exact string doesn't
            need to exist in the document for this to find a match. Very useful for scanning a document for some
            keyword terms.
            'strict_search' means that the exact string must appear in the provided field. This is NOT an exact eq
            filter, i.e. a document with content "the happy dog crossed the street" will match on a strict_search of "dog"
            but won't match on "the dog". Python equivalent of `return search_string in full_string`
        value: The actual value to search for in the context data/metadata
    """

    field_name: str
    filter_type: FilterType
    value: str
    _is_metadata: bool = False

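The issue seems to stem mainly from how make_request handles the request body: it passes body straight to requests as json=body, so any DALMFilter objects inside it reach the JSON encoder unconverted.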

from typing import Any, Dict, Literal, Optional, Union

import requests

# config and Route come from the arcee package itself (API key, base URL, route names)


def make_request(
    request: Literal["post", "get"],
    route: Union[str, Route],
    body: Optional[Dict[str, Any]] = None,
    params: Optional[Dict[str, Any]] = None,
    headers: Optional[Dict[str, Any]] = None,
) -> Dict[str, str]:
    """Makes the request"""
    headers = headers or {}
    internal_headers = {"X-Token": f"{config.ARCEE_API_KEY}", "Content-Type": "application/json"}
    headers.update(internal_headers)
    url = f"{config.ARCEE_API_URL}/{config.ARCEE_API_VERSION}/{route}"

    req_type = getattr(requests, request)
    # json=body is encoded by requests' JSON encoder, which rejects DALMFilter instances
    response = req_type(url, json=body, params=params, headers=headers)
    if response.status_code not in (200, 201):
        raise Exception(f"Failed to make request. Response: {response.text}")
    return response.json()
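A minimal sketch of the fix, assuming the client's models are pydantic v2 (the model_validator import in the Training Failed issue suggests so; on v1, .dict() would play the role of model_dump): recursively convert the body to plain data before handing it to requests, e.g. req_type(url, json=to_jsonable(body), ...). to_jsonable is a hypothetical helper, and the dataclass below is only a stand-in for DALMFilter so the example runs without pydantic.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from enum import Enum
from typing import Any


def to_jsonable(obj: Any) -> Any:
    """Recursively turn pydantic models, dataclasses, and enums into plain JSON-ready values."""
    if hasattr(obj, "model_dump"):  # pydantic v2 models such as DALMFilter
        return to_jsonable(obj.model_dump(mode="json"))
    if is_dataclass(obj) and not isinstance(obj, type):
        return to_jsonable(asdict(obj))
    if isinstance(obj, Enum):
        return obj.value
    if isinstance(obj, dict):
        return {key: to_jsonable(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_jsonable(item) for item in obj]
    return obj


@dataclass
class FilterStandIn:
    """Stand-in for DALMFilter so this sketch runs without pydantic installed."""
    field_name: str
    filter_type: str
    value: str


body = {"size": 10, "filters": [FilterStandIn("document", "fuzzy_search", "neuroscience")]}
payload = json.dumps(to_jsonable(body))  # succeeds, unlike json.dumps(body)
```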

Train DALM status link inaccurate

The link takes you to the workspace page; it needs the org name.

For example, https://app.arcee.ai/jacob-solawetz/models/pubmed_500 is correct, but https://app.arcee.ai/models/pubmed_500 is not.

This will require a REST API change.
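Once the org name is available from the API, the correct link follows the pattern in the example above. A sketch, where model_status_url is a hypothetical helper and the org name is assumed to be known by the client:

```python
from urllib.parse import quote


def model_status_url(org: str, model_name: str, base: str = "https://app.arcee.ai") -> str:
    """Build the org-scoped model page URL: {base}/{org}/models/{model_name}."""
    return f"{base}/{quote(org, safe='')}/models/{quote(model_name, safe='')}"
```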

Training Failed

When trying to train a new model (with a new context), I get the message: 'status': 'Training failed (RuntimeError)'

ImportError: cannot import name 'model_validator' from 'pydantic' (/usr/local/lib/python3.10/dist-packages/pydantic/__init__.py)

This seems to be a pydantic issue in Google Colab.
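model_validator was introduced in Pydantic 2.0, so this ImportError means the environment has Pydantic 1.x installed; upgrading (pip install -U "pydantic>=2") and restarting the runtime is the usual remedy. A small check you can run against pydantic.VERSION, with supports_model_validator as a hypothetical helper:

```python
def supports_model_validator(pydantic_version: str) -> bool:
    """model_validator only exists in Pydantic 2.x, so compare the major version."""
    major = int(pydantic_version.split(".", 1)[0])
    return major >= 2
```

For example, import pydantic and then call supports_model_validator(pydantic.VERSION).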

Enable selecting your org on any given request

Users should be able to switch orgs on any given request. Add an optional org parameter to each method.

Method Not Allowed

Hey everyone,

I'm following the arcee-python instructions, but keep getting the error:
Exception: Failed to make request. Response: {"detail":"Method Not Allowed"}

I'm getting this message both in Python and via CLI. Any ideas what could be happening?

Thanks,
Michael
