
Comments (3)

HannaHUp commented on May 29, 2024

I printed all of the input texts and fed them one by one to the embedding model, and I found the text that causes the error. Please take a look.
Here's my code:

import os
from urllib.request import urlretrieve

os.makedirs("data", exist_ok=True)
files = [
    "https://www.irs.gov/pub/irs-pdf/p1544.pdf",
    "https://www.irs.gov/pub/irs-pdf/p15.pdf",
    "https://www.irs.gov/pub/irs-pdf/p1212.pdf",
    "https://www.irs.gov/pub/irs-pdf/p3.pdf",
    "https://www.irs.gov/pub/irs-pdf/p17.pdf",
    "https://www.irs.gov/pub/irs-pdf/p51.pdf",
    "https://www.irs.gov/pub/irs-pdf/p54.pdf",
]
for url in files:
    file_path = os.path.join("data", url.rpartition("/")[2])
    urlretrieve(url, file_path)
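As an aside, the `url.rpartition("/")[2]` expression in the loop above extracts the filename from each URL; a minimal illustration:

```python
# rpartition("/") splits on the LAST "/", returning (head, sep, tail);
# index [2] is everything after that last slash, i.e. the filename.
url = "https://www.irs.gov/pub/irs-pdf/p17.pdf"
filename = url.rpartition("/")[2]
print(filename)  # → p17.pdf
```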

import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("./data/")

documents = loader.load()
# - in our testing, the character-based splitter works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # a relatively small chunk size, just for demonstration
    chunk_size=1000, chunk_overlap=100
)
docs = text_splitter.split_documents(documents)

avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
avg_char_count_pre = avg_doc_length(documents)
avg_char_count_post = avg_doc_length(docs)
print(f'Average length among {len(documents)} documents loaded is {avg_char_count_pre} characters.')
print(f'After the split we have {len(docs)} documents more than the original {len(documents)}.')
print(f'Average length among {len(docs)} documents (after split) is {avg_char_count_post} characters.')

texts = [d.page_content for d in docs]
metadatas = [d.metadata for d in docs]
print(len(texts), len(metadatas))

import json

# boto3_bedrock is the Bedrock client created earlier in the notebook.

def embedding_func(text: str):
    # this function is adapted from langchain/embeddings/bedrock.py
    """Call out to the Bedrock embedding endpoint."""
    # replace newlines, which can negatively affect performance
    text = text.replace(os.linesep, " ")
    print("\n text", text)
    _model_kwargs = {}

    input_body = {**_model_kwargs, "inputText": text}
    print("input_body", input_body)
    body = json.dumps(input_body)

    try:
        response = boto3_bedrock.invoke_model(
            body=body,
            modelId="amazon.titan-e1t-medium",
            accept="application/json",
            contentType="application/json",
        )
        response_body = json.loads(response.get("body").read())
    except Exception as e:
        raise ValueError(f"Error raised by inference endpoint: {e}")
    return response_body

for text in texts:
    response = embedding_func(text)

and when the input text is "18,100 18,150 1,970 1,813 1,970 1,882 18,150 18,200 1,976 1,818 1,976 1,888 18,200 18,250 1,982 1,823 1,982 1,894 18,250 18,300 1,988 1,828 1,988 1,900 18,300 18,350 1,994 1,833 1,994 1,906 18,350 18,400 2,000 1,838 2,000 1,912 18,400 18,450 2,006 1,843 2,006 1,918 18,450 18,500 2,012 1,848 2,012 1,924 18,500 18,550 2,018 1,853 2,018 1,930 18,550 18,600 2,024 1,858 2,024 1,936 18,600 18,650 2,030 1,863 2,030 1,942 18,650 18,700 2,036 1,868 2,036 1,948 18,700 18,750 2,042 1,873 2,042 1,954 18,750 18,800 2,048 1,878 2,048 1,960 18,800 18,850 2,054 1,883 2,054 1,966 18,850 18,900 2,060 1,888 2,060 1,972 18,900 18,950 2,066 1,893 2,066 1,978 18,950 19,000 2,072 1,898 2,072 1,984 19,000 19,000 19,050 2,078 1,903 2,078 1,990 19,050 19,100 2,084 1,908 2,084 1,996 19,100 19,150 2,090 1,913 2,090 2,002 19,150 19,200 2,096 1,918 2,096 2,008 19,200 19,250 2,102 1,923 2,102 2,014 19,250 19,300 2,108 1,928 2,108 2,020 19,300 19,350 2,114 1,933 2,114 2,026 19,350 19,400 2,120 1,938 2,120 2,032"
the embedding model gives me: ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid

I believe you can reproduce it. Thank you.
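One guess at a workaround (an assumption, not a confirmed cause: the failing chunk above is over a thousand characters of dense tax-table text, which may exceed the model's input limit): re-split any over-long chunk on whitespace before embedding, so each piece stays under a chosen character budget. The helper below is a hypothetical sketch, not part of the workshop code:

```python
def resplit(text: str, max_chars: int = 512) -> list[str]:
    """Split `text` into whitespace-delimited pieces of at most `max_chars`.

    Note: a single word longer than `max_chars` is kept whole.
    """
    pieces, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            pieces.append(current)  # flush the current piece, start a new one
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces

# usage: re-split each chunk, then embed the smaller pieces
# for text in texts:
#     for piece in resplit(text):
#         response = embedding_func(piece)
```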

from amazon-bedrock-workshop.

HannaHUp commented on May 29, 2024

You can reproduce the error simply by replacing your data-preparation code with this:

import os
from urllib.request import urlretrieve

os.makedirs("data_17", exist_ok=True)
files = [
    "https://www.irs.gov/pub/irs-pdf/p17.pdf",
]
for url in files:
    file_path = os.path.join("data_17", url.rpartition("/")[2])
    urlretrieve(url, file_path)

Then run all the other code in the notebook.


lauerarnaud commented on May 29, 2024

@HannaHUp hello, I ran the same list of PDFs, with chunk_size = 1000 and chunk_overlap = 100:

import os
from urllib.request import urlretrieve

os.makedirs("data", exist_ok=True)
files = [
    "https://www.irs.gov/pub/irs-pdf/p1544.pdf",
    "https://www.irs.gov/pub/irs-pdf/p15.pdf",
    "https://www.irs.gov/pub/irs-pdf/p1212.pdf",
    "https://www.irs.gov/pub/irs-pdf/p3.pdf",
    "https://www.irs.gov/pub/irs-pdf/p17.pdf",
    "https://www.irs.gov/pub/irs-pdf/p51.pdf",
    "https://www.irs.gov/pub/irs-pdf/p54.pdf",
]
for url in files:
    file_path = os.path.join("data", url.rpartition("/")[2])
    urlretrieve(url, file_path)

Got this as the split.

Average length among 314 documents loaded is 6397 characters.
After the split we have 2351 documents more than the original 314.
Average length among 2351 documents (after split) is 920 characters.
We had 3 PDF documents which have been split into smaller ~500 chunks.

It worked fine for me. Would you be able to try the latest Titan Embeddings model, "amazon.titan-embed-text-v1", which was released last week at General Availability of the service, together with the latest workshop code, and see if you get the same problem?
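Swapping in the GA model id amounts to changing the `modelId` in the invoke call. A minimal sketch (the function name and the client-as-parameter wiring are assumptions, not the workshop's exact code; `client` would be the boto3 Bedrock runtime client):

```python
import json

def embed_text(client, text: str, model_id: str = "amazon.titan-embed-text-v1"):
    """Call a Bedrock embeddings model and return the embedding vector.

    `client` is a boto3 Bedrock runtime client created elsewhere.
    """
    # replace newlines, which can negatively affect performance
    body = json.dumps({"inputText": text.replace("\n", " ")})
    response = client.invoke_model(
        body=body,
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response.get("body").read())["embedding"]
```

Passing the client in as a parameter also makes the function easy to exercise against a stub, without touching the real endpoint.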

