
azure / azure-search-vector-samples


A repository of code samples for Vector search capabilities in Azure AI Search.

Home Page: https://azure.microsoft.com/products/search

License: MIT License

JavaScript 0.26% Jupyter Notebook 87.60% C# 1.12% Python 2.61% Bicep 7.34% PowerShell 0.16% Java 0.90%
azure azurecognitivesearch embeddings vector vector-search

azure-search-vector-samples's Introduction

Vector samples - Azure AI Search

This repository provides Python, C#, REST, and JavaScript code samples for vector support in Azure AI Search.

There are breaking changes from REST API version 2023-07-01-Preview to newer API versions. These breaking changes also apply to the Azure SDK beta packages targeting that REST API version. See Upgrade REST APIs for migration guidance.

Feature status

Vector support consists of generally available features and preview features.

Feature | Status
vector indexing | generally available (2023-11-01 and stable SDK packages)
vector queries | generally available (2023-11-01 and stable SDK packages)
integrated data chunking | public preview (2023-10-01-preview and later, plus beta SDK packages)
integrated embedding | public preview (2023-10-01-preview and later, plus beta SDK packages)
index projections | public preview (2023-10-01-preview and later, plus beta SDK packages)
vectorizers | public preview (2023-10-01-preview and later, plus beta SDK packages)
scalar quantization | public preview (2024-03-01-preview and later, plus beta SDK packages)
OneLake indexer | public preview (2024-05-01-preview, plus beta SDK packages)
Binary vectors support | public preview (2024-05-01-preview, plus beta SDK packages)

Preview features are available under Supplemental Terms of Use.

demo-python samples

Sample | Description | Status
demo-python readme | A growing collection of notebooks that demonstrate aspects of vector search support, including data chunking and embedding of both text and image content and queries, using a variety of frameworks and techniques. | GA and preview

demo-dotnet samples

Sample | Description | Status
DotNetVectorDemo | A .NET console app that calls Azure OpenAI to vectorize data. It then calls Azure AI Search to create, load, and query vector data. | Generally available (GA)
DotNetIntegratedVectorizationDemo | A .NET console app that calls Azure AI Search to create an index, indexer, data source, and skillset. An Azure Storage account provides the data. Azure OpenAI is called by the skillset during indexing, and again during query execution to vectorize text queries. | Public preview
QuantizationAndStorageOptions | A .NET console app that demonstrates narrow data types and built-in scalar quantization, reducing vector index size in memory and on disk. It also disables storage of vectors returned in query response, which you don't need if you're not returning vectors in a query. | Public preview

demo-java samples

Sample | Description | Status
demo-vectors | A Java console app that calls Azure OpenAI to vectorize data. It then calls Azure AI Search to create, load, and query vector data. | GA
demo-integrated-vectorization | A Java console app that calls Azure AI Search to create an index, indexer, data source, and skillset. An Azure Storage account provides the data. Azure OpenAI is called by the skillset during indexing, and again during query execution to vectorize text queries. | GA and preview

demo-javascript samples

Sample | Description | Status
JavaScriptVectorDemo | A single folder contains three code samples. The azure-search-vector-sample.js script calls only Azure OpenAI and is used to generate embeddings for fields in an index. The docs-text-openai-embeddings.js program is an end-to-end code sample that calls Azure OpenAI for embeddings and Azure AI Search to create, load, and query an index that contains vectors. The query-text-openai-embeddings.js script generates an embedding for a vector query. | GA and preview

Other vector samples and tools

  • azure-ai-search-lab A learning and experimentation lab for trying out various AI-enabled search scenarios in Azure. It includes a web application front end that uses Azure AI Search and Azure OpenAI to execute searches with a variety of options, ranging from simple keyword search to semantic ranking, vector and hybrid search, and using generative AI to answer search queries in various ways. This allows you to quickly understand what each option does, how it affects the search results, and how the approaches compare with each other.
  • chat-with-your-data-solution-accelerator A template that deploys multiple Azure resources for a custom chat-with-your-data solution. Use this accelerator to create a production-ready solution that implements coding best practices.
  • Azure Search OpenAI Demo A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. Use the "vectors" branch to leverage Vector retrieval.
  • Azure Search OpenAI Demo - C# A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences using C#.
  • Azure OpenAI Embeddings QnA with Azure Search as a Vector Store (github.com) A simple web application for OpenAI-enabled document search. This repo uses Azure OpenAI Service to create embedding vectors from documents. To answer a user's question, it uses Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
  • ChatGPT Retrieval Plugin Azure Search Vector Database The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language. Azure AI Search is now supported as an official vector database.
  • Azure Search Vector Search Demo Web App Template A Vector Search Demo React Web App Template using Azure OpenAI for Text Search and Cognitive Services Florence Vision API for Image Search.
  • Azure Cognitive Search Comparison Tool

Documentation

azure-search-vector-samples's People

Contributors

aditmer, alexmanie, arv100kri, chuwik, dependabot[bot], eltociear, eric-urban, farzad528, fbaroni, fiddi, finnless, gmndrg, heidisteen, hyoshioka0128, konabuta, mattgotteiner, microsoft-github-operations[bot], microsoft-github-policy-service[bot], microsoftopensource, s0uravjain, satwikide, yahnoosh


azure-search-vector-samples's Issues

Python example does not work due to package issue?

I installed the package with "pip install azure-search-documents --pre".
However, the search function does not accept the parameters used in the example.

    results = client.search(
        search_text=None,
        vector=generate_embeddings(text),
        top_k=3,
        vector_fields="gpt_embedding",
        select=["title", "content", "category"],
    )
    # results = client.search(query_type="full", search_text="attestation", top="3", search_fields=["filename", "content"])
    print("\n result based on gpt_embedding: \n")

Error: request() got an unexpected keyword argument 'vector'.

Has anyone gotten the Python version of the example to work?
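
A mismatch like this can come down to which build of the package is actually installed, since parameter names may differ between preview releases. The sketch below uses only the Python standard library to report the installed version; the helper name installed_version is hypothetical, and the sketch does not tell you which version the sample expects.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Compare the printed value with whatever version the sample's readme names.
    print("azure-search-documents:", installed_version("azure-search-documents"))
```

Running this in the same environment as the notebook shows whether "pip install ... --pre" resolved to the build the example was written for.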

Need package version to use in .NET

I am a .NET guy and wanted to know which Azure.Search.Documents package version to install. The 11.5.0-beta.2 package doesn't seem to have the classes/objects required for the vector implementation. My company is currently weighing various vector database options, so I'm eager to rapidly prototype a solution using Azure's Cognitive Search vector persistence feature. Can you provide me with the appropriate version of the Azure.Search.Documents package, or suggest any other potential solutions?

An 11.5.* version of the sdk does not appear to exist.

I'm attempting to follow the guide here: https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/readme.md. However, there doesn't appear to be an 11.5.0-alpha.20230522.2 version available in the dev package index. The latest version appears to be 11.4.0b3.

pip install azure-search-documents==11.5.0-alpha.20230522.2 --index-url=https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple
Looking in indexes: https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple
ERROR: Could not find a version that satisfies the requirement azure-search-documents==11.5.0-alpha.20230522.2 (from versions: 1.0.0b2.dev20200319001, 1.0.0b2.dev20200320001, 1.0.0b2.dev20200321001, 1.0.0b2.dev20200322001, 1.0.0b2.dev20200323001, 1.0.0b2.dev20200324001, 1.0.0b2.dev20200325001, 1.0.0b2.dev20200327001, 1.0.0b2.dev20200328001, 1.0.0b2.dev20200329001, 1.0.0b2.dev20200330001, 1.0.0b2.dev20200331001, 1.0.0b2.dev20200401001, 1.0.0b2.dev20200402001, 1.0.0b2.dev20200403001, 1.0.0b2.dev20200404001, 1.0.0b2.dev20200405001, 1.0.0b2.dev20200406001, 1.0.0b2.dev20200407001, 1.0.0b3.dev20200408001, 1.0.0b3.dev20200409001, 1.0.0b3.dev20200410001, 1.0.0b3.dev20200411001, 1.0.0b3.dev20200412001, 1.0.0b3.dev20200413001, 1.0.0b3.dev20200414001, 1.0.0b3.dev20200415001, 1.0.0b3.dev20200416001, 1.0.0b3.dev20200417001, 1.0.0b3.dev20200418001, 1.0.0b3.dev20200419001, 1.0.0b3.dev20200420001, 1.0.0b3.dev20200421001, 1.0.0b3.dev20200422001, 1.0.0b3.dev20200423001, 1.0.0b3.dev20200424001, 1.0.0b3.dev20200427001, 1.0.0b3.dev20200428001, 1.0.0b3.dev20200429001, 1.0.0b3.dev20200430001, 1.0.0b3.dev20200501001, 1.0.0b3.dev20200507001, 1.0.0b3.dev20200508001, 1.0.0b3.dev20200511001, 1.0.0b3.dev20200512001, 1.0.0b3.dev20200513001, 1.0.0b3.dev20200514001, 1.0.0b3.dev20200515001, 1.0.0b3.dev20200518001, 1.0.0b3.dev20200519001, 1.0.0b3.dev20200520001, 1.0.0b3.dev20200521001, 1.0.0b4.dev20200522001, 1.0.0b4.dev20200525001, 1.0.0b4.dev20200526001, 1.0.0b4.dev20200527001, 1.0.0b4.dev20200528001, 1.0.0b4.dev20200529001, 1.0.0b4.dev20200601001, 1.0.0b4.dev20200602001, 1.0.0b4.dev20200603001, 1.0.0b4.dev20200604001, 1.0.0b4.dev20200605001, 1.0.0b4.dev20200608001, 1.0.0b4.dev20200609001, 1.0.0b4.dev20200615001, 1.0.0b4.dev20200616001, 1.0.0b4.dev20200617001, 1.0.0b4.dev20200618001, 1.0.0b4.dev20200619001, 1.0.0b4.dev20200624001, 1.0.0b4.dev20200625001, 1.0.0b4, 11.0.0.dev20200626001, 11.0.0.dev20200630001, 11.0.0.dev20200701001, 11.0.0.dev20200702001, 11.0.0.dev20200703001, 
11.0.0.dev20200707001, 11.0.0, 11.0.1.dev20200708001, 11.0.1.dev20200709001, 11.0.1.dev20200710001, 11.0.1.dev20200713001, 11.0.1.dev20200714001, 11.0.1.dev20200717001, 11.0.1.dev20200720001, 11.0.1.dev20200721001, 11.0.1.dev20200722001, 11.0.1.dev20200723001, 11.0.1.dev20200724001, 11.0.1.dev20200727001, 11.0.1.dev20200728001, 11.0.1.dev20200729001, 11.0.1.dev20200730001, 11.0.1.dev20200731001, 11.0.1.dev20200803001, 11.0.1.dev20200804001, 11.0.1.dev20200805001, 11.0.1.dev20200806001, 11.0.1.dev20200807001, 11.1.0a20200904001, 11.1.0a20200907001, 11.1.0a20200909001, 11.1.0a20200910001, 11.1.0a20200911001, 11.1.0a20200914001, 11.1.0a20200915001, 11.1.0a20200916001, 11.1.0a20200917001, 11.1.0a20200921001, 11.1.0a20200922001, 11.1.0a20200923001, 11.1.0a20200924001, 11.1.0a20200925001, 11.1.0a20200928001, 11.1.0a20200929001, 11.1.0a20200930002, 11.1.0a20201001001, 11.1.0a20201002001, 11.1.0a20201006001, 11.1.0a20201007001, 11.1.0a20201008001, 11.1.0a20201009001, 11.1.0a20201012001, 11.1.0a20201013001, 11.1.0a20201014001, 11.1.0a20201016001, 11.1.0a20201019001, 11.1.0a20201020001, 11.1.0a20201021001, 11.1.0a20201022001, 11.1.0a20201023001, 11.1.0a20201026001, 11.1.0a20201027001, 11.1.0a20201028001, 11.1.0a20201029001, 11.1.0a20201030001, 11.1.0a20201104001, 11.1.0a20201105001, 11.1.0a20201106001, 11.1.0a20201109001, 11.1.0a20201110001, 11.1.0a20201111001, 11.1.0a20201112001, 11.1.0a20201113001, 11.1.0a20201116001, 11.1.0a20201117001, 11.1.0a20201118001, 11.1.0a20201120001, 11.1.0a20201123001, 11.1.0a20201124001, 11.1.0a20201125001, 11.1.0a20201126001, 11.1.0a20201201001, 11.1.0a20201202001, 11.1.0a20201203001, 11.1.0a20201204001, 11.1.0a20201207001, 11.1.0a20201208001, 11.1.0a20201209001, 11.1.0a20201210001, 11.1.0a20201211001, 11.1.0a20201214001, 11.1.0a20201215001, 11.1.0a20201216001, 11.1.0a20201217001, 11.1.0a20201218001, 11.1.0a20201221001, 11.1.0a20201222001, 11.1.0a20201223001, 11.1.0a20201224001, 11.1.0a20201230001, 11.1.0a20201231001, 11.1.0a20210101001, 
11.1.0a20210104001, 11.1.0a20210105001, 11.1.0a20210106001, 11.1.0a20210107001, 11.1.0a20210108001, 11.1.0a20210111001, 11.1.0a20210118001, 11.1.0a20210119001, 11.1.0a20210120001, 11.1.0a20210121001, 11.1.0a20210122001, 11.1.0a20210125001, 11.1.0a20210126001, 11.1.0a20210127001, 11.1.0a20210128001, 11.1.0a20210129001, 11.1.0a20210201001, 11.1.0a20210202001, 11.1.0a20210203001, 11.1.0a20210204001, 11.1.0a20210205001, 11.1.0a20210208001, 11.1.0a20210209001, 11.1.0a20210210001, 11.1.0b1.dev20200810001, 11.1.0b1.dev20200811001, 11.1.0b1, 11.1.0b2.dev20200812001, 11.1.0b2.dev20200813001, 11.1.0b2.dev20200814001, 11.1.0b2.dev20200818001, 11.1.0b2.dev20200819001, 11.1.0b2.dev20200820001, 11.1.0b2.dev20200821001, 11.1.0b2.dev20200824001, 11.1.0b2.dev20200825001, 11.1.0b2.dev20200826001, 11.1.0b2.dev20200827001, 11.1.0b2.dev20200828001, 11.1.0b2.dev20200831001, 11.1.0b2.dev20200901001, 11.1.0b2.dev20200902001, 11.1.0b2.dev20200903001, 11.1.0b2, 11.1.0b3, 11.1.0b4, 11.1.0, 11.1.1a20210211001, 11.1.1a20210212001, 11.1.1a20210215001, 11.1.1a20210216001, 11.1.1a20210217001, 11.1.1a20210218001, 11.1.1a20210219001, 11.1.1a20210222001, 11.1.1a20210223001, 11.1.1a20210224001, 11.1.1a20210301001, 11.1.1a20210302001, 11.1.1a20210303001, 11.1.1a20210304001, 11.1.1a20210305001, 11.1.1a20210308001, 11.1.1a20210309001, 11.1.1a20210310001, 11.1.1a20210311001, 11.1.1a20210312001, 11.1.1a20210315001, 11.2.0a20210331001, 11.2.0a20210401001, 11.2.0a20210402001, 11.2.0a20210405001, 11.2.0a20210406001, 11.2.0a20210407001, 11.2.0a20210408001, 11.2.0a20210409001, 11.2.0a20210412001, 11.2.0a20210426001, 11.2.0a20210427001, 11.2.0a20210428001, 11.2.0a20210429001, 11.2.0a20210430001, 11.2.0a20210503001, 11.2.0a20210504001, 11.2.0a20210505001, 11.2.0a20210506001, 11.2.0a20210507001, 11.2.0a20210510001, 11.2.0a20210511001, 11.2.0a20210512001, 11.2.0a20210513001, 11.2.0a20210514001, 11.2.0a20210517001, 11.2.0a20210518001, 11.2.0a20210519001, 11.2.0a20210520001, 11.2.0a20210521001, 11.2.0a20210524001, 
11.2.0a20210525001, 11.2.0a20210526001, 11.2.0a20210527001, 11.2.0a20210528001, 11.2.0a20210531001, 11.2.0a20210601001, 11.2.0a20210602001, 11.2.0a20210603001, 11.2.0a20210604001, 11.2.0a20210607001, 11.2.0a20210608001, 11.2.0b1, 11.2.0b2, 11.2.0b3, 11.2.0, 11.2.1a20210609001, 11.2.1, 11.2.2, 11.3.0a20210610001, 11.3.0a20210611001, 11.3.0a20210614001, 11.3.0a20210615001, 11.3.0a20210616001, 11.3.0a20210618001, 11.3.0a20210621001, 11.3.0a20210622001, 11.3.0a20210623001, 11.3.0a20210624001, 11.3.0a20210625001, 11.3.0a20210628001, 11.3.0a20210629001, 11.3.0a20210630001, 11.3.0a20210701001, 11.3.0a20210702001, 11.3.0a20210705001, 11.3.0a20210706001, 11.3.0a20210707001, 11.3.0a20210708001, 11.3.0a20210709001, 11.3.0a20210712001, 11.3.0a20210713001, 11.3.0a20210714001, 11.3.0a20210715001, 11.3.0a20210716001, 11.3.0a20210719001, 11.3.0a20210720001, 11.3.0a20210721001, 11.3.0a20210722001, 11.3.0a20210723001, 11.3.0a20210726001, 11.3.0a20210727001, 11.3.0a20210728001, 11.3.0a20210729001, 11.3.0a20210730001, 11.3.0a20210802001, 11.3.0a20210803001, 11.3.0a20210804001, 11.3.0a20210805001, 11.3.0a20210806001, 11.3.0a20210809001, 11.3.0a20210810001, 11.3.0a20210811001, 11.3.0a20210812001, 11.3.0a20210813001, 11.3.0a20210816001, 11.3.0a20210817001, 11.3.0a20210818001, 11.3.0a20210819001, 11.3.0a20210820001, 11.3.0a20210823001, 11.3.0a20210824001, 11.3.0a20210825001, 11.3.0a20210826001, 11.3.0a20210827001, 11.3.0a20210830001, 11.3.0a20210831001, 11.3.0a20210901001, 11.3.0a20210902001, 11.3.0a20210903001, 11.3.0a20210906001, 11.3.0a20210908001, 11.3.0a20210909001, 11.3.0a20210910001, 11.3.0a20210913001, 11.3.0a20210914001, 11.3.0a20210915001, 11.3.0a20210916001, 11.3.0a20210917001, 11.3.0a20210920001, 11.3.0a20210921001, 11.3.0a20210922001, 11.3.0a20210923001, 11.3.0a20210924001, 11.3.0a20210927001, 11.3.0a20210928001, 11.3.0a20210929001, 11.3.0a20210930001, 11.3.0a20211001001, 11.3.0a20211004001, 11.3.0a20211005001, 11.3.0a20211006001, 11.3.0a20211008001, 11.3.0a20211011001, 
11.3.0a20211012001, 11.3.0a20211013001, 11.3.0a20211014001, 11.3.0a20211015001, 11.3.0a20211018001, 11.3.0a20211019001, 11.3.0a20211020001, 11.3.0a20211021001, 11.3.0a20211025001, 11.3.0a20211026001, 11.3.0a20211027001, 11.3.0a20211028001, 11.3.0a20211029001, 11.3.0a20211101001, 11.3.0a20211102001, 11.3.0a20211103001, 11.3.0a20211104001, 11.3.0a20211105001, 11.3.0a20211108001, 11.3.0a20211109001, 11.3.0a20211110001, 11.3.0a20211111001, 11.3.0a20211112001, 11.3.0a20211115001, 11.3.0a20211116001, 11.3.0a20211117001, 11.3.0a20211118001, 11.3.0a20211119001, 11.3.0a20211122001, 11.3.0a20211123001, 11.3.0a20211124001, 11.3.0a20211125001, 11.3.0a20211126001, 11.3.0a20211129001, 11.3.0a20211130001, 11.3.0a20211201001, 11.3.0a20211202001, 11.3.0a20211203001, 11.3.0a20211207001, 11.3.0a20211208001, 11.3.0a20211209001, 11.3.0a20211210001, 11.3.0a20211213001, 11.3.0a20211214001, 11.3.0a20211215001, 11.3.0a20211216001, 11.3.0a20211217001, 11.3.0a20211220001, 11.3.0a20211221001, 11.3.0a20211222001, 11.3.0a20211223001, 11.3.0a20211228001, 11.3.0a20211230001, 11.3.0a20220104001, 11.3.0a20220105001, 11.3.0a20220106001, 11.3.0a20220107001, 11.3.0a20220110001, 11.3.0a20220111002, 11.3.0a20220112001, 11.3.0a20220113001, 11.3.0a20220114002, 11.3.0a20220117001, 11.3.0a20220118001, 11.3.0a20220119001, 11.3.0a20220120001, 11.3.0a20220121001, 11.3.0a20220124001, 11.3.0a20220125001, 11.3.0a20220126001, 11.3.0a20220127001, 11.3.0a20220128001, 11.3.0a20220131001, 11.3.0a20220201001, 11.3.0a20220202001, 11.3.0a20220203001, 11.3.0a20220204001, 11.3.0a20220207001, 11.3.0a20220208001, 11.3.0a20220209001, 11.3.0a20220210001, 11.3.0a20220211001, 11.3.0a20220214001, 11.3.0a20220215001, 11.3.0a20220216001, 11.3.0a20220217001, 11.3.0a20220221001, 11.3.0a20220222001, 11.3.0a20220223001, 11.3.0a20220224001, 11.3.0a20220225001, 11.3.0a20220228001, 11.3.0a20220301001, 11.3.0a20220302001, 11.3.0a20220303001, 11.3.0a20220304001, 11.3.0a20220307001, 11.3.0a20220308001, 11.3.0a20220309001, 11.3.0a20220310001, 
11.3.0a20220311001, 11.3.0a20220314001, 11.3.0a20220315001, 11.3.0a20220316001, 11.3.0a20220317001, 11.3.0a20220318001, 11.3.0a20220321001, 11.3.0a20220322001, 11.3.0a20220323001, 11.3.0a20220324001, 11.3.0a20220325001, 11.3.0a20220328001, 11.3.0a20220329001, 11.3.0a20220330001, 11.3.0a20220331001, 11.3.0a20220401001, 11.3.0a20220404001, 11.3.0a20220405001, 11.3.0a20220406001, 11.3.0a20220407001, 11.3.0a20220408001, 11.3.0a20220411001, 11.3.0a20220412001, 11.3.0a20220413001, 11.3.0a20220414001, 11.3.0a20220415001, 11.3.0a20220418001, 11.3.0a20220419001, 11.3.0a20220420001, 11.3.0a20220421001, 11.3.0a20220422001, 11.3.0a20220425001, 11.3.0a20220426001, 11.3.0a20220427001, 11.3.0a20220428001, 11.3.0a20220429001, 11.3.0a20220502001, 11.3.0a20220503001, 11.3.0a20220504001, 11.3.0a20220505001, 11.3.0a20220506001, 11.3.0a20220509001, 11.3.0a20220510001, 11.3.0a20220511001, 11.3.0a20220512001, 11.3.0a20220513001, 11.3.0a20220516001, 11.3.0a20220517001, 11.3.0a20220518001, 11.3.0a20220519001, 11.3.0a20220520001, 11.3.0a20220523001, 11.3.0a20220524001, 11.3.0a20220525001, 11.3.0a20220526001, 11.3.0a20220527001, 11.3.0a20220530001, 11.3.0a20220531001, 11.3.0a20220601001, 11.3.0a20220602001, 11.3.0a20220603001, 11.3.0a20220606001, 11.3.0a20220607001, 11.3.0a20220608001, 11.3.0a20220609001, 11.3.0a20220610001, 11.3.0a20220613001, 11.3.0a20220614001, 11.3.0a20220615001, 11.3.0a20220616001, 11.3.0a20220617001, 11.3.0a20220620001, 11.3.0a20220621001, 11.3.0a20220622001, 11.3.0a20220623001, 11.3.0a20220624001, 11.3.0a20220627001, 11.3.0a20220628001, 11.3.0a20220629001, 11.3.0a20220630001, 11.3.0a20220701001, 11.3.0a20220704001, 11.3.0a20220706001, 11.3.0a20220707001, 11.3.0a20220708001, 11.3.0a20220711001, 11.3.0a20220712001, 11.3.0a20220713001, 11.3.0a20220714001, 11.3.0a20220715001, 11.3.0a20220718001, 11.3.0a20220719001, 11.3.0a20220720001, 11.3.0a20220721001, 11.3.0a20220722001, 11.3.0a20220725001, 11.3.0a20220726001, 11.3.0a20220727001, 11.3.0a20220728001, 11.3.0a20220729001, 
11.3.0a20220801001, 11.3.0a20220802003, 11.3.0a20220803001, 11.3.0a20220804001, 11.3.0a20220805001, 11.3.0a20220808001, 11.3.0a20220809001, 11.3.0a20220810001, 11.3.0a20220811001, 11.3.0a20220812001, 11.3.0a20220815001, 11.3.0a20220816001, 11.3.0a20220817001, 11.3.0a20220818001, 11.3.0a20220819001, 11.3.0a20220822001, 11.3.0a20220823001, 11.3.0a20220824001, 11.3.0a20220825001, 11.3.0a20220826001, 11.3.0a20220829001, 11.3.0a20220830001, 11.3.0a20220831001, 11.3.0a20220901001, 11.3.0a20220902001, 11.3.0a20220905001, 11.3.0a20220906001, 11.3.0a20220907001, 11.3.0a20220908001, 11.3.0b1, 11.3.0b2, 11.3.0b3, 11.3.0b4, 11.3.0b5, 11.3.0b6, 11.3.0b7, 11.3.0b8, 11.3.0, 11.4.0a20220909001, 11.4.0a20220912001, 11.4.0a20220913001, 11.4.0a20220914001, 11.4.0a20220915001, 11.4.0a20220916001, 11.4.0a20220919001, 11.4.0a20220920001, 11.4.0a20220921001, 11.4.0a20220922001, 11.4.0a20220923001, 11.4.0a20220926001, 11.4.0a20220927001, 11.4.0a20220928001, 11.4.0a20220929001, 11.4.0a20220930001, 11.4.0a20221003001, 11.4.0a20221004001, 11.4.0a20221005001, 11.4.0a20221006001, 11.4.0a20221007001, 11.4.0a20221010001, 11.4.0a20221011001, 11.4.0a20221012001, 11.4.0a20221013001, 11.4.0a20221014001, 11.4.0a20221017001, 11.4.0a20221018001, 11.4.0a20221019001, 11.4.0a20221020001, 11.4.0a20221021001, 11.4.0a20221024001, 11.4.0a20221025001, 11.4.0a20221026001, 11.4.0a20221027001, 11.4.0a20221028001, 11.4.0a20221031001, 11.4.0a20221101001, 11.4.0a20221102001, 11.4.0a20221103001, 11.4.0a20221104001, 11.4.0a20221107001, 11.4.0a20221108001, 11.4.0a20221109001, 11.4.0a20221110001, 11.4.0a20221111001, 11.4.0a20221114001, 11.4.0a20221115001, 11.4.0a20221116001, 11.4.0a20221117001, 11.4.0a20221118001, 11.4.0a20221121001, 11.4.0a20221122001, 11.4.0a20221123001, 11.4.0a20221124001, 11.4.0a20221125001, 11.4.0a20221128001, 11.4.0a20221129001, 11.4.0a20221130001, 11.4.0a20221201001, 11.4.0a20221202001, 11.4.0a20221205001, 11.4.0a20221206001, 11.4.0a20221207001, 11.4.0a20221208001, 11.4.0a20221209001, 
11.4.0a20221212001, 11.4.0a20221213001, 11.4.0a20221214001, 11.4.0a20221215001, 11.4.0a20221216001, 11.4.0a20221219001, 11.4.0a20221220001, 11.4.0a20221221001, 11.4.0a20221222001, 11.4.0a20221223001, 11.4.0a20221226001, 11.4.0a20221227001, 11.4.0a20221228001, 11.4.0a20221229001, 11.4.0a20221230001, 11.4.0a20230102001, 11.4.0a20230103001, 11.4.0a20230104001, 11.4.0a20230105001, 11.4.0a20230106001, 11.4.0a20230109001, 11.4.0a20230110001, 11.4.0a20230111001, 11.4.0a20230112001, 11.4.0a20230113001, 11.4.0a20230116001, 11.4.0a20230117001, 11.4.0a20230118001, 11.4.0a20230119001, 11.4.0a20230120001, 11.4.0a20230123001, 11.4.0a20230124001, 11.4.0a20230125001, 11.4.0a20230126001, 11.4.0a20230127001, 11.4.0a20230130001, 11.4.0a20230131001, 11.4.0a20230201001, 11.4.0a20230202001, 11.4.0a20230203001, 11.4.0a20230206001, 11.4.0a20230207001, 11.4.0a20230208001, 11.4.0a20230209001, 11.4.0a20230210001, 11.4.0a20230213001, 11.4.0a20230214001, 11.4.0a20230215001, 11.4.0a20230216001, 11.4.0a20230217001, 11.4.0a20230220001, 11.4.0a20230221001, 11.4.0a20230222001, 11.4.0a20230223001, 11.4.0a20230224001, 11.4.0a20230227001, 11.4.0a20230228001, 11.4.0a20230301001, 11.4.0a20230302001, 11.4.0a20230303001, 11.4.0a20230306001, 11.4.0a20230307001, 11.4.0a20230308001, 11.4.0a20230309001, 11.4.0a20230310001, 11.4.0a20230313001, 11.4.0a20230314001, 11.4.0a20230315001, 11.4.0a20230316001, 11.4.0a20230317001, 11.4.0a20230320001, 11.4.0a20230321001, 11.4.0a20230322001, 11.4.0a20230323001, 11.4.0a20230324001, 11.4.0a20230328001, 11.4.0a20230329001, 11.4.0a20230330001, 11.4.0a20230331001, 11.4.0a20230403001, 11.4.0a20230404001, 11.4.0a20230405001, 11.4.0a20230406001, 11.4.0a20230407001, 11.4.0a20230410001, 11.4.0a20230411001, 11.4.0a20230412001, 11.4.0a20230413001, 11.4.0a20230414001, 11.4.0a20230417001, 11.4.0a20230418001, 11.4.0a20230419001, 11.4.0a20230420001, 11.4.0a20230421001, 11.4.0a20230424001, 11.4.0a20230425001, 11.4.0a20230426001, 11.4.0a20230427001, 11.4.0a20230428001, 11.4.0a20230502001, 
11.4.0a20230503001, 11.4.0a20230504001, 11.4.0a20230505001, 11.4.0a20230508001, 11.4.0a20230508002, 11.4.0a20230508003, 11.4.0a20230509001, 11.4.0a20230509004, 11.4.0a20230510001, 11.4.0a20230511001, 11.4.0a20230512001, 11.4.0a20230515001, 11.4.0a20230516001, 11.4.0a20230517001, 11.4.0a20230518001, 11.4.0a20230519001, 11.4.0a20230522001, 11.4.0a20230523001, 11.4.0a20230524001, 11.4.0a20230525001, 11.4.0a20230526001, 11.4.0a20230529001, 11.4.0a20230530001, 11.4.0a20230531001, 11.4.0a20230601001, 11.4.0b1, 11.4.0b2, 11.4.0b3)
ERROR: No matching distribution found for azure-search-documents==11.5.0-alpha.20230522.2
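
A long "from versions" list like the one above is easier to check programmatically than by eye. The following is a minimal sketch, not part of the guide: the function names are hypothetical, and the sample string is a shortened stand-in for the real pip output.

```python
import re
from typing import List


def versions_from_pip_error(text: str) -> List[str]:
    """Extract the version list from pip's '(from versions: ...)' message."""
    match = re.search(r"\(from versions:\s*(.*?)\)", text, flags=re.DOTALL)
    if not match:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def with_prefix(versions: List[str], prefix: str) -> List[str]:
    """Keep only the versions whose string starts with the given prefix."""
    return [v for v in versions if v.startswith(prefix)]


# Shortened, illustrative sample of the error text (not the full log).
sample = (
    "ERROR: Could not find a version that satisfies the requirement "
    "azure-search-documents==11.5.0-alpha.20230522.2 "
    "(from versions: 11.3.0, 11.4.0b1, 11.4.0b2, 11.4.0b3)"
)

versions = versions_from_pip_error(sample)
print(with_prefix(versions, "11.5."))  # empty list: no 11.5 builds in this sample
print(versions[-1])                    # last entry pip listed
```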

Similarity search title and vector

The current Python demo creates embeddings for both the title and content, but only uses the content vector for searching.

https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/code/azure-search-vector-python-sample.ipynb

Looking at the code, the plural name of the fields parameter suggests that it should be possible to search multiple vector fields. This does not seem to be the case, however, and an error is returned if you pass a comma-separated list of values. Is this something that could be supported in the future, or does a solution exist today?
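
One possible client-side workaround, sketched here under stated assumptions, is to run one vector query per field and merge the ranked document IDs yourself. The sketch uses reciprocal rank fusion, a general ranking technique swapped in for illustration and not something the demo notebook does. The function name, the constant k=60, and the document IDs are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the output is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative results from two separate queries, one per vector field.
title_hits = ["doc2", "doc1", "doc3"]
content_hits = ["doc1", "doc4", "doc2"]

fused = reciprocal_rank_fusion([title_hits, content_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while documents found by only one field still appear further down.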

KNearestNeighborsCount - odd ranking

I'm executing a vector search against Azure Cognitive Search, using the (currently) latest version of the Azure.Search.Documents NuGet package (11.5.0-beta4).

If I execute a search with KNearestNeighborsCount set to 3, the closest match (in my opinion) is returned 3rd in the list.

However, if I do the same search with KNearestNeighborsCount set to 10 (I want to see more search results), the match I mention above is returned 8th in the list.

This doesn't make sense to me. I would have thought the match would always appear in the same position, for any value of KNearestNeighborsCount >= 3.

Q. How do I return 10 results, but have my closest match appear at position 3?

var vector = new SearchQueryVector
{
    KNearestNeighborsCount = numNearestNeighbours,
    Fields = { nameof(DocumentModel.vector1) },
    Value = embeddings
};

var searchOptions = new SearchOptions
{
    Vectors = { vector },
    Size = numNearestNeighbours,
    Select = { idFieldName },
};

SearchResults<SearchDocument> response = searchClient.Search<SearchDocument>(null, searchOptions);

var result = new List<string>();

foreach (SearchResult<SearchDocument> searchResult in response.GetResults())
{
    var documentId = $"{searchResult.Document[idFieldName]}";
    result.Add(documentId);
}
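
For comparison, the expectation described above does hold for exhaustive (brute-force) scoring: when every document is scored against the query, the top 3 is always a prefix of the top 10. The Python sketch below shows this with cosine similarity on made-up two-dimensional vectors. The function names and data are hypothetical. Approximate algorithms such as HNSW do not score every document, so they need not share this property, but the sketch does not establish that this is the cause of the ranking reported here.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def exact_top_k(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Score every document and return the IDs of the k most similar ones."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in scored[:k]]


# Made-up vectors for illustration only.
docs = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.5, 0.5],
    "d": [0.0, 1.0],
    "e": [-1.0, 0.0],
}
query = [1.0, 0.0]

top3 = exact_top_k(query, docs, 3)
top5 = exact_top_k(query, docs, 5)
print(top3, top5)
```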

Custom skill endpoint returns a 404 error, but the same endpoint works in a local test (without the custom skill)

Hi, I followed demo-python/code/azure-search-vector-image-python-sample.ipynb to create a custom skill:

        customSkill_endpoint = "https://{myresource}.cognitiveservices.azure.com/computervision/retrieval:vectorizeImage"
        skillset_name = f"{index_name}-skillset"  
        skill_uri = customSkill_endpoint
          
        skill = WebApiSkill(  
            uri=skill_uri,  
            inputs=[  
                InputFieldMappingEntry(name="imageUrl", source="/document/metadata_storage_path"),  
                InputFieldMappingEntry(name="recordId", source="/document/metadata_storage_name")  
            ],  
            outputs=[OutputFieldMappingEntry(name="vector", target_name="imageVector")],  
        )  
          
        skillset = SearchIndexerSkillset(  
            name=skillset_name,  
            description="Skillset to extract image vector",  
            skills=[skill],  
        )  
          
        client = SearchIndexerClient(service_endpoint, AzureKeyCredential(key))  
        client.create_or_update_skillset(skillset)  
        print(f' {skillset.name} created')  
        
        index_client = SearchIndexClient(
            endpoint=service_endpoint, credential=credential)
        fields = [
            SimpleField(name="id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True),  
            SimpleField(name="imageUrl", type=SearchFieldDataType.String, retrievable=True),  
            SearchableField(name="title", type=SearchFieldDataType.String, searchable=True, retrievable=True),  
            SearchField(  
                name="imageVector",  
                type=SearchFieldDataType.Collection(SearchFieldDataType.Single),  
                searchable=True,  
                vector_search_dimensions=1024,  
                vector_search_configuration="my-vector-config",  
            ),  
        ]
        
        vector_search = VectorSearch(
            algorithm_configurations=[
                VectorSearchAlgorithmConfiguration(
                    name="my-vector-config",
                    kind="hnsw",
                    hnsw_parameters={
                        "m": 4,
                        "efConstruction": 400,
                        "efSearch": 1000,
                        "metric": "cosine"
                    }
                )
            ]
        )
        index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search,)
        result = index_client.create_or_update_index(index)
        print(f' {result.name} created')

Then I went to the Azure search service and checked my indexer, which gave me this error:
Web Api response status: 'NotFound', Web Api response details: '{"error":{"code":"404","message": "Resource not found"}}'

But when I test the endpoint locally with Python code, it works fine:

        def get_image_embeddings(imageUrl):  
            cogSvcsEndpoint = "https://{my cognitive service resource}.cognitiveservices.azure.com/"
            cogSvcsApiKey = "{cogsvcsapikey}"
            url = f"{cogSvcsEndpoint}/computervision/retrieval:vectorizeImage"  
            params = {  
                "api-version": "2023-02-01-preview"  
            }  
            headers = {  
                "Content-Type": "application/json",  
                "Ocp-Apim-Subscription-Key": cogSvcsApiKey  
            }  
            data = {  
                "url": imageUrl  
            }  
            response = requests.post(url, params=params, headers=headers, json=data)  
            if response.status_code != 200:  
                print(f"Error: {response.status_code}, {response.text}")  
                response.raise_for_status()  
            embeddings = response.json()["vector"]  
            return embeddings  

What did I do wrong when adding the custom skill to Azure Cognitive Search?

BUG: KNearestNeighbors Not Working

Using the .NET example code, it appears that KNearestNeighbors is not returning more than a single result.

Example Code:

 internal static async Task SingleVectorSearch(SearchClient searchClient, OpenAIClient openAIClient, string query, int k = 3, int nearestNeighbors = 3)
        {
            // Generate the embedding for the query  
            var queryEmbeddings = await SemanticFunctions.GenerateEmbeddings(query, openAIClient);

            // Perform the vector similarity search  
            var searchOptions = new SearchOptions
            {
                Vectors = { new() { Value = queryEmbeddings.ToArray(), KNearestNeighborsCount = nearestNeighbors, Fields = { "contentVector" } } },
                Size = k,
                Select = { "id", "title", "content", "category", "url" },
            };

            SearchResults<SearchDocument> response = await searchClient.SearchAsync<SearchDocument>(null, searchOptions);

            int count = 0;
            await foreach (SearchResult<SearchDocument> result in response.GetResultsAsync())
            {
                count++;
                // for (int i = 0; i < nearestNeighbors; i++)
                // {
                    Console.WriteLine($"Id: {result.Document["id"]}");
                    Console.WriteLine($"Title: {result.Document["title"]}");
                    Console.WriteLine($"Score: {result.Score}\n");
                    Console.WriteLine($"Content: {result.Document["content"]}");
                    Console.WriteLine($"Category: {result.Document["category"]}\n\n");
                // }
                
            }
            Console.WriteLine($"Total Results: {count}");
        }

The index I am using is a bit atypical, and I wonder if that may be the cause. Here is an example of a result:
image
I have confirmed that this is indeed a single chunk of text in this particular record. I would expect KNearestNeighbors to add the "content" field values of the n records before and after into a single returned string for this record. Perhaps that is not the intent, or the search options are not set up correctly. Please advise, thank you!
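
For comparison, a k-nearest-neighbors query in general ranks separate documents by vector similarity and returns the top k of them; it does not stitch neighboring chunks into one result. The sketch below illustrates that behavior with brute-force cosine similarity in plain Python. The toy vectors, the document ids, and the `k_nearest` helper are illustrative assumptions, not part of the .NET sample or the service API, and whether this explains the single result above is not established here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def k_nearest(query, documents, k):
    """Return the ids of the k documents most similar to the query, best first."""
    scored = [(cosine_similarity(query, doc["vector"]), doc["id"]) for doc in documents]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy 2-dimensional "embeddings" (hypothetical data); real ones are much larger.
docs = [
    {"id": "chunk-1", "vector": [1.0, 0.0]},
    {"id": "chunk-2", "vector": [0.9, 0.1]},
    {"id": "chunk-3", "vector": [0.0, 1.0]},
    {"id": "chunk-4", "vector": [0.7, 0.7]},
]

top = k_nearest([1.0, 0.05], docs, k=3)
print(top)  # three separate documents, ordered by similarity
```

Each entry in `top` is its own record, so merging the content of adjacent chunks would have to happen in application code after the query returns.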

Unable to do Vector Search

I am able to insert vectors into my search index. However, when trying to do a vector search or hybrid search, I get this error:

SerializationError: (', DeserializationError: (", AttributeError: \'float\' object has no attribute \'lower\'", \'Unable to deserialize to object: type\', AttributeError("\'float\' object has no attribute \'lower\'"))', 'Unable to build a model: (", AttributeError: \'float\' object has no attribute \'lower\'", \'Unable to deserialize to object: type\', AttributeError("\'float\' object has no attribute \'lower\'"))', DeserializationError(", AttributeError: 'float' object has no attribute 'lower'", 'Unable to deserialize to object: type', AttributeError("'float' object has no attribute 'lower'")))

Roadmap for public preview and GA

Hello, can you please share a bit about the roadmap and approximate timelines planned for ACS vector search public preview and GA?

What is the ongoing vector search API going to look like?

I was using the private preview, and during search I had to create an azure.search.documents.models.Vector to pass into search_client.search.

Now I am on azure-search-documents==11.4.0b6, installed with pip install azure-search-documents --pre. According to the example, I can skip creating the Vector object and instead pass in a list for the vector, and there is a single topk argument instead of one in the search_client call and one in the Vector definition.

# Perform vector search  
results = search_client.search(  
    search_text=None,  
    vector=generate_embeddings(query, cogSvcsEndpoint, cogSvcsApiKey),
    topk=3,
    vector_fields="imageVector",
    select=["title", "imageUrl"]  
) 

https://github.com/Azure/cognitive-search-vector-pr/blob/main/demo-python/code/azure-search-vector-image-python-sample.ipynb

This all works, but it does not match what is shown in the public documentation https://learn.microsoft.com/en-us/rest/api/searchservice/preview-api/search-documents#request-body. There, the request body clearly has a vectors field, and it uses the old object definition.

     "vectors": [
      {
        "value": "a vector representation of the query",
        "k": an integer (number of nearest neighbors to return as top results),
        "fields": "a comma-delimited list of vector fields to use in the query"
      }

So, which way is it gonna be and what is the source of truth?
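
For reference, the request body quoted above can be assembled as shown here. This is only a sketch of the documented preview shape (`value`, `k`, `fields`); the helper name `build_vector_query_body` and the sample values are assumptions, and it does not settle which form later SDK or API versions use.

```python
import json


def build_vector_query_body(embedding, k, fields):
    """Build a search request body in the 'vectors' shape quoted from the docs.

    embedding: list of floats representing the query
    k: number of nearest neighbors to return as top results
    fields: comma-delimited string of vector field names
    """
    return {
        "vectors": [
            {"value": list(embedding), "k": k, "fields": fields}
        ]
    }


# Hypothetical 3-dimensional embedding, used only to keep the output short.
body = build_vector_query_body([0.1, 0.2, 0.3], k=3, fields="imageVector")
print(json.dumps(body, indent=2))
```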

VectorSearchAlgorithmConfiguration import error

I am trying to use this approach to work with embeddings in Azure, and I am using the LangChain framework as well.
While trying to create a vector store using AzureSearch(), I get this error:

ImportError: cannot import name 'VectorSearchAlgorithmConfiguration' from 'azure.search.documents.indexes.models' (/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/azure/search/documents/indexes/models/__init__.py)

Code for langchain:

embeddings: OpenAIEmbeddings = OpenAIEmbeddings(openai_api_key=os.getenv('OPENAI_API_KEY'),
                                                deployment=openai_embedding_deployment_name, 
                                                chunk_size=1)
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=os.environ["AZURE_COGNITIVE_SEARCH_ENDPOINT"],
    azure_search_key=os.environ["AZURE_COGNITIVE_SEARCH_API_KEY"],
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)

I am using 'azure-search-documents==11.4.0b8' version.
Please advise
Thank you

cannot connect to dev feed - here's another alternative to write pip.conf

Not an issue, just a note: some users have trouble editing or writing a pip.conf file, and the steps differ across platforms such as Windows and Linux.

Alternatively, use this command instead of creating and editing a .conf file:

python -m pip config set global.index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/

Embedded Images Treatment

I was able to set up and run the Azure OpenAI Text Embedding function. I also used the Ingestion sample to create indexes, indexers, custom skillsets, and data sources.

However my PDF documents might have embedded images, so I was wondering what happens in those cases?

My discoveries:

  1. Every page of each pdf is generated as an image and stored in the knowledge store.
    image

  2. When I check the content field on the index, I see many references to jpg files.

  3. I see a warning on all documents for images:

Can you please explain why this is happening and how to fix it?

image

  4. I also have warnings in some of the file chunks:

Can you please explain why this is happening and how to fix it?

image

Version conflict with Semantic Kernel required package

I'm having a problem getting the preview to work with Semantic Kernel. The SK CognitiveSearch connector requires Azure.Search.Documents version 11.5.0-beta.2, but to use the vector features in search I need the 11.5.0-alpha.20230522.2 version, and I can't get the two to work together. I would really like to use SemanticHybrid vector search as part of my RAG-augmented chat app. Please advise.

error: NU1605: Warning As Error: Detected package downgrade: Azure.Search.Documents from 11.5.0-beta.2 to 11.5.0-alpha.20230522.2. Reference the package directly from the project to select a different version.
error: CopilotChatWebApi -> Microsoft.SemanticKernel.Connectors.Memory.AzureCognitiveSearch 0.16.230615.1-preview -> Azure.Search.Documents (>= 11.5.0-beta.2)
error: CopilotChatWebApi -> Azure.Search.Documents (>= 11.5.0-alpha.20230522.2)

Python: azure-search-documents 11.5.x alpha not available

I tried configuring the index for search documents, but I don't see the 11.5 version. Here is what I see:

(cog-search) $ pip index versions azure-search-documents --pre
WARNING: pip index is currently an experimental command. It may be removed/changed in a future release without prior warning.
azure-search-documents (11.4.0b3)
Available versions: 11.4.0b3, 11.4.0b2, 11.4.0b1, 11.4.0a20230523001, 11.4.0a20230522001, 11.4.0a20230519001, 11.4.0a20230518001, 11.4.0a20230517001, 11.4.0a20230516001, 11.4.0a20230515001, 11.4.0a20230512001, 11.4.0a20230511001, 11.4.0a20230510001, 11.4.0a20230509004, 11.4.0a20230509001, 11.4.0a20230508003,

Action required: migrate or opt-out of migration to GitHub inside Microsoft

Migrate non-Open Source or non-External Collaboration repositories to GitHub inside Microsoft

In order to protect and secure Microsoft, private or internal repositories in GitHub for Open Source which are not related to open source projects or require collaboration with 3rd parties (customer, partners, etc.) must be migrated to GitHub inside Microsoft a.k.a GitHub Enterprise Cloud with Enterprise Managed User (GHEC EMU).

Action

✍️ Please RSVP to opt-in or opt-out of the migration to GitHub inside Microsoft.

❗Only users with admin permission in the repository are allowed to respond. Failure to provide a response will result in your repository getting automatically archived.🔒

Instructions

Reply with a comment on this issue containing one of the following optin or optout command options below.

✅ Opt-in to migrate

@gimsvc optin --date <target_migration_date in mm-dd-yyyy format>

Example: @gimsvc optin --date 03-15-2023

OR

❌ Opt-out of migration

@gimsvc optout --reason <staging|collaboration|delete|other>

Example: @gimsvc optout --reason staging

Options:

  • staging : This repository will ship as Open Source or go public
  • collaboration : Used for external or 3rd party collaboration with customers, partners, suppliers, etc.
  • delete : This repository will be deleted because it is no longer needed.
  • other : Other reasons not specified


Difference between ACS Semantic Search and Vector Search?

Still comparing the different search options available so wanted to fully understand the difference between ACS Semantic Search and the new ACS vector search preview.

Wanted to check the difference in functionality and also pricing if available.

If this has been outlined somewhere before like an FAQ, happy to go through it.

Thanks!

Unable to run dotnet sample

I'm trying to run the dotnet sample and getting the following error.
dotnet restore --interactive is not showing any prompt for authentication
Unable to find package OpenAI.Api. No packages exist with this id in source(s): azure-sdk-for-net

Feature Request: Collection of vector fields

For our use case, we are ingesting long documents and audio transcripts. The amount of text we're starting with exceeds the 8K limit of the Ada embedding model.

So we need to create multiple embeddings from each piece of content.

Since we can only store one vector per search document, I had to come up with a hacky solution to store 'n' search documents per content. (Basically one parent search document, and 'n' child search documents, n == # of chunks).

If the Cog Search index could support a collection of complex types, each of which included a vector, it would make this scenario much cleaner.

Currently, it errors with "Only a top-level field of the index can be a vector field."
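
The parent/child workaround described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the field names (`parentId`, `docType`, `chunk`, `contentVector`), the id scheme, the character-based splitting (real chunking would normally count tokens), and the `fake_embed` stand-in for an embedding call.

```python
def split_into_chunks(text, max_chars):
    """Split text into consecutive pieces of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def build_parent_and_children(content_id, title, text, embed, max_chars=1000):
    """Return one parent document followed by one child document per chunk.

    embed is any callable that maps a string to a list of floats.
    """
    chunks = split_into_chunks(text, max_chars)
    parent = {
        "id": content_id,
        "title": title,
        "docType": "parent",
        "chunkCount": len(chunks),
    }
    children = [
        {
            "id": f"{content_id}-{i}",
            "parentId": content_id,
            "docType": "chunk",
            "chunk": chunk,
            "contentVector": embed(chunk),  # one top-level vector per child
        }
        for i, chunk in enumerate(chunks)
    ]
    return [parent] + children


def fake_embed(text):
    """Hypothetical stand-in for a real embedding model call."""
    return [float(len(text)), 0.0]


docs = build_parent_and_children("doc1", "Transcript", "x" * 2500, fake_embed)
print(len(docs), [d["id"] for d in docs])
```

A vector query would then match child documents, and `parentId` lets the application group hits back under the original content.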

How to deploy the alpha libraries to an Azure web app

Hi, I have tried the code and it runs fine on my local Windows machine with Python; it connects to Cognitive Search and can do vector search. Now, when I deploy it as an Azure web app with Python and Flask, the deployment fails. I have added the following to my "requirements" file.

artifacts-keyring
azure-search-documents==11.4.0a20230509004
python-dotenv

I also created the pip.conf file as per the instructions here.

The error I get during deployment is below.

Collecting artifacts-keyring
[06:23:06+0000] Using cached artifacts_keyring-0.3.3-py2.py3-none-any.whl (6.6 MB)
ERROR: Cannot install azure-search-documents and azure-search-documents==11.4.0a20230509004 because these package versions have conflicting dependencies.
[06:23:06+0000]
[06:23:06+0000] The conflict is caused by:
[06:23:06+0000] The user requested azure-search-documents
[06:23:06+0000] The user requested azure-search-documents==11.4.0a20230509004
[06:23:06+0000]
[06:23:06+0000] To fix this you could try to:
[06:23:06+0000] 1. loosen the range of package versions you've specified
[06:23:06+0000] 2. remove package versions to allow pip attempt to solve the dependency conflict
[06:23:06+0000]
ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

I'd appreciate any tips to get it deployed to an Azure web app.

Alternative algorithm and metrics

The demo only shows the use of hnsw with cosine as the metric. What other algorithms and metrics can I use? I have tried "euclidean" and it works, but "dotproduct" is not supported. I couldn't find other available alternatives in the repo.
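
To make the three metric names mentioned above concrete, here is a small plain-Python sketch of what each one computes. It says nothing about which metrics the service accepts; the function names and sample vectors are assumptions for illustration only.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by both vector lengths; ignores magnitude."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot_product(v, v))
    return [x / length for x in v]


a, b = [3.0, 4.0], [4.0, 3.0]
print(dot_product(a, b))
print(euclidean_distance(a, b))
print(cosine_similarity(a, b))
# For unit-length vectors, dot product and cosine similarity coincide.
print(dot_product(normalize(a), normalize(b)))
```

One consequence is that when embeddings are already normalized to unit length, ranking by cosine similarity and ranking by dot product produce the same order.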

Cannot connect to Dev Feed

Hi, I'm trying to run demo-python/code/azure-search-vector-python-sample.ipynb. I configured pip.ini and installed the keyring and artifacts-keyring packages, but I get this error when running pip install azure-search-documents==11.4.0a20230509004:

Looking in indexes: https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/
Collecting azure-search-documents==11.4.0a20230509004
 [.....]
 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))':
ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='4zjvsblobprodcus390.vsblob.vsassets.io', port=443): Max retries exceeded with url: (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

In another attempt, with pip install --trusted-host=pypi.python.org --trusted-host=pypi.org --trusted-host=files.pythonhosted.org --upgrade --proxy=http://127.0.0.1:3128 azure-search-documents==11.4.0a20230509004, I got:

WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x0000027A2E5581F0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))': /azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/azure-search-documents/

Do I need some sort of authentication, or do you have any other leads? Thank you

Will nullable vectors be supported?

In our index document, we have sections for different entities which we fill in, and some support vector search, and some do not.

I've found that we still need to send an array of zeros for a vector field, even if we're not using it for that entity type. This wastes effort sending it over the wire and makes our search documents larger than they need to be.

It would be useful to allow these to be nullable, and handled on the server-side if Cog Search requires converting that null to an empty array or something.

Any plans for this, or is this just by-design for how Collection fields work?

API for creating Vector Indexers

Maybe this already exists in the API, but I was unable to find it. Once an index is created, ACS vector search works great. Having an indexer that can be run as an Azure Function on a schedule (hourly, daily, weekly, etc.) would allow us to keep the data up to date as files are added to or removed from an index's data source.

Is this possible today, and if not can this be added to the backlog?

Currently I have a self-rolled file chunker that can handle multiple file types (.txt, .pdf, .doc, .docx, etc.), and I'm trying to figure out how to run it on a schedule to keep the vectors up to date. I'm thinking an Azure Function tied to a Power Automate schedule might be the way to go, but I'm hoping there's already something in the works in this area. Thanks, and great job getting vectors into ACS!!

Advantages of vector search over normal keyword+semantic search

Hi,

As part of this project, did anyone compare how vector search performs against normal keyword search with the semantic option in Azure Cognitive Search?
I am especially interested in the following areas:

  1. Relevance of search results: better or worse?
  2. Performance or elapsed time: better or worse?
  3. Use cases where keyword+semantic is better and where vector is better.

Curious to know in case someone has already done it.
--regards.

Performance difference when indexing with vector field?

We have a Cognitive Search instance with 3 partitions, and I'm noticing an average time to process a batch of 100 documents of 20-40sec.

We have one vector field in the index, and I'm curious if there's any known performance difference uploading documents (via .NET SDK UploadDocumentsAsync) when there's a vector field vs not?

Obviously, it's pushing a lot more data with the vector array, but 30sec for 100 documents seems a lot slower than I'd expect.

Anything we can look at to optimize our indexing throughput? (We're already multi-threading, and in this case, I had 7 batches of 100 in flight at the same time.)

Vector field in complex types?

I noticed this comment in the README:

  • Vector fields with complex types or collections of complex types aren't supported.

Does this also mean you cannot have a complex type with a vector (float[]) field inside of it?

I'd like to have a collection of vectors, per document, so I can store multiple embeddings in the same search document. (Reason being, the search document maps to one text document, and after chunking, I'd get a collection of embeddings for the same text document.)

We've been using the FieldBuilder, and since that doesn't yet support vector field attributes, I was trying to add the field myself, but it needs to live inside of an existing ComplexField. ComplexField doesn't let you add a SearchField instance, unfortunately.

Custom skill to add vector fields cannot create indexes if data source is Table/Cosmos DB

Overview

I have developed a custom skill to add vector fields to documents.
I tried it when creating an index whose data source is an Azure Storage Account (Blob), and it added the vector fields successfully.
I also tried it for other indexes whose data sources are an Azure Storage Account (Table) or Cosmos DB, but it failed with the following error:
"Could not map output field 'contentVector' to search index. Check the 'outputFieldMappings' property of your indexer."
(operation is "Projection.SearchIndex.OutputFieldMapping.contentVector")
I have tried various definitions of outputFieldMappings and indexes, but none succeeded.
Are outputFieldMappings for vector fields supported for Azure Storage Account (Table) and Cosmos DB?

Details

Here are snippets from my settings.

Added a vector field and its config into Index JSON.

{
  "name": "contentVector",
  "type": "Collection(Edm.Single)",
  "searchable": true,
  "filterable": false,
  "retrievable": true,
  "sortable": false,
  "facetable": false,
  "key": false,
  "indexAnalyzer": null,
  "searchAnalyzer": null,
  "analyzer": null,
  "normalizer": null,
  "dimensions": 1536,
  "vectorSearchConfiguration": "vectorConfig",
  "synonymMaps": []
}
...
"vectorSearch": {
  "algorithmConfigurations": [
    {
      "name": "vectorConfig",
      "kind": "hnsw",
      "hnswParameters": {
        "metric": "cosine",
        "m": 4,
        "efConstruction": 400,
        "efSearch": 500
      }
    }
  ]
}
...

Added WebApiSkill definition into Skillset definition.

{
  "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
  "name": "embed",
  "description": null,
  "context": "/document",
  "uri": "https://...",
  "httpMethod": "POST",
  "timeout": "PT30S",
  "batchSize": 1,
  "degreeOfParallelism": null,
  "inputs": [
    {
      "name": "content",
      "source": "/document/content"
    }
  ],
  "outputs": [
    {
      "name": "vector",
      "targetName": "contentVector"
    }
  ],
  "httpHeaders": {}
}

The URI above is masked; it is actually the URI of an Azure Function endpoint that creates vectors and returns the following response:

values=[{'recordId': '0', 'data': {'vector': [-0.017162053, -0.014763341, 0.014551301, -0.019984847, -0.007573833, 0.01179477, -0.016366901, -0.020753495, -0.026955688, -0.019852322, 0.012907985, -0.00060589, -0.007123246, -0.018010216, 
...]}, 'errors': None, 'warnings': None}]

Added outputFieldMappings into Indexer JSON.

  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/vector",
      "targetFieldName": "contentVector"
    }
  ],

When I changed the custom skill and other definitions to use a string field instead of a vector field as the additional field, I succeeded in creating an index for the Storage Account Table. So I think the problem is specific to the vector field (either a Cognitive Search limitation or a mistake in my configuration).
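
One thing that can be checked mechanically in a setup like this is whether each `sourceFieldName` in `outputFieldMappings` points at a path that a skill actually writes. A skill output lands in the enrichment tree under the skill's `context`, using `targetName` when one is set and the output `name` otherwise. The sketch below applies that rule to the definitions quoted above; the helper names are hypothetical, and this is not confirmed as the cause of the error, particularly since the same skill reportedly worked with a Blob data source.

```python
def enrichment_paths(skills):
    """Collect the enrichment-tree paths that the given skills write to."""
    paths = set()
    for skill in skills:
        context = skill.get("context", "/document").rstrip("/")
        for output in skill.get("outputs", []):
            # targetName overrides the output name when it is present.
            target = output.get("targetName") or output["name"]
            paths.add(f"{context}/{target}")
    return paths


def unresolved_mappings(skills, output_field_mappings):
    """Return sourceFieldName values that no skill output produces."""
    available = enrichment_paths(skills)
    return [
        m["sourceFieldName"]
        for m in output_field_mappings
        if m["sourceFieldName"] not in available
    ]


# Trimmed copies of the skill and mapping shown in this issue.
skills = [
    {
        "context": "/document",
        "outputs": [{"name": "vector", "targetName": "contentVector"}],
    }
]
mappings = [
    {"sourceFieldName": "/document/vector", "targetFieldName": "contentVector"}
]

print(enrichment_paths(skills))
print(unresolved_mappings(skills, mappings))
```

With these inputs the only path written is `/document/contentVector`, so a mapping whose source is `/document/vector` does not resolve; trying `/document/contentVector` as the source would be a cheap experiment.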

Are there any plans of a serverless tier for Azure Cognitive Search vector storage?

Currently, Azure Cognitive Search is looking prohibitively expensive for storing vector data. The basic tier ($75/month) only gives us 1 GB of vector storage. Also, unless I am mistaken, moving from one tier to next is not possible without losing/migrating your stored data?

Other vector dbs like Pinecone, Weaviate etc have a "pay as you grow" pricing model where you are charged based on the amount of data stored. Are there any plans in Azure Cognitive Search to provide similar options?

Office Documents not supported?

When I analyzed the code for the Open AI Embedding Generator I see this:

    FILE_FORMAT_DICT = {
        "md": "markdown",
        "txt": "text",
        "html": "html",
        "shtml": "html",
        "htm": "html",
        "py": "python",
        "pdf": "pdf",
    }
    SENTENCE_ENDINGS = [".", "!", "?"]
    WORDS_BREAKS = ['\n', '\t', '}', '{', ']', '[', ')', '(', ' ', ':', ';', ',']
    TOKEN_ESTIMATOR = TokenEstimator()

    def _get_file_format(self, file_path: str) -> Optional[str]:
        """Gets the file format from the file name.
        Returns None if the file format is not supported.
        Args:
            file_path (str): The file path of the file whose format needs to be retrieved.
        Returns:
            str: The file format.
        """
        # in case the caller gives us a file path

So Word and PowerPoint (pptx) files are not supported at this time?

Failing to Create Index in Python Sample Notebook

I'm running the Python sample for the first time, but it fails at the step where it tries to create the search index with the semantic settings, with the following error. My Azure Cognitive Search resource is in West US 2.
image

Thanks!

Cannot find nested property 'contentVector' on the resource type 'Microsoft.Azure.Search.V2023_07_01_Preview.Vector'."

Hello,

Following the vector search quickstart, everything is fine until I search on the contentVector field. In the Azure portal UI, I see that the contentVector field is OK.

Then I try a POST request from Postman like this:
https://.search.windows.net/indexes//docs/search?api-version=2023-07-01-Preview
with a body similar to the documentation.

Finally, I get this 400 error:

"The request is invalid. Details: Cannot find nested property 'contentVector' on the resource type 'Microsoft.Azure.Search.V2023_07_01_Preview.Vector'."

Switching to "fields": "titleVector" in the body gives me the same issue, including the reference to nested property 'contentVector' .

Including a highlight parameter in a Simple Vector Search results in a minimumCoverage error

When I add a highlight parameter to the Single Vector Search query described here, the following error is returned. I understand that in principle the highlight function cannot be used in a Single Vector Search query, but I think it would be better to add it to the Docs restrictions or change the error message as it confuses users.

{ "error": { "code": "", "message": "Failed to execute query because not enough resources were available to cover 100% of the index (58.3333333333333% was covered). You may be reaching the limits of your provisioned capacity. Adjust the number of replicas/partitions, reduce the rate of requests, or specify a lower value for the minimumCoverage parameter. See http://aka.ms/azure-search-throttling for more information." } }

Semantic Hybrid search giving results of normal semantic search

In comparing the C# SDK with the Python SDK, we have noticed that, with identical requests and parameters, semantic hybrid search returns the expected result in Python but returns a result identical to ordinary semantic search in the C# call.

Example C# params:
var vector = new SearchQueryVector { KNearestNeighborsCount = 10, Fields = "embeddings", Value = queryEmbeddings.ToArray() };
var searchOptions = new SearchOptions
{
    Vector = vector,
    Size = 10,
    QueryType = SearchQueryType.Semantic,
    QueryLanguage = QueryLanguage.EnUs,
    SemanticConfigurationName = SemanticSearchConfigName,
    QueryCaption = QueryCaptionType.Extractive,
    QueryAnswer = QueryAnswerType.Extractive,
    QueryCaptionHighlightEnabled = true,
    Select = { "{column_name}" },
};
SearchResults response = await searchClient.SearchAsync(query, searchOptions);

How to Calculate Search Scores and Rankings for Hybrid Searches

Hi team,
I am wondering how search scores and rankings are calculated for hybrid searches.
I ran a keyword search (top=10) and a vector search (vector.k=10) and got the following rankings.
I computed the scores as RRFscore(d) = Σ (1 / (k + rank_d_i)) with constant k = 59.

1. Simple Vector Search Result

FileName  Vector-Rank  Vector-RRF
A.txt     2            1/(k+2) = 0.016393443
B.txt     1            1/(k+1) = 0.016666667
C.txt     3            1/(k+3) = 0.016129032

2. Simple Keyword Search Result

FileName  Keyword-Rank  Keyword-RRF
A.txt     8             1/(k+8) = 0.014925373
B.txt     10            1/(k+10) = 0.014492754
C.txt     None          1/(k+?) = None

3. Σ RRF

FileName  calculated-RRF                    displayed-RRF
A.txt     1/(k+2) + 1/(k+8) = 0.031318816   ≈ 0.03131881356239319
B.txt     1/(k+1) + 1/(k+10) = 0.031159420  ≈ 0.031159421429038048
C.txt     1/(k+3) = 0.016129032             ≠ 0.029286926612257957

Question 1.
The docs describe k as something like 60, but my calculation suggests 59. What is the actual value?

Question 2.
"C.txt" was not found in the Simple Keyword Search (top=10) results. How is the RRF calculated in this case? My guess is that internally a larger value than I specified in my top query is specified and its ranking is calculated.
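
The arithmetic in the tables above can be reproduced with a few lines of Python, and the same formula can be inverted to see which keyword rank would account for the displayed score of C.txt. This is a sketch under the poster's own assumptions (1-based ranks, k = 59); the function names are hypothetical, and the result is only what the numbers imply, not a statement about how the service computes scores internally.

```python
def rrf_score(ranks, k=59):
    """Reciprocal Rank Fusion: sum of 1 / (k + rank) over each result list."""
    return sum(1.0 / (k + rank) for rank in ranks)


def implied_rank(displayed_score, known_ranks, k=59):
    """Rank in one extra list that would explain the leftover score."""
    residual = displayed_score - rrf_score(known_ranks, k)
    return round(1.0 / residual - k)


score_a = rrf_score([2, 8])    # A.txt: vector rank 2, keyword rank 8
score_b = rrf_score([1, 10])   # B.txt: vector rank 1, keyword rank 10
print(f"{score_a:.9f} {score_b:.9f}")

# C.txt: vector rank 3, keyword rank unknown, displayed score from the table.
c_keyword_rank = implied_rank(0.029286926612257957, [3])
print(c_keyword_rank)
```

Under these assumptions the leftover score for C.txt matches 1/(59+17), which would be consistent with the guess in Question 2 that the keyword ranking used for fusion extends past the requested top 10. The small differences from the displayed values in the last decimal places look like floating-point rounding.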

How to determine which search mechanism a result is coming from

Hi Farzad/Team,

I have created an index that has both vector and semantic configurations using an API POST call. When I query the index, we use semantic search, not vector search; as you can see below, query_type = semantic.
However, I can see a difference in the response between an index with only a semantic configuration and an index with both vector and semantic configurations (again, without using vectors in the search call).

Could you please help me understand: when the index has both vector and semantic configurations and the search call uses only the semantic query type (see the code below), will the results be different, and will vector search automatically be used on the backend?

r = self.search_client.search(q,
                              filter=filter,
                              query_type=QueryType.SEMANTIC,
                              query_language="en-us",
                              query_speller="lexicon",
                              semantic_configuration_name="default",
                              top=top,
                              query_caption="extractive|highlight-false" if use_semantic_captions else None)

Unable to test the Search Service when running it in Microsoft MCAPS

When I try to run:
POST https://{{YOUR-SEARCH-SERVICE-NAME}}.search.windows.net/indexes?api-version=2023-07-01-Preview
Content-Type: application/json
api-key: {{YOUR-ADMIN-API-KEY}}
I receive a 403 error stating "The given API key is not permitted in the URI query string." I'm running the Search Service in my MCAPS environment. Does anyone know how I can overcome this error?

The Postman collection does not match the documentation

I am referring to the documentation found here.

Postman Collection: Vector Search QuickStart.postman_collection v0.2.json

For example the documentation and the Postman collection talk about creating a vector

Issue 1

{
            "name": "titleVector",
            "type": "Collection(Edm.Single)",
            "searchable": true,
            "retrievable": true,
            "dimensions": 1536,
            "vectorSearchConfiguration": "vectorConfig"
        }

In this case we will receive an error with vectorSearchConfiguration. Instead it should be algorithmConfiguration.

Issue 2

    "vectorSearch": {
        "algorithmConfigurations": [
            {
                "name": "my-vector-config",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500,
                    "metric": "cosine"
                }
            }
        ]
    }

The key kind and efSearch are currently not supported. Instead we have to replace kind with algorithm.

I have updated the Postman collection and can open a PR if that makes it easier to address the issue.

Finding or checking for existence of an index

If I create an index and populate it with documents/embeddings, the next time I run I need to do one of the following:

  • Update the existing index with additional documents, or
  • Delete the existing index and create a new one, or
  • Avoid writing to the existing index.

Therefore, how do I check whether the index named MyTextIndex already exists before I do anything? Is there an API for this?
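
One possible approach, assuming the Python SDK (azure-search-documents), is to ask the index client for its index names and test membership; `SearchIndexClient.list_index_names()` returns an iterable of names. The sketch below keeps the check in a small helper and uses a hypothetical `FakeIndexClient` stand-in so it runs without a search service; the index names are made up for the example.

```python
def index_exists(index_client, index_name):
    """Return True if index_name is among the names the client reports."""
    return index_name in set(index_client.list_index_names())


class FakeIndexClient:
    """Hypothetical stand-in for a real index client, for demonstration only."""

    def __init__(self, names):
        self._names = list(names)

    def list_index_names(self):
        return iter(self._names)


client = FakeIndexClient(["my-text-index", "other-index"])

for name in ("my-text-index", "missing-index"):
    if index_exists(client, name):
        print(f"{name}: exists, so update it, recreate it, or skip writing")
    else:
        print(f"{name}: not found, so create it")
```

With a real client, the same helper would be called with a `SearchIndexClient` instance built from the service endpoint and credential.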

ReadMe and Python notebook mis-aligned on variable names

The Readme in the following dir (https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-python) indicates the .env file should contain an entry for AZURE_SEARCH_API_KEY=YOUR-SEARCH-SERVICE-ADMIN-KEY. However, the azure-search-vector-python-sample.ipynb notebook is looking for a variable named os.getenv("AZURE_SEARCH_ADMIN_KEY").

This may trip up a newbie if they don't know what they are looking for. I added AZURE_SEARCH_ADMIN_KEY to the .env file as I am just starting down the path of exploring this repo, and I don't know if there are any references to the AZURE_SEARCH_API_KEY environment variable.
