Cognitive Search is an AI feature in Azure Search, used to extract text from images, blobs, and other unstructured data sources - enriching the content to make it more searchable in an Azure Search index. Extraction and enrichment are implemented through cognitive and custom skills attached to an indexing pipeline.
This repository contains an Azure Function (Python HTTP Trigger) that implements the Web API custom skill interface, allowing you to extend Cognitive Search by calling out to an API endpoint providing custom operations.
Before you start, you must have the following:
- Python 3.6 or later
- Azure Functions Core Tools v2.7.1575 or later
- Azure CLI v2.x or later
For a better development experience it's recommended the use of Visual Studio Code with Python and Azure Functions extensions.
The sample is a framework that can be used for any Azure Search custom skill you want, it is not tied to any specific service except Azure Functions. Key features/advantages:
- Developers only have to worry about the business logic and fill the
values
property as the output. - Built with Marshmallow schemas, strengthening data consistency by serializing/deserializing objects to primitive Python types and simplifying data validation.
import logging
import azure.functions as func
from typing import List
from .models.output import OutputRecord
from .utils.schemas_helper import output_dumps
from .utils.functions_helper import load_request, bad_request, ok
def main(req: func.HttpRequest) -> func.HttpResponse:
logging.info('Custom kill processed a request.')
req_result: RequestResult = load_request(req)
if not req_result.valid:
return bad_request(req_result.error)
input_skill: InputSkill = req_result.input_skill
# YOUR CODE HERE
values: List[OutputRecord] = [] # Update your values property
output_json, error = output_dumps(values)
if error:
return bad_request('Invalid output format')
return ok(output_json)
In the Azure Functions main file, there are basically three tasks that need to be done:
- Read data from
input_skill
and create your logic/processing (e.g. replace words from documents, apply regex, etc) - Update the
OutputData
class with the property name you defined on Azure Search. In this sample, the generic property name created wascontractTextProcessed
. - Update the
values
list with results from the previous processing.
The sample uses unittest framework. You can follow the Python testing in Visual Studio Code article to configure your VSCode to run unit tests. Otherwise, you can test through command line:
python -m unittest discover ./skill/tests
Note: All tests are under the skill/tests
folder, so make sure you are not looking only on skill
folder by default.