[Feature Request] Introduce Document Processors (opensearch, OPEN)

mingshl commented on September 22, 2024

[Feature Request] Introduce Document Processors

from opensearch.

Comments (11)

jackiehanyang commented on September 22, 2024

Ingest processors and search processors share logic for document manipulation and processing.

Can the shared logic of document processing be handled by the J-J transformer? If not, should we create another processor for this task? Piling up multiple processors might become cumbersome for customers.

mingshl commented on September 22, 2024

The J-J transformer method should be added to Document Processors so that it can also be shared by ingest and search processors. Hope this makes sense.

jackiehanyang commented on September 22, 2024

The J-J transformer functions as a standalone utility within the Core package, so any processor can use it. To clarify, we are not moving the J-J transformer to Document Processors. Instead, any processor, including Document Processors, can integrate the J-J transformer into its own implementation if desired.

mingshl commented on September 22, 2024

I don't mean moving the method into Document Processors. But if the parameters are added to a document processor that uses the J-J transform, then it can be shared by search and ingest processors.

It makes more sense for all document-related transformations to happen in document processors, and then we don't have to copy the same code between search processors and ingest processors.

minalsha commented on September 22, 2024

@mingshl, @jackiehanyang is building the JtoJ transform as a utility function in Core, to be used by any processor or any feature. How would that fit in with this document processor?

jackiehanyang commented on September 22, 2024

But if the parameters are added to a document processor that uses the J-J transform, then it can be shared by search and ingest processors.

@mingshl Could you please provide further elaboration on this? What are the parameters and how will they be used in the J-J transformer?

mingshl commented on September 22, 2024

It depends on the J-J transform use case. Because the J-J transformer functions as a standalone utility, it can be used individually in a search processor or an ingest processor. If it's used in both ingest and search, then it makes more sense to put it in a document processor so that it can be shared by both ingest and search processors.

I will leave this option to the builders and users of the different processors.
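
The two options above can be sketched roughly as follows. This is only an illustration: `JsonToJsonSketch`, `IngestOnlyStepSketch`, and `SharedTransformStepSketch` are hypothetical names, and the real J-J transformer API is not shown in this thread, so a simple field-renaming mapping stands in for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the J-J transform utility (NOT the real Core API).
final class JsonToJsonSketch {
    private JsonToJsonSketch() {}

    // Copies each mapped input field to its output name; unmapped fields are dropped.
    static Map<String, Object> transform(Map<String, Object> input, Map<String, String> fieldMapping) {
        Map<String, Object> output = new HashMap<>();
        for (Map.Entry<String, String> entry : fieldMapping.entrySet()) {
            if (input.containsKey(entry.getKey())) {
                output.put(entry.getValue(), input.get(entry.getKey()));
            }
        }
        return output;
    }
}

// Option 1: a single processor calls the utility directly with its own mapping.
final class IngestOnlyStepSketch {
    Map<String, Object> execute(Map<String, Object> source) {
        return JsonToJsonSketch.transform(source, Map.of("title", "headline"));
    }
}

// Option 2: wrap the call once as a document-level step, so the same
// configured instance can be handed to both an ingest and a search pipeline.
final class SharedTransformStepSketch implements UnaryOperator<Map<String, Object>> {
    private final Map<String, String> fieldMapping;

    SharedTransformStepSketch(Map<String, String> fieldMapping) {
        this.fieldMapping = Map.copyOf(fieldMapping);
    }

    @Override
    public Map<String, Object> apply(Map<String, Object> document) {
        return JsonToJsonSketch.transform(document, fieldMapping);
    }
}
```

With option 2, the mapping parameters live in one place, which is the code-sharing benefit argued for above; option 1 stays simpler when only one phase needs the transform.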

andrross commented on September 22, 2024

[Triage - attendees 1 2 3 4]
@mingshl Thanks for filing. Looking forward to seeing the outcome here.

joshpalis commented on September 22, 2024

These Document Processor Factories will produce different types of processors, for example split document processors, that can be used both in Search Response Processor Factories and in Ingest Processor Factories.

Search Response and Ingest Processors expect SearchResponses and IngestDocuments respectively, and processors are implemented based on those interfaces. Just curious how Document Processors would be chained to these processors if the inputs don't line up.

Would a Search Response processor feed hits to a document processor and re-format the modified hits back into a Search Response?

navneet1v commented on September 22, 2024

@mingshl

When developing the ml_inference ingest processors and ml_inference search processors, an interesting question was brought up: why can't there be just one ml_inference processor that works in both the ingest and search phases? The API requests seem very similar.

One thing to consider here: during ingest you have a document, but during search you will not necessarily have documents. For example, a textEmbedding processor can convert one or more fields of a document to embeddings, whereas a SearchRequestProcessor works on a field of the query request. Also, when a search request completes, what you actually get back as documents in the response are the fields of the documents. These are fundamentally different things, even though we generally say that a search response has a list of documents. If you remove _source from the search response, the hits are just _ids. Hence they are fundamentally different.

I think what you are looking for here is transformers, or maybe converters (+1 on @minalsha's point), each of which does a particular task. Maybe something like generic transformers that can be called by ingestProcessors or SearchProcessors to do a specific task.

msfroh commented on September 22, 2024

Example: a textEmbedding processor can convert one or more fields of a document to embeddings, whereas a SearchRequestProcessor works on a field of the query request.

The common document processor logic isn't applicable to SearchRequestProcessor (since a search request doesn't have documents).

Search Response and Ingest Processors expect SearchResponses and IngestDocuments respectively, and processors are implemented based on those interfaces.

We will create a pair of adapters (with a single implementation of each) that extract the "documents" from a SearchResponse or IngestDocument, pass them through the DocumentProcessor, and return the modified SearchResponse or IngestDocument, respectively.

Once you have a DocumentProcessorFactory, you'd be able to register the ingest Processor and SearchResponseProcessor via a plugin like:

import java.util.Map;

// DocumentProcessor, DocumentProcessorFactory, DocumentIngestProcessorFactory, and
// DocumentSearchResponseProcessorFactory are the types proposed in this issue.
class RenameFieldDocumentProcessor implements DocumentProcessor {
  // Implementation
}

class RenameFieldDocumentProcessorFactory implements DocumentProcessorFactory {
  @Override
  public RenameFieldDocumentProcessor create(Map<String, Object> config) {
    // Parse config, return processor
  }
}

class DocumentProcessorPlugin implements IngestPlugin, SearchPipelinePlugin {

  // Registers the ingest-side adapter around the shared document processor.
  @Override
  public Map<String, org.opensearch.ingest.Processor.Factory> getProcessors(org.opensearch.ingest.Processor.Parameters parameters) {
    return Map.of(
      "rename_field", new DocumentIngestProcessorFactory(new RenameFieldDocumentProcessorFactory())
    );
  }

  // Registers the search-response-side adapter around the same document processor.
  @Override
  public Map<String, Processor.Factory<SearchResponseProcessor>> getResponseProcessors(Parameters parameters) {
    return Map.of(
      "rename_field", new DocumentSearchResponseProcessorFactory(new RenameFieldDocumentProcessorFactory())
    );
  }
}

You get two processors for the price of one.
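
To make the adapter idea above more concrete, here is a minimal, self-contained sketch. It does not use the real OpenSearch classes: `SimpleIngestDocument` and `SimpleSearchHit` are simplified stand-ins for IngestDocument and the hits of a SearchResponse, and `DocumentProcessor`, `RenameFieldProcessor`, `IngestAdapter`, and `SearchResponseAdapter` are hypothetical names chosen for illustration. The search-side adapter assumes hits without `_source` should pass through unchanged, which is one possible answer to the concern raised earlier in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shared interface: works on a plain field map, unaware of ingest or search.
interface DocumentProcessor {
    Map<String, Object> process(Map<String, Object> document);
}

// Simplified stand-in for an ingest document (the real IngestDocument differs).
final class SimpleIngestDocument {
    final Map<String, Object> source;
    SimpleIngestDocument(Map<String, Object> source) { this.source = source; }
}

// Simplified stand-in for a search hit; source is null when _source was not returned.
final class SimpleSearchHit {
    final String id;
    final Map<String, Object> source;
    SimpleSearchHit(String id, Map<String, Object> source) { this.id = id; this.source = source; }
}

// One piece of document logic, written once.
final class RenameFieldProcessor implements DocumentProcessor {
    private final String from;
    private final String to;
    RenameFieldProcessor(String from, String to) { this.from = from; this.to = to; }

    @Override
    public Map<String, Object> process(Map<String, Object> document) {
        Map<String, Object> result = new HashMap<>(document); // leave the input untouched
        if (result.containsKey(from)) {
            result.put(to, result.remove(from));
        }
        return result;
    }
}

// Ingest-side adapter: unwrap the document, delegate, wrap the result again.
final class IngestAdapter {
    private final DocumentProcessor delegate;
    IngestAdapter(DocumentProcessor delegate) { this.delegate = delegate; }

    SimpleIngestDocument execute(SimpleIngestDocument document) {
        return new SimpleIngestDocument(delegate.process(document.source));
    }
}

// Search-side adapter: apply the same delegate to every hit that carries a source.
final class SearchResponseAdapter {
    private final DocumentProcessor delegate;
    SearchResponseAdapter(DocumentProcessor delegate) { this.delegate = delegate; }

    List<SimpleSearchHit> processResponse(List<SimpleSearchHit> hits) {
        List<SimpleSearchHit> processed = new ArrayList<>(hits.size());
        for (SimpleSearchHit hit : hits) {
            if (hit.source == null) {
                processed.add(hit); // nothing to transform, keep the hit as-is
            } else {
                processed.add(new SimpleSearchHit(hit.id, delegate.process(hit.source)));
            }
        }
        return processed;
    }
}
```

Passing the same `RenameFieldProcessor` instance to both `IngestAdapter` and `SearchResponseAdapter` mirrors the "two processors for the price of one" registration: the document logic lives in one class, and each adapter only handles unwrapping and rewrapping.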