Comments (11)

jackiehanyang commented on September 22, 2024

ingest processors and search processors share logics for document manipulation and processing.

Can the shared logic of document processing be handled by the J-J transformer? If not, should we create another processor for this task? Piling up multiple processors might become cumbersome for customers.

from opensearch.

mingshl commented on September 22, 2024

The J-J transformer method should be added to Document Processors so that it can also be shared by ingest and search processors. Hope this makes sense.

jackiehanyang commented on September 22, 2024

The J-J transformer functions as a standalone utility within the Core package, so any processor can use it. To clarify, we are not moving the J-J transformer to Document Processors. Instead, any processor, including Document Processors, can integrate the J-J transformer into its own implementation if desired.

mingshl commented on September 22, 2024

I don't mean moving the method into the Document Processors. But if the parameters are added to a document processor that uses the J-J transform, then it can be shared by search and ingest processors.

It makes more sense for all document-related transformation to happen in document processors, so that we don't have to copy the same code between search processors and ingest processors.

minalsha commented on September 22, 2024

@mingshl, @jackiehanyang is building the JtoJ transform as a utility function in Core to be used by any processor or any feature. How would that fit in with this document processor?

jackiehanyang commented on September 22, 2024

But if the parameters are added to a document processor that uses the J-J transform, then it can be shared by search and ingest processors.

@mingshl Could you please provide further elaboration on this? What are the parameters and how will they be used in the J-J transformer?

mingshl commented on September 22, 2024

It depends on the J-J transform use case. Because the J-J transformer functions as a standalone utility, it can be used on its own in a search processor or an ingest processor. If it is used in both ingest and search, then it makes more sense to put it in the document processor, so that it can be shared by both ingest and search processors.

I will leave this choice to the builders and users of the different processors.

andrross commented on September 22, 2024

[Triage - attendees 1 2 3 4]
@mingshl Thanks for filing. Looking forward to seeing the outcome here.

joshpalis commented on September 22, 2024

And this Document Processor Factories will produce different type of processors, for example, split document processors, that can be used in both in Search Response Processor Factories and also in Ingest Processors Factories.

Search Response and Ingest Processors expect SearchResponses and IngestDocuments respectively, and processors are implemented based on those interfaces. Just curious how Document Processors would be chained to these processors if the inputs don't line up.

Would a Search Response processor feed hits to a document processor and re-format the modified hits back into a Search Response?

navneet1v commented on September 22, 2024

@mingshl

When developing ml_inference ingest processors and ml_inference search processors, an interesting question was brought up: why can it not be just one ml_inference processor that works on both ingest and search phases? The API requests seem very similar.

One thing to think about here is that during ingest you have a document, but when you search you will not necessarily always have documents. Example: a textEmbedding processor can convert 1 or more fields of a document to embeddings, but a SearchRequestProcessor works on a field of the queryRequest. Also, when a search request completes, what you actually get in the response is the fields of the documents. These are fundamentally different things, even though we generally say a search response has a list of documents. If you remove _source from the search response, the hits are just _ids. Hence they are fundamentally different.

I think what you are looking for here is transformers, or maybe converters (+1 on @minalsha's point), each of which does a particular task. Maybe something like Generic Transformers, which can be called by ingest processors or search processors to do a specific task.
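As a rough illustration of the generic-transformer idea, here is a minimal Java sketch. Every name in it (TextTransformers, NormalizeIngestStep, NormalizeQueryStep) is hypothetical, and the types are plain-Java stand-ins rather than OpenSearch APIs. It only shows that a utility with no knowledge of documents can be called from an ingest-side step that has a document and from a search-request-side step that has only query text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical generic transformer: knows nothing about ingest or search.
final class TextTransformers {
    private TextTransformers() {}

    // Trims surrounding whitespace and lowercases the text.
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }
}

// Hypothetical ingest-side caller: it has a document (modelled as a field map).
final class NormalizeIngestStep {
    private NormalizeIngestStep() {}

    // Returns a copy of the document with the given string field normalized.
    static Map<String, Object> apply(Map<String, Object> document, String field) {
        Map<String, Object> copy = new LinkedHashMap<>(document);
        Object value = copy.get(field);
        if (value instanceof String) {
            copy.put(field, TextTransformers.normalize((String) value));
        }
        return copy;
    }
}

// Hypothetical search-request-side caller: there is no document, only query text.
final class NormalizeQueryStep {
    private NormalizeQueryStep() {}

    static String apply(String queryText) {
        return TextTransformers.normalize(queryText);
    }
}
```

In this sketch the shared logic lives in the transformer, and each caller decides how to pull the input out of whatever it has at hand.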

msfroh commented on September 22, 2024

Example: a textEmbedding processor can convert 1 or more fields of a document to embeddings, but a SearchRequestProcessor works on a field of the queryRequest.

The common document processor logic isn't applicable to SearchRequestProcessor (since a search request doesn't have documents).

Search Response and Ingest Processors expect SearchResponses and IngestDocuments respectively, and processors are implemented based on those interfaces.

We will create a pair of adapters (with a single implementation of each) that extract the "documents" from a SearchResponse or IngestDocument, pass them through the DocumentProcessor, and return the modified SearchResponse or IngestDocument, respectively.

Once you have a DocumentProcessorFactory, you'd be able to register the ingest Processor and SearchResponseProcessor via a plugin like:

class RenameFieldDocumentProcessor implements DocumentProcessor {
  // Implementation
}

class RenameFieldDocumentProcessorFactory implements DocumentProcessorFactory {
  public RenameFieldDocumentProcessor create(Map<String, Object> config) {
    // Parse config, return processor
  }
}

class DocumentProcessorPlugin implements IngestPlugin, SearchPipelinePlugin {

  // Registers the ingest-side adapter around the shared document processor.
  @Override
  public Map<String, org.opensearch.ingest.Processor.Factory> getProcessors(org.opensearch.ingest.Processor.Parameters parameters) {
    return Map.of(
      "rename_field", new DocumentIngestProcessorFactory(new RenameFieldDocumentProcessorFactory())
    );
  }

  // Registers the search-response-side adapter around the same document processor.
  @Override
  public Map<String, Processor.Factory<SearchResponseProcessor>> getResponseProcessors(Parameters parameters) {
    return Map.of(
      "rename_field", new DocumentSearchResponseProcessorFactory(new RenameFieldDocumentProcessorFactory())
    );
  }
}

You get two processors for the price of one.
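To make the adapter idea concrete, here is a self-contained sketch that runs without OpenSearch on the classpath. It is an assumption-laden illustration, not the proposed implementation: FakeIngestDocument and FakeSearchResponse are hypothetical stand-ins for IngestDocument and SearchResponse, documents are modelled as plain field maps, and the adapter class names and the from/to constructor parameters are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shared abstraction: operates on a plain field map only.
interface DocumentProcessor {
    Map<String, Object> process(Map<String, Object> document);
}

// Renames one field if it is present; the input map is left untouched.
final class RenameFieldDocumentProcessor implements DocumentProcessor {
    private final String from;
    private final String to;

    RenameFieldDocumentProcessor(String from, String to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public Map<String, Object> process(Map<String, Object> document) {
        Map<String, Object> copy = new LinkedHashMap<>(document);
        if (copy.containsKey(from)) {
            copy.put(to, copy.remove(from));
        }
        return copy;
    }
}

// Stand-in for an ingest document (assumption: just its source fields).
final class FakeIngestDocument {
    final Map<String, Object> source;

    FakeIngestDocument(Map<String, Object> source) {
        this.source = source;
    }
}

// Stand-in for a search response (assumption: just the source of each hit).
final class FakeSearchResponse {
    final List<Map<String, Object>> hits;

    FakeSearchResponse(List<Map<String, Object>> hits) {
        this.hits = hits;
    }
}

// Ingest-side adapter: unwraps the single document, delegates, and re-wraps.
final class DocumentIngestAdapter {
    private final DocumentProcessor delegate;

    DocumentIngestAdapter(DocumentProcessor delegate) {
        this.delegate = delegate;
    }

    FakeIngestDocument execute(FakeIngestDocument document) {
        return new FakeIngestDocument(delegate.process(document.source));
    }
}

// Search-side adapter: applies the same delegate to every hit, then rebuilds the response.
final class DocumentSearchResponseAdapter {
    private final DocumentProcessor delegate;

    DocumentSearchResponseAdapter(DocumentProcessor delegate) {
        this.delegate = delegate;
    }

    FakeSearchResponse processResponse(FakeSearchResponse response) {
        List<Map<String, Object>> processed = new ArrayList<>();
        for (Map<String, Object> hit : response.hits) {
            processed.add(delegate.process(hit));
        }
        return new FakeSearchResponse(processed);
    }
}
```

The rename logic is written once, and each adapter only handles unwrapping and re-wrapping its own container type. A real version would also have to decide what to do with hits that carry no _source.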
