Comments (11)
Ingest processors and search processors share logic for document manipulation and processing.
Can the shared logic of document processing be handled by the J-J transformer? If not, should we create another processor for this task? Piling up multiple processors might become cumbersome for customers.
from opensearch.
> Can the shared logic of document processing be handled by the J-J transformer? If not, should we create another processor for this task? Piling up multiple processors might become cumbersome for customers.
The J-J transformer method should be added to Document Processors so that it can also be shared by ingest and search processors. Hope this makes sense.
> The J-J transformer method should be moved to Document Processors so that it can also be shared by ingest and search processors. Hope this makes sense.
The J-J transformer functions as a standalone utility within the Core package, making it adaptable for use by any processor. To clarify, we are not moving the J-J transformer to Document Processors. Instead, any processor, including Document Processors, can integrate the J-J transformer into its own logic if desired.
> The J-J transformer functions as a standalone utility within the Core package, making it adaptable for use by any processor. To clarify, we are not moving the J-J transformer to Document Processors. Instead, any processor, including Document Processors, can integrate the J-J transformer into its own logic if desired.
I don't mean to move the method into the Document Processors. But if the J-J transform parameters are added to a document processor that uses the J-J transform, then that processor can be shared by search and ingest processors.
It makes more sense for all document-related transformations to happen in document processors, so that we don't have to copy the same code between search processors and ingest processors.
@mingshl @jackiehanyang is building the JtoJ transform as a utility function in Core, to be used by any processor or any feature. How would that fit with this document processor?
> But if the J-J transform parameters are added to a document processor that uses the J-J transform, then that processor can be shared by search and ingest processors.
@mingshl Could you please elaborate on this? What are the parameters, and how will they be used in the J-J transformer?
> @mingshl Could you please elaborate on this? What are the parameters, and how will they be used in the J-J transformer?
It depends on the J-J transform use case. Since the J-J transformer functions as a standalone utility, it can be used on its own in a search processor or an ingest processor. If it's used in both ingest and search, then it makes more sense for it to live in a document processor, which can then be shared by both ingest and search processors.
I will leave these options to the builders and users of the different processors.
[Triage - attendees 1 2 3 4]
@mingshl Thanks for filing. Looking forward to seeing the outcome here.
> And these Document Processor Factories will produce different types of processors, for example split document processors, that can be used in both Search Response Processor Factories and Ingest Processor Factories.
Search Response and Ingest Processors expect SearchResponses and IngestDocuments respectively, and processors are implemented based on those interfaces. I'm just curious how Document Processors would be chained to these processors if the inputs don't line up.
Would a Search Response processor feed hits to a document processor and re-format the modified hits back into a Search Response?
> When developing the ml_inference ingest processors and ml_inference search processors, an interesting question came up: why can't there be just one ml_inference processor that works in both the ingest and search phases, since the API requests seem very similar?
One thing to think about here: during ingest you have a document, but during search you won't necessarily have documents. Example: a textEmbedding processor can convert one or more fields of a document into embeddings, but in a SearchRequestProcessor it works on a field of the queryRequest. Also, when a search request completes, what you actually get back in the response as "documents" are the fields of the documents. We generally say a search response has a list of documents, but they are fundamentally different things: if you remove _source from the search response, the hits are just _ids.
I think what you are looking for here is transformers, or maybe converters (+1 on @minalsha's point), each of which does a particular task. Maybe something like generic transformers that ingestProcessors or SearchProcessors can call to do a specific task.
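To make the generic-transformer idea concrete, here is a minimal sketch under stated assumptions. Transformer, LowercaseTransformer, DocumentFieldCaller and QueryTextCaller are hypothetical names, not OpenSearch APIs, and lowercasing is only a stand-in for a real task such as text embedding. It shows the distinction made above: the same transformer is reused by an ingest-side caller that works on a document field and by a search-request-side caller that works on query text, with no shared notion of a "document".

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical generic transformer: performs one task and knows nothing
// about ingest pipelines or search pipelines.
interface Transformer extends UnaryOperator<String> {
}

// Stand-in task (assumption): lowercase text. A real transformer might
// instead turn text into an embedding.
final class LowercaseTransformer implements Transformer {
    @Override
    public String apply(String input) {
        return input.toLowerCase(Locale.ROOT);
    }
}

// Ingest-side caller (hypothetical): applies the transformer to one field
// of a document and returns a modified copy.
final class DocumentFieldCaller {
    private final Transformer transformer;
    private final String field;

    DocumentFieldCaller(Transformer transformer, String field) {
        this.transformer = transformer;
        this.field = field;
    }

    Map<String, Object> apply(Map<String, Object> document) {
        Map<String, Object> copy = new HashMap<>(document);
        Object value = copy.get(field);
        if (value instanceof String) {
            copy.put(field, transformer.apply((String) value));
        }
        return copy;
    }
}

// Search-request-side caller (hypothetical): applies the same transformer
// to the query text; no documents are involved.
final class QueryTextCaller {
    private final Transformer transformer;

    QueryTextCaller(Transformer transformer) {
        this.transformer = transformer;
    }

    String apply(String queryText) {
        return transformer.apply(queryText);
    }
}
```

With this split, only the callers know about their phase's input type; the transformer itself is shared, which is the kind of reuse being discussed in this thread.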
> Example: a textEmbedding processor can convert one or more fields of a document into embeddings, but in a SearchRequestProcessor it works on a field of the queryRequest.
The common document processor logic isn't applicable to SearchRequestProcessor (since a search request doesn't have documents).
> Search Response and Ingest Processors expect SearchResponses and IngestDocuments respectively, and processors are implemented based on those interfaces.
We will create a pair of adapters (with a single implementation of each) that extract the "documents" from a SearchResponse or IngestDocument, pass them through the DocumentProcessor, and return the modified SearchResponse or IngestDocument, respectively.
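The adapter pair described above can be sketched as follows. This is a minimal, self-contained illustration under simplifying assumptions, not OpenSearch code: FakeIngestDocument and FakeSearchResponse are hypothetical stand-ins that model an IngestDocument as one source map and a SearchResponse as a list of hit source maps, and DocumentProcessor here is a simplified version of the proposed interface that works on a list of documents. A single rename-field implementation is wrapped by both adapters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical version of the proposed interface:
// takes documents (field maps) and returns modified documents.
interface DocumentProcessor {
    List<Map<String, Object>> process(List<Map<String, Object>> documents);
}

// The shared logic, written once: rename one field in every document.
final class RenameFieldDocumentProcessor implements DocumentProcessor {
    private final String from;
    private final String to;

    RenameFieldDocumentProcessor(String from, String to) {
        this.from = from;
        this.to = to;
    }

    @Override
    public List<Map<String, Object>> process(List<Map<String, Object>> documents) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> doc : documents) {
            Map<String, Object> copy = new HashMap<>(doc);
            if (copy.containsKey(from)) {
                copy.put(to, copy.remove(from));
            }
            out.add(copy);
        }
        return out;
    }
}

// Stand-ins (assumption) for IngestDocument and SearchResponse.
final class FakeIngestDocument {
    final Map<String, Object> source;
    FakeIngestDocument(Map<String, Object> source) { this.source = source; }
}

final class FakeSearchResponse {
    final List<Map<String, Object>> hitSources;
    FakeSearchResponse(List<Map<String, Object>> hitSources) { this.hitSources = hitSources; }
}

// Ingest adapter: extract the single document, process it, rebuild.
final class IngestAdapter {
    private final DocumentProcessor delegate;
    IngestAdapter(DocumentProcessor delegate) { this.delegate = delegate; }

    FakeIngestDocument execute(FakeIngestDocument document) {
        List<Map<String, Object>> result = delegate.process(List.of(document.source));
        // An ingest pipeline handles one document at a time.
        if (result.size() != 1) {
            throw new IllegalStateException("expected exactly one document, got " + result.size());
        }
        return new FakeIngestDocument(result.get(0));
    }
}

// Search response adapter: extract every hit, process them, rebuild.
final class SearchResponseAdapter {
    private final DocumentProcessor delegate;
    SearchResponseAdapter(DocumentProcessor delegate) { this.delegate = delegate; }

    FakeSearchResponse processResponse(FakeSearchResponse response) {
        return new FakeSearchResponse(delegate.process(response.hitSources));
    }
}
```

The ingest adapter rejects results that are not exactly one document, because an ingest pipeline processes one IngestDocument at a time; how a one-to-many processor such as the split document processor mentioned earlier would fit on the ingest side is left open in this sketch.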
Once you have a DocumentProcessorFactory, you'd be able to register the ingest Processor and SearchResponseProcessor via a plugin like:
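import java.util.Map;

import org.opensearch.plugins.IngestPlugin;
import org.opensearch.plugins.Plugin;
import org.opensearch.plugins.SearchPipelinePlugin;
import org.opensearch.search.pipeline.Processor;
import org.opensearch.search.pipeline.SearchResponseProcessor;

// DocumentProcessor, DocumentProcessorFactory, DocumentIngestProcessorFactory and
// DocumentSearchResponseProcessorFactory are the new types proposed in this issue.
class RenameFieldDocumentProcessor implements DocumentProcessor {
    // Implementation: the shared document logic lives here, once.
}

class RenameFieldDocumentProcessorFactory implements DocumentProcessorFactory {
    @Override
    public RenameFieldDocumentProcessor create(Map<String, Object> config) {
        // Parse config, then return the configured processor.
        return new RenameFieldDocumentProcessor();
    }
}

public class DocumentProcessorPlugin extends Plugin implements IngestPlugin, SearchPipelinePlugin {
    // Ingest side: wrap the shared factory in the IngestDocument adapter.
    @Override
    public Map<String, org.opensearch.ingest.Processor.Factory> getProcessors(org.opensearch.ingest.Processor.Parameters parameters) {
        return Map.of("rename_field", new DocumentIngestProcessorFactory(new RenameFieldDocumentProcessorFactory()));
    }

    // Search side: wrap the same factory in the SearchResponse adapter.
    @Override
    public Map<String, Processor.Factory<SearchResponseProcessor>> getResponseProcessors(Parameters parameters) {
        return Map.of("rename_field", new DocumentSearchResponseProcessorFactory(new RenameFieldDocumentProcessorFactory()));
    }
}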
You get two processors for the price of one.