Comments (8)
If we agree that merging the joined data into the existing document should be handled in a JavaScript VM in the source, then we face a dilemma about how to request lookups.
Consider Redis and Mongo.
With Redis we might want to query with
GET <key>
or HGET <key> <field>
whereas with Mongo we'd want a db.findOne({_id: <id>}), or possibly a findOne(<bson query>).
We are forced to either provide a generic function that accepts a variety of query shapes, e.g.
Mongo:
module.exports = function(doc, source) {
  doc["author"] = source.lookup({namespace: "boom.authors", query: {_id: doc.author_id}});
}
Redis:
module.exports = function(doc, source) {
  doc["author"] = source.lookup({method: "HGET", key: "authors", value: doc.author_id});
}
or we provide specialized functions for each source type.
Redis:
module.exports = function(doc, source) {
  doc["author"] = source.HGET("authors", doc.author_id);
}
Each of these options has drawbacks.
Opinions?
from transporter.
I think you can probably get a long way with a stupid-simple lookup interface right now, even just k/v lookups (find by _id in Mongo, GET in Redis). Advanced queries (anything special on Redis probably counts) can come later.
And the less actual work that happens in JavaScript, the more chance there is to optimize and scale this later. Letting people run arbitrary queries and then run logic against the results in JS seems likely to create a hard-to-optimize performance bottleneck.
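To make the k/v-only idea concrete, here is a minimal sketch. The names makeKeyValueSource and lookup(namespace, key) are assumptions for illustration, not an existing transporter API, and the in-memory Map stands in for a real Mongo find-by-_id or Redis GET.

```javascript
// Hypothetical in-memory stand-in for a lookup source. A real adaptor would
// issue a find-by-_id (Mongo) or GET (Redis) instead of reading a Map.
function makeKeyValueSource(tables) {
  return {
    // lookup(namespace, key) returns the stored record, or null when absent.
    lookup(namespace, key) {
      const table = tables[namespace];
      if (!table || !table.has(key)) return null;
      return table.get(key);
    },
  };
}

// Transform in the style discussed above: enrich the doc with its author.
function transform(doc, source) {
  doc.author = source.lookup("authors", doc.author_id);
  return doc;
}

// Example data is made up for the sketch.
const source = makeKeyValueSource({
  authors: new Map([[42, { name: "Ada" }]]),
});
const enriched = transform({ title: "Hello", author_id: 42 }, source);
```

Because the transform only ever asks for one key in one namespace, the host is free to batch or cache those lookups later without changing user code.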
from transporter.
+1
from transporter.
+1 This is also one feature that I'm looking forward to.
from transporter.
+1 Definitely a sought after feature.
from transporter.
+1 denormalization of data for elasticsearch should be possible
from transporter.
+1 from me as well. I can also contribute to the code if required.
from transporter.
dumping what the plan is here so I don't forget when I get to this soon...
t.Source(mongodb({uri: "connection string"…})).
  Join(postgres({uri: "connection string"…}), {
    id_map: {"account_id": "id"},
    field_map: {
      "name": "flegergle",
      "slug": "account_slug"
    },
    query_ref: "accounts"}).
  Save(elasticsearch({uri: "connection string"...}))
NOTE: this is contingent on changes to the JavaScript DSL, which are currently in progress.
The general idea is to add a new method, Join(...), that takes two parameters: an adaptor and a configuration for performing the query.
In the above pipeline, the following scenario would take place:
Original doc:
{
  "_id": "somespecial_ID",
  "name": "fancypants",
  "type": "foo",
  "account_id": 1567
}
When this doc is sent to the Join, the following query would be executed:
SELECT name AS flegergle, slug AS account_slug FROM accounts WHERE id = 1567
which would then send the following document down the pipeline:
{
  "_id": "somespecial_ID",
  "name": "fancypants",
  "type": "foo",
  "account_id": 1567,
  "flegergle": "Super Duper",
  "account_slug": "super-duper"
}
The initial implementation of this will likely only support joins to a single table/collection.
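As a rough illustration of how the configuration above could map to the query and the merged document, here is a sketch. buildJoinQuery and mergeJoined are hypothetical helpers, not part of transporter, and the $1-style placeholder is an assumption borrowed from common Postgres clients; the plan above shows the value inlined.

```javascript
// Hypothetical helper: turn a Join configuration plus an incoming doc into a
// parameterized single-table query. Identifiers are interpolated unescaped,
// so this assumes the configuration comes from a trusted pipeline file.
function buildJoinQuery(doc, config) {
  // field_map: joined-table column -> field name in the outgoing doc.
  const columns = Object.entries(config.field_map)
    .map(([column, alias]) => `${column} AS ${alias}`)
    .join(", ");

  // id_map: field in the incoming doc -> column in the joined table.
  const conditions = [];
  const params = [];
  for (const [docField, column] of Object.entries(config.id_map)) {
    params.push(doc[docField]);
    conditions.push(`${column} = $${params.length}`);
  }

  return {
    text: `SELECT ${columns} FROM ${config.query_ref} WHERE ${conditions.join(" AND ")}`,
    params,
  };
}

// Hypothetical helper: copy the joined row's fields onto a copy of the doc.
function mergeJoined(doc, row) {
  return Object.assign({}, doc, row);
}

const config = {
  id_map: { account_id: "id" },
  field_map: { name: "flegergle", slug: "account_slug" },
  query_ref: "accounts",
};
const doc = { _id: "somespecial_ID", name: "fancypants", type: "foo", account_id: 1567 };
const query = buildJoinQuery(doc, config);
const merged = mergeJoined(doc, { flegergle: "Super Duper", account_slug: "super-duper" });
```

Aliasing in the SELECT keeps the merge step trivial: the joined row already uses the target field names, so the doc's own "name" is not overwritten.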
from transporter.