compose / transporter
1.5K stars, 56 watchers, 213 forks, 23.45 MB

Sync data between persistence engines, like ETL only not stodgy

Home Page: https://github.com/compose/transporter/issues/523

License: BSD 3-Clause "New" or "Revised" License

Languages: Go 97.80%, JavaScript 0.43%, Shell 1.33%, Dockerfile 0.45%
Topics: go, etl, mongodb, elasticsearch, rethinkdb, postgresql, rabbitmq, mysql

transporter's People

Contributors

alindeman, alino, anthonyalberto, atomicules, chrishepner, codepope, erft-engineering, exnexu, franklinwise, jipperinbham, johnjjung, johnnason, kiavashi, letsgolesco, mbyczkowski, mm-, nstott, pmjhonwang, sambartrum, sberryman, shawnps, snorge, snowch, sukhikhn, tkyocum, tombyrer, trinchan, winslett, xpqz


transporter's Issues

Configuration is confusing and incomplete

The current use of config.yaml means that only two parameters can be passed to a node from the yaml file and any other parameters have to be passed within the JavaScript application.

If the point of a config file is to let the pipeline configuration in the JavaScript stay practically immutable, with any changes in setup made in config.yaml, then the current setup fails that goal.

Consider the mongodb adapter. It requires the uri and namespace parameters to be set. The uri is set in config.yaml but the namespace is set in the application.js.

A stable application JavaScript and an adaptable, flexible config.yaml are desirable for resilience in production and consistency in documentation.
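
A minimal sketch of the current split, using the mongodb adapter from above (URI, node, and namespace names are placeholders):

# config.yaml -- the uri can live here...
nodes:
  localmongo:
    type: mongo
    uri: mongodb://localhost/mydb

// application.js -- ...but the namespace has to be hard-coded here
pipeline = Source({name:"localmongo", namespace:"mydb.mycollection"}).save({name:"localmongo", namespace:"mydb.other"});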

Version support for elasticsearch, mongodb and others

Having trouble setting up a simple transport between mongodb and elasticsearch, I was wondering if my problems could come from the versions I use. Which versions do you recommend for working with transporter?

Thanks

tar.gz releases seem to be compressed twice

Hi,
I wanted to bring to your attention that your release files are compressed twice, which unnecessarily complicates decompression using a single tar command.

$ file transporter_linux_amd64.tar.gz
-> transporter_linux_amd64.tar.gz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)
$ gunzip transporter_linux_amd64.tar.gz && file transporter_linux_amd64.tar
-> transporter_linux_amd64.tar: gzip compressed data

Sorry to split hairs over such a little detail.

Transformers don't seem to be able to drop messages

In 0.0.4, passing the improved Msg object (and having to return it) means you can't return false; doing so generates an error message.

In previous versions, returning false (or some other non map/interface type) would mean the message was dropped from the pipeline.
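
A minimal sketch of the old behaviour (the field name is made up):

module.exports = function(doc) {
  if (doc.internal) {
    // pre-0.0.4: returning false dropped the message from the pipeline;
    // in 0.0.4 this generates an error instead
    return false;
  }
  return doc;
}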

Joins, or a way to pull extra data from other namespaces

After fetching a document from a source, we need a way to resolve pieces of the document when data might exist in other namespaces.
E.g. if we have a document from a namespace of 'posts' that looks like this

{
    title: "this is a title",
    author: ObjectId("54179ce06570544fb3892b69"),
    content: "post content"
}

then we need to be able to query another namespace on the source to turn ObjectId("54179ce06570544fb3892b69") into an appropriate object.

One way to solve this would be to add a javascript vm to the source, and let the user run a js function, with a javascript builtin or other mechanism that would perform a lookup against the source.
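
A rough sketch of what such a builtin could look like (the lookup function is hypothetical):

module.exports = function(doc) {
  // resolve the ObjectId against the 'users' namespace on the source
  doc.author = lookup("users", doc.author);
  return doc;
}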

Not specifying API URI generates multiple errors

If the config.yaml lacks the api uri value, rather than taking it as an indication that no metrics are needed,
the transporter generates an error every time the interval expires, e.g.

transporter: EventEmitter Error: Post : unsupported protocol scheme ""
transporter: EventEmitter Error: Post : unsupported protocol scheme ""
transporter: EventEmitter Error: Post : unsupported protocol scheme ""

Build error - Ubuntu

When I try to build transporter on an Ubuntu box, I get the following error. Any idea what might be wrong?

../../compose/transporter/pkg/adaptor/influxdb.go:98: undefined: client.Series
../../compose/transporter/pkg/adaptor/influxdb.go:104: i.influxClient.WriteSeries undefined (type *client.Client has no field or method WriteSeries)
../../compose/transporter/pkg/adaptor/influxdb.go:104: undefined: client.Series
../../compose/transporter/pkg/adaptor/influxdb.go:111: undefined: client.ClientConfig
../../compose/transporter/pkg/adaptor/rethinkdb.go:133: unknown gorethink.ConnectOpts field 'IdleTimeout' in struct literal

Migration error from Transformer not helpful

The transformer in the javascript-builder code produces

first argument must be an hash. (got string instead)

if called in the previous format. This doesn't indicate where the error occurred. I suggest changing it to

Transformer: first argument must be an hash. (got string instead)

and adopting a similar strategy for all errors.
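
A minimal sketch of the prefixing, assuming a hypothetical err at the failure site:

// wrap the underlying error with the node type so its origin is obvious
return fmt.Errorf("Transformer: %v", err)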

Support for environment variables in config

I've been attempting to run a setup locally involving mongodb, elasticsearch and transporter all on docker.

The dockerization works for all 3, but in order for them to communicate via docker links (which are preferred over exposed ports) the config must have access to environment variables, as docker link addresses/ports are defined in env variables.

This isn't just an issue with docker - anyone using a PaaS that exposes config info via env variables (e.g. heroku) may hit issues here.

Support for environment variables in the config.yaml or via command line args would be greatly appreciated!
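
A sketch of what interpolation could look like in config.yaml (the ${...} expansion syntax is hypothetical; the variable names follow docker's link conventions):

nodes:
  localmongo:
    type: mongo
    # hypothetical syntax: expand environment variables set by docker links
    uri: mongodb://${MONGO_PORT_27017_TCP_ADDR}:${MONGO_PORT_27017_TCP_PORT}/foo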

Mongodb -> Elasticsearch: repeated error "mejson: unknown type: uint8"

I'm using the transporter to move data from a mongodb instance to elasticsearch. I'm using a transformer to pick certain fields as per your examples.

When I run the transporter, this error is repeated like crazy: mejson: unknown type: uint8

It doesn't kill the transporter (data still makes it through), but it's pretty annoying and makes me worried that some data isn't going to make it through the pipe.

Any idea what's going on here?

Errors syncing data between Mongodb and ElasticSearch

Hello all,
We have an issue when configuring data sync from mongodb to ES.
thanhtruong$ ./transporter eval --config ./test/config.yaml 'Source({name:"localmongo", namespace: "testdb.2b66d7a9-8cb5-4802-838a-f1f58869bbf5_campaigns"}).save({name:"es", namespace: "2b66d7a9-8cb5-4802-838a-f1f58869bbf5_campaigns.testdata"})'
Mongo Config {URI:mongodb://localhost/testdb Namespace:testdb.2b66d7a9-8cb5-4802-838a-f1f58869bbf5_campaigns Debug:true Tail:false Wc:0 FSync:false Bulk:false}
setting start timestamp: 6111893325843791872
transporter: CRITICAL: elasticsearch error (Bulk Insertion Error. Failed item count [4])
transporter: CRITICAL: elasticsearch error (Bulk Insertion Error. Failed item count [4])
transporter: CRITICAL: elasticsearch error (Bulk Insertion Error. Failed item count [3])
transporter: CRITICAL: elasticsearch error (Bulk Insertion Error. Failed item count [4])
transporter: CRITICAL: elasticsearch error (Bulk Insertion Error. Failed item count [4])

Has anyone seen this before? Here are the nodes of my transporter:
thanhtruong$ ./transporter list --config ./test/config.yaml
Name        Type           URI
localmongo  mongo          mongodb://localhost/tokyo_api_development_campaigns
es          elasticsearch  http://127.0.0.1:9200/2b66d7a9-8cb5-4802-838a-f1f58869bbf5_campaigns

Thanks for your help.

Transporter does not remove deleted document from index

Hi,
I have an issue here, where transporter does not remove deleted documents from the index.
Since transporter replaces rivers, I expected such functionality. Is it out of scope for transporter, or am I doing something wrong?
Thanks for the help

api configuration should be optional

if the api information isn't set in the config.yaml, then transporter currently bails out with an error

 transporter test --config test/config.yaml test/application.js
time: invalid duration

This isn't right; these settings should be optional, and transporter should fall back to a NoopEmitter in that case.
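
A minimal sketch of the fallback (the constructor name is hypothetical; the NoopEmitter is the one mentioned above):

if config.API.URI == "" {
    // no api configured: discard events instead of posting them
    emitter = NewNoopEmitter() // hypothetical constructor
}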

Sync multiple namespaces at the same time

It should be possible to sync more than one namespace with the pipeline.
I can think of a few ways this can work, but in general, I favour the idea of allowing regex / wildcard matches on a namespace. i.e. something like Source({name: "mongo", namespace: "database.*"})
This will cause problems on the sink, as it is expecting a constant namespace. As well, transformers will need to be aware of the message's namespace.

about adapter command could do with more details

The transporter about adaptername command works well (though an underline under the headings might be useful visually). What's more important, though, is that about doesn't tell you whether the adapter can be used as a source, a sink, or both.

Change _bulk api endpoint to /{index}/_bulk

Hi,
I am currently investigating the possibility of adding ACLs to my ES instance. I'm doing that by using a reverse proxy and filtering http requests based on urls. It seems that transporter accesses the /_bulk endpoint directly. This api allows editing and deletion of any document in any index.
Do you think it is possible for transporter to use the /{index}/_bulk variation instead?
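
For illustration, both endpoints are part of the standard Elasticsearch bulk API:

POST /_bulk           # cluster-wide: can touch any index, hard to fence with URL-based ACLs
POST /{index}/_bulk   # scoped to one index, so a reverse-proxy rule can restrict it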

facing issue dumping data from mongo to elasticsearch without transformation

Here are the contents of my tes/config.yaml file and tes/application.js file

config.yaml

# api:
#   interval: 60s
#   uri: "http://requestb.in/13gerls1"
#   key: "48593282-b38d-4bf5-af58-f7327271e73d"
#   pid: "something-static"
nodes:
  localmongo:
    type: mongo
    uri: mongodb://localhost/foo
  es:
    type: elasticsearch
    uri: http://localhost:9200/

application.js

pipeline = Source({name:"localmongo", namespace:"foo.bar"}).save({name:"es", namespace:"foo.bar"}); 

However, it fails while inserting into elasticsearch with the error:

failed to parse [_id]

However, it used to work fine earlier (a month ago). Also, if I apply a log transform to see the documents, I can see _id is stored differently:

{"_id":{"$oid":"54df376881ee34db87b27377"},"firstName":"Robert","lastName":"Baratheon"}

In the transformation, changing

doc._id = doc._id['$oid']; 

does solve the problem though. Just want to confirm whether there has been a change in the way mongo _id is handled, or if I am doing something wrong?
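
For reference, a minimal transformer built around that one-line fix (assembled from the snippets in this issue):

module.exports = function(doc) {
  // unwrap the extended-JSON ObjectId, e.g. {"$oid": "54df37..."} -> "54df37..."
  doc._id = doc._id['$oid'];
  return doc;
}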

Intermittent Elasticsearch bulk insertion errors

Hi,

We are disgruntled users of the Elasticsearch-Mongo river and eager to replace it with transporter in production. While running transporter on our staging environment, we often see it throwing Elasticsearch bulk insertion errors.

  • transporter: CRITICAL: elasticsearch error (Bulk Insertion Error. Failed item count [33])

Does Transporter retry inserting failed documents? If not, what's the recommended way to capture these documents in a file or something, so that we can insert them into Elasticsearch through another service (or potentially through Transporter again)?

mongo adaptor bulkWrite doesn't check for empty array

panic: runtime error: index out of range

goroutine 146 [running]:
github.com/compose/transporter/pkg/adaptor.(*Mongodb).writeBuffer(0xc208066e70)
/Users/nick/hackery/go/src/github.com/compose/transporter/pkg/adaptor/mongodb.go:238 +0xe18
github.com/compose/transporter/pkg/adaptor.(*Mongodb).bulkWriter(0xc208066e70)

/Users/nick/hackery/go/src/github.com/compose/transporter/pkg/adaptor/mongodb.go:205 +0x61f
created by github.com/compose/transporter/pkg/adaptor.(*Mongodb).Listen
/Users/nick/hackery/go/src/github.com/compose/transporter/pkg/adaptor/mongodb.go:139 +0xa4

goroutine 1 [semacquire]:
sync.(*WaitGroup).Wait(0xc2080e91a0)
/usr/local/Cellar/go/1.4/libexec/src/sync/waitgroup.go:132 +0x169
main.main()
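
A minimal sketch of the missing guard (the buffer field name is hypothetical):

// writeBuffer should bail out when there is nothing to flush,
// instead of indexing into an empty slice
if len(m.opsBuffer) == 0 {
    return
}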

rename message.Msg.Document to message.Msg.Data

there are times when we want to pass data through the message that isn't a map[string]interface{} (or bson.M).
to fix this, make Msg.Data an interface{} type, and do type checking in the adaptors when we need to know the type.
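
A minimal sketch of the adaptor-side type check (processDocument is hypothetical):

switch data := msg.Data.(type) {
case map[string]interface{}:
    // document-shaped payload
    processDocument(data)
case bson.M:
    // bson.M is itself a map[string]interface{} underneath
    processDocument(map[string]interface{}(data))
default:
    // raw or unknown payloads pass through untouched
}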

Enhancement: Add the ability to save and restore the state of an adaptor

Currently, if transporter fails, it is not able to start back where it left off during a copy/tail. In order to support this, each adaptor will want to save/persist the most recent document it has processed. We should be able to support multiple types of persistent stores using a simple interface like so:

type SessionStore interface {
    Set(path string, msg *message.Msg) error
    Get(path string) (string, int64, error)
}

The path would typically be a combination of the Transporter key and the node path. When retrieving the last known State, we will only return the last _id and the timestamp of the operation. It may be necessary in the future to support returning the entire message.Msg.

Ideally, an implementation will not constantly write the last operation but will have the ability to "flush" on an interval, which can be defined in the config.yaml.

The beginning of this can be seen in the adaptor-state branch, but it is incomplete. It currently introduces a sessionTicker where the Pipeline will call the Set func for each Node. As of right now, I have not added the ability to retrieve/get the state during startup/initialization.
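
A minimal sketch of the flush-on-interval idea using the SessionStore above (variable names are hypothetical):

// persist the last processed message every flushInterval,
// rather than on every operation
ticker := time.NewTicker(flushInterval)
go func() {
    for range ticker.C {
        if err := store.Set(nodePath, lastMsg); err != nil {
            log.Println("session flush failed:", err)
        }
    }
}()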

Thoughts and feedback?

Can transporter be used to sync a target collection with ElasticSearch?

I originally asked this question through MongoHQ/Compose support and was told to post it here instead; please let me know if I should provide additional details.

In our implementation, the documents we index within ElasticSearch are composites of multiple collections in our source MongoDB database. I would like to specify a given collection within the database as the source of a transporter and then update our application to write "search-formatted" documents to that collection.

Is that currently possible or in the short-term road map for transporter?

Elasticsearch errors

First of all, this project is amazing! :)

I am currently trying to insert documents from a mongo collection into elasticsearch. It works for most of the documents, but I also see a lot of errors like the one below. The first part I understand (because there is no API endpoint), but I do not understand what the Elasticsearch error (%!s(\u003cnil\u003e)) means.

I am using this transform function. Does the elasticsearch document need to have a certain format?

module.exports = function(doc) {
  var saveDoc = {
    _id: JSON.stringify(doc._id),
    text: doc.text,
    meta: doc.meta
  };
  return saveDoc;
}

transporter: EventEmitter Error: http error code, expected 200 or 201, got 405, ({"ts":1419023160,"name":"error","path":"","record":{"_id":"316254105985245184"},"message":"ERROR: Elasticsearch error (%!s(\u003cnil\u003e))"})

Adaptor configs are too opaque

There's no way to query adaptors for their configuration options.
Adaptors take very specific flags in their config and we need to expose them to any commands.
My current thought is that the adaptor package should have a config registry, just as it has a registry of constructors.
Each config struct can then document itself via reflection tags, something like

type WhateverConfig struct {
    Uri string `json:"uri" transporter:"the uri to use to connect"`
}

This will address some of the concerns raised in #12.
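
A minimal sketch of reading those tags back out with the standard reflect package (using the WhateverConfig example above):

t := reflect.TypeOf(WhateverConfig{})
for i := 0; i < t.NumField(); i++ {
    f := t.Field(i)
    // print each option's json name alongside its transporter description
    fmt.Printf("--%s\t%s\n", f.Tag.Get("json"), f.Tag.Get("transporter"))
}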

File adapter + Transformer works but emits unlabeled warning

Run any Transformer with the File adapter and the process will work but the console will be full of

unknown type: map[string]interface {}
unknown type: map[string]interface {}
unknown type: map[string]interface {}
unknown type: map[string]interface {}
...

The reason is that the File adapter emits a map. The map is detected by the Transformer's transformOne function. This then calls mejson.Marshal(msg.Data). Mejson has no code to match a map[string]interface{} so it falls through to default...

default:
        fmt.Fprintf(os.Stderr, "unknown type: %T\n", v)
        return json.Marshal(v)
    }

which emits the warning but still marshals the value, allowing the code to work.

Don't know whether to fix in mejson or in the transformer.

How can I make the transporter run as a service/daemon?

Here are the logs:
[tungns@server3 transporter]$ transporter run --config ./test/config.yaml ./test/application.js
Mongo Config {URI:mongodb://xxx:[email protected]:xxxx/nameDB Namespace:nameDB.bar Debug:true Tail:false Wc:0 FSync:false Bulk:false}
setting start timestamp: 6157120426687332352

But when I checked, it does not sync!
Please help me!
Thanks all!

Bug: unable to use a config node for source and sink

The following does not work anymore due to the changes to how the config gets merged between the yaml and JS:

transporter eval --config ./test/config.yaml 'Source({name:"localmongo", namespace: "boom.foo"}).save({name:"localmongo", namespace: "boom.baz"})'

It results in:

Mongo Config {URI:mongodb://localhost/boom Namespace:boom.baz Debug:true Tail:false}
Mongo Config {URI:mongodb://localhost/boom Namespace:boom.baz Debug:true Tail:false}
setting start timestamp: 6100960276138426368

where the namespace is the same for both source and sink.

Message is noisy when no id present

If incoming documents have no id or _id field, the extractID function in message.go will emit an error and the document in full (line 59 of message.go). There's no way to suppress this, so if you are using the transporter to import raw data without ids, with the intent of letting the target database create an id for it, you'll get a lot of errors on stdout. Options:

  • remove the error printing
  • move the error to stderr
  • add a bool noid argument to NewMsg which, when true, skips the id manipulation at line 39 and the generation of the error

I prefer the latter, as it would allow adapter authors to decide on behaviour.

In MongoDB "_id" is a custom string field, when trying to transfer data to ElasticSearch getting parse error on "_id" field.

The MongoDB I'm using has a custom-type "_id" field which is a string representation of a GUID, such as "21EC2020-3AEA-4069-A2DD-08002B30309D".
I've set up the application to get the data transferred to ElasticSearch without any transformations and I'm getting an error:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [_id].
The index in ES has the "_id" field specified as string.
I've been using the River plugin so far and never came across this problem.
So I wonder: is it a problem with the way transporter assumes this field will be an object, or is the fault actually on the ES side?

How can I make the transporter run as a service/daemon?

How can I make the transporter run in background/daemon mode? Right now, after running the transporter command, it quit right after. Here are the logs:
thanhtruong$ ./transporter run --config ./test/config.yaml ./test/aaa.js
Mongo Config {URI:mongodb://localhost/testdb Namespace:testdb.foo Debug:true Tail:false Wc:0 FSync:false Bulk:false}
setting start timestamp: 6112294905285967872
thanhtruong$

Thanks

Elasticsearch bulk insertion error

I can't get transporter to work with mongo. What am I missing? Thanks.

My env

ubuntu 14.04
transporter 0.0.3
mongodb 2.6.10
elasticsearch 1.5.2

My mongo collection

> use boom
switched to db boom
> db.foo.find()
{ "_id" : ObjectId("558463dc13c7ddef81f3bc76"), "name" : "bar" }

My config

nodes:
  mongodb:
    type: mongo
    uri: mongodb://localhost/boom
    namespace: boom.foo
    tail: true
    debug: true
  es:
    type: elasticsearch
    uri: http://localhost:9200/boom
    namespace: boom.foo

My app

pipeline = Source({name:"mongodb"}).save({name:"es"})

Run transporter

$ transporter run --config config.yaml app.js
Mongo Config {URI:mongodb://localhost/boom Namespace:boom.foo Debug:true Tail:true Wc:0 FSync:false Bulk:false}
setting start timestamp: 6162164565128773632

After a while

transporter: CRITICAL: elasticsearch error (Bulk Insertion Error. Failed item count [1])

Elasticsearch error log

[2015-06-19 16:05:42,468][DEBUG][action.bulk              ] [Perseus] [boom][3] failed to execute bulk item (index) index {[boom][foo][AU4NNllCvshBiB_rZsxF], source[{"_id":"558463dc13c7ddef81f3bc76","name":"bar"}]}
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [_id]
    at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:409)
    at org.elasticsearch.index.mapper.internal.IdFieldMapper.parse(IdFieldMapper.java:295)
    at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
    at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:453)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:432)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:149)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:515)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:422)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.index.mapper.MapperParsingException: Provided id [AU4NNllCvshBiB_rZsxF] does not match the content one [558463dc13c7ddef81f3bc76]
    at org.elasticsearch.index.mapper.internal.IdFieldMapper.parseCreateField(IdFieldMapper.java:310)
    at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:399)
    ... 13 more

MongoDB adapter should test oplog access first

It would be useful if the MongoDB adapter tested oplog access before copying the initial data set over to the destination. This would allow it to error out immediately instead of waiting for a potentially large data set to be copied over and then erroring.
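
A minimal mgo-style sketch of such a pre-flight check (the session variable is hypothetical):

// try to read a single oplog entry up front; if this fails,
// tailing will fail too, so error out before the initial copy
var entry bson.M
if err := session.DB("local").C("oplog.rs").Find(nil).One(&entry); err != nil {
    return fmt.Errorf("oplog access check failed: %v", err)
}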

Unable to get in Ubuntu

Hey guys,

I simply can't get/build:

sudo go get -a ./cmd/...
/usr/lib/go/src/pkg/code.google.com/p/goprotobuf/proto/text.go:39:2: no Go source files in /usr/lib/go/src/pkg/encoding

unable to build

You provide a command to build which doesn't appear to be working on at least two of the Macs I have.

$ go version
go version go1.4 darwin/amd64
$ git clone https://github.com/compose/transporter.git
$ cd transporter
$ go build -a ./cmd/...
cmd/transporter/javascript_builder.go:8:2: cannot find package "github.com/compose/transporter/pkg/adaptor" in any of:
    /usr/local/Cellar/go/1.4/libexec/src/github.com/compose/transporter/pkg/adaptor (from $GOROOT)
    ($GOPATH not set)
cmd/transporter/javascript_builder.go:9:2: cannot find package "github.com/compose/transporter/pkg/transporter" in any of:
    /usr/local/Cellar/go/1.4/libexec/src/github.com/compose/transporter/pkg/transporter (from $GOROOT)
    ($GOPATH not set)
cmd/transporter/command.go:7:2: cannot find package "github.com/mitchellh/cli" in any of:
    /usr/local/Cellar/go/1.4/libexec/src/github.com/mitchellh/cli (from $GOROOT)
    ($GOPATH not set)
cmd/transporter/javascript_builder.go:10:2: cannot find package "github.com/nu7hatch/gouuid" in any of:
    /usr/local/Cellar/go/1.4/libexec/src/github.com/nu7hatch/gouuid (from $GOROOT)
    ($GOPATH not set)
cmd/transporter/javascript_builder.go:11:2: cannot find package "github.com/robertkrimen/otto" in any of:
    /usr/local/Cellar/go/1.4/libexec/src/github.com/robertkrimen/otto (from $GOROOT)
    ($GOPATH not set)
cmd/transporter/config.go:9:2: cannot find package "gopkg.in/yaml.v2" in any of:
    /usr/local/Cellar/go/1.4/libexec/src/gopkg.in/yaml.v2 (from $GOROOT)
    ($GOPATH not set)

So I'm either stuck because everything isn't checked in, there's a version conflict with go, or I have no clue what I'm doing. Any guidance would be great!

Thanks for open sourcing this! I've been thinking about how I could take advantage of transporter on compose.io in production but wanted to build and test locally. This will solve the problem of designing and testing locally!
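
The ($GOPATH not set) lines in the output point at the likely cause; a minimal sketch of a fix, assuming a standard Go workspace layout:

$ export GOPATH=$HOME/go
$ mkdir -p $GOPATH/src/github.com/compose
$ mv transporter $GOPATH/src/github.com/compose/
$ cd $GOPATH/src/github.com/compose/transporter
$ go get ./cmd/...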

Custom Adapters are difficult to create, deploy, and maintain

If a user makes a custom adaptor for Transporter, that adaptor requires edits to the registry file. Keeping that maintainable would require a fork. It could be suggested that, rather than a custom adapter, the user develop a command using the library components that are available. It seems, though, that this means losing out on the flexibility of being able to create pipelines in JavaScript, unless you roll your own pipeline generator.

So a user is left with two unenviable choices if they want an adapter and the flexibility of Transporter configuration:

A) fork the Transporter and maintain it with a custom adaptor in place
B) build a command which uses the Transporter libraries and either forego or reimplement the configurable pipeline.

If it were possible to build and install standalone adaptors which were run and managed by the transporter and used some form of IPC/Messaging to communicate with the Transporter pipeline, it would seem to be a more flexible solution.

facing issue while fetching dependencies

I was trying to install transporter on ubuntu 14.04. Earlier it used to work fine. I'm trying on a new machine and after executing the command:

go get -a ./cmd/...

I am getting the error:

go install: no install location for directory /home/mandeep/go/transporter/cmd/transporter outside GOPATH

Namespaces

Hi!
I'm trying to use this library to sync mongodb with elasticsearch, but I get a "malformed mongo namespace" error.
This is my config.yaml:

nodes:
  mongodb:
    type: mongo
    uri: http://mongodb:27017
  es:
    type: elasticsearch
    uri: http://es:9200

and this is my application.js

pipeline = Source({name:"mongodb", namespace:"tf"}).save({name:"es", namespace:"tf"}); 

Obviously I don't understand namespaces (perhaps the lack of documentation plays its part in this).
What are namespaces?

How can I configure transporter to replicate a whole database?

Thanks!
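
For reference, the other issues above all use two-part namespaces of the form database.collection, e.g. (the collection name here is a placeholder):

pipeline = Source({name:"mongodb", namespace:"tf.mycollection"}).save({name:"es", namespace:"tf.mycollection"});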

No Binary Builds available

There are no ready-to-go Transporter binary builds available. There should be, as we'd like people to exercise the Transporter engine without having to install Go, adapt to the build system of Go, git clone, and build.

Error while using tail

Hi,

After a lot of tests, I can't find a way to use the tail option between mongodb and elasticsearch.

I have created the same user on the user database and the system local database, and also tried the admin database.
With this user I can log in and query the local["oplog.rs"] collection using the mongodb shell, but transporter returns an error:

CRITICAL: Mongodb error (error reading collection not authorized for query on local.oplog.rs)

I tried several uri formats; my understanding is that this pattern should work:
mongodb://:@xxxxxx.mongolab.com:12345/local?authSource=db-stats

Could you see if this is an issue or a misconfiguration on my side?
Regards

  • MongoDB is version 2.4.10 hosted on mongolab
  • Darwin Kernel Version 14.3.0: Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64
  • transporter 0.0.3
