sysadminmike / couch-to-postgres Goto Github PK
View Code? Open in Web Editor NEWNode libary to stream CouchDB changes into PostgreSQL
License: BSD 2-Clause "Simplified" License
Node libary to stream CouchDB changes into PostgreSQL
License: BSD 2-Clause "Simplified" License
not sure how to do this
Need help with this as i am node beginner and think code needs some re-factoring
Look at: https://www.npmjs.com/package/forever for keeping client up in case of issue
Do with bulk updates setting doc._deleted flag to true
Should be straight forward to do once bulk function in place - as this can be then done with
json_object_set_key(doc::json, '_deleted'::text, true):
Make a function to help which just gets passed the doc id and rev plus array of fields to keep in the deleted doc so not upset anyone with elastic search couch river (https://github.com/elasticsearch/elasticsearch-river-couchdb - Indexing Databases with Multiple Types)
ie bulk submit needs to be something like:
"docs" : [{
"_id: 2,
"_rev" : "rev",
"_deleted" : true,
"type" : "Person"
}]'
When no changes for a while it can take a while before the daemon notices the next change
Perhaps add a field in checkpoint_settings - when sending stuff to couch check after send
Old way:
http_post(url VARCHAR, params VARCHAR, data VARCHAR, contenttype VARCHAR DEFAULT NULL)
is no good any more
Need to collect up current db schema as its currently on my laptop/head/readme
is it possible use couchdb replicator over the postgres database on couch-to-postgres?
I think the daemon may stop following a _changes feed but think all is ok if couch crashes and is restarted in between a watchdog check.
Still testing and not sure if its the daemon part or library - could possible be the npm 'follow' but not sure.
A restart of the daemon is all that is needed to fix this but can leave postgres out of sync with couch
_add
_enable
_disable
_remove
All need to accept json as get/post request and keep it couchy
_feeds_status - list all feeds - current /_status does this - add more info about feed
_status - change to give status about daemon + postgres connection + watchdog + pg_watchdog and any other global info
I dont think works with _attachments - or is ignoring them - as they are in couch and I think postgres is more use manipulating/generating reports/ad hoc queries on the data rather than dealing with attahments.
Not done any tests with them yet
I've installed couch-to-postgres and the http extension for postgres. When I insert from Postgres to Couchdb, postgres disconnects, however in Couchdb data has been succesfully inserted.
This is the change in the trigger function couchdb_put() I made, because with POST it never worked for me:
SELECT status FROM http_put('http://127.0.0.1:5984/' || TG_TABLE_NAME || '/' || NEW.id::text, NEW.doc::text, 'Content-Type:application/json'::text) INTO RES;
This is the test:
INSERT INTO example (id, doc, from_pg) VALUES ('i', json_object('{_id,myvar}','{i, 100}')::jsonb, true);
This is the message I get from wireshark:
PUT /example/i HTTP/1.1\r\n
this is the message I get from the couchdb log:
Sun, 04 Oct 2015 16:29:33 GMT] [info] [<0.439.0>] 127.0.0.1 - - PUT /example/i 201
and this is the postgres log:
2015-10-04 10:33:35 CST [1727-3] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally an
d possibly corrupted shared memory.
2015-10-04 10:33:35 CST [1727-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2015-10-04 10:33:35 CST [709-20] LOG: all server processes terminated; reinitializing
2015-10-04 10:33:35 CST [1770-1] LOG: database system was interrupted; last known up at 2015-10-04 10:29:34 CST
2015-10-04 10:33:35 CST [1770-2] LOG: database system was not properly shut down; automatic recovery in progress
2015-10-04 10:33:35 CST [1770-3] LOG: record with zero length at 0/18146D0
2015-10-04 10:33:35 CST [1770-4] LOG: redo is not required
2015-10-04 10:33:35 CST [1770-5] LOG: MultiXact member wraparound protections are now enabled
Versions:
postgresql-9.4
couchdb 1.6.1
pgsql-http 1.1
I would appreciate any help from you, thank you very much in advance!!
G.Carranza
Replace put function with https://github.com/jchris/hovercraft - needs erlang extension For postgres like PL/Perl - any one know if this exists as i think it would be quite simple to embed hovercraft in postgres then and should then be possible to do proper transactions and do very large updates (i havnt tried more than a few dozen docs at the moment) - oif no pl/erlang then perhaps using pl/sh - https://github.com/petere/plsh
cat ~/.bashrc | erl -noshell -s rot13 rot13 | wc
http://www.erlang.org/faq/how_do_i.html
and do something like:
http://www.softwarepassion.com/importing-data-to-couchdb-java-ruby-and-erlang-way/
Maybe have 2 options so one for when postgres and couch are on the same machine and can communicate via pipes (for bulk updates I am sure this will be the fastest method without PL/Erlang plus less bits to go wrong and maybe with exit codes from the shell transactions could be possible) and another version for calls over http.
Need to modify the sql in the library and trigger.
Update readme accordingly
I dont think its possible not to do this without somehow identifying to the trigger where the query came from
I worked through getting this to work on my desktop with PG and couch on our server.
When I copy the folder( couch-to-Postgres) to my server (same server as pg and couch) it will not sync.
I get this on the server. there is blue text that starts with "follow:" This text was green when I first ran it.
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug } +9ms
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug JSON: '{"seq":"25-g1AAAACTeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEhFyjAbmxqaGiRYpzCwFmal5KalpmXmoJHex4LkGRoAFL_oaZIQkyxNDNISjbBpi8LAJ9ZLRw","id":"wo::1010","changes":[{"rev":"2-779eb03dff0f507cbedf9dd44b5c1a24"}],"deleted":true,"doc":{"_id":"wo::1010","_rev":"2-779eb03dff0f507cbedf9dd44b5c1a24","_deleted":true}}' +19ms
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug Object: {
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug seq: '25-g1AAAACTeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEhFyjAbmxqaGiRYpzCwFmal5KalpmXmoJHex4LkGRoAFL_oaZIQkyxNDNISjbBpi8LAJ9ZLRw',
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug id: 'wo::1010',
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug changes: [ { rev: '2-779eb03dff0f507cbedf9dd44b5c1a24' } ],
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug deleted: true,
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug doc: {
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug _id: 'wo::1010',
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug _rev: '2-779eb03dff0f507cbedf9dd44b5c1a24',
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug _deleted: true
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug }
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug } +3ms
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug JSON: '{"seq":"27-g1AAAACTeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEhFyjAbmxqaGiRYpzCwFmal5KalpmXmoJHex4LkGRoAFL_oaZIQ0yxNDNISjbBpi8LAJ-dLR4","id":"wo::12365","changes":[{"rev":"2-e0d0a9a121da8776801f59ccea7cc691"}],"deleted":true,"doc":{"_id":"wo::12365","_rev":"2-e0d0a9a121da8776801f59ccea7cc691","_deleted":true}}' +6ms
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug Object: {
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug seq: '27-g1AAAACTeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEhFyjAbmxqaGiRYpzCwFmal5KalpmXmoJHex4LkGRoAFL_oaZIQ0yxNDNISjbBpi8LAJ-dLR4',
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug id: 'wo::12365',
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug changes: [ { rev: '2-e0d0a9a121da8776801f59ccea7cc691' } ],
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug deleted: true,
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug doc: {
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug _id: 'wo::12365',
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug _rev: '2-e0d0a9a121da8776801f59ccea7cc691',
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug _deleted: true
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug }
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug } +13ms
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug emit: data +3ms
follow:http://administrator:[email protected]:5984/tcsoffice:debug Data from 2021-03-26T20:33:11.106Z +2ms
follow:http://administrator:[email protected]:5984/tcsoffice:debug Req 2021-03-26T20:33:11.106Z timeout=37500, inactivity=30000: http://192.168.0.12:5984/tcsoffice +11ms
example: Starting checkpointer
example: Checkpoint 1 is current next check in: 120 seconds
follow:http://administrator:[email protected]:5984/tcsoffice:debug Req 2021-03-26T20:33:11.106Z made no changes for 30.002s +30s
follow:http://administrator:[email protected]:5984/tcsoffice:debug Stop +1ms
follow:http://administrator:[email protected]:5984/tcsoffice:debug Destroying req 2021-03-26T20:33:11.106Z +4ms
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug destroy +1ms
follow:http://administrator:[email protected]:5984/tcsoffice:debug Checking database: http://192.168.0.12:5984/tcsoffice +4ms
follow:stream http://administrator:[email protected]:5984/tcsoffice:debug write: { data: '', buf: '' } +8ms
follow:http://administrator:[email protected]:5984/tcsoffice:debug Confirmed database: http://192.168.0.12:5984/tcsoffice +7ms
example: {"db_name":"tcsoffice","purge_seq":"0-g1AAAABXeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUnlsQBJhgYg9R8IshIZ8KhNZEiqhyjKAgBm5Rxs","update_seq":"151-g1AAAABXeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUnlsQBJhgYg9R8IshL78KhNZEiqByvizAIAfJkdAw","sizes":{"file":467308,"external":95,"active":25024},"props":{"partitioned":true},"doc_del_count":39,"doc_count":1,"disk_format_version":8,"compact_running":false,"cluster":{"q":2,"n":1,"w":1,"r":1},"instance_start_time":"0"}
follow:http://administrator:[email protected]:5984/tcsoffice:debug Feed query 2021-03-26T20:33:41.278Z: http://192.168.0.12:5984/tcsoffice/_changes?since=11-g1AAAACTeJzLYWBgYMpgTmEQTM4vTc5ISXLIyU9OzMnILy7JAUklMiTV____PyuDOZEhFyjAbmxqaGiRYpzCwFmal5KalpmXmoJHex4LkGRoAFL_oaZwQ0yxNDNISjbBpi8LAJ19LQ4&feed=continuous&heartbeat=30000&include_docs=true +2ms
follow:http://administrator:[email protected]:5984/tcsoffice:debug Remove feed from agent pool: 2021-03-26T20:33:41.278Z +9ms
follow:http://administrator:[email protected]:5984/tcsoffice:debug Good response: 2021-03-26T20:33:41.278Z +1ms
follow:http://administrator:[email protected]:5984/tcsoffice:debug Req 2021
Function needs to deal with status code returns from http_post
Also use POST not PUT requests so no need to edit pgsql-http extension before compilation.
ie implement:
post_docs(docs,chunk_size) - returning recordset of status codes? or just true/false?
web interface to add/remove enable/disable reset initial_since checkpoint restart stream etc
Do a wiki/readme page of example sql queries eg:
SELECT DISTINCT doc->>'type' as doctype, count(doc->>'type')
FROM mytable GROUP BY doctype ORDER BY doctype
And any things which might bite like the ORDER BY issue
we have postgres or couch to dump some statistics about the feeds to
eg inserts/updates/deletes a sec / min / hr
should be pretty easy to pump to couch and use elastic search + kibrana for some pretty graphs
http://www.craigkerstiens.com/2015/05/08/upsert-lands-in-postgres-9.5/
replace the 'select count' and insert or update stuff in update function to make just one request of an upsert to postgres for new/changes to docs - i am guessing but is possible there could be a real performance boost by doing this ... need to test
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.