codex-digital / cypher-stream
Neo4j Cypher queries as Node.js object streams
I've been doing some preliminary benchmarking to determine the performance impact of stream-parsing vs waiting for the complete result set to load.
I took CypherStream, replaced oboe with request, and ran some sieges against a REST endpoint that runs a Cypher query and returns the results.
Jump to the bottom to see the conclusion if you're not interested in the numbers.
Transactions: 10177 hits
Availability: 100.00 %
Elapsed time: 59.45 secs
Data transferred: 96.36 MB
Response time: 0.09 secs
Transaction rate: 171.19 trans/sec
Throughput: 1.62 MB/sec
Concurrency: 15.25
Successful transactions: 10177
Failed transactions: 0
Longest transaction: 1.00
Shortest transaction: 0.00
Transactions: 7705 hits
Availability: 100.00 %
Elapsed time: 59.54 secs
Data transferred: 69.86 MB
Response time: 0.27 secs
Transaction rate: 129.41 trans/sec
Throughput: 1.17 MB/sec
Concurrency: 35.38
Successful transactions: 7705
Failed transactions: 0
Longest transaction: 0.91
Shortest transaction: 0.01
Transactions: 228 hits
Availability: 87.69 %
Elapsed time: 59.57 secs
Data transferred: 3.91 MB
Response time: 13.25 secs
Transaction rate: 3.83 trans/sec
Throughput: 0.07 MB/sec
Concurrency: 50.71
Successful transactions: 228
Failed transactions: 32
Longest transaction: 30.50
Shortest transaction: 0.12
Transactions: 329 hits
Availability: 95.36 %
Elapsed time: 59.95 secs
Data transferred: 6.93 MB
Response time: 11.61 secs
Transaction rate: 5.49 trans/sec
Throughput: 0.12 MB/sec
Concurrency: 63.69
Successful transactions: 329
Failed transactions: 16
Longest transaction: 30.50
Shortest transaction: 0.04
Stream-parsing small, fast data sets doesn't make sense. Since this is likely the most common use case, I don't think the current implementation makes sense as a default, generic database-querying mechanism.
Some options I can think of:
Query results that require heavy I/O processing could also make sense since you could do the processing eagerly and in parallel. Maybe backpressure would be useful in this case too (I have yet to test whether back-pressure actually works).
I'm assuming the slowdown is due to CPU bottlenecking. I'm not sure if multiple cores are being taken advantage of currently.
When running a basic test example, I get this error.
{ [Error: No operations allowed until you send an INIT message successfully.] message: 'No operations allowed until you send an INIT message successfully.', code: 'Neo.ClientError.Request.Invalid' }
var cypher = require('cypher-stream')('bolt://localhost','neo4j','neo4j')
cypher('MATCH (n:User) RETURN n LIMIT 1')
  .on('data', function(result){
    console.log(result)
  })
  .on('error', function(error){
    console.log(error)
  })
Installed using:
npm install [email protected]
Am I missing something really obvious here? I don't see anything that requires anything else. I'm running neo4j 3.0.4 if that makes a difference as well
Hey @brian-gates, I'm working on thingdom/node-neo4j#143 now, specifically the design of the transactional API that'd wrap this guy. =)
Thinking about a need / use case we at @fiftythree have in our own app, one thing we'd need is the ability to differentiate results across the various queries that make up a transaction.
AFAICT right now, the current cypher-stream design doesn't differentiate. A transaction is a single stream of results, with data events that don't tell you which query (statement) each result corresponds to.
For example, if the caller doesn't know in advance how many results a query will give, they don't know when the results for one query end and the next begin.
What are your thoughts for how to achieve that?
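One possible direction, sketched under assumptions: Neo4j's transactional HTTP endpoint returns a results array with one entry per statement, in order, so each row could be tagged with its statement index before it is emitted. The helper name tagRowsByStatement and the emitted object shape are hypothetical and not part of cypher-stream.

```javascript
// Hypothetical helper, not part of cypher-stream.
// Input: the parsed JSON body of a transactional endpoint response.
function tagRowsByStatement(body) {
  const tagged = [];
  body.results.forEach((result, statementIndex) => {
    for (const entry of result.data) {
      // Zip column names with row values to build a plain object.
      const record = {};
      result.columns.forEach((column, i) => {
        record[column] = entry.row[i];
      });
      tagged.push({ statement: statementIndex, record });
    }
  });
  return tagged;
}

// Illustrative body for a transaction with two statements.
const sampleBody = {
  results: [
    { columns: ['name'], data: [{ row: ['Ann'] }, { row: ['Bob'] }] },
    { columns: ['total'], data: [{ row: [2] }] },
  ],
  errors: [],
};

const taggedRows = tagRowsByStatement(sampleBody);
console.log(taggedRows);
```

With a tag like this on every data event, a caller can tell where one statement's results stop and the next begin without knowing the row counts in advance.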
I'm running a server-side process that web clients connect to via socket.io.
A cypherStream(url) instance is created once on server startup, and queries are issued on it in response to the web clients.
It works fine when the server process is just started. When I reconnect the client, it gives errors.
Getting the following errors (attaching to the error event of cypher-stream):
{ [Error: write after end] statusCode: undefined, body: undefined, jsonBody: undefined }
followed by a stream of [Error: stream.push() after EOF]
Even when I do not connect the cypher stream to the client (so it stays serverside only), I get these errors.
Trying to figure out what underlying stream is having the issue....
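For reference, "write after end" is the error Node.js streams report when something writes to a stream that has already been ended. That would fit a stream being reused across client connections, but that diagnosis is an assumption and is not confirmed in this thread. A minimal reproduction with a plain PassThrough stream, with no cypher-stream involved:

```javascript
const { PassThrough } = require('stream');

// Ends a stream, then writes to it again, and resolves with the
// error that Node.js emits for the second write.
function writeAfterEnd() {
  return new Promise((resolve) => {
    const stream = new PassThrough();
    stream.on('error', resolve);
    stream.end('first query');    // the stream is finished from here on
    stream.write('second query'); // reuse after end triggers the error
  });
}

writeAfterEnd().then((err) => console.log(err.message));
```

If that is what is happening here, creating a fresh stream per query rather than holding one across reconnects would be the thing to try.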
I am getting an error with Neo4J 2.3 and Node 4.2.2
[email protected] test /Users/redpanda/work/github/cypher-stream
make test
./node_modules/.bin/mocha -b
(node) child_process: options.customFds option is deprecated. Use options.stdio instead.
․․
1 passing (656ms)
1 failing
expected 'AssertionError
+ expected - actual
+"Error: Query Failure: Invalid input 'i': expected <init> (line 1, column 1)\n\"invalid query\"\n ^"
-"AssertionError: expected 'Error: Query Failure: Invalid input \\'i\\': expected <init> (line 1, column 1 (offset: 0))\\n\"invalid query\"\\n ^' to be 'Error: Query Failure: Invalid input \\'i\\': expected <init> (line 1, column 1)\\n\"invalid query\"\\n ^'"
at Assertion.prop.(anonymous function) [as exactly] (/Users/redpanda/work/github/cypher-stream/node_modules/should/lib/should.js:60:14)
at CypherStream.<anonymous> (/Users/redpanda/work/github/cypher-stream/test/cypher-stream-test.js:50:30)
at emitOne (events.js:77:13)
at CypherStream.emit (events.js:169:7)
at CypherStreamHandleFailure (/Users/redpanda/work/github/cypher-stream/CypherStream.js:182:12)
at applyEach (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:496:20)
at Object.emit (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:1929:10)
at /Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:2323:33
at apply (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:171:14)
at /Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:2300:10
at applyEach (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:496:20)
at emit (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:1929:10)
at emitMatchingNode (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:2104:7)
at /Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:2149:13
at applyEach (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:496:20)
at emit (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:1929:10)
at nodeClosed (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:1494:7)
at /Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:1048:19
at applyEach (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:496:20)
at emit (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:1929:10)
at handleData (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:761:14)
at applyEach (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:496:20)
at Object.emit (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:1929:10)
at IncomingMessage.<anonymous> (/Users/redpanda/work/github/cypher-stream/node_modules/oboe/dist/oboe-node.js:1128:34)
at emitOne (events.js:77:13)
at IncomingMessage.emit (events.js:169:7)
at IncomingMessage.Readable.read (_stream_readable.js:360:10)
at flow (_stream_readable.js:743:26)
at resume_ (_stream_readable.js:723:3)
at doNTCallback2 (node.js:441:9)
at process._tickCallback (node.js:355:17)
make: *** [test] Error 1
npm ERR! Test failed. See above for more details.
Hi there,
Fantastic library! I'm the author/maintainer of node-neo4j, looking to simplify and experiment. (Especially since Neo4j 2.0 embraces Cypher.) This lib was recommended to me by @wfreeman. (Thanks Wes!)
I'm trying it out in our main backend service at @fiftythree, and one thing I noticed was that the stream emits an end event even in the case of errors. This strikes me as unusual, but after some fair bit of googling, doc reading, and experimentation, I'm not able to confirm that.
I can just say that as a caller, if I want to wrap the streaming in a callback-based API, I can't simply say callback(error) on error and callback(null, results) on end; I have to keep track of whether an error happened or not myself. I would expect the stream to do that, but am I wrong?
Thanks for the consideration, and great work again!
Cheers,
Aseem
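The bookkeeping described above (remembering whether an error already fired, so that a later end event does not trigger a second callback) can be sketched as follows. The collect helper is hypothetical, and a plain EventEmitter stands in for the query stream so the example runs without a database.

```javascript
const { EventEmitter } = require('events');

// Hypothetical wrapper: collects results and calls back exactly once,
// even if the stream emits 'end' after 'error'.
function collect(stream, callback) {
  const results = [];
  let failed = false;
  stream.on('data', (result) => results.push(result));
  stream.on('error', (error) => {
    if (failed) return; // report only the first error
    failed = true;
    callback(error);
  });
  stream.on('end', () => {
    if (!failed) callback(null, results);
  });
}

// Stand-in for a query stream that errors and then still ends.
const fake = new EventEmitter();
const calls = [];
collect(fake, (error, results) => calls.push({ error, results }));
fake.emit('data', { n: 1 });
fake.emit('error', new Error('Query Failure'));
fake.emit('end');
console.log(calls.length);
```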
An important use case for us at @fiftythree is to pass custom headers in the underlying requests. (And they vary across requests, so it's not just a static default.)
Examples of how we use those:
driver/version), btw, so cypher-stream should do this by default I think (e.g. cypher-stream/0.2.1), but we're also thinking of expanding this to include our app's name, since we'll soon be having two apps talking to the same Neo4j database. (That'll let us differentiate and understand our load across the two.)
User_getByEmail), which we then log.
(Example screenshot of the last case.)
So it'd be really valuable to be able to pass custom headers along to Cypher requests.
WDYT of supporting these alternate signatures then?
var cypher = require('cypher-stream')({
  url: 'http://localhost:7474',
  headers: {...} // default headers, if you want to set any; e.g. User-Agent
});
cypher({
  query: 'match (user:User {email: {email}}) return user',
  params: {email: '[email protected]'},
  headers: {...} // e.g. X-Query-Name
})
  .on('data', function (result){
    console.log(result.user.first_name);
  })
  .on('end', function() {
    console.log('all done');
  })
;
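A rough sketch of how the two levels of headers in the proposal above might be combined, assuming per-query headers should win over the defaults. The mergeHeaders name and the lowercasing of header names are my own assumptions, not existing cypher-stream behavior.

```javascript
// Hypothetical helper: per-query headers override the factory defaults.
// HTTP header names are case-insensitive, so keys are lowercased
// to keep 'User-Agent' and 'user-agent' from coexisting.
function mergeHeaders(defaults = {}, perQuery = {}) {
  const merged = {};
  for (const source of [defaults, perQuery]) {
    for (const [name, value] of Object.entries(source)) {
      merged[name.toLowerCase()] = value;
    }
  }
  return merged;
}

const headers = mergeHeaders(
  { 'User-Agent': 'cypher-stream/0.2.1' },
  { 'X-Query-Name': 'User_getByEmail' }
);
console.log(headers);
```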
Neo4j 2.0 supports a new "transactional Cypher" endpoint which, importantly, returns leaner JSON: just property data, no more hyperlinks. You're already returning just the data, not extracting native node IDs or similar, so in theory this shouldn't lose any functionality.
Would you be open to updating to this? I'd be happy to give a pull request a stab if so. The only thing is that I don't have any experience with Oboe, so it might take me some time. =)
One thing to note is that error handling will get a bit more complex now, because errors will no longer result in a different HTTP response.
http://docs.neo4j.org/chunked/stable/rest-api-transactional.html#rest-api-handling-errors
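To make the error-handling point concrete: the transactional endpoint replies with a JSON body holding results and errors arrays, and a failed statement shows up as an entry in errors, so the client has to inspect the body instead of relying on the HTTP status. A minimal sketch of that check follows. The firstError name is hypothetical, and the sample body (including its error code) is illustrative only.

```javascript
// Hypothetical helper: turns the first entry of the response's
// "errors" array into an Error, or returns null if there is none.
function firstError(body) {
  if (!body.errors || body.errors.length === 0) return null;
  const { code, message } = body.errors[0];
  const error = new Error(message);
  error.code = code;
  return error;
}

// Illustrative response body for a statement that failed to parse.
const failedBody = {
  results: [],
  errors: [
    { code: 'Neo.ClientError.Statement.InvalidSyntax', message: "Invalid input 'i'" },
  ],
};

const failure = firstError(failedBody);
console.log(failure.code, failure.message);
```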
Hello,
The username/password have to be passed in a single hash like { username: username, password: password }, not 2 separate arguments like it says in the documentation.
Hi,
Can you tell me, how can I pass authentication details using cypher-stream?
For example, if I have to use neo4j driver I could do something like
neo4j.driver("bolt://localhost", neo4j.auth.basic("neo4j", "neo4j"));
Thanks,
I'd recommend tagging your final versions, which allows npm and other package managers to distinguish semver updates.
Quick bug first: extractData recursively calls itself if an object has a data property, so this means it'll incorrectly drop legitimate property data if there happens to be a property named data. =)
But wait! That bug's not worth fixing, because (I think) you don't even need that function anymore, since the transactional endpoint returns just property data by default now. So you should just remove it.
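To illustrate the bug described above, here is a hypothetical reimplementation of the recursive unwrapping. The real extractData may differ in its details, and the sample node is made up.

```javascript
// Hypothetical reimplementation for illustration only.
function extractData(value) {
  if (Array.isArray(value)) return value.map(extractData);
  if (value && typeof value === 'object') {
    // Unwraps ANY object with a "data" key, including property maps.
    if ('data' in value) return extractData(value.data);
    const out = {};
    for (const [key, v] of Object.entries(value)) out[key] = extractData(v);
    return out;
  }
  return value;
}

// REST-style node wrapper whose own properties include one named "data".
const node = {
  self: 'http://localhost:7474/db/data/node/1',
  data: { name: 'Ann', data: 'payload' },
};

const extracted = extractData(node);
console.log(extracted);
```

The first unwrap is the intended one, but the second treats the node's properties as another wrapper, so the name property is lost and only the value of the data property survives.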
But on that note, I'd like to request an option to return the REST format instead of the lean property-only format. Having node and relationship metadata is needed for ORM/OGM-type libraries.
Easy enough to request it:
This could potentially be just another option to add to #10's options-based API, e.g. format.
WDYT? SGTY? ORLY?