honeycombio / beeline-nodejs
Legacy instrumentation for node.js applications with Honeycomb
Home Page: https://honeycomb.io
License: Apache License 2.0
context doesn't seem to get propagated for native node async/await usage.
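For reference, here is a minimal sketch of what "context propagated across await" looks like, using Node's built-in AsyncLocalStorage (only available in Node releases newer than the ones discussed in these issues). This is not how the beeline tracks context; traceStore, handleRequest and runWithTrace are hypothetical names used only for illustration.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

// Hypothetical store holding per-request trace context.
const traceStore = new AsyncLocalStorage();

// Simulates an async step that is awaited natively.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Reads the context after an await. If propagation works, the id set
// by the caller is still visible here.
async function handleRequest() {
  await delay(5);
  const ctx = traceStore.getStore();
  return ctx ? ctx.traceId : undefined;
}

// Runs handleRequest inside a context carrying the given trace id.
function runWithTrace(traceId) {
  return traceStore.run({ traceId }, () => handleRequest());
}

runWithTrace('trace-a').then((id) => console.log('seen after await:', id));
```

Running two of these concurrently with different ids is a quick way to check that one request's context never leaks into another.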
There appears to be an issue with the mongodb instrumentation where promises don't return any value. This means any promise payloads and errors get swallowed.
See https://zeit.co/blog/streaming-server-rendering-at-spectrum
In general node streams are an interesting problem. Do we want to generate an event for every on('data') callback? What about the push/source side of things?
not everyone uses mysql :)
Should walk through the basics of shimming, how/where to monkeypatch in, etc.
Also should 💯 give some tips on testing
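As a starting point for such a walkthrough, here is a minimal sketch of shimming a single method. It is a generic monkeypatching pattern, not the beeline's actual shim code; shim, fakeDb and the hook names are hypothetical.

```javascript
// Wraps obj[name] so that `before` and `after` hooks run around the
// original function. Returns a function that restores the original.
function shim(obj, name, { before, after }) {
  const original = obj[name];
  if (typeof original !== 'function') {
    throw new TypeError(`no original function ${name} to wrap`);
  }
  obj[name] = function shimmed(...args) {
    if (before) before(name, args);
    // Preserve `this` so methods keep working after being wrapped.
    const result = original.apply(this, args);
    if (after) after(name, result);
    return result;
  };
  return function unshim() {
    obj[name] = original;
  };
}

// Hypothetical module standing in for a library being instrumented.
const fakeDb = {
  query(sql) {
    return `rows for ${sql}`;
  },
};

const calls = [];
const unshim = shim(fakeDb, 'query', {
  before: (name, args) => calls.push({ name, args }),
});

fakeDb.query('SELECT 1'); // recorded in `calls`
unshim(); // later calls are no longer recorded
```

Returning an unshim function is one testing tip in itself: each test can patch a fake module, assert on what the hooks saw, and restore the original in teardown.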
honeycombio/honeycomb-rails sends to two datasets: dataset and db_dataset. For the nodejs stuff we likely need (at least 3): app, db, and services, where services is requests made to non-db external resources.
There's no reason we couldn't create the traces in the http/https instrumentations. That would let other http frameworks possibly gain the ability to trace, even if we don't handle the framework itself.
I say "possibly" because there's no telling if the async context will get propagated everywhere it needs to be.
The mongodb instrumentation now supports wrapping methods that don't take callbacks, and also supports methods that return promises. Should be pretty easy to add listIndexes support now.
Would it make sense for this instrumentation to automatically send memory usage and uptime?
There's an example in libhoney-js of sending memory in use. It sends rss_before, heapTotal_before, heapUsed_before
The golang-gatekeeper sends meta.memory_inuse
Also in the golang-gatekeeper sample uptime is sent as meta.process_uptime_sec
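A rough sketch of what collecting these could look like with Node's process APIs. The meta.* names are the ones quoted above from the golang-gatekeeper; mapping heapUsed to meta.memory_inuse and the remaining field names are assumptions, and processMetaFields is a hypothetical helper, not part of the beeline.

```javascript
// Builds memory and uptime fields from Node's process APIs.
function processMetaFields() {
  const mem = process.memoryUsage();
  return {
    // Assumption: heapUsed is the closest match to "memory in use".
    'meta.memory_inuse': mem.heapUsed,
    // process.uptime() returns seconds as a float.
    'meta.process_uptime_sec': process.uptime(),
    // Raw values, similar to what the libhoney-js example sends.
    rss: mem.rss,
    heapTotal: mem.heapTotal,
    heapUsed: mem.heapUsed,
  };
}

console.log(processMetaFields());
```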
It'd be nice to pick some eslint conventions and add a config for them. Should we use a config similar to the libhoney-js one?
while honeycomb-nodejs-magic is all kinds of :jazzhands:, it's not a good name. Bikeshedding here, please.
Suggestions so far:
libhoney-auto-node
node <v8 has another api available which might be enough for what we need: async_wrap.
see: http://blog.trevnorris.com/2015/02/asyncwrap-tutorial-introduction.html
I expect chances are pretty good, since we don't really use that much in the way of async_hook functionality.
The output from this issue doesn't need to be an async_wrap implementation. It could just be possible/impossible.
In #72 (Automatically send memory usage and uptime) there are a few fields mentioned that might make sense to add to beeline-nodejs.
I can see adding suggested fields could become difficult as there are a growing number of beelines.
Perhaps it would make sense to have a beeline-specs repo that can be a reference of fields that are shared between beelines?
I suggest this as it's what OpenCensus have done: https://github.com/census-instrumentation/opencensus-specs and they have a similar challenge, writing instrumentation libraries and exporters in many languages, attempting to all conform to the same spec.
Our planned use of Honeycomb appears to be storing everything in a single dataset, as this is currently required to use cross-service tracing. Thus we'd want to make sure to be consistent in our field names.
This might end up being our first top-level instrumentation that is not a web framework. It's useful in both modes, though (pushing things onto the queue vs pulling them off.)
Possible middleware targets:
If we can use this to decorate events with usernames/userids automatically, 🔥
right now it looks like it's invoked at the top-level, which it most definitely should not be.
craft an actual example, and explain better how it behaves (and also what it adds to the event!)
It would be nice to be able to send traces from my OT-instrumented code directly to the HC mothership without requiring the zipkin proxy to translate.
Hi folks,
We're trying to use beeline-nodejs in a service, but are blocked because there is an nsp check failure. Looks like libhoney uses superagent, which has a vulnerability.
Also, when running npm audit there are a number of vulnerabilities listed.
Could you bump these dependencies so we can give this a go?
Thanks!
(+) 1 vulnerability found
┌────────────┬─────────────────────────────────────────────────────────────┐
│            │ Large gzip Denial of Service                                │
├────────────┼─────────────────────────────────────────────────────────────┤
│ Name       │ superagent                                                  │
├────────────┼─────────────────────────────────────────────────────────────┤
│ CVSS       │ 3.7 (Low)                                                   │
├────────────┼─────────────────────────────────────────────────────────────┤
│ Installed  │ 2.3.0                                                       │
├────────────┼─────────────────────────────────────────────────────────────┤
│ Vulnerable │ <3.7.0                                                      │
├────────────┼─────────────────────────────────────────────────────────────┤
│ Patched    │ >=3.7.0                                                     │
├────────────┼─────────────────────────────────────────────────────────────┤
│ Path       │ [email protected] > [email protected] > [email protected] > │
│            │ [email protected]                                           │
├────────────┼─────────────────────────────────────────────────────────────┤
│ More Info  │ https://nodesecurity.io/advisories/479                      │
└────────────┴─────────────────────────────────────────────────────────────┘
API.md is more reference than guide (even with the examples.)
There's probably room for a guide showing just the parts of the API that are likely to be useful for app developers (i.e. missing things like addContext), and with more "you should do it this way" than "here's a selection of artisanally crafted API for your consideration."
Also either more fleshed out examples or a link to one or more repos with example apps.
Our main app server is a GraphQL service using apollo. It would be super awesome if the node beeline was GQL aware and could pull out events and attributes like resolver paths, resolver execution times, etc. Apollo-server includes some built-in tracing support, which it exposes via the extensions section in the GQL response. I'm not sure if that would be helpful or not.
in my little test app I have a single app.get:
app.get('/', (req, res) => { ... });
which works. If I try to load /hello, though, there is no 404 logged.
if express is behind a load balancer of some sort, chances are there's a request id header we can grab and use instead of generating our own.
we should also add a meta.request_id_source that indicates where we sourced it.
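A sketch of how that lookup could work. The header names and the meta.request_id field name are assumptions for illustration (only meta.request_id_source is proposed above), and requestIdFields is a hypothetical helper.

```javascript
const crypto = require('crypto');

// Candidate headers, checked in order. These names are assumptions;
// the right list depends on which load balancer sits in front.
const REQUEST_ID_HEADERS = ['x-request-id', 'x-amzn-trace-id'];

// Takes a headers object keyed by lowercase name (as Node's
// req.headers is) and returns the id plus where it came from.
function requestIdFields(headers) {
  for (const name of REQUEST_ID_HEADERS) {
    const value = headers[name];
    if (typeof value === 'string' && value.length > 0) {
      return {
        'meta.request_id': value,
        'meta.request_id_source': `${name} http header`,
      };
    }
  }
  // Nothing usable upstream, so generate our own.
  return {
    'meta.request_id': crypto.randomBytes(16).toString('hex'),
    'meta.request_id_source': 'generated',
  };
}

console.log(requestIdFields({ 'x-request-id': 'abc-123' }));
console.log(requestIdFields({}));
```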
q is another promise library. Less used than bluebird, but given the ease of writing these passthroughs, 👍
Our nodejs stack uses koa rather than express, so we don't benefit from the automagic instrumentation of the request pipeline..
prettier is pretty opinionated.
We've set them up to fight each other forever.
lint-staged forces prettier to change package.json even if it's listed in .prettierignore, so all commits will have long package-lock.json lines wrapped and short package.json lists collapsed into a single line.
When we check out and npm install, npm reverses the change. And, so, ad infinitum.
If I try to add the NodeJS beeline to https://github.com/developit/express-es6-rest-api - I get an error message:
The following modules were required before honeycomb-beeline: express
These modules will not be instrumented. Please ensure honeycomb-beeline is required first.
But the src/index.js looks like this:
require('honeycomb-beeline')({
writeKey: '<REDACTED>',
datasetName: 'nathanleclaire.examples-node-beeline',
});
import http from 'http';
import express from 'express';
import cors from 'cors';
import morgan from 'morgan';
import bodyParser from 'body-parser';
import initializeDb from './db';
import middleware from './middleware';
import api from './api';
import config from './config.json';
This should work, right?
Beeline assumes that, within a trace, all actions are sequential, and keeps an internal stack of spans to generate parent-child relationships. This assumption breaks down with e.g. concurrent database queries. Is it possible to trace those events and preserve the correct span containment?
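One possible direction, sketched under heavy assumptions: have each span capture its parent explicitly when it starts, instead of reading the top of a shared stack. None of these names (startSpan, finishSpan, tracedQuery) are beeline APIs; this only illustrates why explicit parents survive concurrency.

```javascript
let nextId = 1;
const finished = [];

// Starts a span whose parent is passed explicitly rather than taken
// from the top of a shared stack.
function startSpan(name, parent) {
  return { id: nextId++, name, parentId: parent ? parent.id : null };
}

function finishSpan(span) {
  finished.push(span);
}

// Simulates a database query that takes `ms` milliseconds.
async function tracedQuery(name, parent, ms) {
  const span = startSpan(name, parent);
  await new Promise((resolve) => setTimeout(resolve, ms));
  finishSpan(span);
  return span;
}

// Two queries run concurrently; both keep the request as parent even
// though they start and finish in interleaved order.
async function handle() {
  const root = startSpan('request', null);
  const [a, b] = await Promise.all([
    tracedQuery('query-a', root, 10),
    tracedQuery('query-b', root, 5),
  ]);
  finishSpan(root);
  return { root, a, b };
}

handle().then(({ root, a, b }) => {
  console.log(root.id, a.parentId, b.parentId);
});
```

With a single span stack, query-b would have been pushed while query-a was still on top and would show up as its child; passing the parent explicitly avoids that.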
can mostly c&p this from the documentation of each library (express, mysql2, etc.)
I have a separate demos repo. Should we include an examples/ dir with some/one of them?
Node 10.0.0 - 10.4.0 breaks async hooks wrt Promise usage.
We need the extra promise instrumentation only in that case. Doing a simple version check should be sufficient (or do we care since it's a really simple case?)
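The simple version check could look something like this. It assumes the range is inclusive at both ends, and needsPromiseInstrumentation is a hypothetical name.

```javascript
// Returns true when the Node version string falls in the
// 10.0.0 - 10.4.0 range (assumed inclusive at both ends).
function needsPromiseInstrumentation(version = process.versions.node) {
  const [major, minor, patch] = version.split('.').map(Number);
  if (major !== 10) return false;
  // 10.0.x through 10.3.x, plus exactly 10.4.0.
  return minor < 4 || (minor === 4 && patch === 0);
}

console.log(needsPromiseInstrumentation());
```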
I'd also really rather not litter up instrumentation files with millions of version checks (both for nodejs and for the packages being instrumented.)
should we have minimum-version file names? ex:
lib/instrumentation/express-4.16.3.js
lib/instrumentation/express-4.0.0.js
lib/instrumentation/express.js
where we pick all that match and apply them (in some order)? ugh.
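To make the idea concrete, a sketch of picking the files that apply to an installed version, lowest minimum first. The ordering and the helper names (parseName, selectInstrumentations) are assumptions, not a decided design.

```javascript
// Parses "express-4.16.3.js" into { file, pkg, min: [4, 16, 3] }.
// A bare "express.js" gets a minimum of [0, 0, 0].
function parseName(file) {
  const m = /^(.+?)(?:-(\d+)\.(\d+)\.(\d+))?\.js$/.exec(file);
  if (!m) return null;
  const min = m[2] === undefined ? [0, 0, 0] : [m[2], m[3], m[4]].map(Number);
  return { file, pkg: m[1], min };
}

// Compares two [major, minor, patch] arrays.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Returns every file for `pkg` whose minimum version is at or below
// the installed version, sorted from lowest minimum to highest.
function selectInstrumentations(files, pkg, installedVersion) {
  const installed = installedVersion.split('.').map(Number);
  return files
    .map(parseName)
    .filter((f) => f && f.pkg === pkg && compareVersions(f.min, installed) <= 0)
    .sort((a, b) => compareVersions(a.min, b.min))
    .map((f) => f.file);
}

const files = ['express-4.16.3.js', 'express-4.0.0.js', 'express.js', 'mysql2.js'];
console.log(selectInstrumentations(files, 'express', '4.16.2'));
```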
mysql2 is the only package without in-package tests, and should absolutely have them. mock the mysql2 package and instrument that?
testing is going to be ... exceedingly involved I think.
That said, it needs to be there.
It'd be useful to have the active set of instrumentations (and count?) sent along with every app-level event.
active_instrumentation: "express, mongoose, mysql2", // sorted list stringified, e.g.
active_instrumentation_count: 3
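Building those two fields is straightforward; a small sketch, with activeInstrumentationFields as a hypothetical helper name:

```javascript
// Turns the list of active instrumentation names into the two
// fields proposed above: a sorted, comma-separated string and a count.
function activeInstrumentationFields(names) {
  const sorted = Array.from(new Set(names)).sort();
  return {
    active_instrumentation: sorted.join(', '),
    active_instrumentation_count: sorted.length,
  };
}

console.log(activeInstrumentationFields(['mysql2', 'express', 'mongoose']));
```

Since the set of instrumentations is fixed once modules are loaded, this could be computed once and merged into every app-level event.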
right now we know express, mysql2, react generated an event, but it'd be useful to include the npm version of the package.
GraphQL instrumentation could be super cool.
TL;DR every field and object in an API has its own resolver (function which resolves the value), so you could instrument those to quickly figure out which resolvers are the slowest. There are existing products that focus exclusively on this (most notably engine.apollographql.com), but having that data in conjunction with all my other tracing data would be 💯
Some references from the @apollographql folks: https://github.com/apollographql/apollo-tracing
I can't get it to work on my express installation. I am using 4.16.2
I have manually added some logging and below is the result of:
let instrumentExpress = function(express) {
console.log('instrumentExpress', express.Route.prototype)
instrumentExpress Route {
_handles_method: [Function: _handles_method],
_options: [Function: _options],
dispatch: [Function: dispatch],
all: [Function: all],
acl: [Function],
bind: [Function],
checkout: [Function],
connect: [Function],
copy: [Function],
delete: [Function],
get: [Function],
head: [Function],
link: [Function],
lock: [Function],
'm-search': [Function],
merge: [Function],
mkactivity: [Function],
mkcalendar: [Function],
mkcol: [Function],
move: [Function],
notify: [Function],
options: [Function],
patch: [Function],
post: [Function],
propfind: [Function],
proppatch: [Function],
purge: [Function],
put: [Function],
rebind: [Function],
report: [Function],
search: [Function],
subscribe: [Function],
trace: [Function],
unbind: [Function],
unlink: [Function],
unlock: [Function],
unsubscribe: [Function] }
no original function use to wrap
Right now we shim app.handle and res.send when really we should probably just have it insert a middleware. There's a bunch of useful fields we don't have direct access to given the current instrumentation.
the bluebird magic ended up not needing to wrap .catch and .caught (as they both call .then).
I'm guessing mpromise is similar, but we don't currently have a test app that uses it.
I get this error when running sequelize queries:
Executing (default): SELECT `id`, `title`, `description`, `price`, `imageURL`, `createdAt`, `updatedAt` FROM `products` AS `product`;
TypeError: Cannot read property 'setMaxListeners' of undefined
at Utils.Promise (C:\Users\Alex\WebstormProjects\untitled\node_modules\sequelize\lib\dialects\mysql\query.js:77:58)
at Promise._execute (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\debuggability.js:313:9)
at Promise._resolveFromExecutor (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:483:18)
at new Promise (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:79:10)
at Query.run (C:\Users\Alex\WebstormProjects\untitled\node_modules\sequelize\lib\dialects\mysql\query.js:57:12)
at Promise.try.then.connection (C:\Users\Alex\WebstormProjects\untitled\node_modules\sequelize\lib\sequelize.js:558:20)
at C:\Users\Alex\WebstormProjects\untitled\node_modules\honeycomb-beeline\lib\async_tracker.js:47:19
at tryCatcher (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\util.js:16:23)
at Promise._settlePromiseFromHandler (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:512:31)
at Promise._settlePromise (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:569:18)
at Promise._settlePromise0 (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:614:10)
at Promise._settlePromises (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:694:18)
at _drainQueueStep (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\async.js:138:12)
at _drainQueue (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\async.js:131:9)
at Async._drainQueues (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\async.js:147:5)
at Immediate.Async.drainQueues [as _onImmediate] (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\async.js:17:14)
at processImmediate (timers.js:632:19)
TypeError: Cannot read property 'setMaxListeners' of undefined
at Utils.Promise (C:\Users\Alex\WebstormProjects\untitled\node_modules\sequelize\lib\dialects\mysql\query.js:77:58)
at Promise._execute (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\debuggability.js:313:9)
at Promise._resolveFromExecutor (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:483:18)
at new Promise (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:79:10)
at Query.run (C:\Users\Alex\WebstormProjects\untitled\node_modules\sequelize\lib\dialects\mysql\query.js:57:12)
at Promise.try.then.connection (C:\Users\Alex\WebstormProjects\untitled\node_modules\sequelize\lib\sequelize.js:558:20)
at C:\Users\Alex\WebstormProjects\untitled\node_modules\honeycomb-beeline\lib\async_tracker.js:47:19
at tryCatcher (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\util.js:16:23)
at Promise._settlePromiseFromHandler (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:512:31)
at Promise._settlePromise (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:569:18)
at Promise._settlePromise0 (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:614:10)
at Promise._settlePromises (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\promise.js:694:18)
at _drainQueueStep (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\async.js:138:12)
at _drainQueue (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\async.js:131:9)
at Async._drainQueues (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\async.js:147:5)
at Immediate.Async.drainQueues [as _onImmediate] (C:\Users\Alex\WebstormProjects\untitled\node_modules\bluebird\js\release\async.js:17:14)
at processImmediate (timers.js:632:19)
Node version is v11.0.0.
I prepared this snippet for you to illustrate the issue:
// require("honeycomb-beeline")({
//   writeKey: process.env.HONEYCOMB_INGEST_KEY
//   /* ... additional optional configuration ... */
// });
const express = require("express");
const app = express();
const Sequelize = require("sequelize");
const sequelize = new Sequelize(
  "honeycomb-repro",
  process.env.LOCAL_MYSQL_USER,
  process.env.LOCAL_MYSQL_PASSWORD,
  {
    dialect: "mysql",
    host: "localhost"
  }
);
const Item = sequelize.define("item", {
  id: {
    type: Sequelize.INTEGER,
    autoIncrement: true,
    allowNull: false,
    primaryKey: true
  },
  title: {
    type: Sequelize.STRING,
    allowNull: false
  }
});
sequelize
  .sync()
  .then(data => {
    return Item.create({
      title: "hello honeycomb"
    });
  })
  .then(data => {
    return Item.findById(data.dataValues.id);
  })
  .then(data => {
    console.log(data.dataValues);
    app.use("/", (req, res, next) => {
      Item.findAll()
        .then(items => {
          res.send(items);
        })
        .catch(err => console.log(err));
    });
    app.listen(8080);
  });
If you run this (I run it against a local mysql instance) you should see it fail with above error. If you uncomment the first 4 lines and run again you'll see it working.
Right now we rely on these passthrough instrumentations (e.g. mpromise, bluebird) to propagate our async context around.
One could imagine exposing enough of an api such that #36 was fixable without addition to this repo (I would still want the fix upstreamed, but in the general case users should be able to fix gaps themselves).
This onramp is meant to be an easy way to get people started sending events with useful (but generic) data about the node modules they're using. This obviously only gets people so far.
We need to make it super easy to add additional context to events. Individual metrics are somewhat nicely served by the magic.customContext.add(key, value) api. But that is only nice for single point in time metrics. Timers around functions, or worse - async operations, are much more difficult and gross to add.
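As a strawman for the timer case, something along these lines might be enough. withTimer is a hypothetical helper that writes into a plain object; in the real thing the duration would presumably go through the custom context api instead.

```javascript
// Times fn (sync or async) and records the duration in milliseconds
// under `${name}_ms`, even when fn throws or rejects.
async function withTimer(fields, name, fn) {
  const start = process.hrtime();
  try {
    return await fn();
  } finally {
    const [sec, nsec] = process.hrtime(start);
    fields[`${name}_ms`] = sec * 1e3 + nsec / 1e6;
  }
}

// Usage: wrap an async operation and keep its result.
const fields = {};
withTimer(fields, 'lookup', () => new Promise((resolve) => setTimeout(() => resolve(42), 10)))
  .then((result) => console.log(result, fields));
```

Putting the bookkeeping in finally means callers don't need separate success and failure paths just to stop the timer.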
@nathanleclaire mentions some NR sugar here: #4 (comment)
adding something along those lines would help in the case where a user isn't using a supported top-level module. Right now the only top-level module is express. People using less popular frameworks (or heck, the framework they wrote on their own) should be able to get reasonable instrumentation out of it just as easily.
given https://github.com/honeycombio/dynsampler-js is now a thing, let's make it super easy for users to use it.
add config-time options for dynamic sampler selection + key generation?
Here's the stack trace (node 8.14.0):
dev_1 | 2018-12-12T01:10:37.360Z honeycomb-beeline:express -------------------
dev_1 | honeycomb-beeline error: we lost our tracking somewhere in the middleware registered:
dev_1 | at Function.<anonymous> (/app/node_modules/express/lib/application.js:220:21)
dev_1 | at Array.forEach (<anonymous>)
dev_1 | at Function.use (/app/node_modules/express/lib/application.js:217:7)
dev_1 | at Object.<anonymous> (/app/app.js:115:5)
dev_1 | at Module._compile (module.js:653:30)
dev_1 | at Object.Module._extensions..js (module.js:664:10)
dev_1 | at Module.load (module.js:566:32)
dev_1 | at tryModuleLoad (module.js:506:12)
dev_1 | at Function.Module._load (module.js:498:3)
dev_1 | at Function._load (/app/node_modules/honeycomb-beeline/lib/instrumentation.js:152:28)
dev_1 | at Module.require (module.js:597:17)
dev_1 | at require (internal/module.js:11:18)
dev_1 | at Object.<anonymous> (/app/index.js:8:11)
dev_1 | at Module._compile (module.js:653:30)
dev_1 | at Object.Module._extensions..js (module.js:664:10)
dev_1 | at Module.load (module.js:566:32)
dev_1 | at tryModuleLoad (module.js:506:12)
dev_1 | at Function.Module._load (module.js:498:3)
dev_1 | at Function.Module.runMain (module.js:694:10)
dev_1 | at startup (bootstrap_node.js:204:16)
dev_1 | at bootstrap_node.js:625:3
dev_1 | please paste this message (everything between the "----" lines) into an issue
dev_1 | at https://github.com/honeycombio/beeline-nodejs/issues. feel free to edit
dev_1 | out any application stack frames if you'd rather not share those
dev_1 | Error
dev_1 | at LibhoneyEventAPI.askForIssue (/app/node_modules/honeycomb-beeline/lib/api/libhoney.js:249:7)
dev_1 | at Object._askForIssue (/app/node_modules/honeycomb-beeline/lib/api/index.js:133:20)
dev_1 | at wrappedNext (/app/node_modules/honeycomb-beeline/lib/instrumentation/express.js:125:15)
dev_1 | at /app/node_modules/body-parser/lib/read.js:130:5
dev_1 | at invokeCallback (/app/node_modules/raw-body/index.js:224:16)
dev_1 | at done (/app/node_modules/raw-body/index.js:213:7)
dev_1 | at IncomingMessage.onEnd (/app/node_modules/raw-body/index.js:273:7)
dev_1 | at emitNone (events.js:106:13)
dev_1 | at IncomingMessage.emit (events.js:208:7)
dev_1 | at endReadableNT (_stream_readable.js:1064:12)
dev_1 | at _combinedTickCallback (internal/process/next_tick.js:139:11)
dev_1 | at process._tickDomainCallback (internal/process/next_tick.js:219:9)
dev_1 |
dev_1 | -------------------
Please let me know if I can provide any other info!
At the moment, we aren't sending a sampleRate in any events, even if the deterministic sampler is in use.
Should check how the go beeline is handling this and duplicate behavior.
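For discussion, a sketch of a deterministic sampler that attaches the sample rate to every event it keeps. The hashing scheme here is an assumption and may not match what the go beeline does; shouldSample and buildEvent are hypothetical names, and the event shape is illustrative only.

```javascript
const crypto = require('crypto');

// Decides from the trace id alone whether to keep a trace, so every
// event in the same trace gets the same decision.
function shouldSample(traceId, sampleRate) {
  if (sampleRate <= 1) return true;
  const hash = crypto.createHash('sha1').update(traceId).digest();
  // Use the first 4 bytes of the hash as an unsigned integer.
  const value = hash.readUInt32BE(0);
  const upperBound = Math.floor(0xffffffff / sampleRate);
  return value <= upperBound;
}

// Returns null for dropped events; kept events carry the sampleRate
// so the backend can weight them correctly.
function buildEvent(traceId, sampleRate, fields) {
  if (!shouldSample(traceId, sampleRate)) return null;
  return { sampleRate, data: { 'trace.trace_id': traceId, ...fields } };
}

console.log(buildEvent('trace-1', 1, { name: 'request' }));
```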
If we're going to do MEAN, we need mongodb.
I see the readme says to add issues for desired instrumentations, so I'm filing this :).
We use Amazon's SQS extensively in our systems (occasionally together with SNS), it'd be great if the beeline had instrumentation. Any info that would be helpful around this for eventual implementation?
There are two good pieces here:
Probably pretty easy - mongodb-memory-server looks easy to deal with. Right now we aren't running babel so dealing with all the promises might be a PITA, but shouldn't be too too bad.
This is likely a lot of work. Unclear how to set things up, but it'd be lovely to run whatever test suite mongodb has after our instrumentation has been applied (both with an active trace and without) on travis-ci.
Testing against multiple versions of the mongodb package would be 🔥 but I imagine a tremendous increase in complexity/test time.
At present we shim Router.prototype.use and Router.prototype[METHOD] (for all http methods) and insert our magic middleware if we need to. The problem is someone could conceivably use app.get("/....") and then also create a Router and then app.use("/subroute", router).
In this case we'd end up with a magic middleware both in the app's router and in that Router instance. We should make sure we're only modifying the app's router.
right now the beeline is using package for serviceName. it should be configurable by the user (as part of the configure args, along with dataset name, write key, etc.)