Comments (13)
from stevejobs.
Thank you for your answer.
Database setup is simple : one single mongo (v4.2.1) node running on a distinct server with replication in place (but no nodes to replicate to, we only set it as a single replica of a replica set to use Oplog tailing on the meteor side)
Jobs are checking data into the database to send emails to specific users.
I did saw some error log in my server log like this one, but I'm not sure how this is relevant :
Exception in setInterval callback: MongoError: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1590604213, 1) } with id: 6797469174001041409
at Connection.<anonymous> (/var/www/application/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:450:61)
from stevejobs.
This happened again. Here is the log from the time when jobs_dominator_3
collection stopped updating:
Exception in setTimeout callback: Error: read ECONNRESET at TCP.onStreamRead (internal/stream_base_commons.js:205:27)
Exception in setInterval callback: MongoError: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1593369058, 1) } with id: 6797469174001041409
at Connection.<anonymous> (/var/www/application/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:450:61)
from stevejobs.
from stevejobs.
Yes, only one server.
I updated to v4 and I have the same issue. Any hint on where to look in your code to add logging in order to debug this ?
from stevejobs.
from stevejobs.
I think I got it for my setup: this mongo unavailability raise when runing backups of the database.
I added some try/catch blocks to dominator queries (find and remove), I will let you know if they fix the issue and propose the update.
from stevejobs.
from stevejobs.
Thanks for following up
I tried to add a few try/catch in the code but I couldn't make the jobs to run again as excepted (errors are now caught and I don't see any other uncaught in the logs that are directly related...).
Here are the functions where I added a try/catch and where I saw errors being caught:
dominator.getActive
queue.prototype.run
dominator.setAsActive
I still have the following uncaught but I'm not sure how they are related to the current issue:
'0': 'Exception in setInterval callback: MongoError: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1594405818, 1) } with id: 6797469174001041409\n' +
' at Connection.<anonymous> (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:450:61)\n' +
' at Connection.emit (events.js:311:20)\n' +
' at Connection.EventEmitter.emit (domain.js:482:12)\n' +
' at processMessage (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:384:10)\n' +
' at Socket.<anonymous> (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:553:15)\n' +
' at Socket.emit (events.js:311:20)\n' +
' at Socket.EventEmitter.emit (domain.js:482:12)\n' +
' at addChunk (_stream_readable.js:294:12)\n' +
' at readableAddChunk (_stream_readable.js:275:11)\n' +
' at Socket.Readable.push (_stream_readable.js:209:10)\n' +
' at TCP.onStreamRead (internal/stream_base_commons.js:186:23)'
and that one, that seems related to zuuk:stale-session
'0': 'Exception in setInterval callback: MongoError: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1594405818, 1) } with id: 6797469174001041409\n' +
' at Connection.<anonymous> (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:450:61)\n' +
' at Connection.emit (events.js:311:20)\n' +
' at Connection.EventEmitter.emit (domain.js:482:12)\n' +
' at processMessage (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:384:10)\n' +
' at Socket.<anonymous> (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:553:15)\n' +
' at Socket.emit (events.js:311:20)\n' +
' at Socket.EventEmitter.emit (domain.js:482:12)\n' +
' at addChunk (_stream_readable.js:294:12)\n' +
' at readableAddChunk (_stream_readable.js:275:11)\n' +
' at Socket.Readable.push (_stream_readable.js:209:10)\n' +
' at TCP.onStreamRead (internal/stream_base_commons.js:186:23)\n' +
' => awaited here:\n' +
' at Function.Promise.await (/var/www/app/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:56:12)\n' +
' at packages/mongo/mongo_driver.js:1049:14\n' +
' at /var/www/app/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40\n' +
' => awaited here:\n' +
' at Promise.await (/var/www/app/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:60:12)\n' +
' at SynchronousCursor._nextObject (packages/mongo/mongo_driver.js:1098:38)\n' +
' at SynchronousCursor.forEach (packages/mongo/mongo_driver.js:1112:22)\n' +
' at SynchronousCursor.map (packages/mongo/mongo_driver.js:1122:10)\n' +
' at SynchronousCursor.fetch (packages/mongo/mongo_driver.js:1146:17)\n' +
' at Cursor.<computed> (packages/mongo/mongo_driver.js:894:44)\n' +
' at Cursor.cursorProto.<computed> (packages/lmachens_kadira.js:3353:32)\n' +
' at Cursor.kadira_Cursor_fetch [as fetch] (packages/lmachens_kadira.js:3747:32)\n' +
' at Object.<anonymous> (packages/matb33_collection-hooks.js:484:79)\n' +
' at Object.collection.<computed> [as update] (packages/matb33_collection-hooks.js:150:21)\n' +
' at Collection.update (packages/mongo/collection.js:588:31)\n' +
' at Collection.Mongo.Collection.<computed> [as update] (packages/aldeed:collection2/collection2.js:214:19)\n' +
' at packages/zuuk_stale-session.js:55:18\n' +
' at Meteor.EnvironmentVariable.EVp.withValue (packages/meteor.js:1234:12)\n' +
' at packages/meteor.js:550:25\n' +
' at runWithEnvironment (packages/meteor.js:1286:24)'
Once it stopped, I tried to run a job manually again (with a server-side method I call to run Jobs.execute(id)
) I got an error like Jobs: Unable to execute job - queue is busy: myjob/r2ecan5aACartW9Kz
When the mongo connexion is lost for some time (about one minute), it stops. I think it's not so difficult to reproduce with a local setup running mongo on the side with MONGO_URL and then stopping and restarting it.
from stevejobs.
@msavin We also observed this problem. Database seemed to have a hickup and this breaks the job queue. In our logs we find:
Exception in setInterval callback: MongoError: Pool was force destroyed at Pool.destroy (/src/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:679:21) at Server.destroy (/src/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/topologies/server.js:902:15) at Timeout._onTimeout (/src/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/topologies/replset.js:357:26) at listOnTimeout (internal/timers.js:549:17) at processTimers (internal/timers.js:492:7)
=> awaited here:
at Function.Promise.await (/src/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:56:12)
at packages/mongo/mongo_driver.js:1073:14
at /src/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40
=> awaited here:
at Promise.await (/src/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:60:12)
at SynchronousCursor._nextObject (packages/mongo/mongo_driver.js:1122:38)
at SynchronousCursor.forEach (packages/mongo/mongo_driver.js:1136:22)
at SynchronousCursor.map (packages/mongo/mongo_driver.js:1146:10)
at SynchronousCursor.fetch (packages/mongo/mongo_driver.js:1170:17)
at Cursor.<computed> [as fetch] (packages/mongo/mongo_driver.js:918:44)
...
at queue.grabDoc (packages/msavin:sjobs/server/imports/operator/queue/index.js:65:36)
at queue.run (packages/msavin:sjobs/server/imports/operator/queue/index.js:100:20)
at queue.trigger (packages/msavin:sjobs/server/imports/operator/queue/index.js:55:9)
at Meteor.EnvironmentVariable.EVp.withValue (packages/meteor.js:1234:12)
can we catch this gracefully?
from stevejobs.
@vparpoil did you manage to solve it? Any idea you can share? Or do you have a workaround?
from stevejobs.
@msavin do you have any idea? its unfortunatly very urgent and i have to come up with a solution these days
from stevejobs.
i think i found the problem:
just catching the error in trigger
is not enough, you have to set self.available = true;
afterwards.
i opened a pr #95
Edit: i initialy set available= true i a finally block, but that would set it always. We observed jobs running twice, so i think its because of that.
By the way: We run jobs only on a dedicated instance (its marked with an env var and is not part of the load balancer). I think distributing potentially cpu-bound work accross web-applications is often not a good idea as you might block user requests. Its better to run it on them on a dedicated workers
from stevejobs.
Related Issues (20)
- List of available jobs to be run HOT 5
- Self is not defined error in 4.2.0 HOT 1
- Jobs Stuck Pending on Galaxy HOT 3
- server/imports/utilities/collection/index.js --> Cannot read property 'parent' of null HOT 4
- Prevent queue from stopping after failure HOT 8
- How to resume pending jobs when the server is restarted? HOT 1
- Missing index?
- Meteor 3.0 support (Fibers-free)
- Jobs.find not part of Atmosphere package HOT 4
- So what is the date format that's valid HOT 1
- Collection hooks HOT 1
- Version 3 causing rendering issue on instagram browser on iPhone HOT 2
- Possible to convert this to a pure Node.js package removing Meteor? HOT 3
- jobs are not run consecutively HOT 2
- Jobs: Unable to execute job - queue is busy: HOT 6
- Jobs: Unable to execute job - queue is busy, randomly appearing. HOT 2
- async / await Failure HOT 7
- Know if a queue is still processed HOT 2
- work with meteor 2.3 ? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from stevejobs.