Coder Social home page Coder Social logo

Comments (13)

msavin avatar msavin commented on June 16, 2024

from stevejobs.

vparpoil avatar vparpoil commented on June 16, 2024

Thank you for your answer.
Database setup is simple : one single mongo (v4.2.1) node running on a distinct server with replication in place (but no nodes to replicate to, we only set it as a single replica of a replica set to use Oplog tailing on the meteor side)
Jobs are checking data into the database to send emails to specific users.
I did saw some error log in my server log like this one, but I'm not sure how this is relevant :

Exception in setInterval callback: MongoError: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1590604213, 1) } with id: 6797469174001041409
    at Connection.<anonymous> (/var/www/application/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:450:61)

from stevejobs.

vparpoil avatar vparpoil commented on June 16, 2024

This happened again. Here is the log from the time when jobs_dominator_3 collection stopped updating:

Exception in setTimeout callback: Error: read ECONNRESET  at TCP.onStreamRead (internal/stream_base_commons.js:205:27)
Exception in setInterval callback: MongoError: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1593369058, 1) } with id: 6797469174001041409
 at Connection.<anonymous> (/var/www/application/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:450:61)

from stevejobs.

msavin avatar msavin commented on June 16, 2024

from stevejobs.

vparpoil avatar vparpoil commented on June 16, 2024

Yes, only one server.
I updated to v4 and I have the same issue. Any hint on where to look in your code to add logging in order to debug this ?

from stevejobs.

msavin avatar msavin commented on June 16, 2024

from stevejobs.

vparpoil avatar vparpoil commented on June 16, 2024

I think I got it for my setup: this mongo unavailability raise when runing backups of the database.
I added some try/catch blocks to dominator queries (find and remove), I will let you know if they fix the issue and propose the update.

from stevejobs.

msavin avatar msavin commented on June 16, 2024

from stevejobs.

vparpoil avatar vparpoil commented on June 16, 2024

Thanks for following up 👍
I tried to add a few try/catch in the code but I couldn't make the jobs to run again as excepted (errors are now caught and I don't see any other uncaught in the logs that are directly related...).
Here are the functions where I added a try/catch and where I saw errors being caught:

dominator.getActive
queue.prototype.run
dominator.setAsActive

I still have the following uncaught but I'm not sure how they are related to the current issue:

  '0': 'Exception in setInterval callback: MongoError: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1594405818, 1) } with id: 6797469174001041409\n' +
    '    at Connection.<anonymous> (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:450:61)\n' +
    '    at Connection.emit (events.js:311:20)\n' +
    '    at Connection.EventEmitter.emit (domain.js:482:12)\n' +
    '    at processMessage (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:384:10)\n' +
    '    at Socket.<anonymous> (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:553:15)\n' +
    '    at Socket.emit (events.js:311:20)\n' +
    '    at Socket.EventEmitter.emit (domain.js:482:12)\n' +
    '    at addChunk (_stream_readable.js:294:12)\n' +
    '    at readableAddChunk (_stream_readable.js:275:11)\n' +
    '    at Socket.Readable.push (_stream_readable.js:209:10)\n' +
    '    at TCP.onStreamRead (internal/stream_base_commons.js:186:23)'

and that one, that seems related to zuuk:stale-session

'0': 'Exception in setInterval callback: MongoError: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1594405818, 1) } with id: 6797469174001041409\n' +
    '    at Connection.<anonymous> (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:450:61)\n' +
    '    at Connection.emit (events.js:311:20)\n' +
    '    at Connection.EventEmitter.emit (domain.js:482:12)\n' +
    '    at processMessage (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:384:10)\n' +
    '    at Socket.<anonymous> (/var/www/app/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/connection.js:553:15)\n' +
    '    at Socket.emit (events.js:311:20)\n' +
    '    at Socket.EventEmitter.emit (domain.js:482:12)\n' +
    '    at addChunk (_stream_readable.js:294:12)\n' +
    '    at readableAddChunk (_stream_readable.js:275:11)\n' +
    '    at Socket.Readable.push (_stream_readable.js:209:10)\n' +
    '    at TCP.onStreamRead (internal/stream_base_commons.js:186:23)\n' +
    ' => awaited here:\n' +
    '    at Function.Promise.await (/var/www/app/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:56:12)\n' +
    '    at packages/mongo/mongo_driver.js:1049:14\n' +
    '    at /var/www/app/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40\n' +
    ' => awaited here:\n' +
    '    at Promise.await (/var/www/app/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:60:12)\n' +
    '    at SynchronousCursor._nextObject (packages/mongo/mongo_driver.js:1098:38)\n' +
    '    at SynchronousCursor.forEach (packages/mongo/mongo_driver.js:1112:22)\n' +
    '    at SynchronousCursor.map (packages/mongo/mongo_driver.js:1122:10)\n' +
    '    at SynchronousCursor.fetch (packages/mongo/mongo_driver.js:1146:17)\n' +
    '    at Cursor.<computed> (packages/mongo/mongo_driver.js:894:44)\n' +
    '    at Cursor.cursorProto.<computed> (packages/lmachens_kadira.js:3353:32)\n' +
    '    at Cursor.kadira_Cursor_fetch [as fetch] (packages/lmachens_kadira.js:3747:32)\n' +
    '    at Object.<anonymous> (packages/matb33_collection-hooks.js:484:79)\n' +
    '    at Object.collection.<computed> [as update] (packages/matb33_collection-hooks.js:150:21)\n' +
    '    at Collection.update (packages/mongo/collection.js:588:31)\n' +
    '    at Collection.Mongo.Collection.<computed> [as update] (packages/aldeed:collection2/collection2.js:214:19)\n' +
    '    at packages/zuuk_stale-session.js:55:18\n' +
    '    at Meteor.EnvironmentVariable.EVp.withValue (packages/meteor.js:1234:12)\n' +
    '    at packages/meteor.js:550:25\n' +
    '    at runWithEnvironment (packages/meteor.js:1286:24)'

Once it stopped, I tried to run a job manually again (with a server-side method I call to run Jobs.execute(id)) I got an error like Jobs: Unable to execute job - queue is busy: myjob/r2ecan5aACartW9Kz

When the mongo connexion is lost for some time (about one minute), it stops. I think it's not so difficult to reproduce with a local setup running mongo on the side with MONGO_URL and then stopping and restarting it.

from stevejobs.

macrozone avatar macrozone commented on June 16, 2024

@msavin We also observed this problem. Database seemed to have a hickup and this breaks the job queue. In our logs we find:

Exception in setInterval callback: MongoError: Pool was force destroyed at Pool.destroy (/src/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/connection/pool.js:679:21) at Server.destroy (/src/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/topologies/server.js:902:15) at Timeout._onTimeout (/src/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/core/topologies/replset.js:357:26) at listOnTimeout (internal/timers.js:549:17) at processTimers (internal/timers.js:492:7)
=> awaited here:
at Function.Promise.await (/src/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:56:12)
at packages/mongo/mongo_driver.js:1073:14
at /src/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/fiber_pool.js:43:40
=> awaited here:
at Promise.await (/src/bundle/programs/server/npm/node_modules/meteor/promise/node_modules/meteor-promise/promise_server.js:60:12)
at SynchronousCursor._nextObject (packages/mongo/mongo_driver.js:1122:38)
at SynchronousCursor.forEach (packages/mongo/mongo_driver.js:1136:22)
at SynchronousCursor.map (packages/mongo/mongo_driver.js:1146:10)
at SynchronousCursor.fetch (packages/mongo/mongo_driver.js:1170:17)
at Cursor.<computed> [as fetch] (packages/mongo/mongo_driver.js:918:44)
...
at queue.grabDoc (packages/msavin:sjobs/server/imports/operator/queue/index.js:65:36)
at queue.run (packages/msavin:sjobs/server/imports/operator/queue/index.js:100:20)
at queue.trigger (packages/msavin:sjobs/server/imports/operator/queue/index.js:55:9)
at Meteor.EnvironmentVariable.EVp.withValue (packages/meteor.js:1234:12)

can we catch this gracefully?

from stevejobs.

macrozone avatar macrozone commented on June 16, 2024

@vparpoil did you manage to solve it? Any idea you can share? Or do you have a workaround?

from stevejobs.

macrozone avatar macrozone commented on June 16, 2024

@msavin do you have any idea? its unfortunatly very urgent and i have to come up with a solution these days

from stevejobs.

macrozone avatar macrozone commented on June 16, 2024

i think i found the problem:

just catching the error in trigger is not enough, you have to set self.available = true; afterwards.

i opened a pr #95

Edit: i initialy set available= true i a finally block, but that would set it always. We observed jobs running twice, so i think its because of that.

By the way: We run jobs only on a dedicated instance (its marked with an env var and is not part of the load balancer). I think distributing potentially cpu-bound work accross web-applications is often not a good idea as you might block user requests. Its better to run it on them on a dedicated workers

from stevejobs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.