Coder Social home page Coder Social logo

mozilla / fxa-auth-db-mysql Goto Github PK

View Code? Open in Web Editor NEW
12.0 30.0 26.0 2.42 MB

DEPRECATED - Migrated to https://github.com/mozilla/fxa

License: Mozilla Public License 2.0

JavaScript 66.63% SQLPL 9.38% Shell 0.87% PLpgSQL 23.12%
firefox-accounts javascript mysql-backend fxa

fxa-auth-db-mysql's Introduction

fxa-auth-db-mysql's People

Contributors

chilts avatar dannycoates avatar deeptibaghel avatar eoger avatar g-k avatar jamonation avatar jbuck avatar jrgm avatar pdehaan avatar philbooth avatar rfk avatar seanmonstar avatar tda avatar vbudhram avatar vladikoff avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

fxa-auth-db-mysql's Issues

Should an invalid password from auth-db-server return 404 (and should it have a detail code)

I was looking at what was happening when auth-db-mysql was using the wrong version of auth-db-server. Right here, we treat any 404 as meaning 'password mismatch'.

Is 404 the right code for a password mismatch? Should it return a sub-code on the wire to indicate more precisely what is wrong?

(p.s., it's hard to figure out where to put bugs sometimes in cases like these where the 404 is consumed by one repo but produced by another. I started writing this in fxa-auth-server, but realized it was a question about the api).

Implement endpoint and query for retention metrics

The requirement for this comes from mozilla/fxa#45, which describes a user retention curve with points plotted at days 0, 1, 3, 7, 14, 30, 60 and 90.

Day 0 will always be 100% and, in our case, represents all accounts that have a lastAccessTime set in the sessionTokens table. Statistically, we can't make comparisons against accounts without a qualifying session because we don't know how long those users were retained. Thus the graph will end up being populated gradually until it is fully available, 90 days after the deployment of mozilla/fxa-auth-server#983.

The subsequent days represent the percentage of those users who were still active at that point and can be measured by comparing lastAccessTime to createdAt from the accounts table.

Concretely, this is the kind of thing I'm thinking of in SQL terms (although there may be a more performant way to do it):

SELECT
    COUNT(s0.lastAccessTime) AS d0,
    COUNT(s1.lastAccessTime) AS d1,
    COUNT(s3.lastAccessTime) AS d3,
    COUNT(s7.lastAccessTime) AS d7,
    COUNT(s14.lastAccessTime) AS d14,
    COUNT(s30.lastAccessTime) AS d30,
    COUNT(s60.lastAccessTime) AS d60,
    COUNT(s90.lastAccessTime) AS d90
FROM accounts AS a
    LEFT JOIN sessionTokens AS s0 ON a.uid = s0.uid
    LEFT JOIN sessionTokens AS s1 ON a.uid = s1.uid
    LEFT JOIN sessionTokens AS s3 ON a.uid = s3.uid
    LEFT JOIN sessionTokens AS s7 ON a.uid = s7.uid
    LEFT JOIN sessionTokens AS s14 ON a.uid = s14.uid
    LEFT JOIN sessionTokens AS s30 ON a.uid = s30.uid
    LEFT JOIN sessionTokens AS s60 ON a.uid = s60.uid
    LEFT JOIN sessionTokens AS s90 ON a.uid = s90.uid
WHERE
    s0.lastAccessTime > 0 AND
    s1.lastAccessTime >= a.createdAt + 86400000 AND
    s3.lastAccessTime >= a.createdAt + 259200000 AND
    s7.lastAccessTime >= a.createdAt + 604800000 AND
    s14.lastAccessTime >= a.createdAt + 1209600000 AND
    s30.lastAccessTime >= a.createdAt + 2592000000 AND
    s60.lastAccessTime >= a.createdAt + 5184000000 AND
    s90.lastAccessTime >= a.createdAt + 7776000000;

@ckarlof, mentioning you in case you want to speak about this on Thursday.

Test impact of metrics gathering on production traffic

(Filing this to capture the work as part of our planning process).

We're hoping to land some new metrics-gathering scripts in #72, but they're heavy queries that will make the auth db work hard. Let's try them out in stage and get a feel for how they might impact production traffic.

metrics script does not append to log file sanely

Stupid error, this.

The metrics script appends JSON-serialised data to a file for later processing by the Heka filter, but it doesn't delimit between writes. So the file contains valid JSON for the first write and unparseable nonsense subsequently.

Instead it should either:

  1. Treat the file as newline-separated JSON blobs and continue to append to it.
  2. If the file doesn't exist, write the data object wrapped in an array. If the file does exist, parse it, push the new data onto the array and then overwrite the file with the increased array.

1 is obviously more efficient but it also requires changes to the Heka filter, which I'm not entirely comfortable making yet. I'm going to look into that, but may end up going with 2 if it seems hairy.

Investigate/adjust use of poolee retries

During load tests, there was at least one episode where a series of timeouts happened on the fxa-auth servers, followed by a series of 500 errors. I'm not sure exactly what the root cause was but roughly something caused some PUT /sessionToken/ to block, and then the auth-server apparently retried these PUT requests. The retry of the PUT's resulted in 409 Conflict due to duplicate key when the first PUT was unblocked.

STR:
1 - Drive traffic into a test environment
2 - In mysql: lock tables sessionTokens write; select sleep(11); unlock tables;

Anyways, we should look into whether we should be disabling retires in poolee, or using a retryFilter that allows known idempotent queries to retry but disallows others.

openid test fails on second run of npm test.

STR

  1. mysql -uroot -e 'drop database fxa'; node ./bin/db_patcher.js; npm test ;#=> PASS
  2. npm test ;#=> FAIL
...
    ok 177 account no longer exists for this email address
    ok 178 unlockCode is deleted for this uid
    ok 179 Returned an empty object for unlockAccount
    not ok 180 Record already exists
...

which is https://github.com/mozilla/fxa-auth-db-mysql/blob/master/fxa-auth-db-server/test/backend/db_tests.js#L909-L937, presumably the constant value of the openid.

Pool does not reconnect on db failover

From email conversation with @jrgm:

"""
I was testing RDS Multi-AZ failover in stage under load on Friday.

The TL;DR is that fxa-auth does not reconnect on failover. Ugh.

fxa-oauth and fxa-profile did reconnect, and on a restart fxa-auth
reconnects. But on 3 of 4 occasions, fxa-auth wound up in a state where it
kept putting queries in the queue and returning 500 on 'Error: Queue limit
reached'.
"""

We do some mucking around with the error-recovery options, which may be affecting this.
From https://github.com/mozilla/fxa-auth-db-mysql/blob/master/lib/db/mysql.js#L34:

    // poolCluster will remove the pool after `removeNodeErrorCount` errors.
    // We don't ever want to remove a pool because we only have one pool
    // for writing and reading each. Connection errors are mostly out of our
    // control for automatic recovery so monitoring of 503s is critical.
    // Since `removeNodeErrorCount` is Infinity `canRetry` must be false
    // to prevent inifinite retry attempts.
    this.poolCluster = mysql.createPoolCluster(
      {
        removeNodeErrorCount: Infinity,
        canRetry: false
      }
    )

remove eslint complexity checks

Running npm test on master produces this chirp. I don't think the linter complexity tests have ever uncovered any real bugs, and mostly just whine about || defaulting. Can we remove the linting for "complexity"?

Running "eslint:files" (eslint) task

lib/db/mem.js
  306:34  warning  Function 'anonymous' has a complexity of 7  complexity

โœ– 1 problem (0 errors, 1 warning)

FInish conversion of lib/db/mysql.js to use mozlog

The lazy way to file a bug report:

[03-16 17:31:03]    jrgm    I was looking at logging in fxa-auth-db-mysql. 
[03-16 17:31:18]    jrgm    Some log lines are like this - https://github.com/mozilla/fxa-local-dev
[03-16 17:31:20]    jrgm    oops
[03-16 17:31:33]    jrgm    Some log lines are like this - https://github.com/mozilla/fxa-auth-db-mysql/blob/master/lib/db/mysql.js#L119-L122
[03-16 17:31:55]    jrgm    And others like this (mostly on error paths) - https://github.com/mozilla/fxa-auth-db-mysql/blob/master/lib/db/mysql.js#L774
[03-16 17:32:24]    jrgm    seanmonstar: the former style is correct, and the latter, well, borked, right? 
[03-16 17:32:56]    seanmonstar The latter is what was used prior to mozlog
[03-16 17:33:24]    jrgm    yeah, I see in history where some, but not all, were converted. 
[03-16 17:33:34]    seanmonstar The first argument should be the String op
[03-16 17:35:06]    jrgm    Or else you get `"Type":"bin.server.[object Object]"`
[03-16 17:37:52]    seanmonstar To catch that in tests, the mozlog config accepts a 'debug: true' param
[03-16 17:38:04]    seanmonstar That will assert a few things about the logs

Fix the change log

Back in #74, I did some jiggling about to get the base directory stuff from the old fxa-auth-db-server repo into the base directory of this repo. One of those changes was to have the two changelogs side-by-side in CHANGELOG-db.md and CHANGELOG-server.md.

This broke our version script, which hasn't been updating the change logs since then. We should reinstate CHANGELOG.md so that the script works correctly again.

Bonus points for:

  1. Sanely merging the two historical change logs into the canonical one.
  2. Going back through history and adding commit info for all of the missing releases since the problem was introduced.

Cleanup old stored procedures?

As seen in the following (infra private, sorry!) bug:

https://bugzilla.mozilla.org/show_bug.cgi?id=1138660#c2

We are creating new stored procedures with each db migration, but we don't ever clean up the old ones. We now have procedures forgotPasswordVerified_1, forgotPasswordVerified_2 and forgotPasswordVerified_3 in the db.

We should come up with a scheme for cleaning up old versions once we're confident they'll no longer be needed.

npm shrinkwrap is out of date on train-31

I noticed that a Kibana graph of auth-db errors (that only runs in production, not stage) had stopped updating with the release. I chatted with @whd and it was because the log lines were missing an HTTP 'code' value. After much digging and head scratching, the root cause is that the npm-shrinkwrap.json file in fxa-auth-db-mysql is specifically pinned to a git SHA in https://github.com/mozilla/fxa-auth-db-mysql/blob/train-31/npm-shrinkwrap.json#L286-L290. This obviously is pretty disturbing, and we can talk later about how to prevent this from occuring (i.e., tools to validate that package.json, node_modules and npm-shrinkwrap.json are in sync, and a grunt task that will consistently maintain npm-shrinkwrap.json).

However, the "good" news is that the only actual code change that was dropped was this change, which is why the log line in production was missing.

I considered asking @ckolos to rollback to train-30 on auth-server, but have decided that the missing log line is "liveable". I think we should pick up the L10N strings and a fix for train-31 npm-shrinkwrap to get back to where we should be, and I'll work on that now.

Add a third token state to unverifiedTokens

Arising from the discussion at mozilla/fxa-auth-server#1374 (comment), it has become clear that the binary state modelled by unverifiedTokens is insufficient. Specifically, there is a class of tokens for which we would like to say "this token hasn't been verified and doesn't need to be".

@rfk suggested we could use a NULL in the tokenVerificationId column to imply the state. Alternatively, if we didn't fancy implying the state in that way, a migration to add an extra column should be quite quick to execute because records are expunged from that table as tokens get verified.

Rename this repo

Might we consider deleting the fxa-auth-db-server repo, renaming this repo to fxa-auth-db-server and re-jiggling the directory structure / dependencies to eliminate the fxa-auth-db-server sub-directory?

My thinking is that the mysqlness of this repo is both an implementation detail and not true when testing against db-mem.

There's also a weird (imho) back-to-frontness about the server bit being a child of the actual database bit. The server is the outwardly-visible interface that everything else cares about, why should it live tucked away as a child of the bit that nobody interacts with directly?

And all of our other services are named fxa-xxx-server so this one is going to seem pretty odd to people not familiar with the history.

Not to mention the fact that Shane's amazing high-level architecture diagram is now WRONG! ๐Ÿ™€

I guess an argument could be made that it's just a name and names don't matter, but I kind of think it does in this case.

Access tables in a consistent order when inside a transaction

(Migrated to this repo from mozilla/fxa-auth-db-server#101)

As noted in mozilla/fxa-auth-server#785 (comment) it's generally a good idea, where possible, to have all your tranasctions touch tables in a consistent order. This reduces the possibility of deadlocks due to two threads waiting on a table lock held by the other.

It seems like a pretty small change that could reduce a small-but-definitely-there source of errors in production.

Implement device metrics dashboard

This work will actually take place in mozilla/accounts-dashboard but I'm raising it here to give it visibility in our planning process.

We need to create new charts for the device metrics that we will soon be recording. To limit intrusiveness on other views, they should probably be grouped together into a new sub-dashboard for FxA.

Can we eliminate two frequent sources of programmer error?

I woke up thinking about this PR comment:

Thanks for a keen eye

The thing is, a keen eye had nothing to do with it. The reason I spotted those oversights was because I specifically looked for them. And the reason I specifically looked for them was because I make the exact same oversights every single time I write a new migration. I don't think this is a coincidence.

The two oversights are:

  1. Forgetting to update every point in the documentation that mentions the thing you're changing.
  2. Using the wrong patch level in the rollback migration.

The docs

The docs are really long. Using marks and split windows in vim makes them a bit more navigable but its still easy to lose track of where you are or where related sections are.

They also repeat themselves. Every section looks identical at a glance, so it's really easy to scroll past the end of the section you're supposed to be working on and start editing the wrong bit. I do this a lot.

The two documents for the server API and the database API express identical information for different layers. It's really easy to remember to update one of them but not the other. This is exacerbated by them living in different directories; it's too easy not to notice that you forgot to edit one.

Some ideas:

  • We should break the docs down into a more hierarchical structure. One for each table in the db doc, one for each base path in the server doc.
  • We should have a top-level doc for each (db and server) that tries to establish some of the common boilerplate so that individual sections can be shorter. Not sure how doable this is in practice, things like the full JSON response body and the full curl incantation have value for the reader.
  • The db and server docs should live side-by-side in the same directory. It would be harder to forget one but not the other then.
  • Would it be easier to maintain the db doc if it was generated from jsdoc comments? Maybe people are more likely to remember to update them as they're right there in the same bit of code that you have to edit.

The rollback migration

There is no automation around applying and testing the rollback migration, so it is really easy to leave copy-paste errors behind. Especially the patch level, but also to a lesser extent stored procedure versions.

The easiest way to create this migration is copy-pasting an old one to use as a template. Then you make your changes and use MySQL Workbench to make sure that it applies cleanly and the database seems correct with it applied. But you can't (easily) run the tests against it at this point because they and all of the code still expect the db to be in the pre-rollback state. So you tell yourself it looks okay and there's a code review that will double-check it for you.

Some ideas:

  • Maybe this isn't enough of a problem to actually fix? Have we ever used a rollback migration? In the event of needing to rollback the db, would these errors be obvious to the roller-backer?
  • We could write a script that generates the skeleton rollback migration based on the one it rolls back. A DROP for every CREATE etc and the correct patch level. It wouldn't be able to write a complete migration (unless we introduced some serious logic that went through all the migrations looking for earlier versions of stored procedures etc), but it would be enough of a start to avoid the common pitfalls.
  • Would it be possible to automate some kind of test that applies your migration to master, then applies the rollback migration, then runs the tests in master to make sure they all still work?

Remove cruft from fxa-auth-db-server subdirectory

This is a follow-up to #56. There's leftover top-level-repo cruft in the fxa-auth-db-server repo, such as .gitignore files and travis config nd grunt tasks. We should move them out to the top level where possible, or delete them if not.

errors when running `npm test` locally.

โžœ fxa-auth-db-mysql git:(master) npm test

> [email protected] test /Users/stomlinson/development/fxa-auth-db-mysql
> grunt && node ./bin/db_patcher.js >/dev/null &&  ./scripts/tap-coverage.js test/backend test/local

Running "jshint:files" (jshint) task
>> 23 files lint free.

Running "copyright:app" (copyright) task

Running "copyright:tests" (copyright) task

Done, without errors.
{"name":"db-patcher","hostname":"Shanes-Retina-MBP.local","pid":81159,"level":30,"msg":"Database patched to level 7","time":"2015-02-04T17:05:57.347Z","v":0}
ok test/backend/db_tests.js ......................... 135/135
404 '/account/hello-world'
404 '/emailRecord/3639623637316162613462663936643435396238393936393935373631646165406578616d706c652e636f6d'
404 '/sessionToken/0df4ec7f3dbea05cf365ac9b9f62b645cb52ed630c452b73a48797ad08585e6c'
404 '/sessionToken/0df4ec7f3dbea05cf365ac9b9f62b645cb52ed630c452b73a48797ad08585e6c'
404 '/keyFetchToken/3bde20c4ac621055d0e37c1e798fa4b1395d35a1b1aadd55155f37520cedc63a'
404 '/keyFetchToken/3bde20c4ac621055d0e37c1e798fa4b1395d35a1b1aadd55155f37520cedc63a'
404 '/accountResetToken/0ec2d9f2c6f9807d600fb49bd85940828b6b98abf3409d096ee71a23db3e9b73'
404 '/accountResetToken/0ec2d9f2c6f9807d600fb49bd85940828b6b98abf3409d096ee71a23db3e9b73'
404 '/passwordChangeToken/e4128f89d8f515f17ddbfb02cb803baedf282951a6bbe5e50aae8ea33849f528'
404 '/passwordChangeToken/e4128f89d8f515f17ddbfb02cb803baedf282951a6bbe5e50aae8ea33849f528'
404 '/passwordForgotToken/42985d3970a2a9608d2eb90b07b2632a71d151659c84a0d154344bb365b82e90'
404 '/passwordForgotToken/42985d3970a2a9608d2eb90b07b2632a71d151659c84a0d154344bb365b82e90'
404 '/passwordForgotToken/b080f8dcf51095cafad246807469f47e33b488506c2b899c5e47f7b6cf47ed51'
ok test/backend/remote.js ........................... 128/128
ok test/local/incorrect-patch-level.js .................. 4/4
{"name":"db-api","hostname":"Shanes-Retina-MBP.local","pid":81165,"level":30,"op":"MySql.connect","patchLevel":7,"patchLevelRequired":7,"msg":"","time":"2015-02-04T17:05:59.487Z","v":0}
ok test/local/log-stats.js .............................. 5/5
not ok test/local/mysql_tests.js ...................... 32/34
    Command: "/Users/stomlinson/development/nvm/v0.10.25/bin/node mysql_tests.js"
    TAP version 13
    ok 1 Got the ping ok
    ok 2 we have an error
    ok 3 should be equal
    ok 4 should be equal
    ok 5 should be equal
    ok 6 should be equal
    ok 7 we have an error
    ok 8 should be equal
    ok 9 should be equal
    ok 10 should be equal
    ok 11 should be equal
    ok 12 we have an error
    ok 13 should be equal
    ok 14 should be equal
    ok 15 should be equal
    ok 16 should be equal
    ok 17 we got an error
    ok 18 should be equal
    ok 19 should be equal
    ok 20 should be equal
    ok 21 should be equal
    ok 22 we got an error
    ok 23 should be equal
    ok 24 should be equal
    ok 25 should be equal
    ok 26 should be equal
    ok 27 the function was retried
    ok 28 Drop procedure was successful
    ok 29 The stored procedure creation was successful
    ok 30 The call to the stored procedure failed as expected
    not ok 31 error stringified is correct
      ---
        file:   /Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js
        line:   43
        column: 21
        stack:
          - |
            getCaller (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:418:17)
          - |
            assert (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:21:16)
          - |
            Function.equal (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:162:10)
          - |
            Test._testAssert [as equal] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-test.js:87:16)
          - |
            /Users/stomlinson/development/fxa-auth-db-mysql/test/local/mysql_tests.js:204:17
          - |
            tryCatch1 (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js:43:21)
          - |
            Promise$_callHandler [as _callHandler] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:627:13)
          - |
            Promise$_settlePromiseFromHandler [as _settlePromiseFromHandler] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:641:18)
          - |
            Promise$_settlePromiseAt [as _settlePromiseAt] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:800:14)
          - |
            Promise$_settlePromises [as _settlePromises] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:934:14)
        found:  |
          Error: ER_NO_DEFAULT_FOR_FIELD
        wanted: |
          Error: ER_BAD_NULL_ERROR
        diff:   |
          FOUND:  Error: ER_NO_DEFAULT_FOR_FIELD
          WANTED: Error: ER_BAD_NULL_ERROR
                            ^ (at position = 10)
      ...
    ok 32 error code is correct
    not ok 33 error errno is correct
      ---
        file:   /Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js
        line:   43
        column: 21
        stack:
          - |
            getCaller (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:418:17)
          - |
            assert (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:21:16)
          - |
            Function.equal (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-assert.js:162:10)
          - |
            Test._testAssert [as equal] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/tap/lib/tap-test.js:87:16)
          - |
            /Users/stomlinson/development/fxa-auth-db-mysql/test/local/mysql_tests.js:206:17
          - |
            tryCatch1 (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js:43:21)
          - |
            Promise$_callHandler [as _callHandler] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:627:13)
          - |
            Promise$_settlePromiseFromHandler [as _settlePromiseFromHandler] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:641:18)
          - |
            Promise$_settlePromiseAt [as _settlePromiseAt] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:800:14)
          - |
            Promise$_settlePromises [as _settlePromises] (/Users/stomlinson/development/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:934:14)
        found:  1364
        wanted: 1048
      ...
    ok 34 test/local/mysql_tests.js

    1..34
    # tests 34
    # pass  32
    # fail  2

ok test/local/prune_tokens.js ......................... 11/11
total ............................................... 315/317

not ok
code coverage: 92.7%
generating coverage.html: complete in 0.4s
npm ERR! Test failed.  See above for more details.
npm ERR! not ok code 0
โžœ  fxa-auth-db-mysql git:(master)

The sha is 1fcf15c

I'm sure this is operator error, but I haven't figured out where I am going wrong.

/account/reset sometimes fails with 'invalid token'

Sometimes /account/reset starts failing for no apparent reason. It seems to fail more often on @shane-tomlinson's machine than on mine. Once it starts to fail, it continues doing so until the db server is restarted. Re-starting the auth server at that point has no effect.

This has been happening for a long time, but may have become more frequent recently. We've only seen it happen when running against the memory backend, although that doesn't necessarily mean the problem is there.

In the auth server logs, the uid in the failing request payload is always a null byte / 00. In the db server logs, there is a SyntaxError: Unexpected token ILLEGAL.

auth server log:

auth-server-logs

db server log:

auth-db-logs

db server syntax error:

auth-db-with-compile-error

Discussion on mysql-patcher and production

Just trying to capture a few options for the patcher and production to see if we can somehow automate this in a particular way but being safe about it as well.

Current Solution Problems

  • the patching is done manually in prod, an issues needs to be raised for the patch to be applied
  • the patch needs manual adjusting if it contains stored procedures, this could add introduce an error when being created
  • patching (if using the mysql-patcher) is somewhat opaque due to the entire SQL file being sent to MySql in one go, i.e. it worked or it failed and there isn't any statement level insight or logging

Solution 1: Run mysql-patcher in Prod

The current mysql-patcher sends the entire patch file to mysql in one go, for it to execute every statement. Without some overseeing, some MySql statements (such as adding a column or an index) may not complete correctly. Since MySql doesn't have transactional DDL, this may have failed in the middle of a patch file and therefore would need manual inspection to check which statements have or haven't been applied correctly. Whether the patch level in dbMetadata is or isn't updated correctly the database then differs from any patch.

Solution 2 : Change mysql-patcher to run each individual statement in each patch file

This may work. It introduces some overhead to delimit the statements in the patch files so that the mysql-patcher can send each statement to MySql separately. The patcher program could then keep re-trying (up to a certain limit) to make sure each statement is executed in turn. If any statement fails up to a maximum of times, it could then try to reverse the patch number. The ability to log more info related to which statement is being applied, the length of time it took and what failed would make this process more transparent. This would still require at least manual oversight to make sure nothing is going awry.

Solution 3 : Perform all patching in code, not SQL

By having each statement in code means we wouldn't need to instrument any patch files with delimiters since they would already be separate. Again, each statement could be performed in turn with some retry logic and perhaps a patch reversal process if anything in the patch fails too many times. Again, manual oversight would also be required though the extra logging information would also help make sure it went smoothly.

@jrgm: Do either of the last two solutions appeal to you, or anything else? Have you got any other suggestions? /cc @rfk @ckolos. Thanks.

Is PUT /accountResetToken/:id ever hit?

Over in mozilla/fxa-auth-server#1384, we're removing createAccountResetToken because it is never called, forgotPasswordVerified is instead.

Both those methods map to separate endpoints and stored procedures in this repo. Should we remove the PUT /accountResetToken/:id endpoint and the createAccountResetToken stored procedure on the basis that the auth server never calls them?

[prod] Error: Queue limit reached

At about 2015-02-11T13:36Z, across all production fxa-auth-db servers, there were hundreds of 500 errors with this stack. I don't really have more information at this time. Things we need to look at are stackdriver/cloudwatch mysql stats; show engine innodb status on prod.

Error: Queue limit reached.
at Pool.getConnection (/data/fxa-auth-db-server/node_modules/mysql/lib/Pool.js:64:15)
at PoolCluster._getConnection (/data/fxa-auth-db-server/node_modules/mysql/lib/PoolCluster.js:148:13)
at PoolNamespace.getConnection (/data/fxa-auth-db-server/node_modules/mysql/lib/PoolCluster.js:188:17)
at PoolCluster.getConnection (/data/fxa-auth-db-server/node_modules/mysql/lib/PoolCluster.js:80:13)
at promisified (eval at makeNodePromisifiedEval (/data/fxa-auth-db-server/node_modules/bluebird/js/main/promisify.js:195:12), :2:343)
at retryable (/data/fxa-auth-db-server/db/mysql.js:636:12)
at MySql.getConnection (/data/fxa-auth-db-server/db/mysql.js:605:12)
at MySql.singleQuery (/data/fxa-auth-db-server/db/mysql.js:490:17)
at MySql.read (/data/fxa-auth-db-server/db/mysql.js:575:17)
at MySql.readFirstResult (/data/fxa-auth-db-server/db/mysql.js:556:17)

dependencies do not fit

tried the following:

# Using Ubuntu
curl -sL https://deb.nodesource.com/setup_0.10 | sudo -E bash -
sudo apt-get install -y nodejs

afterwards I checked out the code

git clone https://github.com/mozilla/fxa-auth-db-mysql.git
cd fxa-auth-db-mysql/
fxa@vps229008:~/myGIT/fxa-auth-db-mysql$ npm install
npm WARN deprecated This version of npm lacks support for important features,
npm WARN deprecated such as scoped packages, offered by the primary npm
npm WARN deprecated registry. Consider upgrading to at least npm@2, if not the
npm WARN deprecated latest stable version. To upgrade to npm@2, run:
npm WARN deprecated 
npm WARN deprecated   npm -g install npm@latest-2
npm WARN deprecated 
npm WARN deprecated To upgrade to the latest stable version, run:
npm WARN deprecated 
npm WARN deprecated   npm -g install npm@latest
npm WARN deprecated 
npm WARN deprecated (Depending on how Node.js was installed on your system, you
npm WARN deprecated may need to prefix the preceding commands with `sudo`, or if
npm WARN deprecated on Windows, run them from an Administrator prompt.)
npm WARN deprecated 
npm WARN deprecated If you're running the version of npm bundled with
npm WARN deprecated Node.js 0.10 LTS, be aware that the next version of 0.10 LTS
npm WARN deprecated will be bundled with a version of npm@2, which has some small
npm WARN deprecated backwards-incompatible changes made to `npm run-script` and
npm WARN deprecated semver behavior.

> [email protected] install /var/fxa/myGIT/fxa-auth-db-mysql/node_modules/restify/node_modules/dtrace-provider
> node scripts/install.js

npm WARN engine [email protected]: wanted: {"node":">=0.10.40","npm":">=2.0.0"} (current: {"node":"0.10.41","npm":"1.4.29"})
npm WARN engine [email protected]: wanted: {"node":">=0.12.0"} (current: {"node":"0.10.41","npm":"1.4.29"})
[email protected] node_modules/grunt-copyright

[email protected] node_modules/eslint-config-fxa

[email protected] node_modules/clone

[email protected] node_modules/proxyquire
โ”œโ”€โ”€ [email protected]
โ””โ”€โ”€ [email protected] ([email protected], [email protected])

[email protected] node_modules/grunt-bump
โ””โ”€โ”€ [email protected]

[email protected] node_modules/bluebird

[email protected] node_modules/mozlog
โ”œโ”€โ”€ [email protected]
โ””โ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected])

[email protected] node_modules/mysql
โ”œโ”€โ”€ [email protected]
โ””โ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected])

[email protected] node_modules/sinon
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ””โ”€โ”€ [email protected] ([email protected])

[email protected] node_modules/mysql-patcher
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
โ””โ”€โ”€ [email protected]

[email protected] node_modules/request
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected])
โ””โ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected])

[email protected] node_modules/fxa-jwtool
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ””โ”€โ”€ [email protected] ([email protected])

[email protected] node_modules/load-grunt-tasks
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected])
โ””โ”€โ”€ [email protected] ([email protected], [email protected])

[email protected] node_modules/restify
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected])
โ””โ”€โ”€ [email protected] ([email protected], [email protected])

[email protected] node_modules/nock
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ””โ”€โ”€ [email protected]

[email protected] node_modules/grunt
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ””โ”€โ”€ [email protected] ([email protected], [email protected], [email protected])

[email protected] node_modules/convict
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ”œโ”€โ”€ [email protected]
โ””โ”€โ”€ [email protected] ([email protected])

[email protected] node_modules/ass
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected])
โ””โ”€โ”€ [email protected] ([email protected], [email protected], [email protected])

[email protected] node_modules/grunt-nsp
โ””โ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected])

[email protected] node_modules/tap
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected]
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected])
โ””โ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])

[email protected] node_modules/grunt-eslint
โ”œโ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected])
โ””โ”€โ”€ [email protected] ([email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected])

When you depend on nodejs 0.10, why do I run into errors.
Can you please document dependencies.
Thank you very much

Be consistent about how timestamps are generated

While digging to our use of the verifierSetAt field, I noticed that at account-creation time we explicitly pass in a timestamp from the auth-server:

https://github.com/mozilla/fxa-auth-db-mysql/blob/master/db/mysql.js#L177

While at account-reset time we allow the auth-db-server to generate its own timestamp:

https://github.com/mozilla/fxa-auth-db-mysql/blob/master/db/mysql.js#L425

It's likely nothing bad will come of this, but we should probably be consistent. My vote would be to pass in all timestamps from the auth-server since that's what we're already doing in the majority of cases.

Leak in train-48

re: https://bugzilla.mozilla.org/show_bug.cgi?id=1220397

There is a significant leak in train-48 in the auth-db-server, which required us to rollback to train-47 version of auth-db/auth.

I don't see a leak in the actual changes in train-48. I set up some some branches where I undid the module version updates, then selectively updated each separately and ran some load tests. There is noise in these numbers, but the update to bluebird (and how we use it?) looks to be a good candidate as to what is doing (most of?) the leaking.

                                          RSS end        Pct. change             RSS post        Pct. change
                                          of test                               forced GC

48-no-module-updates-#1 (10min)             89412               1.0%                74188               6.6%
48-no-module-updates-#2 (10min)             89780               1.5%                68896              -1.0%
48-no-module-updates-#3 (10min)             86268              -2.5%                65756              -5.5%
48-no-module-updates-average                88487               0.0%                69613               0.0%
train-46 (10min)                            89110               0.7%                71564               2.8%
48 update clone,convict,mozlog (10min)      89232               0.8%                68728              -1.3%
48 update only bluebird (10min)             93520               5.7%                74820               7.5%
48 update only fxa-jwtool (10min)           90984               2.8%                71076               2.1%
48 update only mysql (10min)                96404               8.9%                71124               2.2%
48 update only request (10min)              91404               3.3%                68968              -0.9%
48 update only restify (10min)              91760               3.7%                68804              -1.2%
48 update only mysql (1hr)                  93360               5.5%                72516               4.2%
48 update all except mysql (1hr)           111764              26.3%                87728              26.0%
48 update only bluebird (1hr)              112016              26.6%                87256              25.3%
48 update all except bluebird (1hr)         96792               9.4%                72652               4.4%

Not sure how to proceed further in isolating the root cause of the leak. @dannycoates?

Drop old stored procedures

I'm not certain that all of these are okay for retirement but a peek at my local database shows the following stored procedures have newer versions:

  • account_2
  • createAccount_2
  • createSessionToken_1
  • deleteAccount_4
  • emailRecord_2
  • passwordChangeToken_1
  • sessionToken_1
  • sessionToken_2

lib/db/mysql.js - unhandled TypeError when running on an fxa-dev box

Noticed that the auth-db-mysql server on an fxa-dev is filling its logs with the exception below.

I believe that this is due to the change from p-promise to bluebird, and that this function should ensure that it always returns a promise on any path (events or no events), (P.resolve()?).

Possibly unhandled TypeError: Cannot call method 'then' of undefined
    at query.then.then.log.error.op (/data/fxa-auth-db-mysql/lib/db/mysql.js:734:20)
    at tryCatch1 (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/util.js:43:21)
    at Promise$_callHandler [as _callHandler] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:627:13)
    at Promise$_settlePromiseFromHandler [as _settlePromiseFromHandler] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:641:18)
    at Promise$_settlePromiseAt [as _settlePromiseAt] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:800:14)
    at Promise$_settlePromises [as _settlePromises] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/promise.js:934:14)
    at Promise.b (domain.js:183:18)
    at Async$_consumeFunctionBuffer [as _consumeFunctionBuffer] (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/async.js:75:12)
    at Async$consumeFunctionBuffer (/data/fxa-auth-db-mysql/node_modules/bluebird/js/main/async.js:38:14)
    at process._tickDomainCallback (node.js:492:13)

Provide API for listing session tokens

Let's provide an API for listing all session tokens active on an account. I'm thinking as simple as:

GET /account/:id/sessions => [ {session data}, {session data}, ...]

Most immediately, we'll want to be able to determine whether a user is single-device or multi-device, which can be accomplished by counting their session tokens. We'll eventually want to display session tokens and their metadata on a control dashboard.

One thing to be careful of: we probably dont want to return the token's secret data in this query, to guard against accidental leakage. You should only get at that data if you look up a session directly by its id.

db connections must reconnect on multi-az RDS failover

With Multi-AZ RDS Failover, from a client connection view, what will happen is that the current master database will shut down, terminating the TCP connection, then DNS will propagate a new IP for the (new) master db, and the database connection should reconnect.

In my tests in stage, under load, this only happened once in 4 tries.

I can provide further details, setup assistance as needed.

This should be a priority to fix to ensure minimal downtime.

db migration for push key formats can't be deployed in production

/cc @eoger @jrgm; from #133 (comment)

In #133 we added a database migration that changes the column type for push callback pubkeys. @jrgm notes that it can't be deployed in production due to the change on the column type causing errors. We need to figure out a way forward for this before cutting train-62.

I'm hopeful that we can get around it by deploying two migrations - one that drops the pubkey stuff entirely and updates stored procedures to not reference that column at all, and a second that adds the pubkey stuff back in in the new format.

@jrgm does this sound like a reasonable approach to the problems you saw?

Gather and emit multi-device metrics

Once we have mozilla/fxa-auth-server#983 and mozilla/fxa-auth-server#988, we'll be able to make a pretty accurate assessment of "number of users with multiple devices connected". I'm hoping that we can efficiently pull this out of the database on a daily basis and add it onto our dashboards.

We currently emit some daily metrics reports via the script here:

https://github.com/mozilla-services/puppet-config/blob/master/fxa/modules/fxa_admin/files/gather_basic_metrics.py

But I think that logic should live over here next to the code, and should probably grow some tests etc eventually. Perhaps we can do that move as part of this work.

Anyway, to the existing "count of all accounts" and "count of verified accounts", let's add "count of users with 2 or more devices" and "count of users with 3 or more devices".

We'll need to use the sessionTokens table for this, and we should probably exclude sessions that haven't been active in the last month. So the thing we want is "number of uids that have more than X sessionTokens that were active in the last month".

There's an index on uid so I'm hoping we can do it with an efficient SQL query, but my SQL-fu is not super strong. It'll be something like this totally-untested SQL:

SELECT COUNT(uid) FROM (
    SELECT uid FROM sessionTokens
    GROUP BY uid
    HAVING COUNT(tokenId) > 1
)

But we'd have to check whether it can be run efficiently enough that we're happy doing it on our production db on a daily basis.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.