dropbox / zxcvbn
Low-Budget Password Strength Estimation
Home Page: https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/wheeler
License: MIT License
I know you can't include a Bloom filter due to the frequency rank, but what about including one filter per rank? Is there something that prevents you from using one filter for common words, another for less common words, etc? It could greatly cut down on the library's size...
This module needs to ship with prebuilt JavaScript code; otherwise it doesn't work without CoffeeScript installed.
The simplest way to do this would be to call a build step from within a prepublish script in the package.json.
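A minimal sketch of what that could look like in package.json (the script names, paths, and the coffee invocation are assumptions, not the project's actual layout):

```json
{
  "name": "zxcvbn",
  "main": "lib/zxcvbn.js",
  "scripts": {
    "build": "coffee --compile --output lib/ src/",
    "prepublish": "npm run build"
  }
}
```

With a setup like this, npm would compile the CoffeeScript automatically before publishing, so consumers only ever see plain JavaScript.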
here is the error I get
$ npm install zxcvbn
$ node
> require('zxcvbn')
Error: Cannot find module './zxcvbn/zxcvbn'
at Function.Module._resolveFilename (module.js:336:15)
at Function.Module._load (module.js:278:25)
at Module.require (module.js:365:17)
at require (module.js:384:17)
at Object.<anonymous> (/home/dominic/c/experiments/node_modules/zxcvbn/index.js:1:80)
at Module._compile (module.js:460:26)
at Object.Module._extensions..js (module.js:478:10)
at Module.load (module.js:355:32)
at Function.Module._load (module.js:310:12)
at Module.require (module.js:365:17)
Please: you already support Bower, so why not npm too?
It seems like the "repeat" strategy should apply to multiple-character runs.
Did anyone ever try this with Require.js?
It would be nice if the zxcvbn function were exported via exports, which doesn't happen if window is available. I think exports should be checked before window, so we don't pollute the global namespace.
Currently it is:
"undefined" !== typeof window && null !== window ? (window.zxcvbn = o, "function" === typeof window.zxcvbn_load_hook && window.zxcvbn_load_hook()) : "undefined" !== typeof exports && null !== exports && (exports.zxcvbn = o)
I've noticed a few issues with date matching.
For example:
04052001 is matched as a date, but 040501 is not (and gets higher entropy)
And with separators:
27.05.2005 - date, entropy=17.434
27.05.05 - bruteforce, entropy = 43.41
The latter, at least, seems to be due to check_date
only accepting 4-digit years. Also, both day and year would need to be checked for the 2-digit year case to see if either is <= 31.
I also noticed that zero is accepted for days and months, though that isn't a big issue as it will at worst cause an underestimate for the entropy.
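A sketch of what a 2-digit-aware check could look like (the function names and the year pivot are assumptions; the actual check_date in the source only accepts 4-digit years):

```javascript
// Sketch: accept 2-digit years as well. Since either ambiguous token could
// be the day, try both (day, year) orderings, as the issue suggests.
// The >30 pivot for expanding 2-digit years is an assumption.
function isValidDayYear(day, year) {
  if (day < 1 || day > 31) return false;
  const full = year >= 100 ? year : (year > 30 ? 1900 + year : 2000 + year);
  return full >= 1900 && full <= 2030;
}

function checkDate(a, month, b) {
  if (month < 1 || month > 12) return false;
  // either a or b may be the day; the other is the (possibly 2-digit) year
  return isValidDayYear(a, b) || isValidDayYear(b, a);
}
```

Under this sketch, 27.05.05 and 27.05.2005 would both be accepted as dates.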
More feature request than issue. Would be great to have i18n support so that the "very weak", "weak", "so-so", "good", and "great" could be customized by language.
Hi,
First of all, I want to congratulate everyone involved in the development of this project.
I'm interested in improving this library's support for the Portuguese language. To achieve that, I would need to research common words and the most popular passwords in Portuguese to build a more accurate bad-password list, I guess.
Is there any documentation describing how to change the code to provide a dictionary for a set of words in another language, or maybe some simple approach to using this lib with a bad-password / disallowed-password list in another language too?
Thanks!
src="//dl.dropbox.com/u/209/zxcvbn/zxcvbn.js"
should probably be:
src="zxcvbn.js"
This code is being considered for adoption in Drupal 8: https://drupal.org/node/1497290
There are concerns though about how English focused the dictionaries are.
It would also be good if the lists of strings could be updated regularly.
I also assume that there is support for all UTF-8 characters.
Whenever I need to register for an account on a site that doesn't implement zxcvbn, I search Google for the blog post, search the page for "demo", and do the thing.
Maybe put it at dropbox.com/zxcvbn, register a domain, something like that?
http://dl.dropbox.com/u/209/zxcvbn/test/index.html is too long to memorize!
We are using firstname, lastname and email as parts of the blacklist. The following call returns a very high score, even though the email address is part of the blacklist:
var mail = '[email protected]',
blacklist = ['[email protected]'];
console.log(zxcvbn(mail, blacklist));
{
"password":"[email protected]",
"entropy":111.544,
"match_sequence":[
{
"pattern":"dictionary",
"i":0,
"j":0,
"token":"I",
"matched_word":"i",
"rank":2,
"dictionary_name":"english",
"base_entropy":1,
"uppercase_entropy":1,
"l33t_entropy":0,
"entropy":2
},
{
"pattern":"bruteforce",
"i":1,
"j":1,
"token":"m",
"entropy":6.409390936137703,
"cardinality":85
},
{
"pattern":"dictionary",
"i":2,
"j":3,
"token":"me",
"matched_word":"me",
"rank":10,
"dictionary_name":"english",
"base_entropy":3.3219280948873626,
"uppercase_entropy":0,
"l33t_entropy":0,
"entropy":3.3219280948873626
},
{
"pattern":"bruteforce",
"i":4,
"j":5,
"token":"r.",
"entropy":12.818781872275405,
"cardinality":85
},
{
"pattern":"dictionary",
"i":6,
"j":7,
"token":"no",
"matched_word":"no",
"rank":18,
"dictionary_name":"english",
"base_entropy":4.169925001442312,
"uppercase_entropy":0,
"l33t_entropy":0,
"entropy":4.169925001442312
},
{
"pattern":"bruteforce",
"i":8,
"j":10,
"token":"ch.",
"entropy":19.228172808413106,
"cardinality":85
},
{
"pattern":"dictionary",
"i":11,
"j":15,
"token":"nicht",
"matched_word":"nicht",
"rank":24155,
"dictionary_name":"english",
"base_entropy":14.56003423231944,
"uppercase_entropy":0,
"l33t_entropy":0,
"entropy":14.56003423231944
},
{
"pattern":"bruteforce",
"i":16,
"j":16,
"token":".",
"entropy":6.409390936137703,
"cardinality":85
},
{
"pattern":"dictionary",
"i":17,
"j":23,
"token":"Invited",
"matched_word":"invited",
"rank":1175,
"dictionary_name":"english",
"base_entropy":10.198445041452363,
"uppercase_entropy":1,
"l33t_entropy":0,
"entropy":11.198445041452363
},
{
"pattern":"dictionary",
"i":24,
"j":24,
"token":"@",
"matched_word":"a",
"rank":5,
"dictionary_name":"english",
"l33t":true,
"sub":{
"@":"a"
},
"sub_display":"@ -> a",
"base_entropy":2.321928094887362,
"uppercase_entropy":0,
"l33t_entropy":1,
"entropy":3.321928094887362
},
{
"pattern":"dictionary",
"i":25,
"j":28,
"token":"mail",
"matched_word":"mail",
"rank":1135,
"dictionary_name":"english",
"base_entropy":10.148476582178278,
"uppercase_entropy":0,
"l33t_entropy":0,
"entropy":10.148476582178278
},
{
"pattern":"bruteforce",
"i":29,
"j":29,
"token":".",
"entropy":6.409390936137703,
"cardinality":85
},
{
"pattern":"dictionary",
"i":30,
"j":32,
"token":"com",
"matched_word":"com",
"rank":2994,
"dictionary_name":"english",
"base_entropy":11.547858506058418,
"uppercase_entropy":0,
"l33t_entropy":0,
"entropy":11.547858506058418
}
],
"crack_time":1.8922410863462927e+29,
"crack_time_display":"centuries",
"score":4,
"calc_time":10
}
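A workaround sketch for cases like this, using a hypothetical helper (not part of zxcvbn's API): besides the full address, pass in the pieces an attacker would try, split on common email separators, so each token gets a user_inputs dictionary match.

```javascript
// Hypothetical helper: expand an email address into blacklist tokens.
function emailBlacklist(email) {
  const parts = email.toLowerCase().split(/[@._\-+]/).filter(Boolean);
  return [email.toLowerCase()].concat(parts);
}
```

Usage would then be `zxcvbn(mail, emailBlacklist(mail))`, so that local part and domain labels each score as rank-1 dictionary words instead of being stitched together from unrelated matches.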
Would it be at all possible to pass in an option to load the dictionary from a server, as opposed to it being included directly in the library like it is now? Just to help with loading the library.
It's quite strange, that single characters match dictionaries:
a
has an entropy of 2.33
whereas i
has an entropy of 1
because it is matched by the dictionary.
The same problem exists with single digits:
1
(entropy = 2) and 4
(entropy = 3.32) match a dictionary, whereas the other single digits have an entropy of 5.42.
It seems that the cardinality used to calculate entropy of bruteforce substrings is calculated based on the entire string. For example:
frzplfqetuothv
: cardinality 26, as expected
frzplfqetuothvpassword
: cardinality 26, as expected
frzplfqetuothvpasswordA
: cardinality 85.
The A
present elsewhere in the string causes it to assume that the letters of frzplfqetuothv
were sampled from the larger set of characters. This doesn't make sense, because users often pick passwords with e.g. a punctuation mark attached. Thus it is vastly overestimating the entropy of such passwords. The cardinality should be calculated per bruteforce substring.
Related: It parses passwords like frzplfqetuothvCOCIWDZOAZPVRL
as one long bruteforce string. It should attempt to split such strings into multiple bruteforce runs with lower cardinality.
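A sketch of the fix the issue proposes: compute the cardinality from the characters of the bruteforce token itself, not the whole password (the function name is illustrative; 33 is zxcvbn's bucket size for printable-ASCII symbols, consistent with the cardinality 85 = 26+26+33 seen above).

```javascript
// Per-substring cardinality: each character class the token actually uses
// contributes its full bucket size, and nothing else does.
function bruteforceCardinality(token) {
  let lower = 0, upper = 0, digits = 0, symbols = 0;
  for (const ch of token) {
    const c = ch.charCodeAt(0);
    if (c >= 0x30 && c <= 0x39) digits = 10;
    else if (c >= 0x41 && c <= 0x5a) upper = 26;
    else if (c >= 0x61 && c <= 0x7a) lower = 26;
    else symbols = 33;
  }
  return lower + upper + digits + symbols;
}
```

Applied per token, frzplfqetuothv stays at cardinality 26 even when an A appears elsewhere in the password.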
calc_bruteforce_cardinality currently treats as 'symbols' anything which is not an alphanumeric character, and gives symbols a cardinality of 33. This makes sense for ASCII passwords.
On Linux and Mac operating systems, it is quite easy to enter lots of common unicode characters. For instance, on my Mac, if I press the right alt key, I can get the following symbols by tapping any other key on the keyboard:
«“‘¥~‹÷´`≠
„Ω€®™æ¨œøπ
åß∂ƒ∞∆ªº¬
∑†©√∫˜µ
Another set of characters can be obtained with shift+alt.
Some of them can be used for an advanced form of leetification (eg: πainting
instead of painting
), others can simply be used in complicated passphrases (eg: to the ∞ and beyond™
).
zxcvbn doesn't increase the cardinality for the usage of such characters. I think it should, and it's quite easy: any character >0x7f should be considered part of a different character set. I suggest to increase the cardinality by about 100 (which is the number of special characters I can make with my keyboard and my operating system).
I am not sure how, or whether, we should also attempt to detect leetification with unicode characters; since it's far from common, I think detecting it and giving it just 1 bit of entropy would be an underestimation, but I'm open to ideas.
I can provide a patch if you nod on the general idea.
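The proposed tweak could be sketched like this (the function name is illustrative, and the 100 is the issue author's keyboard estimate, not a measured figure):

```javascript
// Any character above 0x7f counts as coming from an extra set of roughly
// 100 common "keyboard unicode" symbols; return the cardinality bump.
function unicodeCardinalityBump(token) {
  for (const ch of token) {
    if (ch.codePointAt(0) > 0x7f) return 100;
  }
  return 0;
}
```

The result would simply be added to the ASCII cardinality computed today, so πainting scores higher than painting under bruteforce.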
Dear Sir,
Thanks for contributing zxcvbn.js, first of all. This API works perfectly for me, but I am getting an issue on IE8. The error is: zxcvbn.js, line 43 character 47.
I include zxcvbn-async.js on my page after changing path of zxcvbn.
I really appreciate your help.
Thanks & Regards
Saurabh Sharma
+91-9602273529
WordPress has an issue where if a long password (500+ characters) is checked for password strength, the browser becomes unresponsive for many seconds or minutes: see issue #31772 for the details.
One suggested workaround: if the password is more than 32 characters long (e.g. 000000000000000000000000000000000), run zxcvbn on only the first 32 characters.
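The truncation workaround is trivial to sketch (the helper name and 32-character cutoff follow the suggestion above; neither is part of zxcvbn itself):

```javascript
// Score only a fixed-length prefix so very long inputs can't lock up the
// browser; anything past the cutoff barely changes the practical strength.
function truncateForScoring(password, max) {
  max = max || 32;
  return password.length > max ? password.slice(0, max) : password;
}
```

A caller would then run `zxcvbn(truncateForScoring(password))` instead of passing the raw input.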
There's nothing in the zxcvbn.js file to identify what the project, license or version is.
It would be useful to have a simple header included, such as:
/*
zxcvbn - realistic password strength estimation
Updated: 20130203032247
Project: https://github.com/lowe/zxcvbn
License: MIT
*/
Possible feature that could be added is having UK and US keyboard layouts in the adjacency graph. At the moment, !"£$%^&*() gets a score greater than 1, which is probably wrong because it's a fairly simple sequence.
I forked it and tried to implement it myself, but the graph generation script doesn't play nice with the £ sign because it's a unicode character and all the other characters are ASCII (i.e., len("3£") returns 3 instead of 2 because the £ is counted in bytes).
Another workaround could just be to add !"£$%^&*() to one of the dictionaries of penalised words.
a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9
entropy:417.915
crack_time(seconds):3.190943210320069e+121
crack_time(readable):centuries
score:4
abcdefghijklmnopqrstuvwxyz0123456789
entropy:14.118
crack_time(seconds):0.889
crack_time(readable):instant
score:0
a b c 1 2 3
entropy:59.299
crack_time(seconds):35452087835576.43
crack_time(readable):centuries
score:4
entropy:3.807
crack_time(seconds):0.001
crack_time(readable):instant
score:0
just adding a space between every letter does not make it all that much more secure...
nvm, I found the Apache license; it wasn't in the readme, which is where I expected it.
zxcvbn decomposes each password into a match sequence, and then for each match says, "aha, I can find this part in an English dictionary (7 bits)", "this next piece is a name (4 bits), "this is brute force (9 bits)".
There is an inherent entropy to changing models each time. It's probably not much (2-6 bits per entry in the match sequence, I'm guessing) but at the moment zxcvbn is underestimating passwords that jump between a number of these.
There are a number of examples of overestimated entropies for passwords like these:
"ekekekekek", " . . . . . " (#39). Space-separated passwords ("dark and stormy night") also get quite a high entropy (#21).
I think this is because the algorithm does not detect higher-level repetition patterns. For space-separated strings it's okay that it does not take into account that there is a repetition of
(dictionaryword)(bruteforce), because the algorithm ignores the entropy in that information anyway. The problem is that it does not recognize that the same selection is always picked within a given search space. To illustrate this: you get an even higher entropy for "dark dark dark dark".
So when calculating the entropy, a selection that occurs n times should add ln(n) plus the entropy of the selection itself, instead of n times the entropy of the selection itself. This should probably be applied before picking the lowest entropy.
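The proposal above could be sketched like this (using log2, since zxcvbn measures entropy in bits; the function name is illustrative):

```javascript
// A selection repeated n times contributes its own entropy once, plus the
// entropy of the repeat count, instead of n times its own entropy.
function repeatedSelectionEntropy(selectionEntropy, n) {
  return selectionEntropy + Math.log2(n);
}
```

For "dark dark dark dark" this yields entropy("dark ") + log2(4), far below the 4 × entropy("dark ") the current matcher would sum.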
I'm currently implementing this on a site, and several users found that they can enter their password without the @ sign. I pass the email in the input fields argument. I'm not sure if this needs a stronger check; for instance, a user might use a combination of their name and company for the password. So perhaps any instance of another field found anywhere in the password should reduce the strength, and a match of the password against the email without the @ sign should fail?
This happens when one of the variables in the array of the second argument is of type "undefined".
I used [variable || ''] in my code, but it feels like something that should be handled within the core of zxcvbn; otherwise it just won't work whenever one of the members of the user_inputs array is not a string.
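A workaround sketch (this could equally live inside zxcvbn itself; the helper name is illustrative):

```javascript
// Drop null/undefined entries and coerce the rest to strings before
// handing the array to zxcvbn's user_inputs argument.
function sanitizeUserInputs(inputs) {
  return (inputs || [])
    .filter(function (v) { return v !== null && v !== undefined; })
    .map(String);
}
```

Callers would then write `zxcvbn(password, sanitizeUserInputs([firstName, lastName, email]))` without worrying about missing fields.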
Thanks.
The library says that word781
has less entropy than word78
, because 781
is identified as digits
but 78
is identified as bruteforce
with lower entropy:
password: word78
entropy: 20.928
crack time (seconds): 99.743
crack time (display): 3 minutes
score from 0 to 4: 0
calculation time (ms): 0
password: word781
entropy: 18.677
crack time (seconds): 20.95
crack time (display): instant
score from 0 to 4: 0
calculation time (ms): 1
A very common password style is to take the first letter of each word in a sentence/phrase, possibly with some substitutions. This leads to a fairly random-looking password that is easy to remember but hard to brute force. The letters are not randomly distributed, however, as they're related to the frequencies of letters as the first letter of words. There are far more words starting with s, c, p than with x, z, y, q or numbers. Thus instead of treating it as cardinality 26 for any lowercase letter, treat each letter individually based on its rank in the list of 95 printable characters.
You could get this rank by using something like this:
$ cat /usr/share/dict/american-english | cut -c 1 | sort | uniq -c | sort -n
or $ cat /usr/share/dict/american-english | cut -c 1 | tr A-Z a-z | sort | uniq -c | sort -n
Alternatively you could get the rank based on the character frequencies in the password lists, which would help with the frequencies for numbers and special characters.
Can this also be used in Qt's QML as an JavaScript import module? I cannot get it to work.
As stated, your dictionary is, well... lacking. It's not including a couple of characters that I've seen in the "hacker's leet" from my calculator days.
Since I absolutely despise Coffee, I've not done a fork and merged it back in, but here they are, so that you or anyone else can add them:
d,D = )
p,P = 6
It's only a couple of characters, but still. There are also multi-character substitutions:
ck = x
ers = orz
Anyway, those are the only ones I've seen missing from your system. Since it's supposed to use the substitutions people actually use, and I myself have used them, and lots of people I know have too, your meter would benefit from having them. At least the first two; all would be best, but the multi-character ones would require a bit more work for your meter to use. Anyway, it looks interesting; you also have some substitutions I've never heard of anyone using, so these would at the very least help it.
Trou8ador! is given a score of 2, entropy of 33
Many sites store user passwords using hashes (SHA1, MD5, etc.), and many do so without using any kind of salt.
This week we have learned of several massive leaks of lists of such hashed, unsalted passwords (e.g. LinkedIn, Last.fm and eHarmony).
Many users use one password, or have a pool of passwords they reuse across multiple sites, probably without knowing their password has been compromised somewhere else.
I know googling users' passwords is essentially a bad idea, but I think establishing a secure connection to Google and searching for a password hash (SHA1 or MD5) shouldn't compromise users' security.
zxcvbn assumes that passwords are hashed with bcrypt, scrypt or PBKDF2. This is fine when you know exactly what hashing algorithm is used. Sadly this isn't always the case and it should be stated more clearly in the project documentation.
The assumed time per guess (with a strong hash) is 1/10 seconds, and with 100 cores that makes 1/10000 seconds. With MD5, SHA256 or SHA512 the situation is different: for example with oclHashcat, it takes just 1/1952000000 seconds to calculate a single SHA512 hash (using 8 GPUs). See the oclHashcat performance table.
For example with skjiqonjhrp
, zxcvbn returns 3 years as crack time. If it was hashed with SHA512, the crack time would be about 320 seconds. I calculated it with entropy returned by zxcvbn and using the same formula as in your blog post: 0.5 * 2^40.188 * 1/1952000000
. The difference is huge.
If people use this as a general "how good is my password" meter, it will give overly optimistic results, since there are countless services out there using MD5/SHA256/SHA512.
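The arithmetic follows the blog post's formula (0.5 * 2^entropy expected guesses, divided by the attacker's guess rate); a minimal sketch:

```javascript
// Expected crack time in seconds for a given entropy (in bits) and
// attacker guess rate; 1952000000/s is the issue's oclHashcat SHA512
// figure for 8 GPUs.
function crackSeconds(entropy, guessesPerSecond) {
  return 0.5 * Math.pow(2, entropy) / guessesPerSecond;
}
```

With entropy 40.188, this comes out around 320 seconds at 1.952e9 guesses/second, versus years at zxcvbn's assumed strong-hash rate of 10^4 guesses/second.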
Previously, after loading the zxcvbn.js script directly (either synchronously or asynchronously), you'd call it with window.zxcvbn(password).
After commit 5407bc2, you have to do window.zxcvbn.zxcvbn(password)
, which is a backwards-incompatible change. If that was intentional, you'll want to publish a new major revision instead of a new patch revision as you did.
How much would I lose running this library without the dictionary part? My concern is about file size, since a 300kb download is unacceptable in [my] mobile scenario. What does Dropbox do about that?
Thanks for such a library anyhow; it is awesome.
Hi there,
It would appear that the async file contains a reference to zxcvbn.js on dropbox.
There are two issues with this.
Firstly, the link is not under https, so it causes certificate problems on SSL-secured pages where it is used, e.g. password reset forms.
Secondly, what happens if the dropbox location vanishes for whatever reason? :)
zxcvbn = require('zxcvbn')
returns undefined when the whole library is packed with webpack.
I think that user_input should be penalized without regard to case. For example, say I'm checking the strength of a password on a website when a user updates their profile, and I want to make sure they don't include their username in the password; if their username is "cheese", and they set their password as "CheeSe", then the user_input penalty would not be applied, because the matching is case-sensitive.
zxcvbn( 'cheese' );
Object {password: "cheese", entropy: 6.15, match_sequence: Array[1], crack_time: 0.004, crack_time_display: "instant"…}
zxcvbn( 'cheese', [ 'cheese' ] );
Object {password: "cheese", entropy: 0, match_sequence: Array[1], crack_time: 0, crack_time_display: "instant"…}
zxcvbn( 'cheese', [ 'CheeSe' ] );
Object {password: "cheese", entropy: 6.15, match_sequence: Array[1], crack_time: 0.004, crack_time_display: "instant"…}
Note that you have to reload the page between tests to get accurate results, because of #31.
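Until the matching is made case-insensitive, a caller-side workaround is easy to sketch (the helper name is illustrative; zxcvbn lowercases candidate tokens when matching dictionaries, so lowercased inputs match any casing of the password):

```javascript
// Lowercase every user_inputs entry before passing the array to zxcvbn.
function caseInsensitiveInputs(inputs) {
  return inputs.map(function (s) { return String(s).toLowerCase(); });
}
```

Usage: `zxcvbn('cheese', caseInsensitiveInputs(['CheeSe']))`, which then applies the user_input penalty as expected.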
The Ars Technica article
http://arstechnica.com/security/2013/05/how-crackers-make-minced-meat-out-of-your-passwords/2/
said that they quickly bruteforced all passwords of 6 or less, plus the all-upper or all-lower passwords of length 7 or 8 in just 41 seconds.
Yet the 'crack time (display):' for an all caps 7 char password is 5 years. I suspect more GPU power has made cracking faster.
Hey, can you clarify what license the code is intended to be under? And in case it is GPL or LGPL, have you considered using a more liberal license like MIT/BSD/APLv2?
console.log( zxcvbn( 'iandunn' ) );
Object {password: "iandunn", entropy: 14.892, match_sequence: Array[3], crack_time: 1.52, crack_time_display: "instant"…}
console.log( zxcvbn( 'iandunn', [ 'iandunn' ] ) );
Object {password: "iandunn", entropy: 0, match_sequence: Array[1], crack_time: 0, crack_time_display: "instant"…}
console.log( zxcvbn( 'iandunn' ) );
Object {password: "iandunn", entropy: 0, match_sequence: Array[1], crack_time: 0, crack_time_display: "instant"…}
The third call should return the exact same results as the first call, but instead it returns the results of the 2nd.
zxcvbn("[email protected]", "[email protected]")
has an entropy of 24.015, even though it should be somewhere around 1.
Somehow zxcvbn does not recognize full matches of user_input.
test/index.html loads the script test.js. This file has been removed, which prevents index.html from displaying any results. Prior pulls had this file. Suggest restoring test/test.js or rewriting test/index.html to display results.
Because I want to minify it myself with other minifiers.
It's a resource we can build, not source code itself, so it doesn't belong in source control either.
See
http://stackoverflow.com/questions/10854845/should-i-version-control-the-minified-versions-of-my-jquery-plugins and http://blog.andrewray.me/dealing-with-compiled-files-in-git/
The string
alpha bravo charlie delta
tests as very strong, even though it's just ABCD spelled out in the ITU phonetic alphabet.
Hi, I'm trying to install zxcvbn by adding it to my project's bower.json, but it seems unable to find the module. If I install it directly using "bower install", it works as expected. Should it be trying to pull from lowe/zxcvbn? I see that navigating to github.com/lowe/zxcvbn in my browser redirects me here. Perhaps that is confusing bower somehow? For now I have it working by using https://raw.githubusercontent.com/dropbox/zxcvbn/482ed03a10779ec125100721c2d828b97abf9ea6/zxcvbn.js for my url, but obviously this is not ideal. Any ideas?
Right now zxcvbn works only under the assumption that the attacker limits himself to English, which he will most certainly not do.
Just one example: a common German word like "Schokolade" gets a score of 4/4.
Now, it is of course impossible to include every possible language. But you don't need to: there is a huge database of existing words available for query. Google will happily tell you that millions of websites include the word, which is a very clear indication that this is not a good password. So I suggest you include the results of a Google query in your scoring algorithm.
bye, ju
https://dl.dropboxusercontent.com/u/209/zxcvbn/test/index.html
temppass22 cardinality is 69
shouldn't it be 36 (lowercase + digits = 26+10)?
Would suggest a preliminary cleanup of the project structure...
data/ - data directory
data-scripts/ - python scripts to load/parse data
dist/ - folder for target/compiled js
src/ - coffeescript & javascript source files
tests/ - unit tests (mocha?)
bower.json - bower package info
package.json - npm/node package information
package.json can reference src/index.js as the primary file. npm scripts can be added to package.json for the build, and/or something like gulp could be used.
Given that npm is already used for the installation of coffee-script and the nature of this package, it may be worthwhile to use uglify-js over closure compiler. It may not be quite as tight, but would reduce the need for external tooling (beyond node/npm) for this.
Converting the scripts to use CommonJS/node syntax could be used in combination with Browserify for the build, enabling a global and amd target in the dist directory from the same source.
I'd be happy to take this on, creating a fork and PR for the changes if there would be interest for including such changes/updates.