Coder Social home page Coder Social logo

Comments (14)

simonw avatar simonw commented on June 26, 2024

There's one problem here: it looks like usedforsecurity=False was added in Python 3.9, but Datasette still supports Python 3.8 until at least its EOL in October 2024: https://devguide.python.org/versions/

I'm happy to add it with a Python version check or similar, but would the FIPS scanning system identify something like the following?

import hashlib
import sys

def non_security_md5(text):
    try:
        return hashlib.md5(text.encode("utf8"), usedforsecurity=False).hexdigest()
    except TypeError:
        # usedforsecurity is not supported
        return hashlib.md5(text.encode("utf8")).hexdigest()

If that still trips the security filter I'm not sure what to do here - I want to be able to support Python 3.8.

from datasette.

simonw avatar simonw commented on June 26, 2024

Since this is purely a cosmetic thing and we're pre-Datasette-1.0 I'd be OK swapping MD5 for SHA256 here to please the filter.

from datasette.

simonw avatar simonw commented on June 26, 2024

Code in question:

@property
def color(self):
if self.hash:
return self.hash[:6]
return hashlib.md5(self.name.encode("utf8")).hexdigest()[:6]

def to_css_class(s):
"""
Given a string (e.g. a table name) returns a valid unique CSS class.
For simple cases, just returns the string again. If the string is not a
valid CSS class (we disallow - and _ prefixes even though they are valid
as they may be confused with browser prefixes) we strip invalid characters
and add a 6 char md5 sum suffix, to make sure two tables with identical
names after stripping characters don't end up with the same CSS class.
"""
if css_class_re.match(s):
return s
md5_suffix = hashlib.md5(s.encode("utf8")).hexdigest()[:6]
# Strip leading _, -
s = s.lstrip("_").lstrip("-")
# Replace any whitespace with hyphens
s = "-".join(s.split())
# Remove any remaining invalid characters
s = css_invalid_chars_re.sub("", s)
# Attach the md5 suffix
bits = [b for b in (s, md5_suffix) if b]
return "-".join(bits)

The CSS bit is actually a bigger problem, because changing that will change the CSS class name (and the name used for custom templates) for existing Datasette instances - which could result in customized templates or CSS breaking in ways that people might not easily notice.

from datasette.

simonw avatar simonw commented on June 26, 2024

Here's the origin of that usedforsecurity= flag, back in 2010:

from datasette.

simonw avatar simonw commented on June 26, 2024

Oh wait... is the issue here that in a FIPS enabled system simply calling hashlib.md5(...) triggers a runtime error?

If so, the fix could well be to use usedforsecurity=False in Python 3.9+, avoid that parameter in Python 3.8 and document that Datasette on Python 3.8 is incompatible with FIPS, which I imagine is completely fine.

I had originally assumed that this was about a FIPS scanner that statically analyzes Python code looking for insecure uses of hashlib, but I now see that a runtime error is a more likely mechanism here.

from datasette.

darugar avatar darugar commented on June 26, 2024

@simonw exactly, calling hashlib.md5(...) triggers a runtime error, adding the flag apparently disables the check and prevents the runtime error. Adding the flag for 3.9+ and documenting for 3.8 seems quite reasonable.

Not sure if you can do anything about the pint issue, but just as a heads-up that also exhibits a similar runtime error in their use of hashlib.blake2b, which overall prevents datasette from running in FIPS systems (we're patching around that for now).

from datasette.

simonw avatar simonw commented on June 26, 2024

OK, I figured out how to replicate this problem using Docker.

I'm using this image: https://hub.docker.com/r/cyberark/ubuntu-ruby-fips

The hardest part was finding an actively maintained FIPS Docker image! I eventually found it via this search for most recently updated images on Docker Hub matching "fips": https://hub.docker.com/search?q=fips&sort=updated_at&order=desc

So I can start the container with:

docker run -it --rm cyberark/ubuntu-ruby-fips /bin/bash

Then I can install stuff I need with:

apt-gen update && apt-get install -y python3 git python3.10-venv

Then:

cd /tmp
git clone https://github.com/simonw/datasette
cd datasette
python3 -m venv venv
source venv/bin/activate
pip install -e '.[test]'
pytest -n auto

This fails a bunch of tests thanks to the FIPS issue - errors like this:

  File "/tmp/datasette/datasette/database.py", line 77, in color
    return hashlib.md5(self.name.encode("utf8")).hexdigest()[:6]
ValueError: [digital envelope routines] unsupported

from datasette.

simonw avatar simonw commented on June 26, 2024

Applying this patch causes the test suite to pass in that FIPS Docker container:

diff --git a/datasette/database.py b/datasette/database.py
index becb552c..94225c47 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -74,7 +74,7 @@ class Database:
     def color(self):
         if self.hash:
             return self.hash[:6]
-        return hashlib.md5(self.name.encode("utf8")).hexdigest()[:6]
+        return hashlib.md5(self.name.encode("utf8"), usedforsecurity=False).hexdigest()[:6]
 
     def suggest_name(self):
         if self.path:
diff --git a/datasette/utils/__init__.py b/datasette/utils/__init__.py
index f2cd7eb0..d8d187ea 100644
--- a/datasette/utils/__init__.py
+++ b/datasette/utils/__init__.py
@@ -713,7 +713,7 @@ def to_css_class(s):
     """
     if css_class_re.match(s):
         return s
-    md5_suffix = hashlib.md5(s.encode("utf8")).hexdigest()[:6]
+    md5_suffix = hashlib.md5(s.encode("utf8"), usedforsecurity=False).hexdigest()[:6]
     # Strip leading _, -
     s = s.lstrip("_").lstrip("-")
     # Replace any whitespace with hyphens

from datasette.

simonw avatar simonw commented on June 26, 2024

@darugar do you think it's worth releasing a 0.64 with this fix, or can I leave it for a Datasette 1.0 alpha and then Datasette 1.0?

from datasette.

simonw avatar simonw commented on June 26, 2024

Not sure if you can do anything about the pint issue, but just as a heads-up that also exhibits a similar runtime error in their use of hashlib.blake2b, which overall prevents datasette from running in FIPS systems (we're patching around that for now).

How are you patching that?

I'm considering moving Pint out of Datasette core and trying to get it to work as a plugin instead before 1.0 - this may be just the push I need to make that decision.

from datasette.

darugar avatar darugar commented on June 26, 2024

@simonw yes we're patching Pint. Since we need to patch that anyway doing the patch for datasette is not much more effort, but of course it'd be nicer if datasette didn't need a patch :-) Completely up to you depending on timing for 1.0 alpha. Ideal scenario for us would be a datasette release with Pint as a plugin (we don't use it) and the md5 issue fixed in the release, but not a huge deal.

from datasette.

simonw avatar simonw commented on June 26, 2024

Wrote this up as a TIL: https://til.simonwillison.net/python/md5-fips

from datasette.

simonw avatar simonw commented on June 26, 2024

I filed an issue with Pint here:

from datasette.

simonw avatar simonw commented on June 26, 2024

And a PR against Pint too, which was easy because they've already dropped support for Python 3.8:

from datasette.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.