
django-scrubber's People

Contributors

airiulola, costela, fbinz, lociii, mastacheata, meshy

django-scrubber's Issues

Crash after scrubbing a second time

When running scrub_data the second time, you get this error:

django_scrubber.ScrubberInitError: Integrity error initializing faker data (ERROR:  duplicate key value violates unique constraint "django_scrubber_fakedata_provider_provider_offset_694f1f44_uniq"
DETAIL:  Key (provider, provider_offset)=(name - 0, 0) already exists.
); maybe decrease SCRUBBER_ENTRIES_PER_PROVIDER?

When you truncate the fake-data-table, it works again.

@mastacheata thinks this might be related to a forgotten provider_key: https://github.com/mastacheata/django-scrubber/blob/cb0614c3ad99dfede45f4bbeb44cdd9c94a71807/django_scrubber/scrubbers.py#L135
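
To illustrate the failure mode and the truncate workaround, here is a minimal sketch using sqlite3 from the standard library. The table name `fakedata` and the helper `init_fake_data` are hypothetical and are not django-scrubber's code; the sketch only shows that clearing a provider's rows before re-inserting them keeps a unique constraint on (provider, provider_offset) from firing on the second run.

```python
import sqlite3


def init_fake_data(conn, provider, values):
    """Replace all stored fake values for one provider (safe to re-run)."""
    with conn:  # commits on success, rolls back on error
        # Clearing first is the programmatic equivalent of truncating the table
        # for this provider, so the unique constraint cannot be violated.
        conn.execute("DELETE FROM fakedata WHERE provider = ?", (provider,))
        conn.executemany(
            "INSERT INTO fakedata (provider, provider_offset, content) "
            "VALUES (?, ?, ?)",
            [(provider, offset, value) for offset, value in enumerate(values)],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fakedata ("
    " provider TEXT NOT NULL,"
    " provider_offset INTEGER NOT NULL,"
    " content TEXT NOT NULL,"
    " UNIQUE (provider, provider_offset))"
)

init_fake_data(conn, "name", ["Alice", "Bob"])
init_fake_data(conn, "name", ["Carol", "Dave"])  # second run does not raise

rows = conn.execute(
    "SELECT provider_offset, content FROM fakedata ORDER BY provider_offset"
).fetchall()
```

Whether the real fix belongs in the provider key or in the initialization step is up to the maintainers; this only demonstrates the idempotent pattern.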

Problem with overridden `get_queryset()` method

I just found a really annoying thing: If you override the manager and its get_queryset() method like this:

class MyManager(models.Manager):

    def get_queryset(self):
        return super().get_queryset().exclude(
            show_on_website=False)

Then scrubber won't scrub the data with show_on_website=False.

Any ideas about that topic? Is there a way to avoid the default manager and go directly to the django-base one?

Best regards
Ronny

Proposal: Integrate scrubber wrapper

Hi there!

Some time ago I wrote a wrapper class for extending and streamlining the scrubbing process. The idea is that things that need to happen, happen under the hood (clearing the Django session table, which is a big deal; truncating the scrubber fake data table to reduce the dump size; etc.), and things that should happen can be customised by the developer (creating a superuser with a fixed password, pre- or post-processing).

It's all documented in our Ambient toolbox package: https://ai-django-core.readthedocs.io/en/latest/features/database_anonymisation.html

I wonder if you might be interested in merging this stuff in your package and provide a better and more convenient service for your users.

Best
Ronny

Broken for models with non-numeric primary key

  • Django Scrubber version: any
  • Django version: any
  • Python version: any
  • Operating System: any

Description

Models with a non-numeric primary key will fail because of the modulo annotation:
model.objects.annotate(
    mod_pk=F('pk') % settings_with_fallback('SCRUBBER_ENTRIES_PER_PROVIDER')
).update(**realized_scrubbers)

What I Did

class Token(models.Model):
    key = models.CharField(_("Key"), max_length=40, primary_key=True)
    user = models.OneToOneField(
        settings.AUTH_USER_MODEL, related_name='external_auth_token',
        on_delete=models.CASCADE, verbose_name=_("User")
    )
    created = models.DateTimeField(_("Created"), auto_now_add=True)

    class Scrubbers:
        key = scrubbers.Concat(scrubbers.Hash('key'), scrubbers.Faker('pystr', min_chars=5, max_chars=15))

Running scrub_data will fail with this log:

psycopg2.errors.UndefinedFunction: operator does not exist: character varying % integer
django.db.utils.ProgrammingError: operator does not exist: character varying % integer
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
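
One conceivable direction is to derive the offset from a hash of the key whenever the key is not an integer. The sketch below is plain Python and purely illustrative: `provider_offset` is a hypothetical helper, and the actual annotation runs in SQL, so a real fix would need a database-side equivalent.

```python
import zlib


def provider_offset(pk, entries_per_provider):
    """Map any primary key to an offset in [0, entries_per_provider)."""
    if isinstance(pk, int) and not isinstance(pk, bool):
        # Numeric keys keep the existing modulo behaviour.
        return pk % entries_per_provider
    # Non-numeric keys: hash the string form deterministically, then reduce.
    return zlib.crc32(str(pk).encode("utf-8")) % entries_per_provider


numeric = provider_offset(1005, 1000)
textual = provider_offset("9944b09199c62bcf9418ad846dd0e4bbdfc6ee4b", 1000)
```

The CRC32 choice is arbitrary; any stable hash would do, as long as the same key always yields the same offset.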

Simplify scrubber validation when using third-party libraries

Currently, when using third-party libraries like Wagtail, running the "scrub_validation" command results in a pretty long list of classes that potentially need scrubbing.

At first sight, I would have expected that the SCRUBBER_APPS_LIST setting would be used to determine the models that actually need to be checked.

Instead, I whitelisted all classes from the wagtail-ecosystem using the SCRUBBER_REQUIRED_FIELD_MODEL_WHITELIST setting.

Now, on to my question:

  1. Is it intended that SCRUBBER_APPS_LIST is ignored when running the validation command?
    I guess the app list and the validation fulfill different purposes, so in principle it makes sense to separate them.

  2. What do you think about adding the possibility to use regular expressions when whitelisting models? I.e. instead of saying "wagtailcore.Page", I'd just say re.compile("wagtailcore.*") and thus could whitelist all wagtailcore models.

If there's interest, I can create a PR.
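
A rough sketch of how mixed string and regex whitelist entries could be matched. `is_whitelisted` is a hypothetical helper, not part of django-scrubber, and model labels are assumed to have the form "app_label.ModelName".

```python
import re


def is_whitelisted(model_label, whitelist):
    """Return True if the label equals a string entry or fully matches a regex."""
    for entry in whitelist:
        if isinstance(entry, re.Pattern):
            if entry.fullmatch(model_label):
                return True
        elif entry == model_label:
            return True
    return False


# Plain strings keep working; compiled patterns cover whole apps.
whitelist = ["auth.Permission", re.compile(r"wagtailcore\..*")]

page_ok = is_whitelisted("wagtailcore.Page", whitelist)
permission_ok = is_whitelisted("auth.Permission", whitelist)
other_ok = is_whitelisted("wagtailcoreX.Page", whitelist)
```

Note that the dot is escaped here: an unescaped "wagtailcore.*" would also match labels of other apps whose names merely start with "wagtailcore".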

faker major release 6.0.0

  • Django Scrubber version: 0.5.2
  • Django version: /
  • Python version: /
  • Operating System: /

Description

faker made an unnecessary jump in their major release without any breaking changes.
We have to loosen our version restrictions.

Proposal: Strict mode

Hi there!

At the last Django Meetup Cologne I talked about django-scrubber and we were toying around with some ideas.

I think the biggest drawback is that you have to think about scrubbing when adding new fields.

A colleague suggested a strict mode. This would mean:

  • We add a new settings variable STRICT which defaults to False for compatibility reasons
  • We add a new settings variable "MODEL_FIELD_BLACKLIST" which defaults to (CharField, TextField, etc.), listing the field types that have to be scrubbed
  • We add a new type scrubbers.Keep to mark fields as "ok" - the scrubber will ignore those fields
  • We get a warning/error during scrubbing when a blacklisted field has no scrubber defined
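
The check described in these bullets could look roughly like the following plain-Python sketch. Everything here is hypothetical and only mirrors the proposal: `Keep`, `FIELD_TYPE_BLACKLIST` and `find_unscrubbed_fields` are illustrative names, and field types are represented as strings instead of Django field classes so the example runs on its own.

```python
class Keep:
    """Hypothetical marker: the field is deliberately left unscrubbed."""


# Field types that must carry a scrubber (or Keep) in strict mode.
FIELD_TYPE_BLACKLIST = {"CharField", "TextField"}


def find_unscrubbed_fields(model_fields, scrubbers):
    """List blacklisted fields that have neither a scrubber nor a Keep marker.

    model_fields: mapping of field name -> field type name
    scrubbers:    mapping of field name -> scrubber object or Keep()
    """
    return sorted(
        name
        for name, field_type in model_fields.items()
        if field_type in FIELD_TYPE_BLACKLIST and name not in scrubbers
    )


model_fields = {
    "id": "AutoField",
    "first_name": "CharField",
    "bio": "TextField",
    "slug": "CharField",
}
scrubbers = {"first_name": "faker:first_name", "slug": Keep()}

missing = find_unscrubbed_fields(model_fields, scrubbers)
```

In strict mode, a non-empty result would turn into the warning or error from the last bullet.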

We could sell this as "security by design".

What do you think about this? We could get rid of the only drawback this approach has in my opinion 🙂

Best from Cologne
Ronny

Add support for faker 8.x

  • Django Scrubber version: 0.5.3
  • Django version: all
  • Python version: all
  • Operating System: all

Description

faker moved to a new, unsupported major version.
Please add support for it.

How to activate logging

Hi @costela

I saw that there are some log statements within the scrub_data command, but I didn't manage to get it to log anywhere.

Could you provide an example? I'd be willing to update the docs then 😃

Best
Ronny
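
A possible starting point, assuming the package obtains its loggers via `logging.getLogger(__name__)` so that they live under the `django_scrubber` namespace (an assumption that should be checked against the source). In a Django project the `LOGGING` dict would go into settings.py and Django would apply it; the explicit `dictConfig` call below is only there to make the sketch standalone.

```python
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Write log records to stderr.
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Assumed logger namespace; child loggers such as
        # "django_scrubber.scrubbers" inherit this level and handler.
        "django_scrubber": {"handlers": ["console"], "level": "INFO"},
    },
}

logging.config.dictConfig(LOGGING)

scrubber_logger = logging.getLogger("django_scrubber.scrubbers")
```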

How to null a given field

Hi @costela

I was wondering... how can I null a given field? I looked at faker, but there is no information there about what to do either.

Any ideas about that?

Thx!
Ronny

Using faker geo providers does not work

  • Django Scrubber version: Latest
  • Django version: 2.2.9
  • Python version: 3.6
  • Operating System: Win10

Hi there!

I tried to scrub lat and long fields with the faker provider but it does not work.

Here's a model:

class MyModel(models.Model):
    latitude = models.DecimalField(blank=True, null=True, max_digits=10, decimal_places=8)
    longitude = models.DecimalField(blank=True, null=True, max_digits=10, decimal_places=8)

    class Scrubbers:     
        latitude = scrubbers.Faker('latitude')
        longitude = scrubbers.Faker('longitude')

This will lead to:

CommandError: DataError while scrubbing <class 'apps.core.models.MyModel'> ((1264, "Out of range value for column '(null)' at row 1"))

Seems like a bug, doesn't it?

Best regards
Ronny
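
One thing worth ruling out, independent of the scrubber itself: with max_digits=10 and decimal_places=8 a column has room for only two digits before the decimal point, while longitudes range from -180 to 180. The sketch below checks whether a value fits a given precision; `fits_decimal_field` is a hypothetical helper, and this may or may not be the cause of the error reported here.

```python
from decimal import Decimal


def fits_decimal_field(value, max_digits, decimal_places):
    """Check whether value fits a DECIMAL(max_digits, decimal_places) column."""
    quantum = Decimal(1).scaleb(-decimal_places)  # e.g. Decimal("1E-8")
    rounded = Decimal(value).quantize(quantum)
    # Digits left of the decimal point are limited to max_digits - decimal_places.
    limit = Decimal(10) ** (max_digits - decimal_places)
    return abs(rounded) < limit


latitude_fits = fits_decimal_field(Decimal("89.12345678"), 10, 8)
longitude_fits = fits_decimal_field(Decimal("179.12345678"), 10, 8)
wider_fits = fits_decimal_field(Decimal("179.12345678"), 11, 8)
```

If this is the cause, widening the longitude field to max_digits=11 would leave room for three integer digits.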

ProgrammingError when using Faker

  • Django Scrubber version: 0.3.1
  • Django version: 2.2.6
  • Python version: 3.6
  • Operating System: Win10

My setup:

class MyModel(models.Model):
    my_bool = models.BooleanField(default=False)

    class Scrubbers:
        my_bool = scrubbers.Faker('pybool')

Result (translated from German):

django.db.utils.ProgrammingError: ERROR: column "my_bool" is of type boolean but expression is of type character varying
LINE 1: UPDATE "app_mymodel" SET "my_bool" = (SELECT U0."...

Any ideas why this is happening? I tried the boolean provider as well and the same happens for date_object.

Best and thanks
Ronny

Support for multiple / non-default database

  • Django Scrubber version: 0.6
  • Django version: All
  • Python version: All
  • Operating System: All

Description

I use multiple databases in my app -- and Django-scrubber only scrubs the default one.

What I Did

My app uses 3 DBs, with the same models, for different customers, so my settings (in dev -- copied from prod) have this:

DATABASES = {
    key: {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': f'devdb-${key}',
        'USER': 'test',
        'PASSWORD': os.getenv("DB_TEST_PASSWORD"),
        'HOST': 'db.mydomain.com',
        'PORT': '5432',
    } for key in ['default', 'customer1', 'customer2', 'customer3']
}

I'd like to scrub the 3 customer DBs, but Django-scrubber doesn't let me do that. It would be nice to have a --database <db> option, or --all-databases

A workaround is to add this to the main() function in my manage.py:

    if len(sys.argv) > 1 and sys.argv[1] == "scrub_data":
        if "--database" in sys.argv:
            from django.conf import settings

            database_index = sys.argv.index("--database")
            del sys.argv[database_index]
            settings.DATABASES["default"] = settings.DATABASES[sys.argv[database_index]]
            print(f"Scrubbing database {sys.argv[database_index]}")
            del sys.argv[database_index]

Then I execute:
./manage.py scrub_data --database customer1, etc.

It works well, but it's a little ugly... it would be nicer to have clean support for this in django-scrubber :)
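
For anyone copying the workaround, the argument handling can be isolated into a small function that does not mutate sys.argv in place and fails clearly when the alias is missing. This is only a sketch of the workaround above, not a proposed API for django-scrubber; `pop_database_option` is a hypothetical name.

```python
def pop_database_option(argv, flag="--database"):
    """Return (alias, remaining_argv); alias is None when the flag is absent."""
    if flag not in argv:
        return None, list(argv)
    index = argv.index(flag)
    if index + 1 >= len(argv):
        # The original snippet would raise an IndexError here.
        raise ValueError(f"{flag} requires a database alias")
    alias = argv[index + 1]
    # Drop both the flag and its value from the argument list.
    remaining = argv[:index] + argv[index + 2:]
    return alias, remaining


alias, remaining = pop_database_option(
    ["manage.py", "scrub_data", "--database", "customer1"]
)
```

In manage.py, the returned alias would then be used to point settings.DATABASES["default"] at the chosen entry before the command runs, as in the snippet above.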

scrubbers.Hash failing on TextFields

  • Django Scrubber version: 0.3.0
  • Django version: any
  • Python version: any
  • Operating System: linux

Description

Hashing fails when a field's max_length is None.

This is the default for TextFields in Django. I think it occurs on this line:

if 'max_length' in self.extra:

What I Did

django.db.utils.ProgrammingError: column "none" does not exist
LINE 1: ...), "address" = SUBSTR(MD5("client"."address"), 1, None), "va...

We may be able to look into this next week.
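
The likely shape of a fix is to test the value instead of the key's presence. The sketch below builds the SQL fragment as a plain string purely for illustration; the library builds Django expressions, and `hash_sql` is a hypothetical helper.

```python
def hash_sql(column, max_length=None):
    """Build an MD5 expression, truncating only when a real length is given."""
    expression = f'MD5("{column}")'
    # Checking "is not None" instead of key presence avoids emitting
    # SUBSTR(..., 1, None) for fields such as TextField.
    if max_length is not None:
        expression = f"SUBSTR({expression}, 1, {int(max_length)})"
    return expression


text_field_sql = hash_sql("address")
char_field_sql = hash_sql("address", max_length=10)
```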

Drop py2 support

Do what it says on the tin: py2 has been EOL for a while now. We should drop it.

Non-existing fields in scrubber definitions make the scrubbing fail

  • Django Scrubber version: 0.4.1
  • Django version: 2.2.10
  • Python version: 3.6.8

Description

We moved a field from one model to another and forgot to remove it from the model's scrubber definition.
This makes the whole scrubbing process fail instead of gently skipping over the field and issuing a warning.
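
The requested behaviour could be sketched as a filtering step that runs before the update. This is plain Python with hypothetical names (`usable_scrubbers`, and field names passed in as a set), not the library's implementation.

```python
import warnings


def usable_scrubbers(model_name, field_names, scrubbers):
    """Keep scrubbers whose field exists; warn about and skip the rest."""
    usable = {}
    for name, scrubber in scrubbers.items():
        if name in field_names:
            usable[name] = scrubber
        else:
            warnings.warn(
                f"{model_name}: scrubber defined for unknown field "
                f"'{name}', skipping"
            )
    return usable


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = usable_scrubbers(
        "Client",
        {"name", "address"},
        {"name": "faker:name", "phone": "faker:phone_number"},
    )
```

Here `result` contains only the "name" scrubber, and one warning is recorded for "phone".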

Log what scrubber is doing

Currently scrubber is only logging output from faker, it seems.

INFO 2019-11-19 15:27:32,407 scrubbers 22188 20920 Initializing fake scrub data for provider building_number(, )
INFO 2019-11-19 15:27:33,084 scrubbers 22188 20920 Initializing fake scrub data for provider name_female(, )

I think it would be awesome if we logged the model currently being scrubbed and, if possible, the number of records left/handled/processed.

Best
Ronny

Compat with Faker 3.0.0

When using Faker 3.0.0, scrubber throws this error:

TypeError: Calling .seed() on instances is deprecated. Use the class method Faker.seed() instead.
