
fogger's Introduction

Fogger - GDPR-friendly database masker

Purpose

Fogger is a tool that addresses the problem of data privacy. When developers need to work with production data while complying with GDPR, they need a way to get a copy of the database with all sensitive data masked. You could always write your own custom solution to the problem, but you no longer have to: with fogger you are covered.

Apart from masking data, you can also subset or even exclude some tables. Don't worry about relations with foreign keys: fogger will refine the database so everything stays clean and consistent.

You can configure various masking and subsetting strategies, and when what fogger offers is not enough, you can easily extend it with your own strategies.

How to use the docker image

Fogger requires a Docker environment, Redis for caching, and two databases: source and target. You can set up this stack using, for example, the following docker-compose file:

version: '2.0'
services:
  fogger:
    image: tshio/fogger:latest
    volumes:
    - .:/fogger
    environment:
      SOURCE_DATABASE_URL: mysql://user:pass@source:3306/source
      TARGET_DATABASE_URL: mysql://user:pass@target:3306/target
      REDIS_URL: redis://redis
  worker:
    image: tshio/fogger:latest
    environment:
      SOURCE_DATABASE_URL: mysql://user:pass@source:3306/source
      TARGET_DATABASE_URL: mysql://user:pass@target:3306/target
      REDIS_URL: redis://redis
    restart: always
    command: fogger:consumer --messages=200
  redis:
    image: redis:4
  source:
    volumes:
    - ./dump.sql:/docker-entrypoint-initdb.d/dump.sql
    environment:
      MYSQL_DATABASE: source
      MYSQL_PASSWORD: pass
      MYSQL_ROOT_PASSWORD: pass
      MYSQL_USER: user
    image: mysql:5.7
  target:
    environment:
      MYSQL_DATABASE: target
      MYSQL_PASSWORD: pass
      MYSQL_ROOT_PASSWORD: pass
      MYSQL_USER: user
    image: mysql:5.7

Note:

  • we map the current directory to the /fogger directory, so the config file is accessible both inside the container and on the host filesystem
  • we import the database content from dump.sql

Of course, you can adjust these settings to your needs. For example, instead of importing the database from a dump file, you can point the fogger and worker containers at an existing database via the environment variables.
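A minimal sketch of such a setup (the hostnames, credentials and database names below are placeholders for your own, already-existing databases; everything else is taken from the compose file above):

# Hypothetical variant: the bundled source and target MySQL services are dropped
# and fogger is pointed at databases that already exist and are reachable from the containers.
version: '2.0'
services:
  fogger:
    image: tshio/fogger:latest
    volumes:
    - .:/fogger
    environment:
      SOURCE_DATABASE_URL: mysql://user:pass@prod-replica.example.com:3306/app
      TARGET_DATABASE_URL: mysql://user:pass@masked-copy.example.com:3306/app
      REDIS_URL: redis://redis
  worker:
    image: tshio/fogger:latest
    environment:
      SOURCE_DATABASE_URL: mysql://user:pass@prod-replica.example.com:3306/app
      TARGET_DATABASE_URL: mysql://user:pass@masked-copy.example.com:3306/app
      REDIS_URL: redis://redis
    restart: always
    command: fogger:consumer --messages=200
  redis:
    image: redis:4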

Now we can spin up the stack with docker-compose up -d. If the database is huge and you want to speed up the process, you can spawn additional workers by executing docker-compose up -d --scale=worker=4 instead. Give the services a few seconds to spin up, then you can start with Fogger.

Fogger gives you three CLI commands:

  • docker-compose run --rm fogger fogger:init connects to your source database and prepares a boilerplate configuration file with information about the tables and columns in your database. This configuration file is where you define which columns should be masked (and how) and which tables should be subsetted. See the example config file below.
  • docker-compose run --rm fogger fogger:run is the core command that orchestrates the copying, masking and subsetting of data. The actual copying is done by background workers that can scale horizontally. Before run is executed, make sure the config file has been adjusted to your needs. The available subset and mask strategies are described below.
  • docker-compose run --rm fogger fogger:finish recreates indexes, refines the database so that all foreign key constraints remain valid, and then recreates those as well. This command runs automatically after run, so you only need to execute it manually when you have stopped the run command with ctrl-c.
When it's done, the masked and subsetted data is in the target database and you can do whatever you please with it. For example, docker-compose exec target /usr/bin/mysqldump -u user --password=pass target > target.sql will save a dump of the masked database to your filesystem.

Example config file

tables:
  posts:
    columns:
      title: { maskStrategy: starify, options: { length: 12 } }
      body: { maskStrategy: faker, options: { method: "sentences" } }
    subsetStrategy: tail
    subsetOptions: { length: 1000 }
  comments:
    columns:
      comment: { maskStrategy: faker, options: { method: "sentences" } }
  users:
    columns:
      email: { maskStrategy: faker, options: { method: "safeEmail" } }
excludes:
    - logs

This is an example config file. The boilerplate, based on your database schema, will be generated for you by fogger:init; all you have to do is fill in the mask strategies for the columns you want masked and the subset strategies for the tables of which you only want a fraction of the rows.

For clarity and readability of the config files, all tables that will not be changed can be omitted; they will be copied as they are. Similarly, you can omit columns that are not to be masked. Tables from the excludes section will exist in the target database, but will be empty.
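A minimal, hypothetical sketch of these rules (the table names are just examples): users.email is masked, a sessions table that is not mentioned at all would be copied verbatim, and logs will be present in the target but empty.

tables:
  users:
    columns:
      email: { maskStrategy: faker, options: { method: "safeEmail" } }
excludes:
    - logs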

List of available strategies

Masking data

  • hashify - will save an MD5 hash instead of the data - you can pass an optional argument: template

    email: { maskStrategy: "hashify", options: { template: "%[email protected]" } }

  • starify - will save 10 stars instead of the data - you can pass an optional argument: length to override the default of 10

    email: { maskStrategy: "starify", options: { }

  • faker - will use the marvelous Faker library. Pass the Faker method that you want to use as an option.

    email: { maskStrategy: "faker", options: { method: "safeEmail" } date: { maskStrategy: "faker", options: { method: "date", arguments: ["Y::m::d", "2017-12-31 23:59:59"] }

Subsetting data

  • range - only copies rows where column is between min and max (a complete table entry is sketched after this list)

    subsetStrategy: range
    subsetOptions: { column: "createdAt", min: "2018-01-01 00:00", max: "2018-01-31 23:59:59" }

  • head and tail - only copy the first / last length rows

    subsetStrategy: head
    subsetOptions: { length: 1000 }

    or

    subsetStrategy: tail
    subsetOptions: { length: 1000 }
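In the config file, subsetStrategy and subsetOptions sit directly under the table entry, next to columns. A minimal sketch combining a range subset with a masked column (the table, column and date values are made-up examples):

tables:
  orders:
    columns:
      customerEmail: { maskStrategy: faker, options: { method: "safeEmail" } }
    subsetStrategy: range
    subsetOptions: { column: "createdAt", min: "2018-01-01 00:00:00", max: "2018-01-31 23:59:59" }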

Under the hood

If you are interested in what really happens:

  • the source database schema, without indices and foreign keys, is copied to the target
  • the data is divided into chunks (this includes query modification for subsetting); chunks are processed by background workers (using RabbitMQ)
  • while copying, sensitive data is substituted with its masked version; to keep the substituted values consistent, Redis is used as a cache
  • when all the data is copied, fogger recreates the indices
  • refining cleans up the database, removing (or setting to null) relations that point to excluded or subsetted table rows
  • the last step is to recreate the foreign keys

Contributing

Feel free to contribute to this project! Just fork the code, make your changes, and let us know!

fogger's People

Contributors

jarjak, kowoltsh, lode, perphronesis, pskt, tsurowiec


fogger's Issues

PostgreSQL and Boolean type doesn't work

Hello, we are trying to use fogger to copy data between two PostgreSQL 11 databases, but we get the following error:

SQLSTATE[25P02]: In failed sql transaction: 7 ERROR: current transaction is aborted, commands ignored until end of transaction block INSERT INTO "schema"."schema_migrations" ("version", "dirty") VALUES('39', '')

The schema_migrations.dirty field type is boolean, but instead of inserting a boolean value it inserts an empty string.

Integrate fogger with Azure

Hi guys!
I am trying to mask an Azure SQL DB in a secure manner. Can I connect fogger to Azure without making a separate dump? I want to apply the changes in the existing DB. Is this possible with fogger?

Docker build issue pecl

WARNING: channel "pecl.php.net" has updated its protocols, use "pecl channel-update pecl.php.net" to update
pecl/sqlsrv requires PHP (version >= 7.3.0), installed version is 7.2.3
No valid packages found
install failed

If I update the version to 7.4 (it errors with 7.3), there are additional issues.

Is there a Dockerfile for php 7.4?

Moving data in chunks always stuck at 99%

It's an amazing tool for developers to achieve data anonymization. Thanks for making it available to the public.
I recently tried the tool locally. I noticed that fogger:run goes from 0% to 99% really fast, then stays stuck at 99% for days.

I am new to this tool; could anyone suggest anything to fix it?

Unknown database type _text requested

Hi all,

Great to have this!

I keep getting this error:

 Unknown database type _text requested, Doctrine\DBAL\Platforms\PostgreSQL100Platform may not support it.

Is there a workaround for this?

Question with faker locale

In Faker one can choose the locale used to generate values by calling Faker\Factory::create('pt_BR'); how can I do something similar with fogger?

fogger doesn't support Postgres ENUM type

Hi everyone!

Thanks for your efforts!

It looks like fogger doesn't work with the Postgres ENUM type:

worker_1  | In AbstractPlatform.php line 479:
worker_1  |
worker_1  |   Unknown database type enum_paymentdocuments_transactiontype requested, Doctrine\DBAL\Platforms\PostgreSQL94Platform may not support it.

Is this project still under development?

Hey,

I am looking for a tool like this, but I can see it has not had updates since 2019. Is it still being maintained? If not, does anyone know a similar service that can:

  • Copy data from one database to another
  • Obfuscate personal data
  • Be containerised with Docker
  • Work with mSQL

Postgres database masking

I am interested in whether your software supports PostgreSQL database masking. If yes, where can I see the connection parameter syntax?
Thanks in advance,
Aleksandar Lukic

Command "fogger:init" is not defined

When I run docker-compose run --rm fogger fogger:init I get the error Command "fogger:init" is not defined.

I'm following the example from the readme, but I'm not sure what I missed.

Thanks!

Is it possible to concatenate mask strategies?

For example:

I need to generate "National Insurance Numbers", these must match the pattern:
AB123456C

Faker doesn't have direct support for this, but I'm hoping I could do it by using three mask strategies and concatenating the results:

  • Generate two random letters
  • Generate six random digits
  • Generate one random letter

Is this possible? Thanks.

Trying to install it on Windows without success

[screenshot of the docker-compose error]

Hello, I want to install fogger, but I get an error like the one in the screenshot above after running docker-compose up.
My setup:
Windows 10 Pro
Docker Desktop version 4.0.0

basic installation guide

Hi, this looks promising for a data masking tool - however, I'm not hugely familiar with Docker, Compose, etc. Does anyone have a simplified user guide for installing fogger? For example, do I need to git clone the repo and then modify the template docker-compose.yml file? Will it install Redis and all the other associated dependencies?

thanks!

Fogger not supporting Microsoft SQL Server db

Hi Guys,

First of all thank you for such a tool.

I am trying to add SOURCE_DATABASE_URL and TARGET_DATABASE_URL for a SQL Server DB in the "docker-compose.yml" file like this:

environment:
  SOURCE_DATABASE_URL: sqlsrv://MydbUser:MyPassword@MyHostname:port/MySourceDb
  TARGET_DATABASE_URL: sqlsrv://MydbUser:MyPassword@MyHostName:port/MyTargetDb

I am getting the error below when I run "docker-compose up -d":
In SQLSrvException.php line 57:
SQLSTATE [IMSSP, -8]: An invalid connection option key type was received. Option key types must be strings.

Please let me know: does fogger support SQL Server? If yes, what is wrong with my database URL?

Microsoft SQL Server Support?

Hi, does fogger support MSSQL? If so, can you point me in the direction of some example config? In my case I don't want to import from a dump file, but rather point at the DB directly. Is this possible?
