Coder Social home page Coder Social logo

atlonxp / priana Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ailab-foi/priana

0.0 0.0 0.0 6.27 MB

GDPR Privacy Analytics based on AI and Data Science

License: GNU General Public License v3.0

Shell 2.43% Python 83.87% AutoHotkey 0.23% OpenEdge ABL 13.47%

priana's Introduction

PriAna - Privacy Analysis

PriAna GDPR

Intro

PriAna is a privacy analysis tool based on various data science and artificial intelligence techniques that tries to identify privacy related data on your computer. It scans for various file types and supports various database systems including MS Access, MS SQL Server, MySQL, ODBC, Oracle, PostgreSQL and SQLite. It can be a valuable tool for GDPR assesment and audit.

Supported filetypes

PriAna uses various NLP techniques to find privacy related content (in the current version English and Croatian words are supported, but support for other languages is pending) from numerous filetypes including but not limited to csv, doc, docx, eml, epub, gif, jpg, jpeg, json, html, htm, mp3, msg, odt, ogg, pdf, png, pptx, ps, rtf, tiff, tif, txt, wav, xlsx, xls, mdb, rtf, and html.

In addition to filetypes it also scans for MIME types so renamed files are scanned as well. Besides textual formats it extracts textual information from images using OCR as well as speech from audio files using speech to text. Currently only on Linux it also detects human faces on images.

It identifies various archive types including images of virtual machines: ovf, ova, box, vmdk, vdi, vhd, img, raw, iso, zip, rar, tar, 7z. In the current version such archives have to be extracted and scanned manually.

Supported search

PriAna searches for privacy related keywords (currently defined in priana.py), education related keywords and sexual orientation related keywords. It identifies emails, Croatian social security numbers (OIB and JMBG), IP addresses, MAC addresses, GPS coordinates and bank card numbers.

Using NLP it identifies names and locations, and using face detection it identifies images with human faces on them.

Supported platforms

PriAna has been tested on Linux and Windows. In case you encounter problems on any of these platforms, please file an issue on GitHub, we will look into it as soon as possible. If you manage to run PriAna on any other platform, please let us know, we are happily accepting patches ;-)

Reporting

PriAna outputs a report of all found files or tables in databases categorized into high, medium and low risk items. The output reports are stored in the data directory of PriAna.

Recommendation system (Croatian only)

PriAna also implements a small expert system for GDPR compliance currently only in Croatian that can help you asses all found data. This expert system is by no means a full audit and it does not guarantee in any way that you are GDPR compliant even if it states so!

Running from source

In order to run PriAna from source you will need a working Python (>= 2.7) environment including a number of modules: SPADE (version 2), textract, gettext, ZODB, magic, nltk, SQLAlchemy, pyODBC, multiprocessing, psutil, tkinter, ttk, tkMessageBox, tkFileDialog, PIL, pickle, ctypes and face_recognition.

To run the recommendation expert system you will need a working installation of SWI Prolog

In order to use the database scanning features on Linux you will need to install UnixODBC, MDB-tables, MDB-export.

On Windows you will need VC++ Redistributable and Microsoft Access Redistributable Database Engine. Other dependencies include AntiWord, Oracle's instant client, pdftools, Tesseract-OCR and textract's node version, as well as a number of DLLs included in the source.

Building the binary

To build the binary a shell script is included. You will need pyinstaller in order to build a working executable. On Windows you will need to run it through CygWin and need to install AutoHotKey.

The 'build.sh' script accepts 3 arguments:

  1. Platform (linux or win)
  2. Destination directory where the bineries are to be stored
  3. Optional skip if the value is no then all is to be build else the compile of the expert system is skipped and the destination folder is deleted in full.

Pre-built binaries

A pre-built binary is available for Windows: http://ai.foi.hr/priana-win.zip

As well as for Linux (Ubuntu): http://ai.foi.hr/priana-linux.tar.xz

priana's People

Contributors

mschatten avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.