amlend2end's Introduction

AML End to End Example

Project description

This project demonstrates an end-to-end pipeline for training a binary anti-money-laundering (AML) classifier based on Generative Adversarial Networks (GANs) and graph embeddings. The proposed solution consists of the following parts:

  • Data ingestion - We use a sample of transaction data generated by AMLSim.
  • Feature store - We use the Hopsworks Feature Store to compute features, organize them into feature groups, and store them for downstream use, such as creating training datasets for model training and retrieving the features later.
  • Graph embeddings - We use the StellarGraph library to compute graph embeddings.
  • Anomaly detection model - We use a Keras implementation of adversarial anomaly detection that was adapted to tabular data.
  • Hyperparameter tuning - We use Maggy to run hyperparameter tuning experiments.
  • Model serving - We use the Hopsworks model server to predict anomalous transactions.

Demo dataset

A sample of transaction data is provided in the folder ./demodata, including alert_transactions.csv, party.csv and transactions.csv. Upload these files to hdfs:///Projects/{}/Resources in your Hopsworks cluster. You can do this by running the following script, which also uploads adversarialaml.tgz (needed to run some of the examples) to the same directory:

./copy-hopsworks.sh project-name
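
For reference, the expected destination of each file can be derived from the path template above. This is a minimal sketch, not the contents of copy-hopsworks.sh: the helper name resource_uris is hypothetical, the {} placeholder is assumed to stand for the project name, and the files are assumed to keep their original names after upload.

```python
import posixpath

# Path template quoted in the text; {} is assumed to be the project name.
RESOURCES_TEMPLATE = "hdfs:///Projects/{}/Resources"

# Files named in the text (the archive is uploaded by the same script).
DEMO_FILES = [
    "alert_transactions.csv",
    "party.csv",
    "transactions.csv",
    "adversarialaml.tgz",
]


def resource_uris(project_name, files=DEMO_FILES):
    """Map each file name to its expected location in the Resources directory.

    Hypothetical helper for illustration; it only builds strings and does
    not contact the cluster.
    """
    base = RESOURCES_TEMPLATE.format(project_name)
    return {name: posixpath.join(base, name) for name in files}


if __name__ == "__main__":
    for name, uri in resource_uris("project-name").items():
        print(f"{name} -> {uri}")
```

Listing the Resources directory in Hopsworks after running the script and comparing it against these paths is one way to confirm that the upload succeeded.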


Install the following Python library from PyPI: stellargraph 1.2.1

Anomaly detection model

A Keras implementation of adversarial anomaly detection is provided in the folder ./adversarialaml. To use this library, create a zip file containing the Python files in the adversarialaml folder and attach it when starting a Jupyter server or a Hopsworks job. The copy-hopsworks.sh script uploads adversarialaml.tgz to the Resources directory, but you still need to attach the zip file when you start a notebook or run a job.
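
The zip file can be created with any archiving tool. The following is a minimal sketch using only the Python standard library. The function name zip_python_files and the output name adversarialaml.zip are hypothetical, and keeping the adversarialaml/ folder prefix inside the archive is an assumption made so the package can be imported by name; adjust the layout if your setup expects the files at the archive root.

```python
import zipfile
from pathlib import Path


def zip_python_files(source_dir, archive_path):
    """Write every .py file under source_dir into a zip archive.

    Hypothetical helper for illustration. Returns the list of archive
    member names in the order they were added.
    """
    source = Path(source_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*.py")):
            # Assumption: store paths relative to the parent directory so the
            # archive contains "adversarialaml/..." rather than bare file names.
            arcname = path.relative_to(source.parent).as_posix()
            archive.write(path, arcname)
            added.append(arcname)
    return added


if __name__ == "__main__":
    for name in zip_python_files("./adversarialaml", "adversarialaml.zip"):
        print(name)
```

Only files ending in .py are included, which matches the instruction to zip the Python files and leaves out caches or other artifacts in the folder.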

End to End pipeline

To complete this tutorial, use one of the two options below.

Jupyter notebooks step by step

Run the Jupyter notebooks in the following order:

  1. 1_transaction_feature_engineering_ingestion.ipynb
  2. 2_prep_training_dataset_for_embeddings.ipynb
  3. 3_maggy_node_embeddings.ipynb
  4. 4_compute_node_embeddings.ipynb
  5. 5_predict_node_embeddings_and_ingest_to_fs.ipynb
  6. 6_maggy_adversarial_aml.ipynb
  7. 7_train_adversarial_aml.ipynb
  8. 8_aml_model_server.ipynb
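
The numeric prefix of each notebook name encodes its position in the pipeline. As a minimal sketch (the helper names ordered_notebooks and missing_steps are hypothetical, and the eight-step count is taken from the list above), the order can be recovered and checked programmatically, for example before wiring the notebooks into scheduled jobs:

```python
import re
from pathlib import Path

# Notebook names follow the "<step>_<description>.ipynb" pattern listed above.
STEP_PATTERN = re.compile(r"^(\d+)_.+\.ipynb$")


def ordered_notebooks(names):
    """Return notebook names sorted by their numeric step prefix.

    Names that do not match the pattern are ignored. Sorting is numeric,
    so a step 10 would come after step 9 rather than after step 1.
    """
    steps = []
    for name in names:
        match = STEP_PATTERN.match(Path(name).name)
        if match:
            steps.append((int(match.group(1)), name))
    return [name for _, name in sorted(steps)]


def missing_steps(names, expected=8):
    """Return the step numbers in 1..expected that have no notebook."""
    found = set()
    for name in names:
        match = STEP_PATTERN.match(Path(name).name)
        if match:
            found.add(int(match.group(1)))
    return [step for step in range(1, expected + 1) if step not in found]


if __name__ == "__main__":
    notebooks = [p.name for p in Path(".").glob("*.ipynb")]
    for name in ordered_notebooks(notebooks):
        print(name)
    print("missing steps:", missing_steps(notebooks))
```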

Airflow

In Hopsworks you can also run the pipeline with Airflow. To do this:

  1. Create a job for each notebook, following the Hopsworks instructions on creating jobs.
  2. Create an Airflow DAG using the provided airflow_aml_end2end.py.

Contributors

davitbzh, jimdowling
