safe-graph / ugfraud

An Unsupervised Graph-based Toolbox for Fraud Detection

License: Apache License 2.0

ugfraud's Introduction

Introduction: UGFraud is an unsupervised graph-based fraud detection toolbox that integrates several state-of-the-art graph-based fraud detection algorithms. It can be applied to bipartite graphs (e.g., a user-product graph), and it can estimate the suspiciousness of both nodes and edges. The implemented models are listed in the Implemented Models section.

The toolbox incorporates Markov Random Field (MRF)-based algorithms, dense-block detection-based algorithms, and SVD-based algorithms. The MRF-based algorithms take the graph structure and the prior suspiciousness scores of the nodes as input. The other algorithms take only the graph structure.

We also maintain a deep graph-based fraud detection toolbox that implements state-of-the-art graph neural network-based fraud detectors.

We welcome contributions that add new fraud detectors or extend the features of the toolbox. Some of the planned features are listed in the TODO List.

If you use the toolbox in your project, please cite the paper below and the algorithms you used:

@inproceedings{dou2020robust,
  title={Robust Spammer Detection by Nash Reinforcement Learning},
  author={Dou, Yingtong and Ma, Guixiang and Yu, Philip S and Xie, Sihong},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  year={2020}
}

Installation

You can install UGFraud from PyPI:

pip install UGFraud

or download and install it from GitHub:

git clone https://github.com/safe-graph/UGFraud.git
cd UGFraud
python setup.py install

Dataset

The demo data is incomplete: the rating and date information are missing. The rating information is only used in the ZooBP demo. If you need the complete data to run the demos, please email [email protected] to download it from the Yelp Spam Review Dataset. The metadata.gz file in /UGFraud/Yelp_Data/YelpChi includes the following fields:

  • user_id: user identifier (38,063 users in total)
  • product_id: product identifier (201 products in total)
  • rating: from 1.0 (low) to 5.0 (high)
  • label: -1 is not spam, 1 is spam
  • date: creation time of the record

User Guide

Running the example code

You can find the implemented models in the /UGFraud/Demo/ directory. For example, you can run fBox with:

python eval_fBox.py 

Running on your datasets

See the data_to_network_graph function in /UGFraud/Demo/demo_pre.py for how to convert your data into a networkx graph.

To use your own data, you must provide at least the following:

  • a dict of dicts that maps each user to the products they reviewed:
'user_id': {
        'product_id': {
                'label': 1
        }
}
  • a dict of prior suspiciousness scores

You can use the dict_to_networkx(graph_dict) function from the /Utils/helper.py file to convert your graph_dict into a networkx graph. For more details, please see data_to_network_graph.py.
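As a rough illustration of the nested layout described above, the sketch below builds a graph_dict from a list of review records using only the standard library. The record format (user_id, product_id, label), the helper names build_graph_dict and uniform_priors, and the shape of the returned priors are all assumptions made for this example. They are not part of the UGFraud API, and the toolbox may expect priors in a different layout.

```python
from collections import defaultdict


def build_graph_dict(reviews):
    """Build the nested {user_id: {product_id: {'label': ...}}} mapping.

    `reviews` is a hypothetical iterable of (user_id, product_id, label) tuples.
    """
    graph_dict = defaultdict(dict)
    for user_id, product_id, label in reviews:
        graph_dict[user_id][product_id] = {"label": label}
    return dict(graph_dict)


def uniform_priors(graph_dict, value=0.5):
    """Assign the same placeholder prior to every user and every product.

    Returns two dicts (user priors, product priors); this layout is an
    assumption for illustration only.
    """
    user_priors = {user_id: value for user_id in graph_dict}
    product_priors = {
        product_id: value
        for products in graph_dict.values()
        for product_id in products
    }
    return user_priors, product_priors


reviews = [("u1", "p1", 1), ("u1", "p2", -1), ("u2", "p1", -1)]
graph_dict = build_graph_dict(reviews)
user_priors, product_priors = uniform_priors(graph_dict)
```

The resulting graph_dict has the shape shown in the list above and could then be passed to the conversion helper the README mentions.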

The structure of code

The /UGFraud repository is organized as follows:

  • Demo/ contains the implemented models and the corresponding example code;
  • Detector/ contains the basic models;
  • Yelp_Data/ contains the necessary dataset files;
  • Utils/ contains the helper functions.

Implemented Models

| Model | Paper | Venue | Reference |
|---|---|---|---|
| SpEagle | Collective Opinion Spam Detection: Bridging Review Networks and Metadata | KDD 2015 | BibTex |
| GANG | GANG: Detecting Fraudulent Users in Online Social Networks via Guilt-by-Association on Directed Graph | ICDM 2017 | BibTex |
| fBox | Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective | ICDM 2014 | BibTex |
| Fraudar | FRAUDAR: Bounding Graph Fraud in the Face of Camouflage | KDD 2016 | BibTex |
| ZooBP | ZooBP: Belief Propagation for Heterogeneous Networks | VLDB 2017 | BibTex |
| SVD | Singular value decomposition and least squares solutions | - | BibTex |
| Prior | Evaluating suspiciousness based on prior information | - | - |

Model Comparison

| Model | Application | Graph Type | Model Type |
|---|---|---|---|
| SpEagle | Review Spam | Tripartite | MRF |
| GANG | Social Sybil | Bipartite | MRF |
| fBox | Social Fraudster | Bipartite | SVD |
| Fraudar | Social Fraudster | Bipartite | Dense-block |
| ZooBP | E-commerce Fraud | Tripartite | MRF |
| SVD | Dimension Reduction | Bipartite | SVD |
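To give a feel for what "dense-block" means in the comparison above, here is a minimal sketch of plain greedy peeling on a bipartite edge list: it repeatedly removes the lowest-degree node and remembers the intermediate subgraph with the highest edges-per-node density. This is a generic textbook heuristic written for illustration. It is not the toolbox's Fraudar implementation, which additionally weights edges to resist camouflage. The function name densest_block and the edge-list input format are assumptions.

```python
def densest_block(edges):
    """Greedy peeling on a bipartite graph given as (user, product) pairs.

    Returns (nodes, density) for the densest intermediate subgraph found,
    where density = number of edges / number of nodes.
    """
    # Tag nodes by side so a user and a product may share the same raw id.
    adj = {}
    for user, product in edges:
        adj.setdefault(("user", user), set()).add(("product", product))
        adj.setdefault(("product", product), set()).add(("user", user))

    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_nodes = 0.0, set()

    while adj:
        density = n_edges / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        # Peel the node with the smallest current degree.
        victim = min(adj, key=lambda node: len(adj[node]))
        for nbr in adj.pop(victim):
            adj[nbr].discard(victim)
            n_edges -= 1

    return best_nodes, best_density


# Three users who all review the same three products, plus two stray edges.
edges = [(u, p) for u in ("u1", "u2", "u3") for p in ("p1", "p2", "p3")]
edges += [("u4", "p4"), ("u5", "p1")]
block, density = densest_block(edges)
```

In this toy input the fully connected 3-by-3 group is the block that survives peeling with the highest density. The linear scan for the minimum-degree node keeps the sketch short, whereas a real implementation would use a priority structure.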

TODO List

  • Homogeneous graph implementation

How to Contribute

You are welcome to contribute to this open-source toolbox. Currently, you can create issues or send an email to [email protected] with any inquiries.

ugfraud's People

Contributors

chenmetanoia, yingtongdou

ugfraud's Issues

Scale function bug?

Is the else clause indented incorrectly? As written, it attaches to the for loop rather than to the if/elif chain:

def scale_value(value_dict):
    """
    Calculate and return a dict of the value of input dict scaled to (0, 1)
    """

    ranked_dict = [(user, value_dict[user]) for user in value_dict.keys()]
    ranked_dict = sorted(ranked_dict, reverse=True, key=lambda x: x[1])

    up_max, up_mean, up_min = ranked_dict[0][1], ranked_dict[int(len(ranked_dict) / 2)][1], ranked_dict[-1][1]

    scale_dict = {}
    for i, p in value_dict.items():
        norm_value = (p - up_min) / (up_max - up_min)
        if norm_value == 0:  # avoid the 0
            scale_dict[i] = 0 + 1e-7
        elif norm_value == 1:  # avoid the 1
            scale_dict[i] = 1 - 1e-7
    else:
        scale_dict[i] = norm_value

    return scale_dict
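
One possible fix, offered as a sketch rather than as the maintainers' patch, is to move the else into the loop so that it pairs with the if/elif chain. In the version quoted above, the else belongs to the for statement, so it runs once after the loop ends and only the last key can receive an unclamped value. The guard for identical values (mapping everything to 0.5) and the early return for an empty dict are assumptions added here to avoid a division by zero and an index error. They are not in the original function.

```python
def scale_value(value_dict):
    """Return a dict with the input values min-max scaled into (0, 1)."""
    if not value_dict:
        return {}

    up_max = max(value_dict.values())
    up_min = min(value_dict.values())
    span = up_max - up_min

    scale_dict = {}
    for i, p in value_dict.items():
        # Assumption: if all values are equal, map each of them to 0.5.
        norm_value = (p - up_min) / span if span else 0.5
        if norm_value == 0:  # avoid the 0
            scale_dict[i] = 0 + 1e-7
        elif norm_value == 1:  # avoid the 1
            scale_dict[i] = 1 - 1e-7
        else:  # now part of the if/elif chain, evaluated for every key
            scale_dict[i] = norm_value

    return scale_dict


scaled = scale_value({"a": 0, "b": 5, "c": 10})
```

With this change every key in the input appears in the output, and only the exact minimum and maximum are nudged away from 0 and 1.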

cannot import name 'Detector' most likely due to a circular import

I tried a simple import, as outlined in testing.py:

import sys
import os
__file__ = "~/env/lib/python3.8/site-packages/UGFraud"
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
from UGFraud.Demo.eval_fBox import *

However, this produces the following error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
~/env/lib/python3.8/site-packages/UGFraud in <module>
      3 __file__ = "~/env/lib/python3.8/site-packages/UGFraud"
      4 sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
----> 5 from UGFraud.Demo.eval_fBox import *

~/miniconda3/lib/python3.8/site-packages/UGFraud/__init__.py in <module>
      1 # -*- coding: utf-8 -*-
      2 
----> 3 from . import Detector
      4 from . import Utils
      5 

ImportError: cannot import name 'Detector' from partially initialized module 'UGFraud' (most likely due to a circular import) (~/miniconda3/lib/python3.8/site-packages/UGFraud/__init__.py)
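
The traceback alone does not establish the cause. One thing that may be worth ruling out (an assumption, not a confirmed diagnosis) is that the installed copy of the package does not contain a Detector subpackage. Note also that the traceback mentions two different site-packages locations, and that os.path.abspath does not expand "~". The sketch below lists the direct submodules of an installed top-level package without importing it, using only the standard library. The helper name list_submodules is hypothetical.

```python
import importlib.util
import pkgutil


def list_submodules(package_name):
    """List direct submodules/subpackages of a top-level package.

    The package itself is located but not imported, so this still works
    when `import package_name` fails. Returns None if the name cannot be
    found or is not a package.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.submodule_search_locations is None:
        return None
    return sorted(
        module.name
        for module in pkgutil.iter_modules(spec.submodule_search_locations)
    )


# Demonstrated on a standard-library package; replace "json" with "UGFraud"
# in the affected environment.
found = list_submodules("json")
```

If "Detector" is missing from the list returned for UGFraud, the installed distribution is incomplete, and installing from a fresh clone of the repository may be worth trying.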

m-zoom & d-cube

Good job! Is there a plan to add the M-Zoom and D-Cube algorithms?
