Coder Social home page Coder Social logo

axenderst / ldbc-snb-enhancer.js Goto Github PK

View Code? Open in Web Editor NEW

This project forked from sindhu-vasireddy/ldbc-snb-enhancer.js

0.0 0.0 0.0 529 KB

Generates auxiliary data based on an LDBC SNB dataset

License: MIT License

JavaScript 0.56% TypeScript 99.44%

ldbc-snb-enhancer.js's Introduction

LDBC SNB Enhancer

Build status Coverage Status npm version

Generates auxiliary data based on an LDBC SNB social network dataset.

For example, it can generate fake names for existing people such as:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person>
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348840411>.

All auxiliary data that is generated is annotated with the predicate http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator, which can refer to an existing person, that acts as a malicious actor.

Installation

$ npm install -g ldbc-snb-enhancer

or

$ yarn global add ldbc-snb-enhancer

Usage

Invoke from the command line

This tool can be used on the command line as ldbc-snb-enhancer, which takes as single parameter the path to a config file:

$ ldbc-snb-enhancer path/to/config.json

Config file

The config file that should be passed to the command line tool has the following JSON structure:

{
  "@context": "https://linkedsoftwaredependencies.org/bundles/npm/ldbc-snb-enhancer/^2.0.0/components/context.jsonld",
  "@id": "urn:ldbc-snb-enhancer:default",
  "@type": "Enhancer",
  "personsPath": "path/to/social_network_person_0_0.ttl",
  "activitiesPath": "path/to/social_network_activity_0_0.ttl",
  "staticPath": "path/to/social_network_static_0_0.ttl",
  "destinationPathData": "path/to/social_network_auxiliary.ttl",
  "logger": {
    "@type": "LoggerStdout"
  },
  "dataSelector": {
    "@type": "DataSelectorRandom",
    "seed": 12345
  },
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3
    }
  ]
}

The important parts in this config file are:

  • "personsPath": Path to the persons output file of LDBC SNB.
  • "destinationPath": Path of the destination file to create.
  • "logger": An optional logger for tracking the generation process. (LoggerStdout prints to standard output)
  • "dataSelector": A strategy for selecting values from a collection. (DataSelectorRandom selects random values based on a given seed)
  • "handlers": An array of enhancement handlers, which are strategies for generating data.
  • "parameterEmitterPosts"": An optional parameter emitter for the extracted posts.
  • "parameterEmitterComments"": An optional parameter emitter for the extracted comments.

Configure

Handlers

The following handlers can be configured.

Person Names Handler

Generate additional names for existing people. People are selected randomly from the friends that are known by the given person.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for a name to be generated. The number of new names will be the number of people times this chance, where names are randomly assigned to names.
  • "parameterEmitter"": An optional parameter emitter.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person> 
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348840411>.

Person Names in Cities Handler

Generate additional names for existing people where the malicious creator refers to a city. Cities will be selected based on the city the random person is located in.

This is a variant of the Person Names Handler.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNamesCities",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for a name to be generated. The number of new names will be the number of people times this chance, where names are randomly assigned to names.
  • "parameterEmitter"": An optional parameter emitter.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000021990232555617> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person>
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/firstName> "Zulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/lastName> "Tulma";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://dbpedia.org/resource/Dingzhou>.

Person Noise Handler

Generate additional triples attached to existing people. People are selected randomly.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNoise",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for an additional triple to be generated. The number of new triples will be the number of people times this chance. This value can be larger than 1 to generate multiple triples per person.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471-noise-1> 
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/noise> "NOISE-1";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000000000000000471>.

Posts Handler

Generate posts and assign them to existing people.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPosts",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for posts to be generated. The number of posts will be the number of people times this chance, where people are randomly assigned to posts.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post-fake2967> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Post>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "2967";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000004398046512167>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000010995116283441>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/creationDate> "2021-02-22T10:39:31.595Z";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> "200.200.200.200";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/browserUsed> "Firefox";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/length> "17";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/language> "en";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locatedIn> <http://dbpedia.org/resource/Belgium>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasTag> <http://www.ldbc.eu/ldbc_socialnet/1.0/tag/Georges_Bizet>.

Comments Handler

Generate comments and assign them to existing people as reply to existing posts

{
  "handlers": [
    {
      "@type": "EnhancementHandlerComments",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for comments to be generated. The number of comments will be the number of people times this chance, where people are randomly assigned to comments.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/comment-fake9> a <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "9";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000008796093024878>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000032985348839704>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/creationDate> "2021-02-22T10:39:31.595Z";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> "200.200.200.200";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/browserUsed> "Firefox";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/length> "17";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/replyOf> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000274877908873>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/language> "en";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locatedIn> <http://dbpedia.org/resource/Belgium>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasTag> <http://www.ldbc.eu/ldbc_socialnet/1.0/tag/Georges_Bizet>.

Post Contents Handler

Generate additional contents for existing posts.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPostContents",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for post content to be generated. The number of new post contents will be the number of posts times this chance, where contents are randomly assigned to posts. @range {double}

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000206158430485> <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "962072675046";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/content> "Tomatoes are blue";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000017592186048516>.

Post Authors Handler

Generate additional authors for existing posts.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPostAuthors",
      "chance": 0.3
    }
  ]
}

Parameters:

  • "chance": The chance for a post author to be generated. The number of new post authors will be the number of posts times this chance, where authors are randomly assigned to posts.

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/data/post00000000962072675046> <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> "962072675046";
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000006597069770017>;
    <http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasMaliciousCreator> <http://www.ldbc.eu/ldbc_socialnet/1.0/data/pers00000019791209301543>.

Vocabuary Handler

Generates vocabulary information.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerVocabulary"
    }
  ]
}

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/id> a rdf:Property.
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/hasCreator> a rdf:Property.
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Person> a rdfs:Class.

Vocabuary Predicate Domain Handler

Generates vocabulary information about the domain of a specific predicate.

{
  "handlers": [
     {
      "@type": "EnhancementHandlerVocabularyPredicateDomain",
      "classIRI": "http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment",
      "predicateIRI": "http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP"
    }
  ]
}

Generated shape:

<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/locationIP> rdfs:domain
<http://www.ldbc.eu/ldbc_socialnet/1.0/vocabulary/Comment>.

Parameter Emitters

Certain handlers allow their internal parameters to be emitted.

Such parameters may then for instance be valuable as query substitution parameters.

CSV Parameter Emitter

Emits parameters as CSV files.

{
  "handlers": [
    {
      "@type": "EnhancementHandlerPersonNames",
      "chance": 0.3,
      "parameterEmitter": {
        "@type": "ParameterEmitterCsv",
        "destinationPath": "parameters-person-names.csv"
      }
    }
  ]
}

License

This software is written by Ruben Taelman.

This code is released under the MIT license.

ldbc-snb-enhancer.js's People

Contributors

rubensworks avatar sindhu-vasireddy avatar renovate-bot avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.