
MapTracker Graph Database

MapTracker is a massive graph database: over a terabyte on disk, with 1.2B nodes, 2.0B edges, and 3.5B metadata assignments. It is used at BMS to "resolve X to Y"; that is, given an object of "type X", find, in a qualitative way, all "related" objects of "type Y". This is done using an aggressively normalized triple store and a large set of rules that dictate which kinds of edges are reasonable to traverse when going from X to Y.
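As a rough illustration of rule-guided resolution over a triple store, here is a minimal Python sketch. It is not MapTracker's code or schema: the namespaces, edge names, identifiers, and the `resolve` helper are all hypothetical, chosen only to show how a rule table can restrict which edges are followed when going from X to Y.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    namespace: str  # hypothetical namespace label, e.g. "Symbol" or "Gene"
    name: str


# Toy triples (subject, edge, object). All identifiers are made up.
TRIPLES = [
    (Node("Symbol", "ABC1"), "is_symbol_for", Node("Gene", "G001")),
    (Node("Symbol", "ABC1"), "is_old_symbol_for", Node("Gene", "G002")),
    (Node("Symbol", "ABC1"), "mentioned_with", Node("Gene", "G003")),
    (Node("Gene", "G001"), "has_transcript", Node("RNA", "R001")),
]

# Hypothetical rules: for each (source namespace, target namespace) pair,
# the edge types considered reasonable to traverse.
RULES = {
    ("Symbol", "Gene"): {"is_symbol_for", "is_old_symbol_for"},
    ("Gene", "RNA"): {"has_transcript"},
}


def build_index(triples):
    """Index outgoing (edge, object) pairs by subject node."""
    index = defaultdict(list)
    for subj, edge, obj in triples:
        index[subj].append((edge, obj))
    return index


def resolve(index, node, target_ns):
    """Return nodes in target_ns reachable from node via one allowed edge."""
    allowed = RULES.get((node.namespace, target_ns), set())
    return {
        obj
        for edge, obj in index.get(node, [])
        if edge in allowed and obj.namespace == target_ns
    }


INDEX = build_index(TRIPLES)
genes = resolve(INDEX, Node("Symbol", "ABC1"), "Gene")
print(sorted(n.name for n in genes))  # ['G001', 'G002']
```

In this toy data the `mentioned_with` edge is stored but never followed, because no rule lists it as a valid way to get from a symbol to a gene.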

MapTracker is generally not used "on its own", but rather as a component in other tools. Examples available here are:

  • Chem-Bio Hopper - "Hop" from biology to chemistry, or vice-versa, using published chemical activities
  • Hypergeometric Affy - Given a set of "interesting" (generally overexpressed) Affymetrix probesets, run Fisher's Exact Test to identify ontologies that appear overrepresented in the set.
  • Standardize Gene - Given a set of gene identifiers (e.g. symbols), attempt to determine what they "really are" (i.e., given messy gene symbols, convert them to rigorous gene accessions)
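The over-representation test mentioned for Hypergeometric Affy can be sketched with the standard library alone. This is a generic one-sided Fisher's Exact Test (the upper tail of the hypergeometric distribution), not the tool's actual implementation; the function name and parameters are assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """One-sided Fisher's exact p-value, P(X >= overlap).

    population: total number of probesets considered
    annotated:  probesets in the population carrying the ontology term
    selected:   size of the "interesting" set
    overlap:    selected probesets that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the hypergeometric probabilities for overlap, overlap + 1, ..., upper.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 5 of 20 probesets carry a term, and all 5 selected ones do.
print(enrichment_p_value(20, 5, 5, 5))
```

A small p-value suggests the term appears in the selected set more often than chance would predict; in practice a correction for testing many ontology terms would also be needed.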

The schema (tables) is relatively simple. What has made MapTracker particularly powerful is:

  • Careful normalization of loaded data
  • Segregation of nodes into namespaces, which ameliorates collisions, particularly with identifiers like gene symbols
  • Exhaustive logic defining valid connections from X to Y, for example RNA to probeset
  • Generic transitive logic that lets X-to-Y be automatically merged with Y-to-Q and Q-to-W in order to find X-to-W. Such "chains" mean that only fundamental connections need to be defined, yet the network can be (safely, rationally) explored far beyond a node's expected "neighbors"
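The chaining idea in the last point can be illustrated with a small Python sketch. It assumes a hypothetical rule table keyed by namespace pairs and uses a breadth-first search to find a chain of rules from X to W; MapTracker's real chaining logic, edge vocabulary, and identifiers are not shown in this README, so every name below is invented.

```python
from collections import defaultdict, deque

# Hypothetical "fundamental" rules: only direct neighbors are defined.
RULES = {
    ("Symbol", "Gene"): {"is_symbol_for"},
    ("Gene", "RNA"): {"has_transcript"},
    ("RNA", "Probeset"): {"is_probed_by"},
}

# Toy triples; nodes are (namespace, name) tuples with made-up names.
TRIPLES = [
    (("Symbol", "ABC1"), "is_symbol_for", ("Gene", "G001")),
    (("Gene", "G001"), "has_transcript", ("RNA", "R001")),
    (("Gene", "G001"), "has_transcript", ("RNA", "R002")),
    (("RNA", "R001"), "is_probed_by", ("Probeset", "P001")),
    (("RNA", "R002"), "is_probed_by", ("Probeset", "P002")),
    (("RNA", "R002"), "is_similar_to", ("Probeset", "P999")),  # no rule allows this
]


def find_chain(source_ns, target_ns):
    """Breadth-first search over namespaces for the shortest chain of rules."""
    queue = deque([[source_ns]])
    seen = {source_ns}
    while queue:
        path = queue.popleft()
        if path[-1] == target_ns:
            return path
        for src, dst in RULES:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None


def resolve_chain(start, target_ns):
    """Follow each hop of the chain, keeping only edges its rule allows."""
    chain = find_chain(start[0], target_ns)
    if chain is None:
        return set()
    index = defaultdict(list)
    for subj, edge, obj in TRIPLES:
        index[subj].append((edge, obj))
    frontier = {start}
    for src_ns, dst_ns in zip(chain, chain[1:]):
        allowed = RULES[(src_ns, dst_ns)]
        frontier = {
            obj
            for node in frontier
            for edge, obj in index[node]
            if edge in allowed and obj[0] == dst_ns
        }
    return frontier


print(find_chain("Symbol", "Probeset"))  # ['Symbol', 'Gene', 'RNA', 'Probeset']
print(sorted(resolve_chain(("Symbol", "ABC1"), "Probeset")))
```

No Symbol-to-Probeset rule exists in the table, yet the query succeeds by composing the three fundamental rules; the `is_similar_to` edge is ignored because no rule permits it.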

The image below is an auto-generated network, produced by exploreSelf.pl from a sample of 20,000 random edges in the database. It represents, at a high level, the common node-edge-node triples held by the database.

Network overview

All edges are part of a controlled vocabulary. Most (though not all) are directional. The edges in the above sample include:

Edge overview

