pathwaycommons / factoid

A project to capture biological pathway data from academic papers

Home Page: https://biofactoid.org

License: MIT License

Factoid

Biofactoid (biofactoid.org) is a web-based system that empowers authors to capture and share machine-readable summaries of molecular-level interactions described in their publications.

Biofactoid's codebase is licensed under MIT.

Getting the data

All contributed pathway data is freely available for download at https://biofactoid.org/api/document/zip. The archive contains files for each pathway, represented in:

  • JavaScript Object Notation (JSON). This is the native format for Biofactoid data and contains interaction data, metadata of the record itself, metadata of the corresponding article, and visualisation data (layout and colors as Cytoscape JSON; Franz et al. (2016) Bioinformatics, 32, 309–311).
  • Biological Pathway Exchange (BioPAX) (Demir et al. (2010) Nat. Biotechnol., 28, 935–942) for detailed semantic exchange.
  • Systems Biology Graphical Notation Markup Language (SBGNML), a format that supports biological process visualization (Le Novère et al. (2009) Nat. Biotechnol., 27, 735–741; van Iersel et al. (2012) Bioinformatics, 28, 2016–2021).

Our data is licensed under CC0.
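
As a rough illustration, the bulk archive could be fetched from Node.js (version 18 or later, which provides a global fetch) along these lines. Only the URL comes from the text above; the function name downloadBulkZip and the output file name are made up for this sketch.

```javascript
const fs = require('fs/promises');

// URL taken from the "Getting the data" text above.
const BULK_URL = 'https://biofactoid.org/api/document/zip';

// Hypothetical helper: download the zip archive and write it to destPath.
// fetchImpl is injectable so the function can be exercised without network access.
async function downloadBulkZip(destPath, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(BULK_URL);
  if (!res.ok) {
    throw new Error(`Download failed: HTTP ${res.status}`);
  }
  const bytes = Buffer.from(await res.arrayBuffer());
  await fs.writeFile(destPath, bytes);
  return bytes.length; // number of bytes written
}

// Example (performs a real network request):
// downloadBulkZip('biofactoid.zip').then(n => console.log(`${n} bytes saved`));
```

The archive can then be unpacked with any zip tool to get at the JSON, BioPAX, and SBGNML files.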

Required software

Required software for the graph database if Docker is not used

The following lines should be present in the neo4j.conf file in ~/neo4j-community-5.X.X/conf:

  • server.default_advertised_address=localhost
  • server.default_listen_address=0.0.0.0
  • server.bolt.enabled=true
  • server.bolt.tls_level=DISABLED
  • server.bolt.listen_address=:7687
  • server.bolt.advertised_address=:7687
  • server.http.enabled=true
  • server.http.listen_address=:7474
  • server.http.advertised_address=:7474
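
One way to confirm that a neo4j.conf contains these settings is a small check script. The sketch below is only an illustration: the helper names parseConf and findMismatches are hypothetical, and the parser handles only plain key=value lines and # comments.

```javascript
// Settings copied from the list above.
const REQUIRED_SETTINGS = {
  'server.default_advertised_address': 'localhost',
  'server.default_listen_address': '0.0.0.0',
  'server.bolt.enabled': 'true',
  'server.bolt.tls_level': 'DISABLED',
  'server.bolt.listen_address': ':7687',
  'server.bolt.advertised_address': ':7687',
  'server.http.enabled': 'true',
  'server.http.listen_address': ':7474',
  'server.http.advertised_address': ':7474'
};

// Hypothetical helper: parse "key=value" lines, skipping blanks and '#' comments.
function parseConf(text) {
  const settings = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq === -1) continue;
    settings[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return settings;
}

// Return the required keys that are absent or set to a different value.
function findMismatches(confText) {
  const actual = parseConf(confText);
  return Object.keys(REQUIRED_SETTINGS)
    .filter(key => actual[key] !== REQUIRED_SETTINGS[key]);
}
```

For example, findMismatches(require('fs').readFileSync(confPath, 'utf8')) returns an empty array when every listed line is present, where confPath is the path to your own neo4j.conf.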

Configuration

The following environment variables can be used to configure the server:

General:

  • NODE_ENV : the environment mode; either production or development (default)
  • PORT : the port on which the server runs (default 3000)
  • LOG_LEVEL : minimum log level; one of info (default), warn, error
  • BASE_URL : used for email linkouts (e.g. https://factoid.baderlab.org)
  • API_KEY : used to restrict new document creation (e.g. 8365E63B-9A20-4661-AED8-EDB1296B657F)
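
The defaults above could be applied roughly as follows. This is a sketch only, not the project's actual configuration module; the function name readGeneralConfig and the returned property names are hypothetical.

```javascript
// Hypothetical helper: read the general settings, applying the defaults listed above.
function readGeneralConfig(env = process.env) {
  const port = parseInt(env.PORT || '3000', 10);
  if (Number.isNaN(port)) {
    throw new Error(`PORT must be a number, got "${env.PORT}"`);
  }
  return {
    nodeEnv: env.NODE_ENV || 'development', // production or development
    port,                                   // default 3000
    logLevel: env.LOG_LEVEL || 'info',      // info, warn, or error
    baseUrl: env.BASE_URL,                  // no default stated above
    apiKey: env.API_KEY                     // no default stated above
  };
}
```

Passing the environment as an argument keeps the function easy to test; calling readGeneralConfig() with no argument reads process.env.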

CRON:

  • CRON_SCHEDULE : cron expression whose fields are, in order: second (optional), minute, hour, day of month, month, day of week
  • DOCUMENT_CRON_UPDATE_PERIOD : Milliseconds between successive Document cron update calls
  • DOCUMENT_CRON_STALE_PERIOD : Milliseconds since Document was last edited; criteria for trashing
  • GRAPHDB_CRON_REFRESH_PERIOD_MINUTES : Minimum time (minutes) between refreshes of graph DB data
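
To make the CRON_SCHEDULE field order concrete, the sketch below labels each field of an expression, treating the leading second field as optional. The helper name labelCronFields is hypothetical, and the code does not validate the field values themselves.

```javascript
// Field order as described for CRON_SCHEDULE above.
const FIELD_NAMES = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Hypothetical helper: map each whitespace-separated field to its name.
function labelCronFields(expression) {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5 && parts.length !== 6) {
    throw new Error(`Expected 5 or 6 fields, got ${parts.length}`);
  }
  // With five fields, the optional leading "second" field is absent.
  const names = parts.length === 6 ? FIELD_NAMES : FIELD_NAMES.slice(1);
  const labelled = {};
  names.forEach((name, i) => { labelled[name] = parts[i]; });
  return labelled;
}
```

For instance, the five-field expression 0 0 * * * has minute 0 and hour 0, with the remaining fields left as wildcards.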

Database:

  • DB_NAME : name of the db (default factoid)
  • DB_HOST : hostname or ip address of the database host (default localhost)
  • DB_PORT : port where the db can be accessed (default 28015, the rethinkdb default)
  • DB_USER : username if the db uses auth (undefined by default)
  • DB_PASS : password if the db uses auth (undefined by default)
  • DB_CERT : local file path to certificate (cert) file if the db uses ssl (undefined by default)
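
The database variables map naturally onto a connection options object. The sketch below assumes the option names accepted by the official RethinkDB JavaScript driver's r.connect (host, port, db, user, password, ssl); whether Factoid builds its connection this way is an assumption, and buildDbOptions is a hypothetical name.

```javascript
const fs = require('fs');

// Hypothetical helper: translate the DB_* variables into connection options.
function buildDbOptions(env = process.env) {
  const options = {
    host: env.DB_HOST || 'localhost',
    port: parseInt(env.DB_PORT || '28015', 10), // 28015 is the rethinkdb default
    db: env.DB_NAME || 'factoid'
  };
  // Auth and SSL settings are undefined by default, so add them only when set.
  if (env.DB_USER) options.user = env.DB_USER;
  if (env.DB_PASS) options.password = env.DB_PASS;
  if (env.DB_CERT) options.ssl = { ca: fs.readFileSync(env.DB_CERT) };
  return options;
}
```

With no variables set, this yields the documented defaults: host localhost, port 28015, and database factoid.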

Downloads:

  • BULK_DOWNLOADS_PATH : relative path to bulk downloads
  • BIOPAX_DOWNLOADS_PATH : relative path to biopax downloads
  • BIOPAX_IDMAP_DOWNLOADS_PATH : relative path to id-mapped biopax downloads
  • EXPORT_BULK_DELAY_HOURS : period to delay (batch) export tasks

Services:

  • DEFAULT_CACHE_SIZE : default max number of entries in each cache
  • REACH_URL : full url of the reach textmining endpoint
  • PC_URL : base url for pathway commons apps, to search or link
  • BIOPAX_CONVERTER_URL : url for the factoid to biopax/sbgn converter
  • GROUNDING_SEARCH_BASE_URL: url for the grounding service
  • NCBI_EUTILS_BASE_URL : url for the NCBI E-utilities
  • NCBI_EUTILS_API_KEY : API key for the NCBI E-utilities
  • INDRA_DB_BASE_URL : url for INDRA (Integrated Network and Dynamical Reasoning Assembler)
  • INDRA_ENGLISH_ASSEMBLER_URL : url for service that assembles INDRA statements into models
  • SEMANTIC_SEARCH_BASE_URL : url for semantic-search web service
  • ORCID_BASE_URL : url for ORCID website
  • ORCID_PUBLIC_API_BASE_URL : url for version of ORCID public API
  • NO_ABSTRACT_HANDLING : labels directing how to sort documents missing query text. 'text' (default): autogenerate text from templates; 'date': sort by date and ignore text.
  • CROSSREF_API_BASE_URL : url for Crossref Unified Resource API
  • NCBI_BASE_URL : url for the NCBI/NLM/NIH
  • PUBTATOR_API_PATH : url path for the PubTator3 web service API

Links:

  • UNIPROT_LINK_BASE_URL : base url concatenated to id to generate a linkout
  • CHEBI_LINK_BASE_URL: base url concatenated to id to generate a linkout
  • PUBCHEM_LINK_BASE_URL: base url concatenated to id to generate a linkout
  • NCBI_LINK_BASE_URL: base url concatenated to id to generate a linkout
  • PUBMED_LINK_BASE_URL: base url concatenated to unique id to generate linkout
  • DOI_LINK_BASE_URL: base url concatenated to doi to generate linkout
  • GOOGLE_SCHOLAR_BASE_URL : base url concatenated to doi, title, or pmid to generate linkout
  • IDENTIFIERS_ORG_ID_BASE_URL : base url concatenated to collection id_prefix:id (i.e. prefix:accession)
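
Each linkout is a plain concatenation of a base url and an identifier. A minimal sketch follows; the function names and the example values are illustrative and are not the project's defaults.

```javascript
// Hypothetical helper: append an id to a base url to form a linkout.
function makeLinkout(baseUrl, id) {
  return `${baseUrl}${id}`;
}

// For IDENTIFIERS_ORG_ID_BASE_URL, the id takes the form prefix:accession.
function makeIdentifiersOrgLinkout(baseUrl, prefix, accession) {
  return makeLinkout(baseUrl, `${prefix}:${accession}`);
}

// Example values (assumptions for illustration only).
const exampleLink = makeIdentifiersOrgLinkout('https://identifiers.org/', 'uniprot', 'P04637');
```

Because no separator is inserted, the base url must already end with whatever delimiter the target site expects (for example a trailing slash).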

Demo:

  • DEMO_ID : the demo document id (default demo)
  • DEMO_SECRET : the demo document secret (default demo)
  • DEMO_JOURNAL_NAME : the journal name for the demo doc
  • DEMO_AUTHOR : the author display name for the demo doc
  • DEMO_TITLE : the title of the demo doc's article
  • DEMO_CAN_BE_SHARED : whether the demo can be shared (default false)
  • DEMO_CAN_BE_SHARED_MULTIPLE_TIMES : whether the demo can be shared multiple times (normal docs can be shared only once; default false)
  • SAMPLE_DOC_ID : id for document that is used as homepage example (production)

Sharing:

  • DOCUMENT_IMAGE_CACHE_SIZE : number of images to cache in memory
  • DOCUMENT_IMAGE_PLL_LIMIT : max number of images to be generated in parallel (expensive)
  • DOCUMENT_IMAGE_WIDTH : tweet card image width
  • DOCUMENT_IMAGE_HEIGHT : tweet card image height
  • DOCUMENT_IMAGE_PADDING : padding around tweet card image (prevents twitter cropping issues)
  • TWITTER_ACCOUNT_NAME : twitter account visible on card
  • TWITTER_CONSUMER_KEY : twitter api key
  • TWITTER_CONSUMER_SECRET : twitter api secret
  • TWITTER_ACCESS_TOKEN_KEY : twitter app key
  • TWITTER_ACCESS_TOKEN_SECRET : twitter app secret
  • MAX_TWEET_LENGTH : max characters a user can type as a share caption

Email:

  • EMAIL_ENABLED: boolean to enable third-party mail service (default false)
  • EMAIL_FROM: name to send emails from (default Biofactoid)
  • EMAIL_FROM_ADDR: address to send emails from (default [email protected])
  • SMTP_PORT: mail transport port (default 587)
  • SMTP_HOST: mail transport host (default localhost)
  • SMTP_USER: mail transport auth user
  • SMTP_PASSWORD: mail transport auth password
  • EMAIL_VENDOR_MAILJET: name of Mailjet vendor
  • MAILJET_TMPLID_INVITE: vendor email template id for an invitation
  • MAILJET_TMPLID_FOLLOWUP: vendor email template id for a follow-up
  • MAILJET_TMPLID_REQUEST_ISSUE: vendor email template id for a request error notification
  • EMAIL_TYPE_INVITE: name to indicate invite email
  • EMAIL_TYPE_FOLLOWUP: name to indicate follow-up email
  • EMAIL_TYPE_REQUEST_ISSUE: name to indicate request error email
  • EMAIL_SUBJECT_INVITE: subject text for invitation email
  • EMAIL_SUBJECT_FOLLOWUP: subject text for follow-up email
  • EMAIL_SUBJECT_REQUEST_ISSUE: subject text for request error email

AppSignal:

  • APPSIGNAL_PUSH_API_KEY : AppSignal API key
  • APPSIGNAL_APP_NAME : name of this app (e.g. 'Biofactoid')
  • APPSIGNAL_APP_ENV : used to indicate which instance is running (e.g. 'master', 'production', 'unstable')

Graph Database:

  • GRAPHDB_CONN : The connection string
  • GRAPHDB_USER : Authentication username
  • GRAPHDB_PASS : Authentication password

The following environment variables should always be set in production instances:

  • NODE_ENV : set to production
  • BASE_URL : the production url
  • API_KEY : set to a uuid that you keep secret (used in management panel)
  • TWITTER_ACCOUNT_NAME : twitter account visible on card
  • TWITTER_API_KEY : twitter api key
  • TWITTER_API_KEY_SECRET : twitter api secret
  • TWITTER_ACCESS_TOKEN : twitter app key
  • TWITTER_ACCESS_TOKEN_SECRET : twitter app secret
  • NCBI_EUTILS_API_KEY: the API key for pathwaycommons account
  • EMAIL_ENABLED: true for Mailjet support
  • SMTP_HOST: Mailjet host name
  • SMTP_USER: Mailjet account credentials
  • SMTP_PASSWORD: Mailjet password credentials
  • APPSIGNAL_PUSH_API_KEY : AppSignal API key
  • APPSIGNAL_APP_ENV : used to indicate which instance is running (e.g. 'master', 'production', 'unstable')
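
A startup check can catch a production deployment that is missing one of these variables. The following is a sketch under the assumption that an unset or empty value counts as missing; the variable names are copied verbatim from the list above, and missingProductionVars is a hypothetical name.

```javascript
// Names copied verbatim from the production list above.
const REQUIRED_IN_PRODUCTION = [
  'NODE_ENV', 'BASE_URL', 'API_KEY',
  'TWITTER_ACCOUNT_NAME', 'TWITTER_API_KEY', 'TWITTER_API_KEY_SECRET',
  'TWITTER_ACCESS_TOKEN', 'TWITTER_ACCESS_TOKEN_SECRET',
  'NCBI_EUTILS_API_KEY',
  'EMAIL_ENABLED', 'SMTP_HOST', 'SMTP_USER', 'SMTP_PASSWORD',
  'APPSIGNAL_PUSH_API_KEY', 'APPSIGNAL_APP_ENV'
];

// Hypothetical helper: list the required variables that are unset or empty.
function missingProductionVars(env = process.env) {
  return REQUIRED_IN_PRODUCTION.filter(
    name => env[name] === undefined || env[name] === ''
  );
}
```

A deployment script could call missingProductionVars() and refuse to start when the returned array is not empty.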

Run targets

  • npm start : start the server
  • npm stop : stop the server
  • npm run build : build project
  • npm run build-prod : build the project for production
  • npm run bundle-profile : visualise the bundle dependencies
  • npm run clean : clean the project
  • npm run watch : watch mode (debug mode enabled, auto rebuild, livereload)
  • npm test : run tests
  • npm run lint : lint the project
  • npm run fix : fix minor linting errors (ones that can be automatically fixed)

Running via Docker

Images are maintained on Docker Hub. Also see factoid-docker-config.

Testing

All files in /test are run by Mocha. You can run npm test to run all tests, or you can run mocha -g specific-test-name (prerequisite: npm install -g mocha) to run specific tests.

The tests expect rethinkdb to be running on localhost on the default port (28015).

Chai is included to make the tests easier to read and write.

Notes:

  • Syncher.synch() is set up separately for each test file and namespaced. The tests need to be runnable independently, and previous Syncher.synch() calls from other files would otherwise conflict.
  • Each test file should require('./util/conf') to make debugging with promises easier etc.

Publishing a release

  1. Make sure the tests are passing: npm test
  2. Make sure the linting is passing: npm run lint
  3. Bump the version number with npm version, in accordance with semver. The version command in npm updates both package.json and git tags, but note that it uses a v prefix on the tags (e.g. v1.2.3).
       • For a bug fix / patch release, run npm version patch.
       • For a new feature release, run npm version minor.
       • For a breaking API change, run npm version major.
       • For a specific version number (e.g. 1.2.3), run npm version 1.2.3.
  4. Push the release: git push origin --tags
  5. Publish a GitHub release so that Zenodo creates a DOI for this version.
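
The version arithmetic behind the npm version commands above can be sketched as follows. This is an illustration of the semver rules for plain major.minor.patch versions only (no pre-release or build suffixes), not npm's own implementation; nextVersion and tagFor are hypothetical names.

```javascript
// Hypothetical helper: compute the next version for a patch, minor, or major release.
function nextVersion(current, release) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(current);
  if (!match) {
    throw new Error(`Not a plain major.minor.patch version: ${current}`);
  }
  const [major, minor, patch] = match.slice(1).map(Number);
  switch (release) {
    case 'major': return `${major + 1}.0.0`;            // breaking API change
    case 'minor': return `${major}.${minor + 1}.0`;     // new feature
    case 'patch': return `${major}.${minor}.${patch + 1}`; // bug fix
    default: throw new Error(`Unknown release type: ${release}`);
  }
}

// The git tag uses a v prefix, as noted above.
function tagFor(version) {
  return `v${version}`;
}
```

So a patch release on top of 1.2.3 gives version 1.2.4 with the tag v1.2.4.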

Related software

Factoid depends on services whose software we maintain.

Contributors

d2fong, dependabot[bot], fdurupinar, fileoy, holymiracle, igorrodchenkov, jvwong, lindajiawenli, maxkfranz, metincansiper, sacdallago

factoid's Issues

Refactor commandtip plugin

The commandtip plugin needs to be refactored to

  • upgrade qtip (a dependency) -- this will fix several bugs we have currently, and
  • make the plugin more flexible -- we'll be using the tooltips more and more and the current plugin only works for a few different view types.

Port server side code to node.js

This will probably only take about a day of work, but it will make it a lot easier to do the refactoring and UI changes that we want to do, including:

  • TODO list tickets

Handle TODOs in code

Some code (esp. textmining & assoc.) needs more work and is marked in the code with TODO. It was completed in time for ISMB, but should be made better.

Add SIF export webservice

We can add this export option to this sprint, because it's not as complex as something like BioPAX. That way, we can get all the refactoring and UI improvements we want in that sprint, too.

Find organism from textmining

An abstract when returned from textmining services could return the organism associated with it. Alternatively, a ranked or scored list of potential organisms could be returned.

Investigate textmining coming back with error

Here are a couple of problematic abstracts:


Differential Sensitivity of Glioma- versus Lung Cancer-Specific EGFR Mutations to EGFR Kinase Inhibitors.

Abstract
Activation of the epidermal growth factor receptor (EGFR) in glioblastoma (GBM) occurs through mutations or deletions in the extracellular (EC) domain. Unlike lung cancers with EGFR kinase domain (KD) mutations, GBMs respond poorly to the EGFR inhibitor erlotinib. Using RNAi, we show that GBM cells carrying EGFR EC mutations display EGFR addiction. In contrast to KD mutants found in lung cancer, glioma-specific EGFR EC mutants are poorly inhibited by EGFR inhibitors that target the active kinase conformation (e.g., erlotinib). Inhibitors that bind to the inactive EGFR conformation, however, potently inhibit EGFR EC mutants and induce cell death in EGFR-mutant GBM cells. Our results provide first evidence for single kinase addiction in GBM and suggest that the disappointing clinical activity of first-generation EGFR inhibitors in GBM versus lung cancer may be attributed to the different conformational requirements of mutant EGFR in these 2 cancer types.


Both genome-wide genetic and epigenetic alterations are fundamentally important for the development of cancers, but the interdependence of these aberrations is poorly understood. Glioblastomas and other cancers with the CpG island methylator phenotype (CIMP) constitute a subset of tumours with extensive epigenomic aberrations and a distinct biology. Glioma CIMP (G-CIMP) is a powerful determinant of tumour pathogenicity, but the molecular basis of G-CIMP remains unresolved. Here we show that mutation of a single gene, isocitrate dehydrogenase 1 (IDH1), establishes G-CIMP by remodelling the methylome. This remodelling results in reorganization of the methylome and transcriptome. Examination of the epigenome of a large set of intermediate-grade gliomas demonstrates a distinct G-CIMP phenotype that is highly dependent on the presence of IDH mutation. Introduction of mutant IDH1 into primary human astrocytes alters specific histone marks, induces extensive DNA hypermethylation, and reshapes the methylome in a fashion that mirrors the changes observed in G-CIMP-positive lower-grade gliomas. Furthermore, the epigenomic alterations resulting from mutant IDH1 activate key gene expression programs, characterize G-CIMP-positive proneural glioblastomas but not other glioblastomas, and are predictive of improved survival. Our findings demonstrate that IDH mutation is the molecular basis of CIMP in gliomas, provide a framework for understanding oncogenesis in these gliomas, and highlight the interplay between genomic and epigenomic changes in human cancers.

Find a good default abstract

We should find an abstract that the current textmining picks up well and gives us a nice looking graph. This will give a better first impression for the first release.

Tooltips can get in inconsistent state

Conditions that can cause inconsistent state:

  • deletion of entity in another instance
  • creation of UI not managed by Derby template (should use private models)

Refactor menubar plugin

The menubar plugin should be refactored such that it uses standard HTML5 elements. This will

  • simplify our code,
  • reduce page load KB,
  • allow for reusability in tooltips,
  • give us better HTML5 compatibility (important to support touch devices in future),
  • and so on.

Add info

Title: Factoid - Building the Future of Scientific Publishing...
Factoid helps authors to translate their written scientific text into formal descriptions of biological processes useful for sharing their results with others, bioinformatics analysis and integrating with other data to help build a more complete model of a cell.

Factoid aims to be part of the publication process. With the author's help, it will extract pathway and related information from a paper as it is submitted, submit the resulting diagram as a visual abstract for peer-review along with the paper and finally, publish the information in a sharable and computable format accompanying the paper for others to use.

Factoid 1.0 helps turn text into a simple and editable network model of a biological process. Text is automatically converted to a first draft of a network which can then be corrected using easy to use editing functions. Future versions will have more advanced text mining functionality to improve the 'first draft', will allow saving the results in standard formats for sharing and will make it easier to add text and edit the network.

New edge addition method might cause accidental edge additions

Let's say a user wants to create a new edge between two nodes that are far away from each other in a crowded (hairball-ish) network. Currently, when the user initiates a new edge addition (by clicking on the source node) and then mouses over other nodes, edges to these nodes (with a reaction in between when necessary) are created automatically. It is hard to avoid intermediate nodes, and hence accidental connections, along the way from the source node to the target node.

It might be better to start with one-edge-per-action behavior unless the user explicitly says otherwise.

Determine what other webservices we should use for entity info

We are currently using UniProt for entity search and association information, and we are using Miguel's webservices for textmining. We will need additional sources of entity association information to support search and association with non-protein entities.

Possibilities:

  • Pathway Commons

Organise undo/redo internally

Currently, we have an external manager for undo/redo, which means that we need to define each operation in terms of modifications that can be made to one or more parts of the page. This makes undo/redo complicated. For example, if we add or change the UI, we have to make major changes to the undo/redo code every time.

It also makes it difficult to support live Google-docs-like autosaving, synchronising, et cetera.

To improve this situation, we should ditch the current undo/redo code and use this:
https://github.com/cytoscape/cytoscapeweb/issues/99

UI like the sidebar then just becomes a client of the core model, sending events back and forth. The same holds true for server-side syncing and saving. This will also make the undo and redo code much easier to work with, meaning we'll be able to update the UI a lot easier.

Refactor event binding so it's mobile friendly

We're binding to 'click'. That's not good for mobile, since it's simulated (i.e. clicks are slow on mobile). We should use Zepto, ideally, since it creates a 'tap' event for us. That requires cytoscape.js to support Zepto, which requires some more refactoring.

Alternatively, we could abstract the events ourselves. That would be OK, but it wouldn't be nearly as good: The quick fix would be to use 'click' for desktop and 'touchstart' for mobile, but 'touchstart' isn't a tap. 'touchstart' is triggered when your finger first touches the screen.

Re-enable interaction tooltips

Now that the prerequisite bugs have been fixed and UI updates made, we can re-enable and update interaction tooltips.

Allow textmining query to be cancellable

Two things:

(1) If the textmining takes a long time, it would be nice to be able to cancel it in the case where the user forgot to do something to their input text before submitting.

(2) It would be nice if the textmining would run again on failure. However, failure is likely infrequent, so this is likely low priority.

P53 not recognized

When I use the sentence

"MDM2 inhibits P53"

as the abstract, Factoid recognizes only MDM2 and misses P53. But if I use

"P53 is inhibited by MDM2"

then it works as it should.

Replace Pathway Commons entity search

Write our own webservices that search for entities to replace the jquery.pathwaycommons plugin. Given that we'll have to do processing of our own if we use external webservices as a basis, we should definitely do this on the server side.

distorted cursor when hovering info popup headers

The cursor becomes distorted when hovering over the headers inside the node information popups. Including the mouse cursor in the screenshot was tricky, so here's an ugly 'screenshot' to show what it looks like (just next to the RAD51 header):

http://i.imgur.com/o6QGL.jpg

Chrome 18.0.1025.163, Safari / Mac OS X 10.6.8

Revise side UI

  • Allow entering text on the editor
  • Drag and drop for entities within interactions
  • Button, tooltip, and textarea for entering a large bit of text (e.g. many interactions)
