pathwaycommons / factoid

A project to capture biological pathway data from academic papers

License: MIT License

JavaScript 90.59% CSS 7.74% HTML 0.61% Shell 0.33% Dockerfile 0.19% EJS 0.54%

factoid's Introduction

Factoid

Biofactoid (biofactoid.org) is a web-based system that empowers authors to capture and share machine-readable summaries of molecular-level interactions described in their publications.

Biofactoid's codebase is licensed under MIT.

Getting the data

All contributed pathway data is freely available for download from https://biofactoid.org/api/document/zip. The archive contains files for each pathway in the following formats:

JavaScript Object Notation (JSON). This is the native format for Biofactoid data and contains interaction data, metadata of the record itself, metadata of the corresponding article, and visualisation data (layout and colors as Cytoscape JSON (Franz et al. (2016) Bioinforma. Oxf. Engl., 32, 309–311.)).
Biological Pathway Exchange (BioPAX) (Demir et al. (2010) Nat. Biotechnol., 28, 935–942.), for detailed semantic exchange.
Systems Biology Graphical Notation Markup Language (SBGN-ML), a format that supports visualization of biological processes (Le Novère et al. (2009) Nat. Biotechnol., 27, 735–741.; van Iersel et al. (2012) Bioinforma. Oxf. Engl., 28, 2016–2021.).

Our data is licensed under CC0.

Required software

Node.js >=10
RethinkDB ^2.3.0

Required software for the graph database (if Docker is not used)

Neo4j ^5.4.0
APOC ^5.4.0

The following lines should be present in the neo4j.conf file in ~/neo4j-community-5.X.X/conf:

server.default_advertised_address=localhost
server.default_listen_address=0.0.0.0
server.bolt.enabled=true
server.bolt.tls_level=DISABLED
server.bolt.listen_address=:7687
server.bolt.advertised_address=:7687
server.http.enabled=true
server.http.listen_address=:7474
server.http.advertised_address=:7474

Configuration

The following environment variables can be used to configure the server:

General:

NODE_ENV : the environment mode; either production or development (default)
PORT : the port on which the server runs (default 3000)
LOG_LEVEL : minimum log level; one of info (default), warn, error
BASE_URL : used for email linkouts (e.g. https://factoid.baderlab.org)
API_KEY : used to restrict new document creation (e.g. 8365E63B-9A20-4661-AED8-EDB1296B657F)

CRON:

CRON_SCHEDULE : cron expression with the fields second (optional), minute, hour, day of month, month, day of week
DOCUMENT_CRON_UPDATE_PERIOD : Milliseconds between successive Document cron update calls
DOCUMENT_CRON_STALE_PERIOD : Milliseconds since a Document was last edited after which it is considered stale; used as the criterion for trashing it
GRAPHDB_CRON_REFRESH_PERIOD_MINUTES : Minimum time (minutes) between refreshes of graph DB data
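To make the CRON variables concrete, here is a minimal sketch of how they could be interpreted. The helper names (parseCronSchedule, isStale, canRefreshGraphDb) are hypothetical and are not taken from factoid's code; the sketch only assumes the field order and units described above.

```javascript
// Hypothetical helpers illustrating the CRON variables; not factoid's code.

const CRON_FIELDS = ['minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Split a CRON_SCHEDULE string into named fields. Five fields means the
// optional seconds field was omitted; six means it was given first.
function parseCronSchedule(schedule) {
  const parts = schedule.trim().split(/\s+/);
  if (parts.length !== 5 && parts.length !== 6) {
    throw new Error(`Expected 5 or 6 cron fields, got ${parts.length}`);
  }
  const second = parts.length === 6 ? parts.shift() : null;
  const fields = { second };
  CRON_FIELDS.forEach((name, i) => {
    fields[name] = parts[i];
  });
  return fields;
}

// A Document counts as stale once more than DOCUMENT_CRON_STALE_PERIOD
// milliseconds have passed since its last edit.
function isStale(lastEditedMs, nowMs, stalePeriodMs) {
  return nowMs - lastEditedMs > stalePeriodMs;
}

// A graph DB refresh is allowed only if at least
// GRAPHDB_CRON_REFRESH_PERIOD_MINUTES have elapsed since the previous one.
function canRefreshGraphDb(lastRefreshMs, nowMs, periodMinutes) {
  return nowMs - lastRefreshMs >= periodMinutes * 60 * 1000;
}
```

For example, parseCronSchedule('0 3 * * *') describes a daily run at 03:00 with no seconds field, whereas '30 0 3 * * 1' adds a seconds field and restricts the run to day-of-week 1.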

Database:

DB_NAME : name of the db (default factoid)
DB_HOST : hostname or ip address of the database host (default localhost)
DB_PORT : port where the db can be accessed (default 28015, the rethinkdb default)
DB_USER : username if the db uses auth (undefined by default)
DB_PASS : password if the db uses auth (undefined by default)
DB_CERT : local file path to certificate (cert) file if the db uses ssl (undefined by default)
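The General and Database variables could be read with their documented defaults roughly as follows. This is a sketch under assumptions: readConfig and the shape of the returned object are hypothetical, and factoid's actual configuration module may be organised differently.

```javascript
// Hypothetical sketch: read the General and Database variables with the
// defaults documented above. Not factoid's actual config module.

const LOG_LEVELS = ['info', 'warn', 'error'];

function readConfig(env = process.env) {
  const logLevel = env.LOG_LEVEL || 'info';
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be one of ${LOG_LEVELS.join(', ')}`);
  }
  return {
    nodeEnv: env.NODE_ENV || 'development',
    port: parseInt(env.PORT || '3000', 10),
    logLevel,
    baseUrl: env.BASE_URL, // no documented default
    apiKey: env.API_KEY, // no documented default
    db: {
      name: env.DB_NAME || 'factoid',
      host: env.DB_HOST || 'localhost',
      port: parseInt(env.DB_PORT || '28015', 10), // RethinkDB's default port
      user: env.DB_USER, // undefined unless the db uses auth
      password: env.DB_PASS,
      certPath: env.DB_CERT // path to a cert file if the db uses SSL
    }
  };
}
```

Unset variables fall back to the documented defaults, while an unsupported LOG_LEVEL fails fast at startup instead of being silently ignored.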

Downloads:

BULK_DOWNLOADS_PATH : relative path to bulk downloads
BIOPAX_DOWNLOADS_PATH : relative path to biopax downloads
BIOPAX_IDMAP_DOWNLOADS_PATH : relative path to id-mapped biopax downloads
EXPORT_BULK_DELAY_HOURS : period to delay (batch) export tasks

Services:

DEFAULT_CACHE_SIZE : default max number of entries in each cache
REACH_URL : full url of the reach textmining endpoint
PC_URL : base url for pathway commons apps, to search or link
BIOPAX_CONVERTER_URL : url for the factoid to biopax/sbgn converter
GROUNDING_SEARCH_BASE_URL: url for the grounding service
NCBI_EUTILS_BASE_URL : url for the NCBI E-utilities
NCBI_EUTILS_API_KEY : API key for the NCBI E-utilities
INDRA_DB_BASE_URL : url for INDRA (Integrated Network and Dynamical Reasoning Assembler)
INDRA_ENGLISH_ASSEMBLER_URL : url for service that assembles INDRA statements into models
SEMANTIC_SEARCH_BASE_URL : url for semantic-search web service
ORCID_BASE_URL : url for ORCID website
ORCID_PUBLIC_API_BASE_URL : url (including the version) for the ORCID public API
NO_ABSTRACT_HANDLING : how to sort documents that are missing query text; either 'text' (default), which autogenerates text from templates, or 'date', which sorts by date and ignores text.
CROSSREF_API_BASE_URL : url for Crossref Unified Resource API
NCBI_BASE_URL : url for the NCBI/NLM/NIH
PUBTATOR_API_PATH : url path for the PubTator3 web service API

Links:

UNIPROT_LINK_BASE_URL : base url concatenated to id to generate a linkout
CHEBI_LINK_BASE_URL: base url concatenated to id to generate a linkout
PUBCHEM_LINK_BASE_URL: base url concatenated to id to generate a linkout
NCBI_LINK_BASE_URL: base url concatenated to id to generate a linkout
PUBMED_LINK_BASE_URL: base url concatenated to unique id to generate a linkout
DOI_LINK_BASE_URL: base url concatenated to doi to generate a linkout
GOOGLE_SCHOLAR_BASE_URL : base url concatenated to doi, title, or pmid to generate a linkout
IDENTIFIERS_ORG_ID_BASE_URL : base url concatenated to collection id_prefix:id (i.e. prefix:accession) to generate a linkout
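Each linkout is built by simple concatenation, as this sketch shows. The function names are hypothetical, and the example base URLs are assumptions: the UniProt one matches the URL cited in the issues below, and 'https://identifiers.org/' is used for compact prefix:accession identifiers.

```javascript
// Hypothetical helpers: build a linkout by concatenating a configured base
// URL with an identifier, as the Links variables describe.

function linkout(baseUrl, id) {
  return `${baseUrl}${String(id).trim()}`;
}

// identifiers.org-style compact identifiers join a collection prefix and an
// accession with a colon (prefix:accession).
function identifiersOrgLinkout(baseUrl, prefix, accession) {
  return linkout(baseUrl, `${prefix}:${accession}`);
}
```

For example, linkout('http://www.uniprot.org/uniprot/', 'P12004') yields http://www.uniprot.org/uniprot/P12004. Because the base URL must end in whatever separator the target site expects, the trailing slash belongs in the environment variable rather than in the code.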

Demo:

DEMO_ID : the demo document id (default demo)
DEMO_SECRET : the demo document secret (default demo)
DEMO_JOURNAL_NAME : the journal name for the demo doc
DEMO_AUTHOR : the author display name for the demo doc
DEMO_TITLE : the title of the demo doc's article
DEMO_CAN_BE_SHARED : whether the demo can be shared (default false)
DEMO_CAN_BE_SHARED_MULTIPLE_TIMES : whether the demo can be shared multiple times (normal docs can be shared only once; default false)
SAMPLE_DOC_ID : id for document that is used as homepage example (production)

Sharing:

DOCUMENT_IMAGE_CACHE_SIZE : number of images to cache in memory
DOCUMENT_IMAGE_PLL_LIMIT : max number of images to be generated in parallel (expensive)
DOCUMENT_IMAGE_WIDTH : tweet card image width
DOCUMENT_IMAGE_HEIGHT : tweet card image height
DOCUMENT_IMAGE_PADDING : padding around tweet card image (prevents twitter cropping issues)
TWITTER_ACCOUNT_NAME : twitter account visible on card
TWITTER_CONSUMER_KEY : twitter api key
TWITTER_CONSUMER_SECRET : twitter api secret
TWITTER_ACCESS_TOKEN_KEY : twitter app key
TWITTER_ACCESS_TOKEN_SECRET : twitter app secret
MAX_TWEET_LENGTH : max characters a user can type as a share caption

Email:

EMAIL_ENABLED: boolean to enable third-party mail service (default false)
EMAIL_FROM: name to send emails from (default Biofactoid)
EMAIL_FROM_ADDR: address to send emails from (default [email protected])
SMTP_PORT: mail transport port (default 587)
SMTP_HOST: mail transport host (default localhost)
SMTP_USER: mail transport auth user
SMTP_PASSWORD: mail transport auth password
EMAIL_VENDOR_MAILJET: name of Mailjet vendor
MAILJET_TMPLID_INVITE: vendor email template id for an invitation
MAILJET_TMPLID_FOLLOWUP: vendor email template id for a follow-up
MAILJET_TMPLID_REQUEST_ISSUE: vendor email template id for a request error notification
EMAIL_TYPE_INVITE: name to indicate invite email
EMAIL_TYPE_FOLLOWUP: name to indicate follow-up email
EMAIL_TYPE_REQUEST_ISSUE: name to indicate request error email
EMAIL_SUBJECT_INVITE: subject text for invitation email
EMAIL_SUBJECT_FOLLOWUP: subject text for follow-up email
EMAIL_SUBJECT_REQUEST_ISSUE: subject text for request error email
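The transport-related Email variables could be collected into SMTP options as in the following sketch. It assumes a nodemailer-style options object (host, port, auth.user, auth.pass); whether factoid uses nodemailer, and the names readEmailConfig and parseBool, are assumptions.

```javascript
// Hypothetical sketch: turn the Email variables into SMTP transport options.
// The transport shape follows nodemailer's SMTP options (an assumption).

// Only the string 'true' (any case) enables a flag; unset means fallback.
function parseBool(value, fallback = false) {
  if (value === undefined || value === '') return fallback;
  return String(value).toLowerCase() === 'true';
}

function readEmailConfig(env = process.env) {
  return {
    enabled: parseBool(env.EMAIL_ENABLED, false),
    fromName: env.EMAIL_FROM || 'Biofactoid',
    fromAddress: env.EMAIL_FROM_ADDR, // documented default omitted in this sketch
    transport: {
      host: env.SMTP_HOST || 'localhost',
      port: parseInt(env.SMTP_PORT || '587', 10),
      auth: { user: env.SMTP_USER, pass: env.SMTP_PASSWORD }
    }
  };
}
```

With EMAIL_ENABLED left unset, enabled is false, so a development instance can leave the SMTP settings empty without sending any mail.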

AppSignal:

APPSIGNAL_PUSH_API_KEY : AppSignal API key
APPSIGNAL_APP_NAME : name of this app (e.g. 'Biofactoid')
APPSIGNAL_APP_ENV : used to indicate which instance is running (e.g. 'master', 'production', 'unstable')

Graph Database:

GRAPHDB_CONN : The connection string
GRAPHDB_USER : Authentication username
GRAPHDB_PASS : Authentication password

The following environment variables should always be set in production instances:

NODE_ENV : set to production
BASE_URL : the production url
API_KEY : set to a uuid that you keep secret (used in management panel)
TWITTER_ACCOUNT_NAME : twitter account visible on card
TWITTER_API_KEY : twitter api key
TWITTER_API_KEY_SECRET : twitter api secret
TWITTER_ACCESS_TOKEN : twitter app key
TWITTER_ACCESS_TOKEN_SECRET : twitter app secret
NCBI_EUTILS_API_KEY: the API key for the pathwaycommons account
EMAIL_ENABLED: true for Mailjet support
SMTP_HOST: Mailjet host name
SMTP_USER: Mailjet account credentials
SMTP_PASSWORD: Mailjet password credentials
APPSIGNAL_PUSH_API_KEY : AppSignal API key
APPSIGNAL_APP_ENV : used to indicate which instance is running (e.g. 'master', 'production', 'unstable')
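A pre-start check can catch a production instance that is missing any of the variables above. This is a hypothetical sketch: missingProductionVars is not part of factoid, and the list simply copies the variable names given in this section.

```javascript
// Hypothetical pre-start check: report which production variables are
// unset or empty. NODE_ENV is reported unless it equals 'production'.

const REQUIRED_IN_PRODUCTION = [
  'NODE_ENV', 'BASE_URL', 'API_KEY',
  'TWITTER_ACCOUNT_NAME', 'TWITTER_API_KEY', 'TWITTER_API_KEY_SECRET',
  'TWITTER_ACCESS_TOKEN', 'TWITTER_ACCESS_TOKEN_SECRET',
  'NCBI_EUTILS_API_KEY', 'EMAIL_ENABLED', 'SMTP_HOST', 'SMTP_USER',
  'SMTP_PASSWORD', 'APPSIGNAL_PUSH_API_KEY', 'APPSIGNAL_APP_ENV'
];

function missingProductionVars(env = process.env) {
  return REQUIRED_IN_PRODUCTION.filter((name) =>
    name === 'NODE_ENV' ? env.NODE_ENV !== 'production' : !env[name]);
}
```

A deployment script could print the returned names and exit with a non-zero status when the list is non-empty.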

Run targets

npm start : start the server
npm stop : stop the server
npm run build : build project
npm run build-prod : build the project for production
npm run bundle-profile : visualise the bundle dependencies
npm run clean : clean the project
npm run watch : watch mode (debug mode enabled, auto rebuild, livereload)
npm test : run tests
npm run lint : lint the project
npm run fix : fix minor linting errors (ones that can be automatically fixed)

Running via Docker

Images are maintained on Docker Hub. Also see factoid-docker-config.

Testing

All files in /test are run by Mocha. Run npm test to run all tests, or run mocha -g specific-test-name (prerequisite: npm install -g mocha) to run specific tests.

The tests expect rethinkdb to be running on localhost on the default port (28015).

Chai is included to make the tests easier to read and write.

Notes:

Syncher.synch() is set up separately for each test file and namespaced. This is because the tests need to be runnable independently, and Syncher.synch() calls from other files would otherwise conflict.
Each test file should require('./util/conf') to make debugging with promises easier, among other things.

Publishing a release

Make sure the tests are passing: npm test
Make sure the linting is passing: npm run lint
Bump the version number with npm version, in accordance with semver. The version command in npm updates both package.json and git tags, but note that it uses a v prefix on the tags (e.g. v1.2.3).
For a bug fix / patch release, run npm version patch.
For a new feature release, run npm version minor.
For a breaking API change, run npm version major.
For a specific version number (e.g. 1.2.3), run npm version 1.2.3.
Push the release: git push origin --tags
Publish a GitHub release so that Zenodo creates a DOI for this version.
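The semver rules that the npm version step follows can be illustrated with a small sketch. nextVersion and gitTagFor are hypothetical names; the sketch handles only plain X.Y.Z versions (no prerelease tags) and is not npm's implementation.

```javascript
// Hypothetical sketch of the bumps `npm version patch|minor|major` applies
// to a plain X.Y.Z version (prerelease versions are not handled here).

const PLAIN_VERSION = /^(\d+)\.(\d+)\.(\d+)$/;

function nextVersion(current, bump) {
  const match = PLAIN_VERSION.exec(current);
  if (!match) throw new Error(`Not a plain X.Y.Z version: ${current}`);
  const [major, minor, patch] = match.slice(1).map(Number);
  switch (bump) {
    case 'major': return `${major + 1}.0.0`; // breaking API change
    case 'minor': return `${major}.${minor + 1}.0`; // new feature
    case 'patch': return `${major}.${minor}.${patch + 1}`; // bug fix
    default:
      // An explicit version such as '1.2.3' is used as-is.
      if (PLAIN_VERSION.test(bump)) return bump;
      throw new Error(`Unknown bump: ${bump}`);
  }
}

// npm tags the release with a "v" prefix.
const gitTagFor = (version) => `v${version}`;
```

Starting from 1.2.3, a patch release becomes 1.2.4 (tag v1.2.4), a minor release 1.3.0, and a major release 2.0.0.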

Related software

Factoid depends on services whose software we maintain.

GitHub
- grounding-search: Disambiguate bio-entities via full-text search
- semantic-search: Rank texts based on similarity
- factoid-converters: Convert Factoid model JSON to standard languages (BioPAX and SBGN-PD)
DockerHub
- factoid
- grounding-search
- semantic-search
- factoid-converters
- rethinkdb-docker: RethinkDB-based image with dependencies for database administration (i.e. dump and restore).

factoid's People

Contributors

fdossi fdurupinar d2fong metincansiper wxli0 jvwong maxkfranz

factoid's Issues

Refactor commandtip plugin

The commandtip plugin needs to be refactored to

upgrade qtip (a dependency), which will fix several bugs we currently have, and
make the plugin more flexible: we'll be using tooltips more and more, and the current plugin only works for a few different view types.

Port server side code to node.js

This will probably only take about a day of work, but it will make it a lot easier to do the refactoring and UI changes that we want to do, including:

TODO list tickets

Can have double edge with interaction

Determine what fields should be shown for associated entities

For example, for proteins, what's useful in UniProt (e.g. http://www.uniprot.org/uniprot/P12004)?

Handle TODOs in code

Some code (especially textmining and association) needs more work and is marked in the code with TODO comments. It was completed in time for ISMB but should be improved.

Add SIF export webservice

We can add this export option to this sprint, because it's not as complex as something like BioPAX. That way, we can get all the refactoring and UI improvements we want in that sprint, too.

Newly added edges aren't associated with any interaction

Make sure data.interaction points to the right ID

Improve info tooltip for associated entities

The design is already mocked up in Evernote

Sync selection state in graph with side panel

Find organism from textmining

The textmining service could return the organism associated with an abstract. Alternatively, it could return a ranked or scored list of candidate organisms.
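One simple way to produce such a scored list is to rank organisms by their share of mentions. This sketch is illustrative only: it assumes the textmining output can be reduced to one organism identifier per mention (e.g. NCBI Taxonomy ids), and rankOrganisms is a hypothetical name.

```javascript
// Hypothetical sketch: score candidate organisms by their share of the
// mentions found in an abstract, highest score first.

function rankOrganisms(mentions) {
  const counts = new Map();
  for (const organism of mentions) {
    counts.set(organism, (counts.get(organism) || 0) + 1);
  }
  const total = mentions.length;
  return [...counts.entries()]
    .map(([organism, count]) => ({ organism, score: count / total }))
    .sort((a, b) => b.score - a.score);
}
```

Given mentions ['9606', '10090', '9606'], human (9606) ranks first with a score of 2/3. A frequency score is only a baseline, since an abstract can mention a model organism once while studying it throughout.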

Make textmining socket functions dynamically defined

As currently written, they break easily when the code around them changes.

Investigate textmining coming back with error

Here are a couple of problematic abstracts:

Differential Sensitivity of Glioma- versus Lung Cancer-Specific EGFR Mutations to EGFR Kinase Inhibitors.

Abstract
Activation of the epidermal growth factor receptor (EGFR) in glioblastoma (GBM) occurs through mutations or deletions in the extracellular (EC) domain. Unlike lung cancers with EGFR kinase domain (KD) mutations, GBMs respond poorly to the EGFR inhibitor erlotinib. Using RNAi, we show that GBM cells carrying EGFR EC mutations display EGFR addiction. In contrast to KD mutants found in lung cancer, glioma-specific EGFR EC mutants are poorly inhibited by EGFR inhibitors that target the active kinase conformation (e.g., erlotinib). Inhibitors that bind to the inactive EGFR conformation, however, potently inhibit EGFR EC mutants and induce cell death in EGFR-mutant GBM cells. Our results provide first evidence for single kinase addiction in GBM and suggest that the disappointing clinical activity of first-generation EGFR inhibitors in GBM versus lung cancer may be attributed to the different conformational requirements of mutant EGFR in these 2 cancer types.

Both genome-wide genetic and epigenetic alterations are fundamentally important for the development of cancers, but the interdependence of these aberrations is poorly understood. Glioblastomas and other cancers with the CpG island methylator phenotype (CIMP) constitute a subset of tumours with extensive epigenomic aberrations and a distinct biology. Glioma CIMP (G-CIMP) is a powerful determinant of tumour pathogenicity, but the molecular basis of G-CIMP remains unresolved. Here we show that mutation of a single gene, isocitrate dehydrogenase 1 (IDH1), establishes G-CIMP by remodelling the methylome. This remodelling results in reorganization of the methylome and transcriptome. Examination of the epigenome of a large set of intermediate-grade gliomas demonstrates a distinct G-CIMP phenotype that is highly dependent on the presence of IDH mutation. Introduction of mutant IDH1 into primary human astrocytes alters specific histone marks, induces extensive DNA hypermethylation, and reshapes the methylome in a fashion that mirrors the changes observed in G-CIMP-positive lower-grade gliomas. Furthermore, the epigenomic alterations resulting from mutant IDH1 activate key gene expression programs, characterize G-CIMP-positive proneural glioblastomas but not other glioblastomas, and are predictive of improved survival. Our findings demonstrate that IDH mutation is the molecular basis of CIMP in gliomas, provide a framework for understanding oncogenesis in these gliomas, and highlight the interplay between genomic and epigenomic changes in human cancers.

Reintegrate edgehandles plugin (mouseover edge adding)

Find a good default abstract

We should find an abstract that the current textmining picks up well and gives us a nice looking graph. This will give a better first impression for the first release.

Refactoring shortcut key code to use jquery.hotkeys

Using keycodes is not really human-readable without lots of comments.

https://github.com/jeresig/jquery.hotkeys
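The readability gain comes from naming shortcuts as strings like 'ctrl+z' instead of numeric keycodes. The following plain-JavaScript sketch shows the idea; it does not use jquery.hotkeys, and comboFromEvent, dispatchShortcut and the modifier ordering are hypothetical choices.

```javascript
// Hypothetical sketch of readable shortcut bindings (the idea behind
// jquery.hotkeys, not its implementation): normalise a keyboard event into
// a string such as 'ctrl+z' and look it up in a table keyed by name.

const MODIFIERS = ['alt', 'ctrl', 'meta', 'shift'];

function comboFromEvent(event) {
  // KeyboardEvent exposes altKey, ctrlKey, metaKey and shiftKey booleans.
  const mods = MODIFIERS.filter((m) => event[`${m}Key`]);
  return [...mods, event.key.toLowerCase()].join('+');
}

// Call the matching handler, if any; returns whether one was found.
function dispatchShortcut(bindings, event) {
  const handler = bindings[comboFromEvent(event)];
  if (handler) handler(event);
  return Boolean(handler);
}
```

In the browser this could be wired up with document.addEventListener('keydown', (e) => dispatchShortcut(bindings, e)), where bindings is an object such as { 'ctrl+z': undo, 'ctrl+shift+z': redo }.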

Tooltips can get in inconsistent state

Conditions that can cause inconsistent state:

deletion of entity in another instance
creation of UI not managed by Derby template (should use private models)

Adjust scroll after popover show

Tooltips should open in graph as well as sidebar

No entities found in textmining

This causes no response from the server.

Integrate edge editing with undo/redo

Refactor menubar plugin

The menubar plugin should be refactored such that it uses standard HTML5 elements. This will

simplify our code,
reduce page load KB,
allow for reusability in tooltips,
give us better HTML5 compatibility (important to support touch devices in future),
and so on.

Add info

Title: Factoid - Building the Future of Scientific Publishing...
Factoid helps authors to translate their written scientific text into formal descriptions of biological processes useful for sharing their results with others, bioinformatics analysis and integrating with other data to help build a more complete model of a cell.

Factoid aims to be part of the publication process. With the author's help, it will extract pathway and related information from a paper as it is submitted, submit the resulting diagram as a visual abstract for peer-review along with the paper and finally, publish the information in a sharable and computable format accompanying the paper for others to use.

Factoid 1.0 helps turn text into a simple and editable network model of a biological process. Text is automatically converted to a first draft of a network, which can then be corrected using easy-to-use editing functions. Future versions will have more advanced text mining functionality to improve the 'first draft', will allow saving the results in standard formats for sharing, and will make it easier to add text and edit the network.

Reactive fns may impact performance

Add keyboard shortcuts for common operations

New edge addition method might cause accidental edge additions

Suppose a user wants to create a new edge between two nodes that are far apart in a crowded (hairball-ish) network. Currently, when the user starts adding an edge (by clicking the source node) and then mouses over other nodes, edges to those nodes (with an intervening reaction where necessary) are created automatically. Because it is hard to avoid passing over intermediate nodes, accidental connections are made along the way from the source node to the target node.

It might be better to start with one-edge-per-action behavior unless the user explicitly says otherwise.
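The proposed default can be modelled as a small state machine in which hovering creates nothing and each drop creates at most one edge. This is a sketch under assumptions: EdgeDrawer and the chain option (which could, for example, be tied to a held modifier key) are hypothetical.

```javascript
// Hypothetical sketch of one-edge-per-action: only an explicit drop on a
// target creates an edge, and hovering over intermediate nodes adds nothing.

class EdgeDrawer {
  constructor() {
    this.source = null;
    this.edges = [];
  }

  start(sourceId) {
    this.source = sourceId;
  }

  // Returns the new edge, or null if no edge was added.
  drop(targetId, { chain = false } = {}) {
    if (this.source === null || targetId === this.source) return null;
    const edge = { source: this.source, target: targetId };
    this.edges.push(edge);
    // By default the action ends here; with chaining the user explicitly
    // continues drawing from the target.
    this.source = chain ? targetId : null;
    return edge;
  }
}
```

After start('a') and drop('b'), a second drop('c') adds nothing because the action has ended; only drop('b', { chain: true }) keeps drawing from b.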

Determine what other webservices we should use for entity info

We are currently using UniProt for entity search and association information, and we are using Miguel's webservices for textmining. We will need additional sources of entity association information to support search and association with non-protein entities.

Possibilities:

Pathway Commons

Organise undo/redo internally

Currently, we have an external manager for undo/redo, which means that we need to define each operation in terms of modifications that can be made to one or more parts of the page. This makes undo/redo complicated. For example, if we add or change the UI, we have to make major changes to the undo/redo code every time.

It also makes it difficult to support live Google-docs-like autosaving, synchronising, et cetera.

To improve this situation, we should ditch the current undo/redo code and use this:
https://github.com/cytoscape/cytoscapeweb/issues/99

UI like the sidebar then just becomes a client of the core model, sending events back and forth. The same holds true for server-side syncing and saving. This will also make the undo and redo code much easier to work with, so we'll be able to update the UI much more easily.
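The following sketch shows what keeping undo/redo inside the core model could look like: every change is recorded with enough information to reverse it, and UI parts only listen for change events. It uses a generic command-history pattern, not the design from the linked cytoscapeweb issue; Model and its methods are hypothetical.

```javascript
// Hypothetical sketch: undo/redo lives in the core model; UI components
// subscribe to change events instead of defining their own undo steps.

class Model {
  constructor() {
    this.state = {};
    this.listeners = [];
    this.done = [];
    this.undone = [];
  }

  onChange(listener) {
    this.listeners.push(listener);
  }

  emit(kind) {
    this.listeners.forEach((fn) => fn(kind, this.state));
  }

  // Set a key and record how to reverse the change.
  set(key, value) {
    const had = Object.prototype.hasOwnProperty.call(this.state, key);
    const cmd = { key, value, had, previous: this.state[key] };
    this.apply(cmd);
    this.done.push(cmd);
    this.undone = []; // a new change invalidates the redo history
    this.emit('set');
  }

  apply(cmd) {
    this.state[cmd.key] = cmd.value;
  }

  revert(cmd) {
    if (cmd.had) this.state[cmd.key] = cmd.previous;
    else delete this.state[cmd.key];
  }

  undo() {
    const cmd = this.done.pop();
    if (!cmd) return false;
    this.revert(cmd);
    this.undone.push(cmd);
    this.emit('undo');
    return true;
  }

  redo() {
    const cmd = this.undone.pop();
    if (!cmd) return false;
    this.apply(cmd);
    this.done.push(cmd);
    this.emit('redo');
    return true;
  }
}
```

Because the sidebar and the graph would both re-render from change events, adding a new UI component requires no changes to the undo/redo code.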

Refactor side panel to use popovers

New nodes can't be selected

Make textmining & searching APIs with ql.io

There's no question that we should be using this:
http://ql.io/

Determine whether interactions should have editable names

Refactor event binding so it's mobile friendly

We're binding to 'click'. That's not good for mobile, since it's simulated (i.e. clicks are slow on mobile). We should use Zepto, ideally, since it creates a 'tap' event for us. That requires cytoscape.js to support Zepto, which requires some more refactoring.

Alternatively, we could abstract the events ourselves. That would be OK, but it wouldn't be nearly as good: The quick fix would be to use 'click' for desktop and 'touchstart' for mobile, but 'touchstart' isn't a tap. 'touchstart' is triggered when your finger first touches the screen.
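If the events are abstracted ourselves, a tap can be recognised from a touchstart/touchend pair rather than from touchstart alone: the finger must lift quickly and without moving far. In this sketch, isTap is hypothetical and the 10 px and 300 ms thresholds are assumed values, not measured ones.

```javascript
// Hypothetical tap detector: a tap is a touch that ends quickly and barely
// moves between touchstart and touchend. Thresholds are assumptions.

function isTap(start, end, { maxDistance = 10, maxDuration = 300 } = {}) {
  const moved = Math.hypot(end.x - start.x, end.y - start.y);
  const duration = end.time - start.time;
  return moved <= maxDistance && duration <= maxDuration;
}
```

The start and end points could be taken from e.changedTouches[0].clientX, e.changedTouches[0].clientY and e.timeStamp in the touchstart and touchend handlers respectively.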

Re-enable interaction tooltips

Now that the prerequisite bugs have been fixed and UI updates made, we can re-enable and update interaction tooltips.

Add more merging options to textselect plugin

Covers merging cases in textmining UI

Allow textmining query to be cancellable

Two things:

(1) If the textmining takes a long time, it would be nice to be able to cancel it in the case where the user forgot to do something to his input text before submitting.

(2) It would be nice if textmining were retried automatically on failure. However, failures are probably infrequent, so this is low priority.

Show some kind of message if nothing was found by textmining

Load more searches by scrolling

Instead of showing lots of search results all at once, show more when you reach the bottom.
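Loading on scroll needs two pieces: detecting that the list is near its bottom and taking the next slice of results. In this sketch, nearBottom and nextPage are hypothetical names, and the 50 px threshold and page size of 10 are assumptions.

```javascript
// Hypothetical sketch of "load more on scroll": detect when a scrollable
// element is near its bottom and take the next page of results.

function nearBottom({ scrollTop, clientHeight, scrollHeight }, threshold = 50) {
  return scrollTop + clientHeight >= scrollHeight - threshold;
}

function nextPage(results, shownCount, pageSize = 10) {
  return results.slice(shownCount, shownCount + pageSize);
}
```

A scroll listener on the results element can pass the element itself to nearBottom, since scrollTop, clientHeight and scrollHeight are standard DOM properties, and append nextPage(results, shownCount) when it returns true.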

Reintegrate help tooltips into UI

P53 not recognized

When I use the sentence

"MDM2 inhibits P53"

as the abstract, Factoid can only recognize MDM2, and miss P53. But if I use

"P53 is inhibited by MDM2"

then it works as it should.

Allow entering text on the editor
Drag and drop for entities within interactions
Button, tooltip, and textarea for entering a large bit of text (e.g. many interactions)