
reactome2gpml-converter's Introduction


Reactome2GPML Converter

This converter converts Reactome pathways to GPML 2013a format.

The class ReactometoGPML2013 performs the actual conversion, and the class CLIConverter provides command-line access to the converter.

Eclipse is used as the Java IDE, and the project is built with Ant.

After setting up the project in Eclipse or as a local .jar file, the converter can be tested against a local Reactome MySQL database, which can be downloaded from https://reactome.org/download-data.

The conversion is performed by the convertPathway method in class org.reactome.sgml.ReactomeToGPML2013Converter. You will have to provide your own database connection information to the MySQLAdaptor class.

For more information, read the publication on this project: "Reactome from a WikiPathways Perspective" by Anwesha Bohler, Guanming Wu, Martina Kutmon, Leontius Adhika Pradhana, Susan L. Coort, Kristina Hanspers, Robin Haw, Alexander R. Pico, Chris T. Evelo. https://doi.org/10.1371/journal.pcbi.1004941

Recent conversions for WikiPathways and news can be found at: https://classic.wikipathways.org/index.php/Portal:Reactome

reactome2gpml-converter's People

Contributors

denisesl22, egonw, jonathanmelius, mkutmon, pennatula, ryamiller


reactome2gpml-converter's Issues

Fix log file

  • The lists of "Rendered" and "Not Rendered" objects printed after every PW should be removed from the command-line output.
  • Add a summary of the log at the end.
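
One possible shape for such a summary is sketched below, assuming the converter can report per-pathway counts of rendered and not-rendered objects. The class name, method names and the counts in `main` are hypothetical and do not come from the converter's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; none of these names exist in the converter itself.
class ConversionLogSummary {
    private int pathways;
    private int rendered;
    private int notRendered;
    private final List<String> incomplete = new ArrayList<>();

    // Call once per converted pathway instead of printing the full object lists.
    void record(String pathwayId, int renderedCount, int notRenderedCount) {
        pathways++;
        rendered += renderedCount;
        notRendered += notRenderedCount;
        if (notRenderedCount > 0) {
            incomplete.add(pathwayId);
        }
    }

    // Builds the text to print once, after the last pathway.
    String summary() {
        return "Pathways converted: " + pathways
            + "\nObjects rendered: " + rendered
            + "\nObjects not rendered: " + notRendered
            + "\nPathways with unrendered objects: " + incomplete;
    }

    public static void main(String[] args) {
        ConversionLogSummary log = new ConversionLogSummary();
        // Made-up counts, for illustration only.
        log.record("166520", 40, 0);
        log.record("8956321", 12, 3);
        System.out.println(log.summary());
    }
}
```

Only the pathway IDs with unrendered objects are kept, so the per-object detail would still need to go to the log file if it is wanted.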

Hardcode some types in the converter?

Anwesha,

there are a number of ChEBI identifiers used in Reactome pathways which do not get annotated as @type="Metabolite" in the GPML.

What do you think of making a short list of known metabolites, so that when those IDs are encountered they are marked as metabolites in the GPML?

That would solve a number of these errors (which are known upstream and haven't been fixed yet, it seems):

https://jenkins.bigcat.unimaas.nl/job/WikiPathways%20Curation%20(Reactome)/lastCompletedBuild/testReport/nl.unimaas.bigcat.wikipathways.curator/Metabolites/ChEBIIDsNotMarkedAsMetabolite/
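
The proposal could look roughly like the sketch below. It assumes the converter knows the Xref database and ID at the moment it assigns the DataNode type. The class and method names are hypothetical, and the two list entries are placeholders taken from examples elsewhere in these issues, not a curated list.

```java
import java.util.Set;

// Hypothetical helper; not part of the converter's actual code.
class MetaboliteTypeHint {
    // Placeholder entries; a real list would be curated from the failing test report.
    private static final Set<String> KNOWN_METABOLITES = Set.of(
        "CHEBI:16110",
        "CHEBI:17815"
    );

    // Returns "Metabolite" for listed ChEBI IDs, otherwise keeps the type already chosen.
    static String resolveType(String database, String id, String currentType) {
        if ("ChEBI".equals(database) && id != null && KNOWN_METABOLITES.contains(id.trim())) {
            return "Metabolite";
        }
        return currentType;
    }

    public static void main(String[] args) {
        System.out.println(resolveType("ChEBI", "CHEBI:17815", "Complex"));
    }
}
```

A hardcoded list only covers the IDs someone has added to it, so it would be a stopgap until the annotation is fixed upstream.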

Missing child PW for 166520

Reactome PW 166520 (Signaling by NTRKs) was in version 63, but became a meta PW for version 64.
This PW has 3 child PWs: one was already part of the pathway collection, one is a new pathway, and the third (9034015) is not a GPML (nor are the children of this PW). Check whether this third pathway becomes part of version 65/66 and, if not, where it is coming from.

Fix GroupRefs

We noticed one more thing about the new Reactome pathways: the GroupRefs are no longer alphanumeric, but instead look like this:

GroupRef="1416470.4872874213801427"

They used to look like this:

GroupRef="f1da6"
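
One way to restore the old style is to remap each raw group identifier to a short ID while writing the GPML. The sketch below assumes the expected form is five lowercase hex characters starting with a letter, which is inferred only from examples such as "f1da6" in these issues. The class is hypothetical. For the output to stay consistent, the same mapping would presumably also have to be applied wherever the group itself is declared.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; use one instance per pathway so IDs stay unique within a file.
class GroupRefNormalizer {
    private final Map<String, String> assigned = new HashMap<>();
    private final Set<String> used = new HashSet<>();

    // Returns the same short ID every time the same raw value is seen.
    String normalize(String rawGroupRef) {
        String existing = assigned.get(rawGroupRef);
        if (existing != null) {
            return existing;
        }
        int seed = rawGroupRef.hashCode();
        String candidate;
        do {
            candidate = toId(seed++);      // step to the next candidate on a collision
        } while (!used.add(candidate));
        assigned.put(rawGroupRef, candidate);
        return candidate;
    }

    // One letter a-f followed by four hex digits, e.g. the shape of "f1da6".
    private static String toId(int seed) {
        char first = (char) ('a' + Math.floorMod(seed, 6));
        return String.format("%c%04x", first, (seed >>> 4) & 0xFFFF);
    }

    public static void main(String[] args) {
        GroupRefNormalizer normalizer = new GroupRefNormalizer();
        System.out.println(normalizer.normalize("1416470.4872874213801427"));
    }
}
```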

Resize Labels and datanodes for complex components

Longer labels are not always multi-line, resulting in cut-off text. This is not a problem for all longer labels; some are written as multi-line labels as expected, so I don’t know if this has to do with how the information is stored by Reactome. I’m attaching an example. For this particular example, the corresponding Reactome pathway does have a multi-line label.

Remove unnecessary references

Identical references on interactions and anchor points seem repetitive. For cases with references on one or more interactions leading to an anchor point, the anchor point also lists the same references, which makes the pathways look busier than necessary and doesn’t really add any information. Is it OK to skip the references at the level of the anchor and keep them just for the incoming interactions? See the attached for an example. Note that I dragged the horizontal interaction (with the anchors) up a bit to show the references clearer.

Problem in WP1825

Warning! Requested path "rectangle-double" is not available with linetype of "Double". Using linetype of "Solid" instead

Datasource

All proteins annotated with datasource uniprot-swissprot should be annotated with uniprot-trembl

Created GPML file names

Names of GPML files should be changed to reactomeID.gpml instead of the pathway title, because titles contain lots of special characters.
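
A minimal sketch of this naming rule follows, assuming the Reactome ID is available as a string when the file is written. The class and method names are hypothetical. Replacing unexpected characters is an extra precaution that is not part of the request.

```java
// Hypothetical helper; not part of the converter's actual code.
class GpmlFileNames {
    // Builds "<reactomeID>.gpml", replacing anything outside a conservative
    // character set so the result is safe on common file systems.
    static String forPathway(String reactomeId) {
        String safe = reactomeId.trim().replaceAll("[^A-Za-z0-9._-]", "_");
        return safe + ".gpml";
    }

    public static void main(String[] args) {
        System.out.println(forPathway("166520"));
    }
}
```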

"PubChem Compound" & "NCBI Nucleotide" unknown DataSource v66 (and before)

Another 2 exceptions, which I've seen for version 66 but also previous ones:

"java.lang.IllegalArgumentException : no DataSource known for PubChem Compound "
&
"java.lang.IllegalArgumentException : no DataSource known for NCBI Nucleotide "

Have to check out why the datasource annotation is not working in this case...

trim spaces in text labels of data nodes

In the GPML currently being created, some labels have redundant trailing spaces, e.g. in WP3579:

<DataNode TextLabel="PC " GraphId="cdb40" Type="Metabolite" GroupRef="bbf32">
  <Attribute Key="cellular_location" Value="endoplasmic reticulum membrane" />
  <Attribute Key="complex_id" Value="R-ALL-5684863" />
  <Attribute Key="copies_num" Value="1" />
  <Graphics CenterX="240.0" CenterY="1851.0" Width="100.0" Height="20.0" ZOrder="32768" FontSize="10" Valign="Middle" Color="0000ff" />
  <Xref Database="ChEBI" ID="CHEBI:16110" />
</DataNode>
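
A possible fix, sketched under the assumption that the label passes through a single point before it is written as TextLabel (the helper name is hypothetical):

```java
// Hypothetical helper; not part of the converter's actual code.
class LabelCleaner {
    // Removes leading and trailing whitespace only; line breaks inside
    // a multi-line label are left untouched. Requires Java 11+ for strip().
    static String clean(String textLabel) {
        if (textLabel == null) {
            return "";
        }
        return textLabel.strip();
    }

    public static void main(String[] args) {
        System.out.println("[" + clean("PC ") + "]");
    }
}
```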

Exception: class not found -> renderableChemicalDrug v66

During the conversion of version 66, there was an exception printed out:
java.lang.ClassNotFoundException: org.gk.render.RenderableChemicalDrug

Could this class be an update to the Reactome Schema?

Will work on printing an error report specifically for these exceptions, so we can locate them in the Reactome PW + schema, and go from there.

Problems with Plant Pathways

Hi Anwesha,

After reviewing about half of the pathways, here’s what I found:

  1. On some pathways, one or more nodes are missing. It seems to coincide with nodes that have “Reference Entity” listed as “Unkown:Unknown” in Reactome. I don’t recall if this was the case with the human converted pathways also. Could we still add the node, just with an empty xref, but with the same label as Reactome? Example: http://www.wikipathways.org/index.php/Pathway:WP3111
  2. So far, I only found one pathway with serious issues: http://www.wikipathways.org/index.php/Pathway:WP2972
  3. Minor: The pathway titles should be capitalized.

Thanks,

Kristina

WP2719 has a Reactome ID for a metabolite which has an incorrect format

It has this GPML, but 111875 is not a valid Reactome ID according to identifiers.org:

  <DataNode TextLabel="Ca2+" GraphId="a415d" Type="Metabolite">
    <Attribute Key="cellular_location" Value="endoplasmic reticulum lumen" />
    <Graphics CenterX="2809.0" CenterY="582.0" Width="50.0" Height="30.0" ZOrder="32768" FontSize="10" Valign="Middle" Color="0000ff" />
    <Xref Database="Reactome" ID="111875" />
  </DataNode>

But there are more pathways with integer Reactome IDs.
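
To find the other cases, the Xref IDs could be checked against the expected shape. The sketch below assumes stable IDs look like "R-", a three-letter code, "-" and digits, with an optional ".version" suffix. This is inferred from IDs such as R-ALL-5684863 in these issues and should be compared with the pattern identifiers.org actually uses. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is an assumption, not the official one.
class ReactomeIdCheck {
    private static final Pattern STABLE_ID =
        Pattern.compile("R-[A-Z]{3}-\\d+(\\.\\d+)?");

    // True only if the whole string has the assumed stable-ID shape.
    static boolean looksLikeStableId(String id) {
        return id != null && STABLE_ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeStableId("111875"));
        System.out.println(looksLikeStableId("R-ALL-5684863"));
    }
}
```

The check only flags bare integers. It does not guess the missing prefix, since the correct species code is not recoverable from the number alone.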

Many pathways are not renderable

This is the error I get using the converter.
GKBReader.createRenderableFromType(): java.lang.ClassNotFoundException: org.gk.render.RenderableEntitySet
java.lang.ClassNotFoundException: org.gk.render.RenderableEntitySet
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at org.gk.persistence.GKBReader.createRenderableFromType(GKBReader.java:648)
at org.gk.persistence.GKBReader.createRenderableFromElement(GKBReader.java:632)
at org.gk.persistence.GKBReader.openNodes(GKBReader.java:474)
at org.gk.persistence.GKBReader.openProcess(GKBReader.java:398)
at org.gk.persistence.GKBReader.openProcess(GKBReader.java:312)
at org.gk.persistence.DiagramGKBReader.openProcess(DiagramGKBReader.java:68)
at org.gk.persistence.DiagramGKBReader.openDiagram(DiagramGKBReader.java:62)
at org.reactome.convert.common.AbstractConverterFromReactome.queryPathwayDiagram(AbstractConverterFromReactome.java:199)
at org.gk.gpml.ReactometoGPML2013.convertPathway(ReactometoGPML2013.java:489)
at org.gk.gpml.CLIConverter.convertReactomeToGPML(CLIConverter.java:126)
at org.gk.gpml.CLIConverter.convertReactomeToGPMLByID(CLIConverter.java:162)
at org.gk.gpml.CLIConverter.main(CLIConverter.java:42)

The renderable entity set (equivalent to the Pathway Class on our end) seems to be missing.

List of non renderable pathways in version 50 : https://cloud.bigcat.maastrichtuniversity.nl/public.php?service=files&t=e5900286773689ce08820c088bd57f5e

Nucleotide salvage PW not properly converted v.62 Reactome (8956321)

The GPML file is created, however it is empty.
There are no reported errors in the log file.
The PW still exists in Reactome; the parent PW is created (Metabolism of nucleotides, 15869), however this is a meta PW. The child pathways 'Purine salvage' and 'Pyrimidine salvage' are not created by the converter.

So, for now, I will keep the original PW at version 61 4082, and check whether versions 63/64/65/66 improve the conversion; if not, we should check the MySQL database to see what is going wrong here.

WP1903 has a complex with a ChEBI identifier which is not a ChEBI identifier

The output GPML on WikiPathways is (r83141):

  <DataNode TextLabel="DAG" GraphId="e667d" Type="Complex">
    <Attribute Key="cellular_location" Value="plasma membrane" />
    <Graphics CenterX="491.5" CenterY="29.0" Width="35.0" Height="22.0" ZOrder="32768" FontSize="10" Valign="Middle" ShapeType="RoundedRectangle" Color="a52a2a" />
    <Xref Database="ChEBI" ID="R-HSA-CHEBI:17815" />
  </DataNode>

However, "R-HSA-CHEBI:17815" is not a ChEBI identifier. I am not sure how to check the Reactome source, or whether the problem is in the converter code or in the Reactome pathway source.
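
Until the origin is known, such Xrefs could at least be reported rather than silently rewritten. The sketch below assumes ChEBI IDs have the form "CHEBI:" followed by digits. The class name and the message wording are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; it reports suspicious ChEBI Xrefs but does not change them.
class ChebiXrefReport {
    private static final Pattern CHEBI = Pattern.compile("CHEBI:\\d+");

    // Empty result means the ID has the assumed ChEBI shape.
    static Optional<String> describeProblem(String xrefId) {
        if (xrefId == null) {
            return Optional.of("missing ID");
        }
        if (CHEBI.matcher(xrefId).matches()) {
            return Optional.empty();
        }
        Matcher embedded = CHEBI.matcher(xrefId);
        if (embedded.find()) {
            // A ChEBI-shaped ID is present but surrounded by extra text.
            return Optional.of("'" + xrefId + "' is not a ChEBI ID; it contains '"
                + embedded.group() + "'");
        }
        return Optional.of("'" + xrefId + "' is not a ChEBI ID");
    }

    public static void main(String[] args) {
        System.out.println(describeProblem("R-HSA-CHEBI:17815").orElse("ok"));
    }
}
```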
