paxtools's Introduction

Cytoscape Core App: BioPAX Support

Introduction

This is a Cytoscape Core App for reading BioPAX files.

How to build

git clone https://github.com/cytoscape/biopax.git
mvn clean install

paxtools's People

Contributors

armish, cannin, emekdemir, igorrodchenkov, istemi-bahceci, markwoon, n1zea144, ozgunbabur

paxtools's Issues

Question!

Hi there. Given a list of about 50 genes, what is the most efficient way to output a data frame in R that contains only the controls-state-of interaction type for all of those genes? When I query using the code below, it takes a very long time and outputs far more than I need.

t1 <- graphPc(source = 50_genes, kind = "NEIGHBORHOOD", format = "SIF", verbose = TRUE, limit = 1)

sbgn-converter: put metadata to the SBGN-ML (using extensions feature).

Add useful metadata to SBGN nodes/edges (glyphs, arcs, ports) either as nested xml elements ('bp:' or better define own schema if needed) inside <extensions/> and <notes/>, or simply using <![CDATA[...]]> and encoded JSON object in it.

Include, e.g.:

  • HGNC, ChEBI, UniProt, PubMed IDs;
  • BioPAX Class (name);
  • synonyms (as <label/> contains display name already);
  • organism(s);
  • data source(s);
  • evidence ?
  • generic or not (boolean)

One way to implement this feature in sbgn-converter, in a very flexible/extensible manner, would be using key:value entries from the Paxtools BioPAX object.annotations (bpe.getAnnotations()) map (and the annotations can be created in many different ways, e.g., within cPath2 or Sifgraph methods).

Does not compile with JDK9-11

Currently, it builds only with JDK8 (and uses Java 1.6 as its source/target compatibility level).

We really need to upgrade the Paxtools project configuration (POM) as soon as possible so that it at least builds with JDK 9-11. We should also raise the compatibility level to Java 1.8.

PS: I've successfully done this for other projects already, e.g., cpath2 and validator.

sbgn-converter: fix compartment IDs (and labels)

This was originally reported by Dylan (@d2fong) via email:

"
...
For compartment ids, it seems the main inconsistency between them is the capitalization of the terms. I think this seems to be an easy fix in the java sbgn-converter where we could just transform the string to lowercase before populating the compartment id.

I have also seen roughly 2 abnormal characters in ids: '-' and ','.
...
"

As a result, there are, e.g., both "Cytosol" and "cytosol".
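The lower-casing idea from the email could be sketched roughly as follows. The class and method names are hypothetical, and replacing '-' and ',' (and any other non-alphanumeric run) with '_' is an assumed convention, not what sbgn-converter actually does:

```java
import java.util.Locale;

// Hypothetical helper; not part of sbgn-converter.
class CompartmentIds {
    // Trims and lower-cases the term, then replaces each run of characters
    // other than ASCII lower-case letters, digits and '_' with a single '_'.
    static String normalize(String term) {
        return term.trim()
                   .toLowerCase(Locale.ROOT)
                   .replaceAll("[^a-z0-9_]+", "_");
    }

    public static void main(String[] args) {
        System.out.println(normalize("Cytosol"));                   // cytosol
        System.out.println(normalize("Golgi-associated, vesicle")); // golgi_associated_vesicle
    }
}
```

With this, "Cytosol" and "cytosol" map to the same ID. Note that the sketch does not guard against IDs that start with a digit.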

errors in the javadoc

There are errors, such as missing description for method arguments, use of

, etc., that make the 'site' goal fail (perhaps we could use an older version of the site or javadoc plugins, but it is definitely worth polishing the javadoc).

TransitivePropertyAccessor infinite loop (and OutOfMemory)

The issue seems to occur when converting the large PC2v8 BioPAX model to GSEA or SIF.
Exactly where still needs to be confirmed, but it looks like it is either in gsea-converter or, more likely, in the biopax pattern module (which uses transitive property paths a lot, e.g., "PhysicalEntity/controllerOf/controlled*:Interaction")...

Exception in thread "pool-2-thread-3" java.lang.StackOverflowError
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at java.lang.Class.toString(Class.java:151)
at java.lang.String.valueOf(String.java:2854)
at java.lang.StringBuilder.append(StringBuilder.java:128)
at org.biopax.paxtools.impl.BioPAXElementImpl.hashCode(BioPAXElementImpl.java:224)
at java.util.HashMap.hash(HashMap.java:366)
at java.util.HashMap.put(HashMap.java:496)
at java.util.HashSet.add(HashSet.java:217)
at org.biopax.paxtools.controller.TransitivePropertyAccessor.transitiveGet(TransitivePropertyAccessor.java:40)
at org.biopax.paxtools.controller.TransitivePropertyAccessor.transitiveGet(TransitivePropertyAccessor.java:42)
at org.biopax.paxtools.controller.TransitivePropertyAccessor.transitiveGet(TransitivePropertyAccessor.java:42)
at org.biopax.paxtools.controller.TransitivePropertyAccessor.transitiveGet(TransitivePropertyAccessor.java:42)
at org.biopax.paxtools.controller.TransitivePropertyAccessor.transitiveGet(TransitivePropertyAccessor.java:42)
...

It's in the recursive method private void transitiveGet(D bean, Set values).
I think the BioPAX model may contain cycles, such as a complex that contains itself (those probably do not make much sense and should be fixed by the validator), but the transitiveGet method must survive and handle this anyway.
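A minimal sketch of a recursive transitive traversal that survives cycles, assuming a simplified node type (Node and CycleSafeTraversal are hypothetical stand-ins, not the actual Paxtools classes). The recursion descends only into values that were newly added to the result set:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a BioPAX element with one multi-valued property.
class Node {
    final String id;
    final List<Node> children = new ArrayList<>();
    Node(String id) { this.id = id; }
}

class CycleSafeTraversal {
    // Collects everything reachable from 'bean'. A node is expanded only the
    // first time it is seen, so cycles cannot cause unbounded recursion.
    static void transitiveGet(Node bean, Set<Node> values) {
        for (Node child : bean.children) {
            // Set.add returns false if the element was already present.
            if (values.add(child)) {
                transitiveGet(child, values);
            }
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        a.children.add(b);
        b.children.add(a); // cycle: A -> B -> A
        a.children.add(a); // self-loop, like a complex containing itself
        Set<Node> values = new LinkedHashSet<>();
        transitiveGet(a, values);
        System.out.println(values.size()); // 2
    }
}
```

This only addresses cycles; a very long acyclic chain could still exhaust the stack, in which case an iterative version with an explicit work queue would be the safer choice.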

One can set an incompatible EntityReference to a physical entity using the API.

Currently, the Paxtools API allows one to set an EntityReference of an incompatible type using the setEntityReference(EntityReference er) method without getting any exception or even a log message. An exception is thrown only when data are read from a file/stream, but not when one calls, e.g., protein.setEntityReference(smallMoleculeRef).
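One possible shape for such a check, as a sketch only: all types below are hypothetical simplifications rather than the Paxtools model classes, and throwing IllegalArgumentException is just one option (logging a warning would be another).

```java
// Hypothetical, simplified types; not the Paxtools classes themselves.
class EntityRef {}
class ProteinRef extends EntityRef {}
class SmallMoleculeRef extends EntityRef {}

class TypedEntity<R extends EntityRef> {
    private final Class<R> allowed;
    private EntityRef ref;

    TypedEntity(Class<R> allowed) { this.allowed = allowed; }

    // Rejects a reference of an incompatible type at the API level,
    // instead of only when reading from a file or stream.
    void setEntityReference(EntityRef er) {
        if (er != null && !allowed.isInstance(er)) {
            throw new IllegalArgumentException("Incompatible reference type: "
                    + er.getClass().getSimpleName()
                    + "; expected " + allowed.getSimpleName());
        }
        this.ref = er;
    }

    EntityRef getEntityReference() { return ref; }
}
```

Here a TypedEntity created with ProteinRef.class accepts a ProteinRef but throws when given a SmallMoleculeRef.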

SBGN-ML possesses disconnected nodes

In some cases, the SBGN-ML retrieved from the PC2 web service contains several nodes that are not connected to anything else.

Failing case

Pathway: 'Signaling by BMP' -- Reactome

  • BioPAX record
  • BioPAX viewed in ChiBE: chibe_signaling_bmp.png
  • SBGN-ML view: pc_search_signaling_bmp.png

Notes: Viewing the associated BioPAX record in ChiBE reveals ‘Member of’ edge connecting these nodes.

Cannot add a new EntityReference

Minimum code to reproduce

BioPAXFactory factory = BioPAXLevel.L3.getDefaultFactory();
Model model = factory.createModel();
model.addNew(EntityReference.class, "3d8d8d63-997e-4967-9469-0da3f691f659");

Error
For version 5.2.1-SNAPSHOT:

18:21:48.865 [main] INFO org.biopax.paxtools.util.BPCollections - System property: paxtools.CollectionProvider=null
18:21:48.867 [main] INFO org.biopax.paxtools.util.BPCollections - Using the default CollectionProvider (creates HashMap, HashSet).
18:21:48.871 [main] ERROR org.biopax.paxtools.model.BioPAXFactory - Failed creating BioPAX object: interface org.biopax.paxtools.model.level3.EntityReference, URI: 3d8d8d63-997e-4967-9469-0da3f691f659
java.lang.InstantiationException: null
	at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.biopax.paxtools.model.BioPAXFactory.create(BioPAXFactory.java:76)
	at org.biopax.paxtools.impl.ModelImpl.addNew(ModelImpl.java:120)
	at factoid.model.TemplatesModel.main(TemplatesModel.java:254)
Exception in thread "main" java.lang.NullPointerException
	at org.biopax.paxtools.impl.ModelImpl.add(ModelImpl.java:158)
	at org.biopax.paxtools.impl.ModelImpl.addNew(ModelImpl.java:121)
	at factoid.model.TemplatesModel.main(TemplatesModel.java:254)

For version 5.1.0:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.NullPointerException
	at org.biopax.paxtools.impl.ModelImpl.add(ModelImpl.java:158)
	at org.biopax.paxtools.impl.ModelImpl.addNew(ModelImpl.java:121)
	at BiopaxExp.BiopaxExp.Main.main(Main.java:14)

Debugging observations so far
IMG_5059

As shown in the screenshot above, the constructorAccessor property is set to InstantiationExceptionConstructorAccessorImpl, whereas it is set to DelegatingConstructorAccessorImpl for the other element types that I have tried so far and was able to add successfully (e.g., Interaction).

Complex members not part of complexes in SBGN output

When PC is queried to return the result in SBGN, complex members used to be part of their parent/owner complexes (in SBGN-ML, member glyphs were simply listed inside their owner complex glyphs), but that no longer seems to be the case. From the layout of the returned pathway, members appear to be properly contained / nested in their parent complexes, but they are not!
Also see iVis-at-Bilkent/newt#176

Users were confused by Xref IDs in the "nodes" section of EXTENDED SIF

As we explained, the (unification and relationship xref) IDs listed there were not designed for ID mapping (e.g., from HGNC Symbol to NCBI Gene ID).
We cannot completely solve this, because the xrefs come from the original data and sometimes from multiple BioPAX entity reference types, but we could probably reduce the confusion by not outputting relationship xrefs at all, or at least by skipping, here - https://github.com/BioPAX/Paxtools/blob/master/pattern/src/main/java/org/biopax/paxtools/pattern/miner/ExtendedSIFWriter.java#L193 - any RelationshipXref whose relationshipType is other than "identity"...

sbgn-converter: fix the COSE layout, make optional, or remove this feature?..

With some BioPAX data (quite large but normal, not too complex), applying the layout (currently done by default) takes too long, even though the conversion to SBGN-ML itself takes just a few seconds.

E.g., I run

java -Xmx16g -jar paxtools.jar toSBGN beta-pc9-transfac-pathway-CAGGTG_V.owl CAGGTG_V.sbgn.xml

After a few seconds, when the conversion from BioPAX to SBGN is done, it prints:

SBGN-PD Layout is running...
success ratio: 0.6532992762877821
enhanced ratio: 0.838058748403576
Total execution time: 1336800 miliseconds.
That is, the layout took about 22 minutes (bad).

The layout is implemented in the chilay "library".

"Invalid BioPAX object to wrap as node" warnings

@ozgunbabur , I see lots of warnings in PC logs (cpath2, PC9, which uses Paxtools) like:
WARN o.b.p.q.w.GraphL3Undirected - Invalid BioPAX object to wrap as node. Ignoring: http://identifiers.org/wikipathways/WP143/dd235

Why is this a warning? Is it a problem or a bug? If not, why log it as a warning instead of debug?

Completer: an option (default) to avoid sub-pathways?

I suggest we at least add an option for the Completer to skip sub-pathways. This would make the large, nested BioPAX sub-models that can result from a PC2 web service query ('get' or 'graph') more usable and reasonable for converting to SBGN-ML or SIF and for visualization.

It looks like this cannot be done simply via the @AutoComplete annotation on the properties 'pathwayComponent' and 'controlled', because the range of these properties is either Pathway or Interaction...
But we could, e.g., skip deep traversal here if the value is of type Pathway.

@ozgunbabur , @emekdemir, @gbader share your thoughts please!

Implement SIF Graph Queries

Running graph queries on the entire SIF model vs. corresponding BioPAX could greatly improve performance. We'd implement this in paxtools-query module. But there are several problems...

For example, a SIF file/model, such as Pathway Commons' one, usually contains one ID type (HGNC Symbol or UniProt AC) for genomic entities and one (ChEBI) for chemicals, and misses quite a few interactions due to lack of those IDs (we'd use a URI based version of SIF; anyway, id-mapping is required, etc..)

Refs PathwayCommons/cpath2#202
Refs #19

SimpleIOHandler prints xsd datatype as an abbreviated string, which is subtly off

Dear BioPAX developers,

On line 669 of the SimpleIOHandler, the datatype that is printed is slightly off.

Using a tool like rapper or riot to convert the RDF/XML output to N-Triples, we see the difference (example from rhea-db):

<http://biopax.rhea-db.org/level3/57967_stoichiometry_right_3249> 
  <http://www.biopax.org/release/biopax-level3.owl#stoichiometricCoefficient> 
     "1.0"^^<xsd:float> .

was printed but we expect

<http://biopax.rhea-db.org/level3/57967_stoichiometry_right_3249>
  <http://www.biopax.org/release/biopax-level3.owl#stoichiometricCoefficient> 
    "1.0"^^<http://www.w3.org/2001/XMLSchema#float> .

Quite a lot of tools auto-correct this difference downstream. But with a tool that strictly follows the RDF/XML spec, it changes the result of SPARQL value comparisons: a numeric comparison becomes a string comparison.

We will open a PR with this fix soon.

Regards,
Jerven
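For illustration, the expected behaviour amounts to writing the full XML Schema namespace IRI instead of the "xsd:" abbreviation. A minimal sketch follows; the class and method names are hypothetical, and this is not the actual SimpleIOHandler code or the announced PR:

```java
// Hypothetical helper; not part of SimpleIOHandler.
class XsdDatatypes {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema#";

    // Expands an abbreviated name such as "xsd:float" to the full datatype
    // IRI; values that do not start with "xsd:" are returned unchanged.
    static String expand(String datatype) {
        final String prefix = "xsd:";
        return datatype.startsWith(prefix)
                ? XSD_NS + datatype.substring(prefix.length())
                : datatype;
    }

    public static void main(String[] args) {
        // prints http://www.w3.org/2001/XMLSchema#float
        System.out.println(expand("xsd:float"));
    }
}
```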

Update to commons-lang 3

Would it be possible to switch from commons-lang 2 to commons-lang 3?

If so, I'd be willing to submit a PR.

L3ToSBGNPDConverter.java with predefined layout

@ozgunbabur

Hello Ozgun,

I'm working with the L3ToSBGNPDConverter.java trying to find a way to use pre-determined points to position the layouts as an alternative to the Chilay layout that is currently being applied in the createSBGN() method. I would like to ask you a few questions on this and how you might go about doing it.

I would really appreciate your opinion and if you have some time to chat my email is '[email protected]'.

Thanks for the consideration,

Peter

Sbgn-converter: hard-coded xml:base results in inconsistent IDs

Here, the BioPAX model xml:base should be matched instead of constant URI prefix which in fact can be different in different pathway data files...

PS:
Ideally, we'd probably URI-encode the BioPAX URIs instead of stripping the base and replacing special characters (though this should be probably done in a way to allow mapping from the SBGN XML IDs back to original BioPAX RDF/JSONLD URIs, at least for physical entities and pathways; this is very important for web apps).

Problem caused by calling layout while utilizing biopax to sbgn conversion through Python

Hi there,

I need to use Paxtools through Python, so I am running the Paxtools Java code from Python. I load a BioPAX model, do some processing on it, and finally convert it to SBGN using the Paxtools sbgn-converter. Everything works fine except the BioPAX to SBGN conversion.

During the conversion step, when I call writeSBGN() of L3ToSBGNPDConverter, the Python rocket ship icon appears and the process does not finish, although I have waited for a while.

I found that the problem goes away when I eliminate the layout call inside createSBGN() entirely by commenting out the following code segment:

final boolean layout = doLayout && n < this.maxNodes && !arcMap.isEmpty();
try {
	//Must call this, although actual layout might never run;
	//in some real data tests, skipping createLayout method
	//led to malformed SBGN model, unfortunately...
	(new SBGNLayoutManager()).createLayout(sbgn, layout);
} catch (Exception e) {
	throw new RuntimeException("SBGN Layout of " + model.getXmlBase()
			+ ((model.getName()==null) ? "" : model.getName()) + " failed.", e);
}
if(!layout) log.warn(String.format("No layout, for either " +
		"it's disabled: %s, or ~ no. nodes > %s: %s, or - no edges: %s",
		!doLayout, maxNodes, n>maxNodes, arcMap.isEmpty()));

The problem persists when I try to prevent the layout by setting the doLayout parameter to false. By the way, I have tried two different libraries for running Java from Python, jnius and jpype, and the same thing happened with both.

I have a few questions regarding this:

  • Do you have any idea why this happens? That is, what might be happening differently during the layout? I could try to find a solution based on that.
  • Would you consider making the layout fully optional (i.e., not even calling createLayout when the doLayout parameter is false)?
  • How and why does skipping the layout result in a malformed SBGN model?
  • (If needed) Am I allowed to modify the Paxtools source code in my own fork to build my own jar?

NPE in Searcher (pattern)

When using the paxtools pattern library's SIFSearcher from pcviz (uniprot-scraper script), I got several of these exceptions:
Exception in thread "pool-7071-thread-1" Exception in thread "pool-7071-thread-6" Exception in thread "pool-7071-thread-2" Exception in thread "pool-7071-thread-3" Exception in thread "pool-7071-thread-11" Exception in thread "pool-7071-thread-4" Exception in thread "pool-7071-thread-8" Exception in thread "pool-7071-thread-5" java.lang.NullPointerException
at org.biopax.paxtools.pattern.Searcher.search(Searcher.java:174)
at org.biopax.paxtools.pattern.Searcher.search(Searcher.java:158)
at org.biopax.paxtools.pattern.miner.SIFSearcher$1.run(SIFSearcher.java:163)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
...

This seems to be caused by a concurrency issue (my guess: Searcher and Miner are not sufficiently thread-safe; also, as far as I can see, each thread in SIFSearcher creates child threads of its own, because Searcher runs multiple threads as well).

@AutoComplete in fact did not work

The Completer's filter uses the default values (forward=true; backward=false) for all object properties and thus follows, e.g., the nextStep property despite this annotation. I tested and confirmed that this always returns null.
I suspect the annotation is not available at runtime (probably because no retention value is set on it)...
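The Java behaviour behind this suspicion can be checked in isolation: an annotation declared without @Retention defaults to RetentionPolicy.CLASS and is not visible through reflection. A minimal sketch, with hypothetical annotation and class names (not the Paxtools @AutoComplete itself):

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// No @Retention: defaults to RetentionPolicy.CLASS, invisible to reflection.
@interface DefaultRetention {}

// Explicit RUNTIME retention: visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@interface RuntimeRetention {}

class RetentionDemo {
    @DefaultRetention
    @RuntimeRetention
    void nextStep() {}

    // Returns true if the given annotation can be read from nextStep() at runtime.
    static boolean visible(Class<? extends Annotation> type) {
        try {
            Method m = RetentionDemo.class.getDeclaredMethod("nextStep");
            return m.getAnnotation(type) != null;
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(visible(DefaultRetention.class)); // false
        System.out.println(visible(RuntimeRetention.class)); // true
    }
}
```

If the same applies here, adding @Retention(RetentionPolicy.RUNTIME) to the annotation declaration would make it readable by the filter.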

This also causes PathwayCommons/cpath2#267.

(I build, test and run Paxtools with Oracle JDK1.8 on Ubuntu Linux; that might matter...)
@ozgunbabur @emekdemir

Almost fixed this bug.. wait.

Cloner: bogus parameter (Model); not thread-safe (but it could be safe)

It's easy to see that Cloner object does not actually use the Model passed via the first parameter of the clone method.

The Cloner defines a Traverser with the Visitor being the Cloner itself, implementing 'visit' method, which does not use the "source" model at all; it only uses the internal property - 'targetModel', which is also not a good idea, because 'targetModel' gets renewed and returned every time the clone(..) method is called; so Cloner cannot be run in several threads...

Possible fix:

  • remove the model parameter from the clone(..) method (deprecate current method and set source=null in there); i.e., have a method public synchronized Model clone(Set<BioPAXElement> toBeCloned) instead;
  • remove the private field: targetModel from the Cloner class;
  • the 'clone' method should create a new local variable Model (to be returned) as Model targetModel = factory.createModel(); and pass it to the traverser as traverser.traverse(bpe, targetModel); (instead of passing 'source' there);
  • the 'visit' method should then use model (argument) instead of 'targetModel';
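The proposed shape could look roughly like the sketch below. Everything in it is a hypothetical simplification (a Map stands in for the Model, and property copying is omitted); it only illustrates keeping the target as a per-call local variable passed to 'visit' rather than as an instance field:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified element; not the Paxtools BioPAXElement.
class Element {
    final String uri;
    Element(String uri) { this.uri = uri; }
}

class LocalStateCloner {
    // The target is created per call and never stored in a field, so
    // concurrent calls share no mutable state; no source model is needed.
    Map<String, Element> clone(Set<Element> toBeCloned) {
        Map<String, Element> target = new HashMap<>();
        for (Element e : toBeCloned) {
            visit(e, target);
        }
        return target;
    }

    // Uses the model passed as an argument instead of an instance field.
    private void visit(Element e, Map<String, Element> target) {
        target.putIfAbsent(e.uri, new Element(e.uri));
    }
}
```

With no shared field, the method would not necessarily need to be synchronized, provided the factory and traverser it relies on are themselves safe to share.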

@ozgunbabur , @emekdemir please review this idea (I can fix then)!

sbgn-converter: include "empty" and blackbox sub-pathways.

The idea is to make the sbgn-converter generate nodes/edges for "empty" and blackbox sub-pathways
(i.e., for those that have no interactions in them originally or as the result of detaching the sub-network from a larger, e.g. Pathway Commons', BioPAX model).

another loop and overflow

ModelUtils.getParentPathways (paxtools-core), and therefore SearchEngine.index (paxtools-search), gets into an endless loop when the translated, normalized KEGG data is used (or any model that integrates that data).
