romanolab / comptox_ai
ComptoxAI - An artificial intelligence toolkit for computational toxicology
Home Page: https://comptox.ai/
License: Other
Currently there is no way to access the Python package documentation on the website due to a URL clash between the REST API docs and the Python API documentation. All URLs under comptox.ai/api are currently being directed to the REST API application.
Memgraph has been growing and shows promise, potentially improving on Neo4j's shortcomings. It seems worthwhile to determine whether or not we should migrate at some point in the future. This can start out with something as simple as a list of pros and cons for each of the two technologies. Some aspects that make sense to take into account include:
Users should be able to run the following sample code to build a simple link prediction task on a subgraph containing chemicals, diseases, and genes:
from comptox_ai.ml.nn import NeuralNetwork
nn = NeuralNetwork(model='link-prediction')
# Load the data by calling the new routines you added to comptox_ai.db.graph_db
nn.load_data(node_types=['Chemical', 'Gene', 'Disease'])
# Train the model
nn.fit()
# Return predicted links
nn.predict()
The conda environment.yml and pip setup.py files that specify prerequisites haven't been thoroughly tested or recently updated. We need to make it easy for any user to run the respective install commands and have it 'just work'.
A few contributing issues:
- numpy and Pandas version mismatches
- Requiring a mysql installation in order to install the Python mysqlclient
Not all users will need to build the full comptox_ai database from scratch. Therefore, we can also remove the dependency on ista. Users who want to build it from scratch can install ista separately and deal with the database dependencies, etc. at that point in time.
To semi-future-proof this, we should create 2 GitHub actions: one that builds/installs ComptoxAI via conda, and one that does so via pip.
There could be a mismatch in the edges between chemicals and the assay node rt-viability-hepg2-p1.
There are more 'CHEMICALHASACTIVEASSAY' edges than 'CHEMICALHASINACTIVEASSAY' edges.
I think that the active and inactive chemicals for this assay are simply inverted.
Currently, the basic style guides for Comptox_ai aren't up to date.
We plan to use the following style guides:
Python - PEP 8
Python Documentation - Numpydoc
Javascript/Typescript - Airbnb
We should update the dependencies and install the necessary packages needed to implement the style guides.
We're observing issues with LaTeX rendering on both the Comptox AI and EC2 pages. The content intended to be displayed in LaTeX format isn't being rendered as expected.
Comptox AI Page:
EC2 Page:
We need to ensure that the required packages and libraries for LaTeX rendering are present and configured correctly, especially on the EC2 page.
Some elements in the data browser app (see web/packages/app/src/App.css and others) have CSS styling that conflicts with the overall website CSS. As a result, some of the app's own styles are overridden by the general website styles. See, for example, the space above section headings (right above "Relationships"):
This can probably be fixed by adding CSS classes to elements in the app itself that override the general website CSS for that element type.
There are sometimes multiple DSSTox IDs mapping to a single PubChem CID. There are multiple considerations with this:
In b049114 we enabled support for remote databases, and added a number of sanity checks to make sure that (a.) the user passes good options either via a config file or via parameters, and (b.) a valid connection was indeed established after the Neo4j bolt driver is created.
However, we currently cannot use neo4j.comptox.ai as a hostname. Tests using either the neo4j:// or bolt:// protocols failed to provide a good result. It is currently unclear whether this is a local issue (e.g., it can be fixed by tweaking comptox_ai.db.GraphDB._connect()) or if it is something that will require tweaking on the side of the ComptoxAI server (e.g., a DNS issue, NGINX configuration, etc.).
While working on [Issue-#63], it was discovered that the Data Portal on the AWS-deployed version was an old version. The issue was identified after the merge of [PR-#72], but it could have been caused by PRs before #72. In trying to figure out the cause, it was found that the React App was not being built properly in the web directory itself.
This issue was created to resolve the React App build problem within the scope of the web/app directory. Once the React build in web/app is fixed, the issue will be closed. The AWS version problem will be handled either by creating a subsequent issue or in [Issue-#63].
We use an OWL ontology to structure the database before importing it into Neo4j. However, the neosemantics (n10s) library by default sticks a namespace prefix (e.g., "owl__") before every entity. This should be fixed. Perhaps it just needs to be imported to neo4j using a different parameter in n10s (easy fix).
This GitHub issue is dedicated to tracking all activities related to managing Python and JavaScript packages within Comptox AI. It serves as a central location for documenting tasks such as installing new packages, updating existing ones, resolving dependency conflicts, and addressing package-related errors. The purpose of this tracking is to maintain a clear and organized record of our package management efforts for Comptox AI.
Building on the Node class (referenced in Issue-#102), we also need a Chemical subclass. This will help handle the specific data types we use in Comptox AI.
Objectives:
- While the Node class provides a generic interface, the Chemical subclass will be tailored to handle attributes and methods specific to chemical data.
- Ensure the Chemical subclass can seamlessly interact with the Comptox AI Neo4j database and pull chemical-specific data.
Key Tasks:
- ... Chemical nodes.
- ... fetch_chemicals, to retrieve and process chemical data from the database.
- Ensure the Chemical class is able to parse and present data in a Python-friendly format.
Currently we don't have a method that fetches a list of chemicals by a given type of ID (e.g., CasRN, DTXSID).
The goal is to create a method called fetchChemicals that takes two input parameters, the type of ID (e.g., CasRN, DTXSID) and a list of IDs, and returns the list of chemicals matching those IDs.
TODO:
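The Node class, Chemical subclass, and fetch method described in these issues could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's implementation: the property name xrefCasRN comes from the REST API example elsewhere in this list, while xrefDTXSID, the ID_TYPE_TO_PROPERTY mapping, and the helper build_fetch_chemicals_query are hypothetical. The session argument is assumed to behave like a Neo4j driver session whose run(query, parameters) yields records keyed by "n".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from user-facing ID types to node property names.
# "xrefCasRN" appears in the REST API example; "xrefDTXSID" is an assumption.
ID_TYPE_TO_PROPERTY = {
    "CasRN": "xrefCasRN",
    "DTXSID": "xrefDTXSID",
}


@dataclass
class Node:
    """Generic graph node: a label plus a dictionary of properties."""
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Chemical(Node):
    """Node subclass exposing chemical-specific accessors."""
    label: str = "Chemical"

    @property
    def casrn(self):
        return self.properties.get("xrefCasRN")


def build_fetch_chemicals_query(id_type: str, ids: List[str]) -> Tuple[str, Dict[str, Any]]:
    """Return a parameterized Cypher query and its parameters."""
    if id_type not in ID_TYPE_TO_PROPERTY:
        raise ValueError(f"Unsupported id type: {id_type!r}")
    prop = ID_TYPE_TO_PROPERTY[id_type]
    # The property name comes from a fixed whitelist; the IDs are passed as a
    # query parameter rather than being pasted into the query text.
    query = f"MATCH (n:Chemical) WHERE n.{prop} IN $ids RETURN n"
    return query, {"ids": list(ids)}


def fetch_chemicals(session, id_type: str, ids: List[str]) -> List[Chemical]:
    """Run the query on a driver-like session and wrap each result."""
    query, params = build_fetch_chemicals_query(id_type, ids)
    return [Chemical(properties=dict(record["n"])) for record in session.run(query, params)]
```

Keeping the query construction separate from the database call makes the lookup logic testable without a running Neo4j instance.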
The current implementation of the Node class is inadequate and lacks complete functionality. In order to seamlessly interact with the Comptox AI Neo4j database, it's essential that we redesign and implement a more robust Node class.
Objectives:
- Redesign the Node class to be modular. This will ensure it efficiently handles different types of graph data, including node labels, relationships, and properties.
- ... Chemical, facilitating the inclusion of more specific node formats as needed.
Key Features to Implement:
- ... Chemical.
We need to define a GitHub action that builds the ComptoxAI website, including using Sphinx to generate HTML for all of the documentation. In summary, the action should:
- Run make html in the docs/ directory
- ... web/packages/app)
There are some very simple unit tests in the tests/ directory, but they are far from complete. We should continue to add unit tests for all major features, and include the tests in GitHub actions (see #42 for related discussion to this end).
The REST API does not sufficiently sanitize inputs. For example, when you perform a node search by CasRN, the following query should work:
https://comptox.ai/api/nodes/Chemical/search?field=xrefCasRN&value=1071-83-6
However, an error is received:
{
  "message": "No results found for user query",
  "query": "MATCH (n:Chemical) WHERE n.xrefCasRN = 1071-83-6 RETURN n, id(n);",
  "result": {
    "records": [],
    "summary": {
      "query": {
        "text": "MATCH (n:Chemical) WHERE n.xrefCasRN = 1071-83-6 RETURN n, id(n);",
        "parameters": {}
      },
      "queryType": "r",
      "counters": {
        "_stats": {
          "nodesCreated": 0,
          "nodesDeleted": 0,
          "relationshipsCreated": 0,
          "relationshipsDeleted": 0,
          "propertiesSet": 0,
          "labelsAdded": 0,
          "labelsRemoved": 0,
          "indexesAdded": 0,
          "indexesRemoved": 0,
          "constraintsAdded": 0,
          "constraintsRemoved": 0
        },
        "_systemUpdates": 0
      },
      "updateStatistics": {
        "_stats": {
          "nodesCreated": 0,
          "nodesDeleted": 0,
          "relationshipsCreated": 0,
          "relationshipsDeleted": 0,
          "propertiesSet": 0,
          "labelsAdded": 0,
          "labelsRemoved": 0,
          "indexesAdded": 0,
          "indexesRemoved": 0,
          "constraintsAdded": 0,
          "constraintsRemoved": 0
        },
        "_systemUpdates": 0
      },
      "plan": false,
      "profile": false,
      "notifications": [],
      "server": {
        "address": "165.123.13.192:7687",
        "version": "Neo4j/4.4.0",
        "protocolVersion": 4.2
      },
      "resultConsumedAfter": {
        "low": 397,
        "high": 0
      },
      "resultAvailableAfter": {
        "low": 1,
        "high": 0
      },
      "database": {
        "name": "neo4j"
      }
    }
  }
}
The solution is to appropriately wrap the CasRN in double quotes (e.g., n.xrefCasRN = "1071-83-6"), but the API does not do this.
Other instances of inputs that fail due to lack of sanitization are likely, but may be challenging to find in the absence of more robust testing and/or user-submitted bug reports.
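One way to avoid this class of bug is to pass user values as query parameters instead of pasting them into the query text. Without quotes, Cypher reads 1071-83-6 as an integer subtraction (1071 - 83 - 6), so the comparison never matches the stored string. The sketch below illustrates the idea in Python; the function name build_node_search and the identifier check are hypothetical and do not reflect the existing REST API code.

```python
import re

# Labels and property names cannot be sent as Cypher parameters, so this
# sketch restricts them to plain identifiers before interpolating them.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_node_search(label, field, value):
    """Build a parameterized node search (hypothetical helper)."""
    for name in (label, field):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"Invalid identifier: {name!r}")
    # The user-supplied value travels separately as the $value parameter,
    # so it keeps its string type and cannot alter the query structure.
    query = f"MATCH (n:{label}) WHERE n.{field} = $value RETURN n, id(n);"
    return query, {"value": value}


query, params = build_node_search("Chemical", "xrefCasRN", "1071-83-6")
```

The resulting query text and parameter dictionary can then be handed to the database driver together, which keeps the CasRN a string without any manual quoting.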
We will need to create a GitHub action that deploys the website to comptox.ai. This action should run any time a new release of ComptoxAI is created, and should only trigger if the action for building the website is successful (see #51 for details).
Currently, the website is hosted on a physical server on the UPenn campus, but we will be migrating to AWS in the near future. This action should target AWS rather than the physical server, so we may need to delay implementation of this action until the migration to AWS has been completed.
PR-#95 removed references to the Py2neo package and, in doing so, commented out the Graph class in graph/graph.py. This has led to an error when building the Sphinx documentation using make html. The issue arises when other sections of the codebase, such as graph/io.py, try to reference the now-missing Graph class.
A solution to this would be to partially revert PR-#95, ensuring that we only retain the relevant changes. Instead of entirely commenting out methods or classes that interact with the package, we should just comment out the specific Py2neo-related sections. In their place, appropriate error-handling logic should be introduced.
It's imperative to run comprehensive tests post-revision, including both Sphinx documentation builds and Pytest checks, to validate the changes and confirm that no remnants of the package linger adversely.
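As a rough illustration of the proposed approach, the class can remain importable while only the formerly Py2neo-backed entry points raise a clear error. The sketch below is an assumption about shape only: the constructor arguments and the from_database method name are hypothetical and are not taken from graph/graph.py.

```python
class Graph:
    """Placeholder Graph that keeps imports and documentation builds working."""

    def __init__(self, nodes=None, edges=None):
        # Plain in-memory containers; no database access is needed here.
        self.nodes = list(nodes or [])
        self.edges = list(edges or [])

    @classmethod
    def from_database(cls, *args, **kwargs):
        # Hypothetical method standing in for code that used to call Py2neo.
        raise NotImplementedError(
            "Loading a Graph from Neo4j relied on Py2neo, which has been "
            "removed from the project's dependencies."
        )
```

With this pattern, modules that import Graph continue to load, and callers only see an error when they reach functionality that has not yet been ported.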
As we continue to refine our documentation for Comptox AI, we've reevaluated our approach to rendering mathematical equations. This issue provides a comparison between using sphinx-mathjax-offline and a direct integration of MathJax via npm, highlighting why the latter is advantageous for our project.
sphinx-mathjax-offline:
Method: Leverages the MathJax JavaScript library bundled within the extension to render LaTeX math equations in browsers.
Dependencies: Direct dependency on the extension itself, which includes a version of MathJax.
Pros:
Cons:
MathJax via npm:
Method: Incorporates the MathJax library directly into the documentation's static assets. This method avoids relying on any particular Sphinx extension.
Dependencies: Requires an npm installation and a direct npm package dependency on MathJax.
Pros:
Cons:
Our primary motivations for transitioning to a direct MathJax integration are:
- sphinx-mathjax-offline not loading as expected post-build. Using MathJax directly proved more reliable.
Given these considerations, we've decided to implement MathJax directly via npm, ensuring more control, stability, and reliability for our mathematical renderings in the Comptox AI documentation.
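A direct integration of this kind might look like the following excerpt from a Sphinx conf.py. It is a sketch under assumptions: it presumes the MathJax bundle installed by npm (for MathJax 3, the file es5/tex-chtml.js in the mathjax package) has been copied to docs/_static/mathjax/, and the directory names are illustrative. mathjax_path and html_static_path are standard Sphinx options; a relative mathjax_path is resolved against the _static directory of the built docs.

```python
# docs/conf.py (excerpt). Paths are assumptions for illustration only.

# Sphinx's built-in MathJax extension emits the math markup and the script tag.
extensions = ["sphinx.ext.mathjax"]

# Static assets copied into the build output under _static/.
html_static_path = ["_static"]

# Load the locally bundled MathJax instead of fetching it from a CDN.
# Assumes the npm-installed bundle was copied to docs/_static/mathjax/.
mathjax_path = "mathjax/tex-chtml.js"
```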
The following example script should be used as a template for the "neo4j -> pytorch" method:
>>> from comptox_ai.db.graph_db import GraphDB
>>> db = GraphDB()
>>> db.create_graph_native_projection(name='example_graph', node_types=['ns0__Gene', 'ns0__StructuralEntity', 'ns0__Disease'])
>>> db.to_pytorch(graph_name='example_graph')
The to_pytorch() method should return a torch_geometric.data.Dataset object, as described on this page: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html
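A central step in any such conversion is re-indexing database node IDs into contiguous integers and arranging edges in the two-row (source, target) layout that PyTorch Geometric uses for edge_index. The sketch below shows only that step in plain Python; the function name to_edge_index and its inputs are hypothetical, and an actual to_pytorch() would still need to wrap the result in tensors and a dataset object.

```python
def to_edge_index(node_ids, edges):
    """Map arbitrary node IDs to 0..N-1 and build a [2, num_edges] edge list.

    node_ids: iterable of database node identifiers.
    edges: iterable of (source_id, target_id) pairs using those identifiers.
    """
    # Contiguous index for every node, in the order the IDs were given.
    index_of = {node_id: i for i, node_id in enumerate(node_ids)}
    sources = [index_of[src] for src, _ in edges]
    targets = [index_of[dst] for _, dst in edges]
    # Row 0 holds source indices, row 1 holds target indices.
    return index_of, [sources, targets]


index_of, edge_index = to_edge_index(
    node_ids=[101, 205, 307],
    edges=[(101, 205), (307, 101)],
)
# edge_index is [[0, 2], [1, 0]]
```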
As of PR-#92, the Py2neo package has been removed from the project's dependencies. To ensure the project's codebase is up to date, it's necessary to remove all references to the Py2neo package from the code.
This includes any imports, function calls, or code sections that use Py2neo-specific functionality. Review the entire codebase and make the necessary changes to eliminate any dependencies on Py2neo.
Some of the code is not formatted according to the style guides implemented in issue-#47.
The goal is to beautify all the code currently in comptox_ai in order to
The formatting task should be a long-term, low-priority task done from time to time. It should be done by bundling files into relevant categories and fixing one bundle per commit (e.g. app components, python tests, python functions, react tests).
Communication between team members is essential in order to avoid merge conflicts, and the bundles chosen for beautification should contain code that is not currently being worked on.
Problem:
Currently, the 'Python package' badge and 'React build' badge on the GitHub README page are displaying a 'failed' status for their respective tests.
Proposed Solution:
To resolve this issue, we need to perform the following steps:
Test Verification: First, we should ensure that the tests are working correctly as intended. This involves running the tests individually to verify their functionality.
Identify Root Cause: We need to investigate whether the failures in the tests are causing the issue with the badges. If the tests are indeed failing, we should identify the root cause of these failures.
Badge Status Alignment: We should verify whether the test status is correctly reflected in the status badges on the README page. If there is a mismatch, we'll need to correct it.
By following these steps, we aim to ensure that the badges accurately reflect the status of our tests on the README page.
The API Docs page on the comptox.ai website cannot be loaded properly due to conflicts with Swagger occurring from path naming.
To resolve this, we need to do the following:
The Comptox AI website is currently structured so that the webpage is hosted on AWS EC2, while the data-related components are stored on the university server.
The conf file on EC2 must be modified so that requests are redirected to the correct server, reflecting this division, when tasks are called.
TODO:
The correct URL for the REST API documentation is https://comptox.ai/api/help/. Anything else (e.g., https://comptox.ai/api/index.html) will likely fail, and therefore all references to bad URLs should be identified and removed.
When improving our documentation setup for Comptox AI, we faced the decision of which Sphinx extension to use for math rendering. This issue provides a detailed comparison of sphinx.ext.imgmath and sphinx-mathjax-offline, which informed our decision.
sphinx.ext.imgmath:
- Requires the dvipng, dvisvgm, or convert (from ImageMagick) command to be on the system. This involves converting LaTeX to DVI and then to the desired image format.
Pros:
Cons:
sphinx-mathjax-offline:
Pros:
- ... sphinx.ext.mathjax, which retrieves MathJax from a CDN.
Cons:
Decision:
Given our specific requirements, especially concerning our EC2 setup and memory constraints, we opted to use sphinx-mathjax-offline. This decision prioritizes scalable rendering and offline capabilities, and avoids the need for a resource-heavy LaTeX installation on our EC2 instance.
UMLS CUIs are used for diseases, but similar clinical concepts can be annotated to many of the other node types. This will greatly enhance future work integrating ComptoxAI with observational and/or clinical data.
Some candidate concepts that could be added across node types:
It may be most effective to use more than one of the above, where applicable.
Currently, users can either query the Neo4j database directly at http://neo4j.comptox.ai:7474, or they can use pre-defined routes within the REST API at http://comptox.ai/api, but we need to add support for raw Cypher queries directly within the ComptoxAI REST API for improved usability.
Currently, all web services (static website, REST API, graph database, chemical structure database) are hosted on the same physical server. We should migrate the website itself, along with the REST API app, to an Amazon AWS EC2 instance, leaving only the databases on the current server.
The (work-in-progress) action to deploy the website on new releases should be written to automatically deploy to AWS once the instance is up and running. Following successful completion of this, we will migrate the www.comptox.ai domain to the new EC2 instance.
We've identified an issue with the PathSearch function in our React app: it's not delivering the expected results. You can observe the specific error on our page: https://comptox.ai/data.html.
Steps to Address the Issue:
Broader Context & Future Consideration:
The PathSearch function is intrinsically tied to the react-d3-graph package. Given that we're considering replacing react-d3-graph (primarily because it's the root of our package conflicts and it's no longer maintained), it's imperative we have a fully functional PathSearch.
Just a note on the package concerns: Using react-d3-graph currently forces us into using npm install --legacy-peer-deps with React 17. This is not a sustainable practice, as it might lead to problems when updating other packages.