romanolab / comptox_ai

ComptoxAI - An artificial Intelligence toolkit for computational toxicology

Home Page: https://comptox.ai/

License: Other

Python 72.56% Shell 2.98% JavaScript 23.44% HTML 0.37% CSS 0.64%

ai data diseases graph-database graph-machine-learning neo4j ontology phenotypes

comptox_ai's People

pyz2020 van-truong th5 tosaddler travyse

comptox_ai's Issues

Change api docs website directory from `api/` to `docs/`

Currently there is no way to access the Python package documentation on the website due to a URL clash between the REST api docs and the Python api documentation. All URLs with comptox.ai/api are currently being directed to the REST API application.

Evaluate cost/benefit of staying with Neo4j vs. moving to Memgraph (or another graph db)

Memgraph has been growing and shows promise, potentially improving on Neo4j's shortcomings. It seems worthwhile to determine whether we should migrate at some point in the future. This can start out with something as simple as a list of pros and cons for each technology. Some aspects that make sense to take into account include:

Cost of enterprise license
Speed of queries and graph data science algorithms
Features in the REST API interface
Features in Python front-ends
Breadth of machine learning algorithms available
Computational footprint for graphs that are sized similarly to ComptoxAI

Create link prediction GNN model

Users should be able to run the following sample code to build a simple link prediction task on a subgraph containing chemicals, diseases, and genes:

from comptox_ai.ml.nn import NeuralNetwork

nn = NeuralNetwork(model='link-prediction')

# Load the data by calling the new routines you added to comptox_ai.db.graph_db
nn.load_data(node_types=['Chemical', 'Gene', 'Disease'])

# Train the model
nn.fit()

# Return predicted links
nn.predict()

Update conda and pip requirements

The conda environment.yml and pip setup.py files that specify prerequisites haven't been thoroughly tested or recently updated. We need to make it easy for any user to run the respective install commands and have it 'just work'.

A few contributing issues:

numpy and Pandas version mismatches
New syntax for creating and installing Conda environments
Reliance on having a working mysql installation in order to install the python mysqlclient

Not all users will need to build the full comptox_ai database from scratch. Therefore, we can also remove the dependency on ista. Users who want to build it from scratch can install ista separately and deal with the database dependencies, etc. at that point in time.

To semi-future-proof this, we should create 2 GitHub actions - one that builds/installs ComptoxAI via conda, and one that does so for pip.

Edge mismatch for assay node rt-viability-hepg2-p1

There may be a mismatch in the edges between chemicals and the assay node rt-viability-hepg2-p1.
There are more 'CHEMICALHASACTIVEASSAY' edges than 'CHEMICALHASINACTIVEASSAY' edges.
I think that the active and inactive chemicals for this assay are simply inverted.
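
The check described above can be sketched in Python. This is only an illustration: the relationship type names are taken from the report, but the `Assay` label, the `commonName` property, the query text, and the sample data are all assumptions rather than confirmed schema.

```python
from collections import Counter

# Relationship type names as written in the report above.
ACTIVE = "CHEMICALHASACTIVEASSAY"
INACTIVE = "CHEMICALHASINACTIVEASSAY"

# Hypothetical Cypher for pulling the edges; the `Assay` label and the
# `commonName` property are assumptions, not confirmed schema.
EDGE_QUERY = (
    "MATCH (c:Chemical)-[r]->(a:Assay {commonName: $assay}) "
    "RETURN type(r) AS rel_type"
)


def summarize_assay_edges(rel_types):
    """Count active/inactive edges and flag the pattern the report describes."""
    counts = Counter(rel_types)
    active, inactive = counts[ACTIVE], counts[INACTIVE]
    return {
        "active": active,
        "inactive": inactive,
        # The report treats "more actives than inactives" as a sign of
        # inversion, so it is flagged for manual review, not auto-corrected.
        "suspicious": active > inactive,
    }


# Made-up edge types, for illustration only.
sample = [ACTIVE] * 3 + [INACTIVE]
print(summarize_assay_edges(sample))
```

Running the same summary over every assay node would show whether rt-viability-hepg2-p1 is an isolated case.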

Implement basic code formatting style guide

Currently, the basic style guides for comptox_ai aren't up to date.

We plan to use the following style guides:
Python - PEP 8
Python Documentation - Numpydoc
Javascript/Typescript - Airbnb

We should update the dependencies and install the necessary packages needed to implement the style guides.

Improper LaTeX Rendering on Comptox AI and EC2 Pages

We're observing issues with LaTeX rendering on both the Comptox AI and EC2 pages. The content intended to be displayed in LaTeX format isn't being rendered as expected.

Comptox AI Page:
- Issue: LaTeX not rendering correctly.
- Reference: Comptox AI Page Link
EC2 Page:
- Issue: LaTeX doesn't appear to be installed or functioning properly.

We need to ensure that the required packages and libraries for LaTeX rendering are present and configured correctly, especially on the EC2 page.

Fix CSS styling in data browser app

Some elements on the data browser app (see web/packages/app/src/App.css and others) have CSS styling that conflicts with the overall website CSS. As a result, some of the app's styles are overridden by the overall website styles. See, for example, the space above section headings (right above "Relationships").

This can probably be fixed by adding CSS classes to elements in the app itself that override the general website CSS for that element type.

Resolve multiple matches for CIDs

There are sometimes multiple DSSTox IDs mapping to a single PubChem CID. There are multiple considerations with this:

Do we copy CID data into multiple chemical records when more than one DSSTox ID maps to a CID? This leads to a semantic conflict.
Alternatively, do we switch from Chemical nodes mapping to a single DSSTox ID to instead map to a single CID? This would require us to keep arrays of DSSTox-related fields.
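
To gauge how common the problem is before choosing between the two options, the mappings can be grouped by CID. The following is a minimal sketch; the function names and the placeholder identifiers are hypothetical and not part of comptox_ai.

```python
from collections import defaultdict


def group_dsstox_by_cid(pairs):
    """Map each PubChem CID to the sorted list of DSSTox IDs that point to it."""
    grouped = defaultdict(set)
    for dtxsid, cid in pairs:
        grouped[cid].add(dtxsid)
    return {cid: sorted(ids) for cid, ids in grouped.items()}


def ambiguous_cids(pairs):
    """Return only the CIDs that more than one DSSTox ID maps to."""
    grouped = group_dsstox_by_cid(pairs)
    return {cid: ids for cid, ids in grouped.items() if len(ids) > 1}


# Placeholder identifiers, not real records.
mappings = [("DTXSID_A", 101), ("DTXSID_B", 101), ("DTXSID_C", 202)]
print(ambiguous_cids(mappings))
```

The grouped form also matches the second option above, where a CID-keyed node would hold an array of DSSTox-related values.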

Can't use `neo4j.comptox.ai` as hostname

In b049114 we enabled support for remote databases, and added a number of sanity checks to make sure that (a.) the user passes good options either via a config file or via parameters, and (b.) a valid connection was indeed established after the Neo4j bolt driver is created.

However, we currently cannot use neo4j.comptox.ai as a hostname. Tests using either the neo4j:// or bolt:// protocols failed to provide a good result. It is currently unclear whether this is a local issue (e.g., it can be fixed by tweaking comptox_ai.db.GraphDB._connect()) or if it is something that will require tweaking on the side of the ComptoxAI server (e.g., a DNS issue, NGINX configuration, etc.).
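
One way to separate a client-side problem from a server-side one is to try each scheme in isolation, outside of `GraphDB._connect()`. The sketch below assumes the default Bolt port 7687 and uses the official `neo4j` Python driver; the helper names are hypothetical.

```python
def candidate_uris(host, port=7687):
    """Build one URI per scheme mentioned in the issue (neo4j:// and bolt://)."""
    return [f"{scheme}://{host}:{port}" for scheme in ("neo4j", "bolt")]


def first_reachable(uris, auth):
    """Return the first URI the driver can verify, or None if all fail."""
    # Imported lazily so the URI helper above works without the driver installed.
    from neo4j import GraphDatabase
    from neo4j.exceptions import ServiceUnavailable

    for uri in uris:
        driver = GraphDatabase.driver(uri, auth=auth)
        try:
            driver.verify_connectivity()
            return uri
        except ServiceUnavailable as err:
            # Report the failure and move on to the next scheme.
            print(f"{uri}: {err}")
        finally:
            driver.close()
    return None


print(candidate_uris("neo4j.comptox.ai"))
```

If both URIs fail here as well, the cause is more likely on the server side (DNS, NGINX, or firewall) than in the package.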

Fix React app build error

While working on [Issue-#63], it was discovered that the Data Portal on the AWS deployment was an old version. The issue was identified after the merge of [PR-#72], but it could have been caused by PRs before #72. While investigating the cause, we found that the React app was not being built properly in the web directory itself.

This issue covers resolving the React app build problem within the web/app directory. Once the React build in web/app is fixed, the issue will be closed. The AWS version problem will be handled either by creating a subsequent issue or in [Issue-#63].

Remove namespace prefixes from graph database

We use an OWL ontology to structure the database before importing it into Neo4j. However, the neosemantics (n10s) library by default sticks a namespace prefix (e.g., "owl__") before every entity. This should be fixed. Perhaps it just needs to be imported to neo4j using a different parameter in n10s (easy fix).
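
If changing the import settings turns out not to be enough, the prefixes could also be stripped on the Python side. This is a minimal sketch that assumes every prefix looks like a short identifier followed by a double underscore (as in "owl__" or "ns0__"); the function name is hypothetical.

```python
import re

# Matches a leading prefix such as "owl__" or "ns0__": a short identifier
# followed by a double underscore. The exact prefix set is an assumption.
_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9]*__")


def strip_namespace(name):
    """Remove one leading namespace prefix, leaving other names untouched."""
    return _PREFIX.sub("", name, count=1)


for label in ["owl__Class", "ns0__Gene", "Chemical"]:
    print(label, "->", strip_namespace(label))
```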

Package Management Tracking for Comptox AI

This GitHub issue is dedicated to tracking all activities related to managing Python and JavaScript packages within Comptox AI. It serves as a central location for documenting tasks such as installing new packages, updating existing ones, resolving dependency conflicts, and addressing package-related errors. The purpose of this tracking is to maintain a clear and organized record of our package management efforts for Comptox AI.

Implement Chemical Subclass Extending the Node Class

Building on the Node class (referenced in Issue-#102), we also need a Chemical subclass. This will help handle the specific data types we use in Comptox AI.

Objectives:

Specialization: While the Node class provides a generic interface, the Chemical subclass will be tailored to handle attributes and methods specific to chemical data.
Integration: Ensure the Chemical subclass can seamlessly interact with the Comptox AI Neo4j database and pull chemical-specific data.

Key Tasks:

Define attributes unique to Chemical nodes.
Integrate with current chemical handling methods, such as fetch_chemicals, to retrieve and process chemical data from the database.
Ensure the Chemical class is able to parse and present data in a Python-friendly format.

Create fetchChemicals method

Currently we don't have a method that fetches chemicals by a given type of ID (e.g., CasRN, DTXSID).
The goal is to create a method called fetchChemicals that takes an ID type (e.g., CasRN, DTXSID) and a list of IDs as input parameters and returns the list of chemicals matching those IDs.

TODO:

Create method fetchChemicals
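
A possible shape for the query-building part of this method is sketched below. It is not the project's implementation: the helper name is hypothetical, `xrefCasRN` is the property used in the REST API example elsewhere on this page, and `xrefDTXSID` is a guess at the corresponding DTXSID property.

```python
# Maps a user-facing identifier type to a node property. `xrefCasRN` appears
# in the REST API example elsewhere on this page; `xrefDTXSID` is a guess.
ID_PROPERTIES = {"CasRN": "xrefCasRN", "DTXSID": "xrefDTXSID"}


def build_fetch_chemicals_query(id_type, ids):
    """Return (cypher, parameters) selecting chemicals whose ID is in `ids`."""
    try:
        prop = ID_PROPERTIES[id_type]
    except KeyError:
        raise ValueError(f"Unsupported id type: {id_type!r}") from None
    # The property name comes from the fixed mapping above; the values are
    # passed as a query parameter rather than pasted into the query text.
    cypher = f"MATCH (c:Chemical) WHERE c.{prop} IN $ids RETURN c"
    return cypher, {"ids": list(ids)}


query, params = build_fetch_chemicals_query("CasRN", ["1071-83-6"])
print(query)
print(params)
```

The returned pair could then be handed to whatever routine the package uses to run Cypher against the database.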

Revise Node class for better integration with Comptox AI Neo4j data

The current implementation of the Node class is inadequate and lacks complete functionality. In order to seamlessly interact with the Comptox AI Neo4j database, it's essential that we redesign and implement a more robust Node class.

Objectives:

Modularity: Design the Node class to be modular. This will ensure it efficiently handles different types of graph data, including node labels, relationships, and properties.
Extensibility: The class should serve as a foundational layer that can be easily extended. This will pave the way for future subclasses like Chemical, facilitating the inclusion of more specific node formats as needed.

Key Features to Implement:

Robust methods to manage node properties.
Ability to handle diverse node labels and relationships.
Seamless integration capabilities with the Neo4j database.
A clear structure that allows for easy subclassing for specific node types, such as Chemical.
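
The structure described above could look roughly like the following. This is a sketch under stated assumptions, not the planned design: the attribute names (`node_id`, `labels`, `properties`) and the `cas_rn` accessor are hypothetical, and the database integration is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Generic graph node: an ID, a set of labels, and a property map."""
    node_id: int
    labels: frozenset = frozenset()
    properties: dict = field(default_factory=dict)

    def get(self, key, default=None):
        """Look up a node property, returning `default` if it is absent."""
        return self.properties.get(key, default)

    def has_label(self, label):
        return label in self.labels


@dataclass
class Chemical(Node):
    """Node subclass exposing chemical-specific properties."""

    def __post_init__(self):
        # Make sure every Chemical carries the matching label.
        self.labels = frozenset(self.labels) | {"Chemical"}

    @property
    def cas_rn(self):
        # `xrefCasRN` is the property used in the REST API example.
        return self.get("xrefCasRN")


chem = Chemical(node_id=1, properties={"xrefCasRN": "1071-83-6"})
print(chem.has_label("Chemical"), chem.cas_rn)
```

Keeping the generic accessors on `Node` means a subclass such as `Chemical` only adds named shortcuts for the properties it cares about.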

Add GitHub Action for building the website

We need to define a GitHub action that builds the ComptoxAI website, including using Sphinx to generate HTML for all of the documentation. In summary, the action should:

Install the python package along with any dependencies needed to build the documentation
Run make html in the docs/ directory
Build the dynamic data browsing React app (found in web/packages/app)
Copy the compiled version of the app into the correct location in the website's file structure

Add unit tests for python package

There are some very simple unit tests in the tests/ directory, but they are far from complete. We should continue to add unit tests for all major features, and include the tests in GitHub actions (see #42 for related discussion to this end).

Sanitize API inputs

The REST API does not sufficiently sanitize inputs. For example, when you perform a node search by CasRN, the following query should work:

https://comptox.ai/api/nodes/Chemical/search?field=xrefCasRN&value=1071-83-6

However, an error is received:

{
  "message": "No results found for user query",
  "query": "MATCH (n:Chemical) WHERE n.xrefCasRN = 1071-83-6 RETURN n, id(n);",
  "result": {
    "records": [],
    "summary": {
      "query": {
        "text": "MATCH (n:Chemical) WHERE n.xrefCasRN = 1071-83-6 RETURN n, id(n);",
        "parameters": {}
      },
      "queryType": "r",
      "counters": {
        "_stats": {
          "nodesCreated": 0,
          "nodesDeleted": 0,
          "relationshipsCreated": 0,
          "relationshipsDeleted": 0,
          "propertiesSet": 0,
          "labelsAdded": 0,
          "labelsRemoved": 0,
          "indexesAdded": 0,
          "indexesRemoved": 0,
          "constraintsAdded": 0,
          "constraintsRemoved": 0
        },
        "_systemUpdates": 0
      },
      "updateStatistics": {
        "_stats": {
          "nodesCreated": 0,
          "nodesDeleted": 0,
          "relationshipsCreated": 0,
          "relationshipsDeleted": 0,
          "propertiesSet": 0,
          "labelsAdded": 0,
          "labelsRemoved": 0,
          "indexesAdded": 0,
          "indexesRemoved": 0,
          "constraintsAdded": 0,
          "constraintsRemoved": 0
        },
        "_systemUpdates": 0
      },
      "plan": false,
      "profile": false,
      "notifications": [],
      "server": {
        "address": "165.123.13.192:7687",
        "version": "Neo4j/4.4.0",
        "protocolVersion": 4.2
      },
      "resultConsumedAfter": {
        "low": 397,
        "high": 0
      },
      "resultAvailableAfter": {
        "low": 1,
        "high": 0
      },
      "database": {
        "name": "neo4j"
      }
    }
  }
}

The solution is to appropriately wrap the CasRN in double quotes (e.g., n.xrefCasRN = "1071-83-6"), but the API does not do this.
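
An alternative to quoting values by hand is to pass them as query parameters, so the database never parses user input as Cypher. The sketch below illustrates the idea in Python; the REST API itself is an Express app, so this is not its actual code, and the function name is hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_search_query(label, field, value):
    """Return (cypher, parameters) for an exact-match node search."""
    # Labels and property names cannot be sent as parameters, so they are
    # restricted to plain identifiers before being placed in the query text.
    for name in (label, field):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Invalid identifier: {name!r}")
    cypher = f"MATCH (n:{label}) WHERE n.{field} = $value RETURN n, id(n)"
    # The value travels separately, so "1071-83-6" stays a string and is not
    # evaluated as the arithmetic expression 1071 - 83 - 6.
    return cypher, {"value": value}


query, params = build_search_query("Chemical", "xrefCasRN", "1071-83-6")
print(query)
print(params)
```

With the `neo4j` Python driver, the pair would be executed as `session.run(query, params)`.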

Other instances of inputs that fail due to lack of sanitization are likely, but may be challenging to find in the absence of more robust testing and/or user-submitted bug reports.

Add GitHub action for deploying the ComptoxAI website

We will need to create a GitHub action that deploys the website to comptox.ai. This action should run any time a new release of ComptoxAI is created, and should only trigger if the action for building the website is successful (see #51 for details).

Currently, the website is hosted on a physical server on the UPenn campus, but we will be migrating to AWS in the near future. This action should target AWS rather than the physical server, so we may need to delay implementation of this action until the migration to AWS has been completed.

Fix `make html` Build Error: `Graph` Class Not Defined

Reference: Issue-#93, PR-#95

PR-#95 removed references to the Py2neo package and, in doing so, commented out the Graph class in graph/graph.py. This has led to an error when building the Sphinx documentation using make html. The issue arises when other sections of the codebase, such as graph/io.py, try to reference the now-missing Graph class.

A solution to this would be to partially revert PR-#95, ensuring that we only retain the relevant changes. Instead of entirely commenting out methods or classes that interact with the package, we should just comment out the specific Py2neo-related sections. In their place, appropriate error-handling logic should be introduced.

It's imperative to run comprehensive tests after the revision, including both Sphinx documentation builds and Pytest checks, to validate the changes and confirm that no remnants of the package remain.

Comparison: `sphinx-mathjax-offline` vs Direct MathJax Integration

As we continue to refine our documentation for Comptox AI, we've reevaluated our approach to rendering mathematical equations. This issue provides a comparison between using sphinx-mathjax-offline and a direct integration of MathJax via npm, highlighting why the latter is advantageous for our project.

sphinx-mathjax-offline:

Method: Leverages the MathJax JavaScript library bundled within the extension to render LaTeX math equations in browsers.

Dependencies: Direct dependency on the extension itself, which includes a version of MathJax.

Pros:

Uses an offline version of MathJax, making it functional even without an internet connection.
No additional installation required beyond the Sphinx extension.
More accessible to screen readers compared to image-based methods.

Cons:

Version limitations: The version of MathJax used might not always be the latest, leading to potential compatibility or feature issues.
Versioning issues due to dependency on the Sphinx version, which can lead to compatibility conflicts or unexpected behaviors.
Might increase page load times if there's significant mathematical content.
Possible documentation bloat due to bundled MathJax scripts.

Direct MathJax Integration (via npm):

Method: Incorporates the MathJax library directly into the documentation's static assets. This method avoids relying on any particular Sphinx extension.

Dependencies: Requires an npm installation and a direct npm package dependency on MathJax.

Pros:

Flexibility: Allows direct control over the version of MathJax, ensuring that we can always utilize the latest features and improvements.
Stability: Reduces potential conflicts with Sphinx versions or other extensions.
Direct access to the entire MathJax library, enabling potential custom configurations or advanced features.
Reduced risk of unexpected behaviors, as there's no middle-man extension.

Cons:

Manual maintenance: Any updates or changes to MathJax require manual intervention rather than a simple extension update.
Requires a more hands-on integration process during the initial setup.

Decision:

Our primary motivations for transitioning to a direct MathJax integration are:

Version Control: We're not using the newest version of Sphinx due to dependency constraints. By directly integrating MathJax, we reduce potential conflicts that might arise from Sphinx extensions targeting specific Sphinx versions.
Reliability: We encountered issues with sphinx-mathjax-offline not loading as expected post-build. Using MathJax directly proved more reliable.
Flexibility and Stability: A direct integration provides more control over our documentation's behavior, reducing the layers of abstraction and potential points of failure.

Given these considerations, we've decided to implement MathJax directly via npm, ensuring more control, stability, and reliability for our mathematical renderings in the Comptox AI documentation.
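
For reference, a direct integration along these lines might be configured in Sphinx as sketched below. The file location and paths are assumptions: the sketch presumes that the `es5/` directory of the npm `mathjax` package has been copied to `_static/mathjax/`, which is not confirmed by this issue.

```python
# docs/source/conf.py (excerpt): a sketch, not the project's actual file.

# Sphinx's built-in MathJax extension writes the <script> tag for us.
extensions = ["sphinx.ext.mathjax"]

# Directory whose contents Sphinx copies into the built site's _static/.
html_static_path = ["_static"]

# Path to the MathJax entry point, relative to _static/. This assumes the
# contents of node_modules/mathjax/es5/ were copied to _static/mathjax/.
mathjax_path = "mathjax/tex-mml-chtml.js"
```

Because `mathjax_path` points at a local file, the MathJax version is whatever the npm dependency pins.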

Related issues: #87, #101, #105

Tweak `graph_db` to run example script

The following example script should be used as a template for the "neo4j -> pytorch" method:

>>> from comptox_ai.db.graph_db import GraphDB
>>> db = GraphDB()
>>> db.create_graph_native_projection(name='example_graph', node_types=['ns0__Gene', 'ns0__StructuralEntity', 'ns0__Disease'])
>>> db.to_pytorch(graph_name='example_graph')

The to_pytorch() method should return a torch_geometric.data.Dataset object, as described on this page: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html
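
PyTorch Geometric stores connectivity as an `edge_index` in COO format (one row of source positions, one row of target positions). The core re-indexing step that `to_pytorch()` would need can be sketched in plain Python; the function name and the sample IDs are hypothetical, and the conversion to tensors is left out.

```python
def to_edge_index(edges):
    """Convert (source, target) node-ID pairs to a COO edge index.

    `edges` is a list of pairs. Returns (node_ids, edge_index), where
    node_ids[i] is the original ID of the node at position i and
    edge_index is [sources, targets].
    """
    position = {}
    for src, dst in edges:
        for node in (src, dst):
            # Assign consecutive positions in order of first appearance.
            position.setdefault(node, len(position))
    sources = [position[src] for src, _ in edges]
    targets = [position[dst] for _, dst in edges]
    return list(position), [sources, targets]


# Made-up database IDs, for illustration only.
node_ids, edge_index = to_edge_index([(10, 42), (42, 7), (10, 7)])
print(node_ids)
print(edge_index)
```

The nested list can be turned into a tensor with `torch.tensor(edge_index, dtype=torch.long)` and passed to `torch_geometric.data.Data(edge_index=...)`.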

Remove all code references to Py2neo

As of PR-#92, the Py2neo package has been removed from the project's dependencies. To ensure the project's codebase is up to date, it's necessary to remove all references to the Py2neo package from the code.

This includes any imports, function calls, or code sections that use Py2neo-specific functionality. Review the entire codebase and make the necessary changes to eliminate any dependencies on Py2neo.

Beautify code to match standardized style guide

Some of the code is not formatted according to the style guides implemented in issue-#47.

The goal is to beautify all the code currently in comptox_ai in order to

apply the newly agreed style guide and have a standardized coding format for the team
minimize format changes in feature pull requests so that the diffs reflect changes to feature code, not formatting
clean up unnecessary code and allow efficient code review

Formatting should be a long-term, low-priority task done from time to time. It should be done by bundling files into relevant categories and fixing one bundle per commit (e.g., app components, python tests, python functions, react tests).

Communication between team members is essential to avoid merge conflicts, and the bundles chosen for beautifying should contain only code that is not currently being worked on.

Fix test badge displaying failure on GitHub README page

Problem:
Currently, the 'Python package' badge and 'React build' badge on the GitHub README page are displaying a 'failed' status for their respective tests.

Proposed Solution:
To resolve this issue, we need to perform the following steps:

Test Verification: First, we should ensure that the tests are working correctly as intended. This involves running the tests individually to verify their functionality.
Identify Root Cause: We need to investigate whether the failures in the tests are causing the issue with the badges. If the tests are indeed failing, we should identify the root cause of these failures.
Badge Status Alignment: We should verify whether the test status is correctly reflected in the status badges on the README page. If there is a mismatch, we'll need to correct it.

By following these steps, we aim to ensure that the badges accurately reflect the status of our tests on the README page.

Fix API documentation page load error on Comptox AI website

The API Docs page on the comptox.ai website cannot be loaded properly due to a path-naming conflict with Swagger.

To resolve this, we need to do the following:

Rename the docs/source/api folder to docs/source/docs
Rename the navbar option to match the change above
Change all the links to the docs/api to docs/docs

Modify nginx conf in AWS EC2

The Comptox AI website is currently structured so that the web pages are hosted on AWS EC2, while the data-related parts are stored on the university server.

The conf file on EC2 must be modified so that requests are redirected to the correct server, reflecting this division.

TODO:

Copy conf file from Comptox AI ssh
Modify redirection route of copied conf file
Store the modified conf file to EC2

Update all references to API docs to point to correct URL

The correct URL for the REST API documentation is https://comptox.ai/api/help/. Anything else (e.g., https://comptox.ai/api/index.html) will likely fail, and therefore all references to bad URLs should be identified and updated.

Comparison: `sphinx.ext.imgmath` vs `sphinx-mathjax-offline`

When improving our documentation setup for Comptox AI, we faced the decision of which Sphinx extension to use for math rendering. This issue provides a detailed comparison of sphinx.ext.imgmath and sphinx-mathjax-offline, which informed our decision.

`sphinx.ext.imgmath`:

Method: Converts LaTeX math equations into images.
Dependencies: Requires dvipng, dvisvgm, or convert (from ImageMagick) command to be on the system. This involves converting LaTeX to DVI and then to the desired image format.

Pros:

No dependence on JavaScript: Renders equations as images, so no need for users to have JavaScript enabled.
Provides a consistent appearance across all platforms and browsers.

Cons:

Image rendering might not be as sharp on all devices or resolutions, especially compared to font-based methods.
Results in larger documentation size due to the use of images.
Less accessible to screen readers.
Potential color discrepancies between the doc's background and the generated image.
Requires a LaTeX installation when building on systems like our EC2 instance. This can be problematic given the memory limitations of our free-tier EC2 setup.

`sphinx-mathjax-offline`:

Method: Utilizes the MathJax JavaScript library to render LaTeX math equations in browsers using an offline version of the library.
Dependencies: None outside of the extension, as it includes a version of MathJax with the documentation.

Pros:

Renders crisply: Uses web or local fonts for display, ensuring clear, scalable equations.
Fully functional offline, unlike sphinx.ext.mathjax which retrieves MathJax from a CDN.
More accessible to screen readers compared to image-based methods.

Cons:

Requires JavaScript: Users must have JavaScript enabled in their browsers for proper rendering.
Might increase page load times if there's a lot of mathematical content, due to browser processing.
Potentially larger documentation size owing to bundled MathJax scripts.

Decision:
Given our specific requirements, especially concerning our EC2 setup and memory constraints, we opted to use sphinx-mathjax-offline. This decision prioritizes scalable rendering, offline capabilities, and avoids the necessity of a resource-heavy LaTeX installation on our EC2 instance.

Related issues: #87, #101

Add standardized clinical concepts as node properties

UMLS CUIs are used for diseases, but similar clinical concepts can be annotated to many of the other node types. This will greatly enhance future work integrating ComptoxAI with observational and/or clinical data.

Some candidate concepts that could be added across node types:

UMLS CUI
SNOMED / SNOMED-CT codes
OMOP concepts

It may be most effective to use more than one of the above, where applicable.

Add route for direct Cypher query in REST API

Currently, users can either query the Neo4j database directly at http://neo4j.comptox.ai:7474, or they can use pre-defined routes within the REST API at http://comptox.ai/api, but we need to add support for raw Cypher queries directly within the ComptoxAI REST API for improved usability.

Migrate ComptoxAI website and REST API express app to AWS

Currently, all web services (static website, REST API, graph database, chemical structure database) are hosted on the same physical server. We should migrate the website itself, along with the REST API app, to an Amazon AWS EC2 instance, leaving only the databases on the current server.

The (work-in-progress) action to deploy the website on new releases should be written to automatically deploy to AWS once the instance is up and running. Following successful completion of this, we will migrate the www.comptox.ai domain to the new EC2 instance.

Fix Issues with PathSearch Function in React App

We've identified an issue with the PathSearch function in our React app—it's not delivering the expected results. You can observe the specific error on our page: https://comptox.ai/data.html.

Steps to Address the Issue:

Determine the Root Cause: We need to ascertain if the glitch is due to rendering issues or if there's a fundamental flaw within the PathSearch function itself.
Resolve & Test: Once we've pinpointed and rectified the problem, we should be able to utilize the PathSearch function effectively again.

Broader Context & Future Consideration:
The PathSearch function is intrinsically tied to the react-d3-graph package. Given that we're considering replacing react-d3-graph—primarily because it's the root of our package conflicts and it's no longer maintained—it's imperative we have a fully functional PathSearch.

Just a note on the package concerns: Using react-d3-graph currently forces us into using npm install --legacy-peer-deps with React 17. This is not a sustainable practice, as it might lead to problems when updating other packages.
