molssi / covid
MolSSI SARS-CoV-2 Biomolecular Simulation Data and Algorithm Store
Home Page: https://covid.molssi.org
This is a note to self (Levi) to add the images introduced by #29 to the pages
Some of the Folding@home simulation datasets end up under "no specified targets" when there are defined targets:
Any idea how we fix this?
A review process needs to be established for data.
One process, proposed at the onset of the project, was to merge all data as it came in and use a color-coded system to indicate review status. It has been pointed out that this would likely lead to chaos and be hard to maintain.
Another suggestion was to have data reviewed before it is ever merged, and I think this is what should be done. The pipeline would look like this:
cc @Andrew-AbiMansour @sjayellis @jchodera
It would be extremely useful to allow a view of all data deposited by a specific organization (e.g. Folding@home), so that we can present views of all data deposited from a single organization at a time.
@apayne97, @henriberger and I have been talking about solutions to incorporate information from the Thorne Lab in a more automated way. We have come up with this "ideal" pipeline:
Tier 1) Create a script that can diff their PDB IDs with our PDB IDs. Report the set difference for a human to review which new ones are worth adding.
Tier 2) Create a GitHub Actions pipeline that does this automatically either with an hourly cronjob or, if technically possible, after every push to the Thorne Lab repo
Tier 3) Add bot features to GHA to submit the PRs needed for each new candidate PDB ID. A human reviews each one, editing the information as needed, and merges or rejects it. The closed PRs serve as a history of what we have tried so we don't resubmit the same ID twice.
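The Tier 1 step is essentially a set difference; a minimal sketch (the ID lists here are hypothetical examples, and fetching them from the Thorne Lab repo and our data files is left out):

```python
def new_pdb_ids(thorne_ids, our_ids):
    """Return PDB IDs present upstream but not yet in our data store."""
    # Normalize to uppercase so '7bv2' and '7BV2' compare equal.
    ours = {i.upper() for i in our_ids}
    return sorted({i.upper() for i in thorne_ids} - ours)

# Hypothetical example lists:
candidates = new_pdb_ids(["6LU7", "7bv2", "6Y2E"], ["6LU7"])
print(candidates)  # ['6Y2E', '7BV2']
```

The same function would be the core of the Tier 2 GitHub Actions job; only the surrounding fetch/report plumbing changes.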
Let us know if you have feedback!
As it says on the label...
validation.py should populate classes such as "ValidProteins" dynamically from the data/proteins directory files, so that users can simply add the yml files instead of adding directly to the python script.
It would be useful to distinguish between solvated and unsolvated models, both in the "Models" section associated with each Structure and on the Models page. For example:
The explicitly solvated snapshot may be useful for MD simulations, but is more likely important for provenance tracking purposes with the associated simulation dataset that contains trajectories since very few modeling applications can use the solvated snapshot.
By contrast, the unsolvated protein models (into which missing loops have been built, structural modeling errors have been corrected, misperceived structural ions have been corrected, etc.) can be used in essentially all modeling workflows.
I would suggest we move the solvated snapshots into "Simulations", associated with their relevant simulation trajectories, and reserve "Models" for pre-solvated models that have corrected issues with the original structural data.
How will this hub take in potential data analysis contributions? We have taken the DESRES 3CLpro trajectories and extracted a small set (34, but adjustable) of diverse conformations of the catalytic domain from the 100,000 snapshots - something we think could be useful to those planning docking studies. "Analyses" are not a current data type, so I guess a new schema is needed. In our case the input data (in addition to the DESRES trajectories) is a Jupyter notebook and one PDB format file; the output is the 34 selected structures, again as PDB files. As a starter:
type: one of [Jupyter notebook, bash script, ...]
title: (required)
description: (required)
creator: (required)
organization: (optional)
lab: (optional)
institute: (optional)
models: (optional) must point to a model in the models dir
- modelname_1
- ...
proteins: (required) must be a valid protein (see the proteins dir)
- protein 1
- ...
structures: (optional) must point to a structure in the structures dir
- structure 1
- ...
simulations: (optional) must point to a simulation in the simulations dir
- simulation 1
- ...
rating: (optional) int on domain [1,5], 5 is better
files: (required) URLs to input and supporting files
- file 1
- ...
references: (optional) list of references associated with the programs and methods you want to mention; for publications tied to this exact analysis, use the publication and preprint categories
- ref1
- ref2
publication: (optional) URL of the publication which includes THIS analysis
preprint: (optional) URL of the preprint for the publication. Can also be used to note submission to a peer-reviewed journal with the exact word "Submitted"
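Whatever fields survive into the final schema, a small required-key check could gate new analysis entries at review time. A sketch, with key names mirroring the draft above (they are not final):

```python
# Required keys from the draft analysis schema above (subject to change).
REQUIRED = ("type", "title", "description", "creator", "proteins", "files")

def missing_required(entry):
    """Return the required analysis-schema keys that are absent or empty
    in a parsed entry (e.g. a dict loaded from a YAML file)."""
    return [k for k in REQUIRED if k not in entry or entry[k] in (None, "", [])]

draft = {"type": "Jupyter notebook",
         "title": "3CLpro diverse conformations",
         "description": "34 diverse catalytic-domain snapshots",
         "creator": "Example Contributor",
         "proteins": ["3CLpro"],
         "files": ["https://example.org/analysis.ipynb"]}
print(missing_required(draft))  # []
```

Hooking a check like this into validation.py would let CI reject incomplete analysis submissions automatically.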
With respect to #43, data need to be re-assessed regularly to see whether adjustments need to be made to their quality ratings. A process should be established for what that re-assessment looks like.
The current data, after review, can be given a star rating. This might not be the best system.
What we need to do: add a rating tag, and then any review system can be built around that.
cc @apayne97 @Binikarki @jchodera @sjayellis @Andrew-AbiMansour @egoldber
It seems like it might be helpful to change "About > Collaborating" to "About > Contributing" to lower the perceptual barrier for others to get involved:
Here, it looks like we have to go through a significant process to decide whether someone is allowed to be a "collaborator" for a monolithic site that is intended to be a community hub.
Instead, it may make sense to list contributors so far (which could be pulled automatically from the YAML files, to which we could add contributor: fields) and describe several ways in which folks can get involved.
Not a bug, I just didn't see a format that looked right
There are a WHOLE bunch of 3CLpro (Mpro, Main Protease, nsp5) structures. And potentially a WHOLE BUNCH of molecules that will target it. I think it's worth thinking about the best way to curate and share this data.
My current idea would be to just:
This could be expanded to PLpro (nsp3) and RdRP (nsp12) in a similar fashion.
Please briefly describe your suggestion.
It would be good to have a field about variants so they are easily searchable rather than just being plain text.
Please provide the schema for the new/refined data class of interest.
List below all the keywords/values you would like to modify or add.
20E (EU1; D614G+A222V)
Alpha (B.1.1.7)
Beta (B.1.351)
Delta (B.1.617.2)
Epsilon (B.1.427 and B.1.429)
Omicron (B.1.1.529)
Additional context
Modelling variants and their mutations is important to understand what they are doing and this area will likely continue to grow. I have recently worked on this in both my previous postdoc in the Bahar lab (https://dx.doi.org/10.2139/ssrn.3907841) and my current Marie Curie fellowship in the Carazo/Sorzano lab (https://www.biorxiv.org/content/10.1101/2021.12.05.471263v2).
The logic that determines whether a model is part of a target appears to be broken, and everything is categorized as "No target".
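Without seeing the site generator, one guess is that the grouping falls through to the default whenever the lookup key does not match exactly (case, whitespace). A defensive version of such a grouping might look like this sketch (all names here are hypothetical, not the actual site code):

```python
def group_by_target(models, valid_targets):
    """Bucket model entries by their 'target' field, matching target names
    case-insensitively and ignoring stray whitespace; fall back to
    'No target' only when the field is truly absent or unrecognized."""
    lookup = {t.lower(): t for t in valid_targets}
    groups = {}
    for m in models:
        key = lookup.get(str(m.get("target", "")).strip().lower(), "No target")
        groups.setdefault(key, []).append(m)
    return groups
```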
The Swissmodel SARS-CoV-2 models may be useful to link:
https://swissmodel.expasy.org/repository/species/2697049
We could link both "all" models (which are useful to browse) as well as specific high-value models for specific targets.
We should be sure to update the Feig lab models:
https://twitter.com/MeikelFeig/status/1254876370896855041
We need to remove the stoplights from structural data. They are entirely misleading as to the quality of the structures and their utility for different purposes.
Publication status has no impact on structure quality. If we want to communicate publication status as presence of preprint or published version, we should simply come up with an icon that is displayed for preprint and published that shows up if these are available and absent if they are not.
This is going to cause active harm to the community if we keep these.
The appropriate annotation data should instead be pulled from the Coronavirus Structural Task Force, but we shouldn't wait for the implementation of that to strip out the stoplight nonsense.
"SCoV2-MD (www.scov2-md.org) is a new online resource that systematically organizes atomistic simulations of the SARS-CoV-2 proteome." https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkab977/6425545
This came up about a month ago but no one has made an issue yet. Alternative database with a similar goal to this one that doesn't seem to know we exist. It's also missing a good number of the useful datasets.
Describe the bug
The "New data entries" and "Tracking issue" issue tracker buttons take you nowhere useful.
To Reproduce
Click on the "New data entries" and "Tracking issue" issue tracker buttons.
Expected behavior
These buttons should take you to where they claim to.
Describe the bug
As it says on the tin, that file has two top-level keys for "protein". I don't know which set is correct.
Several times now I have been looking for something I know is on the website but just can't seem to find it. For instance, I just wanted to download a video showing a DESRES trajectory, but kept going in circles clicking on links that led nowhere.
One concrete suggestion is to have a constantly visible legend for the different labels, telling me where I will go when I click on one (e.g. a "proteins" page, an external link, etc.).
Additionally, it would help to have some way to orient the user within the website, e.g. by highlighting the panel you're in and showing how far down the page you are (maybe not possible with a static site?).
I believe the citation for RIKEN trajectories like https://covid.molssi.org/simulations/#riken-cpr-tms-tmd1_toup-trajectory should be:
Takaharu Mori, Jaewoon Jung, Chigusa Kobayashi, Hisham M Dokainish, Suyong Re, Yuji Sugita (2021):
Elucidation of interactions regulating conformational stability and dynamics of SARS-CoV-2 S-protein.
Biophysical Journal 120(6)
https://doi.org/10.1016/j.bpj.2021.01.012
Note that there are several deposits but I have not checked all of them.
It would be great to add the I-TASSER models for various viral targets:
https://zhanglab.ccmb.med.umich.edu/COVID-19/
The beautiful image on the top of the page might be useful as well---perhaps we could get permission to use it and have it link to the appropriate proteins?
This website appears to index some other investigational compounds of interest: https://covdb.stanford.edu/
Might be worth a link!
Describe the bug
The "description" field in "simulations" entries does not render Markdown correctly.
We'll need this for our incoming Folding@home data sharing PRs.
Fortunately, this seems easy to fix---will create a PR momentarily.
To Reproduce
Example: https://covid.molssi.org//simulations/#sars-cov-2-spike-s-glycoprotein
Expected behavior
Markdown should render correctly to allow inclusion of links to simulation data sources and inline shell examples of how to download the data