theislab / bindome
Assembling of biomolecule binding data (TF/RNA) from genomics databases for ML-downstream.
License: MIT License
scanpy is missing in the dependencies (https://github.com/theislab/bindome/blob/main/environment.yml)
from os.path import exists, join is missing in https://github.com/theislab/bindome/blob/main/bindome/datasets/chipatlas.py
The branch chipatlas_seq_download now contains a more generic downloader to retrieve files through wget. I am using this file to sequentially download dozens of BigWig and bed.gz files into the webserver. If useful please take a look. @ivirshup @mumichae Chatting later about ways to refine this with code.
https://github.com/theislab/bindome/blob/chipatlas_seq_download/bindome/datasets/chipatlas.py#L151
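For discussion, a sequential downloader of this kind could look roughly like the sketch below. This is not the code in the chipatlas_seq_download branch: the names wget_fetch and download_sequentially are hypothetical, and the sketch assumes that the wget command-line tool is installed and that each URL ends in the file name to keep.

```python
import subprocess
from os import makedirs, replace
from os.path import basename, exists, join
from urllib.parse import urlparse


def wget_fetch(url, dest):
    """Download url to dest by calling the wget command-line tool (assumed installed)."""
    subprocess.run(["wget", "-q", "-O", dest, url], check=True)


def download_sequentially(urls, out_dir, fetch=wget_fetch):
    """Download each URL in order, skipping files that already exist locally.

    Hypothetical helper: returns the local paths in the same order as urls.
    """
    makedirs(out_dir, exist_ok=True)
    paths = []
    for url in urls:
        dest = join(out_dir, basename(urlparse(url).path))
        if not exists(dest):
            # Write to a temporary name first so that an interrupted
            # download is never mistaken for a complete file.
            tmp = dest + ".part"
            fetch(url, tmp)
            replace(tmp, dest)
        paths.append(dest)
    return paths
```

Passing the fetch step as an argument keeps the loop testable without network access, and the exists check means re-running the function only retrieves files that are still missing.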
Components of a data loader are most likely:
See also here https://sfaira.readthedocs.io/en/latest/adding_datasets.html
For modalities that you can and want to load into h5ad, I would even recommend adopting the collection classes of sfaira; you could just import them.
@mhorlacher Following up on the previous discussion: adding RNA datasets is expected to make this repository more usable, to support exploratory insights from downstream analyses, and to ease connecting it with other repositories for modeling.
Some examples of RNA datasets, by priority IMO, are:
This one is potentially great, but raw data does not seem to be available, so one would have to go study by study:
Please feel free to list additional ones. The idea is to get 3-6 into functions and h5ad files, following general conventions (sequence data + counts available). Examples here.
https://github.com/theislab/bindome/blob/main/bindome/datasets/selex.py#L142
https://github.com/theislab/bindome/blob/main/bindome/datasets/probound.py#L16
Looking forward to keeping the discussion on this. Thanks!
When loading bigWig file paths from ChIP-Atlas using bindome.datasets.chipatlas, a normalization constant has to be applied to the loaded bigWig files before generating the final object, based on the formula described here:
inutano/chip-atlas#84
def _convert_bw_to_raw_counts
get_bigWig, which, similar to bindome.datasets.chipatlas, downloads (if not local) and retrieves the requested bigWig file. A param adjust would convert the counts from floating point to integer, based on the formula indicated in the ChIP-Atlas issue: adjust_counts=True.
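To make the proposal concrete, here is a minimal sketch of what the conversion step might look like. The scaling used here is an assumption, not the formula from the linked issue: it supposes that the bigWig signal is expressed per million mapped reads, so that multiplying by the number of mapped reads divided by one million recovers approximate counts. The function name and the n_mapped_reads parameter are hypothetical; the authoritative formula is the one in inutano/chip-atlas#84.

```python
def convert_to_raw_counts(signal, n_mapped_reads, adjust_counts=True):
    """Rescale per-million signal values to approximate read counts.

    ASSUMPTION: signal values are scaled per million mapped reads. Check
    this against the formula in the ChIP-Atlas issue before relying on it.
    """
    if n_mapped_reads <= 0:
        raise ValueError("n_mapped_reads must be positive")
    scale = n_mapped_reads / 1_000_000
    scaled = [value * scale for value in signal]
    if adjust_counts:
        # Convert floating-point values to integer counts.
        return [int(round(value)) for value in scaled]
    return scaled
```

With adjust_counts=False the rescaled floating-point values are returned unchanged, which makes it easy to see how much rounding alters the signal.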