bindome's People

Contributors

ege-erdogan, ilibarra, ivirshup

Forkers

ivirshup

bindome's Issues

Generic downloader function for ChIP-atlas

Description

The branch chipatlas_seq_download now contains a more generic downloader that retrieves files through wget. I am using this file to sequentially download dozens of BigWig and bed.gz files onto the webserver. If useful, please take a look. @ivirshup @mumichae Let's chat later about ways to refine this in code.
https://github.com/theislab/bindome/blob/chipatlas_seq_download/bindome/datasets/chipatlas.py#L151

Tasks

  • Function to wget download bw/bed from ChIP-atlas, using ID/genome assembly/filetype as input.
  • Reuse this function along with a generic .cache directory strategy, to actively retrieve bw/bed from ChIP-atlas into a local directory.
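
The two tasks above could be sketched roughly as follows. This is a minimal illustration, not the code on the chipatlas_seq_download branch: the names build_url and fetch_cached, the .cache layout, and above all URL_TEMPLATE are hypothetical placeholders, and the standard-library urlretrieve is swapped in for wget so the sketch has no external dependency. The real ChIP-atlas URL layout has to be taken from its documentation.

```python
from pathlib import Path
from urllib.request import urlretrieve

# ASSUMPTION: placeholder URL layout, not the real ChIP-atlas path scheme.
URL_TEMPLATE = "https://example.org/{genome}/{filetype}/{experiment_id}.{filetype}"


def build_url(experiment_id, genome, filetype, template=URL_TEMPLATE):
    """Fill the URL template from ID / genome assembly / file type."""
    return template.format(
        experiment_id=experiment_id, genome=genome, filetype=filetype
    )


def fetch_cached(experiment_id, genome, filetype, cache_dir=".cache",
                 template=URL_TEMPLATE, downloader=urlretrieve):
    """Return the local path of the file, downloading it only if missing.

    `downloader` is any callable taking (url, destination); it defaults to
    urllib's urlretrieve but could wrap a wget subprocess call instead.
    """
    url = build_url(experiment_id, genome, filetype, template)
    target = Path(cache_dir) / genome / filetype / url.rsplit("/", 1)[-1]
    if target.exists():
        return target  # cache hit: nothing to download

    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # never leaves a truncated file under the final name.
    partial = target.with_name(target.name + ".part")
    downloader(url, str(partial))
    partial.replace(target)
    return target
```

Passing the downloader as an argument keeps the caching logic independent of the transfer tool, so the same function can be exercised offline with a stub.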

data loader structure

Components of a data loader are most likely:

  • A YAML file describing metadata. In single-cell resolved data you can distinguish cell-wise from dataset-wise metadata: we write sample-wise metadata directly into the YAML file and, for cell-wise metadata, add the key of the corresponding obs column in a matching YAML entry. You may be able to drop cell-wise metadata for some modalities; otherwise I would recommend adopting almost exactly the same YAML, which would also help interoperability with sfaira and cellxgene data.
  • A function that loads a defined object. For us this is adata; here you have to decide what that is.
  • A class that collects all written loaders. For us these are Dataset and DatasetGroupDirectoryOriented: https://github.com/theislab/sfaira/blob/release/sfaira/data/dataloaders/loaders/super_group.py

See also https://sfaira.readthedocs.io/en/latest/adding_datasets.html
For modalities that you can and want to load into h5ad, I would even recommend adopting the collection classes of sfaira; you could just import them.
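
As a rough illustration of how the three components fit together, here is a minimal sketch. It is not sfaira's API: Loader and LoaderRegistry are hypothetical names, a plain dict stands in for the parsed YAML file, and the load function returns a dict where a real loader would return an AnnData object.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Loader:
    """One data loader: metadata (stand-in for the YAML) plus a load function."""
    meta: Dict[str, Any]
    load: Callable[[], Any]


@dataclass
class LoaderRegistry:
    """Hypothetical collection class gathering all written loaders."""
    loaders: Dict[str, Loader] = field(default_factory=dict)

    def register(self, loader: Loader) -> None:
        key = loader.meta["id"]
        if key in self.loaders:
            raise ValueError(f"duplicate dataset id: {key}")
        self.loaders[key] = loader

    def subset(self, **filters: Any) -> List[str]:
        """Return ids of loaders whose metadata matches every filter."""
        return [
            key for key, loader in self.loaders.items()
            if all(loader.meta.get(k) == v for k, v in filters.items())
        ]

    def load(self, dataset_id: str) -> Any:
        return self.loaders[dataset_id].load()


# Usage with a made-up dataset. "obs_keys" maps a metadata field to the
# obs column that holds the cell-wise values, as described above.
registry = LoaderRegistry()
registry.register(Loader(
    meta={"id": "demo_selex", "assay": "SELEX", "organism": "human",
          "obs_keys": {"cell_type": "cell_type"}},
    load=lambda: {"X": [[1, 2], [3, 4]]},
))
```

Keeping dataset-wise values in the metadata and only the column key for cell-wise values means the registry can be filtered without loading any data.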

RNA datasets - list and priority.

@mhorlacher Following up on the previous discussion: adding RNA datasets is expected to increase the usability of this repository, the exploratory insights gained from downstream analyses, and its connection with other repositories for modeling.

Some examples of RNA datasets by priority IMO are.

This one is potentially great, but raw data does not seem to be available, and one should go per study:

Please feel free to list additional ones. The idea is to get 3-6 into functions and h5ad files, following general conventions (sequence data + counts available). Examples here.
https://github.com/theislab/bindome/blob/main/bindome/datasets/selex.py#L142
https://github.com/theislab/bindome/blob/main/bindome/datasets/probound.py#L16

Looking forward to continuing the discussion on this. Thanks!

Function to convert ChIP-atlas bigWig to raw counts

Description

When loading bigWig file paths from ChIP-atlas using bindome.datasets.chipatlas, a normalization constant has to be applied to the loaded bigWig files before generating the final object, based on the formula described here:
inutano/chip-atlas#84

Tasks

  • Using the requests / bs4 / soup snippets in starpy/notebooks/bam, add a function to bindome.datasets.chipatlas that converts the counts to integers using the factors indicated on the website, e.g. def _convert_bw_to_raw_counts.
  • Implement a method called get_bigWig that, similar to bindome.datasets.chipatlas, downloads (if not local) and retrieves the requested bigWig file. A parameter, e.g. adjust_counts=True, would convert the counts from floating point to integer, based on the formula indicated in the chipatlas issue.
