
hubmapconsortium / asctb-ct-label-mapper


asctb-ct-label-mapper: A package that recommends controlled-vocabulary terms for annotations of scRNA-seq datasets, thereby enabling cross-dataset and cross-experiment comparison of annotations.

License: MIT License

Python 99.81% Shell 0.19%
bert-model cosine-similarity data-engineering embeddings-similarity natural-language-processing python single-cell-rna-seq web-scraping human-reference-atlas

asctb-ct-label-mapper's Issues

Update documentation and set up docs-pages

Add general documentation for package.

Update the docstring for execute_nlp_pipeline() and the other functions in the NLP script.

"""Returns the cleaned version of the annotation label after performing the following steps:

```python
remove_whitespaces()
expand_word_contractions()
replace_special_chars()
convert_number_to_word()
make_lowercase()
get_root_word()
```

Args:
    input_label (str): Input annotation label text.

Returns:
    str: Cleaned version of the annotation label text.
"""

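A minimal sketch of how the six steps named in the docstring could be chained. The function names come from the docstring; every function body below is an assumption written with the standard library only, and clean_label() is a hypothetical wrapper name. In particular, get_root_word() here is a naive plural stripper standing in for whatever stemming or lemmatization the package actually uses.

```python
import re

# Illustrative lookup tables (assumptions, not the package's own data).
_CONTRACTIONS = {"can't": "cannot", "don't": "do not", "it's": "it is"}
_DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def remove_whitespaces(text: str) -> str:
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def expand_word_contractions(text: str) -> str:
    # Replace whole words found in the small lookup table above.
    return " ".join(_CONTRACTIONS.get(w.lower(), w) for w in text.split())


def replace_special_chars(text: str) -> str:
    # Turn every run of non-alphanumeric characters into one space.
    return re.sub(r"[^A-Za-z0-9]+", " ", text).strip()


def convert_number_to_word(text: str) -> str:
    # Spell out each digit separately, e.g. "8" -> "eight".
    return re.sub(r"\d", lambda m: f" {_DIGITS[m.group()]} ", text)


def make_lowercase(text: str) -> str:
    return text.lower()


def get_root_word(text: str) -> str:
    # Naive placeholder: drop a trailing "s" from longer words.
    def strip_plural(word: str) -> str:
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            return word[:-1]
        return word

    return " ".join(strip_plural(w) for w in text.split())


def clean_label(input_label: str) -> str:
    # Apply the steps in the order listed in the docstring.
    steps = (
        remove_whitespaces,
        expand_word_contractions,
        replace_special_chars,
        convert_number_to_word,
        make_lowercase,
        get_root_word,
    )
    for step in steps:
        input_label = step(input_label)
    return input_label
```

With these stand-ins, clean_label("  CD8+ T-cells ") returns "cd eight t cell". Contractions are expanded before special characters are replaced because the apostrophe would otherwise be lost.
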
Include Google-sheet "gid" for ASCT+B API call

Improve get_asctb_data_url() to also extract the gid from the Sheet-Config data (line 59), which makes the code more modular.
Update fetch_ct_info_from_asctb_google_sheet() (line 88) to also include '&gid=0129321849sdkj00329'.
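A rough sketch of the idea, assuming the Sheet-Config data is a list of entries that each carry a sheet ID and a gid. The config layout, the key names ("name", "version", "sheetId", "gid"), the query-parameter names, the placeholder values, and both helper names are hypothetical; the real structure in the package and in the ASCT+B API may differ.

```python
from urllib.parse import urlencode

# Hypothetical Sheet-Config data; the values are placeholders.
SHEET_CONFIG = [
    {"name": "kidney", "version": "v1.2", "sheetId": "SHEET_ID", "gid": "GID"},
]


def get_sheet_id_and_gid(config, organ, version):
    # Look up both identifiers together so callers never hard-code a gid.
    for entry in config:
        if entry["name"] == organ and entry["version"] == version:
            return entry["sheetId"], entry["gid"]
    raise KeyError(f"No sheet configured for {organ} {version}")


def build_request_url(base_url, sheet_id, gid):
    # urlencode escapes the values and joins the pairs with "&".
    query = urlencode({"sheetId": sheet_id, "gid": gid})
    return f"{base_url}?{query}"
```

Reading the gid from the same config entry as the sheet ID keeps the fetch function free of literal identifiers, so a new organ or version only needs a config change.
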

Enhancing and operationalizing crosswalks for multiple reference datasets

Work completed so far:

  1. Azimuth Kidney --> ASCTB Kidney v1.2

     Translations verified by Sanjay Jain and Ellen Q.

  2. Azimuth Lung HLCAv2 --> ASCTB Lung v1.2

     Translations verified by Gloria Pryhuber

  3. Azimuth Heart --> ASCTB Heart v1.2

     Translations verified by Marc Halushka

Next steps:

a. Confirm with Katy and Ellen which crosswalks to focus on. The brief discussion so far covered Azimuth's other reference organs, CellTypist organs, and PopV/Tabula Sapiens organs.
b. Confirm whether all organ datasets from CellTypist and PopV/Tabula Sapiens need to be mapped to ASCTB using this package.
c. Souradeep to operationalize this package into a data pipeline with potential for CI/CD.
d. Future feature request: add logic to also consider gene-expression profiles (biomarkers from the query dataset) mapped to ASCTB canonical markers, to make the cross-dataset translation mapping more reliable.
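One way the feature request in item d might be approached is to blend the existing label-similarity score with a marker-overlap score. The sketch below is only an illustration under assumptions: Jaccard overlap is a swapped-in choice of metric, and the function names, the weight parameter, and the example gene symbols are hypothetical rather than part of the package.

```python
def marker_overlap(query_markers, canonical_markers):
    # Jaccard index on case-normalized gene symbols.
    query = {m.upper() for m in query_markers}
    canonical = {m.upper() for m in canonical_markers}
    if not query and not canonical:
        return 0.0
    return len(query & canonical) / len(query | canonical)


def combined_score(label_similarity, query_markers, canonical_markers, weight=0.5):
    # Weighted blend: weight applies to the label score, the rest to markers.
    overlap = marker_overlap(query_markers, canonical_markers)
    return weight * label_similarity + (1.0 - weight) * overlap


# Illustrative call with made-up inputs.
score = combined_score(0.8, ["NPHS1", "NPHS2"], ["nphs2", "PODXL"])
```

A candidate mapping with a high text similarity but no shared markers would then rank below one that agrees on both, which is the kind of reliability gain the request describes.
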

