cancervariants / gene-normalization
Services and guidelines for normalizing genes
Home Page: https://gene-normalizer.readthedocs.io/latest/
License: MIT License
Creating concept groups is slow, and creating them in the production environment is even slower. We should look into speeding this up.
The normalize endpoint should generate a single, merged concept for search terms.
For Elastic Beanstalk
We currently use only the non-alternative loci set. We should also include the alternative loci set from the download page.
Not just the strongest match per source
NCBI uses three different files (history, info, and GFF). History and info data are updated daily, but GFF data is versioned by assembly. We currently use the timestamp at which we retrieve the data; we should also fix this so that it uses the timestamp from the FTP site. I think we should consider storing metadata for each file. Also, the current source meta does not indicate the files used and instead points to the FTP site.
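One possible shape for per-file metadata is sketched below. The class names, field names, and the placeholder URL are all hypothetical and are not taken from the existing source meta; the only details drawn from the issue are the three file kinds and the idea of recording both a published version and a retrieval time.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass
class SourceFileMeta:
    """Metadata for one downloaded file (hypothetical structure)."""

    file_url: str    # the exact file used, rather than the FTP site root
    version: str     # timestamp or assembly name as published by the source
    retrieved: date  # the day we downloaded the file


@dataclass
class NCBISourceMeta:
    """Groups per-file metadata for the three NCBI files (hypothetical)."""

    files: Dict[str, SourceFileMeta] = field(default_factory=dict)

    def add_file(self, kind: str, meta: SourceFileMeta) -> None:
        # Only the three file kinds named in the issue are accepted.
        if kind not in {"history", "info", "gff"}:
            raise ValueError(f"unknown NCBI file kind: {kind}")
        self.files[kind] = meta


ncbi_meta = NCBISourceMeta()
ncbi_meta.add_file(
    "gff",
    SourceFileMeta(
        file_url="ftp://example.org/placeholder.gff",  # placeholder, not a real path
        version="placeholder-assembly",
        retrieved=date(2021, 1, 1),
    ),
)
```

Keeping the published version separate from the retrieval date would let the daily files and the assembly-versioned file each report what they actually are.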
The merged concept for hgnc:37133 has alternate_labels: "A1BGAS", "FLJ23569", "NCRNA00181", "A1BG-AS". Querying these alternate_labels returns different match_type scores, when they theoretically should return the same score.
gene.vrs_locations
EBSampleApp-Python.iml?
Some models/fields have been renamed or deprecated
Our Elastic Beanstalk (EB) environment currently uses Python 3.7. We should upgrade to 3.8.
NCBI has retired gene identifiers in the past, e.g.:
warnings attribute for each such entry, akin to: ncbigene:103344718 is a discontinued gene concept.
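A minimal sketch of how such a warnings attribute might be populated follows. The function name, the `DISCONTINUED` set, and the response shape are assumptions for illustration; only the identifier and the message wording come from the issue text.

```python
from typing import Dict, List, Set

# Hypothetical lookup of retired identifiers. In practice this would
# presumably be built from the source's own record of retired genes.
DISCONTINUED: Set[str] = {"ncbigene:103344718"}


def build_warnings(concept_id: str, discontinued: Set[str] = DISCONTINUED) -> List[str]:
    """Return warning messages for a concept ID; empty if none apply."""
    if concept_id.lower() in discontinued:
        return [f"{concept_id} is a discontinued gene concept."]
    return []


def make_response(concept_id: str) -> Dict[str, object]:
    """Attach warnings to an (illustrative) response dictionary."""
    return {"query": concept_id, "warnings": build_warnings(concept_id)}
```

Returning an empty list rather than omitting the attribute keeps the response shape the same for current and retired identifiers.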
When specifying locations, we should use VRS Location objects.
ChromosomeLocation for the ISCN-style entries in the HGNC "location" field
SequenceLocation for the Chr/Start/Stop entries from Ensembl.
This should reduce the following attributes:
seqid
start
stop
strand
location
down to:
location: (VRS Location)
strand: enum('+', '-', Null)
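The reduced attribute set could be modeled roughly as follows. This is a sketch only: the two location classes are simplified stand-ins with assumed field names, not the actual VRS schema or the vrs-python models, and the example values are illustrative placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class Strand(str, Enum):
    """The two strand values; a missing strand is represented by None."""

    FORWARD = "+"
    REVERSE = "-"


@dataclass
class ChromosomeLocation:
    """Simplified stand-in for an ISCN-style location (assumed fields)."""

    chr: str
    start: str  # cytoband, e.g. "q13.43"
    end: str


@dataclass
class SequenceLocation:
    """Simplified stand-in for a Chr/Start/Stop location (assumed fields)."""

    sequence_id: str
    start: int
    end: int


@dataclass
class GeneLocation:
    """Replaces seqid/start/stop/strand/location with two attributes."""

    location: Union[ChromosomeLocation, SequenceLocation]
    strand: Optional[Strand] = None


band = GeneLocation(ChromosomeLocation(chr="19", start="q13.43", end="q13.43"))
interval = GeneLocation(SequenceLocation("placeholder:seq", 100, 200), Strand("-"))
```

Because seqid, start, and stop all live inside the location object, the record itself only has to carry `location` and `strand`.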
@jarbesfeld's GitHub Actions in py-gene-fusions are failing due to our schema classes
Forgot to update schema examples to reflect VRS/VRSATILE updates
@jarbesfeld will be using these models in py-gene-fusions
Add an option to the CLI to use local files rather than downloading from the source's FTP site
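A sketch of what such an option might look like is given below, using the standard library's argparse purely for illustration. The flag names `--use-local-files` and `--data-dir` and the helper functions are hypothetical and do not reflect the project's actual CLI.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser with a local-files switch."""
    parser = argparse.ArgumentParser(description="Load gene source data (illustrative).")
    parser.add_argument(
        "--use-local-files",
        action="store_true",
        help="read source files from --data-dir instead of downloading them",
    )
    parser.add_argument(
        "--data-dir",
        default="data",
        help="directory holding previously downloaded source files",
    )
    return parser


def resolve_mode(argv: Optional[List[str]] = None) -> str:
    """Return 'local' or 'download' depending on the parsed flags."""
    args = build_parser().parse_args(argv)
    return "local" if args.use_local_files else "download"
```

Calling `resolve_mode(["--use-local-files"])` selects the local path, while an empty argument list keeps the existing download behavior as the default.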
A Docker container would be useful
Switch to downloading files from FTP sites
This will help with going serverless
Consider creating sample data to test ETL methods. If we don't go this route, we should clean up the current test data.
Separate those representing gene concepts from those representing associated concepts.
We currently have the production database as the default, but we should switch to using the local database.
Apr 7 00:37:15 ip-10-130-14-142 web: File "/var/app/current/gene/main.py", line 114, in normalize
Apr 7 00:37:15 ip-10-130-14-142 web: resp = query_handler.normalize(html.unescape(q))
Apr 7 00:37:15 ip-10-130-14-142 web: File "/var/app/current/gene/query.py", line 483, in normalize
Apr 7 00:37:15 ip-10-130-14-142 web: matching_records.sort(key=self._record_order)
Apr 7 00:37:15 ip-10-130-14-142 web: File "/var/app/current/gene/query.py", line 412, in _record_order
Apr 7 00:37:15 ip-10-130-14-142 web: src = record['src_name'].upper()
Apr 7 00:37:15 ip-10-130-14-142 web: TypeError: 'NoneType' object is not subscriptable
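The traceback above shows the sort key being called on a record that is `None`. One defensive option, sketched here under assumptions, is to drop missing records before sorting. The function names and the `SOURCE_PRIORITY` ordering are hypothetical and are not the contents of `gene/query.py`; only the `src_name` key and the `.upper()` call mirror the log.

```python
from typing import Dict, List, Optional

# Hypothetical source ordering, used only to make the sketch runnable.
SOURCE_PRIORITY: Dict[str, int] = {"HGNC": 1, "ENSEMBL": 2, "NCBI": 3}


def record_order(record: Dict[str, str]) -> int:
    """Sort key: rank a record by its source name."""
    src = record["src_name"].upper()
    # Unknown sources sort after all known ones.
    return SOURCE_PRIORITY.get(src, len(SOURCE_PRIORITY) + 1)


def sort_records(records: List[Optional[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Drop missing (None) records, then sort the rest by source rank."""
    present = [record for record in records if record is not None]
    return sorted(present, key=record_order)
```

Filtering only hides the symptom; whether a `None` record should ever reach the sort is a separate question for the lookup that produced it.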
Some helpful posts:
If not all sources are loaded in the database, the queries on the search endpoint will fail.
We had been using vrs-python models for validation. The validators that were added to the schemas are now causing pydantic validation errors when loading sources.