Comments (17)
In GitLab by @szabogtamas on Jan 30, 2020, 15:27
changed the description
from scirpy.
In GitLab by @grst on Jan 30, 2020, 15:52
> st.tl.create_group(group_membership={"Group1": ["barcode1", "barcode2"]})
A common pattern for handling this in scanpy/anndata is to add another column to obs, like this:
adata.obs["cell_type"] = "na"  # initialize default value
adata.obs.loc[["barcode1", "barcode2"], "cell_type"] = "CD8+ T cells"
So I don't think we need a tool for that.
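The column-based grouping described above can be tried without any single-cell data. This is a minimal sketch that assumes a plain pandas DataFrame as a stand-in for adata.obs; the barcodes and the "cell_type" column name are illustrative only.

```python
import pandas as pd

# Stand-in for adata.obs: one row per cell, indexed by barcode (hypothetical barcodes).
obs = pd.DataFrame(index=["barcode1", "barcode2", "barcode3"])

# Initialize the grouping column with a default value for every cell.
obs["cell_type"] = "na"

# Assign the selected barcodes to a group by writing into the same column.
obs.loc[["barcode1", "barcode2"], "cell_type"] = "CD8+ T cells"

# Group sizes can then be read directly from the column.
group_sizes = obs["cell_type"].value_counts().to_dict()
```

Any function that takes a groupby argument can then be pointed at this column by name.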
In GitLab by @szabogtamas on Jan 31, 2020, 10:03
The reason I was thinking about a separate tool was that diversity (and also convergence) could be calculated at the same time a group is created.
Knowing which columns in obs are groups would be useful when we want to see the same information for all possible groupings (e.g. CDR3 length by sample, V genes, receptor pairing status, cell types, etc.): it would be convenient to just loop through the grouping columns and produce the same plot for each.
In GitLab by @grst on Jan 31, 2020, 10:11
changed the description
In GitLab by @grst on Jan 31, 2020, 10:18
I'm not a big fan of implicitly computing stuff that might not even be needed by the user.
Also, in such a case you can simply do
for group in ["TRA_1_cdr3_len", "TRB_1_cdr3_len", "TRA_1_v_gene", ...]:
    st.pl.cdr3_length(adata, group=group)
Or we might want an API that accepts multiple groups, as scanpy does for the color attribute in umap:
st.pl.cdr3_length(adata, group=["TRA_1_cdr3_len", "TRB_1_cdr3_len", "TRA_1_v_gene", ...])
But we can discuss that in more detail the next time we meet.
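One way to support both call styles is to normalize the group argument before plotting. The sketch below is an assumption about how that could look, not the project's API: normalize_groups and plot_per_group are hypothetical helpers, and the plotting function is replaced by a callable that just returns a title.

```python
from typing import Callable, List, Sequence, Union


def normalize_groups(group: Union[str, Sequence[str]]) -> List[str]:
    """Accept a single column name or several and always return a list."""
    if isinstance(group, str):
        return [group]
    return list(group)


def plot_per_group(group: Union[str, Sequence[str]], plot_one: Callable[[str], str]) -> List[str]:
    """Call the (hypothetical) single-group plotting function once per group."""
    return [plot_one(g) for g in normalize_groups(group)]


# Placeholder for a real plotting call: it only builds a title string.
titles = plot_per_group(
    ["TRA_1_cdr3_len", "TRB_1_cdr3_len"],
    lambda g: f"CDR3 length by {g}",
)
```

With this kind of wrapper, the explicit for loop and the list-valued argument end up doing the same work.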
In GitLab by @szabogtamas on Jan 31, 2020, 10:34
It definitely makes sense to have a tool function that precomputes data and a plotting function to visualize it. My only concern was that, in our case, much of that information is never reused by anything other than the actual plot. What I would see as a great saving here, however, is to write the computed values into a table (or JSON) so that the plot can be generated (and, more importantly, modified to match a given visual style, remove groups, add p-values, etc.) by just parsing a small file instead of loading the whole big obs table... Just wondering.
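The idea of saving group-level values to a small file can be sketched with the standard json module. The dictionary layout and the key names here are assumptions chosen for illustration; the example CDR3 length values are the ones used later in this comment.

```python
import json

# Group-level values as a tool function might produce them (layout is hypothetical).
summary = {
    "groupby": "sample",
    "values": {"g1": [6, 9, 11], "g2": [6, 9, 6]},
}

# Serialize to a small text document that can be written to disk.
text = json.dumps(summary, indent=2, sort_keys=True)

# Later, a plotting script can parse the small file and adjust it,
# for example by dropping a group, without touching the full obs table.
restored = json.loads(text)
restored["values"].pop("g2")
```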
So let us stick to the canonical way and make tool functions, even if it seems like duplication. Since they will mostly compute group-level statistics, I would suggest adding the result to uns.
Should this be a nested dictionary in uns, like
{'sample_grouping': {
    'name': 'Samples',
    'groups': {'g1': 'Sample 1', 'g2': 'Sample 2'},
    'diversities': {'div_by_clonotypes': {'name': 'Repertoire diversity', 'values': {'g1': 2.3, 'g2': 2.7}}},
    'convergences': {'conv_by_clonotypes': {'name': 'Repertoire convergence'}},
    'cdrlengths': {'cdrlength': {'name': 'Length distribution of CDR regions', 'values': {'g1': [6, 9, 11], 'g2': [6, 9, 6]}}},
    'pairingratios': {'cpr_by_clonotypes': {'name': 'Ratio of unconventional number of chains', 'labels': {'orphan_alpha': 'Alpha chain only', 'orphan_beta': 'Beta chain only'}, 'values': {'orphan_alpha': 0.3, 'orphan_beta': 0.7}}},
}}
Or should we go for objects stored in uns?
In GitLab by @grst on Jan 31, 2020, 10:40
> Or should we go for objects stored in the uns?
This is not a good idea, since such objects cannot be stored by AnnData (yet). See the discussion at scverse/anndata#115.
We should stick to standard Python dictionaries here.
> Should this be a nested dictionary in uns
That's probably the way to go.
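If objects are convenient while computing, one option is to convert them to plain dictionaries just before storing them. The following is only a sketch under that assumption: GroupStatistic is a hypothetical class, and an ordinary dict stands in for adata.uns.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class GroupStatistic:
    """Hypothetical container for one group-level statistic."""
    name: str
    values: Dict[str, float] = field(default_factory=dict)


diversity = GroupStatistic(name="Repertoire diversity", values={"g1": 2.3, "g2": 2.7})

uns = {}  # stand-in for adata.uns

# asdict() turns the dataclass (recursively) into standard Python dictionaries.
uns["div_by_clonotypes"] = asdict(diversity)
```

What ends up in the stand-in uns is a nested dictionary with string keys, which matches the structure proposed in the thread.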
In GitLab by @szabogtamas on Jan 31, 2020, 10:46
Well, I am not against dictionaries, and it usually comes down to this for me in Python: objects are nice, but let's just stay with a dictionary...
We only have to agree on a structure for the dictionary then. But I guess this will be flexible in the beginning and evolve as more tool functions are implemented.
In GitLab by @grst on Jan 31, 2020, 10:50
I would go for one entry for each tool. What's done within this entry is flexible and can be decided on a tool-by-tool basis.
Also, because scanpy also uses uns, I would prefix every entry with tcr_ to avoid name conflicts.
Example:
adata.uns["tcr_alpha_diversity"] = { ... }
adata.uns["tcr_sequence_logos"] = { ... }
Alternatively, we could go for another sub-dictionary:
adata.uns["sctcrpy"]["alpha_diversity"] = { ... }
adata.uns["sctcrpy"]["sequence_logos"] = { ... }
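The two layouts can be compared side by side with a plain dict standing in for adata.uns. This is a sketch only; the stored values are placeholders. One practical difference it shows: with the sub-dictionary layout the outer "sctcrpy" entry has to exist before a tool can write into it, which setdefault takes care of.

```python
uns = {}  # stand-in for adata.uns

# Option 1: flat keys with a common prefix.
uns["tcr_alpha_diversity"] = {"g1": 2.3}
uns["tcr_sequence_logos"] = {}

# Option 2: one sub-dictionary per package; create it on first use.
uns.setdefault("sctcrpy", {})["alpha_diversity"] = {"g1": 2.3}
uns.setdefault("sctcrpy", {})["sequence_logos"] = {}

# Listing all entries of the package under each layout.
prefixed_entries = sorted(k for k in uns if k.startswith("tcr_"))
nested_entries = sorted(uns["sctcrpy"])
```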
In GitLab by @szabogtamas on Jan 31, 2020, 10:54
Yes, we can discuss this later; we don't need to deal with this right now. It is probably also a matter of what level of users we want to support: this idea was something towards automatically generating an exploratory report that can be refined by the user but points out right away some issues that are worth investigating. At this point, we should just leave it.
In GitLab by @szabogtamas on Jan 31, 2020, 10:57
I would prefer the sub-dictionary just because it saves us the prefix. But this is not crucial.
In GitLab by @grst on Feb 2, 2020, 19:48
marked the task "st.tl.alpha_diversity(adata, groupby, diversityforgroup): So far we were only thinking about calculating the diversity of clonotypes in different groups, but the diversity of any group could just as well be calculated." as completed
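The task only names the arguments, not the diversity index. As an illustration, the sketch below assumes Shannon entropy and uses plain dictionaries in place of obs rows; diversity_by_group and the sample data are hypothetical and do not reproduce the actual implementation.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(labels):
    """Shannon entropy (natural log) of the label frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def diversity_by_group(cells, groupby, target):
    """Compute the diversity of `target` labels within each `groupby` group."""
    per_group = defaultdict(list)
    for cell in cells:
        per_group[cell[groupby]].append(cell[target])
    return {group: shannon_entropy(labels) for group, labels in per_group.items()}


# Hypothetical cells: sample s1 holds a single clonotype, s2 holds two.
cells = [
    {"sample": "s1", "clonotype": "c1"},
    {"sample": "s1", "clonotype": "c1"},
    {"sample": "s2", "clonotype": "c1"},
    {"sample": "s2", "clonotype": "c2"},
]
result = diversity_by_group(cells, groupby="sample", target="clonotype")
```

Because target is just a column name, the same function covers the case mentioned in the task: the diversity of any grouping, not only of clonotypes.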
In GitLab by @grst on Feb 4, 2020, 14:14
changed the description
In GitLab by @szabogtamas on Feb 12, 2020, 13:47
marked the task "st.tl.tcr_dist(adata, chains=['TRA_1', 'TRB_1'], combination=np.min): adds TCR dist to obsm (#11)" as completed
In GitLab by @szabogtamas on Feb 12, 2020, 13:47
marked the task "st.tl.kidera_dist: adds Kidera distances to obsm" as completed
In GitLab by @szabogtamas on Feb 12, 2020, 13:47
marked the task "st.tl.chain_convergence(adata, groupby): adds a column to obs that contains the number of nucleotide versions for each CDR3 AA sequence" as completed
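The quantity described in this task can be sketched in plain Python: for every cell, count how many distinct nucleotide sequences encode its CDR3 amino-acid sequence within the same group. The field names ("cdr3_aa", "cdr3_nt") and the function name are assumptions for illustration, not the package's own.

```python
from collections import defaultdict


def count_nucleotide_versions(cells, groupby):
    """Return, per cell, the number of distinct CDR3 nucleotide sequences
    observed for that cell's CDR3 amino-acid sequence within its group."""
    versions = defaultdict(set)
    for cell in cells:
        versions[(cell[groupby], cell["cdr3_aa"])].add(cell["cdr3_nt"])
    return [len(versions[(cell[groupby], cell["cdr3_aa"])]) for cell in cells]


# Hypothetical cells: the first two share the amino-acid sequence CAS
# but differ in the codon used for the leading cysteine (TGT vs TGC).
cells = [
    {"sample": "s1", "cdr3_aa": "CAS", "cdr3_nt": "TGTGCCAGC"},
    {"sample": "s1", "cdr3_aa": "CAS", "cdr3_nt": "TGCGCCAGC"},
    {"sample": "s1", "cdr3_aa": "CAV", "cdr3_nt": "TGTGCCGTG"},
]
convergence = count_nucleotide_versions(cells, groupby="sample")
```

The returned list has one entry per cell, so it could be written back as a new obs column.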
In GitLab by @szabogtamas on Feb 12, 2020, 13:47
closed