Comments (19)
Hi @grst,
I've just released sc-dandelion==0.2.3 (it's actually 0.2.2, but I thought my upload went wrong).
Need to wait for pypi/warehouse#11696 to be fixed before making any changes here though... =(
from scirpy.
@grst,
The productive column shows "T + T". I preprocessed my scBCR sequences with Dandelion and then used "ddl.to_scirpy". After that, I defined clones with Scirpy. Now I would like to convert the data back so that I can update the germline sequences and run the mutational analysis in Dandelion.
from scirpy.
ok. the 'issue' is with line 273:
scirpy/scirpy/io/_datastructures.py
Lines 260 to 273 in 2c5b99e
because dandelion's productive column in the metadata will update the productive key that scirpy was making from to_airr_cells, because it appears later:
Lines 726 to 739 in 2c5b99e
I can confirm that just changing the name away from productive on dandelion's side resolves this.
@sbenjamaporn, if you just rename the current productive column to productive_status:
adata.obs.rename(columns={'productive':'productive_status'}, inplace=True)
you should be able to do the transfer.
I will action this on dandelion's side to rename productive to productive_status.
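The overwrite described above can be illustrated with plain dictionaries. This is only a simplified sketch of the mechanism, not scirpy's actual code: `build_chain_record`, `validate_productive`, and the `isotype` field are hypothetical names made up for the illustration.

```python
# Hypothetical stand-in for assembling a per-chain record; NOT scirpy's code.
def build_chain_record(chain_fields, obs_row):
    record = dict(chain_fields)  # chain-level values, e.g. productive=True
    record.update(obs_row)       # cell-level metadata is merged afterwards,
                                 # so a same-named key silently wins
    return record


def validate_productive(record):
    """Mimic a schema check that expects a boolean `productive` field."""
    value = record["productive"]
    if not isinstance(value, bool):
        raise ValueError(f"field productive has invalid bool {value}")
    return value


chain = {"locus": "IGH", "productive": True}
obs_row = {"productive": "T + T", "isotype": "IgM"}  # dandelion-style summary

# Name clash: the cell-level string replaces the chain-level boolean.
clashing = build_chain_record(chain, obs_row)
try:
    validate_productive(clashing)
except ValueError as err:
    print(err)

# Renaming the cell-level column first avoids the clash.
renamed = {
    ("productive_status" if key == "productive" else key): value
    for key, value in obs_row.items()
}
safe = build_chain_record(chain, renamed)
print(validate_productive(safe))
```

After the rename, the chain-level boolean survives and the summary string is still available under `productive_status`.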
from scirpy.
Hi @sbenjamaporn,
thanks for reporting this issue.
I'm not yet sure what could be the problem. Could you please report the following:
- the entire stack trace of the error message (not just the last part as above)
- the result of ABC_irdata_exclude_orphan.obs.columns
Thanks,
Gregor
from scirpy.
Thanks for your prompt response!
These are my results, attached as a PDF file.
from scirpy.
@sbenjamaporn, thanks for the stacktrace! Regarding my second request, you checked adata.columns instead of adata.obs.columns. Could you please send me the result of the latter?
@zktuong, according to the stacktrace, the error occurs within Dandelion. It could theoretically be that there's a problem with scirpy's output, but I have absolutely no idea where the T + T should come from. Do you have an idea what could be the problem?
from scirpy.
Sorry for the misunderstanding.
This is the result of adata.obs.columns
from scirpy.
I think it's got to do with malformed entries in the productive columns in the adata.obs that airrCell is getting.
So @sbenjamaporn, can you check what the unique values are for:
scirpy's columns:
['IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_productive', 'IR_VJ_2_productive']
There shouldn't be any T + T in these.
I think the following are dandelion's columns:
['productive_VDJ', 'productive_VJ', 'productive', 'productive_summary']
There would be T + T in these. But these columns would be ignored by scirpy when converting to an airr table?
I don't think dandelion uses any of these columns when converting (it just refreshes based on the airr table that scirpy produces).
I wonder if the round trip of scirpy <-> dandelion somehow got the columns confused?
from scirpy.
Yes, ['IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_productive', 'IR_VJ_2_productive'] contain only "TRUE" or "None". I agree that the conversion between the two tools is what confuses the columns. So we cannot convert the scirpy result to dandelion, right?
from scirpy.
What's also weird is that in addition to the IR_V(D)J_1/2_productive columns you also have a productive column. What does this one contain?
"productive" should be a chain-level attribute (i.e. only available as IR_V(D)J_1/2_productive).
Could you please describe
- how you initially load the data into scirpy (ir.io.read_10x_vdj, ir.io.read_airr, etc.)?
- whether you do any additional conversions between dandelion and scirpy before this error occurs?
from scirpy.
Hmm it looks like those additional productive columns are from dandelion.
Can you try and remove every column from “clone_id_by_size” onwards and see if there’s still the issue of conversion?
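One way to sketch this suggestion is to cut the column list at the first dandelion-added column and keep only what comes before it. The helper name `columns_before` and the column order below are hypothetical, chosen only for illustration; the real adata.obs will contain different columns.

```python
def columns_before(columns, first_to_drop):
    """Return the columns that precede `first_to_drop`, preserving order."""
    columns = list(columns)
    if first_to_drop not in columns:
        raise KeyError(f"{first_to_drop!r} not found in columns")
    return columns[: columns.index(first_to_drop)]


# Hypothetical column order; inspect adata.obs.columns for the real one.
obs_columns = [
    "IR_VJ_1_productive",
    "IR_VDJ_1_productive",
    "clone_id_by_size",
    "productive",
    "productive_summary",
]

keep = columns_before(obs_columns, "clone_id_by_size")
print(keep)
```

With an AnnData object, the result could then be applied as `adata.obs = adata.obs[keep]` before attempting the conversion again.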
from scirpy.
Dear @grst,
Is there any way that Scirpy could output the full information defined by the AIRR standard? (I tried ir.io.write_airr, but it did not write the full information.)
from scirpy.
@zktuong Thanks for the suggestion, I will try!
from scirpy.
> Hmm it looks like those additional productive columns are from dandelion.
It seems the actual problem is in the conversion from dandelion to scirpy. @zktuong, ddl.to_scirpy is just calling scirpy code?
> Is there any ways that Scirpy could give full information as an AIRR standard ? ( I try ir.io.write_airr, but it did not create full information)
Happy to discuss this. Could you please open a separate issue and describe what's missing?
from scirpy.
> It seems the actual problem is in the conversion from dandelion to scirpy. @zktuong ddl.to_scirpy is just calling scirpy code?
That's right. It's just a wrapper to call ir.io.from_dandelion.
A small update on this - the issue seems to lie in:
import dandelion as ddl
import scirpy as ir

# vdj is an existing Dandelion object
# Works: default round trip
irdata = ddl.to_scirpy(vdj)  # or ir.io.from_dandelion(vdj)
vdj2 = ir.io.to_dandelion(irdata)
# Fails: with transfer=True the same error appears,
# "ValidationError: field productive has invalid bool T + T"
irdata = ddl.to_scirpy(vdj, transfer=True)  # or ir.io.from_dandelion(vdj, transfer=True)
vdj2 = ir.io.to_dandelion(irdata)
from scirpy.
@zktuong Thanks, it works now!
I have one more question about updating the germline sequences with update_germline. I have many samples to update. Should the fasta file be "tigger_heavy_igblast_db-pass_genotype.fasta" (I also got the error in this case), or should I specify it manually for each sample?
OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.
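Going only by the wording of this error message, one thing to try is setting the GERMLINE environment variable in the Python session before calling update_germline. The sketch below is an assumption based on that message, not a confirmed fix; the helper `set_germline_dir` and the folder path are hypothetical placeholders.

```python
import os
from pathlib import Path


def set_germline_dir(path):
    """Point the GERMLINE environment variable at a folder of germline fastas."""
    germline_dir = Path(path).expanduser()
    os.environ["GERMLINE"] = str(germline_dir)
    return germline_dir


# Hypothetical location; replace with the folder that actually holds the
# IGHV, IGHD, and IGHJ fasta files mentioned in the error message.
germline_dir = set_germline_dir("/path/to/germline_fastas")
print(os.environ["GERMLINE"])
```

The variable only applies to the current process and anything it launches, so it has to be set again in each new session.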
from scirpy.
@grst
Of course!
from scirpy.
Thanks a lot @zktuong! LMK once you have a release including the fix, then I'll pin the latest version of dandelion.
from scirpy.
> @zktuong Thanks, It works now!.
> I have a more question during update germline sequence by update_germline. I have many samples to update. Should the fasta file be "tigger_heavy_igblast_db-pass_genotype.fasta" ? ( I also got the error in this case) or manually specify in each sample ?
> OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.
Let's follow up over on dandelion's side:
zktuong/dandelion#153
from scirpy.