
Cannot convert output from Scirpy to dandelion (scverse/scirpy)

zktuong commented on July 18, 2024

Hi @grst,

I've just released sc-dandelion==0.2.3 (it's actually 0.2.2, but I thought my upload went wrong).
I need to wait for pypi/warehouse#11696 to be fixed before making any changes here though... =(

from scirpy.

sbenjamaporn commented on July 18, 2024

@grst,
The productive column shows "T + T". I preprocessed my scBCR sequences with Dandelion and then used "ddl.to_scirpy". After that, I defined clones with Scirpy. Now I would like to convert the data back to Dandelion in order to update the germline and run a mutational analysis.


zktuong commented on July 18, 2024

OK, the 'issue' is with line 273:

scirpy/scirpy/io/_datastructures.py, lines 260 to 273 in 2c5b99e:

```python
def to_airr_records(self) -> Iterable[dict]:
    """Iterate over chains as AIRR-Rearrangent compliant dictonaries.
    Each dictionary will also include the cell-level information.

    Yields
    ------
    Dictionary representing one row of a AIRR rearrangement table
    """
    for tmp_chain in self.chains:
        chain = AirrCell.empty_chain_dict()
        # add the actual data
        chain.update(tmp_chain)
        # add cell-level attributes
        chain.update(self)
```

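This happens because dandelion's `productive` column in the cell-level metadata overwrites the chain-level `productive` key that scirpy builds from `to_airr_cells`: `chain.update(self)` merges the cell-level attributes after the chain data, so the cell-level value wins.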
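The override can be reproduced with plain dictionaries. This is a simplified sketch, not scirpy's actual classes: the two dictionaries are hypothetical stand-ins for one chain record and for the cell-level attributes carried over from `adata.obs`, with made-up values.

```python
# Hypothetical stand-ins for the two mappings merged in to_airr_records()
chain_level = {"locus": "IGH", "productive": True}        # data of one chain
cell_level = {"cell_id": "cell1", "productive": "T + T"}  # cell-level metadata from adata.obs

record = {}
record.update(chain_level)  # "add the actual data"
record.update(cell_level)   # "add cell-level attributes" runs last...

# ...so the cell-level string replaces the chain-level boolean
print(record["productive"])  # T + T
```

Because `dict.update` simply lets later keys win, any cell-level column whose name matches a chain-level AIRR field would be affected the same way, not only `productive`.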

scirpy/scirpy/io/_io.py, lines 726 to 739 in 2c5b99e:

```python
airr_cells = to_airr_cells(adata)

contig_dicts = {}
for tmp_cell in airr_cells:
    for i, chain in enumerate(tmp_cell.to_airr_records(), start=1):
        # dandelion-specific modifications
        chain.update(
            {
                "sequence_id": f"{tmp_cell.cell_id}_contig_{i}",
            }
        )
        contig_dicts[chain["sequence_id"]] = chain

data = pd.DataFrame.from_dict(contig_dicts, orient="index")
```

I can confirm that renaming the column away from `productive` on dandelion's side resolves this.

@sbenjamaporn if you just rename the current productive column to productive_status:

```python
adata.obs.rename(columns={'productive': 'productive_status'}, inplace=True)
```

you should be able to do the transfer.

I will action this on dandelion's side to rename productive to productive_status.
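A slightly more defensive version of the rename only touches the column when it is present. This is a sketch that uses a small pandas DataFrame with made-up values as a stand-in for `adata.obs`; on real data the same `rename` call would be applied to `adata.obs` before calling `ir.io.to_dandelion`.

```python
import pandas as pd

# Stand-in for adata.obs after ddl.to_scirpy(vdj, transfer=True); values are made up
obs = pd.DataFrame(
    {"IR_VDJ_1_productive": [True], "productive": ["T + T"]},
    index=["cell1"],
)

# Rename dandelion's cell-level column only if it exists, so the step is safe to re-run
if "productive" in obs.columns:
    obs = obs.rename(columns={"productive": "productive_status"})

print(list(obs.columns))  # ['IR_VDJ_1_productive', 'productive_status']
```

Assigning the result of `rename` instead of passing `inplace=True` is only a style choice; both rename the column.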


grst commented on July 18, 2024

Hi @sbenjamaporn,

thanks for reporting this issue.
I'm not yet sure what could be the problem. Could you please report the following:

- the entire stack trace of the error message (not just the last part as above)
- the result of `ABC_irdata_exclude_orphan.obs.columns`

Thanks,
Gregor


sbenjamaporn commented on July 18, 2024

Thanks for your prompt response!

Here are my results, attached as a PDF file.

Scirpy_to_Dandelion_error.pdf


grst commented on July 18, 2024

@sbenjamaporn, thanks for the stacktrace! Regarding my second request, you checked adata.columns instead of adata.obs.columns. Could you please send me the result of the latter?

@zktuong, according to the stacktrace, the error occurs within Dandelion. It could theoretically be that there's a problem with scirpy's output, but I have absolutely no idea where the T + T should come from. Do you have an idea what could be the problem?


sbenjamaporn commented on July 18, 2024

Sorry for the misunderstanding.
This is the result of adata.obs.columns:


zktuong commented on July 18, 2024

Hi @grst @sbenjamaporn

I think it has to do with malformed entries in the productive columns of the adata.obs that AirrCell is getting.

So @sbenjamaporn, can you check what the unique values are for
scirpy's columns:
['IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_productive', 'IR_VJ_2_productive']
There shouldn't be any T + T in these.

I think the following are dandelion's columns:
['productive_VDJ', 'productive_VJ', 'productive', 'productive_summary']
There would be T + T in these. But wouldn't these columns be ignored by scirpy when converting to an AIRR table?

I don't think dandelion uses any of these columns when converting (it just refreshes them based on the AIRR table that scirpy produces).

I wonder if the round trip of scirpy <-> dandelion somehow got the columns confused?
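The check suggested above can be scripted: list the unique values in each column and report anything that is not a boolean or missing. This is a sketch: the DataFrame is a stand-in for `adata.obs` with made-up values, `unexpected_values` is a hypothetical helper, and treating only `True`/`False`/`None` as valid is an assumption (AIRR tables can also encode booleans as `"T"`/`"F"` strings).

```python
import pandas as pd

# scirpy's chain-level columns and dandelion's cell-level column, as listed above
SCIRPY_COLS = ["IR_VDJ_1_productive", "IR_VDJ_2_productive",
               "IR_VJ_1_productive", "IR_VJ_2_productive"]
DANDELION_COLS = ["productive"]

def unexpected_values(obs, cols, allowed=(True, False, None)):
    """Return {column: [unique values not in `allowed`]} for the columns present in obs."""
    report = {}
    for col in cols:
        if col not in obs.columns:
            continue  # e.g. IR_VDJ_2_productive may be absent
        bad = [v for v in obs[col].unique().tolist() if v not in allowed]
        if bad:
            report[col] = bad
    return report

# Stand-in for adata.obs with made-up values
obs = pd.DataFrame(
    {
        "IR_VDJ_1_productive": [True, True],
        "IR_VJ_1_productive": [True, None],
        "productive": ["T + T", "T + T"],
    },
    index=["cell1", "cell2"],
)

print(unexpected_values(obs, SCIRPY_COLS + DANDELION_COLS))  # {'productive': ['T + T']}
```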


sbenjamaporn commented on July 18, 2024

Dear @zktuong @grst

Yes, ['IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_productive', 'IR_VJ_2_productive'] contain only "TRUE" or None. I agree that converting back and forth between the two may have confused the columns. So we cannot convert the scirpy result to dandelion, right?


grst commented on July 18, 2024

What's also weird is that in addition to the IR_V(D)J_1/2_productive columns, you also have a productive column. What does this one contain?

"productive" should be a chain-level attribute (i.e. only available as IR_V(D)J_1/2_productive).

Could you please describe

- how you initially load the data into scirpy (ir.io.read_10x_vdj, ir.io.read_airr, etc.)?
- whether you do any additional conversions between dandelion and scirpy before this error occurs?


zktuong commented on July 18, 2024

Hmm it looks like those additional productive columns are from dandelion.

Can you try and remove every column from “clone_id_by_size” onwards and see if there’s still the issue of conversion?
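Dropping every column from `clone_id_by_size` onwards can be done by position. A minimal sketch, assuming (as the suggestion implies) that all dandelion-added columns come after `clone_id_by_size` in `adata.obs`; `drop_from` is a hypothetical helper and the DataFrame is a stand-in with made-up values.

```python
import pandas as pd

def drop_from(obs: pd.DataFrame, first_col: str) -> pd.DataFrame:
    """Return a copy of obs keeping only the columns before `first_col`."""
    stop = obs.columns.get_loc(first_col)  # integer position of the first column to drop
    return obs.iloc[:, :stop].copy()

# Stand-in for adata.obs; column order matters here
obs = pd.DataFrame(
    {
        "sample_id": ["s1"],
        "IR_VDJ_1_productive": [True],
        "clone_id_by_size": [1],
        "productive": ["T + T"],
    },
    index=["cell1"],
)

trimmed = drop_from(obs, "clone_id_by_size")
print(list(trimmed.columns))  # ['sample_id', 'IR_VDJ_1_productive']
```

On an AnnData object this would be `adata.obs = drop_from(adata.obs, "clone_id_by_size")`; the original `obs` is left unchanged because the helper returns a copy.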


sbenjamaporn commented on July 18, 2024

Dear @grst,

Is there any way for Scirpy to export the full information in AIRR standard format? (I tried ir.io.write_airr, but it did not export the full information.)


sbenjamaporn commented on July 18, 2024

@zktuong Thanks for the suggestion, I will try!


grst commented on July 18, 2024

> Hmm it looks like those additional productive columns are from dandelion.

It seems the actual problem is in the conversion from dandelion to scirpy. @zktuong ddl.to_scirpy is just calling scirpy code?

> Is there any way for Scirpy to export the full information in AIRR standard format? (I tried ir.io.write_airr, but it did not export the full information.)

Happy to discuss this. Could you please open a separate issue and describe what's missing?


zktuong commented on July 18, 2024

> It seems the actual problem is in the conversion from dandelion to scirpy. @zktuong ddl.to_scirpy is just calling scirpy code?

That's right, it's just a wrapper that calls ir.io.from_dandelion.

A small update on this: the issue seems to lie in the following:

```python
import dandelion as ddl
import scirpy as ir

# `vdj` is a dandelion Dandelion object

# works ok
irdata = ddl.to_scirpy(vdj)  # or ir.io.from_dandelion(vdj)
vdj2 = ir.io.to_dandelion(irdata)

# fails with the same ValidationError (field productive has invalid bool T + T)
irdata = ddl.to_scirpy(vdj, transfer=True)  # or ir.io.from_dandelion(vdj, transfer=True)
vdj2 = ir.io.to_dandelion(irdata)
```
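Since the failure only appears with `transfer = True`, which judging from this thread is what brings dandelion's cell-level columns such as `productive` into `adata.obs`, one way to catch the problem before calling `ir.io.to_dandelion` is to look for `obs` columns whose names coincide with chain-level AIRR fields. A minimal sketch: `shadowing_columns` is a hypothetical helper, and `CHAIN_FIELDS` is a hand-picked subset of the AIRR rearrangement fields, not the full schema.

```python
from typing import Iterable, List

# Hand-picked subset of AIRR rearrangement fields (assumption; extend as needed)
CHAIN_FIELDS = {"locus", "productive", "v_call", "d_call", "j_call", "junction_aa"}

def shadowing_columns(obs_columns: Iterable[str], chain_fields=CHAIN_FIELDS) -> List[str]:
    """Return cell-level column names that would overwrite a chain-level field
    when to_airr_records() merges cell-level attributes into each chain."""
    return sorted(set(obs_columns) & set(chain_fields))

# Column names as they might appear in irdata.obs after the round trip
cols = ["IR_VDJ_1_productive", "productive", "productive_VDJ", "clone_id"]
print(shadowing_columns(cols))  # ['productive']
```

With real data the call would be `shadowing_columns(irdata.obs.columns)`; a non-empty result means those columns should be renamed first, as was done with `productive_status` above.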


sbenjamaporn commented on July 18, 2024

@zktuong Thanks, it works now!

I have another question about updating germline sequences with update_germline. I have many samples to update. Should the FASTA file be "tigger_heavy_igblast_db-pass_genotype.fasta" (I also got the error below in this case), or should I specify it manually for each sample?

OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.
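The error message offers two options. A sketch of the first one, setting `GERMLINE` from Python before calling `update_germline`, follows. It assumes dandelion reads the variable when the function is called, and that the germline folder holds FASTA files ending in `.fasta` whose names contain `IGHV`, `IGHD` and `IGHJ`; both are assumptions, and `set_germline_dir` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import List

def set_germline_dir(path: str) -> List[str]:
    """Point the GERMLINE environment variable at `path`.

    Returns the heavy-chain segments (IGHV, IGHD, IGHJ) for which no *.fasta file
    in `path` has the segment name in its file name; an empty list means all were found.
    """
    folder = Path(path)
    names = [p.name for p in folder.glob("*.fasta")]
    missing = [seg for seg in ("IGHV", "IGHD", "IGHJ") if not any(seg in n for n in names)]
    os.environ["GERMLINE"] = str(folder)
    return missing

# Example (placeholder path): missing = set_germline_dir("/path/to/germline/folder")
```

Setting the variable in the shell before starting Python (`export GERMLINE=/path/to/germline/folder`) should have the same effect.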


sbenjamaporn commented on July 18, 2024

@grst
Of course!


grst commented on July 18, 2024

Thanks a lot @zktuong! LMK once you have a release including the fix, then I'll pin the latest version of dandelion.


zktuong commented on July 18, 2024

> @zktuong Thanks, it works now!
>
> I have another question about updating germline sequences with update_germline. I have many samples to update. Should the FASTA file be "tigger_heavy_igblast_db-pass_genotype.fasta" (I also got the error below in this case), or should I specify it manually for each sample?
>
> OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.

Let's follow up over on dandelion's side:
zktuong/dandelion#153


