
Comments (19)

zktuong avatar zktuong commented on July 18, 2024 2

Hi @grst,

I've just released sc-dandelion==0.2.3 (it's actually 0.2.2, but I thought my upload had gone wrong).
Need to wait for pypi/warehouse#11696 to be fixed before making any changes here though... =(

from scirpy.

sbenjamaporn avatar sbenjamaporn commented on July 18, 2024 1

@grst,
The productive column shows "T + T". I preprocessed my scBCR sequences with Dandelion and then used "ddl.to_scirpy". After that, I defined clones with Scirpy. Now I would like to convert the data back so that I can update the germline sequences for mutational analysis in Dandelion.


zktuong avatar zktuong commented on July 18, 2024 1

ok. the 'issue' is with line 273:

    def to_airr_records(self) -> Iterable[dict]:
        """Iterate over chains as AIRR-Rearrangent compliant dictonaries.
        Each dictionary will also include the cell-level information.
        Yields
        ------
        Dictionary representing one row of a AIRR rearrangement table
        """
        for tmp_chain in self.chains:
            chain = AirrCell.empty_chain_dict()
            # add the actual data
            chain.update(tmp_chain)
            # add cell-level attributes
            chain.update(self)

Dandelion's productive column in the metadata is a cell-level attribute, so it overwrites the chain-level productive key that scirpy built in to_airr_cells: the cell-level update is applied after the chain data.
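The overwrite can be reproduced with plain dictionaries. The sketch below is a minimal illustration and not scirpy's code: `merge_like_to_airr_records`, `chain_fields` and `cell_fields` are hypothetical names standing in for a chain record and the cell-level metadata, and the values are made up.

```python
def merge_like_to_airr_records(chain_fields: dict, cell_fields: dict) -> dict:
    """Apply the same update order as the snippet above: chain data first, cell data last."""
    record = {}
    record.update(chain_fields)  # chain-level values, e.g. productive=True
    record.update(cell_fields)   # cell-level values win on any shared key
    return record


# Hypothetical example values
chain_fields = {"locus": "IGH", "productive": True}
cell_fields = {"cell_id": "cell_1", "productive": "T + T"}

record = merge_like_to_airr_records(chain_fields, cell_fields)
# The cell-level string has replaced the chain-level boolean
print(record["productive"])
```

If the cell-level key has a different name (for example `productive_status`), the two values no longer collide and the chain-level boolean survives.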

scirpy/scirpy/io/_io.py

Lines 726 to 739 in 2c5b99e

    airr_cells = to_airr_cells(adata)
    contig_dicts = {}
    for tmp_cell in airr_cells:
        for i, chain in enumerate(tmp_cell.to_airr_records(), start=1):
            # dandelion-specific modifications
            chain.update(
                {
                    "sequence_id": f"{tmp_cell.cell_id}_contig_{i}",
                }
            )
            contig_dicts[chain["sequence_id"]] = chain
    data = pd.DataFrame.from_dict(contig_dicts, orient="index")

I can confirm that changing the column name away from productive on dandelion's side resolves this.

@sbenjamaporn if you just rename the current productive column to productive_status:

adata.obs.rename(columns={'productive':'productive_status'}, inplace=True)

you should be able to do the transfer.

I will rename productive to productive_status on dandelion's side.
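Until such a rename is released, a quick check for name collisions before converting may help. The following is only a sketch under stated assumptions: `CHAIN_LEVEL_FIELDS` is a small, non-exhaustive subset of AIRR Rearrangement field names chosen for illustration, and `find_colliding_columns` and `rename_map` are hypothetical helpers, not part of scirpy or dandelion.

```python
# Non-exhaustive subset of AIRR Rearrangement field names, for illustration only
CHAIN_LEVEL_FIELDS = {
    "productive", "locus", "v_call", "d_call", "j_call", "junction", "junction_aa",
}


def find_colliding_columns(obs_columns):
    """Return the cell-level column names that are also chain-level field names."""
    return sorted(set(obs_columns) & CHAIN_LEVEL_FIELDS)


def rename_map(obs_columns, suffix="_status"):
    """Build a mapping that could be passed to DataFrame.rename(columns=...)."""
    return {name: f"{name}{suffix}" for name in find_colliding_columns(obs_columns)}


# Column names taken from this thread
obs_columns = ["IR_VDJ_1_productive", "productive", "productive_VDJ", "clone_id_by_size"]
print(find_colliding_columns(obs_columns))  # ['productive']
print(rename_map(obs_columns))              # {'productive': 'productive_status'}
```

The resulting mapping has the same shape as the one used in the `adata.obs.rename(...)` call above.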


grst avatar grst commented on July 18, 2024

Hi @sbenjamaporn,

thanks for reporting this issue.
I'm not yet sure what could be the problem. Could you please report the following:

  • the entire stack trace of the error message (not just the last part as above)
  • the result of ABC_irdata_exclude_orphan.obs.columns

Thanks,
Gregor


sbenjamaporn avatar sbenjamaporn commented on July 18, 2024

Thanks for your prompt response!

My results are attached as a PDF file.

Scirpy_to_Dandelion_error.pdf


grst avatar grst commented on July 18, 2024

@sbenjamaporn, thanks for the stacktrace! Regarding my second request, you checked adata.columns instead of adata.obs.columns. Could you please send me the result of the latter?

@zktuong, according to the stacktrace, the error occurs within Dandelion. It could theoretically be that there's a problem with scirpy's output, but I have absolutely no idea where the T + T should come from. Do you have an idea what could be the problem?


sbenjamaporn avatar sbenjamaporn commented on July 18, 2024

Sorry for the misunderstanding.
This is the result of adata.obs.columns:
6BCCCB0A-F5FB-4E1E-8940-F547F0423630


zktuong avatar zktuong commented on July 18, 2024

hi @grst @sbenjamaporn

I think it has to do with malformed entries in the productive columns of the adata.obs that AirrCell is getting.

So @sbenjamaporn, can you check what's the unique values for:
scirpy's columns:
['IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_productive', 'IR_VJ_2_productive']
There shouldn't be any T + T in these

I think the following are dandelion's columns:
['productive_VDJ', 'productive_VJ', 'productive', 'productive_summary']
There would be T + T in these. But these columns would be ignored by scirpy when converting to an airr table?

I don't think dandelion uses any of these columns when converting (it just refreshes based on the AIRR table that scirpy produces).

I wonder if the round trip of scirpy <-> dandelion somehow got the columns confused?


sbenjamaporn avatar sbenjamaporn commented on July 18, 2024

Dear @zktuong @grst

Yes, ['IR_VDJ_1_productive', 'IR_VDJ_2_productive', 'IR_VJ_1_productive', 'IR_VJ_2_productive'] contain only "TRUE" or "None". I agree that the conversion between the two tools seems to confuse the columns. So we cannot convert the scirpy result to dandelion, right?


grst avatar grst commented on July 18, 2024

What's also weird is that in addition to the IR_V(D)J_1/2_productive you also have a productive column. What does this one contain?

"productive" should be a chain-level attribute (i.e only available as IR_V(D)J_1/2_productive).

Could you please describe

  • how you initially load the data into scirpy (ir.io.read_10x_vdj, ir.io.read_airr, etc)?
  • if you do any additional conversions between dandelion and scirpy before this error occurs?


zktuong avatar zktuong commented on July 18, 2024

Hmm it looks like those additional productive columns are from dandelion.

Can you try removing every column from "clone_id_by_size" onwards and see whether the conversion still fails?


sbenjamaporn avatar sbenjamaporn commented on July 18, 2024

Dear @grst,

Is there any way for Scirpy to export the full information in the AIRR standard? (I tried ir.io.write_airr, but it did not write the full information.)


sbenjamaporn avatar sbenjamaporn commented on July 18, 2024

@zktuong Thanks for the suggestion, I will try!


grst avatar grst commented on July 18, 2024

> Hmm it looks like those additional productive columns are from dandelion.

It seems the actual problem is in the conversion from dandelion to scirpy. @zktuong ddl.to_scirpy is just calling scirpy code?

> Is there any way for Scirpy to export the full information in the AIRR standard? (I tried ir.io.write_airr, but it did not write the full information.)

Happy to discuss this. Could you please open a separate issue and describe what's missing?


zktuong avatar zktuong commented on July 18, 2024

> It seems the actual problem is in the conversion from dandelion to scirpy. @zktuong ddl.to_scirpy is just calling scirpy code?

That's right. It's just a wrapper that calls ir.io.from_dandelion.

A small update on this - the issue seems to lie in:

# works ok
irdata = ddl.to_scirpy(vdj) # or ir.io.from_dandelion(vdj)
vdj2 = ir.io.to_dandelion(irdata)

# fails with the same error: "ValidationError: field productive has invalid bool T + T"
irdata = ddl.to_scirpy(vdj, transfer = True)  # or ir.io.from_dandelion(vdj, transfer = True)
vdj2 = ir.io.to_dandelion(irdata)
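Why the string "T + T" is rejected can be illustrated with a simplified boolean parser. This is a stand-in written for this thread, not the actual validation code of the airr library: `parse_airr_bool` is a hypothetical helper and the accepted spellings below are an assumption.

```python
# Assumed set of accepted spellings; the real library may accept more or fewer
TRUE_VALUES = {"T", "TRUE", "True", "true"}
FALSE_VALUES = {"F", "FALSE", "False", "false"}


def parse_airr_bool(value):
    """Convert a boolean-like field value to bool, or raise ValueError."""
    if isinstance(value, bool):
        return value
    if value is None or value == "":
        return None  # missing value
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"field productive has invalid bool {value}")


print(parse_airr_bool("T"))  # True

try:
    parse_airr_bool("T + T")  # a per-cell summary string, not a per-chain boolean
except ValueError as err:
    print(err)  # field productive has invalid bool T + T
```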


sbenjamaporn avatar sbenjamaporn commented on July 18, 2024

@zktuong Thanks, it works now!

I have one more question about updating germline sequences with update_germline. I have many samples to update. Should the fasta file be "tigger_heavy_igblast_db-pass_genotype.fasta" (I also got an error in this case), or should I specify it manually for each sample?

OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.


sbenjamaporn avatar sbenjamaporn commented on July 18, 2024

@grst
Of course!


grst avatar grst commented on July 18, 2024

Thanks a lot @zktuong! LMK once you have a release including the fix, then I'll pin the latest version of dandelion.


zktuong avatar zktuong commented on July 18, 2024

> @zktuong Thanks, it works now!
>
> I have one more question about updating germline sequences with update_germline. I have many samples to update. Should the fasta file be "tigger_heavy_igblast_db-pass_genotype.fasta" (I also got an error in this case), or should I specify it manually for each sample?
>
> OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.

Let's follow up over on dandelion's side:
zktuong/dandelion#153
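For readers who hit the same OSError: the message names two options, setting the GERMLINE environment variable or passing a folder path. The sketch below only illustrates that fallback order as an assumption drawn from the error text. `resolve_germline_dir` is a hypothetical helper, not dandelion's implementation, and the path is a placeholder.

```python
import os


def resolve_germline_dir(explicit_path=None):
    """Prefer an explicitly given folder, otherwise fall back to $GERMLINE."""
    path = explicit_path or os.environ.get("GERMLINE")
    if not path:
        raise OSError("GERMLINE is not set and no germline folder was provided")
    return path


# Placeholder path; it should point to a folder with IGHV, IGHD and IGHJ fasta files
print(resolve_germline_dir("/path/to/germline/folder"))
```

Which fasta files are appropriate per sample is a question for the linked dandelion issue.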
