Comments (5)

grst commented on July 1, 2024

Thanks for the test set, there is definitely something wrong.

Non-productive chains should always end up in the extra_chains column. My first guess is that there's something wrong with how "T"/"F" are interpreted.

After all

>>> bool("F")
True
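
The pitfall can be reproduced without scirpy: any non-empty string is truthy in Python, so "T"/"F" flags need an explicit mapping rather than `bool()`. A minimal sketch; `parse_airr_bool` is a hypothetical helper name used for illustration only, not part of scirpy or the airr library:

```python
# Any non-empty string is truthy, so bool() cannot decode "T"/"F" flags.
assert bool("F") is True
assert bool("") is False

# Hypothetical helper (not scirpy/airr API): explicit mapping instead of bool().
_AIRR_BOOL = {"T": True, "F": False}


def parse_airr_bool(value):
    """Map "T"/"F" to True/False; pass real bools through; None otherwise."""
    if isinstance(value, bool):
        return value
    return _AIRR_BOOL.get(value)
```

Returning `None` for anything that is neither "T" nor "F" keeps missing values distinguishable from an explicit false.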

from scirpy.

zktuong commented on July 1, 2024

Hmm, but the AIRR schema says that boolean columns should be encoded as the strings T/F, according to https://github.com/airr-community/airr-standards/blob/master/docs/datarep/rearrangements.rst

Boolean values
Boolean values must be encoded as T for true and F for false.

grst commented on July 1, 2024

I think I found it:

scirpy/scirpy/io/_io.py

Lines 482 to 485 in e954d17

if isinstance(tmp_path, pd.DataFrame):
    iterator = tmp_path.to_dict(orient="records")
else:
    iterator = airr.read_rearrangement(str(tmp_path))

When read_airr gets a .tsv file, it uses airr.read_rearrangement, which correctly converts the types. However, when it gets a pandas DataFrame, it doesn't do any type conversion or validation, so values such as "F" stay strings.

Probably it would be best to rely on airr to validate data frames as well. Unfortunately, the RearrangementReader isn't designed in a way to easily cope with anything but a tsv file.

So I'll either need a hacky workaround or have to dump the DataFrame to a temporary tsv file, which isn't particularly efficient.
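
For reference, the temporary-file route might look roughly like the sketch below. It uses only the standard library and rests on several assumptions: a list of dicts stands in for `df.to_dict(orient="records")`, a plain `csv.DictReader` plus an explicit "T"/"F" mapping stands in for `airr.read_rearrangement` (whose real type conversion is not reproduced here), and `read_records_via_tsv` is a hypothetical name.

```python
import csv
import os
import tempfile

# Assumption: "productive" is the only boolean column in this toy example.
BOOL_FIELDS = ("productive",)
AIRR_BOOL = {"T": True, "F": False}


def read_records_via_tsv(records):
    """Dump records to a temporary TSV, read them back, and decode T/F flags."""
    fieldnames = list(records[0].keys())
    with tempfile.NamedTemporaryFile(
        "w", suffix=".tsv", newline="", delete=False
    ) as tmp:
        writer = csv.DictWriter(tmp, fieldnames=fieldnames, dialect="excel-tab")
        writer.writeheader()
        writer.writerows(records)
        path = tmp.name
    try:
        with open(path, newline="") as handle:
            rows = list(csv.DictReader(handle, dialect="excel-tab"))
    finally:
        os.remove(path)  # clean up the temporary file
    for row in rows:
        for field in BOOL_FIELDS:
            row[field] = AIRR_BOOL.get(row[field])
    return rows


# Stand-in for df.to_dict(orient="records")
records = [
    {"cell_id": "cell1", "locus": "TRA", "productive": "T"},
    {"cell_id": "cell1", "locus": "TRB", "productive": "F"},
]
rows = read_records_via_tsv(records)
```

Every call writes and re-parses the whole table on disk, which is the inefficiency mentioned above.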

grst commented on July 1, 2024

Well, not that hacky after all:

scirpy/scirpy/io/_util.py

Lines 78 to 102 in 9e23474

def _read_airr_rearrangement_df(df: pd.DataFrame, validate=False, debug=False):
    """Like airr.read_rearrangement, but from a data frame instead of a tsv file.

    Provides RearrangementReader with an alternative iterator to its csv.DictReader
    """

    class PdDictReader(csv.DictReader):
        def __init__(self, df, *args, **kwargs):
            super().__init__(os.devnull)
            self.df = df
            self.reader = iter(df.to_dict(orient="records"))

        @property
        def fieldnames(self):
            return self.df.columns.tolist()

        def __next__(self):
            return next(self.reader)

    class PdRearrangementReader(RearrangementReader):
        def __init__(self, df, *args, **kwargs):
            super().__init__(os.devnull, *args, **kwargs)
            self.dict_reader = PdDictReader(df)

    return PdRearrangementReader(df, validate=validate, debug=debug)

Can you check whether this behaves as expected?
#349

zktuong commented on July 1, 2024

yes amazing! the fix works!
