Coder Social home page Coder Social logo

Comments (4)

calebweinreb avatar calebweinreb commented on August 18, 2024 1

The new draft looks great and totally addresses my concern.

from scores.

tennlee avatar tennlee commented on August 18, 2024

@calebweinreb Thanks, this is fair. The path of least resistance for now would be to de-emphasise what is in the JOSS paper. However, I would like feedback on one other intermediate option which I’ll outline here.

I have previously looked into the merits of a basic conversion approach versus a more integrated approach to the xarray/pandas APIs, but I don’t think I documented my findings. I have therefore included some brief notes further below. The very short summary is that using a basic conversion approach mostly works but leaves too many rough edges and results in a poor user experience.

In addition to the options for JOSS publication you identified, something else that’s doable would be:

  • Create a tutorial notebook which sets out how to convert to/from xarray most effectively and avoid issues
  • Provider a helper conversion method which handles the most common issues, so that a great many of the use cases could be handled in a simple fashion while more fulsome support for pandas is being developed, without requiring the user to do their own conversions entirely on their own.

I would be interested in your thoughts on whether the following changes would be sufficient for the paper:
(a) Create a tutorial notebook and helper conversion function as described above
(b) Change “expanding support” to “support for pandas is being introduced”
(c) Leave the title as is

It is totally reasonable if that’s not seen as sufficient - it’s important pandas users don’t have a negative experience when looking at the package, and the paper shouldn’t overstate things. However I’d appreciate your feedback on that option. If that’s not sufficient, I would be inclined to then remove pandas from the title.

The following are some notes about the merits of a basic conversion approach versus a more integrated approach to the xarray/pandas APIs:

xarray provides some functionality for transforming both to and from pandas, which would form the basis of the pandas support. However, a few things prevented this from meeting all the requirements. None of these are ‘showstoppers’, but would all need to be carefully tested to ensure a quality result.

Examples include:
(1) The Pandas series objects which make up the columns in a data frame didn’t always have the “name” set to the header of the column. As a result, sometimes parts of the data structure end up with an unexpected name. This has flow-on effects to accessing, labelling and plotting the data.
(2) The default dimension names for new data structures differ between xarray and pandas. This can similarly result in unexpected names for things.

I also have some uncertainty about handling pandas data structures which might have duplicate or semi-duplicate entries when converted to an xarray type. This is unlikely to be a showstopper, but will require use case testing, creating useful error messages, and likely some documentation.

Originally, the hope was to have polymorphic methods in the main API (i.e. you can pass either a data frame or an xarray object into any method) however this became a challenge for type hinting and inheritance. Xarray recommend creating “accessor methods” (https://tutorial.xarray.dev/advanced/accessors/01_accessor_examples.html) rather than creating new classes which inherit from xarray types. Some experimentation revealed that inheritance from xarray types is not straightforward.

As a result, it was a cleaner implementation and also clearer for users to provide a separate API for pandas. This would also allow any different treatments that might be wanted for data frames to be handled there.

from scores.

tennlee avatar tennlee commented on August 18, 2024

I spent some time yesterday looking into the intermediate option that I suggested. I think it would still end up being a bit rushed, so I would prefer to de-emphasise the role of pandas in the paper until we have a more fulsome implementation. I'll update the paper accordingly and let you know when the update is ready.

from scores.

tennlee avatar tennlee commented on August 18, 2024

@calebweinreb we have merged a change to the paper into 528-joss-paper-submission which we believe addresses your feedback. Please let us know whether this seems suitable.

from scores.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.