Coder Social home page Coder Social logo

Comments (10)

robin900 avatar robin900 commented on May 26, 2024
  1. MultiIndex can be used for a DataFrame's _ columns_ as well. In that case, what behavior would you want to see? A single header row, or one header row for each level in the MultiIndex (which is what I would expect to see)?
  2. The API seems to have changed with version 0.24.0, where "labels" got renamed to "codes". I really strive to have gspread-dataframe work with a wide range of Pandas versions and I strive not to call specific API methods when possible, precisely because of these kinds of API changes. I'm hoping there's a way to avoid referencing labels/codes directly in this package's code.
  3. Please tell me more about how you'd like to see MultiIndex support: any extra parameters or options you'd like to see?

from gspread-dataframe.

sam-s avatar sam-s commented on May 26, 2024
  1. MultiIndex can be used for a DataFrame's _ columns_ as well. In that case, what behavior would you want to see? A single header row, or one header row for each level in the MultiIndex (which is what I would expect to see)?

the latter - one header row for each level

  1. The API seems to have changed with version 0.24.0, where "labels" got renamed to "codes". I really strive to have gspread-dataframe work with a wide range of Pandas versions and I strive not to call specific API methods when possible, precisely because of these kinds of API changes. I'm hoping there's a way to avoid referencing labels/codes directly in this package's code.

I understand the predicament you are in - backwards compatibility is important, but I see no reason to keep support for pre-1.0 pandas.

  1. Please tell me more about how you'd like to see MultiIndex support: any extra parameters or options you'd like to see?

Generally speaking, I would like the appearance of the uploaded sheet to be similar to what I see when a DataFrame is printed.
In this case, I would like to have a separate column (with a header!) for each level.

I am torn between

foo one
foo two

and

foo one
    two

I think there should be an option to select one of them, the default being the 1st variant.
The same goes for column indexes.

Thank you for your attention!

from gspread-dataframe.

robin900 avatar robin900 commented on May 26, 2024

@sam-s I've pushed a commit to master that offers support for MultiIndex indexes in set_with_dataframe. (MultiIndexes have always been supported via index_col parameter to get_as_dataframe.)

If you test the package code directly from master (I added integration test coverage of the feature, and it passes), and you're happy with it, I can cut a new release today.

There is not yet support for:

  1. the second variant that displays levels without redundant values in higher levels;
  2. MultiIndex support for a dataframe's columns. That will come in a later commit; it's more complicated to implement.

from gspread-dataframe.

robin900 avatar robin900 commented on May 26, 2024

I also just noticed that pandas.read_csv interprets multiple header rows in an unexpected way. If you have a spreadsheet like

image

the columns headers are read as a MultiIndex with names ['SQL', 'Query'] and levels ['SQL', 'SQL', 'Misc', 'Misc', 'Misc'], ['Parameters', 'Plan', 'Date', 'Author', 'Misc']. @sam-s Is that what you would expect?

from gspread-dataframe.

sam-s avatar sam-s commented on May 26, 2024

Basically,

| foo | foo |
| bar | baz |
|  1  |  2  |

is treated the same as

|    foo    |
| bar | baz |
|  1  |  2  |

which does seem wrong

from gspread-dataframe.

robin900 avatar robin900 commented on May 26, 2024

It gets even worse if there's an index column with a name, or if there's an index column that's a multi-index. From my local testing, I actually don't see a way for read_csv in pandas to succeed in reading the levels of the non-index columns and the levels of the index column(s) properly.

I do think there's a way to support MultiIndex either for index column when include_index=True, or MultiIndex for dataframe.columns when include_column_header=True. I'll probably have MultiIndex for dataframe.columns implemented by tomorrow.

But not both in the same dataframe, at least not where a round-trip back from Google Sheets produces the same MultiIndexes. I'll implement so that at least a write will succeed, and let pandas.TextParser take the blame for not reading it back properly. :)

from gspread-dataframe.

robin900 avatar robin900 commented on May 26, 2024

Implementation complete, even when include_index=True and index is MultiIndex and columns are MultiIndex. Release 3.0.5 is now on PyPI with this feature. Please confirm that 3.0.5 is working for you!

(The option to have the second display variant needs to wait. I'm going to open another issue just for that and mention you.)

from gspread-dataframe.

sam-s avatar sam-s commented on May 26, 2024

It works! Thank you!

from gspread-dataframe.

robin900 avatar robin900 commented on May 26, 2024

@sam-s If you're interested, try gspread_formatting which offers a good default sheet formatter for dataframes, and with release 0.2.1 today supports MultiIndex index or columns.

from gspread-dataframe.

sam-s avatar sam-s commented on May 26, 2024

I do use gspread_formatting, but I don't like its default behavior.
Here is what I do:

from gspread_dataframe import set_with_dataframe
from gspread_formatting import set_frozen, CellFormat, Color, format_cell_ranges, \
    NumberFormat, TextFormat

def upload_dataframe(workbook, worksheet, df,  percent=lambda s: s.endswith("%")):
    "Upload DataFrame to Google Sheet."
    ws = get_worksheet(get_workbook(workbook),worksheet)
    set_with_dataframe(ws, df, include_index=True, resize=True)
    set_frozen(ws, rows=df.columns.nlevels, cols=df.index.nlevels)
    bold = CellFormat(textFormat=TextFormat(bold=True),
                      backgroundColor=Color(0.9,0.9,0.9))
    pct = CellFormat(numberFormat=NumberFormat(type="percent",pattern="##.##%"))
    ws_cols = list(string.ascii_uppercase)
    if len(ws_cols) < df.shape[1] + df.index.nlevels:
        ws_cols.extend(["%s%s"%p for p in itertools.product(ws_cols,repeat=2)])
    format_cell_ranges(ws, [("A:%s" % (ws_cols[df.index.nlevels-1]), bold),
                            ("1:%d" % (df.columns.nlevels), bold)] +
                       [("%s:%s" % (c1,c1),pct) for c0,c1 in zip(
                           [""]*df.index.nlevels + list(df.columns), ws_cols)
                        if percent(c0)])

from gspread-dataframe.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.