<a href="https://pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html#advance

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Basically, <div class="snippet-clipboard-content notranslate position-relative ove

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

hierarchical indexes are uploaded as a single column about gspread-dataframe HOT 10 CLOSED

robin900 commented on May 26, 2024

hierarchical indexes are uploaded as a single column

from gspread-dataframe.

Comments (10)

robin900 commented on May 26, 2024

MultiIndex can be used for a DataFrame's _ columns_ as well. In that case, what behavior would you want to see? A single header row, or one header row for each level in the MultiIndex (which is what I would expect to see)?
The API seems to have changed with version 0.24.0, where "labels" got renamed to "codes". I really strive to have gspread-dataframe work with a wide range of Pandas versions and I strive not to call specific API methods when possible, precisely because of these kinds of API changes. I'm hoping there's a way to avoid referencing labels/codes directly in this package's code.
Please tell me more about how you'd like to see MultiIndex support: any extra parameters or options you'd like to see?

from gspread-dataframe.

sam-s commented on May 26, 2024

MultiIndex can be used for a DataFrame's _ columns_ as well. In that case, what behavior would you want to see? A single header row, or one header row for each level in the MultiIndex (which is what I would expect to see)?

the latter - one header row for each level

The API seems to have changed with version 0.24.0, where "labels" got renamed to "codes". I really strive to have gspread-dataframe work with a wide range of Pandas versions and I strive not to call specific API methods when possible, precisely because of these kinds of API changes. I'm hoping there's a way to avoid referencing labels/codes directly in this package's code.

I understand the predicament you are in - backwards compatibility is important, but I see no reason to keep support for pre-1.0 pandas.

Please tell me more about how you'd like to see MultiIndex support: any extra parameters or options you'd like to see?

Generally speaking, I would like the appearance of the uploaded sheet to be similar to what I see when a DataFrame is printed.
In this case, I would like to have a separate column (with a header!) for each level.

I am torn between

foo one
foo two

and

foo one
    two

I think there should be an option to select one of them, the default being the 1st variant.
The same goes for column indexes.

Thank you for your attention!

from gspread-dataframe.

robin900 commented on May 26, 2024

@sam-s I've pushed a commit to master that offers support for MultiIndex indexes in set_with_dataframe. (MultiIndexes have always been supported via index_col parameter to get_as_dataframe.)

If you test the package code directly from master (I added integration test coverage of the feature, and it passes), and you're happy with it, I can cut a new release today.

There is not yet support for:

the second variant that displays levels without redundant values in higher levels;
MultiIndex support for a dataframe's columns. That will come in a later commit; it's more complicated to implement.

from gspread-dataframe.

robin900 commented on May 26, 2024

I also just noticed that pandas.read_csv interprets multiple header rows in an unexpected way. If you have a spreadsheet like

the columns headers are read as a MultiIndex with names ['SQL', 'Query'] and levels ['SQL', 'SQL', 'Misc', 'Misc', 'Misc'], ['Parameters', 'Plan', 'Date', 'Author', 'Misc']. @sam-s Is that what you would expect?

from gspread-dataframe.

sam-s commented on May 26, 2024

Basically,

| foo | foo |
| bar | baz |
|  1  |  2  |

is treated the same as

|    foo    |
| bar | baz |
|  1  |  2  |

which does seem wrong

from gspread-dataframe.

robin900 commented on May 26, 2024

It gets even worse if there's an index column with a name, or if there's an index column that's a multi-index. From my local testing, I actually don't see a way for read_csv in pandas to succeed in reading the levels of the non-index columns and the levels of the index column(s) properly.

I do think there's a way to support MultiIndex either for index column when include_index=True, or MultiIndex for dataframe.columns when include_column_header=True. I'll probably have MultiIndex for dataframe.columns implemented by tomorrow.

But not both in the same dataframe, at least not where a round-trip back from Google Sheets produces the same MultiIndexes. I'll implement so that at least a write will succeed, and let pandas.TextParser take the blame for not reading it back properly. :)

from gspread-dataframe.

robin900 commented on May 26, 2024

Implementation complete, even when include_index=True and index is MultiIndex and columns are MultiIndex. Release 3.0.5 is now on PyPI with this feature. Please confirm that 3.0.5 is working for you!

(The option to have the second display variant needs to wait. I'm going to open another issue just for that and mention you.)

from gspread-dataframe.

sam-s commented on May 26, 2024

It works! Thank you!

from gspread-dataframe.

robin900 commented on May 26, 2024

@sam-s If you're interested, try gspread_formatting which offers a good default sheet formatter for dataframes, and with release 0.2.1 today supports MultiIndex index or columns.

from gspread-dataframe.

sam-s commented on May 26, 2024

I do use gspread_formatting, but I don't like its default behavior.
Here is what I do:

from gspread_dataframe import set_with_dataframe
from gspread_formatting import set_frozen, CellFormat, Color, format_cell_ranges, \
    NumberFormat, TextFormat

def upload_dataframe(workbook, worksheet, df,  percent=lambda s: s.endswith("%")):
    "Upload DataFrame to Google Sheet."
    ws = get_worksheet(get_workbook(workbook),worksheet)
    set_with_dataframe(ws, df, include_index=True, resize=True)
    set_frozen(ws, rows=df.columns.nlevels, cols=df.index.nlevels)
    bold = CellFormat(textFormat=TextFormat(bold=True),
                      backgroundColor=Color(0.9,0.9,0.9))
    pct = CellFormat(numberFormat=NumberFormat(type="percent",pattern="##.##%"))
    ws_cols = list(string.ascii_uppercase)
    if len(ws_cols) < df.shape[1] + df.index.nlevels:
        ws_cols.extend(["%s%s"%p for p in itertools.product(ws_cols,repeat=2)])
    format_cell_ranges(ws, [("A:%s" % (ws_cols[df.index.nlevels-1]), bold),
                            ("1:%d" % (df.columns.nlevels), bold)] +
                       [("%s:%s" % (c1,c1),pct) for c0,c1 in zip(
                           [""]*df.index.nlevels + list(df.columns), ws_cols)
                        if percent(c0)])

from gspread-dataframe.

hierarchical indexes are uploaded as a single column about gspread-dataframe HOT 10 CLOSED

Comments (10)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent