Comments (7)
df["day"] = df["t1"].dt.date
^ this returns a dtype=object column containing Python datetime.date objects. dtype=object columns are loosely typed, and operations on them are not vectorized.
You would either have to live with df["day"] being a datetime instead of a date, or alternatively use pyarrow types for a stricter distinction between date and datetime.
I'm not 100% clear on what your SO question is trying to accomplish since the timedelta you are constructing measures nanosecond differences, but I think this is what you are after:
import pandas as pd
import pyarrow as pa
# assembling year/month/day columns yields a datetime64 Series
s1 = pd.to_datetime(pd.DataFrame({"year": [2015, 2016], "month": [2, 3], "day": [4, 5]}))
df = pd.DataFrame({"t1": s1})
# a strict Arrow date type instead of object-dtype datetime.date values
df["day"] = df["t1"].astype(pd.ArrowDtype(pa.date32()))
# note: without a unit, to_timedelta treats these integers as nanoseconds
df["n"] = pd.to_timedelta(df["t1"].dt.day_of_week)
df["week"] = df["day"] - df["n"]
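If the aim was the Monday that starts each row's week (a guess, since the original question isn't quoted here), the integer day-of-week offset needs unit="D" rather than the nanosecond default. A sketch that stays in datetime64, with made-up timestamps:

```python
import pandas as pd

# made-up timestamps; 2015-02-04 is a Wednesday, 2016-03-05 a Saturday
t1 = pd.Series(pd.to_datetime(["2015-02-04 13:45", "2016-03-05 08:10"]))

# day_of_week runs from Monday=0 to Sunday=6; unit="D" makes these whole days
offset = pd.to_timedelta(t1.dt.day_of_week, unit="D")

# drop the time of day, then step back to Monday
week_start = t1.dt.normalize() - offset
print(week_start)
```

The result can still be cast to pd.ArrowDtype(pa.date32()) afterwards if a strict date type is wanted.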
@jbrockmendel for any other guidance
from pandas.
I'd suggest using a Period[D] dtype. I'd be open to making obj.dt.date return that.
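For reference, a Period[D] column can already be built with .dt.to_period("D"). A sketch with made-up timestamps:

```python
import pandas as pd

t1 = pd.Series(pd.to_datetime(["2015-02-04 13:45", "2016-03-05 08:10"]))

# one Period per calendar day; the time of day is discarded
days = t1.dt.to_period("D")

# adding an integer shifts by whole days and stays vectorized
next_day = days + 1

# back to datetime64 at midnight of each day
midnight = days.dt.start_time
print(days.dtype, next_day.iloc[0], midnight.iloc[0])
```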
That makes sense from a purely pandas perspective. The downside I see is I/O: especially with databases, where DATE and TIMESTAMP are usually distinct types, I'm not sure how solid our Period support would be. With tools like ADBC, the Arrow types are already accounted for.
Not something to solve in this issue per se, just food for larger thought.
I'm pretty sure you've mentioned concerns like that before. How difficult would it be to make Period[D] work like you expect with a database? Is that concern a show-stopper for many users?
Not sure. To be honest, I don't know a ton about the internals there. I'm sure it's possible, but I question whether it's worth the effort when it's already been done by pyarrow.
FWIW, using dtype_backend="pyarrow" with read_csv already returns dates as date32, so that would be something else we'd have to wire into the parsers. date32 is also exclusively a date type; I suppose a Period could represent more things that we would have to handle when serializing outward (e.g. Period[D] may make sense for a DATE database type, but what about Period[Q]?).
Re "when it's already been done by pyarrow":
IIUC, the suggestion you are implicitly making (here and in #58220) is to have obj.dt.date return a date32[pyarrow] dtype. The trouble with this is 1) pyarrow is not required, and 2) it would give users mixed-and-matched null-propagation semantics, which we agreed we needed to avoid when implementing the hybrid string dtype. So for the foreseeable future I just don't see that as a viable option. Period[D] is our de facto date dtype (there has been discussion of making a DateDtype as a thin wrapper around it, but I'm not finding it on the tracker).
FWIW converting a Period[D] PeriodArray to date32[pyarrow] can be done with:
i4vals = arr.view("i8").astype("int32")
dt32 = pa.array(i4vals, type="date32")
(assuming the astype to int32 doesn't overflow)
Right now I think ser.dt.date should only return pa.date32 if the series is pa.timestamp. I agree we don't want to mix those systems, so I see your point about it returning a Period when the series is datetime64.
I'm +/- 0 on that versus encouraging more Arrow date/timestamp usage.
Related Issues
- BUG: Pandas 2 is broken!
- BUG: 2-sided inplace drop loses freq in DatetimeIndex
- BUG: read_orc does not use the provided filesystem for all operations
- BUG: pd.to_datetime fails to identify actual date format
- BUG: eval fails for ExtensionArray
- ENH: Randomised row selection with read_csv()
- BUG: read_parquet converts all digits strings to int
- Make specific pandas dataframe column immuteable / not changeable
- BUG: df.drop_duplicates fails if there is only a single row
- Potential regression with PR "PERF: Eliminate circular references in accessor attributes (#58733)"
- ENH: support parquet's enum type using Categorical when (de)serializing
- ENH: generalize `__init__` on a `dict` to `abc.collections.Mapping` and `__getitem__` on a `list` to `abc.collections.Sequence`
- ENH: Add a Series method which checks whether a Series is constant
- BUG: df.agg with pd.NamedAgg axis=1 unsupported, but errors differently depending on contents of index
- BUG: Segmentation Fault when importing Pandas in python 3.10.14
- BUG: df.agg with df with missing values results in IndexError
- BUG: Groupby transformation (cumsum) output dtype depends on whether NA is among group labels
- DOC: Docstrings missing from .py files in Sphinxext docs folder
- BUG: Lookup by datetime in timestamp index does not work
- DOC: Insufficient Project Background Information