Comments (8)
I'm working on a release for this week.
from turbodbc.
At the risk of muddying things here, I for one would certainly appreciate it if decimals were returned as Python decimals, rather than floats, regardless of size.
@keitherskine Interesting, I've seen this behavior with pyodbc, but never found it personally useful since I do most things in the numpy world. As I understand it, numpy doesn't have a special decimal representation and would take a performance hit treating them all as Python objects, right? Is this useful just for preserving the correct amount of precision?
The current behavior means that pure integers are always retrieved without loss of precision while staying in the world of data-science types. Turbodbc does not use floats for the former reason and does not use decimals for the latter. For the target audience of turbodbc, this still seems like a sensible default.
I agree that the conversion behavior should be configurable, however. I am thinking about options such as large_integer_decimals_as='float' or something.
I might support decimals eventually, since this would open up turbodbc for people in finance. But that would be in another issue ;-).
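To make the idea of a configurable conversion concrete, here is a minimal sketch of what such a choice could look like when applied by hand after fetching. The helper name `convert_large_decimals` and its `as_type` values are hypothetical and are not part of turbodbc's API; the sketch assumes the large decimals arrive as strings.

```python
from decimal import Decimal


def convert_large_decimals(values, as_type="string"):
    """Convert string-encoded decimals to the requested Python type.

    Hypothetical post-processing helper, not a turbodbc function.
    """
    converters = {"string": str, "float": float, "decimal": Decimal}
    if as_type not in converters:
        raise ValueError(f"unsupported target type: {as_type!r}")
    convert = converters[as_type]
    # Keep NULLs (None) untouched, convert everything else.
    return [None if value is None else convert(value) for value in values]


# Made-up sample column, as a DECIMAL(38, 0) column might be returned.
column = ["10000000000000000000000000000000000001", None, "7"]
as_floats = convert_large_decimals(column, as_type="float")
as_decimals = convert_large_decimals(column, as_type="decimal")
```

Only the "decimal" and "string" targets keep every digit; the "float" target trades precision for a native numeric type.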
"while staying in the world of data-science types"
If I understand it correctly, turbodbc currently treats large decimals as strings. I'm curious what is data-sciency about this type? Is it because it's a relatively low overhead python object that preserves precision or something else?
My current workflow automates just calling .astype(float) on them after I put them in a dataframe (since I don't really make use of precision benefits of large decimals). I'm just wondering if there's a better way I should be handling them if I want to treat them as floats.
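For reference, a small sketch of what is given up when string-encoded decimals are cast to float. It uses only the standard library rather than a dataframe, and the sample values are made up for illustration.

```python
from decimal import Decimal

# Made-up values, as a DECIMAL(38, 0) column might arrive as strings.
raw = ["12345678901234567890123456789012345678", "42"]

as_float = [float(value) for value in raw]
as_decimal = [Decimal(value) for value in raw]

# Decimal(float) yields the exact value stored in the float, so comparing it
# with the original shows whether the float conversion was lossless.
float_is_exact = [Decimal(f) == d for f, d in zip(as_float, as_decimal)]
```

Here `float_is_exact` comes out as `[False, True]`: small values survive the cast, while the 38-digit value is rounded to the nearest representable double.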
@d10genes Yes, primarily, returning Decimal objects would maintain precision. It's also kinda nice when columns in the database are returned as their closest Python representative, but I appreciate in many cases that's easier said than done.
Rendering DECIMAL(38, 0) as a string is more data-sciency because it does not lose precision for a type that is often used for categorization, and enables you to do whatever you please. Also, since most data science uses 8-byte types (or smaller ones), I opted for a good solution for anything that fits into 8 bytes of precision (such as DECIMAL(18, 0)).
Your automatic casting seems like a sensible approach, but I am not opposed to giving turbodbc users the option to choose what to do for larger decimals. Alternatively, you could just cast the column to DOUBLE or INTEGER, depending on the precision you really need for this field.
Decimal would maintain precision, that is true. unixodbc offers 128 bits of precision with its NUMERIC and DECIMAL data types. That is sufficient for MSSQL, where NUMERIC(38, 0) seems to be the maximum. It is not sufficient for MySQL, which supports NUMERIC(65, 0), or PostgreSQL (DECIMAL(1000, 0)). In these cases, values need to be retrieved as strings anyway. I also want to avoid having other types such as DECIMAL(18, 0) retrieved as a DECIMAL, because such a value fits perfectly in a 64-bit integer without lying at all.
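The digit limits mentioned above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `max_decimal_digits` is made up, and it assumes signed two's-complement integers of the given width.

```python
def max_decimal_digits(bits: int) -> int:
    """Largest p such that every p-digit integer fits in a signed integer of `bits` bits."""
    limit = 2 ** (bits - 1) - 1  # largest positive signed value
    digits = 0
    # 10**(digits + 1) - 1 is the largest integer with digits + 1 decimal digits.
    while 10 ** (digits + 1) - 1 <= limit:
        digits += 1
    return digits


digits_64 = max_decimal_digits(64)    # 18, so DECIMAL(18, 0) fits in a 64-bit integer
digits_128 = max_decimal_digits(128)  # 38, matching the NUMERIC(38, 0) maximum
```

Under that assumption, anything wider than 38 digits, such as NUMERIC(65, 0), cannot be represented in 128 bits and has to travel as a string.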
Is there a timeframe we can expect this to be released?