I'm pulling some numbers that are represented as 'length 38 numeric' type in SQL Serve

Option to override numpy dtype for decimals about turbodbc HOT 8 CLOSED

blue-yonder commented on May 20, 2024

Option to override numpy dtype for decimals

from turbodbc.

Comments (8)

MathMagique commented on May 20, 2024

I'm working on a release for this week.

keitherskine commented on May 20, 2024

At the risk of muddying things here, I for one would certainly appreciate it if decimals were returned as Python decimals, rather than floats, regardless of size.

wcbeard commented on May 20, 2024

@keitherskine Interesting, I've seen this behavior with pyodbc, but never found it personally useful since I do most things in the numpy world. As I understand it, numpy doesn't have a special decimal representation and would take a performance hit treating them all as Python objects, right? Is this useful just for preserving the correct amount of precision?
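
To make the precision question concrete, here is a minimal standard-library sketch (not turbodbc code) comparing how a 38-digit integer survives as a float, as a `decimal.Decimal`, and as a plain string. The sample value is made up for illustration.

```python
from decimal import Decimal

# A made-up 38-digit value, as it might come back from a NUMERIC(38, 0) column.
raw = "12345678901234567890123456789012345678"

as_float = float(raw)      # 64-bit float: roughly 15 to 17 significant digits
as_decimal = Decimal(raw)  # arbitrary precision, but a Python object
as_int = int(raw)          # Python ints are arbitrary precision as well

# Converting the float back to an integer does not recover the original digits.
float_is_exact = int(as_float) == as_int
# Decimal keeps every digit of the original text.
decimal_is_exact = str(as_decimal) == raw

print(float_is_exact, decimal_is_exact)
```

The trade-off raised above still applies: the exact representations here are Python objects, not fixed-width numpy values.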

MathMagique commented on May 20, 2024

The current behavior means that pure integers are always retrieved without loss of precision while staying in the world of data-science types. Turbodbc does not use floats for the former reason and does not use decimals for the latter. For the target audience of turbodbc, this still seems like a sensible default.

I agree that the conversion behavior should be configurable, however. I am thinking about options such as large_integer_decimals_as='float' or something.

I might support decimals eventually, since this would open up turbodbc for people in finance. But that would be in another issue ;-).
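
As a rough illustration of what an option like `large_integer_decimals_as` could mean, here is a hypothetical sketch. The function name `convert_large_decimal` and the accepted values `'string'`, `'float'`, and `'decimal'` are assumptions made up for this example; none of this is turbodbc's actual API.

```python
from decimal import Decimal


def convert_large_decimal(raw, large_integer_decimals_as="string"):
    """Convert the text of a large integer decimal.

    Hypothetical helper: the option name mirrors the idea floated in the
    thread, but the function and its accepted values are illustrative only.
    """
    if large_integer_decimals_as == "string":
        return raw            # keep every digit, leave the choice to the caller
    if large_integer_decimals_as == "float":
        return float(raw)     # fixed width, but may round large values
    if large_integer_decimals_as == "decimal":
        return Decimal(raw)   # exact, at the cost of Python objects
    raise ValueError(f"unknown option value: {large_integer_decimals_as!r}")


print(convert_large_decimal("12345678901234567890123456789012345678", "float"))
```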

wcbeard commented on May 20, 2024

"while staying in the world of data-science types"

If I understand it correctly, turbodbc currently treats large decimals as strings. I'm curious what is data-sciency about this type? Is it because it's a relatively low-overhead Python object that preserves precision, or something else?

My current workflow automates just calling .astype(float) on them after I put them in a dataframe (since I don't really make use of precision benefits of large decimals). I'm just wondering if there's a better way I should be handling them if I want to treat them as floats.
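
A plain-Python stand-in for that workflow might look like the sketch below. It assumes the column arrives as strings of integer digits (scale 0), and the helper name `to_floats` is made up here. Besides converting, it reports which values changed, which can help decide whether the float cast is acceptable for a given column.

```python
def to_floats(column):
    """Convert integer-valued decimal strings to floats.

    Returns the floats plus the original strings whose value changed
    during conversion. Assumes every entry is a string of integer digits.
    """
    floats = [float(s) for s in column]
    lossy = [s for s, f in zip(column, floats) if int(f) != int(s)]
    return floats, lossy


# Made-up sample values: a small one, one just above 2**53, and a 38-digit one.
column = ["42", "9007199254740993", "12345678901234567890123456789012345678"]
floats, lossy = to_floats(column)
print(lossy)
```

In a dataframe the conversion itself would still be the `.astype(float)` call described above; the check is only a way to see what it costs.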

keitherskine commented on May 20, 2024

@d10genes Yes, primarily, returning Decimal objects would maintain precision. It's also kinda nice when columns in the database are returned as their closest Python representative, but I appreciate in many cases that's easier said than done.

MathMagique commented on May 20, 2024

Rendering DECIMAL(38, 0) as a string is more data-sciency because it does not lose precision for a type that is often used for categorization, and it lets you do whatever you please with the value. Also, since most data science uses 8-byte types (or smaller ones), I opted for a good solution for anything that fits into 8 bytes of precision (such as DECIMAL(18, 0)).

Your automatic casting seems like a sensible approach, but I am not opposed to giving turbodbc users the option to choose what to do for larger decimals. Alternatively, you could just cast the column to DOUBLE or INTEGER depending on the precision you really need for this field.
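
The cast-in-the-query idea can be sketched with Python's built-in `sqlite3` module as a stand-in for a real ODBC connection. This is only an assumption-laden illustration: the table `t` and column `big` are made up, SQLite stores the value as text here, and the type names accepted by `CAST` differ between databases.

```python
import sqlite3

# In-memory SQLite database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (big TEXT)")
conn.execute("INSERT INTO t VALUES ('12345678901234567890123456789012345678')")

# Cast inside the query so the client receives a floating-point column
# instead of a large decimal.
(value,) = conn.execute("SELECT CAST(big AS REAL) FROM t").fetchone()
conn.close()

print(type(value).__name__, value)
```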

Decimal would maintain precision, that is true. unixodbc offers 128 bits of precision with its NUMERIC and DECIMAL data types. That is sufficient for MSSQL, where NUMERIC(38, 0) seems to be the maximum. It is not sufficient for MySQL, which supports NUMERIC(65, 0), or PostgreSQL (DECIMAL(1000, 0)). In these cases, values need to be retrieved as strings anyway. And I also want to avoid having other types such as DECIMAL(18, 0) retrieved as a DECIMAL, because they fit perfectly in a 64-bit integer without lying at all.
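
The size arguments above can be checked with a little arithmetic. The sketch below counts the bits needed for the largest magnitude of DECIMAL(p, 0), that is 10^p - 1, and compares it with a signed 64-bit integer (63 magnitude bits) and with 128 bits, assuming all 128 bits are available for the magnitude. The helper name `magnitude_bits` is made up for this example.

```python
def magnitude_bits(precision):
    """Bits needed to store the largest value of DECIMAL(precision, 0)."""
    return (10**precision - 1).bit_length()


for precision in (18, 19, 38, 65):
    bits = magnitude_bits(precision)
    fits_int64 = bits <= 63   # signed 64-bit integer: 63 bits of magnitude
    fits_128 = bits <= 128    # assumption: 128 bits, sign stored separately
    print(precision, bits, fits_int64, fits_128)
```

Under these assumptions, 18 digits fit in a signed 64-bit integer while 19 digits do not, 38 digits fit in 128 bits, and 65 digits do not.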

m1racoli commented on May 20, 2024

Is there a timeframe we can expect this to be released?
