Comments (20)

martindurant commented on August 18, 2024

What is the data you are working with, please? How was it made?

from fastparquet.

venkatsura commented on August 18, 2024

We received it from another vendor. I'm not sure how they are generating it.

venkatsura commented on August 18, 2024

It can be viewed in online viewers, though some of them were not able to open it. I was able to load all 25 files with ADF. If the files were corrupted, I would expect them to fail in every viewer or ETL tool.

martindurant commented on August 18, 2024

The indicated line in the code is reading a Parquet V1 data page, but encoding 5, DELTA_BINARY_PACKED, is a feature of Parquet V2. There's no technical reason V1 pages couldn't be decoded, but the code path does not exist.

venkatsura commented on August 18, 2024

I apologize, I didn't follow. Do you mean my code is referring to v1? Currently I am using fastparquet version 0.8.3.

martindurant commented on August 18, 2024

No, V1 of the parquet spec, which (roughly) corresponds to V1 type data pages, which is what your file has.

venkatsura commented on August 18, 2024

OK. As I understand it, some of the files have been generated with V1 type and some of them are generated with V2 type? Could you please suggest how I can verify this? If that is the case, what is the best approach to load both V1 and V2 type files with the fastparquet library?

venkatsura commented on August 18, 2024

The NotImplementedError you're encountering indicates that fastparquet encountered an encoding that it does not support. The error message specifically mentions "Encoding 5," which is likely a Parquet encoding method or type that fastparquet does not currently handle.

martindurant commented on August 18, 2024

"some of the files have been generated with V1 type and some of them are generated with V2 type"

I suppose so? fastparquet does not expose this information, although I can see it would be useful.

To be sure, it is the combination of v1 and encoding 5 that is the problem. encoding 5 works with v2, and other encodings work with v1.
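
As an illustration of the combination described above, here is a minimal Python sketch. The numeric values mirror the `Encoding` and `PageType` enums in the Parquet format definition. The helper `is_problem_combination` is a hypothetical name introduced here and is not part of fastparquet's API; it only models the statement in this thread (V1 data page plus encoding 5), not the library's full support matrix.

```python
from enum import IntEnum


class PageType(IntEnum):
    """Page types as numbered in the Parquet format definition."""
    DATA_PAGE = 0        # "V1" data page
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3     # "V2" data page


class Encoding(IntEnum):
    """Encodings as numbered in the Parquet format definition."""
    PLAIN = 0
    PLAIN_DICTIONARY = 2
    RLE = 3
    BIT_PACKED = 4
    DELTA_BINARY_PACKED = 5
    DELTA_LENGTH_BYTE_ARRAY = 6
    DELTA_BYTE_ARRAY = 7
    RLE_DICTIONARY = 8
    BYTE_STREAM_SPLIT = 9


def is_problem_combination(page_type: PageType, encoding: Encoding) -> bool:
    """Hypothetical helper: True only for the case discussed in this thread,
    a V1 data page whose values use DELTA_BINARY_PACKED (encoding 5)."""
    return (page_type == PageType.DATA_PAGE
            and encoding == Encoding.DELTA_BINARY_PACKED)


if __name__ == "__main__":
    print(Encoding(5).name)
    print(is_problem_combination(PageType.DATA_PAGE, Encoding(5)))
    print(is_problem_combination(PageType.DATA_PAGE_V2, Encoding(5)))
```

The page type and encoding of each page are recorded in the file's page headers, so any tool that can dump those headers can be checked against this table.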

venkatsura commented on August 18, 2024

Thank you so much.

martindurant commented on August 18, 2024

Given that the decoding algorithm already exists, I'll see if I can carve out a little time to implement it for v1 data pages - but I would need a file to test against.
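
For reference, the DELTA_BINARY_PACKED decoding algorithm can be sketched in pure Python as follows. This is a simplified illustration based on the layout in the Parquet format specification, not fastparquet's implementation. The function names are hypothetical, the input is assumed to hold only the encoded values (no page header or definition levels), and 32/64-bit overflow wrap-around is ignored.

```python
def read_uleb128(buf: bytes, pos: int):
    """Read an unsigned LEB128 varint; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def zigzag_decode(n: int) -> int:
    """Map an unsigned zigzag integer back to a signed integer."""
    return (n >> 1) ^ -(n & 1)


def decode_delta_binary_packed(buf: bytes) -> list:
    """Decode a DELTA_BINARY_PACKED stream into a list of integers."""
    # Header: block size, miniblocks per block, total count, first value.
    block_size, pos = read_uleb128(buf, 0)
    miniblocks_per_block, pos = read_uleb128(buf, pos)
    total_count, pos = read_uleb128(buf, pos)
    raw_first, pos = read_uleb128(buf, pos)
    if total_count == 0:
        return []
    values = [zigzag_decode(raw_first)]
    values_per_miniblock = block_size // miniblocks_per_block

    while len(values) < total_count:
        # Block: min delta, then one bit-width byte per miniblock.
        raw_min, pos = read_uleb128(buf, pos)
        min_delta = zigzag_decode(raw_min)
        bit_widths = buf[pos:pos + miniblocks_per_block]
        pos += miniblocks_per_block
        for width in bit_widths:
            if len(values) >= total_count:
                break  # unneeded trailing miniblocks carry no data
            nbytes = values_per_miniblock * width // 8
            # Deltas are bit-packed least-significant-bit first.
            packed = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
            mask = (1 << width) - 1
            for i in range(values_per_miniblock):
                if len(values) >= total_count:
                    break  # the rest of the miniblock is padding
                delta = (packed >> (i * width)) & mask
                values.append(values[-1] + min_delta + delta)
    return values


if __name__ == "__main__":
    # Block size 128, 4 miniblocks, 12 values, first value 1,
    # min delta 1, all bit widths 0.
    encoded = bytes([0x80, 0x01, 0x04, 0x0C, 0x02, 0x02, 0, 0, 0, 0])
    print(decode_delta_binary_packed(encoded))
```

The sample bytes encode a column that increments by one, so every delta equals the block's minimum delta and the packed miniblocks need zero bits per value.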

venkatsura commented on August 18, 2024

I couldn't find the file attachment option. Can you share your email address?

martindurant commented on August 18, 2024

You can put binary files directly into a comment box here, but maybe it needs to be inside a .zip. Alternatively, you can start a private Gist and invite me if the file is sensitive.

I can't promise when I might get a chance to look into the matter.

venkatsura commented on August 18, 2024

TestFile_Failure.zip

martindurant commented on August 18, 2024

I am seeing BookingTypeIDs incrementing from 1 to 12. Is this right?

Can this test file be included in the repo?

venkatsura commented on August 18, 2024

Yes, that's the file. Please do not include it in the repo.

martindurant commented on August 18, 2024

@venkatsura, can you build and test this on your data, please?

venkatsura commented on August 18, 2024

I am using fastparquet-2023.8.0. I built my code and tested it. It's giving the same error.

venkatsura commented on August 18, 2024

Thank you. It's working. May I know when it will be released publicly? We are going to deploy to Azure, where libraries are installed online with pip.

venkatsura commented on August 18, 2024

FYI, we are doing some testing, and we found that integer values are loading with a negative (-) sign. I will post the details.
