File format (ace.jl issue, closed)

acesuit commented on July 16, 2024
File format

from ace.jl.

Comments (8)

albapa commented on July 16, 2024

I was going to say go with BSON, but then I saw your message on Slack... Maybe read/write is still faster, though.

I think zipping the JSON would be perfect, although I don't know how the parsing performance (of the default libraries) compares to XML.

gabor1 commented on July 16, 2024

So the route that @albapa and I have taken is to write a "fat" file with everything in it, even the original training data, i.e. enough to rerun the training with a future version of the code. The file is structured so that it can be transformed, simply by removing lines, into a "thin" format that contains just enough to evaluate the potential, possibly using restricted versions of the code (in our case, versions of the code newer than the one that wrote the file, but you could be even thinner than that).

We write the fat version by default, because users often don't mind large files and it helps debugging. If a utility is provided to transform fat files into thin files, users don't need to carry around large files when that is a problem. Developers who might be creating a huge number of potential files in a short space of time during development will know how to switch on the thin writer.
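
One way to picture the fat-to-thin transformation is as dropping the retraining-only entries from a keyed record. The sketch below is in Python purely for illustration; the key names (`training_data`, `command_line`, `potential`) and the `to_thin` helper are hypothetical and are not taken from either code base discussed here.

```python
import json

# Hypothetical set of top-level keys that are needed only to rerun training.
FAT_ONLY_KEYS = {"training_data", "command_line"}


def to_thin(fat: dict) -> dict:
    """Return a copy of `fat` without the entries needed only for retraining."""
    return {key: value for key, value in fat.items() if key not in FAT_ONLY_KEYS}


# A made-up "fat" record: evaluation data plus everything used to fit it.
fat = {
    "version": 1,
    "potential": {"coefficients": [0.5, -1.25, 3.0]},
    "training_data": [{"energy": -1.0, "positions": [[0.0, 0.0, 0.0]]}],
    "command_line": "fit --degree 3",
}

thin = to_thin(fat)
print(json.dumps(thin, indent=2))
```

Because the thin record is a strict subset of the fat one, a reader that understands the thin layout can also load a fat file and simply ignore the extra entries.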

cortner commented on July 16, 2024

Ok so that sounds like some form of mixed thin/fat format would be ideal.

gabor1 commented on July 16, 2024

albapa commented on July 16, 2024

We used to use a binary format, which was very fast but a pain in all other respects. Then we went for XML, with CDATA lines for the metadata, i.e. training configurations and command line options. I think this was a very good choice, for the reasons above. We also have the option of companion files to store large numbers of reals, which are slow and cumbersome to read from XML; these are read in C. In the training code, there is an option to omit the training data, which is useful for explorations and quick tests, and for distribution we use the full version.

I guess today we would use a JSON file.
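
The companion-file idea can be sketched as a small text header that describes a raw binary blob of reals. This is a Python illustration under stated assumptions, not the layout used by any of the codes mentioned here; the helper names `split_reals` and `join_reals` and the header fields are hypothetical.

```python
import json
import sys
from array import array


def split_reals(name: str, values: list) -> tuple:
    """Return (JSON header, raw bytes) for a list of double-precision reals."""
    blob = array("d", values).tobytes()
    header = {
        "name": name,
        "dtype": "float64",
        "count": len(values),
        "byteorder": sys.byteorder,  # recorded so a reader can detect a mismatch
    }
    return json.dumps(header), blob


def join_reals(header_json: str, blob: bytes) -> list:
    """Rebuild the list of reals from the header and its companion blob."""
    header = json.loads(header_json)
    if header["byteorder"] != sys.byteorder:
        raise ValueError("companion blob was written with a different byte order")
    values = array("d")
    values.frombytes(blob)
    if len(values) != header["count"]:
        raise ValueError("companion blob does not match the header count")
    return list(values)


header_json, blob = split_reals("coefficients", [0.5, -1.25, 3.0])
restored = join_reals(header_json, blob)
```

The header stays human-readable and diffable, while the bulk data avoids text parsing altogether.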

cortner commented on July 16, 2024

So far I've stored huge amounts of reals in a separate HDF5 file. So similar to your approach.

What's your view on JSON (or XML), compressed as zip when needed?
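
A minimal sketch of the zipped-JSON idea, using only the Python standard library and an in-memory archive. It is an illustration rather than the ace.jl implementation; the member name `potential.json`, the helper names, and the sample data are hypothetical.

```python
import io
import json
import zipfile


def write_zipped_json(data: dict, member: str = "potential.json") -> bytes:
    """Serialize `data` to JSON and store it as one deflated member of a zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, json.dumps(data))
    return buffer.getvalue()


def read_zipped_json(blob: bytes, member: str = "potential.json") -> dict:
    """Read the named member back out of the archive and parse it as JSON."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return json.loads(archive.read(member))


# Made-up payload with many reals, the case where plain JSON gets bulky.
data = {"version": 1, "coefficients": [0.001 * i for i in range(10_000)]}

blob = write_zipped_json(data)
restored = read_zipped_json(blob)
print(len(json.dumps(data)), "bytes as JSON,", len(blob), "bytes zipped")
```

A zip archive can also hold several members, so metadata and bulk arrays could live in separate entries of the same file.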

cortner commented on July 16, 2024

I think that's where I'm going then. Julia has very nice zip format integration via ZipFile.jl.

cortner commented on July 16, 2024

I'm going to close this. Zipped JSON files turn out to be easy to manage in Julia and give exactly the level of flexibility we need.
