Comments (7)

jdmarshall commented on September 15, 2024

I like it. Make the file descriptor a list of 'inodes' and I think you have something there. Reduces the cost of appending to the end of the file.
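
A rough sketch of what that could look like in Python (hypothetical names, not pifs's actual layout): the file descriptor becomes a list of (offset, length) extents into pi's digit stream, so an append just pushes new extents without touching the old ones.

    # Hypothetical sketch: a pifs-style descriptor as a list of
    # (offset, length) "extents" into pi's digit stream. Appends never
    # rewrite existing extents, which is what makes them cheap.
    from dataclasses import dataclass, field

    @dataclass
    class PiInode:
        extents: list[tuple[int, int]] = field(default_factory=list)

        def append_extent(self, offset: int, length: int) -> None:
            self.extents.append((offset, length))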

typecasto commented on September 15, 2024

It seems like this would just be a tradeoff between compression speed and compression ratio. Since this is a novelty program, I personally think it'd have more use if the ratio were higher; at least then there would be SOME practical use for it, in cases where transfer speed between computers is a heavy bottleneck (floppy drives). If write speed is the bottleneck, there are much, much better filesystems and compression algorithms that satisfy that niche, but a heavy compression algorithm has many more use cases imo.

jdmarshall commented on September 15, 2024

Oh gosh, I completely forgot about this project.

I think it's a difference of opinion rooted in whether you're thinking about files at rest or files in motion. Encoding a file in pifs could take an arbitrarily long time if you represent the file as a single offset, and during that time you can report no progress to an interested party.

If you made the file contents into a series of offsets (leaving aside Hamming distance and correction codes for a moment), then you could stream the encoding at creation time. This could mean writing a new file on disk (moving from memory to disk) or lazily encoding a file for transport to another system. Additionally, as I mentioned before, appending to a file is cheap, because the common prefix of the new and old file can be represented by the same bytes.
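
A minimal Python sketch of that streaming scheme (my names and chunk size, not pifs's actual format): split the data into small fixed-size chunks, locate each chunk independently in a precomputed digit string, and yield offsets one at a time, so progress can be reported and an append only has to encode the new tail.

    # Sketch of streaming pifs-style encoding. Assumes pi_digits is a long
    # string of pi's decimal digits computed elsewhere (e.g. with mpmath).
    CHUNK = 2  # digits per chunk; tiny, so each search stays fast

    def encode_stream(data: str, pi_digits: str):
        """Yield one offset per chunk, so callers can report progress."""
        for i in range(0, len(data), CHUNK):
            offset = pi_digits.find(data[i:i + CHUNK])
            if offset < 0:
                raise ValueError("chunk not found in the available digits")
            yield offset

    def decode(offsets: list[int], length: int, pi_digits: str) -> str:
        """Reassemble the original digit string from per-chunk offsets."""
        parts = []
        for n, offset in enumerate(offsets):
            take = min(CHUNK, length - n * CHUNK)
            parts.append(pi_digits[offset:offset + take])
        return "".join(parts)

Appending then reuses every offset already emitted for the old content and only searches for the new chunks.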

typecasto commented on September 15, 2024

I'd honestly say that pi-based compression is better suited to a program like 7-zip than to a filesystem, but I'd gladly sacrifice speed for a heavy compression algorithm.

It seems to me (just based on intuition) that as the space required to store the location in pi increases linearly, the amount of data it can encode increases exponentially. (This may not be the case, or the break-even point might require way more than any sensible person could store, or take longer than anyone could actually be expected to wait.)
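
That intuition is cheap to test. A sketch (assuming mpmath is installed; the pattern lengths and sample count are mine): measure the first-occurrence offset of random k-digit patterns in the decimal expansion. Empirically the mean offset grows roughly like 10^k, i.e. the address grows as fast as the data it points to, which foreshadows the pigeonhole objection below.

    # Experiment: how deep into pi's digits is the first occurrence of a
    # random k-digit pattern? Assumes mpmath; 100k digits takes a moment.
    import random
    from mpmath import mp

    mp.dps = 100_000
    pi_digits = mp.nstr(mp.pi, 100_000).replace(".", "")

    random.seed(0)
    for k in range(1, 5):
        samples = ["".join(random.choices("0123456789", k=k)) for _ in range(100)]
        offsets = [pi_digits.find(s) for s in samples]
        found = [o for o in offsets if o >= 0]  # -1 = not in our 100k digits
        print(f"k={k}: mean first offset ~ {sum(found) / len(found):,.0f}")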

While this might have some application, I doubt that write speed would ever be very good, and it might not even be feasible for large amounts of data.

Though, error correction COULD be used to vastly decrease storage size: find a partial match early in pi, then correct the errors from there. I'd say that if we have an n-byte file and we find a match that has more than n/2 bytes correct, we generate error correction data based on that match and then write the location, ECC data, and length to a file; in some cases that could even be shorter than the file itself.
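
To make that concrete, here is a toy Python version of the idea (my names, and a literal patch list standing in for real ECC): scan the early digits for the window that best matches the data, then store the offset plus explicit corrections for the positions that differ.

    # Toy partial-match encoder: pick the best window among the first
    # search_limit offsets, then record a patch list for the mismatches
    # (a stand-in for real error correction data).
    def encode_with_patches(data: str, pi_digits: str, search_limit: int = 10_000):
        n = len(data)
        best_offset, best_score = 0, -1
        for offset in range(min(search_limit, len(pi_digits) - n + 1)):
            window = pi_digits[offset:offset + n]
            score = sum(a == b for a, b in zip(data, window))
            if score > best_score:
                best_offset, best_score = offset, score
        window = pi_digits[best_offset:best_offset + n]
        patches = [(i, c) for i, (c, w) in enumerate(zip(data, window)) if c != w]
        return best_offset, n, patches

    def decode_with_patches(offset: int, length: int, patches, pi_digits: str) -> str:
        chunk = list(pi_digits[offset:offset + length])
        for i, c in patches:
            chunk[i] = c
        return "".join(chunk)

Each patch costs an index plus the corrected digit, though, so this only wins when the best match agrees on far more positions than the ~10% that chance alone would give.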

I might honestly just write my own version at this point. Thoughts?

mmhobi7 commented on September 15, 2024

From my experience, when I tried to store a small image, the pi-compressed file was dramatically larger than the file before compression. It could be the luck of the draw, but it could also be that the index value has to be massive to accommodate the data.

typecasto commented on September 15, 2024

That's why the error correction data would be added: so we could use a match at a much lower offset that still agrees with a majority of the file, storing less data overall.

jdmarshall commented on September 15, 2024

Pigeonhole principle. If you want a library with an infinite number of books (which is what you're doing here), then the catalog is also infinite. The number on a book's spine will on average be the same size as the contents of the book: shorter in some cases, longer in others. Compression just exploits the fact that some sequences of bytes are gibberish that is not useful to human beings, so gibberish gets the longer numbers and comprehensible data the shorter ones. The Library of Congress has standards that don't include admitting books full of gibberish, so they only have to index books that make some sense. Finnegans Wake notwithstanding.
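
To put rough numbers on that (my arithmetic, not from the thread): there are 10^6 distinct 6-digit strings, and in a random digit stream a specific one first appears, on average, around offset 10^6, an address that itself takes about 7 digits to write down. In general a k-digit "book" gets a roughly k-digit catalog number.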

But pi is irrational; its digit stream is gibberish by definition. You're trying to find one sensible sequence of bytes in a sea of random noise. Fuzzing makes a smaller file only if you can approximate the output with a simpler pattern, essentially predicting what a more boring version of the file would look like. There is no simpler pattern here. The file is always going to take as many bytes as it takes.

The counterproposal I made doesn't improve space either. In fact it's a little worse. I'm constrained by the same laws of Information Theory as the rest of you. But it would improve time, and quite substantially.

But as we're essentially talking about BogoSort for files, there's not enough beer in this conversation I'm afraid :)
