Coder Social home page Coder Social logo

Comments (9)

eduard-sukharev avatar eduard-sukharev commented on August 11, 2024 1

Prove me wrong, but the sole purpose of the project is to demonstrate the possibility, not to actually get usable or sane results.
pifs is unusable and in no way viable project, it has no real value apart from academic use, i guess. This project is actually more of a joke, not real life solution. Although it works in some way, the issue you've raised is quite obvious and is a show-stopper.
Increasing the data chunk length leads to increased offset lengths which tend to take more memory than saving. Though, this project might be more usable as cryptography solution of some kind.

from pifs.

kenorb avatar kenorb commented on August 11, 2024 1

Related: Are there any compression algorithms based on PI?

from pifs.

georgy7 avatar georgy7 commented on August 11, 2024

I supposed the author of pifs already had solved this problem (throught an additional bit for example). But really how about some statistics approving pifs viability? I mean ability to compress data.

from pifs.

ddaws avatar ddaws commented on August 11, 2024

It can not consequently be stated the :

Increasing the data chunk length leads to increased offset lengths

This is not always the case. You are just as likely to find a large data chunk in the first several digits of pi as you are anywhere in pi.

from pifs.

georgy7 avatar georgy7 commented on August 11, 2024

You are just as likely to find a large data chunk in the first several digits of pi as you are anywhere in pi.

That's not true. If we have some integer cell for a pi offset, it can direct to a limited count of places in pi. In that part of pi we have a big count of small chunks and a smaller count of bigger chunks. And if there are all possible small chunks, not every small chunk have every possible continuation.

from pifs.

anubisthejackle avatar anubisthejackle commented on August 11, 2024

It's very true that it's mathematically impossible for a single file to be stored using this method and use less memory than it would now. But, what of a cloud solution? Storing individual bits this way, sort of like rainbow tables, and referencing individual bits--or even long strings of bits--which can be compiled by reference.

Granted, this may or may not have the same speed deficiencies that the current code has, but I believe storage wise, storing this meta data would be less burden.

from pifs.

georgy7 avatar georgy7 commented on August 11, 2024

How about this algorithm: when you know exact pi offset B (bignum), you create its escimated value D (as Long exponent of some big base). So you can reverse convert D to bignum and get D'<B.

Then you take a few first bytes of original sequence and look up if the segment D'B contains this little opening subsequence. If it does then you slightly increase this opening subsequence and try again.

Finally you get D (64 bit Long), length of original sequence and initial subsequence of original sequence. So in worst case, Result = Input + 64 bits + ? bits for length.

from pifs.

ddaws avatar ddaws commented on August 11, 2024

@georgy7 I don't think there are less large chunks within a range of pi than their are small chunks.

Consider having n sequential digits of pi and wanting to determine the number of unique sequential subsets of size x. At position 0 the first subset is 0 through to position x, and at position 1 the second subset is 1 through to position x + 1. Thus at position i the subset is i through to position x + i. Subsets may be made from position 0 through to position n - x.

Thus when looking for chunks through the first 10000 digits of Pi a chunk size of :

  • 16 results in 9984 chunks.
  • 32 results in 9968 chunks.

So doubling the chunk size barely reduces the number of potential chunks.

Next consider uniqueness. Larger chunks are more likely to be unique and for this reason I believe their will likely be the same of more unique chunks of a larger size than a lesser.

Experiment :

f = open('PI25K_DP.TXT', 'r')
pi_str = f.read().rstrip()

print(pi_str)
print('digits : {0}'.format(len(pi_str)))


def count_unique_chunks(chunk_size):

    set = {}
    for i, char in enumerate(pi_str):

        if i < len(pi_str) - chunk_size:
            chunk = pi_str[i:i + chunk_size]
            if chunk not in set:
                set[chunk] = True

    num_unique_chunks = len(set.keys())
    print('{0} | {1}'.format(chunk_size, num_unique_chunks))
    return num_unique_chunks


for i in xrange(64):
    count_unique_chunks(i)

You may find PI25K_DP.TXT here.

Results :

Chunk Size Number of Unique Chunks
0 1
1 10
2 100
3 1000
4 9169
5 22170
6 24690
7 24957
8 24987
9 24991
10 24990
11 24989
12 24988
13 24987
14 24986
15 24985
16 24984
17 24983
18 24982
19 24981
20 24980
21 24979
22 24978
23 24977
24 24976
25 24975
26 24974
27 24973
28 24972
29 24971
30 24970
31 24969
32 24968
33 24967
34 24966
35 24965
36 24964
37 24963
38 24962
39 24961
40 24960
41 24959
42 24958
43 24957
44 24956
45 24955
46 24954
47 24953
48 24952
49 24951
50 24950
51 24949
52 24948
53 24947
54 24946
55 24945
56 24944
57 24943
58 24942
59 24941
60 24940
61 24939
62 24938
63 24937

from pifs.

georgy7 avatar georgy7 commented on August 11, 2024

@dreid93
25K of pi digits contains 24937 chunks of length = 63 decimal digits.
But how many chunks of this length can exist. Its 10^63.
Possibility of first 25K of pi contains a random 63 number sequence = 24937 / 10^63.

from pifs.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.