Hey! So just say I find my data at index x . Wouldn

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Wouldn't storing the index into π be as big (or bigger) than the original data? about pifs HOT 9 OPEN

philipl commented on August 11, 2024 1

Wouldn't storing the index into π be as big (or bigger) than the original data?

from pifs.

Comments (9)

eduard-sukharev commented on August 11, 2024 1

Prove me wrong, but the sole purpose of the project is to demonstrate the possibility, not to actually get usable or sane results.
pifs is unusable and in no way viable project, it has no real value apart from academic use, i guess. This project is actually more of a joke, not real life solution. Although it works in some way, the issue you've raised is quite obvious and is a show-stopper.
Increasing the data chunk length leads to increased offset lengths which tend to take more memory than saving. Though, this project might be more usable as cryptography solution of some kind.

from pifs.

kenorb commented on August 11, 2024 1

from pifs.

georgy7 commented on August 11, 2024

I supposed the author of pifs already had solved this problem (throught an additional bit for example). But really how about some statistics approving pifs viability? I mean ability to compress data.

from pifs.

ddaws commented on August 11, 2024

It can not consequently be stated the :

Increasing the data chunk length leads to increased offset lengths

This is not always the case. You are just as likely to find a large data chunk in the first several digits of pi as you are anywhere in pi.

from pifs.

georgy7 commented on August 11, 2024

You are just as likely to find a large data chunk in the first several digits of pi as you are anywhere in pi.

That's not true. If we have some integer cell for a pi offset, it can direct to a limited count of places in pi. In that part of pi we have a big count of small chunks and a smaller count of bigger chunks. And if there are all possible small chunks, not every small chunk have every possible continuation.

from pifs.

anubisthejackle commented on August 11, 2024

It's very true that it's mathematically impossible for a single file to be stored using this method and use less memory than it would now. But, what of a cloud solution? Storing individual bits this way, sort of like rainbow tables, and referencing individual bits--or even long strings of bits--which can be compiled by reference.

Granted, this may or may not have the same speed deficiencies that the current code has, but I believe storage wise, storing this meta data would be less burden.

from pifs.

georgy7 commented on August 11, 2024

How about this algorithm: when you know exact pi offset B (bignum), you create its escimated value D (as Long exponent of some big base). So you can reverse convert D to bignum and get D'<B.

Then you take a few first bytes of original sequence and look up if the segment D'B contains this little opening subsequence. If it does then you slightly increase this opening subsequence and try again.

Finally you get D (64 bit Long), length of original sequence and initial subsequence of original sequence. So in worst case, Result = Input + 64 bits + ? bits for length.

from pifs.

ddaws commented on August 11, 2024

@georgy7 I don't think there are less large chunks within a range of pi than their are small chunks.

Consider having n sequential digits of pi and wanting to determine the number of unique sequential subsets of size x. At position 0 the first subset is 0 through to position x, and at position 1 the second subset is 1 through to position x + 1. Thus at position i the subset is i through to position x + i. Subsets may be made from position 0 through to position n - x.

Thus when looking for chunks through the first 10000 digits of Pi a chunk size of :

16 results in 9984 chunks.
32 results in 9968 chunks.

So doubling the chunk size barely reduces the number of potential chunks.

Next consider uniqueness. Larger chunks are more likely to be unique and for this reason I believe their will likely be the same of more unique chunks of a larger size than a lesser.

Experiment :

f = open('PI25K_DP.TXT', 'r')
pi_str = f.read().rstrip()

print(pi_str)
print('digits : {0}'.format(len(pi_str)))


def count_unique_chunks(chunk_size):

    set = {}
    for i, char in enumerate(pi_str):

        if i < len(pi_str) - chunk_size:
            chunk = pi_str[i:i + chunk_size]
            if chunk not in set:
                set[chunk] = True

    num_unique_chunks = len(set.keys())
    print('{0} | {1}'.format(chunk_size, num_unique_chunks))
    return num_unique_chunks


for i in xrange(64):
    count_unique_chunks(i)

You may find PI25K_DP.TXT here.

Results :

Chunk Size	Number of Unique Chunks
0	1
1	10
2	100
3	1000
4	9169
5	22170
6	24690
7	24957
8	24987
9	24991
10	24990
11	24989
12	24988
13	24987
14	24986
15	24985
16	24984
17	24983
18	24982
19	24981
20	24980
21	24979
22	24978
23	24977
24	24976
25	24975
26	24974
27	24973
28	24972
29	24971
30	24970
31	24969
32	24968
33	24967
34	24966
35	24965
36	24964
37	24963
38	24962
39	24961
40	24960
41	24959
42	24958
43	24957
44	24956
45	24955
46	24954
47	24953
48	24952
49	24951
50	24950
51	24949
52	24948
53	24947
54	24946
55	24945
56	24944
57	24943
58	24942
59	24941
60	24940
61	24939
62	24938
63	24937

from pifs.

georgy7 commented on August 11, 2024

@dreid93
25K of pi digits contains 24937 chunks of length = 63 decimal digits.
But how many chunks of this length can exist. Its 10^63.
Possibility of first 25K of pi contains a random 63 number sequence = 24937 / 10^63.

from pifs.

Wouldn't storing the index into π be as big (or bigger) than the original data? about pifs HOT 9 OPEN

Comments (9)

Experiment :

Results :

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent