CSharp_Cantor_and_Szudzik_Pairing

If this project helped you reduce developement time or you just want to help me continue making useful tools

Feel free to buy me a cup of coffee! :)

Overview

This kit was created so programmers could encode a SEQUENCE of numbers (n-tuple) into 1 by using the Szudzik Pairing function (a space saving version of the Cantor Pairing Function [explained below in "Cantor VS Szudik"]) with effective use of C# integral types.

Note that if you want to encode a SET of numbers you can still use this Kit but you would need to do some preliminary steps. Either you

Encode the Set as a Sequence

by sorting your numbers [O(nlogn)]

Encode every possible Sequence that can be created from this Set

by creating every possible sequence [O(2^n)]

Additionally, it expands the capability of both the Szudzik and Cantor Pairing function so you can

combine up to 8 numbers in a sequence into 1 (even though these 2 originally only worked to encode 2 numbers)

using repetition [explained below]

Encode Natural numbers, Integers, and Rational numbers into 1 (even though these 2 originally only worked to encode natural numbers)

using bijections between sets [explained below]

Documentation

Namespace:

pairingKit (at the beginning of your script type 'using pairingKit')

Public Classes

tupleBase (holds the base code used by _2tuple)
_2tuple [for 2 numbers of types (uint or int) or (ushort or short) or (byte or sbyte)]
_3tuple [for 3 numbers of types (ushort or short) or (byte or sbyte)]
_4tuple [for 4 numbers of types (ushort or short) or (byte or sbyte)]
_5tuple [for 5 numbers of types (byte or sbyte)]
_6tuple [for 6 numbers of types (byte or sbyte)]
_7tuple [for 7 numbers of types (byte or sbyte)]
_8tuple [for 8 numbers of types (byte or sbyte)]

Every _nTuple Public Class has many different versions of 2 functions

combine(a,b,...)

combines a sequence of numbers indicated by the class into 1 number
NOTE: the different versions are so that you can encode numbers of different Integral types with each other (within the limits explained below)

reverse(z)

returns the sequence of numbers indicated by the class from 1 number
NOTE: the sequence returned will be an array of the same unsigned type

in some cases you will have to do some unsigned to signed conversion OR some type casting to get back your original numbers (using tupleBase)

Additionally...

ALL (signed or unsigned) nTuples will map to ONE unsigned Integral
ALL unsigned Integrals will map to a unsigned nTuple of the largest size possible

so if your original mapping used either (1) signed numbers (2) different types, you will have to use casting to get back your exact numbers

Encoding and Decoding

Natural Numbers, Integers, and Rational Numbers

The Original Cantor and Szudzik Functions only map 2 Natural Numbers to another Natural Number. (Natural Numbers in this case numbers 0,1,2,3,...) [[N]] (original)

We can also Map 2 Integers to another Integer by simply converting the Integer into a Natural number given its type. So In other words if we want to convert an sbyte (with range -128 to 127) into a byte (with range 0 to 255) we simply add 128 to our sbyte and we are guaranteed to get something within byte range. (integers are numbers …,-2,-1,0,1,2,...) [[I]] (automatic)

Rational Numbers are numbers that can be formed by dividing 2 Integers. Instead of mapping 2 Rational Numbers we could simply map 4 Integers into 1. [[R]] (using _4tuple)

KIT EXPLANATION

Cantor VS Szudik

The Cantor Pairing Function maps 2 Natural Numbers to a Natural Number but it wasn't specifically created to work with computers. Consequently,

mapping 2 [8 bit] numbers requires 1 [32 bit] number

even though all possible combinations of 8 bit numbers sum to 65536 combinations

which is exactly the amount of different numbers a [16 bit] number can represent.

The Szudik Pairing Function is a modification of the Cantor Pairing function where

2 [8 bit] numbers can map to 1 [16 bit] number

2 [16 bit] numbers can map to 1 [32 bit] number

2 [32 bit] numbers can map to 1 [64 bit] number

and so on if a larger integral types existed (but I decided to not use BigInteger)

It was created to use space effectively.

Additionally, performance testing revealed that on Average Cantor Pairing and Szudzik Pairing takes the same amount of Ticks (using the C# "Stopwatch" Class from "System.Diagnostics")

On Average

mapping 2 numbers to 1 takes 3 ticks
mapping 1 number to 2 takes 30 ticks

For that reason even though there is an implementation of Cantor Pairing in the tupleBase class we use Szudzik Pairing for all the main _nTuple Public Classes.

Why It's Useful

This is useful in many scenarios but for the sake of simplicity I will explain how It is useful in a specific scenario.

Scenario:

We have a space of N^K.

Note: a 3D space of 100 by 100 by 100… would be a space of 100^3 where N=100, and K=3

Every Location is a Sequence of Integers and has a set of data assigned to it.
We want to access this information given the location.

There are multiple solutions

Use a Multi-Dimensional Array
Use a nested dictionaries
Use pairing functions multiple times
feel free to suggest another...

As long as (N is small && K is small) Multi-dimensional Arrays work great because they are easy to use, and have constant access time. However when (N is large || K is large) the amount of memory required to store this Multi-dimensional Array becomes ridiculous.

If your N and K are both something ridiculous AND everything in those N^K locations is different… the there is not much you can do… you need a ridiculous amount of memory regardless.

However, if any of those mappings are the same then why would you have a structure with duplicate data or empty data slots? That is where you would use (2) or (3).

Scenario Extended:

some quantity of your data is repeated

Similar to Arrays, Nested Dictionaries have constant access time and are also easy to use. But they are great because even though your N and K are something ridiculous you don't need to have N^K different memory locations. You only need to have 1 copy of every piece of data that is different which is of course is a tremendous space saver. (assuming you only had ([N^K] / 2) different types of data… you only store that much data and not N^K chunks of data… where half of it is either empty or a duplicate)

However, note that I am giving you the K dimensional location that should map to a piece of data for that location but it is not a requirement to save that K dimensional location.

Scenario Extended Again:

don't store data you don't require

Once again, similar to previous solutions, we have ease of use and constant access time but now we can store "less data" by combining the K dimensional location into 1 number when we are given a new K dimensional location and then simply get what the 1 numbers maps to by using the pairingKit.

More Details Below in "Nested Dictionaries VS Pairing Kit"

Nested Dictionaries VS Pairing Kit

Nested Dictionaries are a better option If

your sequence is longer than 8
or if even 1 number in the sequence is larger than the kit allows for that size of sequence

EX 1: you need a sequence of 8 uints or ints
EX 2: you need a sequence of 2 ulongs

Pairing Kit is a better option otherwise because

nested Dictionaries can be a bigger hassle to use
nested Dictionaries would take up more space than just mapping all values into 1 and storing 1 Dictionary with that mapping (based on my limited knowledge)

Note that I believe this is the case for C# but it might not be for another language

How Encoding a Sequence of 2 to 8 Integrals Works

and Custom Encoding

Note that If I am encoding 2 [8 bit] numbers I require a [16 bit] number to store all possible combinations. This is because each 8 bit number can store 256 numbers (0 -> 255). And every possible combination of two [8 bit] numbers is 256*256 = 65,536 combinations which is exactly how many numbers a [16 bit] number can store (0 -> 65,535). So as mentioned previously we can see that...

to encode 2, [8 bit] numbers we need a [16 bit] number
to encode 2, [16 bit] numbers we need a [32 bit] number
to encode 2, [32 bit] numbers we need a [64 bit] number
and so on...

I avoided the use of BigInteger anywhere because:

It would have made this already large simple kit even larger
Technically (from my limited understanding) BigInteger can store arbitrarily large numbers without losing precision so the users of this kit might use it and not realize that in their particular scenario using nested dictionaries would work better

Since I am not using BigInteger, the larger the sequence, the smaller each number in the sequence must be. This is because the largest number I am allowing all the 2->8 numbers to encode into is a ulong which can store a max of 2^64 possibilities.

I illustrate this visually below...

ulong 64 bits

uint | uint 64 bits = 32 bit + 32 bits

ushort | ushort | ushort | ushort 64 bits = 16 bits + 16 bits + 16 bits + 16 bits

Since our largest number after the encoding must fit into 64 bits.

To combine 2 numbers each number

must be of type: (uint or int) or (ushort or short) or (byte or sbyte)
not of type: (ulong or long)

To combine 3 numbers each number

must be of type: (ushort or short) or (byte or sbyte)
not of type: (uint or int) or (ulong or long)

To combine 4 numbers each number

must be of type: (ushort or short) or (byte or sbyte)
not of type: (uint or int) or (ulong or long)

To combine 5 numbers each number

must be of type: (byte or sbyte)
not of type: (uint or int) or (ushort or short) or(ulong or long)

To combine 6 numbers each number

must be of type: (byte or sbyte)
not of type: (uint or int) or (ushort or short) or(ulong or long)

To combine 7 numbers each number

must be of type: (byte or sbyte)
not of type: (uint or int) or (ushort or short) or(ulong or long)

To combine 8 numbers each number

must be of type: (byte or sbyte)
not of type: (uint or int) or (ushort or short) or(ulong or long)

Note that these functions provide what is needed for most scenarios but you can make your own custom functions to handle all kinds of weird scenarios.

For example _4tuple(uint a, byte b, byte c, byte d). For this to work you have to keep in mind how the mappings work. First you can map 'c' and 'd' into a ushort, then you can map 'b' and 'cd' into a uint, then you map 'a' 'bcd' into a ulong. You cannot however map 'a' with 'b' first because 'ab' will be a ulong and a ulong and another other type paired together would require BigInteger.

Additionally, you will have to make a reverse method that unwraps the numbers in the reverse way in which you combined them.

What Is Tested

all automatic casting

between integrals of different size
between signed and unsigned integrals

2 number pairings using sbyte and byte types

Implications

Given that everything is based on 2 pairing, unless I made a typo (which is unlikely since I checked the functions thoroughly) everything should work.

In Either case, encoding random numbers and decoding them and checking if the result is the same as the original given you specific types and such is enough to confirm that it all works.

b-cancel / csharp_cantor_and_szudzik_pairing Goto Github PK