Coder Social home page Coder Social logo

b-cancel / csharp_cantor_and_szudzik_pairing Goto Github PK

View Code? Open in Web Editor NEW
0.0 2.0 0.0 54 KB

Map a sequence or set of (2->8) integrals into 1 unsigned integral and back. Your (2-8) Integrals can be Natural Numbers, Integers, or Rationals.

C# 100.00%
pairing pairing-functions cantor szudzik

csharp_cantor_and_szudzik_pairing's Introduction

CSharp_Cantor_and_Szudzik_Pairing

If this project helped you reduce developement time or you just want to help me continue making useful tools
Feel free to buy me a cup of coffee! :)
PayPal Donate Button

Overview

This kit was created so programmers could encode a SEQUENCE of numbers (n-tuple) into 1 by using the Szudzik Pairing function (a space saving version of the Cantor Pairing Function [explained below in "Cantor VS Szudik"]) with effective use of C# integral types.

Note that if you want to encode a SET of numbers you can still use this Kit but you would need to do some preliminary steps. Either you

  1. Encode the Set as a Sequence
    • by sorting your numbers [O(nlogn)]
  2. Encode every possible Sequence that can be created from this Set
    • by creating every possible sequence [O(2^n)]

Additionally, it expands the capability of both the Szudzik and Cantor Pairing function so you can

  1. combine up to 8 numbers in a sequence into 1 (even though these 2 originally only worked to encode 2 numbers)
    • using repetition [explained below]
  2. Encode Natural numbers, Integers, and Rational numbers into 1 (even though these 2 originally only worked to encode natural numbers)
    • using bijections between sets [explained below]

 

Documentation

Namespace:

  • pairingKit (at the beginning of your script type 'using pairingKit')

Public Classes

  • tupleBase (holds the base code used by _2tuple)
  • _2tuple [for 2 numbers of types (uint or int) or (ushort or short) or (byte or sbyte)]
  • _3tuple [for 3 numbers of types (ushort or short) or (byte or sbyte)]
  • _4tuple [for 4 numbers of types (ushort or short) or (byte or sbyte)]
  • _5tuple [for 5 numbers of types (byte or sbyte)]
  • _6tuple [for 6 numbers of types (byte or sbyte)]
  • _7tuple [for 7 numbers of types (byte or sbyte)]
  • _8tuple [for 8 numbers of types (byte or sbyte)]

Every _nTuple Public Class has many different versions of 2 functions

  • combine(a,b,...)
    • combines a sequence of numbers indicated by the class into 1 number
    • NOTE: the different versions are so that you can encode numbers of different Integral types with each other (within the limits explained below)
  • reverse(z)
    • returns the sequence of numbers indicated by the class from 1 number
    • NOTE: the sequence returned will be an array of the same unsigned type
      • in some cases you will have to do some unsigned to signed conversion OR some type casting to get back your original numbers (using tupleBase)

Additionally...

  • ALL (signed or unsigned) nTuples will map to ONE unsigned Integral
  • ALL unsigned Integrals will map to a unsigned nTuple of the largest size possible
    • so if your original mapping used either (1) signed numbers (2) different types, you will have to use casting to get back your exact numbers

 

Encoding and Decoding

Natural Numbers, Integers, and Rational Numbers

The Original Cantor and Szudzik Functions only map 2 Natural Numbers to another Natural Number. (Natural Numbers in this case numbers 0,1,2,3,...) [[N]] (original)

We can also Map 2 Integers to another Integer by simply converting the Integer into a Natural number given its type. So In other words if we want to convert an sbyte (with range -128 to 127) into a byte (with range 0 to 255) we simply add 128 to our sbyte and we are guaranteed to get something within byte range. (integers are numbers …,-2,-1,0,1,2,...) [[I]] (automatic)

Rational Numbers are numbers that can be formed by dividing 2 Integers. Instead of mapping 2 Rational Numbers we could simply map 4 Integers into 1. [[R]] (using _4tuple)

 

KIT EXPLANATION

Cantor VS Szudik

The Cantor Pairing Function maps 2 Natural Numbers to a Natural Number but it wasn't specifically created to work with computers. Consequently,

mapping 2 [8 bit] numbers requires 1 [32 bit] number

even though all possible combinations of 8 bit numbers sum to 65536 combinations

which is exactly the amount of different numbers a [16 bit] number can represent.

 

The Szudik Pairing Function is a modification of the Cantor Pairing function where

2 [8 bit] numbers can map to 1 [16 bit] number

2 [16 bit] numbers can map to 1 [32 bit] number

2 [32 bit] numbers can map to 1 [64 bit] number

and so on if a larger integral types existed (but I decided to not use BigInteger)

It was created to use space effectively.

 

Additionally, performance testing revealed that on Average Cantor Pairing and Szudzik Pairing takes the same amount of Ticks (using the C# "Stopwatch" Class from "System.Diagnostics")

On Average

  • mapping 2 numbers to 1 takes 3 ticks
  • mapping 1 number to 2 takes 30 ticks

For that reason even though there is an implementation of Cantor Pairing in the tupleBase class we use Szudzik Pairing for all the main _nTuple Public Classes.

 

More on this below

 

 

Why It's Useful

This is useful in many scenarios but for the sake of simplicity I will explain how It is useful in a specific scenario.

Scenario:

  • We have a space of N^K.
    • Note: a 3D space of 100 by 100 by 100… would be a space of 100^3 where N=100, and K=3
  • Every Location is a Sequence of Integers and has a set of data assigned to it.
  • We want to access this information given the location.

There are multiple solutions

  1. Use a Multi-Dimensional Array
  2. Use a nested dictionaries
  3. Use pairing functions multiple times
  4. feel free to suggest another...

As long as (N is small && K is small) Multi-dimensional Arrays work great because they are easy to use, and have constant access time. However when (N is large || K is large) the amount of memory required to store this Multi-dimensional Array becomes ridiculous.

If your N and K are both something ridiculous AND everything in those N^K locations is different… the there is not much you can do… you need a ridiculous amount of memory regardless.

However, if any of those mappings are the same then why would you have a structure with duplicate data or empty data slots? That is where you would use (2) or (3).

Scenario Extended:

  • some quantity of your data is repeated

Similar to Arrays, Nested Dictionaries have constant access time and are also easy to use. But they are great because even though your N and K are something ridiculous you don't need to have N^K different memory locations. You only need to have 1 copy of every piece of data that is different which is of course is a tremendous space saver. (assuming you only had ([N^K] / 2) different types of data… you only store that much data and not N^K chunks of data… where half of it is either empty or a duplicate)

However, note that I am giving you the K dimensional location that should map to a piece of data for that location but it is not a requirement to save that K dimensional location.

Scenario Extended Again:

  • don't store data you don't require

Once again, similar to previous solutions, we have ease of use and constant access time but now we can store "less data" by combining the K dimensional location into 1 number when we are given a new K dimensional location and then simply get what the 1 numbers maps to by using the pairingKit.

More Details Below in "Nested Dictionaries VS Pairing Kit"

 

Nested Dictionaries VS Pairing Kit

Nested Dictionaries are a better option If

  1. your sequence is longer than 8
  2. or if even 1 number in the sequence is larger than the kit allows for that size of sequence
    1. EX 1: you need a sequence of 8 uints or ints
    2. EX 2: you need a sequence of 2 ulongs

Pairing Kit is a better option otherwise because

  • nested Dictionaries can be a bigger hassle to use
  • nested Dictionaries would take up more space than just mapping all values into 1 and storing 1 Dictionary with that mapping (based on my limited knowledge)
    • Note that I believe this is the case for C# but it might not be for another language



How Encoding a Sequence of 2 to 8 Integrals Works

and Custom Encoding

Note that If I am encoding 2 [8 bit] numbers I require a [16 bit] number to store all possible combinations. This is because each 8 bit number can store 256 numbers (0 -> 255). And every possible combination of two [8 bit] numbers is 256*256 = 65,536 combinations which is exactly how many numbers a [16 bit] number can store (0 -> 65,535). So as mentioned previously we can see that...

  • to encode 2, [8 bit] numbers we need a [16 bit] number
  • to encode 2, [16 bit] numbers we need a [32 bit] number
  • to encode 2, [32 bit] numbers we need a [64 bit] number
  • and so on...

I avoided the use of BigInteger anywhere because:

  1. It would have made this already large simple kit even larger
  2. Technically (from my limited understanding) BigInteger can store arbitrarily large numbers without losing precision so the users of this kit might use it and not realize that in their particular scenario using nested dictionaries would work better

Since I am not using BigInteger, the larger the sequence, the smaller each number in the sequence must be. This is because the largest number I am allowing all the 2->8 numbers to encode into is a ulong which can store a max of 2^64 possibilities.

I illustrate this visually below...

                                ulong                                      64 bits

               uint               |                uint                    64 bits = 32 bit + 32 bits

  ushort      |    ushort    |    ushort     |    ushort         64 bits = 16 bits + 16 bits + 16 bits + 16 bits

byte | byte | byte | byte | byte | byte | byte | byte     64 bits = 8 bits * 8

 

Since our largest number after the encoding must fit into 64 bits.

  • To combine 2 numbers each number
    • must be of type: (uint or int) or (ushort or short) or (byte or sbyte)
    • not of type: (ulong or long)
  • To combine 3 numbers each number
    • must be of type: (ushort or short) or (byte or sbyte)
    • not of type: (uint or int) or (ulong or long)
  • To combine 4 numbers each number
    • must be of type: (ushort or short) or (byte or sbyte)
    • not of type: (uint or int) or (ulong or long)
  • To combine 5 numbers each number
    • must be of type: (byte or sbyte)
    • not of type: (uint or int) or (ushort or short) or(ulong or long)
  • To combine 6 numbers each number
    • must be of type: (byte or sbyte)
    • not of type: (uint or int) or (ushort or short) or(ulong or long)
  • To combine 7 numbers each number
    • must be of type: (byte or sbyte)
    • not of type: (uint or int) or (ushort or short) or(ulong or long)
  • To combine 8 numbers each number
    • must be of type: (byte or sbyte)
    • not of type: (uint or int) or (ushort or short) or(ulong or long)

Note that these functions provide what is needed for most scenarios but you can make your own custom functions to handle all kinds of weird scenarios.

For example _4tuple(uint a, byte b, byte c, byte d). For this to work you have to keep in mind how the mappings work. First you can map 'c' and 'd' into a ushort, then you can map 'b' and 'cd' into a uint, then you map 'a' 'bcd' into a ulong. You cannot however map 'a' with 'b' first because 'ab' will be a ulong and a ulong and another other type paired together would require BigInteger.

Additionally, you will have to make a reverse method that unwraps the numbers in the reverse way in which you combined them.

What Is Tested

  • all automatic casting
    • between integrals of different size
    • between signed and unsigned integrals
  • 2 number pairings using sbyte and byte types

 

Implications

Given that everything is based on 2 pairing, unless I made a typo (which is unlikely since I checked the functions thoroughly) everything should work. 

In Either case, encoding random numbers and decoding them and checking if the result is the same as the original given you specific types and such is enough to confirm that it all works.

csharp_cantor_and_szudzik_pairing's People

Contributors

b-cancel avatar

Watchers

 avatar  avatar

csharp_cantor_and_szudzik_pairing's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.