Coder Social home page Coder Social logo

ksuid's Introduction

ksuid Go Report Card GoDoc Circle CI

ksuid is an efficient, comprehensive, battle-tested Go library for generating and parsing a specific kind of globally unique identifier called a KSUID. This library serves as its reference implementation.

Install

go get -u github.com/segmentio/ksuid

What is a KSUID?

KSUID is for K-Sortable Unique IDentifier. It is a kind of globally unique identifier similar to a RFC 4122 UUID, built from the ground-up to be "naturally" sorted by generation timestamp without any special type-aware logic.

In short, running a set of KSUIDs through the UNIX sort command will result in a list ordered by generation time.

Why use KSUIDs?

There are numerous methods for generating unique identifiers, so why KSUID?

  1. Naturally ordered by generation time
  2. Collision-free, coordination-free, dependency-free
  3. Highly portable representations

Even if only one of these properties are important to you, KSUID is a great choice! :) Many projects chose to use KSUIDs just because the text representation is copy-and-paste friendly.

For a follow up read on the topic: A brief history of UUID

1. Naturally Ordered By Generation Time

Unlike the more ubiquitous UUIDv4, a KSUID contains a timestamp component that allows them to be loosely sorted by generation time. This is not a strong guarantee (an invariant) as it depends on wall clocks, but is still incredibly useful in practice. Both the binary and text representations will sort by creation time without any special sorting logic.

2. Collision-free, Coordination-free, Dependency-free

While RFC 4122 UUIDv1s do include a time component, there aren't enough bytes of randomness to provide strong protection against collisions (duplicates). With such a low amount of entropy, it is feasible for a malicious party to guess generated IDs, creating a problem for systems whose security is, implicitly or explicitly, sensitive to an adversary guessing identifiers.

To fit into a 64-bit number space, Snowflake IDs and its derivatives require coordination to avoid collisions, which significantly increases the deployment complexity and operational burden.

A KSUID includes 128 bits of pseudorandom data ("entropy"). This number space is 64 times larger than the 122 bits used by the well-accepted RFC 4122 UUIDv4 standard. The additional timestamp component can be considered "bonus entropy" which further decreases the probability of collisions, to the point of physical infeasibility in any practical implementation.

3. Highly Portable Representations

The text and binary representations are lexicographically sortable, which allows them to be dropped into systems which do not natively support KSUIDs and retain their time-ordered property.

The text representation is an alphanumeric base62 encoding, so it "fits" anywhere alphanumeric strings are accepted. No delimiters are used, so stringified KSUIDs won't be inadvertently truncated or tokenized when interpreted by software that is designed for human-readable text, a common problem for the text representation of RFC 4122 UUIDs.

How do KSUIDs work?

Binary KSUIDs are 20-bytes: a 32-bit unsigned integer UTC timestamp and a 128-bit randomly generated payload. The timestamp uses big-endian encoding, to support lexicographic sorting. The timestamp epoch is adjusted to May 13th, 2014, providing over 100 years of life. The payload is generated by a cryptographically-strong pseudorandom number generator.

The text representation is always 27 characters, encoded in alphanumeric base62 that will lexicographically sort by timestamp.

High Performance

This library is designed to be used in code paths that are performance critical. Its code has been tuned to eliminate all non-essential overhead. The KSUID type is derived from a fixed-size array, which eliminates the additional reference chasing and allocation involved in a variable-width type.

The API provides an interface for use in code paths which are sensitive to allocation. For example, the Append method can be used to parse the text representation and replace the contents of a KSUID value without additional heap allocation.

All public package level "pure" functions are concurrency-safe, protected by a global mutex. For hot loops that generate a large amount of KSUIDs from a single Goroutine, the Sequence type is provided to elide the potential contention.

By default, out of an abundance of caution, the cryptographically-secure PRNG is used to generate the random bits of a KSUID. This can be relaxed in extremely performance-critical code using the included FastRander type. FastRander uses the standard PRNG with a seed generated by the cryptographically-secure PRNG.

NOTE: While there is no evidence that FastRander will increase the probability of a collision, it shouldn't be used in scenarios where uniqueness is important to security, as there is an increased chance the generated IDs can be predicted by an adversary.

Battle Tested

This code has been used in production at Segment for several years, across a diverse array of projects. Trillions upon trillions of KSUIDs have been generated in some of Segment's most performance-critical, large-scale distributed systems.

Plays Well With Others

Designed to be integrated with other libraries, the KSUID type implements many standard library interfaces, including:

  • Stringer
  • database/sql.Scanner and database/sql/driver.Valuer
  • encoding.BinaryMarshal and encoding.BinaryUnmarshal
  • encoding.TextMarshal and encoding.TextUnmarshal (encoding/json friendly!)

Command Line Tool

This package comes with a command-line tool ksuid, useful for generating KSUIDs as well as inspecting the internal components of existing KSUIDs. Machine-friendly output is provided for scripting use cases.

Given a Go build environment, it can be installed with the command:

$ go install github.com/segmentio/ksuid/cmd/ksuid

CLI Usage Examples

Generate a KSUID

$ ksuid
0ujsswThIGTUYm2K8FjOOfXtY1K

Generate 4 KSUIDs

$ ksuid -n 4
0ujsszwN8NRY24YaXiTIE2VWDTS
0ujsswThIGTUYm2K8FjOOfXtY1K
0ujssxh0cECutqzMgbtXSGnjorm
0ujsszgFvbiEr7CDgE3z8MAUPFt

Inspect the components of a KSUID

$ ksuid -f inspect 0ujtsYcgvSTl8PAuAdqWYSMnLOv

REPRESENTATION:

  String: 0ujtsYcgvSTl8PAuAdqWYSMnLOv
     Raw: 0669F7EFB5A1CD34B5F99D1154FB6853345C9735

COMPONENTS:

       Time: 2017-10-09 21:00:47 -0700 PDT
  Timestamp: 107608047
    Payload: B5A1CD34B5F99D1154FB6853345C9735

Generate a KSUID and inspect its components

$ ksuid -f inspect

REPRESENTATION:

  String: 0ujzPyRiIAffKhBux4PvQdDqMHY
     Raw: 066A029C73FC1AA3B2446246D6E89FCD909E8FE8

COMPONENTS:

       Time: 2017-10-09 21:46:20 -0700 PDT
  Timestamp: 107610780
    Payload: 73FC1AA3B2446246D6E89FCD909E8FE8

Inspect a KSUID with template formatted inspection output

$ ksuid -f template -t '{{ .Time }}: {{ .Payload }}' 0ujtsYcgvSTl8PAuAdqWYSMnLOv
2017-10-09 21:00:47 -0700 PDT: B5A1CD34B5F99D1154FB6853345C9735

Inspect multiple KSUIDs with template formatted output

$ ksuid -f template -t '{{ .Time }}: {{ .Payload }}' $(ksuid -n 4)
2017-10-09 21:05:37 -0700 PDT: 304102BC687E087CC3A811F21D113CCF
2017-10-09 21:05:37 -0700 PDT: EAF0B240A9BFA55E079D887120D962F0
2017-10-09 21:05:37 -0700 PDT: DF0761769909ABB0C7BB9D66F79FC041
2017-10-09 21:05:37 -0700 PDT: 1A8F0E3D0BDEB84A5FAD702876F46543

Generate KSUIDs and output JSON using template formatting

$ ksuid -f template -t '{ "timestamp": "{{ .Timestamp }}", "payload": "{{ .Payload }}", "ksuid": "{{.String}}"}' -n 4
{ "timestamp": "107611700", "payload": "9850EEEC191BF4FF26F99315CE43B0C8", "ksuid": "0uk1Hbc9dQ9pxyTqJ93IUrfhdGq"}
{ "timestamp": "107611700", "payload": "CC55072555316F45B8CA2D2979D3ED0A", "ksuid": "0uk1HdCJ6hUZKDgcxhpJwUl5ZEI"}
{ "timestamp": "107611700", "payload": "BA1C205D6177F0992D15EE606AE32238", "ksuid": "0uk1HcdvF0p8C20KtTfdRSB9XIm"}
{ "timestamp": "107611700", "payload": "67517BA309EA62AE7991B27BB6F2FCAC", "ksuid": "0uk1Ha7hGJ1Q9Xbnkt0yZgNwg3g"}

OrNil functions

There are times when you are sure your ksuid is correct. But you need to get it from bytes or string and pass it it's to the structure. For this, there are OrNil functions that return ksuid.Nil on error and can be called directly in the structure.

Functions:

  • ParseOrNil()
  • FromPartsOrNil()
  • FromBytesOrNil()

An example of using the function without OrNil:

func getPosts(before, after []byte) {
	b, err := ksuid.FromBytes(before)
	if err != nil {
		// handle error
	}

	a, err := ksuid.FromBytes(after)
	if err != nil {
		// handle error
	}

	sortOptions := SortOptions{Before: b, After: a}
}

It is much more convenient to do it like this:

func getPosts(before, after []byte) {
	sortOptions := SortOptions{
		Before: ksuid.FromBytesOrNil(before),
		After:  ksuid.FromBytesOrNil(after),
	}
}

OrNil functions are also used in many other libraries:

Implementations for other languages

License

ksuid source code is available under an MIT License.

ksuid's People

Contributors

achille-roussel avatar aerostitch avatar alphawong avatar anthonyfok avatar codelingoteam avatar deleplace avatar eilgin avatar fabiolimace avatar gearnode avatar mbyczkowski avatar nghialt avatar olivoil avatar pilatuz avatar rbranson avatar sanva avatar steve-warren avatar tasn avatar timonwong avatar tmthrgd avatar toffaletti avatar v1def avatar wdbetts avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ksuid's Issues

KSUID from string?

Is it possible to get KSUID object back from string representation (KSUID.String())?

KSUID order not predictable

Hey there

The README states this:

In short, running a set of KSUIDs through the UNIX sort command will result in a list ordered by generation time.

I have the following code:

func TestKsuidOrder(t *testing.T) {
	k1 := ksuid.New()
	k2 := ksuid.New()
	assert.Equal(t, -1, ksuid.Compare(k1, k2))
}

Running it like this go test ./ -v -count=100 -run=TestKsuidOrder results in

    ksuid_test.go:31: 
                Error Trace:    ./ksuid_test.go:31
                Error:          Not equal: 
                                expected: -1
                                actual  : 1
                Test:           TestKsuidOrder

I was expecting that ksuid.New is generating sequences which can be naturally ordered?

If this is the expected behaviour which I see - then I guess the README should be more clear about that.

Is it possible to generate KSUID from SQL?

Hi,
To set DEFAULT value for an PK of table, I wonder if has anyone tried to generate a valid ksuid value by using SQL.

For example for UUIDv4, we can generate default value if uuid-ossp extension is not installed for PostrgreSQL as:

`CREATE TABLE IF NOT EXISTS table (
   id VARCHAR(36) DEFAULT md5(random()::text || clock_timestamp()::text)::uuid  NOT NULL,
....

Of course it'll be better to generate from application but, I it makes simpler to add default value for manual interactions with tables.

Has anyone tried to develop an SQL function to geneate ksuid?

Min/Max KSUID private

Making min/max public would simplify use cases where list of documents is queried by ksuid from given time range.

Passing a timestamp before the epoch to FromParts does not return an error

It seems that it wraps around rather than returning an error.

https://go.dev/play/p/s2aA3-ERd-E

Uses timestamp 2000-01-01T00:00:00.00Z and zeros for the payload.
Returns: WfeXVK2UN8Qh86bSI6R4zNvC7ge

$ ksuid -f inspect WfeXVK2UN8Qh86bSI6R4zNvC7ge

REPRESENTATION:

  String: WfeXVK2UN8Qh86bSI6R4zNvC7ge
     Raw: E4FAF58000000000000000000000000000000000

COMPONENTS:

       Time: 2136-02-07 01:28:16 -0500 EST
  Timestamp: 3841652096
    Payload: 00000000000000000000000000000000

Add go modules compatible tags

Could you please guys add new tags or retag the existing ones to be compatible with vgo/go modules?

Instead of 1.0.0 it should be v1.0.0, same for 1.0.1 -> v1.0.1.

What happens after 100 years?

Readme says that the timestamp provides over 100 years of life. What happens after 100 years? Does it wrap and use the same timestamp around 100 years ago? Will it break index logicality or other things?
Moving 4 bits from the random part to the timestamp would make it 2000 years and wouldn't change much the library guarantees so I wonder why not do it?

[Request] Add Go mod support

This makes the list of dependencies clearer even if none exists.

It also prevents the +incompatible suffix that is added in downstream projects dependent on this.

ksuid.Parse sometimes panics instead of returning an error

When parsing a 27-long string that is not a valid ksuid, it will panic instead of returning an error. For instance ksuid.Parse("aaaaaaaaaaaaaaaaaaaaaaaaaaa") will panic.

The signature func Parse(s string) (KSUID, error) seems to imply that any string that can't successfully be parsed should return an error, and not panic. I think that should be the expected behavior at least.

Maybe the fix would be to update Parse or fastDecodeBase62 to not panic? In any case, if there's a valid reason for wanting it to panic, the documentation should be updated so it doesn't come as a surprise.

Does ksuid support for Postgres

I'm very exciting about ksuid. But when I want create a new table on postgres with primary key is ksuid. So what kind of data type should I set for ksuid ? How about performance of it in big query ?
Beause with UUID, it's supported by data type is uuid.
Thank you so much

Differences with MongoDB ObjectID

I want to understand the differences and advantages of using KSUID over MongoDB ObjectID. The ObjectID is constructed with first 4 bytes as timestamp which provides a very similar behaviour of being able to sort by _id field.

Or the use case for this library a similar tool but for non-mongodb environments and general usage?

Discussion: Replace rander with factory to allow pooling instances

The idea is to avoid the lock in NewRandomWithTime by allowing instances of the rander to be pooled perhaps in a sync.Pool instance.

I don't see any particular throughput issues but highly contended use could be affected by the global lock.

I could very well be missing some other way to instantiate ids but afaik the New method should be peoples starting point right?

Go's default random number generators are not safe for concurrent use?

Hi, this is an observation, not a bug.

ksuid.go says

Go's default random number generators are not safe for concurrent use by multiple goroutines, the use of the rander and randBuffer are explicitly synchronized here.

Thus it proceeds to lock its own randMutex.

I have the intuition that crypto/rand.Reader would be broken if it wasn't safe for concurrent use, and that actually it is safe. This seems confirmed by the implementations in rand_unix.go and rand_windows.go which already use their own mutex.

I was curious if removing randMutex would improve the benchmark numbers, but it doesn't.

Still, randMutex might be unnecessary.

KSUIDs do not sort properly.

On the Linux command line and in PostgreSQL, ksuids do not sort properly.

  1. 2SmasRGkif9qETAHwssB4GGYwdi can be converted to the timestamp 2023-07-19 04:13:36 -0400 EDT
  2. 2STcKMQ7MGgoDxpX9A5WuaJlx8V can be converted to the timestamp 2023-07-12 10:59:06 -0400 EDT

However, PostgreSQL and the Linux sort command both sort 2SmasRGkif9qETAHwssB4GGYwdi lower than 2STcKMQ7MGgoDxpX9A5WuaJlx8V, because evidently, all lower-case letters sort before capital letters.

I'm assuming that the ksuid algorithm uses the logic that capital letters have a lower ASCII value:

Python 3.11.2 (main, May 30 2023, 17:45:26) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> ord('A')
65
>>> ord('a')
97
>>> 

I love the project and use ksuids as much as possible, but this issue has bitten me at work and been the cause of some subtle bugs.
Is it impossible to fix in a backwards-compatible way?

KSUID is not nanosecond-aware

The go source code shows that a wall time for time.Time is in fact 8 bytes, a uint64. Unlike how it's defined in this library to be 4 bytes, a uint32. This means we're throwing out the nanoseconds when generating the timestamps. Is there a particular reason for this? This breaks ordering for me where items are made within the same second.

Parse accepts argument containing non-alphanumeric characters

Parse is happy to accept some arguments that contain non-alphanumeric characters.

Here's an example:

$ go run ./cmd/ksuid 0??????????????????????????
022222222222222222222222222

Is this by design? (I was expecting an error for an argument that isn't valid base62.)

cmd template should use text/template to avoid escaping

The ksuid command uses html/template here instead of text/template. html/template escapes the output and corrupts it.

This is the result of running the ksuid command:

$ ksuid -f template -t '{{ .Time }}: {{ .Payload }}' $(ksuid)
2018-10-17 00:04:57 +1030 ACDT: 956449DBD1047A4E0F7963973073EEF2

+ should be +.

Add Support for Configurable Future Epoch Stamp for Generating Decreasing KSUIDs

Background

Currently, the ksuid library generates KSUIDs that are naturally ordered by generation time. While this ordering is suitable for many use cases, there are scenarios where it would be advantageous to generate KSUIDs in decreasing order, with the latest keys appearing first. This is particularly valuable in situations where managing a large volume of data is involved, and there's a significant performance penalty for using databases that do not support descending indexes or where the order by operation is resource-intensive. By enabling the generation of decreasing KSUIDs, users can optimise database operations and enhance overall system performance, especially in scenarios with substantial data processing requirements.

Proposal

I propose extending the ksuid library to add support for configuring a future epoch stamp, which would result in generating decreasing KSUIDs based on generation time. By allowing users to specify a future epoch stamp or just being hardcoded to e.g. 2160, the library would subtract this value from the current Unix timestamp during KSUID generation, effectively producing KSUIDs that decrease over time.

Benefits

  • Allows users to generate KSUIDs that are naturally ordered in decreasing order based on generation time.
  • Enhances the flexibility and usability of the ksuid library for a wider range of use cases.
    Additional Considerations:

I would be delighted to contribute the functionality for generating KSUIDs in decreasing order to the library. Currently, to achieve this, we would need to maintain a fork of the library. Adding this feature directly to the main library would hopefully help a few folks out there.

Would you be interested in incorporating this feature into the library?

Analysis of hash distribution of ksuids.

Has any analysis of common hash distribution of ksuid values been done? There exists use cases where you want the time ordering for client side uses of ids but on the server side use the ids for distributed storage in a cluster data store. In this case a typical strategy is to hash the ids and use that to determine a shard within the datastore for that entity.

Does the timestamp prefixing cause hashes of ksuids to cluster under common hash functions, e.g. MD5, various FNV or SHA algorithms etc?

Customizable epochStamp for KSUIDs Generated from Past Dates

I'm working on a project that involves migrating records older than 10 years. My plan was to use KSUIDs with the original creation dates of these records. However, the hardcoded epochStamp starting in 2017 prevents this. I relied on the internal KSUID timestamp to query specific database partitions, which I can't do now due to this limitation (I could still do it but one partition where old records fall would be more populated that the rest).

Would it be possible to allow customization of epochStamp through a PR? If not, should I reconsider using KSUIDs for historical records?

Wrong epoch date in README?

Maybe I'm misunderstanding something, but, isn't the date in README wrong?

I mean, where it says

The timestamp epoch is adjusted to March 5th, 2014 [...]

, shouldn't it be May 13th, 2014 ? Given that

go run ./ksuid -f inspect 000000000000000000000000000

outputs

REPRESENTATION:

  String: 000000000000000000000000000
     Raw: 0000000000000000000000000000000000000000

COMPONENTS:

       Time: 2014-05-13 18:53:20 +0200 CEST
  Timestamp: 0
    Payload: 00000000000000000000000000000000

SemVer releases

Now that the Go ecosystem started to lean towards using semantic versioning, using git tags for this package might help people utilizing the new way in their applications. Also, gives the impression that this package is production ready.

What do you think?

Support for gin query param unmarshaling?

Gin just added a new interface called BindUnmarshaler that allows types to define a UnmarshalParam method which lets them be used as query parameters.

Implementing this would allow binding KSUID API query params like this:

type queryParams struct {
  ID KSUID `form:"id"`
}

route.GET("/test", func(ctx *gin.Context) {
  var params queryParams{}
  _ = ctx.BindQuery(&params)
 entity, err := GetEntity(params.ID)
 ...
})

I think all that is necessary is adding this method:

func (i *KSUID) UnmarshalParam(s string) error {
	return i.UnmarshalText([]byte(s))
}

I'm happy to open a PR for it if that would be helpful.

Is KSUID safe to use as a Login Session Id

The readme says "KSUIDs use 128-bits of pseudorandom data" I am wondering why PRNG and not a TRNG? My actual question is would KSUID be safe to use as a Login Session Id?

Add CI

There's no CI on this repository, it would be wise and safe to add it.

GUID comparison

Greetings,

I am compiling a GUID summary sheet which tries to aggregate different dimensions with the intent to help engineers pick the right generator.

Would you be kind enough to help me validate the KSUID column and point to any issues?

Thank you

image

[Question]: I know that is not possible to use more than 16 bit payload, but what's the reason?

If there's no technical problem could make it configurable? Like:

// It will return a configurable instance.
instance := ksuid.NewInstance(ksuid.Config{
   MaxPayloadLength: 32,
})

And if possible a method to replace the global instance like golang zap:

instance := ksuid.NewInstance(ksuid.Config{
   MaxPayloadLength: 32,
})

// instance will create a 32 bit payload while ksuid.New() will create a 16 bit payload.
// to avoid that we could do something like:

undo := ksuid.ReplaceGlobals(instance)
defer undo()

// here the ksuid and instance are both configurable in the same way so I could call 
// ksuid.New() or instance.New() and both will create a 32 bit payload.

Unable to use a zero value KSUID with a sql driver/database

I'd like to pass all 0s to sql for querying, for example:

SELECT *
FROM users
WHERE id > '000000000000000000000000000';

This is not possible currently because this library treats a zero value ksuid as nil: https://github.com/segmentio/ksuid/blob/master/ksuid.go#L139-L144

func (i KSUID) Value() (driver.Value, error) {
	if i.IsNil() {
		return nil, nil
	}
	return i.String(), nil
}

Is there something I'm missing on how to pass 0s to sql or is it not possible? I'd like to avoid having to manage my own type which simply shadows the Value func if possible.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.