Min-know

An implementation of the ERC-time-ordered-distributable-database (TODD) as a generic library. It can be used to make data TODD-compliant to facilitate peer-to-peer distribution.

Status: prototype

Why does this library exist?

To test out a new database design, where user participation makes the entire database more available.

Questions for you:

  • Do you have data that grows over time and that you would like users to host?
  • Are you providing data as a public good and wondering how to hand it over to the community?

Min-know makes data into an append-only structure that anyone can publish to. Distribution happens like a print publication, where users obtain Volumes as they are released. A user becomes a distributor too.

Volumes contain Chapters that can be obtained separately. This effectively divides the database, making large databases manageable for resource-constrained users.

Principles

📘🔍🐟

To make any database TODD-compliant so that data-users become data-providers.

TODD-compliance is about:

  1. Delivering a user the minimum knowledge that is useful to them.
  2. Delivering a user some extra data.
  3. Making it easy for a user to become a data provider for the next user.

A minnow is a small fish 🐟 that can be part of a larger collective.

End Users

Data is published in Volumes.

📘 - A Volume

Volumes are added over time:

📘 📘 📘 📘 📘 ... 📘 <--- 📘 - All Volumes (published so far).

Volumes have Chapters for specific content. Chapters can be obtained individually.

  • 📘 An example Volume with 256 Chapters
    • 📕 0x00 First Chapter (1st)
    • ...
    • ...
    • 📙 0xff Last Chapter (256th)

A Manifest 📜 exists that lists all Chapters for all Volumes. A manifest simply contains IPFS hashes for data (see example manifests). A user can check the manifest and find which Chapter is right for them. They can ignore the IPFS hashes that don't match their needs.
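A minimal Rust sketch of that lookup, assuming a manifest can be viewed as a flat list of (Volume, Chapter, CID) entries. The `Manifest` and `ChapterEntry` types and their field names are hypothetical, not the min-know manifest schema (see the example manifests for the real format).

```rust
/// One hypothetical manifest entry: the CID of a single Chapter in a single Volume.
#[derive(Debug, Clone, PartialEq)]
pub struct ChapterEntry {
    pub volume_id: String,
    pub chapter_id: String,
    pub cid: String,
}

/// A hypothetical manifest listing every Chapter of every Volume.
#[derive(Debug, Default)]
pub struct Manifest {
    pub entries: Vec<ChapterEntry>,
}

impl Manifest {
    /// Returns the CIDs of one Chapter across all Volumes, in manifest order.
    /// Entries for every other Chapter are ignored, as the user does not need them.
    pub fn cids_for_chapter(&self, chapter_id: &str) -> Vec<&str> {
        self.entries
            .iter()
            .filter(|e| e.chapter_id == chapter_id)
            .map(|e| e.cid.as_str())
            .collect()
    }
}
```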

📜🔍🐟

The user starts with something they know (a key), for example, an address. For every key, only one Chapter will be important.

  • User (🐟) key is an address: 0xf154...f00d.
  • Data is divided into Chapters using the first two hex characters of the address after the 0x prefix (Chapter = 0xf1).
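The key-to-Chapter rule can be sketched in a few lines. `chapter_for_address` and `all_chapter_ids` are hypothetical helpers, not min-know functions; they assume 256 Chapters named `0x00` to `0xff` and keys written as `0x`-prefixed hex.

```rust
/// Maps an address-like key to its Chapter ID using the first two hex
/// characters after the `0x` prefix, e.g. `0xf154...f00d` -> `0xf1`.
/// Returns `None` if the key is not `0x` followed by at least two hex digits.
pub fn chapter_for_address(address: &str) -> Option<String> {
    let hex = address
        .strip_prefix("0x")
        .or_else(|| address.strip_prefix("0X"))?;
    let prefix = hex.get(0..2)?;
    if prefix.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(format!("0x{}", prefix.to_ascii_lowercase()))
    } else {
        None
    }
}

/// All 256 Chapter IDs, `0x00` through `0xff`, in order.
pub fn all_chapter_ids() -> Vec<String> {
    (0u8..=255).map(|b| format!("0x{:02x}", b)).collect()
}
```

Because each key maps to exactly one of 256 Chapters, a user holding one key only ever needs one Chapter per Volume.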

Visually:

  • 📕 0x00
  • ...
  • ...
  • 📗 0xf1 <--- 🐟 0xf154...f00d (user only needs this Chapter)
  • ...
  • ...
  • 📙 0xff

For every published Volume, the user only downloads the right Chapter for their needs. The Min-know library automates this by using the CIDs in the manifest to find files on IPFS.

This means obtaining one Chapter from every Volume that has ever been published. Hence, the user 🐟 only needs about 1/256th of the entire database (assuming data is spread evenly across Chapters).

Once downloaded, the Chapters can be queried for useful information that the database contains.
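As an illustration of querying, the sketch below assumes (hypothetically) that each downloaded Chapter is a sorted map from a key to the values recorded for it in that Volume. `Chapter` and `find_in_chapters` are illustrative names; this is not the library's `find()`.

```rust
use std::collections::BTreeMap;

/// A downloaded Chapter from one Volume, modelled as a sorted map from a key
/// (e.g. an address) to the values recorded for that key in this Volume.
pub struct Chapter {
    pub volume_id: String,
    pub records: BTreeMap<String, Vec<String>>,
}

/// Collects every value stored for `key` across the user's Chapters, paired
/// with the Volume it came from. Chapters that lack the key are skipped.
pub fn find_in_chapters<'a>(chapters: &'a [Chapter], key: &str) -> Vec<(&'a str, &'a str)> {
    let mut hits = Vec::new();
    for chapter in chapters {
        if let Some(values) = chapter.records.get(key) {
            for v in values {
                hits.push((chapter.volume_id.as_str(), v.as_str()));
            }
        }
    }
    hits
}
```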

Optionally, they can also pin their Chapters to IPFS, which makes the data available from more sources.

Interface

Interaction with the library occurs through the Todd struct ([database::types::Todd]) via the following methods:

  • For users:
    • obtain_relevant_data()
    • check_completeness()
    • find()
  • For maintainers:
    • full_transformation()
    • extend()
    • repair_from_raw()
    • generate_manifest()
    • manifest()
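The idea behind a completeness check can be sketched as a set difference: for which Volumes listed in the manifest does the user not yet hold their Chapter? `missing_volumes` is a hypothetical helper, not the implementation of `check_completeness()`.

```rust
use std::collections::HashSet;

/// Returns the Volume IDs listed in the manifest for which the user has no
/// local copy of their Chapter, preserving manifest order.
pub fn missing_volumes(manifest_volumes: &[String], local_volumes: &HashSet<String>) -> Vec<String> {
    manifest_volumes
        .iter()
        .filter(|v| !local_volumes.contains(v.as_str()))
        .cloned()
        .collect()
}
```

An empty result means the local copy is complete with respect to that manifest; any IDs returned are the Chapters still to be obtained.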

Architecture

See ./ARCHITECTURE.md for how this library is structured.

Examples

All examples can be seen with the following command:

cargo run --example

See ./examples/README.md for more information.

Databases

See ./DATABASES.md for different databases that have been implemented in this library.

Database Maintainers

The maintainer methods in the examples are used to create and extend a TODD-compliant database.

This requires having a local "raw" source, which will be different for every data type. The library will use the methods in the ./extraction module to convert the data.

For example:

  • The address-appearance-index is created and maintained from locally available Unchained Index chunk files (produced by trueblocks-core, https://github.com/TrueBlocks/trueblocks-core). They are parsed and reorganised to form the TODD-compliant format.
  • The nametags database is created and maintained by having individual files (one per address) that contain JSON-encoded names and tags.

Other raw formats might be flat files containing data of various kinds.
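A minimal sketch of the reorganisation step, assuming the raw data has already been parsed into key/value records (for example, one nametags file per address). `RawRecord` and `group_into_chapters` are hypothetical; the real conversion lives in the ./extraction module and also assigns records to Volumes, which is omitted here.

```rust
use std::collections::BTreeMap;

/// A hypothetical raw record: a key (such as an address) and a value
/// (such as a name tag) read from the local "raw" source.
pub struct RawRecord {
    pub key: String,
    pub value: String,
}

/// Groups raw records into Chapters by the first two hex characters of the key.
/// Keys that are not `0x`-prefixed hex are skipped. The BTreeMap keeps
/// Chapters ordered from `0x00` to `0xff`.
pub fn group_into_chapters(records: Vec<RawRecord>) -> BTreeMap<String, Vec<RawRecord>> {
    let mut chapters: BTreeMap<String, Vec<RawRecord>> = BTreeMap::new();
    for record in records {
        // Compute an owned Chapter ID first so `record` can be moved afterwards.
        let chapter_id = match record.key.strip_prefix("0x").and_then(|h| h.get(0..2)) {
            Some(p) if p.chars().all(|c| c.is_ascii_hexdigit()) => {
                format!("0x{}", p.to_ascii_lowercase())
            }
            _ => continue,
        };
        chapters.entry(chapter_id).or_default().push(record);
    }
    chapters
}
```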

Extend the library for your data

See ./GETTING_STARTED.md for how to use min-know for a new database.

Manifest coordination using a smart contract

TODD-compliance is about coordination by default (e.g., having a Schelling point for a distributed database).

The manifest contains the CIDs of all the Chapters for a given database. A new manifest is created when a database is updated and new CIDs are added. Old CIDs remain unchanged.
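The rule that old CIDs remain unchanged can be checked mechanically when a new manifest appears. This sketch assumes a manifest can be viewed as a map from (Volume, Chapter) to CID; `ChapterKey` and `changed_entries` are hypothetical names.

```rust
use std::collections::HashMap;

/// Identifies one Chapter of one Volume: (volume_id, chapter_id).
pub type ChapterKey = (String, String);

/// Checks that `new` is a valid successor of `old`: every Chapter listed in
/// `old` must still be present in `new` with the identical CID (new Chapters
/// may be added). Returns the keys whose CIDs changed or disappeared, sorted;
/// an empty result means the update was append-only.
pub fn changed_entries(
    old: &HashMap<ChapterKey, String>,
    new: &HashMap<ChapterKey, String>,
) -> Vec<ChapterKey> {
    let mut bad: Vec<ChapterKey> = old
        .iter()
        .filter(|(k, cid)| new.get(*k) != Some(*cid))
        .map(|(k, _)| k.clone())
        .collect();
    bad.sort();
    bad
}
```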

After creating the manifest, the publisher can post it under their own IPNS name. Anyone who knows this IPNS name can watch for new manifests to be published there.

To broadcast that you are going to publish, you can perform a single transaction to a broadcasting contract (https://github.com/perama-v/GAMB) to record your IPNS name with the topic you wish to publish (the name of the database you are publishing). An example GAMB-compliant contract might look like PublisherRegistry.sol, with the following main functions:

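// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;
contract PublisherRegistry {
    // State and event declarations are inferred from how the functions below use them.
    struct Publisher {
        address submitted_by;
        string ipns;
    }
    string[] public topics;
    mapping(string => Publisher[]) private publisherHashMap;
    event NewPublisher(address submitted_by, string topic, string ipns);
    /// @notice Record the IPNS of a publisher who will publish for a topic.
    /// @dev Maps the given IPNS to the specified topic, appending to existing submissions.
    function registerPublisher(string memory topic, string memory ipns_of_publisher) public {
        topics.push(topic);
        publisherHashMap[topic].push(Publisher({submitted_by: msg.sender, ipns: ipns_of_publisher}));
        emit NewPublisher(msg.sender, topic, ipns_of_publisher);
    }
    /// @notice Gets all publishers for a topic.
    /// @dev Gets the Publishers that are mapped to the given topic string.
    /// @return Returns an array of Publishers.
    function getPublishersForTopic(string memory topic) public view returns (Publisher[] memory) {
        return publisherHashMap[topic];
    }
}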

After this single transaction, you can update your IPNS to the latest manifest hash for free.

The purpose of the contract is two-fold:

  1. Discovery (anyone can find publishers for a topic from a single "meeting point".)
  2. Censorship resistance (no one can stop you from posting your IPNS name to a topic.)

Anyone else can also submit their IPNS name to the contract and publish new Volumes for the database. While not yet implemented, the process of checking that contract, fetching manifests, comparing the CIDs they contain, and coordinating to collaborate on publishing could all be automated.
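One possible automated coordination rule, offered only as a sketch: for a given Chapter, accept the CID reported by a strict majority of the publishers found through the contract. `majority_cid` is hypothetical, and majority voting is an assumption, not a mechanism that the library or GAMB defines.

```rust
use std::collections::HashMap;

/// Given the CID each publisher's manifest reports for one Chapter (`None` if
/// a publisher does not list it), returns the CID backed by a strict majority
/// of all publishers, if any. A strict majority can only ever hold for one CID.
pub fn majority_cid<'a>(reports: &[Option<&'a str>]) -> Option<&'a str> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for cid in reports.iter().flatten() {
        *counts.entry(*cid).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .find(|(_, n)| *n * 2 > reports.len())
        .map(|(cid, _)| cid)
}
```

Counting missing entries against the total means a CID listed by only a minority of publishers is never accepted, which favours caution over availability.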

Pin by default to IPFS

While not implemented in this library, the intent is that end-users of a TODD-compliant database could automatically pin any Chapters they download. This could be an opt-out process, and it could result in most users contributing significantly to the long-term availability of the data.

Frequently Asked Questions

See ./FAQ.md

Contributing

This is a very experimental library that is mostly an exploration for feasibility analysis.

The library is not currently being used to deliver data to real end users, though it is designed to be readily implemented for new databases (see ./GETTING_STARTED.md) that can all share the same core of the library.

Does the idea interest you? A suited for?

  • twitter: @eth_worm
  • github: @perama-v

Raise an issue or say hi ❤

Contributors

perama-v


min-know's Issues

Make the tool general purpose

Problem

The prototype is an implementation of the address appearance index specs. The specs are a specific instantiation of the more general time ordered distributable database framework. Some parts of the prototype would be common to other instantiations and could be shared.

Solution

Generalise the core functions of the prototype so that it could be used for converting and managing other databases.
Make the core functions operate on more abstract terms, following the RecordKey/RecordValue terminology as per perama-v/TODD#2.

Integrate IPFS

Problem

The data structures that min-know creates are distributable over IPFS by design (they are TODD-compliant). Currently the files are not shared over IPFS after a user obtains them.

Solution

Add the ability for a user to share over IPFS after obtaining the Chapters relevant to them.

Perhaps use https://github.com/ferristseng/rust-ipfs-api to pin files to an existing local IPFS node.

Define source data format

Description

For a publisher to build/maintain the signatures database, min-know must ingest data from a raw source. The source must have some format.

Some more background is previously explored here: https://perama-v.github.io/ethereum/protocol/poking/part_8

Resolution

Decide on a format by reviewing available raw data sources. For raw sources that do not already exist in this format, a transformation may be applied prior to ingestion by min-know.

Existing databases

As previously explored: https://perama-v.github.io/ethereum/protocol/poking/part_8

| Name | Entries | Interface | Repo | Database available en masse | API available (if no DB) | Bytes per sig |
|---|---|---|---|---|---|---|
| Etherface | 2.2 million | https://www.etherface.io/statistics | https://github.com/volsa/etherface | ❌ ? | ✅ | 20 |
| 4byte | 0.5-1 million | 4byte.directory (~1M) | https://github.com/ethereum-lists/4bytes (~500K) | ✅ | ✅ | 4 |
| Samczsun's Sigs | ? | https://sig.eth.samczsun.com/ | - | ❌ ? | ✅ | 4 |
| Topic0 | 7_800 | - | https://github.com/wmitsuda/topic0 | ✅ | N/A | 20 |

Formats

The main considerations are the file format and how collisions (two strings producing the same 4-byte signature) are represented.

| Name | Format | Comment |
|---|---|---|
| Etherface (API) | JSON | - |
| 4byte | Flat files with single string | Collisions (n=40/534574) separated with ";" |
| Samczsun's Sigs (API) | JSON | - |
| Topic0 | Flat files with single string | Collisions separated with ";" |

Hence, flat files appear to be the best raw input format. The API-based databases could be transformed into this format if that becomes relevant.
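Parsing such a flat file is straightforward. This sketch assumes one file per selector, with the hex selector taken from the file name and the file body holding one or more signatures separated by ";". `parse_signature_file` and `parse_entry` are hypothetical names.

```rust
/// Splits the body of one flat file into its signatures. A single entry means
/// no collision; several entries (separated by ";") are colliding signatures.
pub fn parse_signature_file(contents: &str) -> Vec<String> {
    contents
        .split(';')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Pairs a normalised (lowercase, no `0x`) selector from the file name with the
/// signatures in the file body. Using the file name as the selector is an assumption.
pub fn parse_entry(file_name: &str, contents: &str) -> (String, Vec<String>) {
    (
        file_name.trim_start_matches("0x").to_ascii_lowercase(),
        parse_signature_file(contents),
    )
}
```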

Parameter names

Some sources (topic0) separate data into with- and without-parameter-name buckets. How should this be handled when ingesting raw data into min-know?

Solidity strips out the parameter names prior to signature computation. Thus a user cannot be sure whether a given set of names is correct. If they have the source code (e.g., via Sourcify), they can be sure, but then they do not need the signature database (they can compute the signatures from the ABI they have).
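To illustrate why names do not affect the selector, the sketch below reduces a human-readable signature to its name-free form. `strip_param_names` is hypothetical and deliberately simplified: it handles flat parameter lists only (no tuples or structs), keeps the first word of each parameter (dropping names and qualifiers such as `indexed` or `memory`), and does not canonicalise aliases such as `uint` to `uint256`.

```rust
/// Removes parameter names from a signature, e.g.
/// `transfer(address to, uint256 amount)` -> `transfer(address,uint256)`.
/// Returns `None` if the string has no well-ordered `(` ... `)` pair.
pub fn strip_param_names(signature: &str) -> Option<String> {
    let open = signature.find('(')?;
    let close = signature.rfind(')')?;
    if close < open {
        return None;
    }
    let name = signature[..open].trim();
    let inner = &signature[open + 1..close];
    // Keep only the type (first word) of each comma-separated parameter.
    let types: Vec<&str> = inner
        .split(',')
        .filter_map(|p| p.split_whitespace().next())
        .collect();
    Some(format!("{}({})", name, types.join(",")))
}
```

Two differently named variants of the same function therefore reduce to one string, which is why names cannot be recovered from the signature alone.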

Leaning toward not distinguishing between them in the database, but undecided.

Open questions

  • Are there more sources available?
  • Of the sources listed that have an API but no database, can the format of their internal data be discovered to make min-know useful for them?
  • Are any of the sources interested in transforming their data with min-know in order to publish it in a distributable form (e.g., according to the signatures spec here: https://github.com/perama-v/TODD/blob/main/example_specs/signatures.md)?

Prevent type mismatches

Description

The library is generic and when initialising a database, the spec must be provided:

let db: Todd<AAISpec> = Todd::init(data_kind, DirNature::Sample)?;

However, you are allowed to pass in a mismatched data_kind:

// Should be this:
// let data_kind = DataKind::AddressAppearanceIndex(Network::default());

// But this is permitted:
let data_kind = DataKind::NameTags;

let db: Todd<AAISpec> = Todd::init(data_kind, DirNature::Sample)?;

Solution

Ideally, have the init() function derive the spec type from the data_kind enum to allow omission of explicit typing.

let db = Todd::init(data_kind, DirNature::Sample)?;
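Rust cannot choose a return type from a runtime enum value, so one possible alternative is to invert the relationship: let each spec type declare its own `DataKind`, and have `init` take only the directory nature. All types below are simplified, hypothetical stand-ins for the ones named in this issue: the real `Todd::init` returns a `Result`, and the `Network::Mainnet` variant is invented for this sketch.

```rust
use std::marker::PhantomData;

// Simplified, hypothetical stand-ins for the min-know types named above.
#[derive(Debug, Clone, PartialEq, Default)]
pub enum Network {
    #[default]
    Mainnet,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DataKind {
    AddressAppearanceIndex(Network),
    NameTags,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DirNature {
    Sample,
}

/// Each spec declares the single DataKind it belongs to, so the kind can no
/// longer be chosen independently of the type parameter.
pub trait DataSpec {
    fn data_kind() -> DataKind;
}

pub struct AAISpec;
pub struct NameTagsSpec;

impl DataSpec for AAISpec {
    fn data_kind() -> DataKind {
        DataKind::AddressAppearanceIndex(Network::default())
    }
}

impl DataSpec for NameTagsSpec {
    fn data_kind() -> DataKind {
        DataKind::NameTags
    }
}

pub struct Todd<S: DataSpec> {
    pub data_kind: DataKind,
    pub dir_nature: DirNature,
    _spec: PhantomData<S>,
}

impl<S: DataSpec> Todd<S> {
    /// The kind is derived from `S`, so a mismatched pair cannot be constructed.
    pub fn init(dir_nature: DirNature) -> Self {
        Todd { data_kind: S::data_kind(), dir_nature, _spec: PhantomData }
    }
}
```

With this design, `Todd::<AAISpec>::init(DirNature::Sample)` (or an annotated `let`) cannot produce a mismatched database. Dropping the type entirely, as in the proposed `Todd::init(data_kind, DirNature::Sample)`, would instead require returning an enum over all databases or a trait object. A runtime `Network` could still be passed to `init` and stored alongside the derived kind.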
