leighmcculloch commented on August 22, 2024

If we define this new type in the XDR, which means it also exists in the stellar-xdr crate, we can hand-roll From and TryFrom conversions in both directions, so that it is trivial to go from ScSmallVal => ScVal (infallible) and from ScVal => ScSmallVal (fallible).

Can we structure these types so that ScSmallVal is XDR bit-for-bit compatible with ScVal too, and write tests confirming such?
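Here is a minimal sketch of what those hand-rolled conversions and the bit-for-bit test could look like. The enums are hypothetical, cut-down stand-ins, not the real stellar-xdr definitions. The encoder is hand-written in XDR style (4-byte big-endian discriminant, length-prefixed and zero-padded opaque) rather than generated code. The point it illustrates is that if the small union reuses the full union's discriminants and arm encodings, every ScSmallVal encodes to exactly the bytes of the corresponding ScVal.

```rust
use std::convert::TryFrom;

/// Hypothetical, cut-down stand-in for ScVal (not the stellar-xdr definition).
#[derive(Debug, Clone, PartialEq)]
pub enum ScVal {
    U64(u64),
    Bytes(Vec<u8>),
    Vec(Vec<ScVal>),
}

/// Hypothetical "small" subset: same arms and discriminants, but no containers.
#[derive(Debug, Clone, PartialEq)]
pub enum ScSmallVal {
    U64(u64),
    Bytes(Vec<u8>),
}

// Discriminants shared by both unions; sharing them is what lines the encodings up.
const TAG_U64: u32 = 0;
const TAG_BYTES: u32 = 1;
const TAG_VEC: u32 = 2;

/// Widening is infallible: every small value is also a full value.
impl From<ScSmallVal> for ScVal {
    fn from(v: ScSmallVal) -> Self {
        match v {
            ScSmallVal::U64(n) => ScVal::U64(n),
            ScSmallVal::Bytes(b) => ScVal::Bytes(b),
        }
    }
}

/// Narrowing is fallible: containers have no small equivalent.
impl TryFrom<ScVal> for ScSmallVal {
    type Error = ScVal; // hand the rejected value back to the caller
    fn try_from(v: ScVal) -> Result<Self, Self::Error> {
        match v {
            ScVal::U64(n) => Ok(ScSmallVal::U64(n)),
            ScVal::Bytes(b) => Ok(ScSmallVal::Bytes(b)),
            other => Err(other),
        }
    }
}

/// XDR-style variable-length opaque: u32 length, the bytes, zero padding to 4.
fn put_opaque(out: &mut Vec<u8>, b: &[u8]) {
    out.extend_from_slice(&(b.len() as u32).to_be_bytes());
    out.extend_from_slice(b);
    out.resize(out.len() + (4 - b.len() % 4) % 4, 0);
}

pub fn encode_val(v: &ScVal, out: &mut Vec<u8>) {
    match v {
        ScVal::U64(n) => {
            out.extend_from_slice(&TAG_U64.to_be_bytes());
            out.extend_from_slice(&n.to_be_bytes());
        }
        ScVal::Bytes(b) => {
            out.extend_from_slice(&TAG_BYTES.to_be_bytes());
            put_opaque(out, b);
        }
        ScVal::Vec(items) => {
            out.extend_from_slice(&TAG_VEC.to_be_bytes());
            out.extend_from_slice(&(items.len() as u32).to_be_bytes());
            for item in items {
                encode_val(item, out);
            }
        }
    }
}

pub fn encode_small(v: &ScSmallVal, out: &mut Vec<u8>) {
    match v {
        ScSmallVal::U64(n) => {
            out.extend_from_slice(&TAG_U64.to_be_bytes());
            out.extend_from_slice(&n.to_be_bytes());
        }
        ScSmallVal::Bytes(b) => {
            out.extend_from_slice(&TAG_BYTES.to_be_bytes());
            put_opaque(out, b);
        }
    }
}
```

Returning the rejected ScVal as the TryFrom error is one design choice. A dedicated error type would work equally well.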

from stellar-protocol.

sisuresh commented on August 22, 2024

Yeah, I think this makes sense.

cc @tomerweller and @graydon because they've mentioned this before.

graydon commented on August 22, 2024

My interest in this has mostly to do with the fact that the same general issue has come up at the ltx and bucket layers (and anything that trades in LedgerKeys: footprints, the Strife algorithm, etc.): we're probably going to want to impose a size limit on the SCVal keys under which we store CONTRACT_DATA ledger entries (LEs).

That said, I think the restriction currently written in CAP-0056 is too tight for this scenario. Eliminating all vecs and maps is no good. We're going to have composite keys (small vecs at least, and probably also small maps; think small UDTs); we just want to make sure they're not "too big". I'd guess something in the ballpark of 256 to 2048 bytes.

If we try to pick a limit for this more systematically: assume people are using UDT keys, and the field values are the largest reasonable scalar, 32 bytes. Then a key/val pair is 8+32=40 bytes and a small set of them (say "up to 8"?) fits in 320 bytes. People might not flatten their UDTs either, so there might be a degree of nesting -- a map-of-maps -- which adds a little additional SCVal structure-tag framing, but probably no more than 8*8=64 more bytes. This makes me think a limit of, say, 512 bytes serialized would probably be plenty, and 1kb would provide a decent amount of future-proofing.
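The estimate above can be written out as plain arithmetic. Every constant is the comment's own assumption (8-byte field names, 32-byte scalars, up to 8 fields, 64 bytes of nesting overhead), not a protocol value, and the names are illustrative.

```rust
/// Back-of-envelope key-size estimate; every constant is an assumption from the discussion.
pub const FIELD_NAME_BYTES: usize = 8; // per-field key, e.g. a short symbol
pub const SCALAR_BYTES: usize = 32; // largest "reasonable" scalar value
pub const FIELDS: usize = 8; // "up to 8" fields per UDT
pub const NESTING_OVERHEAD: usize = 8 * 8; // extra tag framing for a map-of-maps

/// One key/value pair of a UDT encoded as a map.
pub const fn pair_bytes() -> usize {
    FIELD_NAME_BYTES + SCALAR_BYTES
}

/// A flat UDT with FIELDS pairs.
pub const fn flat_udt_bytes() -> usize {
    FIELDS * pair_bytes()
}

/// The same UDT with some nesting overhead added on top.
pub const fn nested_udt_bytes() -> usize {
    flat_udt_bytes() + NESTING_OVERHEAD
}
```

Under these assumptions the nested case stays comfortably below the suggested 512-byte limit.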

Another approach would be to continue what we're doing now and limit the value size not by total encoded bytes, but by per-element structural limits: binaries up to 32 bytes only, vecs and maps up to (say) 4 elements each, and at most one level of nesting (containers of containers). Then your worst case is a map-of-maps-of-32-byte-scalars, something like 4 * 4 * 48 = 768 bytes. That's also reasonable, but you have to set the branching-factor limit low enough that it removes a degree of freedom users would otherwise have in how they allocate the key (i.e. they can't use those 768 bytes for a single 700-byte binary, or a single vec of 64 8-byte symbols, or whatever).
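Below is a sketch of the per-element structural check described above. It runs over a hypothetical simplified value tree, not the real SCVal. The limits (32-byte binaries, 4 elements per container, one level of containers-of-containers) are the comment's example numbers. KeyError, check, and the depth convention are illustrative names.

```rust
/// Hypothetical simplified value tree (not the real SCVal).
#[derive(Debug, Clone)]
pub enum Val {
    Scalar(Vec<u8>), // any binary or scalar payload
    Vec(Vec<Val>),
    Map(Vec<(Val, Val)>),
}

// Example limits taken from the comment; all are placeholders.
pub const MAX_BINARY: usize = 32;
pub const MAX_ELEMS: usize = 4;
pub const MAX_CONTAINER_DEPTH: usize = 2; // top-level container plus one nested level

#[derive(Debug, PartialEq)]
pub enum KeyError {
    BinaryTooLong,
    TooManyElems,
    TooDeep,
}

/// Checks every element against the per-element limits.
/// `depth` is the number of containers enclosing `v` (0 for the key itself).
pub fn check(v: &Val, depth: usize) -> Result<(), KeyError> {
    match v {
        Val::Scalar(b) => {
            if b.len() > MAX_BINARY { Err(KeyError::BinaryTooLong) } else { Ok(()) }
        }
        Val::Vec(items) => {
            check_container(items.len(), depth)?;
            items.iter().try_for_each(|item| check(item, depth + 1))
        }
        Val::Map(pairs) => {
            check_container(pairs.len(), depth)?;
            pairs.iter().try_for_each(|(k, val)| {
                check(k, depth + 1)?;
                check(val, depth + 1)
            })
        }
    }
}

/// A container may not sit too deep, nor hold too many elements.
fn check_container(len: usize, depth: usize) -> Result<(), KeyError> {
    if depth >= MAX_CONTAINER_DEPTH {
        return Err(KeyError::TooDeep);
    }
    if len > MAX_ELEMS {
        return Err(KeyError::TooManyElems);
    }
    Ok(())
}
```

Each rule is local and easy to test in isolation. The cost is the one the comment names: a key cannot trade fanout for a single larger binary, even when the total size would be the same.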

I expect @jonjove will also have some thoughts here, in terms of "the kinds of composite keys we can imagine users wanting to use to access separate, uncontended LEs under".

(And I should emphasize that giving users "more flexibility than the bare minimum" is key to allowing them to tune storage-overhead amortization and parallel contention in their program, to make it run well.)

leighmcculloch commented on August 22, 2024

> This makes me think a limit of, say, 512 bytes serialized would probably be plenty, and 1kb would provide a decent amount of future-proofing.

I'm leaning towards defining the limits at each component rather than on the final serialized form. Otherwise the operation won't fail until serialization, which means it may consume a decent amount of gas unnecessarily. I also feel it becomes harder for a developer to predict whether they are going to exceed the limit, especially given that they don't regularly interact with the serialized form inside a contract.
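One way to get the early, predictable failure argued for here is to enforce the limit when each key component is constructed. SmallBytes, TooLarge, and the 32-byte cap below are hypothetical names and values, not existing SDK types.

```rust
use std::convert::TryFrom;

/// Placeholder per-component limit.
pub const SMALL_BYTES_MAX: usize = 32;

/// Hypothetical bounded key component: the only way to build one is `try_from`,
/// so an oversized value is rejected here, not later at serialization time.
#[derive(Debug, Clone, PartialEq)]
pub struct SmallBytes(Vec<u8>);

#[derive(Debug, PartialEq)]
pub struct TooLarge {
    pub len: usize,
    pub max: usize,
}

impl TryFrom<Vec<u8>> for SmallBytes {
    type Error = TooLarge;
    fn try_from(b: Vec<u8>) -> Result<Self, TooLarge> {
        if b.len() > SMALL_BYTES_MAX {
            Err(TooLarge { len: b.len(), max: SMALL_BYTES_MAX })
        } else {
            Ok(SmallBytes(b))
        }
    }
}

impl SmallBytes {
    pub fn as_slice(&self) -> &[u8] {
        &self.0
    }
}
```

The error also carries the limit it violated, which is the kind of predictable, documentable failure a developer can check for up front.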

Anything that makes a contract unpredictable reduces the reliability of the contract and risks causing the contract to become unrecoverable.

jonjove commented on August 22, 2024

Basically all of the keys in contracts I've written are UDTs, and some of them have several levels of nesting. For example, https://github.com/stellar/soroban-token-contract/blob/2e0b985df858ccecb5dbc1da54d02066b0208265/src/storage_types.rs#L13-L14 is a UDT-enum (DataKey) containing a UDT-struct (AllowanceKey) containing UDT-enums (Identifier). If/when we implement stellar/rs-soroban-sdk#494 this can be flattened one level by removing AllowanceKey.

Being able to support complex keys seems pretty important. A size limit seems reasonable, but restricting the layout might have consequences. That being said, I get @leighmcculloch's concern about only failing at serialization time.

I recall that @graydon said long ago that he doesn't want people to make their keys into hashes (@graydon correct me if I misremember or if this stance has changed). But if the key limitations don't allow developers to fit the data that they want in, they will probably resort to cryptographic hashing.

graydon commented on August 22, 2024

@jonjove makes a good point: the escape hatch here is "the user can always just hash their keys if they're too big". Some systems hash keys by default (e.g. Ethereum), and while I think their level of hashing is overkill, it's probably not a bad option for some instances on Soroban too. I think my only objections to doing this by default are:

  • Diagnostic meaning is lost -- you can't dump the ledger entries associated with a contract and browse its structure.
  • Range queries are permanently prohibited. We probably never want to allow range queries anyway (it's unlikely to be possible to retrofit them into the concurrency control story), but making all keys hashes would paint them out of the picture entirely.
  • Any possibility of spatial locality is gone. It's not super likely, but if you had (for example) a token contract with thousands of "user accounts" and you keyed its LEs by [user, some-subkey], then when we send keys to the IO layer for bulk reading at the beginning of a txn, the IO layer would be able to scoop up all the LEs with subkeys under user in fewer page-reads than if it had to read "one page-worth of the ledger for every LE", as you'd get with hashing.
  • Any other subsetting/sharding or locality consideration that might be apparent at a lower-level interface (like say horizon-light / RPC servers) is probably also shot, but that's even more speculative.
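The locality point in the third bullet can be illustrated with an ordered set standing in for an ordered key store. Composite [user, subkey] keys sort so that one user's entries are adjacent, while ordering the same keys by a hash does not preserve that grouping. DefaultHasher is a non-cryptographic stand-in for whatever hash would actually be used, and the function names are illustrative.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Composite key of the form [user, some-subkey]; tuples order lexicographically.
pub type CompositeKey = (String, String);

/// Builds every (user, subkey) pair in sorted order, as an ordered store would hold them.
pub fn composite_keys(users: &[&str], subkeys: &[&str]) -> BTreeSet<CompositeKey> {
    users
        .iter()
        .flat_map(|u| subkeys.iter().map(move |s| (u.to_string(), s.to_string())))
        .collect()
}

/// Non-cryptographic stand-in for a key hash (real systems would use e.g. SHA-256).
pub fn hashed(k: &CompositeKey) -> u64 {
    let mut h = DefaultHasher::new();
    k.hash(&mut h);
    h.finish()
}

/// The same keys, ordered by their hash instead of their structure.
pub fn hash_order(keys: &[CompositeKey]) -> Vec<CompositeKey> {
    let mut v = keys.to_vec();
    v.sort_by_key(hashed);
    v
}

/// Positions of all of `user`'s keys within `keys`.
pub fn positions_of(user: &str, keys: &[CompositeKey]) -> Vec<usize> {
    keys.iter()
        .enumerate()
        .filter(|(_, k)| k.0 == user)
        .map(|(i, _)| i)
        .collect()
}
```

In structural order, one user's keys form a single contiguous run. Sorting the same keys with hash_order will generally interleave users, so fetching all of one user's entries is no longer a single contiguous range.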

How would others feel about a type (say SCKey) which is a union between Val(SCVal) and Hash(u256), plus a function that projects an SCVal into it by measuring the serialized size and hashing if that size is over some threshold? Too mysterious?
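Here is a sketch of the proposed projection, assuming the caller already has the SCVal's XDR encoding as bytes. ScKey, project, and the injected 32-byte hash function are hypothetical. Injecting the hash keeps the sketch independent of any particular crypto crate.

```rust
/// Hypothetical SCKey: the encoded value verbatim, or a 256-bit hash of it.
#[derive(Debug, Clone, PartialEq)]
pub enum ScKey {
    Val(Vec<u8>),   // XDR-encoded SCVal stored as-is
    Hash([u8; 32]), // e.g. a SHA-256 digest of that encoding
}

/// Keeps small encodings verbatim and hashes anything longer than `threshold` bytes.
pub fn project<H>(encoded: Vec<u8>, threshold: usize, hash: H) -> ScKey
where
    H: Fn(&[u8]) -> [u8; 32],
{
    if encoded.len() > threshold {
        ScKey::Hash(hash(&encoded))
    } else {
        ScKey::Val(encoded)
    }
}
```

Which variant a value lands in depends only on its encoded length. That is exactly the implicit, size-dependent behavior that off-chain consumers would have to replicate.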

graydon commented on August 22, 2024

(I should also note: I would prefer not to use such an SCKey for the keys in Map since that type does support range queries)

leighmcculloch commented on August 22, 2024

> SCKey -- which is a union between Val(SCVal) and Hash(u256) and a function that projects an SCVal into it by measuring the serial size, and hashing if the serial size is over some threshold? Too mysterious?

This is too magical and mysterious. It may produce stable keys, but keys that are inconsistent across the global set (some stored verbatim, some hashed), which will be difficult for off-chain systems to build against.

We already provide everything a developer needs to do hashing. All types in the SDK can be serialized to XDR bytes, and there's a hashing function that a developer can use to hash the key.

We could introduce a very clear way to signal in the SDK that a key should be hashed rather than stored verbatim, such as wrapping the value in another type that makes that happen. I think we should experiment with this in the SDK before adding anything to the env / host to support it.
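The opt-in wrapper idea can be sketched in plain Rust: wrapping a key in Hashed asks for it to be stored by digest, while a bare key is stored verbatim. Hashed, StorageKey, and IntoStorageKey are hypothetical, not existing SDK types. Keys are simplified to strings, and DefaultHasher stands in for the cryptographic hash a contract would actually use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical opt-in marker: "key this entry by a digest of the inner value".
pub struct Hashed<T>(pub T);

#[derive(Debug, PartialEq)]
pub enum StorageKey {
    Verbatim(String),
    Digest(u64), // real code would use a 32-byte cryptographic hash
}

pub trait IntoStorageKey {
    fn into_storage_key(self) -> StorageKey;
}

/// A bare key is stored exactly as given.
impl IntoStorageKey for String {
    fn into_storage_key(self) -> StorageKey {
        StorageKey::Verbatim(self)
    }
}

/// A wrapped key is stored by digest, regardless of its size.
impl<T: Hash> IntoStorageKey for Hashed<T> {
    fn into_storage_key(self) -> StorageKey {
        let mut h = DefaultHasher::new();
        self.0.hash(&mut h);
        StorageKey::Digest(h.finish())
    }
}
```

Because the choice is visible in the type at the call site, nothing about which form a given key takes is size-dependent or implicit.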

If we continue to support non-hashed keys, the bigger issue is how we make it really clear what is and isn't supported as a key. Probably just really well-defined docs, keeping the definition understandable and testable, i.e. not based on total serialized bytes.

github-actions commented on August 22, 2024

This issue is stale because it has been open for 30 days with no activity. It will be closed in 30 days unless the stale label is removed.

jayz22 commented on August 22, 2024

The issue of limiting ScVal size came up today during a discussion with @graydon about host budget metering for storage. Currently the storage key is a LedgerKey, which may be a ContractData key containing an ScVal, so we would have to limit the size of the storage key by using a size-limited ScVal.
One way to limit the size is to specify the maximum total number of ScObjects the ScVal can contain across all levels, and the maximum size of a single ScObject (since objects are the only points of expansion in the "val tree"). This bounds the maximum size of an ScVal without restricting its structural layout (nesting), and thus gives flexibility.
With this approach, though, we are only limiting the size indirectly: two ScVals may both be at the limit yet have different sizes (due to the different number of objects they contain).
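Here is a sketch of the "maximum number of objects, maximum size per object" check. It runs over a hypothetical simplified value tree in which immediates never count as objects, while bytes, vecs, and maps do. A container's own size is taken as 12 bytes per element slot. That figure is an assumption borrowed from the 12-byte SCVal used elsewhere in this discussion. LimitError, Limits, and check are illustrative names.

```rust
/// Hypothetical val tree: immediates are fixed-size; the other variants are objects.
#[derive(Debug, Clone)]
pub enum Val {
    Immediate(u64),
    Bytes(Vec<u8>),       // object
    Vec(Vec<Val>),        // object
    Map(Vec<(Val, Val)>), // object
}

/// Assumed encoded size of one val slot inside a container.
pub const SLOT_BYTES: usize = 12;

/// Shallow size of one object (payload bytes or direct slots); None for immediates.
fn own_size(v: &Val) -> Option<usize> {
    match v {
        Val::Immediate(_) => None,
        Val::Bytes(b) => Some(b.len()),
        Val::Vec(items) => Some(items.len() * SLOT_BYTES),
        Val::Map(pairs) => Some(pairs.len() * 2 * SLOT_BYTES),
    }
}

#[derive(Debug, PartialEq)]
pub enum LimitError {
    TooManyObjects,
    ObjectTooLarge,
}

pub struct Limits {
    pub max_objects: usize,
    pub max_object_bytes: usize,
}

/// Walks the whole tree, counting objects at every level into `count`.
pub fn check(v: &Val, lim: &Limits, count: &mut usize) -> Result<(), LimitError> {
    if let Some(sz) = own_size(v) {
        *count += 1;
        if *count > lim.max_objects {
            return Err(LimitError::TooManyObjects);
        }
        if sz > lim.max_object_bytes {
            return Err(LimitError::ObjectTooLarge);
        }
    }
    match v {
        Val::Vec(items) => items.iter().try_for_each(|i| check(i, lim, count)),
        Val::Map(pairs) => pairs.iter().try_for_each(|(k, x)| {
            check(k, lim, count)?;
            check(x, lim, count)
        }),
        _ => Ok(()),
    }
}
```

The count accumulates across all levels, so nesting depth itself is unrestricted. Only the total number of objects and each object's own size are bounded.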

graydon commented on August 22, 2024

Agreed. I think this "number of objects" * "size of a single object" approach is probably workable. It has the benefit of being easy to explain and relatively easy to pick values for. E.g. if we're aiming to stay under 2048 bytes of XDR, we could say "16 objects, each no more than 128 bytes". A 128-byte Bytes object is plenty future-proof for cryptographic identifiers, and a 128-byte map is one with 5 entries of 24 bytes each (each entry a pair of 12-byte SCVals), so "5-way fanout" fits in 128 bytes. Or we could say 8-way fanout on maps but the leaves are no more than 32 bytes, or such.
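The example numbers above can be checked as arithmetic. The 12-byte encoded SCVal size is the figure the comment uses. The constant and function names are illustrative.

```rust
/// Example limits from the comment.
pub const MAX_OBJECTS: usize = 16;
pub const MAX_OBJECT_BYTES: usize = 128;
/// Assumed encoded size of one SCVal, and of one map entry (key + value).
pub const VAL_BYTES: usize = 12;
pub const PAIR_BYTES: usize = 2 * VAL_BYTES;

/// Upper bound on total object payload under the two limits.
pub const fn total_budget() -> usize {
    MAX_OBJECTS * MAX_OBJECT_BYTES
}

/// How many map entries fit in one object of `object_bytes`.
pub const fn map_fanout(object_bytes: usize) -> usize {
    object_bytes / PAIR_BYTES
}
```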

github-actions commented on August 22, 2024

This issue is stale because it has been open for 30 days with no activity. It will be closed in 30 days unless the stale label is removed.

graydon commented on August 22, 2024

In conversation with @jayz22 today, we discussed the possibility of host objects gaining a field that says their current cumulative size (in terms of number of subobjects), computed from the cumulative sizes of their inputs (possibly lazily-and-cached, possibly eagerly on construction), such that checking such sizes is (amortized) O(1) and we can check it either by a host function or just on input to functions that want to limit themselves to "small" objects.
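Here is a sketch of the eager variant of that idea. Each hypothetical HostObj computes its subtree object count once at construction, from its children's already-cached counts, so later size checks are O(1) reads. Shared children are counted once per occurrence, which matches how large the value would be when serialized. HostObj, total_objects, and is_small are illustrative names, not host APIs.

```rust
use std::rc::Rc;

/// Hypothetical immutable host object that caches its cumulative object count.
#[derive(Debug)]
pub struct HostObj {
    pub children: Vec<Rc<HostObj>>,
    total_objects: usize, // this object plus every object beneath it
}

impl HostObj {
    /// Computed eagerly: O(number of direct children), since children are already sized.
    pub fn new(children: Vec<Rc<HostObj>>) -> Rc<HostObj> {
        let total_objects = 1 + children.iter().map(|c| c.total_objects).sum::<usize>();
        Rc::new(HostObj { children, total_objects })
    }

    /// O(1) read of the cached count.
    pub fn total_objects(&self) -> usize {
        self.total_objects
    }

    /// The kind of O(1) gate a host function could apply to "small" inputs.
    pub fn is_small(&self, max_objects: usize) -> bool {
        self.total_objects <= max_objects
    }
}
```

The lazily-cached alternative mentioned above would defer the sum until the first query, at the cost of interior mutability for the cache.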

github-actions commented on August 22, 2024

This issue is stale because it has been open for 30 days with no activity. It will be closed in 30 days unless the stale label is removed.
