Comments (10)
I feel like this is not a concern. The correct way to prevent any DoS attack is to distribute computational time fairly between clients, so that any client can only slow down service for itself.
from roc.
About your claim of "pure" functions, you can make nondeterminism contained inside each hash table. Basically, hash of any object in Roc is deterministic, but each hash table (default implementation) has a secret key that even the language user cannot access. Inside a hash table, the actual key used to decide which bin to put the object in is HMAC(object hash, secret key)
.
One important point: preserve the order of objects inserted, so iteration of hash table is deterministic.
from roc.
What are the ordering requirements for a standard hashmap going to be, or is the only guarantee a per-type determinism (for referential transparency)?
from roc.
Another mitigation in the http case is using a record internally for standard or ubiquitous headers (Host, User-Agent, Authentication, Content-Type, etc), leaving a hashmap only for lesser used headers. At the protocol level, iirc, http2 has a standard compression dictionary for such headers so that the server's compressed stream doesn't even need to communicate that "this bit sequence means "Host", as any client will already have a "codebook" for that (my term, don't recall actual).
So iow, unrolling header maps is already literally standard practice.
Such a record could be exposed, but in any case header lookup by name could use a when
to match known record names and then fall back to dictionary lookups. Presumably a when
against a bunch of strings will be optimized by LLVM into some kind of table lookup. This would also reduce the size of the hashmap, as well as the likelihood of encountering collisions even if they were crafted to exist, since looking up the Host header wouldn't bother checking the hashmap, and it's unlikely that the server app would be trying to lookup the Totally-Not-Real header.
Finally, there would be some value in a hypothetical Roc HTTP server platform in requiring that endpoint handlers declare the headers (and query params and body format, and same for the response) that it will or even can look at, primarily to allow making the contract very visible, and convertible and statically verifiable to and from OpenAPI, but also finally letting us break away from writing endpoint handlers deal with low-level IO concerns, not to mention allowing for some optimizations, like skipping over decoding or storing headers that neither the platform nor the app's handler will even use.
from roc.
Another mitigation in the http case is using a record internally for standard or ubiquitous headers (Host, User-Agent, Authentication, Content-Type, etc), leaving a hashmap only for lesser used headers
Cool, I hadn't heard of that technique - makes sense!
That should improve performance in general, although unfortunately I don't think it would help with hash flooding in particular. Essentially all the collisions would necessarily be made-up headers anyway (since if common headers collided, this wouldn't need to be an attack; it would just happen all the time naturally!) and since the uncommon headers would all end up in the "lesser-used headers" hashmap, we'd be in the same situation.
(Of note, just creating the "lesser-used headers" hashmap would still result in lots of collisions and lots of linked lists being created, if linked lists were used for collision resolution...so the concern still stands in that scenario!)
That said, I think the idea of parsing common headers directly into a record instead of into a hashmap makes a ton of sense, and I want to use it in a HTTP server platform! 😃
from roc.
requiring that endpoint handlers declare the headers (and query params and body format, and same for the response) that it will or even can look at, primarily to allow making the contract very visible, and convertible and statically verifiable to and from OpenAPI
This is also a cool idea!
In case it ends up ruling out some important but rare use cases, maybe the platform could offer something like "here's the raw un-parsed header, feel free to parse it into something else."
from roc.
Latest idea: apparently what Java 8 does is to mitigate hash flooding at the data structure level (rather than the hash function level) as follows:
- Use their normal collision detection resolution algorithm for the first 4 collisions on a given key
- As soon as it has more than 5 collisions, create a search tree and store subsequent collisions in there, so if there are a ton of collisions, they end up being worst-case logarithmic instead of linear.
This seems like a strategy that could work well for Roc. It could mean:
- The non-attack case would be barely affected; there would be an extra conditional (and maybe an extra counter to track how many collisions have accumulated, but that might already be available) during collision resolution, but that's about it.
- The attack would be reduced to making the collision resolution logarithmic (as opposed to linear), which shouldn't be harmful enough to performance to make the hash flooding worthwhile.
- There wouldn't need to be any changes to the dictionary's public API to implement this. We could avoid introducing an extra Ord constriant to the keys by using our existing "Ord inference on structural types" logic and extending it to automatically skip over opaque type wrappers. That implementation would have the necessary properties to defeat the hash flooding, without affecting the public API.
from roc.
Of note, Java 8 uses linked lists for collision resolution normally, so falling back on a different data structure seems straightforward. However, we'd likely want to use more efficient hash maps which didn't use linked lists, so "fall back on a tree" wouldn't necessarily be as straightforward.
However, the tree could be a pointer stored on the heap (after the refcount, for example) which initially is null but then we make an allocation for it after the first time we have 4 collisions. Alterntively, it could be a Vec of pointers to ordered maps, and each colliding key could allocate its own map and push it onto the Vec for later deallocation.
If we don't have a natural way to access collision counts, I assume there has to be a way to calculate that which wouldn't have to be stored in the data structure. For example, maybe we could to the "resolve a collision" function which is "number of collisions encountered so far" and each time we call it to find another slot, we increment that argument.
from roc.
I really want to avoid having random hashing in Roc. We can get a lot of benefits from everything in the language being a pure function - in terms of caching, reproducibility of test runs, etc. - and using a randomly seeded hasher would violate some of those assumptions.
Note that non-determinism doesn’t have to be observable: you probably still want ASLR for Roc.
Similarly, a hash map can maintain well-defined iteration order, but still used randomization internally. That’s what new versions of Python and JS do: store two slices inside, one values in insertion order, the other indexes in hash-map order.
Rust uses SipHash by default.
My gut feeling is that Rust is quite suboptimal here, and that basically no one bothered to replace SipHash with something faster and still secure enough. Which is surprising, but I don’t really remember any substantial effort to find something faster.
I’d look at Go. Well, I have looked at Go,
https://go.dev/src/runtime/asm_arm64.s
(search memhash)
and they seem to use AES hardware instructions, which I’d expect fast hash to use, and, well, structurally Go should care a lot about both DoS resistance and perf of default hash function (in contrast, structurally Rust cares about the perf of non dos resistant hash, bc that’s what both Rust and FF use: https://nnethercote.github.io/2021/12/08/a-brutally-effective-hash-function-in-rust.html)
from roc.
Note that non-determinism doesn’t have to be observable: you probably still want ASLR for Roc. Similarly, a hash map can maintain well-defined iteration order, but still used randomization internally. That’s what new versions of Python and JS do: store two slices inside, one values in insertion order, the other indexes in hash-map order.
This is actually what we ended up going with, and in fact it uses ASLR under the hood to seed the randomness! 😃
Full credit to @bhansconnect for that idea. 🎉
Speaking of which, I guess we can close this issue now?
from roc.
Related Issues (20)
- Implement dec_alloca for Wasi HOT 1
- `roc fmt` removes parens from around `when` in list
- Improve Dict Memory Usage HOT 2
- Compiler panics and hangs with 'list element is error type'. HOT 1
- Crash when adding type annotation to function HOT 3
- Compiler crash on unqualified import for unexposed value HOT 1
- Update cli for new platform host build process
- Roc should error for non-roc files
- "roc format --stdin --stdout" recursively traverses the current working directory
- Add markdown style links for generated roc docs HOT 1
- Optional record fields can't be used in two different ways HOT 3
- Low level "bit-twiddling" performance... HOT 8
- Formatter inconsistently drops parens in the type definition HOT 1
- Update actions/checkout and actions/upload-artifact to latest version
- `_SIZE_CHECK_` assertions fail for some unions [RustGlue.roc]
- LLVM alloca cleanup
- Getting a 'Segmentation fault (core dumped)' while trying to decode a json from sqlite3 HOT 3
- Pass memory copy functions to Zig
- Language server occasionally fails to shutdown
- Miss-Identified Infinite Recursive Type for Complex Recursion...
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from roc.