etcimon / fast
A library for D that aims to provide the fastest possible implementation of some every day routines.
Marco, big thanks for your Json parser, but could you add more examples to the README showing how it can be used?
Example:
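import std.stdio;
import std.utf;
import fast.json;

void main()
{
    auto str = `{"a":"SΛNNO"}`;
    str.validate; // std.utf.validate accepts the input as well-formed UTF-8
    auto js = Json!validateAll(str);
    foreach (key; js.byKey)
    {
        if (key == "a")
        {
            auto t = js.read!string; // throws the JSONException shown below
            writeln(t);
        }
        else
        {
            writeln(key);
            js.skipValue();
        }
    }
}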
Error:
std.json.JSONException@std/json.d(1168): Byte 0xce forms invalid UTF-8 sequence in string. (Line 1:7)
----------------
../../.dub/packages/fast-0.3.0/source/fast/json.d:1317 void fast.json.Json!(2u, true).Json.handleError(immutable(char)[]) [0x48f7f3]
../../.dub/packages/fast-0.3.0/source/fast/json.d:1286 void fast.json.Json!(2u, true).Json.expectNot(immutable(char)[]) [0x48f63d]
../../.dub/packages/fast-0.3.0/source/fast/json.d:403 bool fast.json.Json!(2u, true).Json.scanString!(true).scanString() [0x499955]
../../.dub/packages/fast-0.3.0/source/fast/json.d:322 const(char)[] fast.json.Json!(2u, true).Json.borrowString() [0x48ec66]
../../.dub/packages/fast-0.3.0/source/fast/json.d:277 immutable(char)[] fast.json.Json!(2u, true).Json.read!(immutable(char)[]).read(bool) [0x49ce76]
source/app.d:23 int app.main().__foreachbody1(ref const(char[])) [0x48bf33]
../../.dub/packages/fast-0.3.0/source/fast/json.d:1197 bool fast.json.Json!(2u, true).Json.iterationGuts!("{}", const(char)[], int delegate(ref const(char[]))).iterationGuts(ref int, const(char)[], scope int delegate(ref const(char[])), immutable(char)[]) [0x49bbc6]
../../.dub/packages/fast-0.3.0/source/fast/json.d:906 int fast.json.Json!(2u, true).Json.byKeyImpl(scope int delegate(ref const(char[]))) [0x48f200]
source/app.d:19 _Dmain [0x48be79]
??:? _D2rt6dmain211_d_run_mainUiPPaPUAAaZiZ6runAllMFZ9__lambda1MFZv [0x4a3bfe]
GDC feeds all overflow intrinsics to the __builtin_xxx functions.
I noticed you use mmap in fast.json.
It is a nice trick, but it doesn't need to be there.
Could you refactor it so that the mmap file implementation lives in a separate module, fast.file? It uses alias m_json this; Json m_json;, but Json can simply be a template parameter, so the implementation can live outside the json module, and the json module would simply instantiate it, for example as FastFile!Json. It looks to be a static class/struct, so that is all that is needed. If you need access to some outer class variables, this can still be a template in a separate module, used as a mixin (mixin FastFile!Json file;), with the protocol defined and documented.
There is std.mmfile, which works nicely and has a read-only mode too, so it should be used instead. If there are some flags (like madvise, etc.) that make a difference, they could be upstreamed to std.mmfile as options.
For small files it probably doesn't make sense to use mmap, as it can be expensive, may not scale with the number of cores / threads, and can waste memory. Each mapping will probably occupy at least a whole 4 kB page for as long as it stays mapped. So for small files it is better to just read explicitly into a properly sized buffer, or even do so only for strings. A custom arena allocator can also be used. Yes, there is a function to parse from a memory block, but that puts a lot of extra logic on the caller's side instead of handling it in the library.
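A minimal sketch of that idea, using only Phobos (std.file and std.mmfile). The names loadForParsing, FileContents and smallFileLimit are hypothetical and not part of fast, and the 64 kB threshold is an arbitrary placeholder that would need measuring:

```d
import std.file : getSize, read;
import std.mmfile : MmFile;

/// Hypothetical cut-off below which a plain read is assumed to be cheaper.
enum size_t smallFileLimit = 64 * 1024;

/// File contents plus the mapping that backs them, if there is one.
struct FileContents
{
    const(ubyte)[] data;
    MmFile mapping; // null when the file was read into a heap buffer
}

FileContents loadForParsing(string path)
{
    FileContents result;
    if (getSize(path) <= smallFileLimit)
    {
        // Small file: one explicit read into an exactly sized buffer.
        result.data = cast(const(ubyte)[]) read(path);
    }
    else
    {
        // Large file: read-only memory mapping provided by Phobos.
        result.mapping = new MmFile(path);
        result.data = cast(const(ubyte)[]) result.mapping[];
    }
    return result;
}
```

The caller keeps the returned struct alive while parsing, so the mapping (when present) stays referenced for as long as the data slice is in use.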
What about reading a stream from a network socket, a Unix pipe, or a decompression library in chunks? Such a stream can't easily be mmapped, but it is often an important application of parsing, and probably the most common use case. Right now the only option is to buffer the whole thing and then parse it, which in some synthetic scenarios can mean about 2x the memory usage.
Current implementation has inaccurate number parsing.
How to Read Floating Point Numbers Accurately by William D. Clinger - a classic paper on number parsing.
mir.bignum.decimal can be used for parsing and precise conversion to double.
Hi Marco.
Small enhancement request. (Apologies if it's implemented already and I didn't see).
Quite often one wants to parse a JSON stream (like from Twitter or the Reddit comment dump). It would be nice to have that implemented as part of the library, so it's very easy to use. I have written a small range to do this, but it's quite crude, and I haven't paid attention to efficiency. I can make a pull request if you would like (and you can refine it later), but you may prefer to implement yourself - let me know.
Here is some very simple code to process Reddit comments:
https://gist.github.com/Laeeth/bbd08dd576cb7aeff444
The original comments are here:
https://archive.org/details/2015_reddit_comments_corpus
On one core it takes 35 minutes to process one month's data (35 Gig).
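For illustration, here is a rough sketch of per-line processing of such a dump, assuming one JSON document per line. It uses Phobos std.json as a stand-in rather than fast.json, so it shows the shape of the loop and not the speed; eachJsonLine is a hypothetical name:

```d
import std.json : JSONValue, parseJSON;
import std.stdio : File;

/// Calls `handle` once per non-empty line, parsing each line as one JSON document.
void eachJsonLine(File input, scope void delegate(JSONValue) handle)
{
    // byLine reuses its buffer between iterations, so each line is parsed
    // right away instead of being stored.
    foreach (line; input.byLine)
    {
        if (line.length == 0)
            continue; // skip blank lines
        handle(parseJSON(line));
    }
}
```

It could be called as eachJsonLine(File("comments.json"), (JSONValue v) { ... }), where the file name is only a placeholder.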
Thanks for getting in touch by email. That was about something else - have had to figure out some other things but will respond shortly.
Laeeth.
On Windows 10:
Performing "debug" build using C:\project\dmd2\windows\bin64\dmd.exe for x86_64.
fast 0.3.5: building configuration ""...
\fast-0.3.5\fast\source\fast\cstring.d(198,59): Error: function fast.cstring.string2wstringSize(const(char[]) src) is not callable using argument types (const(ushort[]))
\fast-0.3.5\fast\source\fast\cstring.d(198,59): cannot pass argument fname of type const(ushort[]) to parameter const(char[]) src
...
etc.
Namely
x >= '0' && x <= '9' ==> auto xx = cast(uint)(x - '0'); xx <= ('9' - '0').
Obviously works with 'a' and 'f' as well.
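A small sketch of the trick in D. The function names are made up for illustration. The subtraction result is cast to uint so that characters below the lower bound wrap around to a large value and fail the single comparison:

```d
/// True if c is an ASCII decimal digit, using one unsigned comparison.
bool isDigitFast(char c) pure nothrow @nogc @safe
{
    // c - '0' is an int in D; as a uint, anything below '0' becomes huge,
    // so one <= check covers both the lower and the upper bound.
    return cast(uint)(c - '0') <= 9u; // 9 == '9' - '0'
}

/// Same idea for the lower-case hex letters 'a'..'f'.
bool isLowerHexLetterFast(char c) pure nothrow @nogc @safe
{
    return cast(uint)(c - 'a') <= 5u; // 5 == 'f' - 'a'
}
```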
import std.stdio;
import fast.json;
void main() {
    auto json = parseTrustedJSON(`{"x":123}`);
    writeln(json.x); // should I get 123 here?
}
core.exception.AssertError@/home/xxx/.dub/packages/fast-0.3.5/fast/source/fast/json.d(1208): Assertion failure
Many D-language projects are licensed under the Boost license. fast is currently under the GPL 3.0 license, which would require any user of the project to also license their project under GPL 3.0.
This consideration came up via the announcement thread on the dlang forums:
http://forum.dlang.org/thread/20151014090114.60780ad6@marco-toshiba?page=2
Hi Marko.
Hope you're well.
When you have time, would you mind dropping me an email please ?
Laeeth
At
Kaleidicassociates.com
Thanks a lot.
Laeeth
cstring.d, internal.d and string.d all import fast.buffer, but it is nowhere to be found.
The validation is wrong for D strings.
static if (isValidating)
    if (*m_text != '\0')
[laeeth@engine parsereddit]$ dub build --compiler=gdc
WARNING: A deprecated branch based version specification is used for the dependency fast. Please use numbered versions instead. Also note that you can still use the dub.selections.json file to override a certain dependency to use a branch instead.
Performing "debug" build using gdc for x86_64.
fast ~master: building configuration "library"...
parsereddit ~master: building configuration "application"...
/home/laeeth/.dub/packages/fast-master/source/fast/json.d: In member function 'skipWhitespace':
/home/laeeth/.dub/packages/fast-master/source/fast/parsing.d:661:6: error: inlining failed in call to always_inline 'skipAsciiWhitespace': function body not available
void skipAsciiWhitespace(ref const(char)* p)
^
/home/laeeth/.dub/packages/fast-master/source/fast/json.d:1336:4: error: called from here
m_text.skipAsciiWhitespace();
^
gdc failed with exit code 1.
Hello,
I'm currently upgrading the spasm framework to be able to develop web applications in WebAssembly, much like you do with React or Angular. I modified your library to be compatible with betterC so it can generate wasm code: https://github.com/etcimon/libwasm/tree/master/fast
I intend to put more work into libwasm so that developers can create their apps with it, mostly for mobile development, but I recently noticed the GPL3 license. I don't think anyone would want to create an application using a tool that forces them to make it open source. Do you think it could be changed to a more permissive license for this specific case? (removing the SIMD parts)
Thanks!
Hi Marco, I really like the speed that comes with your pull based approach.
I have a simple program that I would like to implement, but I am struggling to apply the pull-based approach to the problem:
I want to analyse NIST's CVE data (https://nvd.nist.gov/vuln/data-feeds), e.g. by searching through a data file and printing out the whole JSON entry that matches an id.
The data looks like this:
"CVE_Items" : [ {
  "cve" : {
    "data_type" : "CVE",
    "data_format" : "MITRE",
    "data_version" : "4.0",
    "CVE_data_meta" : {
      "ID" : "CVE-1999-0001",
      "ASSIGNER" : "[email protected]"
    },
    "affects" : {
      ...
    },
    "problemtype" : {
      ...
    },
    "references" : {
      ...
    },
    "description" : {
      ...
    }
  },
  "configurations" : {
    ...
  },
  "impact" : {
    ...
  },
  "publishedDate" : "1999-12-30T05:00Z",
  "lastModifiedDate" : "2010-12-16T05:00Z"
}, {
  "cve" : {
    ...
With your nice library I can easily write something like this:
foreach (cveFile; cves) {
    foreach (item; cveFile.CVE_Items) {
        cveFile.cve.CVE_data_meta.keySwitch!("ID")(
            {
                auto id = cveFile.read!string;
                if (id in toFind) writeln(id);
            });
    }
}
But instead of just outputting the id, I would like to dump everything that belongs to the object containing the matching id.
What's the best way to do this?
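For comparison, one way to get the whole matching object is to fall back to a DOM parser. The sketch below uses Phobos std.json rather than fast.json's pull API, so it gives up the speed advantage and loads the entire feed into memory. The function name findCveItems is hypothetical, and it assumes the feed text is a complete JSON object with a "CVE_Items" array laid out as in the sample above:

```d
import std.json : JSONValue, parseJSON;

/// Returns every element of "CVE_Items" whose cve.CVE_data_meta.ID is a key of `wanted`.
JSONValue[] findCveItems(string feedText, const bool[string] wanted)
{
    JSONValue[] matches;
    auto feed = parseJSON(feedText);
    foreach (item; feed["CVE_Items"].array)
    {
        // Walk down to the id of this entry.
        auto id = item["cve"]["CVE_data_meta"]["ID"].str;
        if (id in wanted)
            matches ~= item; // keep the whole entry, not just the id
    }
    return matches;
}
```

Each returned value can then be dumped with its toPrettyString() method.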
I'm unable to get this to run successfully on Apple Silicon because of an illegal hardware instruction error, which comes from the use of vpcmpistri, an AVX instruction not supported by Apple Rosetta. I've tried building with dub using LDC2, GDC, and DMD, and hit the same issue at runtime each time.
Here's the problematic assembly from lldb:
Process 25914 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
frame #0: 0x000000010000d275 app`_D4fast7parsing__T10vpcmpistriTaVyAaa2_225cVEQBrQBp9Operationi0VEQClQCj8Polarityi0Vbi0ZQCrFNaNbNiKPxaZv(p=0x000000030456ae38) at parsing.d:817
814 movdqu XMM0, [RAX];
815 mov RAX, [RDI];
816 L1:
-> 817 vpcmpistri XMM0, [RAX], mode;
818 add RAX, 16;
819 cmp ECX, 16;
820 je L1;
I have a use case for parsing JSON as fast as possible to sequentially load a massive pile of JSON files into a processing queue, so I decided to try this on a Xeon EC2 instance (which does support AVX-512). The same line of assembly throws a SIGSEGV on an m5.xlarge EC2 instance running Ubuntu, with the following trace:
* thread #1, name = 'historical-load', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff6d2e000)
frame #0: 0x000055555568ac21 app`_D4fast7parsing__T10vpcmpistriTaVyAaa3_227b7dVEQBtQBr9Operationi0VEQCnQCl8Polarityi0Vbi0ZQCtFNaNbNiKPxaZv(p=0x00007fffffffdcf8) at parsing.d:817
814 movdqu XMM0, [RAX];
815 mov RAX, [RDI];
816 L1:
-> 817 vpcmpistri XMM0, [RAX], mode;
818 add RAX, 16;
819 cmp ECX, 16;
820 je L1;
I also tried this on an m5.metal EC2 instance and got the same result.
My simple test implementation:
auto parsedJSON = parseJSON(`{"now":10}`);
writeln(parsedJSON.singleKey!"now");
Is there any plan to provide broader CPU architecture support?
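One possible direction, sketched here only as an assumption about how it could look and not as anything fast currently does, is to check CPU features at runtime through druntime's core.cpuid and fall back to a plain scalar loop when the VEX-encoded instruction is not available. Both function names are hypothetical, and the check relies on CPUID reporting the features accurately:

```d
import core.cpuid : avx, sse42;

/// True when the host reports both SSE4.2 and AVX, which the VEX-encoded
/// vpcmpistri form needs. (Hypothetical helper, not part of fast.)
bool hostSupportsVexStringCompare() nothrow @nogc
{
    return sse42 && avx;
}

/// Portable fallback: number of leading ASCII whitespace characters
/// (space, tab, CR, LF). Stays inside the slice bounds at all times.
size_t countLeadingWhitespace(const(char)[] text) pure nothrow @nogc @safe
{
    size_t i = 0;
    while (i < text.length &&
           (text[i] == ' ' || text[i] == '\t' || text[i] == '\r' || text[i] == '\n'))
        ++i;
    return i;
}
```

A parser could pick the SIMD routine when hostSupportsVexStringCompare() returns true and use the scalar loop otherwise.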
When I try to run the benchmark (or include the library) on OS X, it fails with:
dub --build=release -c benchmark
Performing "release" build using dmd for x86_64.
fast 0.3.1+commit.3.gbd6c92e: building configuration "benchmark"...
source/fast/parsing.d(808,29): Error: cannot directly load global variable 'SIMDFromString' with PIC code
source/fast/parsing.d(602,3): Error: template instance fast.parsing.vpcmpistri!(char, "\x01\x1f\"\"\\\\\x7f\xff", cast(Operation)4, cast(Polarity)0, false) error instantiating
source/fast/json.d(375,11): instantiated from here: seekToRanges!"\x00\x1f\"\"\\\\\x7f\xff"
source/fast/json.d(328,7): instantiated from here: scanString!true
source/fast/benchmarks.d(171,17): instantiated from here: Json!(2u, true)
source/fast/parsing.d(808,29): Error: cannot directly load global variable 'SIMDFromString' with PIC code
source/fast/parsing.d(595,3): Error: template instance fast.parsing.vpcmpistri!(char, "\"\\", cast(Operation)0, cast(Polarity)0, false) error instantiating
source/fast/json.d(414,10): instantiated from here: seekToAnyOf!"\\\"\x00"
source/fast/json.d(328,7): instantiated from here: scanString!false
source/fast/json.d(97,1): instantiated from here: Json!(0u, false)
source/fast/parsing.d(808,29): Error: cannot directly load global variable 'SIMDFromString' with PIC code
source/fast/parsing.d(655,3): Error: template instance fast.parsing.vpcmpistri!(char, " \x09\x0d\x0a", cast(Operation)0, cast(Polarity)16, false) error instantiating
source/fast/parsing.d(675,3): instantiated from here: skipAllOf!" \x09\x0d\x0a"
source/fast/parsing.d(808,29): Error: cannot directly load global variable 'SIMDFromString' with PIC code
source/fast/parsing.d(690,3): Error: template instance fast.parsing.vpcmpistri!(char, "\x0d\x0a", cast(Operation)0, cast(Polarity)0, false) error instantiating
I guess it's because OS X uses PIC by default. Can I disable this, or could there be a workaround in fast to make it work on OS X for development?