Comments (14)
Do we want to proceed with StringLiteral#to_utf16 then?
I think it's more elegant than String#utf16_literal (#14670 (comment)), and apparently more performant (#14676 (comment)).
from crystal.
Maybe the conversion algorithm from String#to_utf16 could be implemented as a macro method? It's a bit complex, but not too much. I don't think we can explicitly do math operations on 16-bit integers in the macro language, though.
Simply porting #14671 works fine; explicit math operations on 16-bit integers are not needed.
class String
  macro utf16_literal(data)
    {%
      arr = [] of NumberLiteral
      data.chars.each do |c|
        c = c.ord
        if c < 0x1_0000
          arr << c
        else
          c -= 0x1_0000
          arr << 0xd800 + ((c >> 10) & 0x3ff)
          arr << 0xdc00 + (c & 0x3ff)
        end
      end
      arr << 0
    %}
    Slice(UInt16).literal({{arr.splat}})[0, {{arr.size - 1}}]
  end
end
s = String.utf16_literal("TEST 😐🐙 ±∀ の")
# => Slice[84, 69, 83, 84, 32, 55357, 56848, 55357, 56345, 32, 177, 8704, 32, 12398]
String.from_utf16(s)
# => "TEST 😐🐙 ±∀ の"
Encoding 10000 characters takes around 300ms.
That's certainly not fast, but probably good enough.
EDIT: Added a final 0 byte
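For anyone who wants to check the surrogate-pair arithmetic outside the Crystal macro interpreter, here is a minimal Python sketch of the same branch logic. It is not part of the proposal: the function names `utf16_code_units` and `reference_code_units` are made up for illustration, and it runs at runtime rather than at compile time. The second function only cross-checks the result against Python's built-in `utf-16-le` codec.

```python
def utf16_code_units(text: str) -> list:
    """Encode text as UTF-16 code units, mirroring the macro's branch logic."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x1_0000:
            # Code point fits in a single 16-bit code unit.
            units.append(cp)
        else:
            # Supplementary plane: split into a high and a low surrogate.
            cp -= 0x1_0000
            units.append(0xD800 + ((cp >> 10) & 0x3FF))
            units.append(0xDC00 + (cp & 0x3FF))
    return units


def reference_code_units(text: str) -> list:
    """Same result via Python's built-in codec, used only as a cross-check."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


sample = "TEST 😐🐙 ±∀ の"
print(utf16_code_units(sample))
# [84, 69, 83, 84, 32, 55357, 56848, 55357, 56345, 32, 177, 8704, 32, 12398]
print(utf16_code_units(sample) == reference_code_units(sample))
# True
```

The printed list matches the Slice contents shown above, which suggests the macro's arithmetic is right without needing explicit 16-bit integer types.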
The macro is nice, but if we want to eventually have the compiler optimize it, maybe we could just expose String#to_utf16 to macros directly? For example {{ "CRYSTAL_TRACE".to_utf16 }} would be lovely & fast.
The version from my comment uses the literals from #13716, so it is static data in this case, although that is still an experimental API.
> So the conversions could be avoided entirely
Would be nice. But I believe we're quite a bit away from that. The Windows ecosystem is huge and it has 30 years of wide chars in it.
Hm, that's an interesting idea. Exposing StringLiteral#to_utf16 would certainly have the benefit that you have the resulting literal easily available in macro land.
I like that it's exactly identical to the runtime version, but in a macro expansion, which makes it clear that this happens at compile time.
FTR: Eventual compiler optimization would also be possible with String.utf16_literal. We could turn this macro into a primitive later.
Let's focus on UTF-16 string literals here and continue the discussion about UTF-8 support on Win32 in a different issue. I'm pretty sure we won't lose all use cases for UTF-16 string literals overnight, so this will still be useful.
I like StringLiteral#to_utf16, and if doing that means we end up with a SliceLiteral, even one without first-class syntax yet, it would be a double win, because embedding resources could then leverage a similar StringLiteral#to_slice at compile time.
Looks like a winner, then 🚀
> That's certainly not fast, but probably good enough
Yeah, this is mainly for relatively short strings, so performance should not be an issue.
We can always push it up into the compiler if the need arises.
Btw. CharLiteral#ord was only added in 1.11 (#13910), so this wouldn't have been possible before.
In order to make it actually static data, we'd also need a slice literal (#2886).
Worth noting that Windows supports UTF-8 now and encourages use of those APIs:
https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page#-a-vs--w-apis
So the conversions could be avoided entirely.
@straight-shoota this is reusing the "old" ANSI API to use the UTF-8 codepage, so it might just work 🤷
It took me a while to find this: the above link explains how to set the Active Code Page (ACP) to UTF-8, which requires a manifest and calling an EXE to "add the manifest" to an executable. After that, the ANSI variant of the Windows API will use UTF-8 for that executable.
That being said, it requires Windows 10 v1903 (2019) and GDI applications won't support it unless the user activates a beta setting.
The difficulty in implementing StringLiteral#to_utf16 is that there is no SliceLiteral, and we should generate a Slice(UInt16).literal(..., 0). I have no idea how to achieve that.
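To make the target layout concrete: the literal needs a trailing 0 code unit in memory (for Win32 wide-string APIs), while the slice handed back to the caller excludes it. Below is a rough Python sketch of that layout using `ctypes`. It is only an illustration of the shape of the data, not of how the compiler would build it; `utf16_z_buffer` is a hypothetical name.

```python
import ctypes


def utf16_z_buffer(text: str):
    """Return (buffer, length): a contiguous uint16 buffer that ends with a
    0 terminator, plus the number of code units before that terminator."""
    raw = text.encode("utf-16-le")
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    # Allocate one extra element for the terminator.
    buffer = (ctypes.c_uint16 * (len(units) + 1))(*units, 0)
    return buffer, len(units)


buffer, length = utf16_z_buffer("TEST")
visible = list(buffer[:length])  # what the caller sees, terminator excluded
print(visible, buffer[length])
# [84, 69, 83, 84] 0
```

This mirrors what the macro version does with `arr << 0` followed by slicing to `arr.size - 1`: the terminator lives in the static data but is not part of the visible slice.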
> The difficulty to implement StringLiteral#to_utf16 is that there is no SliceLiteral and we should generate a Slice(UInt16).literal(..., 0) and I have no idea how to achieve that.
It could return ArrayLiteral(NumberLiteral) (or Call(@receiver=Generic(@name=Slice, @type_vars=[UInt16]) @name="literal", @args=[0, 1, 2, 3, 4, 5, ...])).
Btw I just tested the performance of my macro code a bit more.
Simply replacing the line {{ arr.splat }} with {% arr.splat %} 0 (so the resulting splat is not parsed) improves the runtime of encoding 10000 characters from ~300ms to ~20ms.
The macro language actually isn't that slow - the parser is.
Implementing StringLiteral#to_utf16 wouldn't improve performance in a perceivable manner since it would only remove <10% of the runtime.
Maybe there should be a way to create AST nodes directly inside the macro language, so we don't have to parse everything again.
> GDI applications won't support it unless the user activates a beta setting.
You can activate the code page in code; this is how applications like the MS Edge browser run.
MS Edge is a React Native app, so it runs using JS and UTF-8 (although Microsoft is removing React).