Comments (12)
Hi Vadim,
I have little time to look at this right now, but I will as soon as I can.
Hopefully that will be in a week or so. Thank you for providing a minimal example, that helps a lot!
All the Best
Jakob
from tiny-utf8.
Thanks, Jakob!
Does it work if you replace line 736 in tinyutf8.cpp
// Iterate over relevant multibyte indices
while( lut_iter > lut_begin ){
with
while( lut_iter >= lut_begin ){
?
I'm pretty sure that's the bug, but I don't have time to try it out right now.
I look forward to hearing from you!
Jakob
Hi Jakob,
Now I'm unavailable (relocating) :) - sorry, will take me a day or two to get to that. I will let you know though.
Best regards, Vadim
Yes, it works with my example! Thanks a lot for fixing it.
Best regards, Vadim
Hi Jakob,
Long time no talk :) !
Sadly, the same thing strikes again, this time, with a longer string but again the multibytes are the culprits. Here is an example:
utf8_string anotherFindBug = "who were engaged in “recent narcotics trafficking activity.” Those two individuals were of Newport Beach.";
utf8_string dot = ".";
std::cout << anotherFindBug.find(dot, 60);
It is supposed to return 104. It returns 102. If you replace the smart quotes with ", then it's all correct.
The workaround for me is to comment out the block starting at line 2701:
if( utf8_string::is_lut_active( lut_iter ) )
...
Hi Vadim!
Indeed, I hope you're doing well!
As I am currently not at home, I haven't tried out the issue yet. Just to make sure: are you aware that find returns the codepoint index, not the byte index? I ask because, when in doubt, the lower number is more likely to be correct (the number of codepoints is <= the number of bytes).
Cheers,
Jakob
Hi Jakob,
I'm good, thanks, hope you're also doing well.
Yes indeed, the issue is now different, and it's definitely not the number of bytes, but the number of codepoints is absolutely incorrect. I naturally assumed the issue was in my code before zeroing in on the issue. I don't know where the number comes from, but there's definitely an issue in this part. Run the example, and you'll see.
Best regards, Vadim
I had a look at the code and it was a small ">" vs. ">=" issue 😃
Thanks a lot for pointing me to it, I hope the issue was not blocking you. I'll commit in a second...
Does it work for you now as well?
Sorry about the belated reply, Jakob! (Crazy weeks, then I forgot about it.)
I checked and indeed it works now. Thank you very much for the quick resolution.
No probs! You're very welcome!