lang / unicode_utils
Unicode algorithms for Ruby 1.9
Home Page: http://unicode-utils.rubyforge.org/
License: BSD 2-Clause "Simplified" License
irb(main):001:0> require 'unicode_utils/upcase'
=> true
irb(main):002:0> puts UnicodeUtils.upcase('абвгдеёжзийклмнопрстуфхцчшщъыьэюя')
абвгде╤жзийклмноп└┴┬├─┼╞╟╚╔╩╦╠═╬╧
=> nil
Expected result is:
АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
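One possible reading of the output above, not confirmed in the report: the first sixteen letters come back unchanged and the rest turn into box-drawing characters, which is consistent with the console's CP866 bytes being case-mapped under a different encoding. A minimal sketch of a workaround, assuming the console really uses CP866 (Ruby's `IBM866`), is to transcode the input to UTF-8 before calling `UnicodeUtils.upcase`. The helper name `to_utf8` is hypothetical.

```ruby
# Hypothetical helper: reinterpret raw console bytes as CP866 (an assumption
# about the reporter's console) and transcode them to UTF-8.
def to_utf8(str, console_encoding = "IBM866")
  str.dup.force_encoding(console_encoding).encode("UTF-8")
end

raw = "\xA0\xA1\xA2"      # the bytes of "абв" in CP866
utf8 = to_utf8(raw)

puts utf8.encoding        # the string is now tagged as UTF-8
puts utf8                 # and can be passed on to UnicodeUtils.upcase
```

If the transcoded string upcases correctly, the problem lies in the console encoding rather than in the case-mapping tables.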
I replaced the Rails titlecase method with UnicodeUtils.titlecase because of UTF-8 characters. I noticed that the behavior is not exactly the same, even when no special characters are involved. For example:
Capitalized letter within word:
> "CompanyName".titlecase
=> "Company Name"
> UnicodeUtils.titlecase "CompanyName"
=> "Companyname"
Words that contain periods:
> "Company X.Y.Z.".titlecase
=> "Company X.Y.Z."
> UnicodeUtils.titlecase "Company X.Y.Z."
=> "Company X.y.z."
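The two Rails results above can be approximated in plain Ruby. This is only a sketch under stated assumptions, not the Rails implementation and not part of unicode_utils: the helper name `rails_like_titlecase` is hypothetical, and for non-ASCII input it relies on the Unicode-aware `String#downcase` and `String#upcase` of Ruby 2.4 or later (on Ruby 1.9, `UnicodeUtils.downcase` and `UnicodeUtils.upcase` could be swapped in).

```ruby
# Hypothetical helper approximating the Rails results quoted above.
def rails_like_titlecase(str)
  # Split camel case: insert a space between a lowercase and an uppercase letter.
  spaced = str.gsub(/(\p{Lower})(\p{Upper})/, '\1 \2')
  # Lowercase everything, then upcase each letter that does not follow
  # a letter, a digit, or an apostrophe.
  spaced.downcase.gsub(/(?<![\p{Alnum}'’])\p{Lower}/) { |ch| ch.upcase }
end

puts rails_like_titlecase("CompanyName")     # prints "Company Name"
puts rails_like_titlecase("Company X.Y.Z.")  # prints "Company X.Y.Z."
```

A period counts as a word separator here, which is why `X.Y.Z.` keeps its capitals.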
Is this intentional for some reason? It would be interesting if unicode_utils
preserved the original behavior and dealt only with the special chars.
I'm getting this error on https://github.com/lang/unicode_utils/blob/master/lib/unicode_utils/read_cdata.rb#L183 and
https://github.com/lang/unicode_utils/blob/master/lib/unicode_utils/read_cdata.rb#L199
I can work around it by moving the cat_buffer definition down into the while block, but I don't know whether this creates another subtle bug.
Is there a fork that attempts to be a drop-in replacement? I am asking because, for example, grosser/sort_alphabetical#20 requires replacing this dead, unmaintained, outdated library.
I keep getting the message
unicode_utils requires Ruby version >= 1.9.1.
even if ruby --version shows
ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]
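A quick way to narrow this down is to check the requirement against the interpreter that actually runs the script, since the `gem` command may be bound to a different Ruby than the `ruby` on the PATH. That explanation is only a guess. The sketch below uses the standard RubyGems classes `Gem::Requirement` and `Gem::Version`; the helper name `ruby_satisfies?` is hypothetical.

```ruby
require "rubygems"

# Hypothetical helper: does the given Ruby version meet the requirement?
def ruby_satisfies?(requirement, version = RUBY_VERSION)
  requirement.satisfied_by?(Gem::Version.new(version))
end

requirement = Gem::Requirement.new(">= 1.9.1")

puts "Running Ruby #{RUBY_VERSION}"
puts "Requirement #{requirement} satisfied: #{ruby_satisfies?(requirement)}"
```

Running the same check through the interpreter that RubyGems uses (shown by `gem env`) would reveal whether the two differ.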
Hello, this library doesn't seem to be maintained anymore (no more commits). Which solution would you now recommend for Unicode normalization under Ruby 1.9?
The "unicode_utils" gem seems not to have a license at all. Unless a license that specifies otherwise is included, nobody else can use, copy, distribute, or modify that library without being at risk of take-downs, shake-downs, or litigation.
I know that this gem has a license on GitHub; however, it is missing one on RubyGems and in the gemspec.
There was a UTR#30 for "ASCII folding". While it has been withdrawn as part of the Unicode standard, many people find it useful anyway. For instance, Solr/Lucene still supports it with their ICUFoldingFilterFactory.
Here are what I think are the relevant unicode ".txt" source files with mappings to implement UTR#30, from the lucene source: https://github.com/apache/lucene-solr/tree/trunk/lucene/analysis/icu/src/data/utr30
I note that unicode_utils uses these same unicode .txt mapping source files as definitions to implement the parts of unicode it does implement.
So that would probably make it pretty feasible to do UTR#30 too. Even though it's not part of unicode, some people are still finding it useful and have need of it (including myself).
Are you interested in unicode_utils supporting UTR#30? I could try to create a pull request, although it would take me a while to figure out what to do with the .txt mapping definition files to fit them into unicode_utils properly. You could possibly do it in only a few minutes if you were interested.
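For readers who only need accent removal, a much rougher stand-in is to decompose the text and drop the combining marks. This is not UTR#30 and does not use its mapping files; it is a sketch that assumes Ruby 2.2 or later for `String#unicode_normalize`, and the helper name `strip_accents` is hypothetical.

```ruby
# Hypothetical helper: rough accent stripping, NOT an implementation of UTR#30.
def strip_accents(str)
  # NFKD splits characters such as "é" into a base letter plus combining marks;
  # \p{Mn} then matches and removes the nonspacing marks.
  str.unicode_normalize(:nfkd).gsub(/\p{Mn}/, "")
end

puts strip_accents("café")      # prints "cafe"
puts strip_accents("Ångström")  # prints "Angstrom"
```

Letters without a decomposition, such as "ø" or "ß", pass through unchanged, which is one of the gaps the UTR#30 mapping files were meant to cover.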
Hi,
I wanted to install this gem on REE (Ruby Enterprise Edition). But as it requires Ruby >= 1.9.1, it can't be installed.
Could you do a quick fix to allow installation on REE?
Thanks!
Kulgar.
Please put the documentation on Dropbox or somewhere similar in plain text as a fallback. The lack of it is bringing the project to a halt.
I've installed unicode_utils on Ruby 2.0 on Windows 7 x64. When I execute this:
require "time"
beginning_time = Time.now
require "unicode_utils"
end_time = Time.now
puts "Time elapsed #{(end_time - beginning_time)} seconds"
it outputs "Time elapsed 94.805422 seconds" (or something to that effect; times vary between 93 and 101 seconds).
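To see where the time goes, the measurement above can be repeated per feature. The sketch below compares a single-method require (`unicode_utils/upcase`, as used in the first report on this page) with the full library. Whether the narrower require is actually faster on the reporter's machine is an assumption to be tested, and the helper name `time_require` is hypothetical.

```ruby
require "benchmark"

# Hypothetical helper: seconds spent in `require`, or nil if the feature
# cannot be loaded (for example, when the gem is not installed).
def time_require(feature)
  Benchmark.realtime { require feature }
rescue LoadError
  nil
end

["unicode_utils/upcase", "unicode_utils"].each do |feature|
  elapsed = time_require(feature)
  if elapsed
    puts "#{feature}: #{elapsed.round(3)} seconds"
  else
    puts "#{feature}: not available"
  end
end
```

Note that the second measurement excludes whatever the first require already loaded.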