icu-dotnet's Introduction

icu.net

Overview

icu-dotnet is the C# wrapper for a subset of ICU.

ICU is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

Usage

This library provides .NET classes and methods for (a subset of) the ICU C API. Please refer to the ICU API documentation. In icu.net you'll find classes that correspond to the C++ classes of ICU4C.

Although not strictly required, it is recommended to call Icu.Wrapper.Init() at the start of the application. This allows icu.net to be used from multiple threads (cf. ICU Initialization and Termination). Similarly, it might be beneficial to call Icu.Wrapper.Cleanup() before exiting.

Sample code:

    using System;

    static class Program
    {
        public static void Main(string[] args)
        {
            Icu.Wrapper.Init();
            // Will output "NFC form of XA\u0308bc is XÄbc"
            Console.WriteLine($"NFC form of XA\\u0308bc is {Icu.Normalizer.Normalize("XA\u0308bc",
                Icu.Normalizer.UNormalizationMode.UNORM_NFC)}");
            Icu.Wrapper.Cleanup();
        }
    }

Building

To build the current version of icu-dotnet you'll need .NET 8.0 installed.

icu-dotnet can be built from the command line as well as Visual Studio or JetBrains Rider.

Running Unit Tests

You can build and run the unit tests by running:

dotnet test source/icu.net.sln

or, to run the tests against just one target framework (.NET 8.0 in this example):

dotnet test source/icu.net.sln -p:TargetFramework=net8.0

Linux and macOS

It is important for icu.net.dll.config to be bundled with your application when not running on Windows. If it doesn't copy reliably to the output directory, you might find adding something like the following to your csproj file will resolve the issue. Note that the version number in the path must match the version number of icu.net that is referenced in the project.

<ItemGroup>
  <None Update="$(NuGetPackageRoot)\icu.net\2.9.0\contentFiles\any\any\icu.net.dll.config">
    <CopyToOutputDirectory>Always</CopyToOutputDirectory>
  </None>
</ItemGroup>

Docker

icu-dotnet depends on libc dynamic libraries at run time. If running within Docker, you may need to install them, for example:

FROM mcr.microsoft.com/dotnet/aspnet:3.1

# Install system dependencies.
RUN apt-get update \
    && apt-get install -y \
        # icu.net dependency: libdl.so
        libc6-dev \
    && rm -rf /var/lib/apt/lists/*

...

ICU versions

Linux

icu-dotnet links with any installed version of ICU shared objects. It is recommended to install the version provided by the distribution. As of 2016, Ubuntu Trusty uses version ICU 52 and Ubuntu Xenial 55.

If the version provided by the Linux distribution doesn't match your needs, Microsoft's ICU package includes builds for Linux.

Windows

Rather than using the full version of ICU (which can be ~25 MB), a custom minimum build can be used. It can be installed by the Icu4c.Win.Min nuget package. The full version of ICU is also available as Icu4c.Win.Full.Lib and Icu4c.Win.Full.Bin.

Microsoft also makes the full version available as Microsoft.ICU.ICU4C.Runtime.

What's in the minimum build

  • Characters
  • ErrorCodes
  • Locale
  • Normalizer
  • Rules-based Collator
  • Unicode set to pattern conversions

macOS

macOS doesn't come preinstalled with all the normal icu4c libraries. They must be installed separately. One option is to use MacPorts. The icu package on MacPorts has the icu4c libraries needed for icu.net to run properly.

If the icu4c libraries are not installed in a directory that is in the system path or your application directory, you will need to set an environment variable for the OS to find them. For example:

export DYLD_FALLBACK_LIBRARY_PATH="$HOME/lib:/usr/local/lib:/usr/lib:/opt/local/lib"

If you need to set environment variables like the above, consider adding them to your .zprofile so you don't have to remember to do it manually.

Troubleshooting

  • Make sure you added the NuGet package icu.net and have native ICU libraries available.
  • The binaries of the NuGet packages need to be copied to your output directory. For icu.net this happens through the assembly reference that the package adds to your project. The binaries of Icu4c.Win.Min are only relevant on Windows. They are copied by the Icu4c.Win.Min.targets file included in the NuGet package.

On Windows, the package installer should have added an import to the *.csproj file similar to the following:

<Import Project="..\..\packages\Icu4c.Win.Min.54.1.31\build\Icu4c.Win.Min.targets"
    Condition="Exists('..\..\packages\Icu4c.Win.Min.54.1.31\build\Icu4c.Win.Min.targets')" />

Contributing

We love contributions! The library mainly contains the functionality we need for our products. If you miss something that is part of ICU4C but not yet wrapped in icu.net, add it and create a pull request.

If you find a bug - create an issue on GitHub, then preferably fix it and create a pull request!

icu-dotnet's People

Contributors

andrew-polk, atlastodor, cambell-prince, cbersch, conniey, darcywong00, davidmoore1, ddaspit, ermshiperete, hahn-kev, imnasnainaec, jasonleenaylor, jeffska, johnthagen, johnthomson, josephmyers, lgtm-com[bot], lyonsil, mark-sil, mccarthyrb, micahkimel, murata2makoto, neilmayhew, nightowl888, papeh, paxerit, stephenmcconnel, t-zalewski

icu-dotnet's Issues

Make BreakIterator more like its icu4c counterpart

BreakIterator would be easier to consume if it were set up the way it is in icu4c and icu4j: as a class. It is passed as a method and constructor parameter, and it is meant to be an extension point where you can implement your own word breaking if you need to customize the default ICU behavior.

I want to modify BreakIterator to:

  • Be a class rather than static
  • Add methods similar to the ones from icu::BreakIterator
  • Move the enumerations such as UBreakIteratorType, ULineBreakTag outside of BreakIterator.

So the public API would look like:

using System;
using System.Collections.Generic;

public class BreakIterator : IDisposable
{
    // New ctor, methods and properties
    protected BreakIterator(UBreakIteratorType type, Locale locale, string text);

    // includeSpacesAndPunctuation only applies to UBreakIteratorType.Word, so most of the time it is true.
    protected BreakIterator(UBreakIteratorType type, Locale locale, string text, bool includeSpacesAndPunctuation);

    public virtual Boundary Next { get; }
    public virtual Boundary Current { get; }
    public virtual Boundary Previous { get; }
    public virtual Boundary First { get; }
    public virtual Boundary Last { get; }

    public virtual Boundary[] Boundaries { get; }
    public virtual string Text { get; }
    public Locale Locale { get; }

    public virtual void SetText(string text);

    public static BreakIterator CreateCharacterInstance(Locale locale, string text);
    public static BreakIterator CreateWordInstance(Locale locale, string text);
    public static BreakIterator CreateLineInstance(Locale locale, string text);
    public static BreakIterator CreateSentenceInstance(Locale locale, string text);

    // Existing methods
    public static IEnumerable<string> Split(UBreakIteratorType type, Locale locale, string text);
    public static IEnumerable<string> Split(UBreakIteratorType type, string locale, string text);
    public static IEnumerable<Boundary> GetWordBoundaries(Locale locale, string text, bool includeSpacesAndPunctuation);
    public static IEnumerable<Boundary> GetWordBoundaries(string locale, string text, bool includeSpacesAndPunctuation);
    public static IEnumerable<Boundary> GetBoundaries(UBreakIteratorType type, Locale locale, string text);
}

Would you be opposed to this?

Add Tizen platform support

Tizen is not supported

A Tizen C# application crashes with a NullReferenceException when it attempts to initialize icu.net. Debugging showed that icu.net appends a version suffix when loading a native function from the .so file. Such suffixes are not used in the libicuuc.so.58.2 found on a Tizen device.

Describe the solution you'd like

I would like icu.net to be able to use the libicuuc provided by Tizen, effectively adding Tizen support to the package.
If you agree, I would submit a PR that modifies NativeMethods to skip the version suffix when the current platform is Tizen.
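
To make the proposal concrete, here is a minimal sketch of how a loader could try both the suffixed and the plain export name. SymbolNames and Candidates are hypothetical names chosen for illustration and are not part of icu.net; the sketch assumes the suffixed form is the function name followed by an underscore and the ICU major version.

```csharp
using System;
using System.Collections.Generic;

static class SymbolNames
{
    // Hypothetical helper, not part of icu.net: lists the export names to try
    // for one ICU function, most specific first.
    public static IReadOnlyList<string> Candidates(string function, int icuMajorVersion)
    {
        return new[]
        {
            $"{function}_{icuMajorVersion}", // suffixed export, e.g. u_init_58
            function                         // plain export, as reported on Tizen
        };
    }

    public static void Main()
    {
        foreach (var name in Candidates("u_init", 58))
            Console.WriteLine(name);
    }
}
```

A loader built on this idea would ask the native library for each candidate in turn and use the first one that resolves, so the same code path could serve both kinds of builds.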

Describe alternatives you've considered

Creating a new, independent wrapper for ICU seems to be unjustified. Forking is also less convenient for users.

Additional context

Tizen is a Linux-based system running on some Samsung devices (TVs, a few mobile phones, etc.).
libicuuc is already available on Tizen devices and provides great localization features. Since Tizen developers can now write C# applications, it would help if they could use the power of ICU.

ICU crash on sorting when one codepoint is missing in collation

Describe the bug

In Flex 9.0.8 (LT-20194), YurutiT project, Flex crashes when clicking the Choose Texts button. I discovered it was comparing "Yurutí Example Sentences" and "YURUTI KINSHIP TERMS" when it failed. The last i in the first word is NFD 69 301. This particular code point was not in the original collation rule:

& i < ĩ <<< Ĩ << ĩ́ <<< Ĩ́

So I added it

& i < ĩ <<< Ĩ << í << ĩ́ <<< Ĩ́

and then the crash went away. Apparently ICU has a problem with this code point when it is not defined in the collation, which causes the crash. We are using an old ICU version. Perhaps the latest version would not have this crash. I would like to see us using a current version of ICU in Flex.
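
For readers unfamiliar with the "NFD 69 301" notation, the following small sketch prints the UTF-16 code units of the decomposed character. It uses only the .NET base class library, not icu.net, and NfdDump is a hypothetical name used for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

static class NfdDump
{
    // Returns the UTF-16 code units of the NFD form of text as hex, space separated.
    public static string NfdHex(string text)
    {
        string nfd = text.Normalize(NormalizationForm.FormD);
        return string.Join(" ", nfd.Select(c => ((int)c).ToString("x")));
    }

    public static void Main()
    {
        // U+00ED is the precomposed i with acute accent; NFD splits it into
        // the base letter and the combining acute accent.
        Console.WriteLine(NfdHex("\u00ED"));
    }
}
```

The decomposed form is the base letter i followed by a combining acute accent, which is the sequence the collation rule above did not originally cover.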

To Reproduce

Steps to reproduce the behavior:

  1. Restore Yuruti
  2. Go to Texts and Words
  3. Click the Choose Texts button in toolbar.
    It crashes with a Windows error. Event viewer has:
    Application: FieldWorks.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: System.AccessViolationException
    at Icu.NativeMethods.ucol_strcoll(SafeRuleBasedCollatorHandle, System.String, Int32, System.String, Int32)
    at Icu.Collation.RuleBasedCollator.Compare(System.String, System.String)
    at SIL.WritingSystems.IcuRulesCollator.Compare(System.String, System.String)
    at System.Collections.Generic.ArraySortHelper`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].InsertionSort(System.__Canon[], Int32, Int32, System.Collections.Generic.IComparer`1<System.__Canon>)
    at System.Collections.Generic.ArraySortHelper`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].IntroSort(System.__Canon[], Int32, Int32, Int32, System.Collections.Generic.IComparer`1<System.__Canon>)
    at System.Collections.Generic.ArraySortHelper`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].IntroSort(System.__Canon[], Int32, Int32, Int32, System.Collections.Generic.IComparer`1<System.__Canon>)
    at System.Collections.Generic.ArraySortHelper`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].IntrospectiveSort(System.__Canon[], Int32, Int32, System.Collections.Generic.IComparer`1<System.__Canon>)
    at System.Collections.Generic.ArraySortHelper`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].Sort(System.__Canon[], Int32, Int32, System.Collections.Generic.IComparer`1<System.__Canon>)
    at System.Array.Sort[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.__Canon[], Int32, Int32, System.Collections.Generic.IComparer`1<System.__Canon>)
    at System.Collections.Generic.List`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].Sort(System.Comparison`1<System.__Canon>)
    at SIL.FieldWorks.IText.TextsTriStateTreeView.LoadTextsByGenreAndWithoutGenre()
    at SIL.FieldWorks.IText.TextsTriStateTreeView.LoadGeneralTexts()

Download YurutiT.zip from LT-20194.

Expected behavior

It should bring up the Choose Texts dialog

Screenshots

I was able to catch it in dnSpy 6.1.4.

Environment

  • OS: Windows 8.1
  • Exact version of icu.net 2.5.4+Branch.master.Sha.aa2e04611b4...
  • .NET Framework/Core version ??

Additional context

This is very tricky to get it to fail. Sometimes it works, but usually not.

Normalization fails for some strings

Normalizing a string that decomposes to a string that is exactly 10 bytes longer fails with the following stack trace:

Icu.WarningException : An output string could not be NUL-terminated because output length==destCapacity. 
  at Icu.ExceptionFromErrorCode.ThrowIfError (Icu.ErrorCode e, System.String extraInfo, System.Boolean throwOnWarnings) [0x00301] in /home/eberhard/Develop/icu-dotnet/source/icu.net/ErrorCode.cs:414 
  at Icu.ExceptionFromErrorCode.ThrowIfError (Icu.ErrorCode e) [0x00001] in /home/eberhard/Develop/icu-dotnet/source/icu.net/ErrorCode.cs:369 
  at Icu.NativeMethods.GetString (System.Func`3[T1,T2,TResult] lambda, System.Boolean isUnicodeString, System.Int32 initialLength) [0x00033] in /home/eberhard/Develop/icu-dotnet/source/icu.net/NativeMethods/NativeMethods.cs:426 
  at Icu.NativeMethods.GetUnicodeString (System.Func`3[T1,T2,TResult] lambda, System.Int32 initialLength) [0x00001] in /home/eberhard/Develop/icu-dotnet/source/icu.net/NativeMethods/NativeMethods.cs:414 
  at Icu.Normalization.Normalizer2.Normalize (System.String src) [0x0002c] in /home/eberhard/Develop/icu-dotnet/source/icu.net/Normalization/Normalizer2.cs:226 
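
The warning above is raised when the output exactly fills the destination buffer, leaving no room for a terminator. A minimal sketch of the usual preflight-and-retry pattern follows, assuming the native call returns the full length it needs even when the buffer is too small. FakeNativeCall and GetString are hypothetical stand-ins written for illustration; they are not icu.net's actual implementation.

```csharp
using System;

static class BufferRetry
{
    // Stand-in for an ICU C function (hypothetical, not an icu.net API): copies
    // result into dest if it fits and always returns the full length required.
    public static int FakeNativeCall(string result, char[] dest)
    {
        if (result.Length <= dest.Length)
            result.CopyTo(0, dest, 0, result.Length);
        return result.Length;
    }

    // Calls once with a guessed capacity and retries once with a larger buffer.
    public static string GetString(Func<char[], int> call, int initialCapacity)
    {
        var buffer = new char[initialCapacity];
        int needed = call(buffer);
        if (needed >= buffer.Length)
        {
            // Covers both overflow and the exact-fit case where no NUL fits.
            buffer = new char[needed + 1];
            needed = call(buffer);
        }
        return new string(buffer, 0, needed);
    }

    public static void Main()
    {
        string expected = "0123456789"; // exactly fills a 10-char buffer
        string actual = GetString(dest => FakeNativeCall(expected, dest), 10);
        Console.WriteLine(actual == expected);
    }
}
```

Treating "length equals capacity" the same as an overflow costs one extra call in a rare case and avoids the exact-fit failure described in this issue.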

Overflow exception is thrown by Transliterate method for certain characters

We are getting an overflow exception when passing certain characters to the Transliterate method. For example, if we pass ﷺ as input below:

using (var transliteratorLatn = Icu.Transliterator.CreateInstance("Any-Latn"))
{
    return transliteratorLatn.Transliterate(input);
}

If we increase textMultiplier to 20 instead of the default 3, i.e. if we do this:
transliteratorLatn.Transliterate(input, 20);

then it works. However, we can't always be sure what that second parameter should be.
Can you please help resolve the issue?
Also, would increasing textMultiplier have any performance impact in terms of memory consumption, since we are enlarging the buffer that holds the converted string?

v2.3.3 throws System.BadImageFormatException

Hi,

I've just updated from 2.3.2 to 2.3.3 and I get the following exception:

System.BadImageFormatException : Could not load file or assembly 'icu.net, Version=2.3.3.0, Culture=neutral, PublicKeyToken=416fdd914afa6b66' or one of its dependencies. An attempt was made to load a program with an incorrect format.

For IIS to work with icu.net I need to let the app pool allow 32-bit applications.
My test solution's platform is set to "ANY CPU" but fails to load icu.net as well.

Any changes from 2.3.2 ?
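
A BadImageFormatException of this kind commonly indicates a bitness mismatch between the running process and an assembly or native library it tries to load. As a first diagnostic step, a small sketch like the following reports the bitness of the current process. It uses only the .NET base class library, and Bitness is a hypothetical name.

```csharp
using System;

static class Bitness
{
    // Describes whether the current process runs as 32-bit or 64-bit.
    public static string Describe() =>
        Environment.Is64BitProcess ? "64-bit process" : "32-bit process";

    public static void Main()
    {
        Console.WriteLine(Describe());
        Console.WriteLine($"Pointer size: {IntPtr.Size} bytes");
    }
}
```

Running this inside the IIS app pool or the test runner shows which bitness the loaded binaries have to match.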

Normalizer failing with BUFFER_OVERFLOW_ERROR on some strings

We are using randomized testing which basically converts random bytes into chars. But I have discovered that certain randomly generated strings cause Normalizer.Normalize() to fail with BUFFER_OVERFLOW_ERROR. Here is a test that you can pop into the NormalizerTests class to see the failures happening.

        [Test]
        public void TestNormalizerOverflowError()
        {
            string normalized, input;

            input = "⒆⑵Ⓖ⒭⓫⒄⒱ⓞ";
            normalized = Normalizer.Normalize(input, Normalizer.UNormalizationMode.UNORM_NFKC);

            input = "㎞㌻㌵㍑㍑";
            normalized = Normalizer.Normalize(input, Normalizer.UNormalizationMode.UNORM_NFKC);

            input = "㌎㍊㌵㌇㌿";
            normalized = Normalizer.Normalize(input, Normalizer.UNormalizationMode.UNORM_NFKC);
        }

2.3.4 throwing System.AccessViolationException

Randomly I get the following exception:

Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.AccessViolationException
at Icu.NativeMethods.ubrk_next(IntPtr)
at Icu.RuleBasedBreakIterator.SetText(System.String)

To Reproduce

Cannot reproduce it locally.
This has been an issue on earlier versions as discussed at #56

Environment

  • OS: Windows Server 2012R2 64 bit
  • icu.net 2.3.4 with Icu4c.Win.Full.Lib 56.1.0
  • Framework Version: v4.0.30319

Windows build is broken out of the box

Describe the bug

Building the project using the instructions in README.md fails on a fresh clone on Windows using Visual Studio 2019

To Reproduce

  • Clone the project
  • Open x64_x86 Cross Tools Command Prompt for VS 2019 and cd to the repository root
  • Run msbuild /t:Test build/icu-dotnet.proj
    Expected: project builds and tests pass
    Actual: build fails with
"C:\fwrepo\icu-dotnet\build\icu-dotnet.proj" (Test target) (1) ->
"C:\fwrepo\icu-dotnet\source\icu.net.sln" (Rebuild target) (3) ->
(ValidateSolutionConfiguration target) ->
  C:\fwrepo\icu-dotnet\source\icu.net.sln.metaproj : error MSB4126: The specified solution configuration "Release|x86"
is invalid. Please specify a valid solution configuration using the Configuration and Platform properties (e.g. MSBuild
.exe Solution.sln /p:Configuration=Debug /p:Platform="Any CPU") or leave those properties blank to use the default solu
tion configuration. [C:\fwrepo\icu-dotnet\source\icu.net.sln]

System.ValueTuple 4.5.0 doesn't load on netcoreapp2.0

Describe the bug

As the title says, the System.ValueTuple library version 4.5.0 is not compatible with .NET Core 2.0. It is only compatible with .NET Core 2.1 and above.

To Reproduce

Steps to reproduce the behavior:

  1. Create a .NET Core project targeting 2.0 (netcoreapp2.0)
  2. Add a reference to icu.net NuGet package 2.5.4
  3. Attempt to utilize any of the functionality (we are using RuleBasedBreakIterator, but I don't think it matters)
  4. You will get an error similar to the following:

Result StackTrace:
at Lucene.Net.Analysis.Th.TestThaiAnalyzer.SetUp() in F:\Projects\lucenenet\src\Lucene.Net.Tests.Analysis.Common\Analysis\Th\TestThaiAnalyzer.cs:line 40
--FileNotFoundException
at Icu.NativeMethods.GetString(Func`3 lambda, Boolean isUnicodeString, Int32 initialLength)
at Icu.Locale..ctor(String localeId)
at Lucene.Net.Support.IcuBreakIterator..ctor(UBreakIteratorType type, CultureInfo locale) in F:\Projects\lucenenet\src\dotnet\Lucene.Net.ICU\Support\IcuBreakIterator.cs:line 64
at Lucene.Net.Analysis.Th.ThaiTokenizer..cctor() in F:\Projects\lucenenet\src\Lucene.Net.Analysis.Common\Analysis\Th\ThaiTokenizer.cs:line 49
Result Message:
System.TypeInitializationException : The type initializer for 'Lucene.Net.Analysis.Th.ThaiTokenizer' threw an exception.
----> System.IO.FileNotFoundException : Could not load file or assembly 'System.ValueTuple, Version=4.0.3.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51'. The system cannot find the file specified.

Expected behavior

The package should not just compile, but also load.

Environment

  • OS: Windows 10
  • Exact version of icu.net: 2.5.4
  • .NET Framework/Core version: .NET Core 2.0

Additional context

Note that there are only 2 ways to fix this:

  1. Target .NET Core 2.1 on the consuming project
  2. Downgrade System.ValueTuple to 4.3.0, as described on StackOverflow (see the comments at the bottom)

We would appreciate a stable patch of 59.1.15 ASAP

Support loading icu libraries on MacOS

Using icu.net on a Mac crashes trying to load libdl.so.2.

I'd like to see a NativeMethods solution for Mac which uses libdl.dylib instead of libdl.so.2

Incorrect return type for Normalizer2.GetCombiningClass

When you call this method in Watch mode you get these results for á:
normalizer.GetCombiningClass(97) = x100 (needs to be x0 (0))
normalizer.GetCombiningClass(769) = xfee6 (needs to be xe6 (230))

GetCombiningClass is an ICU method defined as
virtual uint8_t icu::Normalizer2::getCombiningClass(UChar32 c) const

Notice this returns an unsigned 8-bit value. We need to change the above code to return an unsigned 8-bit value, or at least mask the int appropriately. This is the only icu::Normalizer2 method that returns uint8_t.

See https://jira.sil.org/browse/LT-20408?focusedCommentId=225567&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-225567 for more details.
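
The masking option mentioned above can be sketched as follows. CombiningClass and FromRaw are hypothetical names used for illustration; the sketch only shows the arithmetic and does not call into ICU.

```csharp
using System;

static class CombiningClass
{
    // Keeps only the low 8 bits, matching the uint8_t return type in ICU.
    public static byte FromRaw(int raw) => (byte)(raw & 0xFF);

    public static void Main()
    {
        Console.WriteLine(FromRaw(0x100));  // raw value reported for 'a' (97)
        Console.WriteLine(FromRaw(0xFEE6)); // raw value reported for U+0301 (769)
    }
}
```

Applied to the two raw values from the report, the mask yields 0 and 230, which are the values the issue says are expected.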

Build warning on .NET 8

Describe the bug

Build warnings after adding Icu4c.Win.Min

:\Program Files\dotnet\sdk\8.0.300\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.targets(284,5): warning NETSDK1206: Found version-specific or distribution-specific runtime identifier(s): win7-x64, win7-x86. Affected libraries: Icu4c.Win.Min. In .NET 8.0 and higher, assets for version-specific and distribution-specific runtime identifiers will not be found by default. See https://aka.ms/dotnet/rid-usage for details.

.csproj contains

<PackageReference Include="icu.net" />
<PackageReference Include="Icu4c.Win.Min" />

To Reproduce

Install Icu4c.Win.Min on a .NET 8 project

Expected behavior

No warnings

libdl.so not found

icu.net crashes on Ubuntu unless libc6-dev package is installed. Installing that package shouldn't be necessary on an end user's machine. (WS-502)

[Question] Features for icu.net

Hi,

I am currently migrating Lucene.NET to .NET Core and as a part of that, use icu.net. The other .NET icu4c library uses a Managed C++ wrapper, so your project is the most feasible for compiling/running on .NET Core. (Couchbase-lite, Orchard CMS and a few other projects are also considering using your library when they move to .NET Core.)

  • Would you be OK with creating an icu.net package that contains the full 22MB binaries?
  • Are you interested in creating a single NuGet package that targets x64 and x86. I've utilised logic from libgit4sharp to add the right paths when loading the native binaries.
  • Are you interested in migrating this library to .NET Core and releasing a .NET Core package on NuGet.org?

I've already done the work, but I wanted to get your consent before I add a PR because it would require your build server to have Visual Studio 2015 (Community or higher).

System.DllNotFoundException: Unable to load shared library 'libdl.so' on Ubuntu 22.04

Describe the bug

ICU.NET crashes on Ubuntu 22.04 when trying to load ICU, because "libdl.so" cannot be found. "libdl.so" no longer exists on Ubuntu 22.04. You must use "libdl.so.2".

To Reproduce

Steps to reproduce the behavior:

  1. Call any ICU.NET method on Ubuntu 22.04

Expected behavior

The code should not throw an exception.

Environment

  • OS: Ubuntu 22.04
  • Exact version of icu.net: 2.8.1
  • .NET Framework/Core version: .NET 6

DllResolver fails with Ubuntu 22.04 (no libdl.so)

Describe the bug

Upon updating The Combine from 20.04 to 22.04, our backend encounters the following: System.DllNotFoundException : Unable to load shared library 'libdl.so' or one of its dependencies. This appears to be from the configuration in https://github.com/sillsdev/icu-dotnet/blob/master/source/icu.net/App.config that is enforced by https://github.com/sillsdev/icu-dotnet/blob/master/source/icu.net/NativeMethods/DllResolver.cs.

The error is further documented in sillsdev/TheCombine#1768

Environment

  • Ubuntu 22.04
  • icu.net 2.7.1 & 2.8.1

Expected icu*.dlls are not loaded at runtime

Description

The code for dynamically loading the icu libraries at runtime is causing some unexpected behaviour.

  • If you have a newer set of icu assemblies in your PATH because of another program, it ends up loading those assemblies rather than the installed ones from the package Icu4c.Win.Full.Lib.54.1.3-beta3. Repro below.
  • Running tests outside of VS results in failures because it is unable to load icu libraries. (ie, running D:\git\tools\nunit3-console.exe D:\git\icu.net\output\Debug\icu.net.tests.dll when from a directory that is not where icu.net.tests.dll is located.)
  • Makes it really difficult to support .NET Core loading of packages. Packages are no longer downloaded and stored in the repository (under the packages folder). They are stored in a NuGet cache (i.e. %USERPROFILE%\.nuget\packages). As a result, when running on .NET Core, the dependencies are resolved in the cache folder and not actually copied to the program's binary output folder. The .NET runtime knows how to resolve these runtime dependencies when you use [DllImport], but since we are calling the Windows function LoadLibrary, that logic is bypassed.

Repro Steps:

  1. Copy the test case below into a test file in icu.net.tests.dll
  2. Have a newer version of the icu dlls in a directory that is in your PATH.
    • In my case, I have MiKTeX installed on my machine. It comes with icu version 57.1 installed.
  3. Run the test below.
public class TestClass
{
    [Test]
    public void TestForCorrectVersion()
    {
        var version = Wrapper.IcuVersion;
        Assert.AreEqual("54.1", version);
    }   
}

Expected Behaviour

The test passes because the Icu4c.Win.Full.Lib contains v54.1 libraries.

Actual Behaviour

Test fails because it loads C:\Program Files (x86)\MiKTeX 2.9\miktex\bin\icuuc57.dll and outputs 57.1.

Proposed Solution

One solution is to revert from dynamic loading to [DllImport] and manage the native library names with the approach that LibGit2Sharp uses. They have a setup that allows NativeDllName.Name to be generated at build time so it is not hard-coded. Consequently, their NativeMethods looks like the one below:

internal static class NativeMethods
{
        private const string libgit2 = NativeDllName.Name;

        [DllImport(libgit2)]
        internal static extern unsafe GitError* giterr_last();
}

They have two packages similar to icu.net called LibGit2Sharp.NativeBinaries and LibGit2Sharp.

LibGit2Sharp.NativeBinaries contains:

The LibGit2Sharp project has a dependency on the LibGit2Sharp.NativeBinaries package. At build time, it has a couple of .targets that run to generate the correct native library name.

  • NativeDllName.targets: Takes the embedded resource libgit2_filename.txt and calls their custom build task to generate the NativeDllName class.
  • GenerateNativeDllNameTask: An MSBuild task that reads libgit2_filename and outputs the NativeDllName class.

Considerations

Two nuget packages of icu.net would have to be made. One for the minimal ICU build and other for the full ICU build because the dependencies are linked to the project.

Mark icu-dotnet CLSCompliant(true)

Since the assembly is not marked CLS compliant, it is not possible to consume icu-dotnet and expose any of its types publicly in a CLS compliant assembly. Not marking with [assembly: CLSCompliant(true)] means all types in the library are not CLS compliant, including interfaces and enumerations.

It would be helpful for consumers of this class who are CLS compliant if icu-dotnet were also CLS compliant (even if some of its types need to be marked CLSCompliant(false) to meet this requirement).
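
As a minimal sketch of what this request involves, the following shows an assembly marked CLS compliant with one member opting out. ClsExample and its members are hypothetical and only illustrate the attribute usage; they are not icu.net types.

```csharp
using System;

[assembly: CLSCompliant(true)]

public static class ClsExample
{
    // Signed integers are CLS compliant.
    public static int Length(string text) => text.Length;

    // uint is not CLS compliant, so this member opts out individually.
    [CLSCompliant(false)]
    public static uint LengthUnsigned(string text) => (uint)text.Length;

    public static void Main()
    {
        Console.WriteLine(Length("icu"));
    }
}
```

With the assembly-level attribute in place, the compiler warns about public members that expose non-compliant types unless they are marked CLSCompliant(false).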

[Question] Can't load ICU library

I hope you don't mind me asking a question.
I've installed both icu.net and Icu4c.Win.Full.Lib

  <package id="icu.net" version="2.1.0-beta0017" targetFramework="net45" />
  <package id="Icu4c.Win.Full.Lib" version="56.1.5" targetFramework="net45" />

All dll's are being copied to the output directory:
icu.net.dll
icudt56.dll
icuin56.dll
icuio56.dll
icule56.dll
iculx56.dll
icutest55.dll
icutu55.dll
icuuc56.dll

The NuGet packages created the proper .csproj configurations:
<Import Project="..\packages\Icu4c.Win.Full.Lib.56.1.5\build\Icu4c.Win.Full.Lib.targets" Condition="Exists('..\packages\Icu4c.Win.Full.Lib.56.1.5\build\Icu4c.Win.Full.Lib.targets')" />

Nevertheless, I keep getting the following exception when using a RuleBasedBreakIterator:

System.IO.FileLoadException was unhandled by user code
  FileName=icuuc
  HResult=-2146232799
  Message=Can't load ICU library
  Source=icu.net
  StackTrace:
       at Icu.NativeMethods.LoadIcuLibrary(String libraryName) in c:\JenkinsSlaveHome\slave2\workspace\IcuDotNet-Win-any-master-release\source\icu.net\NativeMethods.cs:line 138
       at Icu.NativeMethods.get_IcuCommonLibHandle() in c:\JenkinsSlaveHome\slave2\workspace\IcuDotNet-Win-any-master-release\source\icu.net\NativeMethods.cs:line 90
       at Icu.NativeMethods.uloc_canonicalize(String localeID, IntPtr name, Int32 nameCapacity, ErrorCode& err) in c:\JenkinsSlaveHome\slave2\workspace\IcuDotNet-Win-any-master-release\source\icu.net\NativeMethods.cs:line 1423
       at Icu.Locale.GetString(GetStringMethod method, String localeId) in c:\JenkinsSlaveHome\slave2\workspace\IcuDotNet-Win-any-master-release\source\icu.net\Locale.cs:line 197
       at Icu.Locale..ctor(String localeId) in c:\JenkinsSlaveHome\slave2\workspace\IcuDotNet-Win-any-master-release\source\icu.net\Locale.cs:line 26
       at ReviewPortal.TextTools.TextMeasurer.WrapText(Line line, Font font) in C:\Projects\display-text-system\cjk-linebreaking\ReviewPortal\TextTools\TextMeasurer.cs:line 159
       at ReviewPortal.Web.Controllers.DisplayController.WrapText(IEnumerable`1 textContainers) in C:\Projects\display-text-system\cjk-linebreaking\ReviewPortal.Web\Controllers\DisplayController.cs:line 207
       at lambda_method(Closure , ControllerBase , Object[] )
       at System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)
       at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
       at System.Web.Mvc.Async.AsyncControllerActionInvoker.<BeginInvokeSynchronousActionMethod>b__39(IAsyncResult asyncResult, ActionInvocation innerInvokeState)
       at System.Web.Mvc.Async.AsyncResultWrapper.WrappedAsyncResult`2.CallEndDelegate(IAsyncResult asyncResult)
       at System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeActionMethod(IAsyncResult asyncResult)
       at System.Web.Mvc.Async.AsyncControllerActionInvoker.AsyncInvocationWithFilters.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3d()
       at System.Web.Mvc.Async.AsyncControllerActionInvoker.AsyncInvocationWithFilters.<>c__DisplayClass46.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3f()
  InnerException: 

Do you have any idea how to properly load the library ?
Cheers!

NativeMethods should use libdl.so.2 instead of libdl.so

Describe the bug

ICU.net fails to initialize on Ubuntu version >= 22. The exception says "System.DllNotFoundException: Unable to load shared library 'libdl.so' or one of its dependencies". Here is a stack trace:

   at Icu.NativeMethods.dlopen(String file, Int32 mode)
   at Icu.NativeMethods.GetIcuLibHandle(String basename, Int32 icuVersion)
   at Icu.NativeMethods.LoadIcuLibrary(String libraryName)
   at Icu.NativeMethods.get_IcuCommonLibHandle()
   at Icu.NativeMethods.u_init(ErrorCode& errorCode)
   at Icu.Wrapper.Init()

To Reproduce

Call Init() on Ubuntu.

Expected behavior

ICU.net should use libdl.so.2. This issue has been seen by others in different projects on SO: https://stackoverflow.com/questions/75855053/how-to-address-crash-due-to-missing-libdl-so-on-ubuntu-22.
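
The fallback idea can be sketched as follows. This is only an illustration under stated assumptions, not the loader icu.net actually uses: LibDlProbe, Candidates, FirstLoadable and Probe are hypothetical names, and NativeLibrary.TryLoad requires .NET Core 3.0 or later. The unversioned libdl.so symlink is typically only present when development packages are installed, so the versioned name is tried first.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: picks the first library name that can be loaded.
public static class LibDlProbe
{
    // Candidate names, most specific first.
    public static readonly string[] Candidates = { "libdl.so.2", "libdl.so" };

    // Returns the first name accepted by tryLoad, or null if none is accepted.
    public static string FirstLoadable(string[] names, Func<string, bool> tryLoad)
    {
        foreach (var name in names)
        {
            if (tryLoad(name))
                return name;
        }
        return null;
    }

    // Probes the real system. The handle is intentionally not freed here,
    // because a loader would keep the library open for later dlopen/dlsym calls.
    public static string Probe() =>
        FirstLoadable(Candidates, name => NativeLibrary.TryLoad(name, out _));
}
```

Passing the load check in as a delegate keeps the selection logic testable on any platform; only Probe() touches the real system.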

Icu.Wrapper.Cleanup crash on net6 linux

Hello!
We are migrating to net6 and using this library in some projects.
In net6, calling Icu.Wrapper.Cleanup crashes the application with the following:
[7354115.440346] .NET ThreadPool[1337354]: segfault at 8 ip 00007f596b5e8d92 sp 00007f557effbb40 error 4 in libicui18n.so.67.1[7f596b53e000+17a000]
This only happens on Linux. We have tried Debian 11, Alpine, Ubuntu, and even Debian 10 with the same version of libc6-dev that we used with net5 (which doesn't crash on net5).

Environment

  • OS: debian10/debian11/ubuntu/alpine
  • icu.net 2.8.1/2.7.1
  • net6

Wrapper not using full libraries

Describe the bug

I keep getting the following exception even with the full libraries.
System.MissingMemberException: 'Do you have the full version of ICU installed? The method 'ubrk_open' is not included in the minimal version of ICU.'

To Reproduce

Icu.Wrapper.Init();
var locale = new Icu.Locale("en_EN");
var test = Icu.BreakIterator.CreateWordInstance(locale);
test.SetText("The file could not be read:");

I get the exception on the last line.

Environment

I have the following NuGet packages installed via Visual Studio 16.8.4 on Windows 10 in a .NET 5 project:
icu.net 2.6.0
Icu4cWin.Full.Bin 59.1.15
Icu4cWin.Full.Lib 59.1.15

RuleBasedCollator leaks memory

RuleBasedCollator has a field of type SafeRuleBasedCollatorHandle which implements IDisposable. We have to dispose that field. For that Collator should implement IDisposable.

On Windows the tests currently throw an exception in the finalizer because of this (which currently gets ignored on Jenkins because of a bug in the NUnit task).

While these are only failures in tests and could be worked around by checking if ICU's u_cleanup method has been called prior to calling u_close on the collator handle, this shows a leakage (see icu doc).

Improve loading of packages for .NET Core

With .NET Core, packages are no longer downloaded and stored in the repository (under the packages folder). Instead, they are stored in a NuGet cache (i.e. %USERPROFILE%\.nuget\packages). As a result, when running on .NET Core, the dependencies are resolved in the cache folder and are not copied to the program's binary output folder. The .NET runtime knows how to resolve these runtime dependencies when you use [DllImport], but since we call the Windows function LoadLibrary directly, that logic is bypassed.

See #20

Icu library not compatible with alpine 3.16

We are using the ICU NuGet packages shown below:
[image: list of installed NuGet packages]

And below is the code

 using (var transliteratorLatn = Icu.Transliterator.CreateInstance("Any-Latn"))
                {
                    return transliteratorLatn.Transliterate(input, textMultiplier);
                }

Everything was working fine with alpine3.15 and icu=69.1-r1. However, after moving to alpine3.16 and icu=71.1-r2, the above code started to throw the exception below:
[image: exception details]

I have already raised a bug report at dotnet/dotnet-docker#3851 to fix the error in the build pipeline.

However, for the exception occurring in the code, I was advised that icu.net is not compatible with alpine3.16. Can you please help?

Expose the Case Fold tokenizer

I need to utilize the Case Fold tokenizer ("nfkc_cf") but it's not exposed. I made the necessary changes and submitted a pull request: #87

Great if icu-dotnet can use Windows-supplied icu dlls.

Summary

Recent Windows 10 releases (since version 1703) come with ICU4C, in particular icuuc.dll, icuin.dll, and icudt.dll, as documented here. It would be great if icu-dotnet could use them, either as an option or at least when no ICU DLLs are provided in the folder containing icu.net.dll.

Is your feature request related to a problem? Please describe.

I wrote several small tools using icu-dotnet and felt that the installation footprint of several MB (even when using Icu4c.Win.Min binaries) is too large when compared to the main parts of my tools (less than 50KB each)...

Describe the solution you'd like

icu-dotnet should search for the ICU DLLs in Windows' system32 folder when it is running on Windows and doesn't find any in the standard folders (the ones searched by the current version).

Describe alternatives you've considered

  • icu-dotnet provides a config option to only look for Windows' version of the ICU DLLs,
  • a separate version of icu-dotnet dedicated to Windows that always uses Windows' version of the ICU DLLs, and
  • a version of icu4c.Win.Min and/or icu4c.Win.Full.Bin containing DLLs that internally pass through to Windows' version of the ICU DLLs.

Additional context

The ICU version that Windows bundles is controlled only by Microsoft, so I think it is a good idea that icu-dotnet can use application-provided icu dlls of a particular version. Use of Windows' version of icu is for programs that are not sensitive to ICU (primarily data) version differences.

Support OSX

macOS uses dylib files instead of so and the file pattern is libicuuc.[version].dylib instead of libicuuc.so.[version], so our current approach for loading unmanaged ICU doesn't work.

The relevant code is in source/icu.net/NativeMethods/NativeMethods.cs.

See discussion on Slack.
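
The naming difference can be sketched as a small helper. This is an illustrative assumption, not the code in NativeMethods.cs: IcuLibraryNames, ForPlatform and ForCurrentPlatform are hypothetical names. The Linux and macOS patterns follow the description above, and the Windows pattern follows file names such as icuuc56.dll.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper that builds the platform-specific ICU library file name.
public static class IcuLibraryNames
{
    public static string ForPlatform(string basename, int version, OSPlatform platform)
    {
        if (platform == OSPlatform.OSX)
            return $"lib{basename}.{version}.dylib";   // e.g. libicuuc.67.dylib
        if (platform == OSPlatform.Linux)
            return $"lib{basename}.so.{version}";      // e.g. libicuuc.so.67
        return $"{basename}{version}.dll";             // e.g. icuuc56.dll
    }

    // Detects the running platform; anything that is neither macOS nor Linux
    // is treated as Windows in this sketch.
    public static string ForCurrentPlatform(string basename, int version)
    {
        var platform = RuntimeInformation.IsOSPlatform(OSPlatform.OSX) ? OSPlatform.OSX
            : RuntimeInformation.IsOSPlatform(OSPlatform.Linux) ? OSPlatform.Linux
            : OSPlatform.Windows;
        return ForPlatform(basename, version, platform);
    }
}
```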

Signature and threading issues

When getting the latest version of ICU.NET, I noticed 2 things:

  1. The signature of u_charType is incorrect: the return value is not an int, but an sbyte:
			[UnmanagedFunctionPointer(CallingConvention.Cdecl, CharSet = CharSet.Unicode)]
			internal delegate sbyte u_charTypeDelegate(int characterCode);

This will make the IsSymbol and GetCharType pass.

  2. The initialization of the ICU4C library in multithreading scenarios is incorrect. See http://userguide.icu-project.org/design#TOC-ICU-Initialization-and-Termination. This will yield crashes in multithreaded scenarios like Lucene.Net, which are extremely puzzling.
    Sadly, this isn't easily corrected: the whole NativeMethods isn't written to be thread-friendly.

AccessViolationException from BreakIterator

We are getting an intermittent AccessViolationException when using BreakIterator in a concurrent test. This issue has been a particular thorn because on .NET Core it causes the NUnit test runner to fatally crash.

The exception only happens in the case where:

  1. There is an open FileStream being used to read a file.
  2. The scenario is being run as an NUnit test; xUnit seems to work fine, and so does a console application.
  3. There is more than one thread calling BreakIterator.GetWordBoundaries or BreakIterator.GetBoundaries at the same time.
  4. Some particular strings are being used when calling BreakIterator.

It is a bit suspicious that it only happens when running under NUnit, but the stack trace seems to start inside of Icu.NativeMethods.ubrk_next(IntPtr bi).

[10/22/2017 9:23:22 PM Warning] System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at Icu.NativeMethods.ubrk_next(IntPtr bi) in d:\JenkinsSlaveHome\slave2\workspace\icu-dotnet_master-XUIQATPXJBEMAPQAXECBOMRZFIJEQ5ACEARUZRESUNQWYLGLR56Q\source\icu.net\NativeMethods.cs:line 1732
   at Icu.RuleBasedBreakIterator.SetText(String text) in d:\JenkinsSlaveHome\slave2\workspace\icu-dotnet_master-XUIQATPXJBEMAPQAXECBOMRZFIJEQ5ACEARUZRESUNQWYLGLR56Q\source\icu.net\RuleBasedBreakIterator.cs:line 377
   at Icu.BreakIterator.GetBoundaries(UBreakIteratorType type, Locale locale, String text, Boolean includeSpacesAndPunctuation) in d:\JenkinsSlaveHome\slave2\workspace\icu-dotnet_master-XUIQATPXJBEMAPQAXECBOMRZFIJEQ5ACEARUZRESUNQWYLGLR56Q\source\icu.net\BreakIterator.cs:line 386
   at Icu.BreakIterator.GetWordBoundaries(Locale locale, String text, Boolean includeSpacesAndPunctuation) in d:\JenkinsSlaveHome\slave2\workspace\icu-dotnet_master-XUIQATPXJBEMAPQAXECBOMRZFIJEQ5ACEARUZRESUNQWYLGLR56Q\source\icu.net\BreakIterator.cs:line 368
   at IcuAVE.AVETest.RunAVETest(CountdownEvent latch, String fileName) in f:\Users\Shad\documents\visual studio 2017\Projects\IcuAVE\IcuAVE\AVETest.cs:line 50
   at IcuAVE.AVETest.<>c__DisplayClass0_0.<TestAccessViolationException>b__0() in f:\Users\Shad\documents\visual studio 2017\Projects\IcuAVE\IcuAVE\AVETest.cs:line 35
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()
   at Icu.NativeMethods.ubrk_next(IntPtr bi) in d:\JenkinsSlaveHome\slave2\workspace\icu-dotnet_master-XUIQATPXJBEMAPQAXECBOMRZFIJEQ5ACEARUZRESUNQWYLGLR56Q\source\icu.net\NativeMethods.cs:line 1732
   at Icu.RuleBasedBreakIterator.SetText(String text) in d:\JenkinsSlaveHome\slave2\workspace\icu-dotnet_master-XUIQATPXJBEMAPQAXECBOMRZFIJEQ5ACEARUZRESUNQWYLGLR56Q\source\icu.net\RuleBasedBreakIterator.cs:line 377
   at Icu.BreakIterator.GetBoundaries(UBreakIteratorType type, Locale locale, String text, Boolean includeSpacesAndPunctuation) in d:\JenkinsSlaveHome\slave2\workspace\icu-dotnet_master-XUIQATPXJBEMAPQAXECBOMRZFIJEQ5ACEARUZRESUNQWYLGLR56Q\source\icu.net\BreakIterator.cs:line 386
   at Icu.BreakIterator.GetWordBoundaries(Locale locale, String text, Boolean includeSpacesAndPunctuation) in d:\JenkinsSlaveHome\slave2\workspace\icu-dotnet_master-XUIQATPXJBEMAPQAXECBOMRZFIJEQ5ACEARUZRESUNQWYLGLR56Q\source\icu.net\BreakIterator.cs:line 368
   at IcuAVE.AVETest.RunAVETest(CountdownEvent latch, String fileName) in f:\Users\Shad\documents\visual studio 2017\Projects\IcuAVE\IcuAVE\AVETest.cs:line 50
   at IcuAVE.AVETest.<>c__DisplayClass0_0.<TestAccessViolationException>b__0() in f:\Users\Shad\documents\visual studio 2017\Projects\IcuAVE\IcuAVE\AVETest.cs:line 35
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

Steps to Reproduce

  1. Clone the repository from https://github.com/NightOwl888/IcuAVE
  2. Open in Visual Studio and build the solution
  3. Under Visual Studio's Test Explorer right click the IcuAVE.NUnit.TestAccessViolationException test and choose "Run Selected Tests"
  4. Navigate to the Output pane and choose "Tests" from the "Show output from" dropdown, and check for the stack trace of the AccessViolationException
  5. If the error didn't happen the first time, try again - it doesn't occur 100% of the time the test is run.

NOTE: The test never actually fails in .NET Framework, but this exception causes a fatal crash of the NUnit test runner when running under .NET Core (when merging updates from #37).


Create/upload a NuGet package

I am working on a C# project that utilises ICU4C, and it would be a lot easier to integrate this library into my project if I could consume it as a NuGet package.

I was wondering if there are any plans to create a NuGet package and upload it to a NuGet feed.

I have done the work (to package this library as a nuget pkg) on a private branch but am wondering how to incorporate it in your build if your team wants this feature (or just upload a single package to a nuget feed).

Thanks,
Connie

Making a thread safe Collator

Is your feature request related to a problem? Please describe.

This is sort of a question/feature request.

First of all, it seems there is a difference in design between icu4c and icu4j. The collator documentation clearly states that

ICU Collator instances cannot be shared among threads. You should open them instead, and use a different collator for each separate thread. The safe clone function is supported for cloning collators in a thread-safe fashion.

However, the component I am porting that uses the icu4j RuleBasedCollator is able to store an instance as a private variable in a class and have that variable be hit by multiple threads at once.

I tried using lock (thisObj), but somehow the Collator knows that the wrong thread is calling it even if the call itself is synchronized. The only thing that works is creating a clone per thread, which is no help if you are not the one instantiating the threads.

Describe the solution you'd like

What I would like to see is a thread safe Collator and RuleBasedCollator implementation out of the box. I was able to make this happen on my first attempt, but I am not sure how this will perform.

internal class ThreadSafeCollator : Collator
{
    private readonly Collator collator;
    private readonly object syncLock = new object();

    public ThreadSafeCollator(Collator collator)
    {
        this.collator = (Collator)collator.Clone();
    }

    public override CollationStrength Strength
    {
        get
        {
            using (var helper = new CollatorThreadHelper<CollationStrength>(
                this.collator, (clone) => clone.Strength))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.Strength = value;
            }
        }
    }

    public override NormalizationMode NormalizationMode
    {
        get
        {
            using (var helper = new CollatorThreadHelper<NormalizationMode>(
                this.collator, (clone) => clone.NormalizationMode))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.NormalizationMode = value;
            }
        }
    }

    public override FrenchCollation FrenchCollation
    {
        get
        {
            using (var helper = new CollatorThreadHelper<FrenchCollation>(
                this.collator, (clone) => clone.FrenchCollation))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.FrenchCollation = value;
            }
        }
    }

    public override CaseLevel CaseLevel
    {
        get
        {
            using (var helper = new CollatorThreadHelper<CaseLevel>(
                this.collator, (clone) => clone.CaseLevel))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.CaseLevel = value;
            }
        }
    }

    [Obsolete]
    public override HiraganaQuaternary HiraganaQuaternary
    {
        get
        {
            using (var helper = new CollatorThreadHelper<HiraganaQuaternary>(
                this.collator, (clone) => clone.HiraganaQuaternary))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.HiraganaQuaternary = value;
            }
        }
    }

    public override NumericCollation NumericCollation
    {
        get
        {
            using (var helper = new CollatorThreadHelper<NumericCollation>(
                this.collator, (clone) => clone.NumericCollation))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.NumericCollation = value;
            }
        }
    }

    public override CaseFirst CaseFirst
    {
        get
        {
            using (var helper = new CollatorThreadHelper<CaseFirst>(
                this.collator, (clone) => clone.CaseFirst))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.CaseFirst = value;
            }
        }
    }

    public override AlternateHandling AlternateHandling
    {
        get
        {
            using (var helper = new CollatorThreadHelper<AlternateHandling>(
                this.collator, (col) => col.AlternateHandling))
            {
                helper.Invoke();
                return helper.Result;
            }
        }
        set
        {
            lock (syncLock)
            {
                collator.AlternateHandling = value;
            }
        }
    }

    public override object Clone()
    {
        // No need to use our helper here...
        return this.collator.Clone();
    }

    public override int Compare(string source, string target)
    {
        using (var helper = new CollatorThreadHelper<int>(
                this.collator, (col) => col.Compare(source, target)))
        {
            helper.Invoke();
            return helper.Result;
        }
    }

    public override SortKey GetSortKey(string source)
    {
        using (var helper = new CollatorThreadHelper<SortKey>(
                this.collator, (col) => col.GetSortKey(source)))
        {
            helper.Invoke();
            return helper.Result;
        }
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            this.collator?.Dispose();
        }
        base.Dispose(disposing);
    }

    internal class CollatorThreadHelper<TResult> : IDisposable
    {
        private readonly Collator clonedCollator;
        private readonly Thread thread;
        private Exception exception;

        public CollatorThreadHelper(Collator collator, Func<Collator, TResult> action)
        {
            this.clonedCollator = (Collator)collator.Clone();
            this.thread = new Thread(() =>
            {
                try
                {
                    this.Result = action(this.clonedCollator);
                }
                catch (Exception ex)
                {
                    this.exception = ex;
                }
            });
        }
        public TResult Result { get; private set; }

        public void Invoke()
        {
            thread.Start();
            thread.Join();
            if (exception != null) throw exception;
        }

        public void Dispose()
        {
            this.clonedCollator.Dispose();
        }
    }
}

It works great, but I haven't yet benchmarked it to see how it performs without all of the additional cloning. I am also not sure about the property setters, but I only needed them in one class that was already deprecated in the code I am porting.

Describe alternatives you've considered

My second attempt was to try not to call Clone() so much in order to try to make it more efficient. At first I was thinking to try to store the collator in the Thread itself using Thread.SetData, but without a local reference it would leave no way to dispose the collator.

So, I tried making a local cache based on the thread id. Unfortunately, managed and unmanaged thread ids are 2 different things, and the latter requires a lower level call. However, this didn't seem to work (maybe I missed something, though).

internal class ThreadSafeCollator : Collator
{
    private const int cleanupIntervalInSeconds = 5;

    private readonly Collator collator;
    private readonly IDictionary<int, ThreadReference> cache = new Dictionary<int, ThreadReference>();
    private readonly ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
    private readonly CancellationTokenSource cleanReferencesTask;


    public ThreadSafeCollator(Collator collator)
    {
        if (collator == null)
            throw new ArgumentNullException(nameof(collator));

        this.collator = (Collator)collator.Clone();
        this.cleanReferencesTask = new CancellationTokenSource();
        Repeat.Interval(TimeSpan.FromSeconds(cleanupIntervalInSeconds), () => CleanCollatorReferences(), cleanReferencesTask.Token);
    }

    private void CleanCollatorReferences()
    {
        cacheLock.EnterUpgradeableReadLock();
        try
        {
            var deadReferences = cache.Where(r => !r.Value.IsAlive).ToArray();
            if (deadReferences.Any())
            {
                cacheLock.EnterWriteLock();
                try
                {
                    foreach (var deadReference in deadReferences)
                        cache.Remove(deadReference);
                }
                finally
                {
                    cacheLock.ExitWriteLock();
                }
            }
        }
        finally
        {
            cacheLock.ExitUpgradeableReadLock();
        }
    }

    [DllImport("kernel32.dll")]
    static extern int GetCurrentThreadId();

    private Collator GetCurrentCollator()
    {
        cacheLock.EnterUpgradeableReadLock();
        try
        {
            //int id = Thread.CurrentThread.ManagedThreadId;
            int id = GetCurrentThreadId();
            if (cache.ContainsKey(id))
            {
                return cache[id].Collator;
            }
            else
            {
                cacheLock.EnterWriteLock();
                try
                {
                    return (cache[id] = new ThreadReference(
                        collator, Thread.CurrentThread)).Collator;
                }
                finally
                {
                    cacheLock.ExitWriteLock();
                }
            }
        }
        finally
        {
            cacheLock.ExitUpgradeableReadLock();
        }
    }

    public override CollationStrength Strength
    {
        get => GetCurrentCollator().Strength;
        set => GetCurrentCollator().Strength = value;
    }

    public override NormalizationMode NormalizationMode
    {
        get => GetCurrentCollator().NormalizationMode;
        set => GetCurrentCollator().NormalizationMode = value;
    }

    public override FrenchCollation FrenchCollation
    {
        get => GetCurrentCollator().FrenchCollation;
        set => GetCurrentCollator().FrenchCollation = value;
    }

    public override CaseLevel CaseLevel
    {
        get => GetCurrentCollator().CaseLevel;
        set => GetCurrentCollator().CaseLevel = value;
    }

    [Obsolete]
    public override HiraganaQuaternary HiraganaQuaternary
    {
        get => GetCurrentCollator().HiraganaQuaternary;
        set => GetCurrentCollator().HiraganaQuaternary = value;
    }

    public override NumericCollation NumericCollation
    {
        get => GetCurrentCollator().NumericCollation;
        set => GetCurrentCollator().NumericCollation = value;
    }

    public override CaseFirst CaseFirst
    {
        get => GetCurrentCollator().CaseFirst;
        set => GetCurrentCollator().CaseFirst = value;
    }

    public override AlternateHandling AlternateHandling
    {
        get => GetCurrentCollator().AlternateHandling;
        set => GetCurrentCollator().AlternateHandling = value;
    }

    public override object Clone()
    {
        // No need to use our helper here...
        return this.collator.Clone();
    }

    public override int Compare(string source, string target)
    {
        return GetCurrentCollator().Compare(source, target);
    }

    public override SortKey GetSortKey(string source)
    {
        return GetCurrentCollator().GetSortKey(source);
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            cacheLock.EnterWriteLock();
            try
            {
                this.cleanReferencesTask.Cancel();
                this.collator.Dispose();
                foreach (var threadReference in cache.Values)
                {
                    threadReference.Collator?.Dispose();
                }
                this.cleanReferencesTask.Dispose();
            }
            finally
            {
                cacheLock.ExitWriteLock();
            }
            // Dispose the lock only after releasing it; a ReaderWriterLockSlim
            // must not be disposed while it is still held.
            this.cacheLock.Dispose();
        }
        base.Dispose(disposing);
    }

    internal class ThreadReference
    {
        private readonly WeakReference<Thread> thread;
        public ThreadReference(Collator collator, Thread thread)
        {
            this.Collator = (Collator)collator.Clone();
            this.thread = new WeakReference<Thread>(thread);
        }

        public Collator Collator { get; private set; }
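        // The Thread object surviving garbage collection does not mean the
        // thread is still running, so also check Thread.IsAlive.
        public bool IsAlive => this.thread.TryGetTarget(out Thread target) && target.IsAlive;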
    }

    internal static class Repeat
    {
        public static Task Interval(
            TimeSpan pollInterval,
            Action action,
            CancellationToken token)
        {
            // We don't use Observable.Interval:
            // If we block, the values start bunching up behind each other.
            return Task.Factory.StartNew(
                () =>
                {
                    while (true)
                    {
                        if (token.WaitHandle.WaitOne(pollInterval))
                            break;

                        action();
                    }
                }, token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
        }
    }
}
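
Another variant of the per-thread cache above is to let System.Threading.ThreadLocal<T> do the bookkeeping. The following is only a sketch, assuming the wrapped type implements ICloneable and IDisposable (as the Collator code above implies); PerThreadClone and its members are hypothetical names, and it has not been verified against ICU's threading behaviour.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: hands each calling thread its own clone of a prototype.
public sealed class PerThreadClone<T> : IDisposable
    where T : class, ICloneable, IDisposable
{
    private readonly T prototype;
    private readonly object cloneLock = new object();
    private readonly ThreadLocal<T> perThread;

    public PerThreadClone(T prototype)
    {
        this.prototype = prototype ?? throw new ArgumentNullException(nameof(prototype));
        // trackAllValues: true makes every created clone reachable through
        // perThread.Values, so all of them can be disposed later.
        this.perThread = new ThreadLocal<T>(CloneUnderLock, trackAllValues: true);
    }

    // Runs on the thread that first reads Current; cloning is serialized
    // because the prototype itself is not assumed to be thread safe.
    private T CloneUnderLock()
    {
        lock (cloneLock)
        {
            return (T)prototype.Clone();
        }
    }

    // The clone that belongs to the calling thread (created on first use).
    public T Current => perThread.Value;

    public void Dispose()
    {
        foreach (var clone in perThread.Values)
            clone.Dispose();
        perThread.Dispose();
        prototype.Dispose();
    }
}
```

A ThreadSafeCollator built on this would forward each member to Current, for example Compare(source, target) would call holder.Current.Compare(source, target). One trade-off: clones created for long-lived threads, such as thread-pool threads, stay alive until Dispose is called, and property setters would only affect the calling thread's clone.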

Additional context

First of all, I am wondering if you have any other suggestion that might work.

Secondly, would you consider providing something like this? My thought is to rename the existing collator types, mark them internal, and provide facade classes with exactly the same interfaces and documentation comments as the originals. Thread safe could just be an option that is enabled with a constructor or creation method parameter. The constructor would load either the thread safe or non safe instance internally.

private readonly Collator wrappedCollator;

public RuleBasedCollator(string rules,
    NormalizationMode normalizationMode,
    CollationStrength collationStrength,
    bool threadSafe)
{
    if (threadSafe)
        this.wrappedCollator = new ThreadSafeCollator(new RuleBasedCollatorImpl(rules, normalizationMode, collationStrength));
    else
        this.wrappedCollator = new RuleBasedCollatorImpl(rules, normalizationMode, collationStrength);
}

public override SortKey GetSortKey(string source)
{
      return this.wrappedCollator.GetSortKey(source);
}

Any plans to support the new Normalizer2/unorm2.h API?

I am porting over parts of Lucene that utilize ICU4J, and noticed that while Collator seems mostly complete, we are missing the new Normalizer2 API, which maps to the unorm2.h API in C.

In fact, I don't see what advantage the Normalizer in this package has over string.Normalize(NormalizationForm) and string.IsNormalized(NormalizationForm) in .NET (which seems to make it rather pointless).

Specifically, what I am interested in are:

  1. NFKC_Casefold
  2. spanQuickCheckYes
  3. FilteredNormalizer2
  4. .nrm Binary Data File Support

Do you have any plans on supporting this? If not, what are your guidelines for contributing?
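
For reference, the built-in .NET normalization mentioned above looks like this. It is a minimal sketch using only string.Normalize and string.IsNormalized from the base class library; BuiltInNormalization, ToNfc and IsNfc are hypothetical helper names. It covers the standard forms only, not NFKC_Casefold or the other items in the list.

```csharp
using System;
using System.Text;

// Hypothetical helper around the built-in .NET normalization APIs.
public static class BuiltInNormalization
{
    // Composes the string into Unicode Normalization Form C.
    public static string ToNfc(string s) => s.Normalize(NormalizationForm.FormC);

    // Reports whether the string is already in Normalization Form C.
    public static bool IsNfc(string s) => s.IsNormalized(NormalizationForm.FormC);
}
```

For example, BuiltInNormalization.ToNfc("XA\u0308bc") composes A followed by U+0308 into the single character Ä (U+00C4).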

Migrate code to .NET Core

A good enhancement would be if this code compiled on .NET Core so the library would run cross-platform.

I've already done work to migrate the code to run on .NET Standard 1.5, but I wanted your feedback and your criteria, in case your team wants this feature in your repository.

Allow setting up custom data with udata_setAppData

Is your feature request related to a problem? Please describe.

I want to replace my existing codebase using some earlier icu.net code with the latest nuget package. I need to load a custom *.dat file to memory for a normalizer to use. The existing code is:

 var data = new byte[] { /* utr30.dat */ };
 var unmanagedPointer = Marshal.AllocHGlobal(data.Length);
 Marshal.Copy(data, 0, unmanagedPointer, data.Length);

 ErrorCode status2;
 NativeMethods.udata_setAppData("utr30", unmanagedPointer, out status2);
 ExceptionFromErrorCode.ThrowIfError(status2);

Describe the solution you'd like

Create a .NET wrapper that would allow setting custom data.

Describe alternatives you've considered

Copy ThrowIfError from this repo and reproduce NativeMethods.udata_setAppData in my project in order for it to work. Ugly.

Support of regular expressions

Since ICU provides very good support for Unicode regular expressions, it would be nice if icu-dotnet supported them.

I did an experimental implementation in my fork. It only supports a few functions. How do people feel?

How to use it?

I am a C# developer and I would like to convert Hebrew, Cyrillic, or Pinyin characters to Latin characters. How can I do that? Are there any tutorials?

System.EntryPointNotFoundException when using icu.net.56.0.2

All of the ubrk_* exported functions are missing from icuuc56.dll.

Repro Steps

  1. Create a console application
  2. Add NuGet package icu.net 56.0.2
  3. Add code below.
  4. Run program
public class Program
{
    public static void Main(string[] args)
    {
        Console.WriteLine($"ICU Version: {Wrapper.IcuVersion}");
        var locale = new Locale("en-US");
        var text = "Hello world";
        var terms = BreakIterator.Split(BreakIterator.UBreakIteratorType.WORD, locale, text).ToArray();
        foreach (var term in terms) {
            Console.WriteLine(term); 
        }
        Console.WriteLine("Press Enter to EXIT...");
        Console.ReadLine();
    }
}

Expected

Console output below

ICU Version: 56.1
Hello
world

Actual

An unhandled exception of type 'System.EntryPointNotFoundException' occurred in icu.net.dll

Additional information: Unable to find an entry point named 'ubrk_open_56' in DLL 'icuuc56.dll'.

BreakIterator.GetBoundaries is exponentially slow depending on the size of the source text

Describe the bug

BreakIterator.GetBoundaries is exponentially slow depending on the size of the source text. In other words, the larger the size of the text parameter string is, the slower the function is, and the curve is not linear.

To Reproduce

string content = "... some large text, about 100KB ... ";
BreakIterator.GetBoundaries(BreakIterator.UBreakIteratorType.WORD, new Locale("eng"), content, false); // Takes about 10 secs.

Expected behavior

The BreakIterator.GetBoundaries to finish within milliseconds.
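
Whether the growth is quadratic or truly exponential can be checked by timing the call on inputs of doubling size: a time ratio near 2 per doubling suggests linear behaviour, and a ratio near 4 suggests quadratic behaviour. The following is a minimal sketch of such a probe; ScalingProbe and MeasureMs are hypothetical names, and the action passed in is whatever call is being measured.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical timing helper for checking how run time grows with input size.
public static class ScalingProbe
{
    // Builds one input per entry in repeats by repeating unit that many times,
    // runs action on it, and returns the elapsed milliseconds for each run.
    public static double[] MeasureMs(Action<string> action, string unit, int[] repeats)
    {
        var results = new double[repeats.Length];
        for (int i = 0; i < repeats.Length; i++)
        {
            string input = string.Concat(Enumerable.Repeat(unit, repeats[i]));
            var stopwatch = Stopwatch.StartNew();
            action(input);
            stopwatch.Stop();
            results[i] = stopwatch.Elapsed.TotalMilliseconds;
        }
        return results;
    }
}
```

To apply it here, pass a lambda that wraps the GetBoundaries call from the reproduction above and fully consumes its result, with repeat counts such as 1000, 2000 and 4000. Single runs are noisy, so repeating each measurement a few times gives a more reliable picture.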

Environment

  • OS: Windows 10
  • icu.net 2.6.0
  • .NET Framework 4.7

use with manual build of icu?

Is it possible to use this with a manually built (self-built) set of ICU DLLs?
I'm using the latest ICU build, 59.1, built with VS 2017.

Thanks.

Ability to convert English number to Chinese and vice-versa by adding the com.ibm.icu.text.NumberFormat class

A solution to convert English numbers to Chinese numbers using ICU4J was provided here: https://stackoverflow.com/a/31532069

I would like to be able to do the same using .NET but I could not find the com.ibm.icu.text.NumberFormat class in your icu.net wrapper.

Note that NumberFormatInfo.NativeDigits is not able to convert English numbers to Chinese because it is more complex than simple digit substitution.
