ntedgi / cld3-kotlin

Bindings to Google's Compact Language Detector 3 for JVM-based languages

License: MIT License

C++ 26.50% Kotlin 69.74% C 3.76%
cld3 langugage-recognition kotlin nlp machine-learning

cld3-kotlin's Introduction

cld3-kotlin

WIP - Kotlin CLD3 - Google's Compact Language Detector 3

A bridge from C++ to Kotlin using the Java Abstracted Foreign Function Layer.

Operating System Support:

Job    OS        State           Shared Objects
47.1   macOS     passed          dylib
47.2   Linux     passed          so
-      Windows   not supported   -
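
The support table above can be read as a small lookup from operating system to shared-object extension. The following is a minimal sketch, not the project's actual loader: `sharedObjectExtension` is a hypothetical helper, and matching on the JVM's `os.name` property in this way is an assumption.

```kotlin
// Hypothetical helper: maps an OS name (as reported by the JVM's "os.name"
// system property) to the shared-object extension listed in the table.
// Returns null for platforms without a prebuilt library, such as Windows.
fun sharedObjectExtension(osName: String): String? {
    val name = osName.lowercase()
    return when {
        name.contains("mac") || name.contains("darwin") -> "dylib"
        name.contains("linux") -> "so"
        else -> null
    }
}
```

A caller could pass `System.getProperty("os.name")` and fail early with a clear message when the result is null, rather than waiting for an `UnsatisfiedLinkError`.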

Usage Examples:

Add the Maven dependencies:

<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>
...
<dependency>
    <groupId>com.github.ntedgi</groupId>
    <artifactId>cld3-kotlin</artifactId>
    <version>1.0.2</version>
</dependency>
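
For Gradle builds, the same coordinates can be declared in the Kotlin DSL. This is a sketch that assumes a `build.gradle.kts` file; the repository URL, group, artifact, and version are copied from the Maven snippet above, and nothing else about the build is implied.

```kotlin
// build.gradle.kts (assumed build setup; coordinates taken from the Maven snippet)
repositories {
    mavenCentral()
    // JitPack serves the artifact built from the GitHub repository
    maven { url = uri("https://jitpack.io") }
}

dependencies {
    implementation("com.github.ntedgi:cld3-kotlin:1.0.2")
}
```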

Download the shared objects for your OS and add them under src/lib/(os-name).

// Example 1: detect the single most likely language
val ld = LangDetect()
val englishText = "This piece of text is in English"
val result = ld.detect(englishText)
assert(result.language == "English")
assert(result.isReliable)
assert(result.proportion == 1f)

// Example 2 (separate snippet): find the top N most frequent languages in mixed text
val ld = LangDetect()
val englishBulgarianText = "This piece of text is in English Този текст е на Български"
val results = ld.findTopNMostFreqLangs(englishBulgarianText, 3)
val languages = results.map { it.language }
assert(languages.size == 3)
assert(languages.contains("English"))
assert(languages.contains("Bulgarian"))
assert(languages.contains("UNKNOWN"))

The Bridge Interface Implementation:

from (C++)

std::vector<Result> FindTopNMostFreqLangs(const string &text, int num_langs);
Result FindLanguage(const string &text);

to (Kotlin)

fun findTopNMostFreqLangs(text: String, n: Int): List<LangDetectResponse> 
fun detect(text: String): LangDetectResponse 
data class LangDetectResponse(
    val probability: Float,
    val proportion: Float,
    val isReliable: Boolean,
    val language: String
)
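
Because `findTopNMostFreqLangs` returns a list of `LangDetectResponse` values, callers may want to post-process it. The sketch below shows one possible approach, assuming the fields behave as their names suggest; `reliableLanguages` is a hypothetical helper that is not part of the library.

```kotlin
// Redeclared here so the sketch is self-contained; the fields match the
// data class shown above.
data class LangDetectResponse(
    val probability: Float,
    val proportion: Float,
    val isReliable: Boolean,
    val language: String
)

// Hypothetical helper: keeps only reliable results and returns their
// language names, ordered by the share of the text they cover (largest first).
fun reliableLanguages(results: List<LangDetectResponse>): List<String> =
    results
        .filter { it.isReliable }
        .sortedByDescending { it.proportion }
        .map { it.language }
```

Filtering on `isReliable` first also drops entries the detector marked as unreliable, which may include the "UNKNOWN" placeholder seen in the usage example.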

if (this.repo.isAwesome || this.repo.isHelpful) {
  Star(this.repo);
}

cld3-kotlin's People

Contributors

dependabot[bot], ntedgi

cld3-kotlin's Issues

Create shared object for Windows

Compile cld3 on Windows and export:

  • libc++.dll
  • libnative.dll
  • libprotobuf_lite.dll

Windows is currently not supported; loading the library fails with this exception:

java.lang.UnsatisfiedLinkError : unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
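
The first four bytes in that message (0x7F 0x45 0x4C 0x46) are the ELF magic number, which suggests that a shared object built for Linux was handed to the loader on another platform. A quick way to check a library file is to inspect its leading bytes. This is a minimal diagnostic sketch; `binaryFormat` is a hypothetical helper and covers only a few common magic numbers.

```kotlin
// Hypothetical diagnostic helper: names the binary format suggested by the
// leading bytes of a file. Only a few common magic numbers are recognized.
fun binaryFormat(header: ByteArray): String {
    // True when the header begins with the given unsigned byte values.
    fun startsWith(vararg magic: Int): Boolean =
        header.size >= magic.size &&
            magic.indices.all { (header[it].toInt() and 0xFF) == magic[it] }

    return when {
        startsWith(0x7F, 0x45, 0x4C, 0x46) -> "ELF"    // Linux .so
        startsWith(0xCF, 0xFA, 0xED, 0xFE) -> "Mach-O" // 64-bit macOS .dylib
        startsWith(0x4D, 0x5A) -> "PE"                 // Windows .dll ("MZ")
        else -> "unknown"
    }
}
```

The first eight bytes of a file can be read with `java.io.File(path).inputStream().use { it.readNBytes(8) }` on Java 11 or later and passed to this function.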

Create shared object for OSX

Compile cld3 on OSX and export:

  • libc++.dylib
  • libnative.dylib
  • libprotobuf_lite.dylib

OSX is currently not supported; loading the library fails with this exception:

java.lang.UnsatisfiedLinkError : unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00

Create an enum class for languages instead of the native response

  const std::vector<std::pair<std::string, std::string>> gold_lang_text = {
      {"af", NNetLangIdTestData::kTestStrAF},
      {"ar", NNetLangIdTestData::kTestStrAR},
      {"az", NNetLangIdTestData::kTestStrAZ},
      {"be", NNetLangIdTestData::kTestStrBE},
      {"bg", NNetLangIdTestData::kTestStrBG},
      {"bn", NNetLangIdTestData::kTestStrBN},
      {"bs", NNetLangIdTestData::kTestStrBS},
      {"ca", NNetLangIdTestData::kTestStrCA},
      {"ceb", NNetLangIdTestData::kTestStrCEB},
      {"cs", NNetLangIdTestData::kTestStrCS},
      {"cy", NNetLangIdTestData::kTestStrCY},
      {"da", NNetLangIdTestData::kTestStrDA},
      {"de", NNetLangIdTestData::kTestStrDE},
      {"el", NNetLangIdTestData::kTestStrEL},
      {"en", NNetLangIdTestData::kTestStrEN},
      {"eo", NNetLangIdTestData::kTestStrEO},
      {"es", NNetLangIdTestData::kTestStrES},
      {"et", NNetLangIdTestData::kTestStrET},
      {"eu", NNetLangIdTestData::kTestStrEU},
      {"fa", NNetLangIdTestData::kTestStrFA},
      {"fi", NNetLangIdTestData::kTestStrFI},
      {"fil", NNetLangIdTestData::kTestStrFIL},
      {"fr", NNetLangIdTestData::kTestStrFR},
      {"ga", NNetLangIdTestData::kTestStrGA},
      {"gl", NNetLangIdTestData::kTestStrGL},
      {"gu", NNetLangIdTestData::kTestStrGU},
      {"ha", NNetLangIdTestData::kTestStrHA},
      {"hi", NNetLangIdTestData::kTestStrHI},
      {"hmn", NNetLangIdTestData::kTestStrHMN},
      {"hr", NNetLangIdTestData::kTestStrHR},
      {"ht", NNetLangIdTestData::kTestStrHT},
      {"hu", NNetLangIdTestData::kTestStrHU},
      {"hy", NNetLangIdTestData::kTestStrHY},
      {"id", NNetLangIdTestData::kTestStrID},
      {"ig", NNetLangIdTestData::kTestStrIG},
      {"is", NNetLangIdTestData::kTestStrIS},
      {"it", NNetLangIdTestData::kTestStrIT},
      {"iw", NNetLangIdTestData::kTestStrIW},
      {"ja", NNetLangIdTestData::kTestStrJA},
      {"jv", NNetLangIdTestData::kTestStrJV},
      {"ka", NNetLangIdTestData::kTestStrKA},
      {"kk", NNetLangIdTestData::kTestStrKK},
      {"km", NNetLangIdTestData::kTestStrKM},
      {"kn", NNetLangIdTestData::kTestStrKN},
      {"ko", NNetLangIdTestData::kTestStrKO},
      {"la", NNetLangIdTestData::kTestStrLA},
      {"lo", NNetLangIdTestData::kTestStrLO},
      {"lt", NNetLangIdTestData::kTestStrLT},
      {"lv", NNetLangIdTestData::kTestStrLV},
      {"mg", NNetLangIdTestData::kTestStrMG},
      {"mi", NNetLangIdTestData::kTestStrMI},
      {"mk", NNetLangIdTestData::kTestStrMK},
      {"ml", NNetLangIdTestData::kTestStrML},
      {"mn", NNetLangIdTestData::kTestStrMN},
      {"mr", NNetLangIdTestData::kTestStrMR},
      {"ms", NNetLangIdTestData::kTestStrMS},
      {"mt", NNetLangIdTestData::kTestStrMT},
      {"my", NNetLangIdTestData::kTestStrMY},
      {"ne", NNetLangIdTestData::kTestStrNE},
      {"nl", NNetLangIdTestData::kTestStrNL},
      {"no", NNetLangIdTestData::kTestStrNO},
      {"ny", NNetLangIdTestData::kTestStrNY},
      {"pa", NNetLangIdTestData::kTestStrPA},
      {"pl", NNetLangIdTestData::kTestStrPL},
      {"pt", NNetLangIdTestData::kTestStrPT},
      {"ro", NNetLangIdTestData::kTestStrRO},
      {"ru", NNetLangIdTestData::kTestStrRU},
      {"si", NNetLangIdTestData::kTestStrSI},
      {"sk", NNetLangIdTestData::kTestStrSK},
      {"sl", NNetLangIdTestData::kTestStrSL},
      {"so", NNetLangIdTestData::kTestStrSO},
      {"sq", NNetLangIdTestData::kTestStrSQ},
      {"sr", NNetLangIdTestData::kTestStrSR},
      {"st", NNetLangIdTestData::kTestStrST},
      {"su", NNetLangIdTestData::kTestStrSU},
      {"sv", NNetLangIdTestData::kTestStrSV},
      {"sw", NNetLangIdTestData::kTestStrSW},
      {"ta", NNetLangIdTestData::kTestStrTA},
      {"te", NNetLangIdTestData::kTestStrTE},
      {"tg", NNetLangIdTestData::kTestStrTG},
      {"th", NNetLangIdTestData::kTestStrTH},
      {"tr", NNetLangIdTestData::kTestStrTR},
      {"uk", NNetLangIdTestData::kTestStrUK},
      {"ur", NNetLangIdTestData::kTestStrUR},
      {"uz", NNetLangIdTestData::kTestStrUZ},
      {"vi", NNetLangIdTestData::kTestStrVI},
      {"yi", NNetLangIdTestData::kTestStrYI},
      {"yo", NNetLangIdTestData::kTestStrYO},
      {"zh", NNetLangIdTestData::kTestStrZH},
      {"zu", NNetLangIdTestData::kTestStrZU}};
const char *const TaskContextParams::kLanguageNames[] = {
    "eo", "co", "eu", "ta", "de", "mt", "ps", "te", "su", "uz", "zh-Latn", "ne",
    "nl", "sw", "sq", "hmn", "ja", "no", "mn", "so", "ko", "kk", "sl", "ig",
    "mr", "th", "zu", "ml", "hr", "bs", "lo", "sd", "cy", "hy", "uk", "pt",
    "lv", "iw", "cs", "vi", "jv", "be", "km", "mk", "tr", "fy", "am", "zh",
    "da", "sv", "fi", "ht", "af", "la", "id", "fil", "sm", "ca", "el", "ka",
    "sr", "it", "sk", "ru", "ru-Latn", "bg", "ny", "fa", "haw", "gl", "et",
    "ms", "gd", "bg-Latn", "ha", "is", "ur", "mi", "hi", "bn", "hi-Latn", "fr",
    "yi", "hu", "xh", "my", "tg", "ro", "ar", "lb", "el-Latn", "st", "ceb",
    "kn", "az", "si", "ky", "mg", "en", "gu", "es", "pl", "ja-Latn", "ga", "lt",
    "sn", "yo", "pa", "ku",

    // last element must be nullptr
    nullptr,
};
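
One possible shape for such an enum is sketched below. It is not the project's implementation: `Language`, `fromCode`, and the display names are hypothetical, only a handful of the codes from `kLanguageNames` are listed, and the use of "und" as the code for an undetermined language is an assumption.

```kotlin
// Hypothetical enum for the issue above. Only a few of the codes from
// kLanguageNames are listed; the display names are illustrative.
enum class Language(val code: String, val displayName: String) {
    ENGLISH("en", "English"),
    BULGARIAN("bg", "Bulgarian"),
    GERMAN("de", "German"),
    FRENCH("fr", "French"),
    UNKNOWN("und", "UNKNOWN"); // "und" is an assumed code for "undetermined"

    companion object {
        // Lookup table built once from the enum entries.
        private val byCode = values().associateBy { it.code }

        // Falls back to UNKNOWN for codes that are not in the enum.
        fun fromCode(code: String): Language = byCode[code] ?: UNKNOWN
    }
}
```

With a mapping like this, `LangDetectResponse.language` could hold a `Language` value instead of a raw string, so callers compare against `Language.ENGLISH` rather than the literal "English".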
