Comments (4)
I think the current tokenization process should also work for Chinese. It's a simple regular expression that breaks text into words. After that, each word is stemmed. Stemming applies to Indo-European languages, not to Chinese, so the stemmer will simply leave Chinese words unchanged.
from tntsearch.
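To make the "simple regular expression" point concrete, here is a minimal sketch in Python. The pattern `\w+` is an assumption chosen for illustration and is not necessarily the expression TNTSearch uses. It shows what such a splitter does with text that has no spaces between words: the whole run of Chinese characters comes back as a single token.

```python
import re

def tokenize(text):
    # Hypothetical word splitter: lowercase the text, then keep runs of
    # Unicode word characters. Whitespace and punctuation act as separators.
    return re.findall(r"\w+", text.lower())

english = tokenize("Full text search engine")
chinese = tokenize("我喜欢全文搜索")  # "I like full-text search", written without spaces

print(english)  # four separate words
print(chinese)  # one token covering the whole sentence
```

With a splitter like this, a query only matches Chinese text when it equals an entire unbroken run, which may explain why some searches return results and others do not.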
I would say that it even works for Chinese. The stemming process would simply do nothing, since stemming does not apply to Chinese.
If you take a look at the demo page, try to search for: 指原の乱
I don't know what this means or if this is even Chinese, but it gives me some results.
Regarding your second question, I am not sure what you meant. If you have some text in your database then yes, it can be searched. Where else could the text be if it's not in the db?
from tntsearch.
After posting this issue, I read the project's code.
I think it may need a Chinese tokenizer/analyzer first, and then a Chinese stemmer. If I have time, maybe I can open a pull request. o̖⸜((̵̵́ ̆͒͟˚̩̭ ̆͒)̵̵̀)⸝o̗
As for the second question, I found the answer after reading the code.
I typed this on an iPad, which is not convenient.
Thanks for your reply.
from tntsearch.
Chinese is a bit complex and the test results are not good. I think having a Chinese tokenizer/analyzer would be better.
from tntsearch.
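For anyone wondering what a Chinese-aware tokenizer could look like, below is a rough sketch in Python using overlapping character bigrams. This is a common fallback when no dictionary-based word segmenter is available; it is not what TNTSearch ships, and it is not necessarily what the commenters had in mind. The function names and the restriction to the basic CJK Unified Ideographs block (`\u4e00-\u9fff`) are assumptions made to keep the example short.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is treated as Chinese.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")
# A token candidate is either a run of CJK characters or a run of other word characters.
TOKEN = re.compile(r"[\u4e00-\u9fff]+|[^\W\u4e00-\u9fff]+")

def cjk_bigrams(run):
    # Emit overlapping two-character tokens; a single character stays as-is.
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]

def tokenize(text):
    tokens = []
    for match in TOKEN.finditer(text.lower()):
        chunk = match.group()
        if CJK_RUN.fullmatch(chunk):
            tokens.extend(cjk_bigrams(chunk))
        else:
            tokens.append(chunk)
    return tokens

doc_tokens = tokenize("全文搜索 TNTSearch")
query_tokens = tokenize("搜索")
print(doc_tokens)
print(query_tokens)
```

Because the document and the query are split the same way, a two-character query such as 搜索 now shares a token with the longer text. The trade-off is a larger index and some false matches across word boundaries, so a real segmenter would likely give better results.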