Comments (3)
This change causes another issue:
reordering initRequestType() and set_sentence() resets lattice->theta_ to its default value, because set_sentence() calls clear().
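The interaction described above can be modeled in a few lines. This is a minimal sketch, not MeCab code: `ToyLattice`, `init_request_type`, and the value of `kDefaultTheta` are hypothetical stand-ins. The only behaviour taken from the comment is that set_sentence() calls clear(), and clear() resets theta to its default. The second function shows one possible ordering that avoids the reset. It is an assumption for illustration, not necessarily what the upstream fix does.

```cpp
#include <cassert>
#include <string>

// Hypothetical default; the real MeCab default value is not assumed here.
constexpr double kDefaultTheta = 0.75;

// Hypothetical stand-in for MeCab's LatticeImpl; not the real class.
struct ToyLattice {
  int request_type = 0;
  double theta = kDefaultTheta;
  std::string sentence;

  // Mirrors the described behaviour: clear() resets theta to its default.
  void clear() {
    theta = kDefaultTheta;
    sentence.clear();
  }

  void set_sentence(const char* s) {
    assert(s != nullptr);
    clear();  // wipes any theta that was set before this call
    sentence = s;
  }
};

// Hypothetical stand-in for TaggerImpl::initRequestType(): sets both fields at once.
void init_request_type(ToyLattice& lattice, int request_type, double theta) {
  lattice.request_type = request_type;
  lattice.theta = theta;
}

// Naive reordering: initialise first, then set the sentence.
double theta_after_naive_reorder(double wanted_theta) {
  ToyLattice lattice;
  init_request_type(lattice, 1, wanted_theta);
  lattice.set_sentence("test");
  return lattice.theta;  // already reset to kDefaultTheta by clear()
}

// One possible ordering (assumption): request type before set_sentence(),
// theta after it, so clear() can no longer discard the caller's theta.
double theta_after_split_init(double wanted_theta) {
  ToyLattice lattice;
  lattice.request_type = 1;
  lattice.set_sentence("test");
  lattice.theta = wanted_theta;
  return lattice.theta;
}
```

With the naive reorder, `theta_after_naive_reorder(0.5)` gives back the default rather than 0.5. Splitting the initialisation keeps the requested value.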
from mecab.
I've had a look at the source, and I think I've tracked this down to a memory
bug in mecab itself.
LatticeImpl::set_sentence uses has_request_type() to determine whether it
should allocate new memory for the sentence or just reuse the memory passed as
its `sentence` argument. However, the various TaggerImpl::parse* methods all
call lattice->set_sentence *before* they properly set the request type in the
lattice (via TaggerImpl::initRequestType()). This means that on each call to a
tagger parse method the lattice uses the previous call's request type. On the
first call to a tagger parse method the lattice uses whatever its request_type_
is initialised to.
The end result is that, when the tagger parse methods are called, the lattice
sometimes incorrectly reuses the memory it has been passed instead of allocating
new memory. The Python wrapper or the Python runtime may later reallocate
that memory for other uses, and it may be overwritten with new data. At that
point the nodes returned by parseToNode no longer point to the surface text of
the sentence.
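The consequence can be sketched with a toy lattice that either copies the sentence or borrows the caller's pointer. This is an illustration under stated assumptions. `ToyLattice`, `kAllocateSentence`, and `surface_after_buffer_reuse` are hypothetical names, not MeCab API. In the real bug the memory is freed and reused, which is undefined behaviour. The sketch instead overwrites a still-live buffer, so the effect stays observable and well-defined.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical flag standing in for the request-type bit that makes
// set_sentence copy the sentence.
constexpr int kAllocateSentence = 1 << 0;

// Hypothetical lattice: copies or borrows depending on the request type
// it holds when set_sentence() is called.
struct ToyLattice {
  int request_type = 0;
  std::string owned;               // used only when the sentence is copied
  const char* sentence = nullptr;  // what the nodes' surfaces would point into

  void set_sentence(const char* s) {
    assert(s != nullptr);
    if (request_type & kAllocateSentence) {
      owned = s;                   // private copy: unaffected by later writes
      sentence = owned.c_str();
    } else {
      sentence = s;                // borrowed: aliases the caller's buffer
    }
  }
};

// Returns what the lattice "sees" after the caller reuses its buffer.
std::string surface_after_buffer_reuse(int request_type) {
  char buffer[16];
  std::strcpy(buffer, "cat");
  ToyLattice lattice;
  lattice.request_type = request_type;
  lattice.set_sentence(buffer);
  std::strcpy(buffer, "dog");      // the caller (e.g. a runtime) reuses the memory
  return std::string(lattice.sentence);
}
```

When the sentence is borrowed, the lattice reports "dog", the new contents of the buffer. When it is copied, the lattice still reports "cat".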
The fix should be to call set_sentence after the request type has been set.
I've attached a patch against the 0.996 source download of mecab. It fixes the
behaviour described in this bug report.
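The ordering problem and the proposed reordering can be compared in a small model. This is a sketch, not MeCab's code. `ToyLattice`, `ToyTagger`, `kAllocateSentence`, and the initial request type of 0 are hypothetical. The model only assumes what the report states: set_sentence decides copy-vs-borrow from the request type the lattice holds at call time. It also leaves out theta_, which the reordering affects as noted in the first comment.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag standing in for the "copy the sentence" request-type bit.
constexpr int kAllocateSentence = 1 << 0;

// Hypothetical lattice: decides copy-vs-borrow from its current request type.
struct ToyLattice {
  int request_type = 0;  // assumed initial value
  bool copied = false;   // did the last set_sentence() allocate a copy?
  std::string owned;
  const char* sentence = nullptr;

  void set_sentence(const char* s) {
    assert(s != nullptr);
    copied = (request_type & kAllocateSentence) != 0;
    if (copied) {
      owned = s;
      sentence = owned.c_str();
    } else {
      sentence = s;
    }
  }
};

// Hypothetical tagger that wants every parse to copy the sentence.
struct ToyTagger {
  int request_type = kAllocateSentence;

  void init_request_type(ToyLattice& lattice) const {
    lattice.request_type = request_type;
  }

  // Order described in the report: set_sentence() sees whatever request
  // type the lattice held before this call.
  bool parse_buggy(ToyLattice& lattice, const char* s) const {
    lattice.set_sentence(s);
    init_request_type(lattice);
    return lattice.copied;
  }

  // Proposed order: set the request type first, then the sentence.
  bool parse_fixed(ToyLattice& lattice, const char* s) const {
    init_request_type(lattice);
    lattice.set_sentence(s);
    return lattice.copied;
  }
};
```

With the buggy order, the first parse borrows the caller's memory even though the tagger asked for a copy. Only later calls copy, because they inherit the previous call's request type. With the fixed order, every parse copies.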
Original comment by [email protected]
on 19 Mar 2013 at 3:46
Attachments:
Note this issue was fixed by #24 in 2016.
Related Issues (20)
- Problems when training
- When training, speed of reading corpus is very slow
- [mecab-dict-index] error
- Don't specify node-format option when using UniDic
- matrix right/left dimension checking is inconsistent (compiling user dictionary/assigning user dict costs)
- mecab-dict-gen crashes after a long time
- Memoly leak when use python-wrapper and input string is too long
- Installing mecab
- Meet a undefined reference to '__imp__ZN5MeCab12createTaggerEPKc' when running the example.cpp
- Mecab algorithm (Mecabアルゴリズム)
- Words do not get divided properly when small letters (捨て仮名) are included in word
- Tag repo please
- Support for Ruby2.7?
- Failure initializing Tagger has no error message
- The adjective inflected form 「正しく」 is treated as an adverb
- http://creativecommons.org/licenses/by-sa/3.0/
- Max Grouping Size off-by-one error
- “'gcc' failed with exit status 1” when trying to install Mecab with PyPy docker image
- WPATH_FORCE() not defined on windows when compiling with msvc.
- Output Format