Comments (6)
Oh I see, it's actually a bigger problem. You only uncompress one chunk at a time, so the contents of that buffer change depending on which documents you try to access.
In that case I think we can't cache/reuse the buffer inside the segment. Segments are intended to be heavily shared (you could have a hundred queries all hitting the same segment at one time), so I don't think it makes sense to share that buffer and coordinate access with a lock.
And that makes it seem like the current API isn't going to work very well with the stored docs compressed in chunks. I believe that was the reason we compressed docs individually in the past, even though it results in much lower compression. So, I think if you want to go with this storage format, you should also propose an API to access it efficiently.
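One possible shape for such an API, as a rough sketch only: keep the shared segment immutable and give every caller its own reader that owns the decompression buffer. The names `Segment`, `ChunkReader`, `NewChunkReader` and `Chunk` are made up for illustration and are not ice's API, and gzip from the standard library stands in for whatever codec the segment really uses.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// Segment is an immutable, shareable view of compressed stored-field chunks.
// Hypothetical type; it only stands in for the real segment.
type Segment struct {
	chunks [][]byte // each entry is one gzip-compressed chunk
}

// ChunkReader holds decompression state for one caller.
// It must not be shared between goroutines.
type ChunkReader struct {
	seg    *Segment
	loaded int    // index of the chunk currently in buf, -1 if none
	buf    []byte // uncompressed bytes of that chunk
}

// NewChunkReader returns a reader with its own private buffer.
func (s *Segment) NewChunkReader() *ChunkReader {
	return &ChunkReader{seg: s, loaded: -1}
}

// Chunk returns the uncompressed bytes of chunk i, decompressing only when
// i differs from the chunk this reader already holds. No bounds checking.
func (r *ChunkReader) Chunk(i int) ([]byte, error) {
	if i == r.loaded {
		return r.buf, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(r.seg.chunks[i]))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	data, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	r.buf, r.loaded = data, i
	return r.buf, nil
}

// compressChunk gzips b. It exists only to build sample data; writes to a
// bytes.Buffer cannot fail, so the errors are ignored here.
func compressChunk(b []byte) []byte {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	zw.Write(b)
	zw.Close()
	return out.Bytes()
}
```

With this shape a query would create one reader, load the documents it needs, and drop the reader afterwards. The segment itself is never mutated, so no lock is needed; the cost is that two queries touching the same chunk each decompress it.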
from bluge.
cc @hengfeiyang
from bluge.
@hengfeiyang I think the issue is that Segments are shared, so when you mutate storedFieldChunkUncompressed we have potential data races:
https://github.com/blugelabs/ice/blob/d830a812e60591ce0955fdeabd483f0ebf537ebd/read.go#L46-L47
If we must do it this way, you'll have to protect access to it with a mutex, like the fieldFSTs:
https://github.com/blugelabs/ice/blob/master/segment.go#L52-L53
Alternatively, I wonder: can't we arrange this so that we decompress once and then just reuse the result? I'm not sure exactly how that code would need to look to be safe from races, but it doesn't make sense that we'd ever intentionally decompress the same compressed bytes again, right?
from bluge.
Seems like there are 2 choices:
- uncompress at open: wasteful if we never match documents in this segment, but it avoids a lock.
- uncompress on first use: needs a lock, which is possibly too much overhead because we always need to load _id for matches.
You could also uncompress every time without saving the result, but I'm pretty sure that isn't a useful option.
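The two choices can be put side by side in a small sketch. The types `eagerSegment` and `lazySegment` are invented for illustration, gzip stands in for the real codec, and the lazy variant uses `sync.Once` as one way to express "lock on first use" (it still synchronizes internally, so it is not lock-free, just cheap after the first call).

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// eagerSegment decompresses at open. After that the data is read-only,
// so readers need no synchronization.
type eagerSegment struct {
	data []byte
}

func openEager(compressed []byte) (*eagerSegment, error) {
	data, err := gunzip(compressed)
	if err != nil {
		return nil, err
	}
	return &eagerSegment{data: data}, nil
}

func (s *eagerSegment) stored() []byte { return s.data }

// lazySegment defers the work until the first caller needs it.
type lazySegment struct {
	compressed []byte
	once       sync.Once
	data       []byte
	err        error
}

func openLazy(compressed []byte) *lazySegment {
	return &lazySegment{compressed: compressed}
}

// stored decompresses on the first call only; later calls reuse the result.
func (s *lazySegment) stored() ([]byte, error) {
	s.once.Do(func() {
		s.data, s.err = gunzip(s.compressed)
	})
	return s.data, s.err
}

// gunzip decompresses one gzip stream.
func gunzip(b []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes builds sample data; writes to a bytes.Buffer cannot fail.
func gzipBytes(b []byte) []byte {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	zw.Write(b)
	zw.Close()
	return out.Bytes()
}
```

Note that this sketch treats the stored fields as a single compressed blob. With chunked storage the lazy variant would need one such guard per chunk, which is where the memory question comes back in.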
from bluge.
I prototyped one idea here: blugelabs/ice#15 But I don't love it.
from bluge.
I prototyped one idea here: blugelabs/ice#15 But I don't love it.
I changed the Mutex to an RWMutex. The overhead should be slight, but it could cause problems when a great many segments are loaded, so we need a mechanism to release unused cache.
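A rough sketch of what that could look like, assuming a per-segment cache with a read-lock fast path and an explicit release hook. The names `segmentCache`, `get` and `release` are hypothetical, gzip stands in for the real codec, and when to call `release` (on a timer, on memory pressure, when the segment goes idle) is left open.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"sync"
)

// segmentCache is a hypothetical per-segment cache of uncompressed chunks.
type segmentCache struct {
	compressed [][]byte // immutable after construction

	m            sync.RWMutex
	uncompressed map[int][]byte // guarded by m
}

func newSegmentCache(compressed [][]byte) *segmentCache {
	return &segmentCache{compressed: compressed, uncompressed: map[int][]byte{}}
}

// get serves cached chunks under the read lock, so concurrent readers of an
// already decompressed chunk do not block each other.
func (c *segmentCache) get(i int) ([]byte, error) {
	c.m.RLock()
	data, ok := c.uncompressed[i]
	c.m.RUnlock()
	if ok {
		return data, nil
	}

	c.m.Lock()
	defer c.m.Unlock()
	// Re-check: another goroutine may have filled the entry in the meantime.
	if data, ok := c.uncompressed[i]; ok {
		return data, nil
	}
	data, err := gunzip(c.compressed[i])
	if err != nil {
		return nil, err
	}
	c.uncompressed[i] = data
	return data, nil
}

// release drops every cached chunk and reports how many were dropped.
// Slices already handed out stay valid; they are freed once unreferenced.
func (c *segmentCache) release() int {
	c.m.Lock()
	defer c.m.Unlock()
	n := len(c.uncompressed)
	c.uncompressed = map[int][]byte{}
	return n
}

// gunzip decompresses one gzip stream.
func gunzip(b []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes builds sample data; writes to a bytes.Buffer cannot fail.
func gzipBytes(b []byte) []byte {
	var out bytes.Buffer
	zw := gzip.NewWriter(&out)
	zw.Write(b)
	zw.Close()
	return out.Bytes()
}
```

After a `release`, the next `get` simply decompresses again, so the trade-off is repeated CPU work against memory held by idle segments.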
from bluge.