
ice v2 data race about bluge (open)

mschoch commented on August 18, 2024
ice v2 data race

Comments (6)

mschoch commented on August 18, 2024

Oh I see, it's actually a bigger problem. You only uncompress one chunk at a time, so the contents of that buffer change depending on which documents you try to access.

In that case I think we can't cache/reuse the buffer inside the segment. Segments are intended to be heavily shared (you could have a hundred queries all hitting the same segment at once), so I don't think it makes sense to share that buffer and coordinate access with a lock.

And that makes it seem like the current API isn't going to work very well with stored docs compressed in chunks. I believe that was the reason we compressed docs individually in the past, even though it results in a much worse compression ratio. So if you want to go with this storage format, I think you should also propose an API to access it efficiently.
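One shape such an API could take, sketched under assumptions: the segment stays immutable and freely shareable, and each query gets its own reader that owns the decompressed-chunk buffer, so no lock is needed. `segment`, `docReader`, the fixed-length docs, and the use of `compress/flate` as the codec are all hypothetical stand-ins, not ice's actual types or storage format.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// compress is a demo helper. compress/flate stands in for whatever codec
// the real segment format uses (an assumption, not ice's actual choice).
func compress(data []byte) []byte {
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.DefaultCompression)
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

// segment is a hypothetical immutable segment. Nothing in it is written
// after construction, so any number of queries can share it without locks.
type segment struct {
	chunks       [][]byte // compressed stored-field chunks
	docsPerChunk int
	docLen       int // fixed stored-doc length, to keep the demo short
}

// docReader is hypothetical per-query state: it owns the decompressed
// chunk, so that buffer is never shared between goroutines.
type docReader struct {
	seg            *segment
	curChunk       int    // chunk currently held in buf, or -1
	buf            []byte // decompressed bytes of curChunk
	decompressions int
}

func (s *segment) newDocReader() *docReader {
	return &docReader{seg: s, curChunk: -1}
}

// doc returns the stored bytes of docNum, decompressing its chunk only when
// it is not the chunk already held. The result aliases r.buf.
func (r *docReader) doc(docNum int) ([]byte, error) {
	chunk := docNum / r.seg.docsPerChunk
	if chunk != r.curChunk {
		fr := flate.NewReader(bytes.NewReader(r.seg.chunks[chunk]))
		data, err := io.ReadAll(fr)
		fr.Close()
		if err != nil {
			return nil, err
		}
		r.buf, r.curChunk = data, chunk
		r.decompressions++
	}
	off := (docNum % r.seg.docsPerChunk) * r.seg.docLen
	return r.buf[off : off+r.seg.docLen], nil
}

func main() {
	seg := &segment{
		chunks:       [][]byte{compress([]byte("doc0doc1")), compress([]byte("doc2doc3"))},
		docsPerChunk: 2,
		docLen:       4,
	}
	r := seg.newDocReader() // one reader per query, never shared
	for n := 0; n < 4; n++ {
		d, err := r.doc(n)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(d))
	}
	fmt.Println("decompressions:", r.decompressions) // 2: one per chunk
}
```

Reading matches in doc-number order touches each chunk once per query; jumping back and forth between chunks decompresses repeatedly, which is the trade-off for never sharing the buffer.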

mschoch commented on August 18, 2024

cc @hengfeiyang

mschoch commented on August 18, 2024

@hengfeiyang I think the issue is that Segments are shared, so when you mutate storedFieldChunkUncompressed we have potential data races:

https://github.com/blugelabs/ice/blob/d830a812e60591ce0955fdeabd483f0ebf537ebd/read.go#L46-L47

If we must do it this way, you'll have to protect access to it with a mutex, like the fieldFSTs:

https://github.com/blugelabs/ice/blob/master/segment.go#L52-L53
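For reference, a minimal sketch of the mutex option, assuming a hypothetical `segment` with a per-chunk cache guarded the same way as the linked `fieldFSTs` map. The names and the `compress/flate` codec are illustrative, not ice's code. It is race-free, but every call takes the one lock, which is the contention concern for heavily shared segments.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"sync"
)

// compress is a demo helper; flate is a stand-in codec (assumption).
func compress(data []byte) []byte {
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.DefaultCompression)
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

// segment is hypothetical. uncompressed is shared by every query that uses
// the segment, so m must guard it, like the mutex-guarded fieldFSTs cache.
type segment struct {
	chunks [][]byte // compressed stored-field chunks, read-only

	m              sync.Mutex     // protects the fields that follow
	uncompressed   map[int][]byte // chunk index -> decompressed bytes
	decompressions int
}

// chunk returns decompressed chunk i, decompressing it at most once.
// Callers must treat the returned slice as read-only: it is shared.
// Every call takes the lock, so concurrent queries serialize here.
func (s *segment) chunk(i int) ([]byte, error) {
	s.m.Lock()
	defer s.m.Unlock()
	if data, ok := s.uncompressed[i]; ok {
		return data, nil
	}
	fr := flate.NewReader(bytes.NewReader(s.chunks[i]))
	defer fr.Close()
	data, err := io.ReadAll(fr)
	if err != nil {
		return nil, err
	}
	if s.uncompressed == nil {
		s.uncompressed = make(map[int][]byte)
	}
	s.uncompressed[i] = data
	s.decompressions++
	return data, nil
}

func main() {
	s := &segment{chunks: [][]byte{compress([]byte("stored fields"))}}
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := s.chunk(0); err != nil {
				panic(err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("decompressions:", s.decompressions) // 1
}
```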

Alternatively, I wonder: can't we arrange this so that we decompress once and then just reuse the result? I'm not sure exactly what race-free code for that would look like, but it doesn't make sense that we'd ever intentionally decompress the same compressed bytes again, right?

mschoch commented on August 18, 2024

It seems like there are two choices:

  1. uncompress at open (wasteful if no documents in this segment ever match), but it avoids a lock.
  2. uncompress on first use, which needs a lock (possibly too much overhead, because we always need to load _id for matches).

You could also uncompress every time without caching the result, but I'm pretty sure that isn't a useful option.
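Option 1 can be sketched like this, assuming a hypothetical `openSegment` that eagerly decompresses every chunk (`compress/flate` again stands in for the real codec). No lock is needed because nothing is written after open, provided the segment is published to other goroutines through normal synchronization (a channel send, a mutex, and so on).

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// compress is a demo helper; flate is a stand-in codec (assumption).
func compress(data []byte) []byte {
	var buf bytes.Buffer
	w, _ := flate.NewWriter(&buf, flate.DefaultCompression)
	w.Write(data)
	w.Close()
	return buf.Bytes()
}

// segment is hypothetical. Its chunks are fully decompressed in
// openSegment and never written again, so reads need no lock.
type segment struct {
	chunks [][]byte // decompressed stored-field chunks
}

// openSegment decompresses every chunk up front, whether or not any
// document in it will ever be loaded.
func openSegment(compressed [][]byte) (*segment, error) {
	s := &segment{chunks: make([][]byte, len(compressed))}
	for i, c := range compressed {
		fr := flate.NewReader(bytes.NewReader(c))
		data, err := io.ReadAll(fr)
		fr.Close()
		if err != nil {
			return nil, fmt.Errorf("chunk %d: %w", i, err)
		}
		s.chunks[i] = data
	}
	return s, nil
}

// chunk is a plain read; safe from any number of goroutines.
func (s *segment) chunk(i int) []byte { return s.chunks[i] }

func main() {
	s, err := openSegment([][]byte{compress([]byte("first")), compress([]byte("second"))})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(s.chunk(1))) // second
}
```

The cost is paid up front in CPU and memory for every chunk, even in segments where no document ever matches.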

mschoch commented on August 18, 2024

I prototyped one idea here: blugelabs/ice#15, but I don't love it.

hengfeiyang commented on August 18, 2024

> I prototyped one idea here: blugelabs/ice#15 But I don't love it.

I changed the Mutex to an RWMutex. The overhead should be slight, but it could cause problems when many, many segments are loaded; we need a mechanism to release unused cache.

blugelabs/ice#16