Coder Social home page Coder Social logo

Comments (7)

amitlan avatar amitlan commented on June 19, 2024

Thanks for the report!

One solution might be to have a dedicated memory context for AsyncSource, such that, whenever it needs to allocate memory for its buffer, it allocates from the dedicated context and frees the old memory previously allocated in the same context. AsyncSourceClose() should delete the context at the end, or if an error occurs.

Could you review the PR #28?

from pg_bulkload.

MasahikoSawada avatar MasahikoSawada commented on June 19, 2024

Thank you for the reply!

One solution might be to have a dedicated memory context for AsyncSource, such that, whenever it needs to allocate memory for its buffer, it allocates from the dedicated context and frees the old memory previously allocated in the same context. AsyncSourceClose() should delete the context at the end, or if an error occurs.

This is the same as what I imagined. PR #28 looks good but because I'm not familiar with pg_bulkload code I'm not sure whether current approach is safe. Is it possible that the memory context is changed unexpectedly from AsyncSource to another memory context like TupleChecker by main thread after switched to AsyncSource?

from pg_bulkload.

amitlan avatar amitlan commented on June 19, 2024

To answer your question - you'll see that we switch back to the AsyncSource context every time we need to palloc for AsyncSource's needs. We also pfree the previously allocated memory while we are in the same context. When error occurs, like in case of this issue, we simply delete the context using MemoryContextDelete(), which we know is our own.

Also, as I commented on PR #28, the problem is not caused by interaction of different threads. It's rather the lack of coordination between different modules of pg_bulkload (such as parser, reader, source). Currently, we do have a dedicated context for parallel writer named "ParallelWriter", which I think somebody designed to take care of such hazard. With this PR, we do that for AsyncSource.

I noticed however that, context is a member of base class Writer, instead of ParallelWriter. Maybe, we should do something similar in case of AsyncSource. That is, make the newly introduced context a member of base class Source, instead of AsyncSource. But then again, only AsyncSource seems to need to do palloc, not other sources like FileSource and RemoteSource.

from pg_bulkload.

MasahikoSawada avatar MasahikoSawada commented on June 19, 2024

I've now understood what happened exactly. You're right. Reading source file is done by child thread but extending read buffer is done by main thread. The main thread extends the read buffer after switched memory context to "ParallelWriter", so it's cause of SEGV.

I noticed however that, context is a member of base class Writer, instead of ParallelWriter. Maybe, we should do something similar in case of AsyncSource. That is, make the newly introduced context a member of base class Source, instead of AsyncSource. But then again, only AsyncSource seems to need to do palloc, not other sources like FileSource and RemoteSource.

ISTM that AsyncSource can have context as current your PR does, or can we use repalloc instead?

from pg_bulkload.

amitlan avatar amitlan commented on June 19, 2024

ISTM that AsyncSource can have context as current your PR does, or can we use repalloc instead?

I assume you mean, repalloc(self->buffer, newsize) within the proposed dedicated context, instead of the following stanza:

        newbuf = palloc0(newsize);
        /* copy it in new buffer from old buffer */
        pthread_mutex_lock(&self->lock);
        if (self->begin > self->end)
        {
            memcpy(newbuf, self->buffer + self->begin,
                   self->size - self->begin);
            memcpy(newbuf + self->size - self->begin, self->buffer, self->end);
            self->end = self->size - self->begin + self->end;
        }
        else
        {
            memcpy(newbuf, self->buffer + self->begin, self->end - self->begin);
            self->end = self->end - self->begin;
        }
        pfree(self->buffer);
        self->buffer = newbuf;
        self->size = newsize;
        self->begin = 0;
        pthread_mutex_unlock(&self->lock);

Due to involvement of the locking between threads here, I'm not immediately sure if that's going to be straightforward. It could be made to work with enough attention though.

from pg_bulkload.

MasahikoSawada avatar MasahikoSawada commented on June 19, 2024

I assume you mean, repalloc(self->buffer, newsize) within the proposed dedicated context, instead of the following stanza:

Yes. I guessed that we can use repalloc(self->buffer, newsize) instead of switching memory context and then executing palloc0, memcpy and pfree. But ISTM that your original PR approach is safer.

from pg_bulkload.

amitlan avatar amitlan commented on June 19, 2024

OK, merged the PR for now. Thanks for the review!

from pg_bulkload.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.