Comments (7)
Thanks for the report!
One solution might be to have a dedicated memory context for AsyncSource
, such that, whenever it needs to allocate memory for its buffer, it allocates from the dedicated context and frees the old memory previously allocated in the same context. AsyncSourceClose()
should delete the context at the end, or if an error occurs.
Could you review the PR #28?
from pg_bulkload.
Thank you for the reply!
One solution might be to have a dedicated memory context for AsyncSource, such that, whenever it needs to allocate memory for its buffer, it allocates from the dedicated context and frees the old memory previously allocated in the same context. AsyncSourceClose() should delete the context at the end, or if an error occurs.
This is the same as what I imagined. PR #28 looks good but because I'm not familiar with pg_bulkload code I'm not sure whether current approach is safe. Is it possible that the memory context is changed unexpectedly from AsyncSource to another memory context like TupleChecker by main thread after switched to AsyncSource?
from pg_bulkload.
To answer your question - you'll see that we switch back to the AsyncSource
context every time we need to palloc
for AsyncSource's needs. We also pfree
the previously allocated memory while we are in the same context. When error occurs, like in case of this issue, we simply delete the context using MemoryContextDelete()
, which we know is our own.
Also, as I commented on PR #28, the problem is not caused by interaction of different threads. It's rather the lack of coordination between different modules of pg_bulkload (such as parser, reader, source). Currently, we do have a dedicated context for parallel writer named "ParallelWriter"
, which I think somebody designed to take care of such hazard. With this PR, we do that for AsyncSource
.
I noticed however that, context
is a member of base class Writer, instead of ParallelWriter
. Maybe, we should do something similar in case of AsyncSource
. That is, make the newly introduced context
a member of base class Source
, instead of AsyncSource
. But then again, only AsyncSource
seems to need to do palloc
, not other sources like FileSource
and RemoteSource
.
from pg_bulkload.
I've now understood what happened exactly. You're right. Reading source file is done by child thread but extending read buffer is done by main thread. The main thread extends the read buffer after switched memory context to "ParallelWriter", so it's cause of SEGV.
I noticed however that, context is a member of base class Writer, instead of ParallelWriter. Maybe, we should do something similar in case of AsyncSource. That is, make the newly introduced context a member of base class Source, instead of AsyncSource. But then again, only AsyncSource seems to need to do palloc, not other sources like FileSource and RemoteSource.
ISTM that AsyncSource can have context as current your PR does, or can we use repalloc instead?
from pg_bulkload.
ISTM that AsyncSource can have context as current your PR does, or can we use repalloc instead?
I assume you mean, repalloc(self->buffer, newsize)
within the proposed dedicated context, instead of the following stanza:
newbuf = palloc0(newsize);
/* copy it in new buffer from old buffer */
pthread_mutex_lock(&self->lock);
if (self->begin > self->end)
{
memcpy(newbuf, self->buffer + self->begin,
self->size - self->begin);
memcpy(newbuf + self->size - self->begin, self->buffer, self->end);
self->end = self->size - self->begin + self->end;
}
else
{
memcpy(newbuf, self->buffer + self->begin, self->end - self->begin);
self->end = self->end - self->begin;
}
pfree(self->buffer);
self->buffer = newbuf;
self->size = newsize;
self->begin = 0;
pthread_mutex_unlock(&self->lock);
Due to involvement of the locking between threads here, I'm not immediately sure if that's going to be straightforward. It could be made to work with enough attention though.
from pg_bulkload.
I assume you mean, repalloc(self->buffer, newsize) within the proposed dedicated context, instead of the following stanza:
Yes. I guessed that we can use repalloc(self->buffer, newsize)
instead of switching memory context and then executing palloc0
, memcpy
and pfree
. But ISTM that your original PR approach is safer.
from pg_bulkload.
OK, merged the PR for now. Thanks for the review!
from pg_bulkload.
Related Issues (20)
- Cannot load unquoted CSV data?
- Build Errors Postgresql 14 - master branch HOT 2
- Compile Error in Ubuntu LTS 22.04 HOT 2
- stride - invalid command HOT 6
- new lines in fields HOT 2
- delimiter - extended string syntax
- failed to install HOT 7
- unable to load to citus distributed table
- PostgreSQL 15 support HOT 4
- Cannot Make HOT 4
- update deprecated actoins for github actions
- Make impossible on Ubuntu HOT 6
- Error on Ubuntu
- Extension version pg_bulkload mistake HOT 6
- rpmbuild issue and question
- what is recovery command's major purpose ?
- pg_bulkload on PG13 fails to handle large data duplicates HOT 8
- missing img in the doc
- PostgreSQL 16 support HOT 5
- how to used in windows10/11?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pg_bulkload.