Comments (22)
Which C client version are you using?
Also, which server version?
from aerospike-client-c.
Client: 4.0.1
Server: 3.7.2 Enterprise
The C client is not performing a bounds check on the returned batch index. Try adding the following code at aerospike_batch.c:163:
if (offset >= records->size) {
    as_error err;
    as_error_update(&err, AEROSPIKE_ERR_CLIENT, "Batch index %u >= list size: %u", offset, records->size);
    as_event_response_error(cmd, &err);
    return true;
}
If this fixes your segv, then data was earlier corrupted by either client or server. Let us know the results.
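The idea behind the suggested check can be shown in isolation. The following is a minimal sketch, not client code: `record_list` and `record_list_get_checked` are hypothetical names, and the list holds plain integers instead of batch records. It only illustrates rejecting an index received from a peer before using it to address memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed-size list of batch records. */
typedef struct {
    int* items;     /* backing array */
    uint32_t size;  /* number of valid entries */
} record_list;

/* Return a pointer to the entry at `offset`, or NULL when the index
 * reported by the peer falls outside the list. */
static int* record_list_get_checked(record_list* list, uint32_t offset)
{
    if (list == NULL || offset >= list->size) {
        return NULL;
    }
    return &list->items[offset];
}
```

A check like this turns an out-of-range index into a reportable error, but it cannot help when `size` itself has already been overwritten, which is why a crash that persists after adding it points at earlier corruption.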
Consistently crashing at the same place even after performing the bounds check.
https://github.com/BeeswaxIO/aerospike-client-c/blob/master/src/main/aerospike/aerospike_batch.c#L171
Attached are the values of all the variables during the time of the crash.
aerospike_crash.pdf
Everything looks normal except for the passed in as_batch_read_records which is completely corrupted.
records = 0x7f632c17ec40
*0x7f632c17ec40 =
list = 0x7f632c0006c8
*0x7f632c0006c8 = void
capacity = 739956400
size = 32611
item_size = 740388576
flags = 32611
This argument must be created on the heap for async calls, because a stack-allocated instance goes out of scope once the batch command has been handed off to the event loop.
as_batch_read_records* records = as_batch_read_create(size);
Does your program initialize as_batch_read_records on the heap or stack?
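To illustrate why the allocation site matters for async calls, here is a minimal sketch under stated assumptions: `batch`, `mini_loop`, `submit_async`, and `run_pending` are hypothetical stand-ins, not the Aerospike API. The "loop" only stores the pointer and invokes the listener after the submitting function has returned, so only a heap-allocated object is still valid at that point.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical request object; stands in for a batch of records. */
typedef struct {
    uint32_t size;
    int results[4];
} batch;

typedef void (*batch_listener)(batch* b, void* udata);

/* Hypothetical one-slot "event loop": it stores the pointer and runs
 * the listener later, after the submitting function has returned. */
typedef struct {
    batch* pending;
    batch_listener listener;
    void* udata;
} mini_loop;

/* Allocate on the heap so the object outlives the caller's stack frame. */
static batch* batch_create(uint32_t size)
{
    batch* b = calloc(1, sizeof(batch));
    if (b != NULL) {
        b->size = size;
    }
    return b;
}

/* Hand the batch to the loop; ownership moves to the listener. */
static void submit_async(mini_loop* loop, batch* b, batch_listener fn, void* udata)
{
    loop->pending = b;
    loop->listener = fn;
    loop->udata = udata;
}

/* Later, on the loop thread: deliver the result exactly once. */
static void run_pending(mini_loop* loop)
{
    if (loop->pending != NULL) {
        batch* b = loop->pending;
        loop->pending = NULL;
        loop->listener(b, loop->udata);
    }
}

/* The listener reads the batch and releases it when done. */
static void on_done(batch* b, void* udata)
{
    *(uint32_t*)udata = b->size;
    free(b);
}

/* The caller returns before the listener runs. */
static void start_request(mini_loop* loop, uint32_t* out)
{
    batch* b = batch_create(3);
    if (b != NULL) {
        submit_async(loop, b, on_done, out);
    }
}
```

Had `start_request` declared `batch b` as a local variable and passed `&b`, the listener would read a dead stack frame, and the fields would look like arbitrary values.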
Our application is in C++. records is allocated on the heap.
Below is the relevant portion of how we invoke the as_batch_read_async call.
as_batch_read_records* records = ::as_batch_read_create(keys.size());
for (uint32_t i = 0; i < keys.size(); i++) {
    as_batch_read_record* record = ::as_batch_read_reserve(records);
    ::as_key_init(&record->key, ns.c_str(), set.c_str(), keys[i].c_str());
    record->read_all_bins = true;
}
BatchGetUdata* bget_udata = new BatchGetUdata();
::as_error err;
::as_status status = ::aerospike_batch_read_async(
    &as_client_, &err, nullptr, records, AsClient::NativeBatchgetCb,
    static_cast<void*>(bget_udata), nullptr);
The memory dump does not show "executor" contents. Do you have this information?
typedef struct {
    as_event_executor executor;
    as_batch_read_records* records;
    as_async_batch_listener listener;
} as_async_batch_executor;
I know "executor->records->list" has been corrupted, but I'm trying to determine if it's because "executor" or "executor->records" pointers were changed.
The memory dump doesn't show the executor contents, though cmd->udata seems valid.
So it seems executor->records might be getting changed?
Yes. It's either executor->records or the executor contents that are getting stomped.
Is this easy to reproduce?
Do you have a program that can be shared that can reproduce?
Also, I would like to try and narrow down the source of the problem.
Which OS?
What are your batch sizes?
Are you using pipelining?
Are you sharing event loops?
If not sharing event loops, can you try with libev?
It might be possible to create a debug client that attempts to pinpoint where the corruption occurs. This would take time though.
Not easy to reproduce because the binary passes all our unit tests and integration tests.
This only happens in production, hence difficult to collect more metrics.
OS: Ubuntu 14.04 LTS
Batch Size: Most common batch size 2 or 3. Occasionally, batch size can be ~20.
Pipelining: No
Our application connects to two separate aerospike clusters. Therefore we create two clients and they both share the event loops. The event loops are not shared with the application.
We can try libev. Our application uses libevent2 and we encountered a build failure the last time we tried. But we can try again.
Ok. Maybe tomorrow I will have some time to implement code to detect if the records pointer has been corrupted. This code will need to be called at various places in the C client to pinpoint the cause.
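One generic way to detect that a stored pointer has been overwritten is sketched below. This is an assumption-laden illustration, not the debug client mentioned in this thread: `guarded_ref`, `GUARD_WORD`, and both functions are hypothetical. The pointer is bracketed by sentinel words and mirrored by a bitwise-inverted copy, and a check function is called at each stage of the pipeline; the first stage where it fails narrows down where the overwrite happened.

```c
#include <stdbool.h>
#include <stdint.h>

#define GUARD_WORD 0xA5A5A5A5u  /* arbitrary sentinel chosen for this sketch */

/* Hypothetical executor-like struct: the pointer of interest sits
 * between two guard words and is mirrored by an inverted copy. */
typedef struct {
    uint32_t guard_front;
    void* records;
    uintptr_t records_check;  /* ~(uintptr_t)records, set at init time */
    uint32_t guard_back;
} guarded_ref;

static void guarded_ref_init(guarded_ref* g, void* records)
{
    g->guard_front = GUARD_WORD;
    g->records = records;
    g->records_check = ~(uintptr_t)records;
    g->guard_back = GUARD_WORD;
}

/* Return true while both guards and the mirrored pointer still agree. */
static bool guarded_ref_ok(const guarded_ref* g)
{
    return g->guard_front == GUARD_WORD &&
           g->guard_back == GUARD_WORD &&
           g->records_check == ~(uintptr_t)g->records;
}
```

The guards catch overwrites of the struct that holds the pointer; they do not detect corruption of the memory the pointer refers to, which would need its own check.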
Can you provide your batch callback code? I would like to see how as_batch_read_records is used and destroyed.
The application is written in C++.
Since records and udata are malloc'ed, we capture the values of these two pointers in the batch callback and then schedule the following operation on a separate libevent thread.
void AsClient::NativeBatchgetCb(::as_error* err, as_batch_read_records* records,
                                void* udata, as_event_loop* event_loop) {
  BatchGetUdata* bget_udata = static_cast<BatchGetUdata*>(udata);
  ::as_status status =
      (err && err->code != AEROSPIKE_OK) ? err->code : AEROSPIKE_OK;
  folly::EventBase* evb =
      (bget_udata->evb == nullptr)
          ? http::WorkerManager::Global()->GetNextRoundRobin()->getEventBase()
          : bget_udata->evb;
  if (evb != nullptr) {
    evb->runInEventBaseThread([bget_udata, status, records]() mutable {
      std::vector<AsGetResp> as_resps;
      if (status == AEROSPIKE_OK && records != nullptr) {
        ::as_vector* resps_vec = &records->list;
        for (uint32_t i = 0; i < resps_vec->size; i++) {
          ::as_batch_read_record* batch_record =
              static_cast<as_batch_read_record*>(as_vector_get(resps_vec, i));
          if (batch_record->result == AEROSPIKE_OK) {
            AsRecord resp_record;
            resp_record.Set(&batch_record->record);
            as_resps.emplace_back(batch_record->result, std::move(resp_record));
          } else {
            as_resps.emplace_back(batch_record->result, AsRecord());
          }
        }
      } else if (status == AEROSPIKE_ERR_CLIENT) {
        if (records != nullptr) {
          ::as_vector* resps_vec = &records->list;
          std::string keys_str = "";
          for (uint32_t i = 0; i < resps_vec->size; i++) {
            ::as_batch_read_record* batch_record =
                static_cast<as_batch_read_record*>(as_vector_get(resps_vec, i));
            auto& key_as_str = batch_record->key.value.string;
            folly::format(&keys_str, " {}", ::as_string_get(&key_as_str));
          }
          LOG(ERROR) << "BatchGet Client Error. Keys:" << keys_str;
        } else {
          LOG(ERROR) << "BatchGet Client Error, Records are nullptr";
        }
      }
      bget_udata->promise.setValue(std::move(as_resps));
      delete bget_udata;
      ::as_batch_read_destroy(records);
    });
  } else {
    LOG(FATAL) << "Cannot find Libevent event base";
  }
}
I assume the following function performs as_val_reserve(rec) on the record or copies the record.
resp_record.Set(&batch_record->record);
Otherwise, as_batch_read_destroy will destroy the record.
We wanted to avoid an expensive copy, so we swap pointers.
Also, we manage the lifetime of rec_.
void AsClient::AsRecord::Set(::as_record* rec) {
  // Ignoring the field keys in as_record because we are not using them in Get.
  rec_->ttl = rec->ttl;
  rec_->bins.entries = rec->bins.entries;
  rec_->bins.capacity = rec->bins.capacity;
  rec_->bins.size = rec->bins.size;
  rec_->bins._free = rec->bins._free;
  rec->bins.entries = nullptr;
  rec->bins.size = 0;
  rec->bins.capacity = 0;
}
That should work, assuming entries is destroyed once you are done with it.
We don't delete entries directly; instead we destroy rec_ using as_record_destroy(rec_).
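The pointer hand-off described above follows a general move pattern, sketched here with hypothetical types (`bin_list` is not the Aerospike bins layout, and both functions are illustrative). The destination takes over the buffer and the source is reset to empty, so that destroying both objects later frees the buffer exactly once.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical owner of a heap buffer of entries. */
typedef struct {
    int* entries;
    uint16_t size;
    uint16_t capacity;
} bin_list;

/* Release the buffer; safe to call on an already-emptied list
 * because free(NULL) is a no-op. */
static void bin_list_destroy(bin_list* l)
{
    free(l->entries);
    l->entries = NULL;
    l->size = 0;
    l->capacity = 0;
}

/* Move: dst takes over src's buffer, src is left empty. */
static void bin_list_move(bin_list* dst, bin_list* src)
{
    if (dst == src) {
        return;  /* self-move: nothing to transfer */
    }
    free(dst->entries);  /* drop whatever dst owned before */
    *dst = *src;
    src->entries = NULL;
    src->size = 0;
    src->capacity = 0;
}
```

The pattern is only safe if every path that later destroys the source tolerates the emptied state, and if nothing else still holds the old pointer.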
The debug client is ready. How do you want to receive the full C client repo zip file?
Yes
I suggest sending an email to [email protected]. We will take it from there.
This has been resolved through helpdesk.