BrianNichols commented on June 11, 2024

Which C client version are you using?
Also, which server version?

from aerospike-client-c.

ramrengaswamy commented on June 11, 2024

Client: 4.0.1
Server: 3.7.2 Enterprise

BrianNichols commented on June 11, 2024

The C client is not performing a bounds check on the returned batch index. Try adding the following code in aerospike_batch.c:163

        if (offset >= records->size) {
            as_error err;
            as_error_update(&err, AEROSPIKE_ERR_CLIENT, "Batch index %u >= list size: %u", offset, records->size);
            as_event_response_error(cmd, &err);
            return true;
        }

If this fixes your segfault, then the data was corrupted earlier by either the client or the server. Let us know the results.
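
The idea behind the check can be shown in isolation. This is a minimal sketch with stand-in names (`demo_vector` and `demo_get_checked` are hypothetical, not the client's `as_vector` API): an index that arrives from outside is compared against the list size before it is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the client's record list; not the real as_vector. */
typedef struct {
    int* items;
    uint32_t size;
} demo_vector;

/* Return a pointer to items[offset], or NULL when the externally supplied
 * offset falls outside the list, instead of indexing past the end. */
static int* demo_get_checked(demo_vector* v, uint32_t offset)
{
    if (v == NULL || offset >= v->size) {
        return NULL;
    }
    return &v->items[offset];
}
```

A check like this only helps while `size` itself is trustworthy; if the list header has been overwritten, the comparison is made against garbage.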

ramrengaswamy commented on June 11, 2024

It still crashes consistently at the same place, even with the bounds check added.
https://github.com/BeeswaxIO/aerospike-client-c/blob/master/src/main/aerospike/aerospike_batch.c#L171

ramrengaswamy commented on June 11, 2024

Attached are the values of all the variables at the time of the crash.
aerospike_crash.pdf

BrianNichols commented on June 11, 2024

Everything looks normal except for the passed-in as_batch_read_records, which is completely corrupted.

records = 0x7f632c17ec40
*0x7f632c17ec40 =
list = 0x7f632c0006c8
*0x7f632c0006c8 = void
capacity = 739956400
size = 32611
item_size = 740388576
flags = 32611
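
A side observation on the dump, offered as a hint rather than a diagnosis: 32611 is 0x7F63, the same upper bits as the `records` and `list` addresses. Assuming a little-endian x86-64 layout in which `capacity`/`size` and `item_size`/`flags` are adjacent 32-bit fields, each pair reads as one 64-bit value that looks like a heap address. That would be consistent with the structure's memory having been overwritten with pointers. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Join two adjacent 32-bit fields into the 64-bit value that would occupy
 * the same 8 bytes on a little-endian machine (assumed here: x86-64).
 * `low` is the field at the lower address, `high` the one after it. */
static uint64_t demo_join_fields(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```

With the values from the dump, `demo_join_fields(739956400, 32611)` gives 0x7f632c1ad6b0 and `demo_join_fields(740388576, 32611)` gives 0x7f632c216ee0.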

For async calls this argument must be allocated on the heap, because the caller's stack frame goes out of scope once the batch command has been handed off to the event loop.

as_batch_read_records* records = as_batch_read_create(size);

Does your program initialize as_batch_read_records on the heap or stack?
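
The lifetime requirement can be sketched without the client library. This is a minimal illustration with hypothetical names (`demo_records`, `demo_submit_async`, `pending`), not the Aerospike API: the submit call keeps only a pointer, so the object it points to has to outlive the function that created it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request object standing in for as_batch_read_records. */
typedef struct {
    int size;
} demo_records;

/* Hypothetical one-slot "event loop": remembers a pointer to process later. */
static demo_records* pending = NULL;

static void demo_submit_async(demo_records* r)
{
    pending = r;   /* only the pointer is kept; nothing is copied */
}

/* Heap allocation outlives this function's stack frame. */
static void demo_start_request(int size)
{
    demo_records* r = malloc(sizeof *r);
    if (r == NULL) {
        return;
    }
    r->size = size;
    demo_submit_async(r);
    /* Passing the address of a local demo_records here instead would leave
     * `pending` dangling as soon as this function returns. */
}

/* Runs later, on the "event loop"; the completion path frees the object.
 * Returns -1 when nothing is pending. */
static int demo_complete(void)
{
    if (pending == NULL) {
        return -1;
    }
    int size = pending->size;
    free(pending);
    pending = NULL;
    return size;
}
```

In this sketch the completion path is the single place that frees the object, which mirrors destroying the records in the batch callback.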

ramrengaswamy commented on June 11, 2024

Our application is in C++.
records is allocated on the heap.
Below is the relevant portion of how we are invoking the as_batch_read_async call.

as_batch_read_records* records = ::as_batch_read_create(keys.size());
for (uint32_t i = 0; i < keys.size(); i++) {
    as_batch_read_record* record = ::as_batch_read_reserve(records);
    ::as_key_init(&record->key, ns.c_str(), set.c_str(), keys[i].c_str());
    record->read_all_bins = true;
}
BatchGetUdata* bget_udata = new BatchGetUdata();
::as_error err;
::as_status status = ::aerospike_batch_read_async(&as_client_, &err, nullptr, records, AsClient::NativeBatchgetCb, static_cast<void*>(bget_udata), nullptr);

BrianNichols commented on June 11, 2024

The memory dump does not show "executor" contents. Do you have this information?

typedef struct {
    as_event_executor executor;
    as_batch_read_records* records;
    as_async_batch_listener listener;
} as_async_batch_executor;

I know "executor->records->list" has been corrupted, but I'm trying to determine if it's because "executor" or "executor->records" pointers were changed.

ramrengaswamy commented on June 11, 2024

The memory dump doesn't show the executor contents, though cmd->udata seems valid.
So it seems executor->records might be getting changed?

BrianNichols commented on June 11, 2024

Yes. Either executor->records or the executor contents are getting stomped.

Is this easy to reproduce?
Do you have a program you can share that reproduces it?

Also, I would like to try and narrow down the source of the problem.
Which OS?
What are your batch sizes?
Are you using pipelining?
Are you sharing event loops?
If not sharing event loops, can you try with libev?

It might be possible to create a debug client that attempts to pinpoint where the corruption occurs. This would take time, though.

ramrengaswamy commented on June 11, 2024

Not easy to reproduce because the binary passes all our unit tests and integration tests.
This only happens in production, hence difficult to collect more metrics.

OS: Ubuntu 14.04 LTS
Batch Size: Most common batch size 2 or 3. Occasionally, batch size can be ~20.
Pipelining: No

Our application connects to two separate aerospike clusters. Therefore we create two clients and they both share the event loops. The event loops are not shared with the application.

We can try libev. Our application uses libevent2 and we encountered a build failure the last time we tried. But we can try again.

BrianNichols commented on June 11, 2024

Ok. Maybe tomorrow I will have some time to implement code to detect if the records pointer has been corrupted. This code will need to be called at various places in the C client to pinpoint the cause.
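
The thread does not show the debug client, so the following is only a guess at the general technique, with hypothetical names throughout (`demo_executor`, `DEMO_GUARD`, `records_shadow`): surround the pointer with known guard values, keep a shadow copy, and call a check function at each point of interest to see where the contents first change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary marker value chosen for this sketch. */
#define DEMO_GUARD 0xA5A5F00Du

/* Hypothetical executor with guard words around the records pointer. */
typedef struct {
    uint32_t guard_head;
    void* records;
    void* records_shadow;   /* second copy, compared against records */
    uint32_t guard_tail;
} demo_executor;

static void demo_executor_init(demo_executor* e, void* records)
{
    e->guard_head = DEMO_GUARD;
    e->records = records;
    e->records_shadow = records;
    e->guard_tail = DEMO_GUARD;
}

/* Returns false if either guard word or the pointer has changed since init. */
static bool demo_executor_intact(const demo_executor* e)
{
    return e->guard_head == DEMO_GUARD &&
           e->guard_tail == DEMO_GUARD &&
           e->records == e->records_shadow;
}
```

Calling the check before and after each suspect step narrows the corruption down to the first step where it returns false. It cannot detect an overwrite that happens to leave all four fields unchanged.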

BrianNichols commented on June 11, 2024

Can you provide your batch callback code? I would like to see how as_batch_read_records is used and destroyed.

ramrengaswamy commented on June 11, 2024

Application is written in C++.
Since records and udata are heap-allocated, we capture both pointers in the batch callback and then schedule the following operation on a separate libevent thread.

void AsClient::NativeBatchgetCb(::as_error* err, as_batch_read_records* records,
                                void* udata, as_event_loop* event_loop) {
  BatchGetUdata* bget_udata = static_cast<BatchGetUdata*>(udata);
  ::as_status status =
      (err && err->code != AEROSPIKE_OK) ? err->code : AEROSPIKE_OK;
  folly::EventBase* evb =
      (bget_udata->evb == nullptr)
          ? http::WorkerManager::Global()->GetNextRoundRobin()->getEventBase()
          : bget_udata->evb;
  if (evb != nullptr) {
    evb->runInEventBaseThread([bget_udata, status, records]() mutable {
      std::vector<AsGetResp> as_resps;
      if (status == AEROSPIKE_OK && records != nullptr) {
        ::as_vector* resps_vec = &records->list;
        for (uint32_t i = 0; i < resps_vec->size; i++) {
          ::as_batch_read_record* batch_record =
              static_cast<as_batch_read_record*>(as_vector_get(resps_vec, i));
          if (batch_record->result == AEROSPIKE_OK) {
            AsRecord resp_record;
            resp_record.Set(&batch_record->record);
            as_resps.emplace_back(batch_record->result, std::move(resp_record));
          } else {
            as_resps.emplace_back(batch_record->result, AsRecord());
          }
        }
      } else if (status == AEROSPIKE_ERR_CLIENT) {
        if (records != nullptr) {
          ::as_vector* resps_vec = &records->list;
          std::string keys_str = "";
          for (uint32_t i = 0; i < resps_vec->size; i++) {
            ::as_batch_read_record* batch_record =
                static_cast<as_batch_read_record*>(as_vector_get(resps_vec, i));
            auto& key_as_str = batch_record->key.value.string;
            folly::format(&keys_str, " {}", ::as_string_get(&key_as_str));
          }
          LOG(ERROR) << "BatchGet Client Error. Keys:" << keys_str;
        } else {
          LOG(ERROR) << "BatchGet Client Error, Records are nullptr";
        }
      }
      bget_udata->promise.setValue(std::move(as_resps));
      delete bget_udata;
      ::as_batch_read_destroy(records);
    });
  } else {
    LOG(FATAL) << "Cannot find Libevent event base";
  }
}

BrianNichols commented on June 11, 2024

I assume the following function performs as_val_reserve(rec) on the record or copies the record.

resp_record.Set(&batch_record->record);

as_batch_read_destroy will otherwise destroy the record.
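
The reserve-before-destroy pattern mentioned here can be illustrated generically. This is a sketch of reference counting with hypothetical names (`demo_val`, `demo_val_reserve`, `demo_val_release`), not the client's `as_val` implementation: taking an extra reference keeps the value alive when the original owner releases its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference-counted value. */
typedef struct {
    uint32_t refs;
    int payload;
} demo_val;

/* Create a value holding one reference, owned by the caller. */
static demo_val* demo_val_new(int payload)
{
    demo_val* v = malloc(sizeof *v);
    if (v != NULL) {
        v->refs = 1;
        v->payload = payload;
    }
    return v;
}

/* Take an extra reference so the value survives the owner's release. */
static demo_val* demo_val_reserve(demo_val* v)
{
    v->refs++;
    return v;
}

/* Drop one reference; free on the last one. Returns true if freed. */
static bool demo_val_release(demo_val* v)
{
    if (--v->refs == 0) {
        free(v);
        return true;
    }
    return false;
}
```

Without the reserve step, the first release would free the value while the second holder still points at it.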

ramrengaswamy commented on June 11, 2024

We wanted to avoid an expensive copy, so we swap pointers.
Also we manage the lifetime of rec_.

void AsClient::AsRecord::Set(::as_record* rec) {
  // Ignoring the field keys in as_record because we are not using them in Get.
  rec_->ttl = rec->ttl;
  rec_->bins.entries = rec->bins.entries;
  rec_->bins.capacity = rec->bins.capacity;
  rec_->bins.size = rec->bins.size;
  rec_->bins._free = rec->bins._free;

  rec->bins.entries = nullptr;
  rec->bins.size = 0;
  rec->bins.capacity = 0;
}
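
The same ownership handoff can be written as a standalone sketch. The types and names below are hypothetical (`demo_bins`, with `owns_entries` playing the role that `_free` appears to play above), and one step is an addition not present in the posted code: the destination's previous entries are released before being overwritten. Whether that matters depends on how `rec_` is initialized, which the thread does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a record's bin array. */
typedef struct {
    int* entries;
    uint16_t size;
    uint16_t capacity;
    bool owns_entries;   /* true when entries must be freed by this struct */
} demo_bins;

/* Free the entries if this struct owns them, then reset it to empty. */
static void demo_bins_destroy(demo_bins* b)
{
    if (b->owns_entries) {
        free(b->entries);
    }
    b->entries = NULL;
    b->size = 0;
    b->capacity = 0;
    b->owns_entries = false;
}

/* Move entries from src to dst without copying the array. Afterwards src is
 * empty, so destroying src later cannot free what dst now owns. */
static void demo_bins_move(demo_bins* dst, demo_bins* src)
{
    if (dst == src) {
        return;
    }
    demo_bins_destroy(dst);   /* release anything dst held before */
    *dst = *src;
    src->entries = NULL;
    src->size = 0;
    src->capacity = 0;
    src->owns_entries = false;
}
```

After the move, exactly one struct is responsible for freeing the array, so destroying both source and destination frees it once.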

BrianNichols commented on June 11, 2024

That should work, assuming entries is destroyed when you're done with it.

ramrengaswamy commented on June 11, 2024

We don't delete entries directly; instead we destroy rec_ using as_record_destroy(rec_).

BrianNichols commented on June 11, 2024

The debug client is ready. How do you want to receive the full C client repo zip file?

ramrengaswamy commented on June 11, 2024

Yes

BrianNichols commented on June 11, 2024

I suggest sending an email to [email protected]. We will take it from there.

BrianNichols commented on June 11, 2024

This has been resolved through helpdesk.
