Comments (10)

dvyukov commented on June 6, 2024

Hi @erin2722,

Is it possible that some other sampled memory is ending up in the address space of the GuardedPageAllocator, and is therefore being validated upon deallocation when it is not intended to?

If that were possible, it would be a serious bug that could lead to arbitrary memory corruption.
I don't immediately see how this is possible. GuardedPageAllocator allocates that memory with mmap in its Init method.

GuardedPageAllocator circumvents system-alloc's spinlock, which may be unintentional, but the mmap in system-alloc does not use MAP_FIXED, only MAP_FIXED_NOREPLACE (if available). Without MAP_FIXED, overlapping hints must not lead to overlapping ranges being allocated.

If you can reproduce this at least semi-reliably, I would suggest tracing the mmap calls with strace or printfs to confirm or rule out possible overlap.
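
For the printf route, a minimal sketch of such tracing could look like the following. `MmapTracer` and `MappedRange` are hypothetical names for illustration and are not part of TCMalloc; the sketch only wraps the standard `mmap` call, logs each resulting range, and reports whether it overlaps a range recorded earlier.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Half-open address range [begin, end) returned by one mmap call.
struct MappedRange {
  uintptr_t begin;
  uintptr_t end;
};

// Hypothetical tracing helper for illustration; not TCMalloc code.
class MmapTracer {
 public:
  // Maps `len` bytes with `hint` as the requested address, logs the result to
  // stderr, and returns true if the new range overlaps a previously recorded
  // one. Mappings are deliberately left in place so later checks stay valid.
  bool MapAndCheck(void* hint, size_t len, int extra_flags = 0) {
    void* p = mmap(hint, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
    if (p == MAP_FAILED) {
      std::perror("mmap");
      return false;  // Nothing was mapped, so nothing can overlap.
    }
    MappedRange r{reinterpret_cast<uintptr_t>(p),
                  reinterpret_cast<uintptr_t>(p) + len};
    std::fprintf(stderr, "mmap(hint=%p, len=%zu) -> [%p, %p)\n", hint, len, p,
                 static_cast<void*>(static_cast<char*>(p) + len));
    bool overlap = false;
    for (const MappedRange& o : ranges_) {
      if (r.begin < o.end && o.begin < r.end) overlap = true;
    }
    ranges_.push_back(r);
    return overlap;
  }

  const std::vector<MappedRange>& ranges() const { return ranges_; }

 private:
  std::vector<MappedRange> ranges_;
};
```

Calling `MapAndCheck(nullptr, len)` and then `MapAndCheck` again with the first range's start address as the hint should report no overlap when no fixed-address flag is passed. Passing `MAP_FIXED_NOREPLACE` as `extra_flags`, where the headers define it, is one way to exercise the flag on the kernel under suspicion.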

from tcmalloc.

erin2722 commented on June 6, 2024

Hi @dvyukov,

Thank you so much for the quick response! I will see what I can do in terms of reproing this, and let you know what I find.

ckennelly commented on June 6, 2024

Since you have a core dump, I was curious if anything obvious stood out about the contents of guarded_page_allocator_ and the faulting address.

Without the call to ActivateGuardedSampling, I'd expect the begin/end address ranges of guarded_page_allocator_ to be 0 and PointerIsMine to always fail, but maybe something unusual is happening.

dvyukov commented on June 6, 2024

When I last looked at the code I read it as: GuardedPageAllocator::Init mmaps memory and initializes begin/end, and then ActivateGuardedSampling sets the flag to start allocating guarded allocations, and they are separate.
If that's the case, we can have begin/end non-0, but no allocations, and PointerIsMine can still return true (due to some corruption presumably).

erin2722 commented on June 6, 2024

Yup, that is also my interpretation of the code @dvyukov, and it is confirmed by the state of the GuardedPageAllocator when the crash happens:

(gdb) f 0
#0  tcmalloc::tcmalloc_internal::GuardedPageAllocator::Deallocate (this=0x7fc1873258e0 <tcmalloc::tcmalloc_internal::Static::guardedpage_allocator_>, ptr=ptr@entry=0x438f3fe00000) at src/third_party/tcmalloc/dist/tcmalloc/guarded_page_allocator.cc:223
223	    *reinterpret_cast<char*>(ptr) = 'X';  // Trigger SEGV handler.
(gdb) p *this
$1 = {
  stacktrace_filter_ = {
    stack_hashes_with_count_ = {{
        <std::__atomic_base<unsigned long>> = {
          _M_i = 0
        }, 
      } <repeats 256 times>},
    max_slots_used_ = {
      <std::__atomic_base<unsigned long>> = {
        _M_i = 0
      }, 
    },
    replacement_inserts_ = {
      <std::__atomic_base<unsigned long>> = {
        _M_i = 0
      }, 
    }
  },
  guarded_page_lock_ = {
    lockword_ = {
      <std::__atomic_base<unsigned int>> = {
        _M_i = 0
      }, 
    }
  },
  free_pages_ = {true <repeats 128 times>, false <repeats 384 times>},
  num_alloced_pages_ = 0,
  num_alloced_pages_max_ = 0,
  num_successful_allocations_ = {
    value_ = {
      <std::__atomic_base<long>> = {
        _M_i = 0
      }, 
    }
  },
  num_failed_allocations_ = {
    value_ = {
      <std::__atomic_base<long>> = {
        _M_i = 0
      }, 
    }
  },
  data_ = 0x2d563ff86120,
  pages_base_addr_ = 0x438f3fc00000,
  pages_end_addr_ = 0x438f3fe02000,
  first_page_addr_ = 0x438f3fc02000,
  max_alloced_pages_ = 64,
  total_pages_ = 128,
  total_pages_used_ = 0,
  alloced_page_count_when_all_used_once_ = 0,
  page_size_ = 8192,
  rand_ = {
    <std::__atomic_base<unsigned long>> = {
      _M_i = 140469173639392
    }, 
  },
  initialized_ = true,
  allow_allocations_ = false,
  double_free_detected_ = true,
  write_overflow_detected_ = false
}

erin2722 commented on June 6, 2024

We can see here that although allow_allocations_ is false, and num_successful_allocations_ is 0, the ptr argument is 0x438f3fe00000, which falls within the range of pages_base_addr_ to pages_end_addr_, causing PointerIsMine to succeed and the deallocation to go through validation.

It then detects a double free even though one is not present. free_pages_ is filled with true during initialization, and ReserveFreeSlot, which is what sets individual free_pages_ entries to false, returns early because allow_allocations_ is false. As a result, IsFreed always returns true, causing a false double-free detection.
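
To make that reasoning concrete, here is a toy model of the state in the dump. It is a sketch, not TCMalloc's implementation: `ToyGuardedAllocator`, `SlotFor`, and `DeallocateReportsDoubleFree` are hypothetical names, and both the range layout (one leading guard page, then data pages interleaved with guard pages) and the slot mapping are assumptions chosen because they reproduce the addresses printed by gdb.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of the dumped state; NOT TCMalloc's actual code.
struct ToyGuardedAllocator {
  static constexpr size_t kMaxSlots = 512;

  uint64_t pages_base_addr = 0;
  uint64_t pages_end_addr = 0;
  uint64_t first_page_addr = 0;
  uint64_t page_size = 0;
  size_t total_pages = 0;
  bool allow_allocations = false;       // Never flipped in this scenario.
  std::array<bool, kMaxSlots> free_pages{};

  // Sets up the address range and marks every slot as free, as the thread
  // describes Init doing. Layout formula is an assumption (see lead-in).
  void Init(uint64_t base, size_t pages, uint64_t psize) {
    page_size = psize;
    total_pages = pages;
    pages_base_addr = base;
    first_page_addr = base + psize;
    pages_end_addr = base + (2 * pages + 1) * psize;
    for (size_t i = 0; i < pages; ++i) free_pages[i] = true;
  }

  // A pure range check: it knows nothing about whether anything was allocated.
  bool PointerIsMine(uint64_t addr) const {
    return pages_base_addr <= addr && addr < pages_end_addr;
  }

  // Assumed mapping: each slot covers one data page plus one guard page.
  size_t SlotFor(uint64_t addr) const {
    if (addr < first_page_addr) return 0;
    size_t slot = static_cast<size_t>((addr - first_page_addr) / (2 * page_size));
    return slot < total_pages ? slot : total_pages - 1;
  }

  bool IsFreed(uint64_t addr) const { return free_pages[SlotFor(addr)]; }

  // True when a deallocation of `addr` would be flagged as a double free.
  bool DeallocateReportsDoubleFree(uint64_t addr) const {
    return PointerIsMine(addr) && IsFreed(addr);
  }
};
```

With `Init(0x438f3fc00000, 128, 8192)` the model yields the same `first_page_addr_` and `pages_end_addr_` as the dump, and `DeallocateReportsDoubleFree(0x438f3fe00000)` returns true even though no allocation was ever made.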

erin2722 commented on June 6, 2024

Just for extra info, I am using the tcmalloc version as of this commit 18777b1, and the issue started appearing after we upgraded from 093ba93.

erin2722 commented on June 6, 2024

Hi all! After investigating, I believe that this was an issue with porting the tcmalloc build from Bazel to our native build system: we dropped the linkstatic=1 flags, and I think improper symbol resolution on dynamic builds was leading to this issue. Closing this issue, and thanks for the help!

BlakeIsBlake commented on June 6, 2024

We may need to reopen this issue -- we ended up tracking down what's going on, and it's not related to linking.

TCMalloc started using MAP_FIXED_NOREPLACE with this commit. The flag is broken on Linux kernel versions 4.17 and 4.18, and was fixed in 4.19.

This is what causes the crash described at the beginning of this issue. In our testing on a machine with kernel version 4.18, this sequence of events can happen:

  1. The GuardedPageAllocator maps its pages, allocating roughly 2 MB.
  2. In SampleifyAllocation, we end up creating a sampled page, which flows through to creating the first mmap region for sampled allocations and allocates 1 GB for that region.
  3. If we are unlucky, the region created for sampled allocations will encompass the pages created for the GuardedPageAllocator, clobbering the GuardedPageAllocator's pages.
  4. A page that the GuardedPageAllocator believes it owns can be deallocated, tripping the check that the allocation is guarded, and ultimately causing a segfault because the GuardedPageAllocator believes it is seeing a double free.

There are a couple of things we could do here, but I think the least invasive change would be to add another check to MapFixedNoReplaceFlagAvailable() that tests whether the currently running kernel version is susceptible to the MAP_FIXED_NOREPLACE bug.
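
A rough sketch of what such a version check might look like, assuming it is acceptable to parse the release string reported by `uname`. Both function names below are hypothetical and not existing TCMalloc functions, and treating an unparsable release string as "not affected" is an arbitrary choice made for this illustration.

```cpp
#include <sys/utsname.h>

#include <cstdio>

// Hypothetical helper: returns true if `release` (for example
// "4.18.0-generic") names a 4.17 or 4.18 kernel, the versions reported
// in this thread as having the MAP_FIXED_NOREPLACE bug.
bool ReleaseHasBrokenMapFixedNoReplace(const char* release) {
  int major = 0;
  int minor = 0;
  if (std::sscanf(release, "%d.%d", &major, &minor) != 2) {
    return false;  // Unparsable string: assumed not affected (a choice).
  }
  return major == 4 && (minor == 17 || minor == 18);
}

// Hypothetical helper: applies the check above to the running kernel.
bool RunningKernelHasBrokenMapFixedNoReplace() {
  struct utsname u;
  if (uname(&u) != 0) return false;  // Could not query; assumed not affected.
  return ReleaseHasBrokenMapFixedNoReplace(u.release);
}
```

A caller could consult `RunningKernelHasBrokenMapFixedNoReplace()` before deciding to pass MAP_FIXED_NOREPLACE. Distribution kernels sometimes backport fixes without changing the base version, so a version comparison like this is only a heuristic and may be overly conservative.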

BlakeIsBlake commented on June 6, 2024
