Coder Social home page Coder Social logo

the-art-of-writing-efficient-programs's Issues

Content missing...

File 11_compare_mbm_c.C has no content. Like to add a pull request, but the repo does not allow that.

Question about Chapter07 /02b_stack_cas.C

For push and pop function.

void push(const T& v) {
    // Reserve the slot for the new object by advancing the producer count.
    counts_t n = n_.load(std::memory_order_relaxed);
    if (n.pc_ == cap_) abort();
    int i = 0;
    while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_ + 1, n.cc_}, std::memory_order_acquire, std::memory_order_relaxed)) {
        if (n.pc_ == cap_) abort();
        nanosleep(i);
    };
    // Producer count advanced, slot n.pc_ + 1 is ours.
    ++n.pc_;
    new (&s_[n.pc_]) T(v);

    // Advance the consumer count to match.
    if (!n_.compare_exchange_strong(n, {n.pc_, n.cc_ + 1}, std::memory_order_release, std::memory_order_relaxed)) abort();
}
std::optional<T> pop() {
    // Decrement the consumer count.
    counts_t n = n_.load(std::memory_order_relaxed);
    if (n.cc_ == 0) return std::optional<T>(std::nullopt);
    int i = 0;
    while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_, n.cc_ - 1}, std::memory_order_acquire, std::memory_order_relaxed)) { 
        if (n.cc_ == 0) return std::optional<T>(std::nullopt);
        nanosleep(i);
    };
    // Consumer count decremented, slot n.cc_ - 1 is ours.
    --n.cc_;
    std::optional<T> res(std::move(s_[n.pc_]));
    s_[n.pc_].~T();

    // Decrement the producer count to match.
    if (!n_.compare_exchange_strong(n, {n.pc_ - 1, n.cc_}, std::memory_order_release, std::memory_order_relaxed)) abort();
    return res;
}

in
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_ + 1, n.cc_}, std::memory_order_acquire, std::memory_order_relaxed)) {
n.pc_ is already updated to n.pc_ + 1.
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_, n.cc_ - 1}, std::memory_order_acquire, std::memory_order_relaxed)) {
n.cc_ is already updated to n.cc_ -1.
why n.pc_ and n.cc_ updated again in the following:
++n.pc_;
--n.cc_;
are these update duplicate?

Chapter03 01_superscalar.C Illegal instruction (core dumped)

When compiling 01_superscalar.C with the flags listed in the book the following output appears:

$ clang++ -g -O3 -mavx2 -Wall -pedantic -I$GBENCH_DIR/include 01_superscalar.cpp $GBENCH_DIR/lib/libbenchmark.a -pthread -lrt -lm -o benchmark
$ ./benchmark

2023-11-04T17:51:30-05:00
Running ./benchmark
Run on (12 X 3900 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 0.00, 0.00, 0.00
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Illegal instruction (core dumped)

When analyzing in GDB this is the output:

Program received signal SIGILL, Illegal instruction.
BM_add (state=...) at 01_superscalar.cpp:19
19        a1 += p1[i] + p2[i];

Here is the relevant information about my system I am running this on:

$ uname -a
Linux arch-server 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
    CPU family:          6
    Model:               62
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            4
    CPU(s) scaling MHz:  40%
    CPU max MHz:         3900.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            6986.30
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
                          ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_ts
                         c cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 s
                         se4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb s
                         tibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush
                         _l1d
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   192 KiB (6 instances)
  L1i:                   192 KiB (6 instances)
  L2:                    1.5 MiB (6 instances)
  L3:                    12 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-11
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

This is the code I am using:

 #include <stdlib.h>
 #include <string.h>
 #include <benchmark/benchmark.h>
 
 void BM_add(benchmark::State& state)
 {
     srand(1);
     const unsigned int N = state.range(0);
     std::vector<unsigned long> v1(N), v2(N);
     for(size_t i = 0; i < N; ++i){
         v1[i] = rand();
         v2[i] = rand();
     }
     unsigned long* p1 = v1.data();
     unsigned long* p2 = v2.data();
     for(auto _ : state){
         unsigned long a1 = 0;
         for(size_t i = 0; i < N; ++i){
             a1 += p1[i] + p2[i];
         }
         benchmark::DoNotOptimize(a1);
         benchmark::ClobberMemory();
     }
     state.SetItemsProcessed(N*state.iterations());
 }
 
 #define ARGS ->Arg(1 << 22)
 
 BENCHMARK(BM_add) ARGS;
 
 BENCHMARK_MAIN();

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.