packtpublishing / the-art-of-writing-efficient-programs Goto Github PK

The Art of Writing Efficient Programs, published by Packt

License: MIT License

C 99.15% C++ 0.85%

the-art-of-writing-efficient-programs's Issues

It is unclear why 04_substring_sort_a.C is slower than 03_substring_sort_a.C

The book said that by the following chapter (Chapter 3) the reason why would have been clear, however I think that an explicit explanation is missing from the chapter, remaining opaque the reason why there is such a gap between indexing using an unsigned int versus a signed int.

Links:
04_substring_sort_a.C

03_substring_sort_a.C

Content missing...

File 11_compare_mbm_c.C has no content. Like to add a pull request, but the repo does not allow that.

Question about Chapter07 /02b_stack_cas.C

For push and pop function.

void push(const T& v) {
    // Reserve the slot for the new object by advancing the producer count.
    counts_t n = n_.load(std::memory_order_relaxed);
    if (n.pc_ == cap_) abort();
    int i = 0;
    while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_ + 1, n.cc_}, std::memory_order_acquire, std::memory_order_relaxed)) {
        if (n.pc_ == cap_) abort();
        nanosleep(i);
    };
    // Producer count advanced, slot n.pc_ + 1 is ours.
    ++n.pc_;
    new (&s_[n.pc_]) T(v);

    // Advance the consumer count to match.
    if (!n_.compare_exchange_strong(n, {n.pc_, n.cc_ + 1}, std::memory_order_release, std::memory_order_relaxed)) abort();
}
std::optional<T> pop() {
    // Decrement the consumer count.
    counts_t n = n_.load(std::memory_order_relaxed);
    if (n.cc_ == 0) return std::optional<T>(std::nullopt);
    int i = 0;
    while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_, n.cc_ - 1}, std::memory_order_acquire, std::memory_order_relaxed)) { 
        if (n.cc_ == 0) return std::optional<T>(std::nullopt);
        nanosleep(i);
    };
    // Consumer count decremented, slot n.cc_ - 1 is ours.
    --n.cc_;
    std::optional<T> res(std::move(s_[n.pc_]));
    s_[n.pc_].~T();

    // Decrement the producer count to match.
    if (!n_.compare_exchange_strong(n, {n.pc_ - 1, n.cc_}, std::memory_order_release, std::memory_order_relaxed)) abort();
    return res;
}

in
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_ + 1, n.cc_}, std::memory_order_acquire, std::memory_order_relaxed)) {
n.pc_ is already updated to n.pc_ + 1.
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_, n.cc_ - 1}, std::memory_order_acquire, std::memory_order_relaxed)) {
n.cc_ is already updated to n.cc_ -1.
why n.pc_ and n.cc_ updated again in the following:
++n.pc_;
--n.cc_;
are these update duplicate?

Chapter03 01_superscalar.C Illegal instruction (core dumped)

When compiling 01_superscalar.C with the flags listed in the book the following output appears:

$ clang++ -g -O3 -mavx2 -Wall -pedantic -I$GBENCH_DIR/include 01_superscalar.cpp $GBENCH_DIR/lib/libbenchmark.a -pthread -lrt -lm -o benchmark
$ ./benchmark

2023-11-04T17:51:30-05:00
Running ./benchmark
Run on (12 X 3900 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 0.00, 0.00, 0.00
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Illegal instruction (core dumped)

When analyzing in GDB this is the output:

Program received signal SIGILL, Illegal instruction.
BM_add (state=...) at 01_superscalar.cpp:19
19        a1 += p1[i] + p2[i];

Here is the relevant information about my system I am running this on:

$ uname -a
Linux arch-server 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
    CPU family:          6
    Model:               62
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            4
    CPU(s) scaling MHz:  40%
    CPU max MHz:         3900.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            6986.30
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
                          ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_ts
                         c cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 s
                         se4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb s
                         tibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush
                         _l1d
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   192 KiB (6 instances)
  L1i:                   192 KiB (6 instances)
  L2:                    1.5 MiB (6 instances)
  L3:                    12 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-11
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

This is the code I am using:

 #include <stdlib.h>
 #include <string.h>
 #include <benchmark/benchmark.h>
 
 void BM_add(benchmark::State& state)
 {
     srand(1);
     const unsigned int N = state.range(0);
     std::vector<unsigned long> v1(N), v2(N);
     for(size_t i = 0; i < N; ++i){
         v1[i] = rand();
         v2[i] = rand();
     }
     unsigned long* p1 = v1.data();
     unsigned long* p2 = v2.data();
     for(auto _ : state){
         unsigned long a1 = 0;
         for(size_t i = 0; i < N; ++i){
             a1 += p1[i] + p2[i];
         }
         benchmark::DoNotOptimize(a1);
         benchmark::ClobberMemory();
     }
     state.SetItemsProcessed(N*state.iterations());
 }
 
 #define ARGS ->Arg(1 << 22)
 
 BENCHMARK(BM_add) ARGS;
 
 BENCHMARK_MAIN();

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.