packtpublishing / the-art-of-writing-efficient-programs Goto Github PK
View Code? Open in Web Editor NEWThe Art of Writing Efficient Programs, published by Packt
License: MIT License
The Art of Writing Efficient Programs, published by Packt
License: MIT License
The book said that by the following chapter (Chapter 3) the reason why would have been clear, however I think that an explicit explanation is missing from the chapter, remaining opaque the reason why there is such a gap between indexing using an unsigned int versus a signed int.
Links:
04_substring_sort_a.C
File 11_compare_mbm_c.C has no content. Like to add a pull request, but the repo does not allow that.
For push and pop function.
void push(const T& v) {
// Reserve the slot for the new object by advancing the producer count.
counts_t n = n_.load(std::memory_order_relaxed);
if (n.pc_ == cap_) abort();
int i = 0;
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_ + 1, n.cc_}, std::memory_order_acquire, std::memory_order_relaxed)) {
if (n.pc_ == cap_) abort();
nanosleep(i);
};
// Producer count advanced, slot n.pc_ + 1 is ours.
++n.pc_;
new (&s_[n.pc_]) T(v);
// Advance the consumer count to match.
if (!n_.compare_exchange_strong(n, {n.pc_, n.cc_ + 1}, std::memory_order_release, std::memory_order_relaxed)) abort();
}
std::optional<T> pop() {
// Decrement the consumer count.
counts_t n = n_.load(std::memory_order_relaxed);
if (n.cc_ == 0) return std::optional<T>(std::nullopt);
int i = 0;
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_, n.cc_ - 1}, std::memory_order_acquire, std::memory_order_relaxed)) {
if (n.cc_ == 0) return std::optional<T>(std::nullopt);
nanosleep(i);
};
// Consumer count decremented, slot n.cc_ - 1 is ours.
--n.cc_;
std::optional<T> res(std::move(s_[n.pc_]));
s_[n.pc_].~T();
// Decrement the producer count to match.
if (!n_.compare_exchange_strong(n, {n.pc_ - 1, n.cc_}, std::memory_order_release, std::memory_order_relaxed)) abort();
return res;
}
in
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_ + 1, n.cc_}, std::memory_order_acquire, std::memory_order_relaxed)) {
n.pc_ is already updated to n.pc_ + 1.
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_, n.cc_ - 1}, std::memory_order_acquire, std::memory_order_relaxed)) {
n.cc_ is already updated to n.cc_ -1.
why n.pc_ and n.cc_ updated again in the following:
++n.pc_;
--n.cc_;
are these update duplicate?
When compiling 01_superscalar.C with the flags listed in the book the following output appears:
$ clang++ -g -O3 -mavx2 -Wall -pedantic -I$GBENCH_DIR/include 01_superscalar.cpp $GBENCH_DIR/lib/libbenchmark.a -pthread -lrt -lm -o benchmark
$ ./benchmark
2023-11-04T17:51:30-05:00
Running ./benchmark
Run on (12 X 3900 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x6)
L1 Instruction 32 KiB (x6)
L2 Unified 256 KiB (x6)
L3 Unified 12288 KiB (x1)
Load Average: 0.00, 0.00, 0.00
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Illegal instruction (core dumped)
When analyzing in GDB this is the output:
Program received signal SIGILL, Illegal instruction.
BM_add (state=...) at 01_superscalar.cpp:19
19 a1 += p1[i] + p2[i];
Here is the relevant information about my system I am running this on:
$ uname -a
Linux arch-server 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
CPU family: 6
Model: 62
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 4
CPU(s) scaling MHz: 40%
CPU max MHz: 3900.0000
CPU min MHz: 1200.0000
BogoMIPS: 6986.30
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_ts
c cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 s
se4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb s
tibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush
_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 192 KiB (6 instances)
L1i: 192 KiB (6 instances)
L2: 1.5 MiB (6 instances)
L3: 12 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected
This is the code I am using:
#include <stdlib.h>
#include <string.h>
#include <benchmark/benchmark.h>
void BM_add(benchmark::State& state)
{
srand(1);
const unsigned int N = state.range(0);
std::vector<unsigned long> v1(N), v2(N);
for(size_t i = 0; i < N; ++i){
v1[i] = rand();
v2[i] = rand();
}
unsigned long* p1 = v1.data();
unsigned long* p2 = v2.data();
for(auto _ : state){
unsigned long a1 = 0;
for(size_t i = 0; i < N; ++i){
a1 += p1[i] + p2[i];
}
benchmark::DoNotOptimize(a1);
benchmark::ClobberMemory();
}
state.SetItemsProcessed(N*state.iterations());
}
#define ARGS ->Arg(1 << 22)
BENCHMARK(BM_add) ARGS;
BENCHMARK_MAIN();
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.