Coder Social home page Coder Social logo

the-art-of-writing-efficient-programs's Introduction

The Art of Writing Efficient Programs

The Art of Writing Efficient Programs

This is the code repository for The Art of Writing Efficient Programs, published by Packt.

An advanced programmer's guide to efficient hardware utilization and compiler optimizations using C++ examples

What is this book about?

The great free lunch of "performance taking care of itself" is over. Until recently, programs got faster by themselves as CPUs were upgraded, but that doesn't happen anymore. The clock frequency of new processors has almost peaked. New architectures provide small improvements to existing programs, but this only helps slightly. Processors do get larger and powerful, but most of this new power is consumed by the increased number of processing cores and other “extra” computing units. To write efficient software, you now have to know how to program by making good use of the available computing resources, and this book will teach you how to do that.

This book covers the following exciting features:

  • Discover how to use the hardware computing resources in your programs effectively
  • Understand the relationship between memory order and memory barriers
  • Familiarize yourself with the performance implications of different data structures and organizations
  • Assess the performance impact of concurrent memory accessed and how to minimize it
  • Discover when to use and when not to use lock-free programming techniques
  • Explore different ways to improve the effectiveness of compiler optimizations
  • Design APIs for concurrent data structures and high-performance data structures to avoid inefficiencies

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

std::vector<double> v;
… add data to v …
std::for_each(v.begin(), v.end(),[](double& x){ ++x; });

Following is what you need for this book: This book is for experienced developers and programmers who work on performance-critical projects and want to learn different techniques to improve the performance of their code. Programmers who belong to algorithmic trading, gaming, bioinformatics, computational genomics, or computational fluid dynamics communities can learn various techniques from this book and apply them in their domain of work. Although this book uses the C++ language, the concepts demonstrated in the book can be easily transferred or applied to other compiled languages such as C, Java, Rust, Go, and more.

With the following software and hardware list you can run all code files present in the book (Chapter 1-12). For more instructions on how to run the code, click here.

Software and Hardware List

Chapter Software required OS required
1-12 C++ compiler (GCC, Clang, Visual Studio, etc.) Windows, Mac OS X, and Linux (Any)
1-12 Profiler (VTune, Perf, GoogleProf, etc.) Windows, Mac OS X, and Linux (Any)
1-12 Benchmark Library (Google Bench) Windows, Mac OS X, and Linux (Any)

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Errata

  • Page 25: The sentence "for example, if the actual string is generated from a simulation that takes ten hours, the one hundred seconds it takes to sort it is hardly worth noticing." must be read as "for example, if the actual string is generated from a simulation that takes ten hours, the one hundred milliseconds it takes to sort it is hardly worth noticing."
  • Page 45: The file name for the first code snippet must be read as 01a_compare_timer.C and the second code snippet must be read as 01a_compare_timer_a.C/01a_compare_timer_b.C.
  • Page 116: The sentence "As we have seen, under the right circumstances, the CPU can do several operations per second" must be read as "As we have seen, under the right circumstances, the CPU can do several operations per cycle".

Related products

Get to Know the Author

Fedor G. Pikus is a chief engineering scientist in the Mentor IC Segment of Siemens Digital Industries Software and is responsible for the long-term technical direction of Calibre products, the design and architecture of software, and research into new software technologies. His previous roles included senior software engineer at Google and chief software architect at Mentor Graphics. Fedor is a recognized expert in high-performance computing and C++. He has presented his works at CPPCon, SD West, DesignCon, and in software development journals, and is also an O'Reilly author. Fedor has over 25 patents and over 100 papers and conference presentations on physics, EDA, software design, and C++.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781800208117

the-art-of-writing-efficient-programs's People

Contributors

afshaank avatar fpikus avatar jubit-packt avatar packt-itservice avatar packt-pradeeps avatar packtutkarshr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

the-art-of-writing-efficient-programs's Issues

Content missing...

File 11_compare_mbm_c.C has no content. Like to add a pull request, but the repo does not allow that.

Question about Chapter07 /02b_stack_cas.C

For push and pop function.

void push(const T& v) {
    // Reserve the slot for the new object by advancing the producer count.
    counts_t n = n_.load(std::memory_order_relaxed);
    if (n.pc_ == cap_) abort();
    int i = 0;
    while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_ + 1, n.cc_}, std::memory_order_acquire, std::memory_order_relaxed)) {
        if (n.pc_ == cap_) abort();
        nanosleep(i);
    };
    // Producer count advanced, slot n.pc_ + 1 is ours.
    ++n.pc_;
    new (&s_[n.pc_]) T(v);

    // Advance the consumer count to match.
    if (!n_.compare_exchange_strong(n, {n.pc_, n.cc_ + 1}, std::memory_order_release, std::memory_order_relaxed)) abort();
}
std::optional<T> pop() {
    // Decrement the consumer count.
    counts_t n = n_.load(std::memory_order_relaxed);
    if (n.cc_ == 0) return std::optional<T>(std::nullopt);
    int i = 0;
    while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_, n.cc_ - 1}, std::memory_order_acquire, std::memory_order_relaxed)) { 
        if (n.cc_ == 0) return std::optional<T>(std::nullopt);
        nanosleep(i);
    };
    // Consumer count decremented, slot n.cc_ - 1 is ours.
    --n.cc_;
    std::optional<T> res(std::move(s_[n.pc_]));
    s_[n.pc_].~T();

    // Decrement the producer count to match.
    if (!n_.compare_exchange_strong(n, {n.pc_ - 1, n.cc_}, std::memory_order_release, std::memory_order_relaxed)) abort();
    return res;
}

in
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_ + 1, n.cc_}, std::memory_order_acquire, std::memory_order_relaxed)) {
n.pc_ is already updated to n.pc_ + 1.
while (!n.equal(n_) || !n_.compare_exchange_weak(n, {n.pc_, n.cc_ - 1}, std::memory_order_acquire, std::memory_order_relaxed)) {
n.cc_ is already updated to n.cc_ -1.
why n.pc_ and n.cc_ updated again in the following:
++n.pc_;
--n.cc_;
are these update duplicate?

Chapter03 01_superscalar.C Illegal instruction (core dumped)

When compiling 01_superscalar.C with the flags listed in the book the following output appears:

$ clang++ -g -O3 -mavx2 -Wall -pedantic -I$GBENCH_DIR/include 01_superscalar.cpp $GBENCH_DIR/lib/libbenchmark.a -pthread -lrt -lm -o benchmark
$ ./benchmark

2023-11-04T17:51:30-05:00
Running ./benchmark
Run on (12 X 3900 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 256 KiB (x6)
  L3 Unified 12288 KiB (x1)
Load Average: 0.00, 0.00, 0.00
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
Illegal instruction (core dumped)

When analyzing in GDB this is the output:

Program received signal SIGILL, Illegal instruction.
BM_add (state=...) at 01_superscalar.cpp:19
19        a1 += p1[i] + p2[i];

Here is the relevant information about my system I am running this on:

$ uname -a
Linux arch-server 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  12
  On-line CPU(s) list:   0-11
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
    CPU family:          6
    Model:               62
    Thread(s) per core:  2
    Core(s) per socket:  6
    Socket(s):           1
    Stepping:            4
    CPU(s) scaling MHz:  40%
    CPU max MHz:         3900.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            6986.30
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
                          ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_ts
                         c cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 s
                         se4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb s
                         tibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush
                         _l1d
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   192 KiB (6 instances)
  L1i:                   192 KiB (6 instances)
  L2:                    1.5 MiB (6 instances)
  L3:                    12 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-11
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Unknown: No mitigations
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected

This is the code I am using:

 #include <stdlib.h>
 #include <string.h>
 #include <benchmark/benchmark.h>
 
 void BM_add(benchmark::State& state)
 {
     srand(1);
     const unsigned int N = state.range(0);
     std::vector<unsigned long> v1(N), v2(N);
     for(size_t i = 0; i < N; ++i){
         v1[i] = rand();
         v2[i] = rand();
     }
     unsigned long* p1 = v1.data();
     unsigned long* p2 = v2.data();
     for(auto _ : state){
         unsigned long a1 = 0;
         for(size_t i = 0; i < N; ++i){
             a1 += p1[i] + p2[i];
         }
         benchmark::DoNotOptimize(a1);
         benchmark::ClobberMemory();
     }
     state.SetItemsProcessed(N*state.iterations());
 }
 
 #define ARGS ->Arg(1 << 22)
 
 BENCHMARK(BM_add) ARGS;
 
 BENCHMARK_MAIN();

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.