Comments (16)

hak8or commented on September 3, 2024

As I was cleaning up an old project of mine, I started thinking it might make a decent example of a full use case comparing CMSIS and cppreg. On an unrelated note, the minimal example that initializes the clock tree and toggles an LED at ~1 Hz is only ~280 bytes!

For performance.md, my idea is to include a Godbolt-based example for ::write, ::chained_write.with.with, and ::is_set (so four operations total), showing both the C++ code and the assembly output at -Os. This would be used to show that the resulting instructions cannot be optimized further.
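For illustration, here is a minimal sketch of the kind of single-field operations such a Godbolt comparison would exercise. It is not cppreg's implementation: the `Field` template, the simulated register `fake_reg`, the `Led`/`Mode` fields, and the `is_set` semantics (every bit of the field set) are all hypothetical, chosen so the snippet compiles and runs on a host. Compiled at -Os, each call should come down to a load, a mask/OR, and a store.

```cpp
#include <cassert>  // used by the host-side usage checks
#include <cstdint>

// Simulated 32-bit peripheral register. On a microcontroller this would be a
// fixed memory-mapped address; a volatile global keeps every access visible.
volatile std::uint32_t fake_reg = 0;

// Hypothetical field descriptor: `Width` bits starting at bit `Offset`.
template <unsigned Offset, unsigned Width>
struct Field {
    static_assert(Width > 0 && Width < 32 && Offset + Width <= 32,
                  "field must fit in 32 bits");

    // Bits covered by the field, e.g. Offset = 4, Width = 2 -> 0x30.
    static constexpr std::uint32_t mask = ((1u << Width) - 1u) << Offset;

    // Write a compile-time constant: clear the field, then OR in the value.
    template <std::uint32_t Value>
    static void write() {
        static_assert(Value <= (mask >> Offset), "value does not fit in field");
        fake_reg = (fake_reg & ~mask) | (Value << Offset);
    }

    // True when every bit of the field is set.
    static bool is_set() { return (fake_reg & mask) == mask; }
};

using Led = Field<5, 1>;   // hypothetical 1-bit LED control field
using Mode = Field<8, 3>;  // hypothetical 3-bit mode field
```

Each call touches only its own field's bits, and something like `Mode::write<9>()` is rejected at compile time because 9 does not fit in three bits.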

Then, a full example, possibly implementing clock tree initialization followed by some UART transmissions and LED blinking, built with various optimization flags (-O0, -Os, -O1, -O2, -O3, -Og) for cppreg vs. CMSIS. The data would be presented in two tables, each with two columns (cppreg vs. CMSIS) and six rows (one per optimization level). The first table would give the total size of the resulting binary in bytes, and the second would show how long it takes to do the clock init plus a few million iterations of LED toggling and UART sending, in milliseconds or cycles.
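If the second table reports milliseconds rather than raw cycles, the conversion is the cycle count divided by the core clock. A small sketch, assuming the elapsed cycle count has already been read from some hardware counter and the core clock frequency is known (both inputs are hypothetical here):

```cpp
#include <cstdint>

// Convert an elapsed cycle count to milliseconds at a given core clock.
// 64-bit arithmetic keeps `cycles * 1000` from overflowing for long runs.
constexpr std::uint64_t cycles_to_ms(std::uint64_t cycles, std::uint64_t core_hz) {
    return cycles * 1000u / core_hz;  // integer division truncates toward zero
}

// Example: 72 million cycles at a 48 MHz core clock is 1500 ms.
static_assert(cycles_to_ms(72'000'000u, 48'000'000u) == 1500u, "");
```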

from cppreg.

sendyne-nicocvn commented on September 3, 2024

Sounds good to me.

The clock tree + UART + LED would be a great example. I assume the 280 bytes exclude startup code (i.e., no interrupt vector table and no SystemInit), but that would actually make it clearer. For the data, this seems like a good start. Once we collect them we can decide how to organize them.

I created the performance branch so that we can start putting code in the repository.

hak8or commented on September 3, 2024

hak8or commented on September 3, 2024

Scenario

I decided to rely heavily on Godbolt for the comparison because I feel it will be much more accessible to anyone who wants to go through the edit-compile-inspect-assembly loop themselves. They won't have to download a compiler, it will be easier to show comparisons, etc.

Anyway, for a decent yet simple comparison it would be good to turn an LED on, and then in a loop turn the LED off, wait a bit, turn the LED on, and loop back. This is small enough that the code fits on one screen, is easy for everyone to understand, and doesn't bog the example down with setting up a lot of registers (which a full demo including clocking and power gating would need).
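The scenario can be sketched in plain C++ to show how little register traffic it involves. The register `gpio_odr`, the pin mask `kLedMask`, and the delay loop are hypothetical; the register is simulated with a volatile variable so the sketch runs on a host, and the loop is bounded so it terminates (firmware would loop forever).

```cpp
#include <cassert>  // used by the host-side usage checks
#include <cstdint>

// Simulated GPIO output data register (hypothetical); on hardware this would
// be a memory-mapped address.
volatile std::uint32_t gpio_odr = 0;

constexpr std::uint32_t kLedMask = 1u << 13;  // hypothetical LED pin

void led_on() { gpio_odr = gpio_odr | kLedMask; }
void led_off() { gpio_odr = gpio_odr & ~kLedMask; }
bool led_is_on() { return (gpio_odr & kLedMask) != 0; }

// Crude busy-wait; the volatile counter keeps the compiler from removing it.
void busy_wait(std::uint32_t iterations) {
    for (volatile std::uint32_t i = 0; i < iterations; i = i + 1) {
    }
}

// LED on, then repeatedly: off, wait, on. Bounded by `rounds` for the host.
void blink(unsigned rounds, std::uint32_t delay) {
    led_on();
    for (unsigned n = 0; n < rounds; ++n) {
        led_off();
        busy_wait(delay);
        led_on();
    }
}
```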

Progress

Here is what I am working with right now (the URL is so long that neither Google nor bit.ly can shorten it!). It looks very similar, but there are two interesting differences.

I want to focus a bit on #5 before continuing with the performance comparison write-up. @sendyne-nclauvelin, do let me know whether you think the comparison scenario is appropriate, so the first task can be marked as complete.

hak8or commented on September 3, 2024

On a somewhat related note, I wonder if it would be worthwhile to include this in a larger (many registers) comparison somehow as another metric when looking at the assembly alone is not feasible.

sendyne-nicocvn commented on September 3, 2024

Following #6, #8, and #9, I think we can resume the performance assessment. I brought the performance branch up to the current state.

hak8or commented on September 3, 2024

Agreed, will aim to have it in a good state by tonight.

hak8or commented on September 3, 2024

While working on it, I spotted two "issues" that cause differences between the cppreg and CMSIS assembly.

  1. For registers that have a read-only field, when you write to another field, the bits belonging to the read-only field are cleared. I do not see why extra work should be done (clearing them) instead of just writing the old bits back. I do not recall ever seeing a register with a read-only field that required writing 0s back to that field (which would make it not read-only); writes to it were simply ignored. I feel this introduces overhead, because with cppreg as-is you cannot disable it (i.e., write back the old values instead of clearing). What was your reasoning for clearing the bits?

  2. The lack of a way to express related registers (CMSIS-style structures mapped to memory) sadly introduces "overhead": the compiler places two nearby addresses in .text and then overwrites a CPU register with the new address (increasing register pressure), instead of just accessing memory via offsets from one base address.

This is the example I am planning to use for the comparison.

Ignoring those two issues, the assembly is identical.
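To make the first issue concrete, here is a sketch of the two ways a read-modify-write can treat a read-only field. The register layout (a read-only status field in bits 3:0, a read-write config field in bits 7:4) and the function names are hypothetical, and the functions only compute the value that would be stored.

```cpp
#include <cstdint>

// Hypothetical layout: bits [3:0] read-only status, bits [7:4] read-write config.
constexpr std::uint32_t kRoMask = 0x0Fu;
constexpr std::uint32_t kRwMask = 0xF0u;
constexpr unsigned kRwShift = 4;

// Option 1: write the read-only bits back exactly as they were read.
// Hardware that ignores writes to read-only bits is unaffected by their value.
constexpr std::uint32_t store_keep_ro(std::uint32_t old, std::uint32_t cfg) {
    return (old & ~kRwMask) | ((cfg << kRwShift) & kRwMask);
}

// Option 2: also clear the read-only bits before the store.
constexpr std::uint32_t store_clear_ro(std::uint32_t old, std::uint32_t cfg) {
    return (old & ~(kRwMask | kRoMask)) | ((cfg << kRwShift) & kRwMask);
}

// Old value 0x0A (status = 0xA), new config 0x3:
static_assert(store_keep_ro(0x0Au, 0x3u) == 0x3Au, "status bits written back");
static_assert(store_clear_ro(0x0Au, 0x3u) == 0x30u, "status bits zeroed");
```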

sendyne-nicocvn commented on September 3, 2024

The second point relates to #7, and I first have to create an example that isolates the issue to understand better what exactly is happening. For the first point, can you provide a minimal example? I am not clear on what you are describing.

sendyne-nicocvn commented on September 3, 2024

Also in your example you have:

UART::STATUS::merge_write<UART::STATUS::Enable>(1).with<UART::STATUS::Sending>(1).done();

Using:

UART::STATUS::merge_write<UART::STATUS::Enable,1>().with<UART::STATUS::Sending,1>().done();

simplifies the generated assembly (look at the L3 branch). This is an important detail: cppreg uses a faster implementation if the data are known at compile time, and to indicate that, you need to use the template version. This is also true for regular write calls.
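The difference between the two call forms can be illustrated with a small host-side sketch. It is not cppreg's code: the field structs, `status_reg`, and both function names are hypothetical and only loosely mirror the `merge_write<Field, value>()` form above. With the values as template arguments, the combined clear mask and set bits become constants and an out-of-range value fails to compile; with function arguments, the shifts and masking are applied to runtime values unless the compiler manages to propagate the constants itself.

```cpp
#include <cassert>
#include <cstdint>

volatile std::uint32_t status_reg = 0;  // simulated STATUS register

// Hypothetical 1-bit fields, named after the example above.
struct Enable { static constexpr unsigned offset = 0; static constexpr std::uint32_t mask = 1u << 0; };
struct Sending { static constexpr unsigned offset = 1; static constexpr std::uint32_t mask = 1u << 1; };

// Values known only at run time: shift, mask, and range check happen when the code runs.
template <typename F1, typename F2>
void merge_write_runtime(std::uint32_t v1, std::uint32_t v2) {
    assert(v1 <= (F1::mask >> F1::offset) && v2 <= (F2::mask >> F2::offset));
    const std::uint32_t set =
        ((v1 << F1::offset) & F1::mask) | ((v2 << F2::offset) & F2::mask);
    status_reg = (status_reg & ~(F1::mask | F2::mask)) | set;  // one read, one write
}

// Values as template arguments: masks and bits fold into constants, and an
// out-of-range value is a compile error instead of a runtime check.
template <typename F1, std::uint32_t V1, typename F2, std::uint32_t V2>
void merge_write_constant() {
    static_assert(V1 <= (F1::mask >> F1::offset), "V1 does not fit in F1");
    static_assert(V2 <= (F2::mask >> F2::offset), "V2 does not fit in F2");
    constexpr std::uint32_t clear = F1::mask | F2::mask;
    constexpr std::uint32_t set = (V1 << F1::offset) | (V2 << F2::offset);
    status_reg = (status_reg & ~clear) | set;  // one read, one write
}
```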

At this point the only difference is how offsets between registers are managed in cppreg. I will work on that for now.
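For context on that remaining difference, this is roughly what CMSIS-style grouping looks like: one struct overlays the whole peripheral, so each register is a fixed offset from a single base address, which a compiler can typically keep in one CPU register. The `UartRegs` layout, the enable bit, and the stand-in object are hypothetical; on hardware the pointer would come from casting the peripheral's base address.

```cpp
#include <cassert>  // used by the host-side usage checks
#include <cstddef>
#include <cstdint>

// CMSIS-style peripheral layout (hypothetical): registers at fixed offsets.
struct UartRegs {
    volatile std::uint32_t STATUS;  // offset 0x00
    volatile std::uint32_t DATA;    // offset 0x04
};
static_assert(offsetof(UartRegs, STATUS) == 0x00, "STATUS at base + 0");
static_assert(offsetof(UartRegs, DATA) == 0x04, "DATA at base + 4");

// On a device this would be something like
// reinterpret_cast<UartRegs*>(UART_BASE_ADDRESS) (placeholder name);
// a static object stands in for the peripheral here.
UartRegs uart_sim{};
UartRegs* const uart = &uart_sim;

// Both accesses go through the same base pointer, typically as
// base + 0 and base + 4 rather than two separately loaded addresses.
void uart_send(std::uint8_t byte) {
    uart->STATUS = uart->STATUS | 0x1u;  // hypothetical enable bit
    uart->DATA = byte;
}
```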

hak8or commented on September 3, 2024

Ah, I didn't realize that could be done via template arguments for merge writes (is that part of the new API changes you did recently?).

The minimal example is here. Looking at it again, I realized I misread the assembly originally. It looked like the BICS instruction was clearing bits in the read-only field, but it turns out it was actually clearing the bits being written via the merge write. In that case, I agree that your fix (via template arguments) does resolve the issue.

If and when the register offsets concept gets added, the assembly should hopefully match for pretty much all use cases.

sendyne-nicocvn commented on September 3, 2024

Ah, I didn't realize that could be done via template arguments for merge writes (is that part of the new API changes you did recently?).

No this was already there. As part of #10 I added more details to the API documentation regarding this particular point.

hak8or commented on September 3, 2024

What do you think as of 08bc98c?

sendyne-nicocvn commented on September 3, 2024

Looks good. I actually prefer that we only present a small example rather than a lengthy and complex one. I will fix some typos and tie it with the README.

hak8or commented on September 3, 2024

Awesome!

sendyne-nicocvn commented on September 3, 2024

The performance comparison is now available in the master branch so I consider this issue closed.
