Comments (2)
Yes, what you propose would indeed reduce the latency by 1 clock cycle: from 5 for hadd to 4 = 1 (unpackhi) + 3 (add).
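For reference, the proposed reduction can be sketched with SSE2 intrinsics as below. This is only an illustration, assuming a 128-bit accumulator holding two doubles; the helper name `hsum_pd_sse2` is hypothetical and is not taken from the BLASFEO sources.

```c
#include <emmintrin.h> /* SSE2 intrinsics only, no SSE3 header needed */

/* Hypothetical helper: add the two double lanes of v.
 * Lane 0 of the result matches what the SSE3 call _mm_hadd_pd(v, v)
 * would produce, but only unpackhi + add are used. */
static double hsum_pd_sse2(__m128d v)
{
    __m128d hi  = _mm_unpackhi_pd(v, v); /* hi  = [v1, v1]            */
    __m128d sum = _mm_add_sd(v, hi);     /* sum = [v0 + v1, v1]       */
    return _mm_cvtsd_f64(sum);           /* extract lane 0 as a double */
}
```

The function would be called once, after the loop, on the vector accumulator, so it affects only the reduction step and not the loop body.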
But in the end, the reduction code is not so important; the important part is the loop body.
In general, level 1 BLAS routines are less important for what we do and gain much less from optimization than level 2 and especially level 3 routines, so they have received less attention.
For me, the most important reason to implement your improvement would be to remove the dependency on SSE3 when targeting machines with capabilities only up to SSE2. I don't know if this is the case for you. SSE3 (i.e. the Core microarchitecture) was chosen as the target because it is a reasonable trade-off between convenience and availability of ISAs, also on embedded devices, which usually lag a bit behind.
from blasfeo.
Sure, if you want to make the changes and open a PR, I would be happy to merge it. Otherwise I would leave it as it is for now; other work has higher priority on my side.
Thanks anyway for the suggestion :)
from blasfeo.