fig1024 / op_rbf
Optimized Recursive Bilateral Filter
License: MIT License
if (thread_index + 1 == RBF_MAX_THREADS)
    height_segment = height - thread_index * height_segment;
should be changed to:
if (thread_index + 1 == RBF_MAX_THREADS)
    height_segment = height - thread_index * height_segment - 1;
Please note that in my implementation I removed the class variable that holds the number of threads.
When I use the author's method (optimized SSE2 with optional multithreading, pipelined 2 stages), the left edge of the image shows visible filtering after the calculation. Why is that?
Hi Fig1024, your optimization is awesome and it means a lot to me. Thanks!
However, I found some bugs; my attempted fixes are below.
First, according to the comments in https://github.com/nothings/stb.git, maybe we should add
#define STB_IMAGE_IMPLEMENTATION
#define STB_IMAGE_WRITE_IMPLEMENTATION
at the beginning of RecursiveBilateralFilter.cpp. In some situations the code does not compile without them.
Second, the program crashes in RBFilter_SSE2.cpp when I run your code. After debugging, I found that it is because out_pix4 is used before it has been assigned a value in the function verticalFilter:
pixB = _mm_shuffle_epi8(pixB, mask_pack);
out_pix4 = _mm_srli_si128(out_pix4, 4); // shift
out_pix4 = _mm_or_si128(out_pix4, pixB);
I tried changing the code to:
if (i == 0) // first iteration: assign instead of shifting the unassigned value
{
    out_pix4 = _mm_shuffle_epi8(pixB, mask_pack);
}
else
{
    pixB = _mm_shuffle_epi8(pixB, mask_pack);
    out_pix4 = _mm_srli_si128(out_pix4, 4); // shift
    out_pix4 = _mm_or_si128(out_pix4, pixB);
}
It seems to work, but I'm not sure whether this change is right.
Could you check this, please?
Thank you.
Original Recursive Bilateral Filter implementation
Image: lw_cross.jpg, size: 1280 x 720, time ms: 70.2
Image: Thefarmhouse.jpg, size: 1440 x 1080, time ms: 119.3
Image: testGirl.jpg, size: 448 x 626, time ms: 23.6
Optimized SSE2 single threaded, single stage (non-pipelined)
Image: lw_cross.jpg, size: 1280 x 720, time ms: 86.6
Image: Thefarmhouse.jpg, size: 1440 x 1080, time ms: 142.0
Image: testGirl.jpg, size: 448 x 626, time ms: 25.6
Optimized SSE2 2x multithreading, single stage (non-pipelined)
Image: lw_cross.jpg, size: 1280 x 720, time ms: 44.3
Image: Thefarmhouse.jpg, size: 1440 x 1080, time ms: 78.4
Image: testGirl.jpg, size: 448 x 626, time ms: 13.4
Optimized SSE2 4x multithreading, single stage (non-pipelined)
Image: lw_cross.jpg, size: 1280 x 720, time ms: 24.9
Image: Thefarmhouse.jpg, size: 1440 x 1080, time ms: 41.5
Image: testGirl.jpg, size: 448 x 626, time ms: 7.5
Optimized SSE2 4x2 thread pipelined 2 stages
Image: lw_cross.jpg, size: 1280 x 720, time ms: 18.6
Image: Thefarmhouse.jpg, size: 1440 x 1080, time ms: 32.0
Image: testGirl.jpg, size: 448 x 626, time ms: 5.9
Finish
Hello, my CPU is an i7-8700 [email protected], but the results I reproduce differ from yours by more than a factor of ten. What could be the reason? Thank you.