Comments (6)
Any particular reason you want OpenMP support? The current code is already much faster than real-time on x86 and faster ARM chips (e.g. smartphones, but not RPi yet).
from lpcnet.
Yes, we know. But it loads only one CPU thread, and we are trying to parallelize it with OpenMP. Our code does not work, though, and we have no idea why. We want to understand this problem and overcome it; that is why we are asking you about it.
That it does not work puzzles us all the more, and we want to solve it, if only for fun, and also to improve performance further. I don't think it would hurt.
from lpcnet.
I'm not sure I understand this code, but I don't see how you can parallelize it without restructuring the data. As for my original question, parallelizing is a means to get the code to run fast enough, but the current code is already fast enough.
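To illustrate what "restructuring the data" could mean here: judging from the loop posted in this thread, idx is a packed stream (per 16-row block, a column count followed by that many column indices), so the start of block k is only known after walking blocks 0..k-1. A minimal sketch of a one-time pass that records per-block offsets is shown below. The helper name block_offsets and the arrays idx_off and w_off are hypothetical and not part of lpcnet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: record where each 16-row block starts in the packed
 * idx stream and in the weights array.
 * Assumed layout: per block, idx holds a column count followed by that many
 * column indices, and weights holds 16 floats per listed column. */
void block_offsets(const int *idx, int rows, size_t *idx_off, size_t *w_off)
{
    size_t ip = 0, wp = 0;
    assert(rows % 16 == 0);
    for (int b = 0; b < rows / 16; b++) {
        int cols = idx[ip];
        idx_off[b] = ip;          /* position of this block's column count */
        w_off[b] = wp;            /* position of this block's first weight */
        ip += 1 + (size_t)cols;   /* skip the count and the indices */
        wp += 16 * (size_t)cols;  /* skip 16 weights per column */
    }
}
```

With such offsets in hand, the blocks no longer depend on each other and can be visited in any order.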
from lpcnet.
Yes, I understand. But it is interesting to me :)
So now the code fully works, and lpcnet uses all CPU cores.
But in fact the code did not become much faster (it is even a bit slower on small samples), and I have no idea why. When I looked at the profiler, sparse_sgemv_accum16 takes 80% of the time, which is why I decided to parallelize it. But the result is not even 2 times faster... Can someone explain why?
#include <immintrin.h>

static void sparse_sgemv_accum16(float *out, const float *weights, int rows, const int *idx, const float *x)
{
    int i, j;  // declared outside the parallel region, so shared by all threads
    // initialization: record where each 16-row block starts in weights and idx
    const int *precomputed_idx[rows];
    const float *precomputed_weights[rows];
    for (i=0;i<rows;i+=16)
    {
        precomputed_weights[i] = weights;
        weights += 16 * (*idx);
        precomputed_idx[i] = idx++;
        idx += *precomputed_idx[i];
    }
    // no work-sharing "for" here: every thread executes the whole region
    #pragma omp parallel
    {
        const int *lc_idx;
        const float *lc_weights;
        float * restrict y;
        __m256 vy0, vy8;
        int cols;
        for (i=0;i<rows;i+=16)
        {
            lc_weights = precomputed_weights[i];
            y = &out[i];
            vy0 = _mm256_loadu_ps(&y[0]);
            vy8 = _mm256_loadu_ps(&y[8]);
            lc_idx = precomputed_idx[i];
            cols = *lc_idx++;  // was local_idx, which is not declared
            // critical: only one thread at a time runs the inner loop
            #pragma omp critical
            for (j=0;j<cols;j++)
            {
                int id;
                __m256 vxj;
                __m256 vw;
                id = *lc_idx++;
                vxj = _mm256_broadcast_ss(&x[id]);
                vw = _mm256_loadu_ps(&lc_weights[0]);
                vy0 = _mm256_fmadd_ps(vw, vxj, vy0);
                vw = _mm256_loadu_ps(&lc_weights[8]);
                vy8 = _mm256_fmadd_ps(vw, vxj, vy8);
                lc_weights += 16;
            }
            _mm256_storeu_ps (&y[0], vy0);
            _mm256_storeu_ps (&y[8], vy8);
        }
    }
}
UPD:
@SashaMN your code does not work, because #pragma omp parallel parallelizes ALL loops (OMG). And you get a segmentation fault after id = *local_idx++;.
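On the speed question: in OpenMP, a bare "#pragma omp parallel" makes every thread execute the whole region, and "#pragma omp critical" lets only one thread at a time run the guarded statement, so such a combination can keep all cores busy without dividing the work. A minimal sketch of a work-sharing variant follows. It uses a scalar inner loop instead of the AVX intrinsics so that it compiles anywhere, and it assumes per-block offsets into idx and weights have already been computed (the arrays idx_off and w_off and the function name are hypothetical, not lpcnet API).

```c
#include <assert.h>
#include <stddef.h>

/* Scalar stand-in for the AVX kernel: out += W * x for a block-sparse W.
 * idx_off[b] and w_off[b] are hypothetical precomputed offsets of block b
 * in idx and weights (per block: a column count, then the column indices;
 * 16 weights per listed column). */
void sparse_sgemv_accum16_omp(float *out, const float *weights, int rows,
                              const int *idx, const size_t *idx_off,
                              const size_t *w_off, const float *x)
{
    assert(rows % 16 == 0);
    /* Work-sharing loop: each 16-row block is handled by exactly one thread.
     * Without OpenMP enabled, the loop simply runs serially. */
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (int b = 0; b < rows / 16; b++) {
        /* Variables declared in the loop body are private to each thread. */
        const int *lc_idx = idx + idx_off[b];
        const float *lc_w = weights + w_off[b];
        float *y = out + 16 * b;
        int cols = *lc_idx++;
        for (int j = 0; j < cols; j++) {
            float xj = x[*lc_idx++];
            for (int k = 0; k < 16; k++)
                y[k] += lc_w[k] * xj;
            lc_w += 16;
        }
    }
}
```

Even with the work divided like this, a large speedup is not guaranteed: each call does little work, so the cost of starting and synchronizing threads may eat much of the gain.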
from lpcnet.
I've parallelized the main loop in the functions called from main and split the input data. It worked, and I got a slight increase in performance, but clicks and noise could be heard at the places where the data was split. So this requires more complex work, and it is not certain that the performance gain would be worthwhile.
We can close this issue.
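A plausible cause of the noise at the split points, assuming the synthesis carries internal state from one sample to the next: a chunk that starts from fresh state produces different output than the same samples processed in sequence. The toy sketch below uses a one-pole filter as a stand-in for any stateful process. It is not LPCNet code, and the function names are made up for illustration.

```c
#include <assert.h>

/* Toy stateful process (one-pole filter), standing in for a decoder that
 * carries state from sample to sample. */
void one_pole(const float *in, float *out, int n, float *state)
{
    for (int i = 0; i < n; i++) {
        *state = 0.9f * *state + in[i];
        out[i] = *state;
    }
}

/* Returns the absolute error at the first sample of the second chunk when
 * that chunk starts from zero state instead of the carried-over state. */
float boundary_error(void)
{
    float in[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    float ref[8], split[8];
    float s = 0.0f;
    one_pole(in, ref, 8, &s);             /* one pass, state carried through */

    float s0 = 0.0f, s1 = 0.0f;           /* two independent chunks */
    one_pole(in, split, 4, &s0);
    one_pole(in + 4, split + 4, 4, &s1);  /* starts from fresh state */

    float d = ref[4] - split[4];
    return d < 0.0f ? -d : d;
}
```

The jump at the boundary is what would be heard as a click. Avoiding it means handing each chunk the state left by the previous one, which serializes the chunks again.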
from lpcnet.
@jmvalin @gosha20777 Regarding "faster ARM chips (e.g. smartphones, but not RPi yet)": which chips do you mean, and what is their clock frequency? Thank you.
from lpcnet.