Comments (9)
Given two models, considering all problem sizes, the accuracy of model 1 is p1 = 30% and the accuracy of model 2 is p2 = 95%.
In general, model 2 is much better than model 1. However, if an application only uses the problem sizes that model 1 is good at, model 1 could be the better choice. The overhead of applying the models is also an important factor, since model 2 might be expensive to use.
Therefore, we need to consider the specific application and the overhead, not just the accuracy.
I think the overhead-adjusted accuracy can be designed as follows.
S = sum(t(x) * (g(x) XOR p(x))) - t_overhead
S: overhead-adjusted accuracy, which is the expected profit of a model.
x: problem size
sum: taken over all iterations of the application; the same problem size may occur in multiple iterations.
g(x): the ground truth of choosing CPU or GPU for a particular problem size. 0 or 1 to indicate CPU or GPU.
p(x): the prediction of choosing CPU or GPU from a model. 0 or 1 to indicate CPU or GPU.
t(x): the time saved when the correct choice is made, t(x) = |t_cpu(x) - t_gpu(x)|.
g(x) XOR p(x): 1 when a correct choice is made, i.e., when the ground truth equals the prediction (1 XOR 1 = 1, 0 XOR 0 = 1). Note this is actually the equality test (XNOR), despite being written as XOR.
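This first form can be sketched in Python. This is a toy sketch: the function name and the sample data are mine, not from the notes, and the notes' "XOR" is implemented as an equality test.

```python
def overhead_adjusted_accuracy(iters, g, p, t, t_overhead):
    """Sum t(x) over all iterations where the prediction matches the
    ground truth, then subtract the model-call overhead.  The check
    (g[x] == p[x]) implements the notes' "XOR", i.e., an equality test."""
    saved = sum(t[x] for x in iters if g[x] == p[x])
    return saved - t_overhead

# Hypothetical toy data: GPU (1) is the right choice only for size 32.
g = {32: 1, 64: 0}        # ground truth device per problem size
p = {32: 1, 64: 1}        # model prediction: always pick the GPU
t = {32: 0.5, 64: 0.2}    # |t_cpu - t_gpu| in ms

# Two iterations at size 32 (both correct), one at 64 (wrong):
S = overhead_adjusted_accuracy([32, 32, 64], g, p, t, t_overhead=0.01)
# S = 0.5 + 0.5 - 0.01 = 0.99 ms of total profit
```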
from llnl-work-notes.
The design above enumerates the problem size of every iteration. We could instead use the probability that a problem size occurs. The two forms mean the same thing, expressed in different ways.
S = sum(q(x) * t(x) * (g(x) XOR p(x))) - t_overhead
q(x): the probability that a given problem size occurs in the application. For example, an application may have 10 iterations: 9 of problem size 50 and 1 of problem size 100.
In this case, q(50) = 0.9 and q(100) = 0.1.
sum: taken over all unique problem sizes. In the example above, these are 50 and 100.
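A short sketch with hypothetical data shows that the per-iteration form, averaged over the number of iterations, equals the probability-weighted form:

```python
from collections import Counter

iters = [50] * 9 + [100]        # 9 iterations of size 50, 1 of size 100
t = {50: 0.5, 100: 1.0}         # time saved on a correct choice (ms)
correct = {50: 1, 100: 0}       # 1 where the model picks the right device

# Form 1: enumerate every iteration, then average per iteration.
per_iter = sum(t[x] * correct[x] for x in iters) / len(iters)

# Form 2: weight each unique problem size by its probability q(x).
q = {x: n / len(iters) for x, n in Counter(iters).items()}
weighted = sum(q[x] * t[x] * correct[x] for x in q)

# Both give the same expected per-iteration profit (0.45 ms here).
assert abs(per_iter - weighted) < 1e-12
```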
from llnl-work-notes.
Assume we have the following data:
Each call to the CPM model has an average overhead of 0.01 ms.
Each call to the FPM model has an average overhead of 0.05 ms.
| Problem size | Performance difference \|CPU - GPU\| (ms) |
| --- | --- |
| 32 | 0.5 |
| 64 | 0.5 |
| 128 | 0.1 |
| 256 | 1 |
CPM makes the correct choice only for problem size 32; its accuracy is 25%.
FPM makes the wrong choice only for problem size 64; its accuracy is 75%.
With the ground truth above, consider a test case using a 2D stencil that runs 1000 iterations at each of the problem sizes 32, 64, 128, and 256.
CPM: S1 = 0.25 * 0.5 - 0.01 = 0.115
FPM: S2 = 0.25 * 0.5 + 0.25 * 0.1 + 0.25 * 1 - 0.05 = 0.35
For this application, FPM is better than CPM.
In another example, the application runs 2000 iterations at each of problem sizes 32 and 64.
CPM: S1 = 0.5 * 0.5 - 0.01 = 0.24
FPM: S2 = 0.5 * 0.5 - 0.05 = 0.2
In this case, CPM works better than FPM.
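Both worked examples can be reproduced with a short script. The names are mine; `correct[x]` plays the role of g(x) XOR p(x):

```python
t = {32: 0.5, 64: 0.5, 128: 0.1, 256: 1.0}    # |CPU - GPU| time (ms)
cpm_correct = {32: 1, 64: 0, 128: 0, 256: 0}  # CPM right only at 32
fpm_correct = {32: 1, 64: 0, 128: 1, 256: 1}  # FPM wrong only at 64

def score(q, correct, overhead):
    # Expected per-iteration profit minus the model-call overhead.
    return sum(q[x] * t[x] * correct[x] for x in q) - overhead

# Example 1: 1000 iterations at each of the four sizes -> q = 0.25 each.
q1 = {32: 0.25, 64: 0.25, 128: 0.25, 256: 0.25}
print(round(score(q1, cpm_correct, 0.01), 3))  # 0.115
print(round(score(q1, fpm_correct, 0.05), 3))  # 0.35

# Example 2: 2000 iterations at each of sizes 32 and 64 -> q = 0.5 each.
q2 = {32: 0.5, 64: 0.5}
print(round(score(q2, cpm_correct, 0.01), 3))  # 0.24
print(round(score(q2, fpm_correct, 0.05), 3))  # 0.2
```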
from llnl-work-notes.
Besides accounting for overhead and profit, this new metric is balanced accuracy applied in reverse.
Balanced accuracy corrects the bias introduced by an imbalanced dataset in order to determine the actual performance of a model.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html#sklearn.metrics.balanced_accuracy_score
https://statisticaloddsandends.wordpress.com/2020/01/23/what-is-balanced-accuracy/
However, in our case, the application itself can be imbalanced: it may have many iterations with problem sizes x1 and x2, but none of the others. Therefore, to compare two models on this application, we have to calibrate their accuracy against the imbalanced dataset of x1 and x2 and ignore the remaining problem sizes.
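For reference, here is a minimal sketch of balanced accuracy (the mean of per-class recalls, which is what the linked scikit-learn metric computes), showing how it discounts an imbalanced dataset; the data is a made-up example:

```python
def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls: every class contributes equally,
    # no matter how many samples it has.
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# 9 samples of class 0 and 1 of class 1: a model that always predicts 0
# scores 90% plain accuracy but only 50% balanced accuracy.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5
```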
from llnl-work-notes.
So your S's final value is an unbounded execution time in [0, ∞). This is not portable across different applications or platforms. How do you measure a model's S when predicting for a set of applications on different machines?
The metric should be a ratio to be portable.
from llnl-work-notes.
g(x) XOR p(x) generates 0 or 1. This is also a problem if a model generates floating-point values. Again, the metric is best expressed as a ratio against the ground truth.
from llnl-work-notes.
With the ground truth of CPU and GPU performance data, we could make the right decision every time and save the most time.
We can therefore use the ratio of the model's actual profit to the optimal profit from the ground truth, which lies in [0, 1). The value 1 is excluded because the optimal profit can never be reached once overhead is added.
S = (model_profit - overhead) / optimal_profit
  = (sum(q(x) * |x/g_c(x) - x/g_g(x)| * (sign(p_c(x) - p_g(x)) XOR sign(g_c(x) - g_g(x)))) - t_overhead) / sum(q(x) * |x/g_c(x) - x/g_g(x)|)
S: overhead-adjusted accuracy, the percentage of the optimal profit a model obtains for a given application and platform.
x: problem size
q(x): the probability that a given problem size occurs in the application. For example, an application may have 10 iterations: 9 of problem size 50 and 1 of problem size 100. In this case, q(50) = 0.9 and q(100) = 0.1.
sum: taken over all unique problem sizes. In the example above, these are 50 and 100.
sign(...): the sign of a given value, 1 for non-negative and 0 for negative. The XOR of the two signs is again an equality test (XNOR): it is 1 when the model predicts the same faster device as the ground truth.
g_c(x): the ground-truth speed of the CPU solving problem size x, e.g., 301.45 elements/ms
g_g(x): the ground-truth speed of the GPU solving problem size x, e.g., 8010.60 elements/ms
p_c(x): the predicted speed of the CPU solving problem size x, in elements/ms
p_g(x): the predicted speed of the GPU solving problem size x, in elements/ms
t_overhead: the average overhead of calling the model
For the two examples above:
Example 1:
CPM: S1 = (0.25 * 0.5 - 0.01) / (0.25 * 0.5 + 0.25 * 0.5 + 0.25 * 0.1 + 0.25 * 1) = 0.115 / 0.525 = 0.219
FPM: S2 = (0.25 * 0.5 + 0.25 * 0.1 + 0.25 * 1 - 0.05) / (0.25 * 0.5 + 0.25 * 0.5 + 0.25 * 0.1 + 0.25 * 1) = 0.35 / 0.525 = 0.667
Example 2:
CPM: S1 = (0.5 * 0.5 - 0.01) / (0.5 * 0.5 + 0.5 * 0.5) = 0.24 / 0.5 = 0.48
FPM: S2 = (0.5 * 0.5 - 0.05) / (0.5 * 0.5 + 0.5 * 0.5) = 0.2 / 0.5 = 0.4
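The ratio form can be sketched as follows. The speeds here are hypothetical, chosen so that the time gap x/g_c(x) - x/g_g(x) is 0.5 ms at both sizes, which reproduces Example 2 for CPM:

```python
def sign(v):
    # 1 for non-negative, 0 for negative (as defined in the notes)
    return 1 if v >= 0 else 0

def ratio_score(q, g_c, g_g, p_c, p_g, t_overhead):
    """Fraction of the optimal ground-truth profit a model captures."""
    profit, optimal = 0.0, 0.0
    for x, qx in q.items():
        # x / speed converts a speed back to a time, so gap = |t_cpu - t_gpu|.
        gap = abs(x / g_c[x] - x / g_g[x])
        # The notes' "XOR" is an equality test: 1 when the predicted
        # faster device matches the ground-truth faster device.
        correct = sign(p_c[x] - p_g[x]) == sign(g_c[x] - g_g[x])
        profit += qx * gap * correct
        optimal += qx * gap
    return (profit - t_overhead) / optimal

# Example 2, CPM: right at size 32, wrong at size 64, overhead 0.01 ms.
q = {32: 0.5, 64: 0.5}
g_c = {32: 32 / 1.0, 64: 64 / 1.0}    # CPU ground-truth speed (elements/ms)
g_g = {32: 32 / 0.5, 64: 64 / 0.5}    # GPU ground-truth speed: GPU wins both
p_c = {32: 30.0, 64: 70.0}            # predicted CPU speeds (wrong winner at 64)
p_g = {32: 60.0, 64: 60.0}            # predicted GPU speeds
print(round(ratio_score(q, g_c, g_g, p_c, p_g, 0.01), 2))  # 0.48
```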
With this revised score, for a given application on a machine, we can directly gauge how good a model is. The score represents the percentage of the ideal performance the model achieves.
For instance, in example 1, CPM has a score of 0.219, so we can say the model is not good even without comparing it to other models. Given an application x on a machine y, if a model scores 0.99, we know it is nearly perfect in this case.
from llnl-work-notes.
The time cost of building and using a model is an unbounded value, such as 1 ms or 10 s. To fit this overhead into a bounded metric (0 to 1), we have to convert it to a ratio.
Accuracy itself is unitless, so to incorporate an overhead that carries a time unit, we need another time-valued quantity to cancel that unit.
The natural choice is the time saved by choosing the correct computing device, which we treat as profit.
overhead-adjusted accuracy = accuracy - usage overhead percentage per iteration - building overhead percentage per iteration
usage overhead percentage per iteration: time cost of calling the model / per-iteration profit based on the ground truth
building overhead percentage per iteration: (time cost of building the model / total iterations) / per-iteration profit based on the ground truth
There are two cases for using a model in an iterative algorithm.
- Infinite iterations: if the model is built once instead of rebuilt every iteration, the amortized building overhead is zero. By default, we assume an unlimited number of iterations.
- Finite iterations: the overhead is calculated as described above.
Case 1:
Given a CPM model and infinite iterations, with accuracy 20%, usage overhead 0.1%, and building overhead 0, the overhead-adjusted accuracy is 19.9%.
For an FPM model with accuracy 80%, usage overhead 0.5%, and building overhead 0, the overhead-adjusted accuracy is 79.5%.
Case 2:
With the same models as in case 1, but CPM must be rebuilt every 10 iterations, giving an average building overhead of 10%. Now its overhead-adjusted accuracy is 9.9%.
Case 3:
With the same models as in case 1, but the application runs only 1000 iterations. The building overhead of CPM is 0.2%, while the building overhead of FPM becomes 50% due to its very long building time.
Now the overhead-adjusted accuracy of CPM is 19.7% and that of FPM is 29.5%.
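The three cases can be verified with a small helper (the names are mine; all quantities are the per-iteration percentage values from the notes):

```python
def adjusted_accuracy(accuracy, usage_overhead, building_overhead):
    # All arguments are percentages per iteration.
    return accuracy - usage_overhead - building_overhead

# Case 1: infinite iterations, model built once -> building overhead is 0.
print(round(adjusted_accuracy(20.0, 0.1, 0.0), 1))   # 19.9  (CPM)
print(round(adjusted_accuracy(80.0, 0.5, 0.0), 1))   # 79.5  (FPM)

# Case 2: CPM rebuilt every 10 iterations, 10% average building overhead.
print(round(adjusted_accuracy(20.0, 0.1, 10.0), 1))  # 9.9

# Case 3: 1000 iterations; building amortizes to 0.2% (CPM) and 50% (FPM).
print(round(adjusted_accuracy(20.0, 0.1, 0.2), 1))   # 19.7
print(round(adjusted_accuracy(80.0, 0.5, 50.0), 1))  # 29.5
```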
from llnl-work-notes.
The building time should include the following costs: 1. code update, 2. training, 3. postprocessing, 4. curve fitting.
from llnl-work-notes.