
Comments (9)

ouankou commented on July 20, 2024

Given two models, and considering all problem sizes, the accuracy of model 1 is p1 = 30% and the accuracy of model 2 is p2 = 95%.
In general, model 2 is much better than model 1. However, if an application only uses the problem sizes that model 1 is good at, model 1 could be the better choice. The overhead of applying the models is also an important factor, since model 2 might be expensive to use.
Therefore, we need to consider the specific application and the overhead, not just the accuracy.

I think the overhead-adjusted accuracy can be designed as follows.

S = sum(t(x) * (g(x) XNOR p(x))) - t_overhead

S: overhead-adjusted accuracy, which is the expected profit of a model.
x: problem size
sum: taken over all iterations of the application; the same problem size can occur in multiple iterations.
g(x): the ground truth of choosing CPU or GPU for a particular problem size. 0 or 1 to indicate CPU or GPU.
p(x): the prediction of choosing CPU or GPU from a model. 0 or 1 to indicate CPU or GPU.
t(x): the time saved when the correct choice is made: t(x) = |t_cpu(x) - t_gpu(x)|.
g(x) XNOR p(x): 1 when the prediction equals the ground truth, i.e., a correct choice is made (1 XNOR 1 = 1, 0 XNOR 0 = 1); 0 otherwise.
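
A minimal sketch of this score in Python; the dictionary-based interface and all names (such as `t_cpu` and `t_gpu`) are illustrative assumptions, not from the notes:

```python
# Hypothetical sketch of S = sum(t(x) * (g(x) XNOR p(x))) - t_overhead.
# g and p map a problem size to 0 (CPU) or 1 (GPU); t_cpu and t_gpu hold
# measured per-size execution times in ms.
def overhead_adjusted_score(iterations, g, p, t_cpu, t_gpu, t_overhead):
    s = 0.0
    for x in iterations:                # problem sizes; duplicates allowed
        t = abs(t_cpu[x] - t_gpu[x])    # t(x): time saved by the right choice
        if g[x] == p[x]:                # XNOR: count only correct choices
            s += t
    return s - t_overhead
```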


ouankou commented on July 20, 2024

The design above enumerates the problem sizes of all the iterations. We could instead use the probability that a problem size occurs in the application. The two forms mean the same thing, just represented in different ways.

S = sum(q(x) * t(x) * (g(x) XNOR p(x))) - t_overhead

q(x): the probability that a certain problem size occurs in the application. For example, an application may have 10 iterations: 9 with problem size 50 and 1 with problem size 100.
In this case, q(50) = 0.9 and q(100) = 0.1.
sum: taken over all unique problem sizes. In the example above, those are 50 and 100.
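
As a sketch, under the same illustrative assumptions as the earlier snippet:

```python
# Hypothetical sketch of S = sum(q(x) * t(x) * (g(x) XNOR p(x))) - t_overhead.
def overhead_adjusted_score_prob(sizes, q, g, p, t_cpu, t_gpu, t_overhead):
    s = sum(q[x] * abs(t_cpu[x] - t_gpu[x])  # q(x) * t(x)
            for x in sizes                   # unique problem sizes only
            if g[x] == p[x])                 # only correct choices contribute
    return s - t_overhead
```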


ouankou commented on July 20, 2024

Assume we have the following data:
Each call to the CPM model has an average overhead of 0.01 ms.
Each call to the FPM model has an average overhead of 0.05 ms.

| Problem size | Performance difference \|CPU - GPU\| (ms) |
| --- | --- |
| 32 | 0.5 |
| 64 | 0.5 |
| 128 | 0.1 |
| 256 | 1 |

CPM makes the correct choice only for problem size 32, so its accuracy is 25%.
FPM makes the wrong choice only for problem size 64, so its accuracy is 75%.

With the ground truth above, take a test case using a 2D stencil as an example, with 1000 iterations at each of the problem sizes 32, 64, 128, and 256, so q(x) = 0.25 for each.

CPM: S1 = 0.25 * 0.5 - 0.01 = 0.115
FPM: S2 = 0.25 * 0.5 + 0.25 * 0.1 + 0.25 * 1 - 0.05 = 0.35

For this application, FPM is better than CPM.

Given another example, with 2000 iterations at each of the problem sizes 32 and 64, so q(32) = q(64) = 0.5:

CPM: S1 = 0.5 * 0.5 - 0.01 = 0.24
FPM: S2 = 0.5 * 0.5 - 0.05 = 0.2

In this case, CPM works better than FPM.
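
These numbers can be reproduced with a short script (a sketch; the only inputs are the per-call overheads and the table above):

```python
# Reproduces the two worked examples from the ground-truth table above.
t_diff = {32: 0.5, 64: 0.5, 128: 0.1, 256: 1.0}  # |CPU - GPU| in ms
cpm_correct = {32}              # CPM is right only at problem size 32
fpm_correct = {32, 128, 256}    # FPM is wrong only at problem size 64

def score(q, correct, t_overhead):
    return sum(p * t_diff[x] for x, p in q.items() if x in correct) - t_overhead

q1 = {x: 0.25 for x in (32, 64, 128, 256)}       # example 1
print(round(score(q1, cpm_correct, 0.01), 3))    # 0.115
print(round(score(q1, fpm_correct, 0.05), 3))    # 0.35

q2 = {32: 0.5, 64: 0.5}                          # example 2
print(round(score(q2, cpm_correct, 0.01), 3))    # 0.24
print(round(score(q2, fpm_correct, 0.05), 3))    # 0.2
```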


ouankou commented on July 20, 2024

Besides the consideration of overhead and profit, this new metric is a reversed usage of balanced accuracy.
Balanced accuracy is used to correct the bias introduced by an imbalanced dataset in order to determine the actual performance of a model.

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html#sklearn.metrics.balanced_accuracy_score
https://statisticaloddsandends.wordpress.com/2020/01/23/what-is-balanced-accuracy/

However, in our case, the application itself can be imbalanced: it may have many iterations with problem sizes x1 and x2 but none with any others. Therefore, to compare two models on this application, we have to calibrate their accuracy on the imbalanced set of x1 and x2 and ignore the remaining problem sizes.


chunhualiao commented on July 20, 2024

So your S's final value is an unbounded execution time in [0, infinity). This is not portable across different applications or platforms. How do you measure a model's S to predict a set of applications on different machines?

The metric should be a ratio to be portable.


chunhualiao commented on July 20, 2024

g(x) XNOR p(x) generates 0 or 1. This is also a problem if a model generates floating-point values. Again, it is best to use a ratio against the ground truth.


ouankou commented on July 20, 2024

> So your S's final value is an unbounded execution time in [0, infinity). This is not portable across different applications or platforms. How do you measure a model's S to predict a set of applications on different machines?
>
> The metric should be a ratio to be portable.

With the ground truth of the CPU and GPU performance data, we could make the right decision every time and save the most time.
We can therefore use the ratio of a model's actual profit to the optimal profit from the ground truth, which lies in [0, 1). 1 is excluded because the optimal profit can never be reached once overhead is subtracted.

S = (model_profit - overhead) / optimal_profit
  = (sum(q(x) * |x/g_c(x) - x/g_g(x)| * (sign(p_c(x) - p_g(x)) XNOR sign(g_c(x) - g_g(x)))) - t_overhead) / sum(q(x) * |x/g_c(x) - x/g_g(x)|)

S: overhead-adjusted accuracy, i.e., the percentage of the optimal profit a model can obtain for a given application and platform.
x: problem size
q(x): the probability that a certain problem size occurs in the application. For example, an application may have 10 iterations: 9 with problem size 50 and 1 with problem size 100. In this case, q(50) = 0.9 and q(100) = 0.1.
sum: taken over all unique problem sizes. In the example above, those are 50 and 100.
sign(...): the sign of a given value: 1 for non-negative, 0 for negative.
g_c(x): the ground-truth speed of the CPU solving problem size x, e.g. 301.45 elements/ms
g_g(x): the ground-truth speed of the GPU solving problem size x, e.g. 8010.60 elements/ms
p_c(x): the predicted speed of the CPU solving problem size x, in elements/ms
p_g(x): the predicted speed of the GPU solving problem size x, in elements/ms
t_overhead: the average overhead of calling the model
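
A sketch of this normalized score (the dictionary-based interface is an illustrative assumption):

```python
# Hypothetical sketch of S = (model_profit - overhead) / optimal_profit.
# g_c, g_g, p_c, p_g map a problem size to a speed in elements/ms.
def normalized_score(sizes, q, g_c, g_g, p_c, p_g, t_overhead):
    def t_diff(x):                  # |x/g_c(x) - x/g_g(x)|: time saved in ms
        return abs(x / g_c[x] - x / g_g[x])
    optimal = sum(q[x] * t_diff(x) for x in sizes)
    # XNOR of the two signs: the model picks the same device as the ground truth.
    profit = sum(q[x] * t_diff(x) for x in sizes
                 if (p_c[x] >= p_g[x]) == (g_c[x] >= g_g[x]))
    return (profit - t_overhead) / optimal
```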

For the two previous examples:

Example 1:

CPM: S1 = (0.25 * 0.5 - 0.01) / (0.25 * 0.5 + 0.25 * 0.5 + 0.25 * 0.1 + 0.25 * 1) = 0.115 / 0.525 = 0.219
FPM: S2 = (0.25 * 0.5 + 0.25 * 0.1 + 0.25 * 1 - 0.05) / (0.25 * 0.5 + 0.25 * 0.5 + 0.25 * 0.1 + 0.25 * 1) = 0.35 / 0.525 = 0.667

Example 2:

CPM: S1 = (0.5 * 0.5 - 0.01) / (0.5 * 0.5 + 0.5 * 0.5) = 0.24 / 0.5 = 0.48
FPM: S2 = (0.5 * 0.5 - 0.05) / (0.5 * 0.5 + 0.5 * 0.5) = 0.2 / 0.5 = 0.4

With this revised score, for a given application on a given machine, we can directly get an impression of how good a model is. The score represents what percentage of the ideal performance the model achieves.
For instance, in example 1, CPM has a score of 0.219, so we can say this model is not good without comparing it to other models. Given an application x on a machine y, if any model has a score of 0.99, we know it is a nearly perfect model for that case.


ouankou commented on July 20, 2024

The time cost of building and using a model is an unbounded value, such as 1 ms, 10 s, and so on. To fit this overhead into a bounded metric (0 to 1), we have to convert the value to a ratio.
Accuracy itself is a unitless value. To take the overhead, which carries a time unit, into account, we need another time-valued quantity to cancel that unit.
Therefore, we use the time saved by choosing the correct computing device, which we consider the profit.

overhead-adjusted accuracy = accuracy - usage overhead percentage per iteration - building overhead percentage per iteration

usage overhead percentage per iteration: (time cost of calling the model) / (per-iteration profit based on the ground truth)
building overhead percentage per iteration: (time cost of building the model / total number of iterations) / (per-iteration profit based on the ground truth)

There are two cases for using a model in iterative algorithms.

  1. Infinite iterations
    If the model is built only once instead of being rebuilt every iteration, the average building overhead approaches zero. By default, we assume an unlimited number of iterations.

  2. Finite iterations
    The overhead is calculated as described above.

Case 1:
Given a CPM model and infinite iterations, with an accuracy of 20%, a usage overhead of 0.1%, and a building overhead of 0, the overhead-adjusted accuracy is 19.9%.
For another model, FPM, with an accuracy of 80%, a usage overhead of 0.5%, and a building overhead of 0,
the overhead-adjusted accuracy is 79.5%.

Case 2:
With the same models as in case 1, but with CPM rebuilt every 10 iterations, its average building overhead becomes 10%. Now its overhead-adjusted accuracy is 9.9%.

Case 3:
With the same models as in case 1, but the application runs only 1000 iterations. The building overhead of CPM is 0.2%, while the building overhead of FPM becomes 50% due to its very long building time.
Now the overhead-adjusted accuracy of CPM is 19.7%, and for FPM it is 29.5%.
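
These three cases reduce to a one-line adjustment (a sketch; the inputs are the percentages stated above):

```python
def adjusted_accuracy(acc, usage_pct, build_pct=0.0):
    """Overhead-adjusted accuracy = accuracy - usage% - building%."""
    return acc - usage_pct - build_pct

# Percentages from the three cases above (rounded for display).
print(round(adjusted_accuracy(20.0, 0.1), 1))        # case 1, CPM: 19.9
print(round(adjusted_accuracy(80.0, 0.5), 1))        # case 1, FPM: 79.5
print(round(adjusted_accuracy(20.0, 0.1, 10.0), 1))  # case 2, CPM: 9.9
print(round(adjusted_accuracy(20.0, 0.1, 0.2), 1))   # case 3, CPM: 19.7
print(round(adjusted_accuracy(80.0, 0.5, 50.0), 1))  # case 3, FPM: 29.5
```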


ouankou commented on July 20, 2024

Consider the following time costs as components of the building time: 1. code update, 2. training, 3. postprocessing, 4. curve fitting.

