While taking a more thorough look into the Linear Regression implementation, I'm seeing that Accuracy tends to report as 0%. Here is the code currently used in `Learner.cs` (dev branch):
```csharp
// testing
object[] test = GetTestExamples(testingSlice, examples);
double accuracy = 0;
for (int j = 0; j < test.Length; j++)
{
    // item under test
    object o = test[j];
    // get truth
    var truth = Ject.Get(o, descriptor.Label.Name);
    // if truth is a string, sanitize
    if (descriptor.Label.Type == typeof(string))
        truth = StringHelpers.Sanitize(truth.ToString());
    // make prediction
    var features = descriptor.Convert(o, false).ToVector();
    var p = model.Predict(features);
    var pred = descriptor.Label.Convert(p);
    // assess accuracy
    if (truth.Equals(pred))
        accuracy += 1;
}
// get percentage correct
accuracy /= test.Length;
```
This is then consumed later in `Learner.Best()`:
```csharp
var q = from m in models
        where m.Accuracy == (models.Select(s => s.Accuracy).Max())
        select m;
return q.FirstOrDefault();
```
So basically, it iterates through the testing slice, makes the prediction, and then assesses the success of the prediction against the truth. Currently, though, it has only one implementation of assessment: `truth.Equals(pred)`. This is then consumed in `Learner.Best()`, which simply picks the model with the highest (max) value of `Accuracy`.
This approach means that unless two doubles are exactly equal (unlikely except for possibly trivial data), LinearRegression will always produce 0% Accuracy.
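Before picking a mechanism, here is a minimal sketch of what a regression-friendly assessment might look like, assuming a relative tolerance is acceptable (the helper names here are made up, not part of the library):

```csharp
using System;

// Sketch of a tolerance-based alternative to truth.Equals(pred).
// RegressionScoring, WithinTolerance, and Accuracy are hypothetical names.
public static class RegressionScoring
{
    // A regression prediction counts as "correct" when it falls within a
    // relative tolerance of the truth (5% by default).
    public static bool WithinTolerance(double truth, double pred, double tolerance = 0.05)
    {
        // Guard against division by zero when the truth is 0.
        double scale = Math.Max(Math.Abs(truth), 1e-12);
        return Math.Abs(truth - pred) / scale <= tolerance;
    }

    // Fraction of predictions that land within tolerance of their truths.
    public static double Accuracy(double[] truths, double[] preds, double tolerance = 0.05)
    {
        int correct = 0;
        for (int i = 0; i < truths.Length; i++)
            if (WithinTolerance(truths[i], preds[i], tolerance))
                correct++;
        return (double)correct / truths.Length;
    }
}
```

An RMSE- or R²-style score would be another route; the point is only that the comparison for regression cannot be exact equality.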
I want to abstract this out, but first I wanted to get thoughts on how to approach it, as there are a lot of possible routes forward.
We could...
- Pass in more parameters.
  - With or without creating overloads for convenience.
- Create a simple enum and pass this in as a single parameter.
  - Avoids getting ridiculous with just shoving in parameters.
- Create some kind of `TestOption` object/hierarchy and pass this in.
  - The current implementation would become a descendant like `TruthEqualsPredictionTestOption`.
  - This would also be the default, to avoid breaking changes.
- Change the `Learner` implementation from a static class to a singleton instance, in which case we could subclass `Learner` with overrides for different methods.
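For concreteness, the `TestOption` route might be shaped something like this (only the name `TruthEqualsPredictionTestOption` comes from the discussion above; the base class shape and the tolerance descendant are hypothetical):

```csharp
using System;

// Hypothetical base class: each option knows how to judge a single prediction.
public abstract class TestOption
{
    public abstract bool IsCorrect(object truth, object pred);
}

// Default: preserves the current truth.Equals(pred) behavior,
// so existing callers see no change.
public class TruthEqualsPredictionTestOption : TestOption
{
    public override bool IsCorrect(object truth, object pred) => truth.Equals(pred);
}

// A regression-friendly descendant (hypothetical): correct when the
// prediction lands within epsilon of the truth.
public class ToleranceTestOption : TestOption
{
    private readonly double _epsilon;
    public ToleranceTestOption(double epsilon) { _epsilon = epsilon; }

    public override bool IsCorrect(object truth, object pred) =>
        Math.Abs(Convert.ToDouble(truth) - Convert.ToDouble(pred)) <= _epsilon;
}
```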
I personally waver between the `TestOption` approach and the `Learner` changes. Each has its pros and cons.

With the `TestOption` approach, we can easily keep from having breaking changes. But we would then have to change the `Learner.Best()` method depending on what the options instance is, and we end up with a switch statement or, worse, an if-then-else chain.
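The switch the previous paragraph worries about would look roughly like this under the simple-enum variant (all of these names are made up for illustration):

```csharp
using System;

// Hypothetical enum for the "single parameter" option.
public enum ScoringMethod
{
    ExactEquality,    // current behavior: truth.Equals(pred)
    WithinTolerance   // regression-friendly: |truth - pred| <= epsilon
}

public static class Scorer
{
    // This is where the switch creeps in: every new scoring method
    // means another case in this dispatch.
    public static bool IsCorrect(object truth, object pred,
                                 ScoringMethod method, double epsilon = 1e-3)
    {
        switch (method)
        {
            case ScoringMethod.ExactEquality:
                return truth.Equals(pred);
            case ScoringMethod.WithinTolerance:
                return Math.Abs(Convert.ToDouble(truth) - Convert.ToDouble(pred)) <= epsilon;
            default:
                throw new ArgumentOutOfRangeException(nameof(method));
        }
    }
}
```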
With the `Learner` singleton changes, we could more cleanly address the various capabilities of the `Learner` class, but this would probably entail breaking changes. I could actually write an `ILearnerThing` interface with a default implementation that uses the current static class as-is, which would avoid breaking changes. However, going forward, we would have a fragmented approach to using the library. Also, this would possibly (probably?) mean incorporating DI of some sort, which brings along more design decisions, i.e. complexity.
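Sketched out, the interface idea might look like this (only the name `ILearnerThing` comes from this discussion; the member and the default implementation are hypothetical):

```csharp
// Hypothetical interface; the single Score member is illustrative,
// not a proposal for the final surface area.
public interface ILearnerThing
{
    // Score a single prediction against the truth: 1.0 for a hit, 0.0 for a miss.
    double Score(object truth, object pred);
}

// Default implementation mirroring the current static behavior,
// so adopting the interface need not be a breaking change.
public class DefaultLearnerThing : ILearnerThing
{
    public double Score(object truth, object pred) => truth.Equals(pred) ? 1.0 : 0.0;
}
```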
So, those are my thoughts. The goal is simply to get some meaningful accuracy out of LinearRegression, and to do it in such a way that if we get a good statistician personage on board (or maybe one of you already is one), they have easy access to a more robust assessment of accuracy, without getting too YAGNI.