Your task today is to create a new class, MeanRegressor
, which implements a similar interface to a sklearn
regressor like sklearn.linear_model.LinearRegression
. However, unlike a more sophisticated model implementation, your model does not actually take any of the feature variables into account, it just always predicts the mean of the target variable of the training data.
- Build a class
MeanRegressor
that can be initialized - Write a method
MeanRegressor#fit(X, y)
:X
is a two-dimensional matrix (nested NumPy array, nested Python list, or Pandas dataframe) of data rows and features. Your model will be ignoring it.y
is a list (NumPy array or Python list) representing the target variable- The model should determine the mean of
y
and store it, to be used in thepredict
method - This method does not return anything
- Write a method
MeanRegressor#predict(X)
:X
is a two-dimensional matrix. Your model will be ignoring its features, and only using the count of rows.- This method returns the mean of the training data for each row of
X
, i.e. a list containing the same number repeated as many times as necessary.
- Write a method
MeanRegressor#score(X, y)
:X
is a two-dimensional matrix andy
is a list of target variables- This method will compute the R2 for how well the features of
X
are able to predict the target ofy
. As a reminder, R2 is calculated as 1 -residual sum of squares
/total sum of squares
, whereresidual sum of squares
is the sum of all ((y_true - y_pred)2) andtotal sum of squares
is the sum of all ((y_true - y_pred.mean())2). So, if you are scoring with the samey
that was used for thefit
, you should expect the score to be exactly zero.
These requirements have test coverage. To run the tests, run pytest
in bash from the root of this repository. There should be an initial print-out saying that you failed all 5 tests. Re-run pytest
as you implement the requirements, and eventually they should all pass.
You will need at least one attribute (AKA member variable) to make your MeanRegressor
work. One option would be storing some information about model fit that you manually create, and another option would be instantiating a DummyRegressor and passing user inputs into it.
If you have the previous requirements working and there is time remaining, consider these additional goals:
- Check whether
X
andy
are valid inputs and raise appropriate exceptions if they are not. For example, if the user tries to runfit
withX
having 5 rows andy
having 10 target variables, produce a readable/understandable error that explains why this is invalid input. - See if you can reuse code from your Mod 2 project, and feed the King County housing data into your
MeanRegressor
. How does this "dummy" model compare to your previous final model, in terms of R^2 and residuals? Explore this with data visualizations.