evaluating-regression-lines's Introduction

Evaluating regression lines

Learning Objectives

  • Understand what is meant by the errors of a regression line
  • Understand how to calculate the error at a given point
  • Understand how to calculate RSS and why we use it as a metric to evaluate a regression line
  • Understand the difference between RSS and its variation, the RMSE

Introduction

So far we have seen how lines and formulas can estimate outputs given an input. We can describe any straight line with just two values:

  • $m$ - the slope of the line, and
  • $b$ - the y-intercept

So far we have been rather fast and loose with choosing a line to estimate our output - we simply drew a line between the first and last points of our data set. Well today, we go further. Here, we take our first step toward training our model to match our data.

The first step in training is to calculate our regression line's accuracy -- that is, how well our regression line matches our actual data. Calculating a regression line's accuracy is the topic of this lesson.

In future lessons, we will improve upon our regression line's accuracy, so that it better predicts an output.

Determining Quality

The first step toward improving a regression line is to calculate how well any given regression line matches our data. In other words, we need to calculate how accurate our regression line is.

Let's find out what this means. Below we have data that represents the budget and revenue of four shows, with x being the budget and y being the revenue.

first_show = {'x': 0, 'y': 100}
second_show = {'x': 100, 'y': 150}
third_show = {'x': 200, 'y': 600}
fourth_show = {'x': 400, 'y': 700}

shows = [first_show, second_show, third_show, fourth_show]
shows

Run code above with shift + enter

An initial regression line

As we did in the last lab, let's draw a not-so-great regression line simply by drawing a line between our first and last points. We can use our build_regression_line function to do so. You can view the code directly here.

Eventually, we'll improve this regression line. But first we need to see how good or bad a regression line is.

from linear_equations import build_regression_line
x_values = list(map(lambda show: show['x'],shows))
y_values = list(map(lambda show: show['y'],shows))
regression_line = build_regression_line(x_values, y_values)
regression_line
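The lesson imports `build_regression_line` from `linear_equations`, but here is a minimal sketch of what such a function might do, assuming it simply connects the first and last points (the name `build_regression_line_sketch` is just for illustration):

```python
def build_regression_line_sketch(x_values, y_values):
    # slope of the line through the first and last points
    m = (y_values[-1] - y_values[0]) / (x_values[-1] - x_values[0])
    # y-intercept: solve y = m*x + b using the first point
    b = y_values[0] - m * x_values[0]
    return {'m': m, 'b': b}

build_regression_line_sketch([0, 100, 200, 400], [100, 150, 600, 700])
# {'m': 1.5, 'b': 100.0}
```

With our four shows, this gives the line $y = 1.5x + 100$ used throughout the rest of the lesson.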

We can plot our regression line as follows, using the plotting functions that we wrote previously:

from graph import m_b_trace, plot, trace_values
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)
data_trace = trace_values(x_values, y_values)
regression_trace = m_b_trace(regression_line['m'], regression_line['b'], x_values)
plot([regression_trace, data_trace])

So that is what our regression line looks like. And this is the line translated into a Python function:

def sample_regression_formula(x):
    return 1.5 * x + 100

Assessing the regression line

Ok, so now that we see what our regression line looks like, let's highlight how well our regression line matches our data.

Let's interpret the chart above. That first red line shows that our regression formula does not perfectly predict that first show.

  • Our actual data -- the first blue dot -- shows that when $x = 100$, $y = 150$.
  • However, our regression line predicts that at $x = 100$, $y = 250$.

So our regression line is off by 100, indicated by the length of the red line.

Each point where our regression line's estimate differs from the actual data is called an error. Our red lines display the size of these errors: the length of each red line equals the size of the error.

  • The error equals the difference between the actual value and the value expected by our model (that is, our regression line).
  • error = actual - expected

Now let's put this formula into practice. The error is the actual value minus the expected value. So at point $x = 100$, the actual $y$ is 150. And at point x = 100, the expected value of $y$ is $250$. So:

  • error = $150 - 250 = -100$.

If we did not have a graph to display this, we could calculate this error by using our formula for the regression line.

  • Our regression formula is $y = 1.5x + 100$.
  • Then when $x$ equals 100, the formula predicts $y = 1.5 * 100 + 100 = 250$.
  • And we have the actual data of (100, 150). So
  • actual - expected $= 150 - 250 = -100$.
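The calculation above can be expressed as a couple of small Python helpers (the names `expected_y` and `error_at` are just illustrative):

```python
def expected_y(x):
    # our regression formula: y = 1.5x + 100
    return 1.5 * x + 100

def error_at(x, actual_y):
    # error = actual - expected
    return actual_y - expected_y(x)

error_at(100, 150)  # 150 - 250 = -100.0
```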

Refining our Terms

Now that we have explained how to calculate an error given a regression line and data, let's learn some mathematical notation that lets us better express these concepts.

  • We want to use notation to distinguish between two things: our expected $y$ values and our actual $y$ values.

Expected values

So far we have defined our regression function as $y = mx + b$, where for a given value of $x$ we can calculate the value of $y$. However, this is not totally accurate, as our regression line does not calculate the actual value of $y$ but the expected value of $y$. So let's indicate this by changing our regression line formula to look like the following:

  • $\hat{y} = \hat{m}x + \hat{b}$

Those little marks over the $y$, $m$ and $b$ are called hats. So our function reads as y-hat equals m-hat multiplied by $x$ plus b-hat. These hats indicate that this formula does not give us the actual value of $y$, but simply our estimated value of $y$. The hats also say that this estimated value of $y$ is based on our estimated values of $m$ and $b$.

Note that $x$ is not a predicted value. This is because we are providing a value of $x$, not predicting it. For example, we are providing a show's budget as an input, not predicting it. So we are providing a value of $x$ and asking our formula to predict a value of $y$.

Actual values

Now remember that we were given some real data as well. This means that we do have actual points for $x$ and $y$, which look like the following.

first_show = {'x': 0, 'y': 100}
second_show = {'x': 100, 'y': 150}
third_show = {'x': 200, 'y': 600}
fourth_show = {'x': 400, 'y': 700}

shows = [first_show, second_show, third_show, fourth_show]
shows

So how do we represent our actual values of $y$? Here's how: $y$. No extra ink is needed.

Ok, so now we know the following:

  • $y$: actual y
  • $\hat{y}$: estimated y

Finally, we use the Greek letter $\varepsilon$, epsilon, to indicate error. So we say that

  • $\varepsilon = y - \hat{y}$.

We can be a little more precise by saying we are talking about error at any specific point, where $y$ and $\hat{y}$ are at that $x$ value. This is written as:

$\varepsilon_{i} = y_{i} - \hat{y}_{i}$

Those little $i$s represent an index value, as in our first, second or third show. Now, applying this to a specific point, say where $x = 100$, we can say:

  • $\varepsilon_{x=100} = y_{x=100} - \hat{y}_{x=100} = 150 - 250 = -100$

Calculating and representing total error

We now know how to calculate the error at a given value of $x$, $x_i$, by using the formula, $\varepsilon_i$ = $y_i - \hat{y_i}$. Again, this is helpful at describing how well our regression line predicts the value of $y$ at a specific point.

However, we want to see how well our regression line describes our dataset in general - not just at a single given point. Let's move beyond calculating the error at a given point to describing the total error of the regression line across all of our data.

As an initial approach, we simply calculate the total error by summing the errors, $y - \hat{y}$, for every point in our dataset.

Total Error = $\sum_{i=1}^{n} y_i - \hat{y_i}$

This isn't bad, but we'll need to modify this approach slightly. To understand why, let's take another look at our data.

The errors at $x = 100$ and $x = 200$ begin to cancel each other out.

  • $\varepsilon_{x=100}= 150 - 250 = -100$
  • $\varepsilon_{x=200} = 600 - 400 = 200$
  • $\varepsilon_{x=100} + \varepsilon_{x=200} = -100 + 200 = 100 $

We don't want the errors to cancel each other out! To resolve this issue, we square the errors to ensure that we are always summing positive numbers.

${\varepsilon_i^2}$ = $({y_i - \hat{y_i}})^2$
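A quick check in Python shows why squaring matters; the two numbers are the nonzero errors from above:

```python
errors = [-100, 200]

# plain errors partially cancel each other out
total_error = sum(errors)                  # 100

# squared errors are never negative, so nothing cancels
squared_errors = [e ** 2 for e in errors]
total_squared_error = sum(squared_errors)  # 10000 + 40000 = 50000
```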

So given a list of points with coordinates (x, y), we can calculate the squared error of each of the points and sum them up. This is called our **residual sum of squares** (RSS). Using our sigma notation, our formula for RSS looks like:

$ RSS = \sum_{i = 1}^n ({y_i - \hat{y_i}})^2 = \sum_{i = 1}^n \varepsilon_i^2 $

Residual Sum of Squares is just what it sounds like. A residual is simply the error -- the difference between the actual data and what our model expects. We square each residual and add them together to get RSS.

Let's calculate the RSS for our regression line and associated data. In our example, we have actual $x$ and $y$ values at the following points:

  • $ (0, 100), (100, 150), (200, 600), (400, 700) $.

And we can calculate the values of $\hat{y}$ as $\hat{y} = 1.5x + 100$ for each of those four points. So this gives us:

$RSS = (100 - 100)^2 + (150 - 250)^2 + (600 - 400)^2 + (700 - 700)^2$

which reduces to

$RSS = 0^2 + (-100)^2 + 200^2 + 0^2 = 50,000$

Now we have one number, the RSS, that represents how well our regression line fits the data. We got there by calculating the errors at each of our provided points, and then squaring the errors so that our errors are always positive.
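The whole RSS calculation fits in a short Python function (the name `rss` is just for illustration), which reproduces the 50,000 above:

```python
def rss(x_values, y_values, m, b):
    # sum of squared differences between actual and expected y values
    return sum((y - (m * x + b)) ** 2 for x, y in zip(x_values, y_values))

x_values = [0, 100, 200, 400]
y_values = [100, 150, 600, 700]
rss(x_values, y_values, 1.5, 100)  # 50000.0
```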

Root Mean Squared Error

Root Mean Squared Error (RMSE) is just a variation on RSS. Essentially, it answers the question: what is the "typical" error of our model at each data point? To do this, it scales down that large RSS number by dividing it by the number of data points and taking the square root of the result:

  • $ RSS = \sum_{i = 1}^n ({y_i - \hat{y_i}})^2$

  • $RMSE = \sqrt{\frac{RSS}{n}} $

Where n equals the number of elements in the data set.

Now let's walk through the reasoning for each step.

Taking the mean

The first thing that makes our RSS large is the fact that we square each error. Remember that we squared each error because we didn't want positive errors and negative errors to cancel out. So each place where we had a negative error, as in:

$actual - expected = -100$

we squared the error, such that $(-100)^2 = 10,000$.

Remember that we square each of our errors and add them together, which led to:

  • $RSS = 0^2 + (-100)^2 + 200^2 + 0^2 = 50,000$

We then take the mean to get the average squared error (also called the "mean squared error", or MSE for short):

  • $MSE = \frac{50,000}{4}=12,500$

We do this because each additional data point in our dataset tends to increase the RSS, even when the fit is no worse. To counteract the effect of RSS growing with dataset size rather than with inaccuracy, we divide by the size of the dataset.

Taking the square root

The last step in calculating the RMSE is to take the square root of the MSE:

$RMSE = \sqrt{12,500} \approx 111.8$

In general, the RMSE is calculated as:
$ RMSE = \sqrt{\frac{\sum_{i = 1}^n ({y_i - \hat{y_i}})^2}{n}} $

So the RMSE estimates how far a typical measurement is from its expected value. It is a "typical error" as opposed to an overall error.
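Both steps, taking the mean and then the square root, fit in one short function (the name `rmse` is illustrative):

```python
import math

def rmse(x_values, y_values, m, b):
    # residual sum of squares
    rss = sum((y - (m * x + b)) ** 2 for x, y in zip(x_values, y_values))
    # mean squared error, then square root
    return math.sqrt(rss / len(x_values))

rmse([0, 100, 200, 400], [100, 150, 600, 700], 1.5, 100)  # 111.80...
```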

Summary

Before this lesson, we simply assumed that our regression line made good predictions of $y$ for given values of $x$. In this lesson, we learned a metric that tells us how well our regression line fits our actual data. To do this, we started by looking at the error at a given point, defined as the actual value of $y$ minus the value of $y$ expected from our regression line. Then we determined how well our regression line describes the entire dataset by squaring the errors at each point (to eliminate negative errors) and adding these squared errors. This is called the Residual Sum of Squares (RSS), our metric for describing how well our regression line fits our data. Lastly, we learned how the RMSE tells us the "typical error": we divide the RSS by the number of elements in our dataset and take the square root of the result.

evaluating-regression-lines's People

Contributors

cutterbuck, jeffkatzy, loredirick, mas16, mike-kane


evaluating-regression-lines's Issues

Calculating and representing total error: typo in m

"Let's calculate the RSS for our regression line and associated data. In our example, we have actual x and y values at the following points:

(0, 100), (100, 150), (200, 600), (400, 700).
And we can calculate the values of ŷ as ŷ = 1.5 * x + 100, for each of those four points."

In the excerpt above, shouldn't m = 0.5 or 4.5?

m=(150-100)/100
print(m)
0.5

m = (600-150)/100
print(m)
4.5

m=(700-600)/200
print(m)
0.5

I'm new to the concept and still learning, so apologies if I'm missing something here.

error in functions line_function_trace and m_b_trace

In: evaluating-regression-lines/graph.py:

line 10: def line_function_trace(line_function, x_values, mode = 'line', name = 'line function')
Instead of: mode = 'line'
should be: mode = 'lines'

line 23: def m_b_trace(m, b, x_values, mode = 'line', name = 'line function')
Instead of: mode = 'line'
should be: mode = 'lines'

Calculating and Representing Error

In this section there seems to be a typo in the part that shows the calculation

RSS = (0-0)^2 + (150-250)^2 + (600-400)^2 + (700-700)^2

the first error says 0-0 but the actual value is 100 not 0.

RMSE formula incorrect?

I think the formula given for RMSE in the Evaluating Regression Lines lessons is wrong. I'm pretty sure the square root needs to include the denominator, not just the numerator.

Error in plotting regression line

it looks like m_b_trace is taking in mode = 'line' instead of mode = 'lines'

ValueError:
Invalid value of type 'builtins.str' received for the 'mode' property of scatter
Received value: 'line'

The 'mode' property is a flaglist and may be specified
as a string containing:
  - Any combination of ['lines', 'markers', 'text'] joined with '+' characters
    (e.g. 'lines+markers')
    OR exactly one of ['none'] (e.g. 'none')

adding errors

ε_{x=100} = 150 - 250 = -100
ε_{x=200} = 600 - 400 = 200
ε_{x=100} + ε_{x=200} = -150 + 200 = 50

shouldn't the last line read:
ε_{x=100} + ε_{x=200} = -100 + 200 = 100

mode = 'line' error

I think in:

def line_function_trace(line_function, x_values, mode = 'line', name = 'line function'):

mode needs to = 'lines'... python seems like it doesn't like 'line'

Function Syntax Errors

File: evaluating-regression-lines/graph.py

def line_function_trace(line_function, x_values, mode = 'line', name = 'line function'):
    values = line_function_data(line_function, x_values)
    values.update({'mode': mode, 'name': name})
    return values

def m_b_trace(m, b, x_values, mode = 'line', name = 'line function'):
    values = m_b_data(m, b, x_values)
    values.update({'mode': mode, 'name': name})
    return values

(mode = 'line') should be (mode = 'lines')

Typo

"Under heading refining our terms" the following section has a typo that I've put in bold and italics:

Note that ๐‘ฅ is not a predicted value. This is because we are providing a value of ๐‘ฅ , not predicting it. For example, we are providing an show's budget as an input, not predicting it. So we are providing a value of ๐‘ฅ and asking it to predict a value of ๐‘ฆ .

line = lines? Plotly update

Hi folks,

In the pre-written def's for this lesson, plotly returns a ValueError that the mode 'line' doesn't exist.

I pasted in the def's associated from the git page with 'lines' instead and it seems to work.

Error message from provided code

From this pre-written code towards the top of the lesson:

from graph import m_b_trace, plot, trace_values
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)
data_trace = trace_values(x_values, y_values)
regression_trace = m_b_trace(regression_line['m'], regression_line['b'], x_values)
plot([regression_trace, data_trace])

I get the following error message (on Chrome v70.0.3538.110, Windows 10) and am unable to produce the graph:
ValueError Traceback (most recent call last)
in ()
4 data_trace = trace_values(x_values, y_values)
5 regression_trace = m_b_trace(regression_line['m'], regression_line['b'], x_values)
----> 6 plot([regression_trace, data_trace])

~/evaluating-regression-lines-data-science-intro-000/graph.py in plot(traces, layout)
28 def plot(traces, layout = {}):
29 if not isinstance(traces, list): raise TypeError('first argument must be a list. Instead is', traces)
---> 30 plotly.offline.iplot({'data': traces, 'layout': layout})

[plotly-internal frames omitted]

ValueError:
Invalid value of type 'builtins.str' received for the 'mode' property of scatter
Received value: 'line'

The 'mode' property is a flaglist and may be specified
as a string containing:
  - Any combination of ['lines', 'markers', 'text'] joined with '+' characters
    (e.g. 'lines+markers')
    OR exactly one of ['none'] (e.g. 'none')

readme code causes error

editing the graph.py file from mode = 'line' to 'lines' did not fix the issue.

from graph import m_b_trace, plot, trace_values
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)
data_trace = trace_values(x_values, y_values)
regression_trace = m_b_trace(regression_line['m'], regression_line['b'], x_values)
plot([regression_trace, data_trace])


ValueError Traceback (most recent call last)
in ()
4 data_trace = trace_values(x_values, y_values)
5 regression_trace = m_b_trace(regression_line['m'], regression_line['b'], x_values)
----> 6 plot([regression_trace, data_trace])

~/Flatiron Machine Learning/graph.py in plot(traces, layout)
28 def plot(traces, layout = {}):
29 if not isinstance(traces, list): raise TypeError('first argument must be a list. Instead is', traces)
---> 30 plotly.offline.iplot({'data': traces, 'layout': layout})

[plotly-internal frames omitted]

ValueError:
Invalid value of type 'builtins.str' received for the 'mode' property of scatter
Received value: 'line'

The 'mode' property is a flaglist and may be specified
as a string containing:
  - Any combination of ['lines', 'markers', 'text'] joined with '+' characters
    (e.g. 'lines+markers')
    OR exactly one of ['none'] (e.g. 'none')

Intro to RSS

Hello!

I think there's an error with one of the example formulas in the "Sigma Notation" lesson. In the section on 'Calculating and representing total error,' the RSS that illustrates the difference between the y and ŷ of the given points shows the wrong inputs for y₀. Currently, the full formula shows (apologies for the lack of formatting):

RSS = (0-0)^2 + (150-250)^2 + (600-400)^2 + (700-700)^2

I believe the first parenthesis should be updated to (100-100)^2. Please let me know if I'm missing something!

Thanks!
Jim Jacisin

Evaluating regression lines

The code for the m_b_trace function that is imported in from a previous lesson is incorrect.

The m_b_trace code for that lesson is as follows:
def m_b_trace(m, b, x_values, mode = 'line', name = 'line function'):
    values = m_b_data(m, b, x_values)
    values.update({'mode': mode, 'name': name})
    return values

Running this raises the following error:

ValueError:
Invalid value of type 'builtins.str' received for the 'mode' property of scatter
Received value: 'line'

The 'mode' property is a flaglist and may be specified
as a string containing:
  - Any combination of ['lines', 'markers', 'text'] joined with '+' characters
    (e.g. 'lines+markers')
    OR exactly one of ['none'] (e.g. 'none')

This can be fixed by altering the code to the following:
def m_b_trace(m, b, x_values, mode = 'lines', name = 'line function'):
    values = m_b_data(m, b, x_values)
    values.update({'mode': mode, 'name': name})
    return values

But this needs to be done manually in the evaluating regression lines lesson. The code for m_b_data function also needs to be manually input rather than imported as the lesson suggests.

ValueError from the provided code

When shift+entering the example coding, I received a value error.

ValueError:
Invalid value of type 'builtins.str' received for the 'mode' property of scatter
Received value: 'line'

The 'mode' property is a flaglist and may be specified
as a string containing:
  - Any combination of ['lines', 'markers', 'text'] joined with '+' characters
    (e.g. 'lines+markers')
    OR exactly one of ['none'] (e.g. 'none')

It came from trying to plot the regression line. This was the code provided.

from graph import m_b_trace, plot, trace_values
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)
data_trace = trace_values(x_values, y_values)
regression_trace = m_b_trace(regression_line['m'], regression_line['b'], x_values)
plot([regression_trace, data_trace])

Thanks for any help looking into this matter.
