What is the best fit line in a scatter plot?

Paper and Pencil Solution:

See the bottom of this page for how to also do "my choice" line of best fit on the calculator.

1.  Prepare a scatter plot of the data on graph paper.

2.  Using a strand of spaghetti, position the spaghetti so that the plotted points are as close to the strand as possible. Bibs, the cat, is assisting with the spaghetti alignment for this example.


3.  Find two points that you think will be on the "best-fit" line.  Perhaps you chose the points
(9, 260) and (30, 530).  Different people may choose different points. 

4.  Calculate the slope of the line through your two points (rounded to three decimal places).

    slope = (530 - 260) / (30 - 9) = 270 / 21 ≈ 12.857

5.  Write the equation of the line.  This equation can now be used to predict information that was not plotted in the scatter plot.  For example, you can use the equation to find the total calories based upon 22 grams of fat.          

Equation:  y - 260 = 12.857(x - 9)

or  y = 12.857x + 144.287

 

Prediction based on 22 grams of fat:

y = 12.857(22) + 144.287 = 427.141, or about 427 calories
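The arithmetic in steps 4 and 5 can be checked with a few lines of Python. This is only a sketch of the paper-and-pencil method, using the two example points (9, 260) and (30, 530) from step 3; the names `slope`, `intercept`, and `predict_calories` are illustrative and do not come from the original page.

```python
# Hand-picked points from step 3 (other choices give other lines).
x1, y1 = 9, 260
x2, y2 = 30, 530

# Step 4: slope through the two points, rounded to three decimal places.
slope = round((y2 - y1) / (x2 - x1), 3)

# Step 5: y - y1 = slope * (x - x1), rearranged to y = slope * x + intercept.
intercept = round(y1 - slope * x1, 3)


def predict_calories(fat_grams):
    """Estimate total calories from grams of fat using the hand-fit line."""
    return slope * fat_grams + intercept


prediction = round(predict_calories(22), 3)
print(f"y = {slope}x + {intercept}")
print(f"Predicted calories for 22 g of fat: {prediction}")
```

Choosing a different pair of points changes `slope` and `intercept`, which is why different people arrive at different equations.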

Different people may choose different points and arrive at different equations.  All of these equations are "correct", but which one is actually the "best"?  To determine the actual "best" fit, we will use a graphing calculator.

Graphing Calculator Solution:

1.  Enter the data into Lists and Spreadsheets. Column A will be labeled "fat" (for the fat grams) and column B will be labeled "cal" (for the calories).
See Lists and Spreadsheets for how to enter data.


2.  Graph the scatter plot. From the home screen, choose #5 Data and Statistics. Hit ENTER.
Using the Nav Pad, arrow to the bottom of the screen and choose the x-variable to be "fat". Then move to the left side of the screen and choose the y-variable to be "cal". You will now see your scatter plot.


3.  Now have the calculator determine the BEST "line of best fit".
The process of finding a "line of best fit" is called linear regression.
From the menu, choose #3 Actions, #5 Regression, and #1 Show Linear (mx + b). The graph will appear with the "line of best fit" equation (the linear regression equation).


Note: You can "grab" any of the points and move them. The equation will change accordingly, as will the spreadsheet entries.

Important Note:

If your calculator is set to FLOAT, this linear regression equation will be rounded to the nearest integer.

If your calculator is set to FLOAT 6 (the default), this linear regression equation will contain 6 decimal places, which may include inaccurate trailing zeros.

Graphing calculator "line of best fit": y = 11.7313x + 193.852
(using default settings on calculator - minus the trailing zeros)
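To see how much the choice of line matters, the hand-fit equation y = 12.857x + 144.287 and the calculator's regression equation can both be evaluated at 22 grams of fat. The sketch below simply plugs into the two equations quoted on this page; the function names `hand_fit` and `calculator_fit` are made up for illustration.

```python
def hand_fit(fat_grams):
    """Line through the two hand-picked points (9, 260) and (30, 530)."""
    return 12.857 * fat_grams + 144.287


def calculator_fit(fat_grams):
    """Linear regression equation reported by the graphing calculator."""
    return 11.7313 * fat_grams + 193.852


fat = 22
hand = hand_fit(fat)
calc = calculator_fit(fat)
print(f"hand-fit line:   {hand:.3f} calories")
print(f"regression line: {calc:.3f} calories")
print(f"difference:      {calc - hand:.3f} calories")
```

The two lines give predictions that differ by roughly 25 calories at this point, even though both are reasonable fits to the same scatter plot.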

Visualization and understanding with Python

The scatterplot is one of my favorite charts! In data science, where there is a vast range of statistical analyses to perform, the scatterplot is a friendly tool. Scatterplots are extremely useful for examining the relationship between two numeric, quantitative series, and they are common in both technical and non-technical fields.

What is a scatterplot?

A scatterplot shows the relationship between two quantitative variables plotted along the x- and y-axes. These plots are used more often to understand data than to communicate it. Unlike line plots, scatterplots use dots to focus on individual data points. Scatterplots are best used to:
1. Unveil any patterns
2. Find the relationship between two sets of data

Read a Scatterplot

When using a scatterplot, we have to present the data carefully for our audience. We may need to break the chart down to explain how to read it.

1. Scan each axis: When the data contains multiple variables, it may be difficult for our audience to determine which variable each axis represents.

2. Visualise section by section: We can create sections by grouping the points into quadrants. This helps us see where natural breaks and groupings exist and makes the comparison easier to interpret.

3. Identify the shape: While plotting, it is better to summarize the individual points into a unified shape. Some questions to ask ourselves:
i) Are all the dots moving in the same direction?
ii) Do they follow an exponential curve?
iii) Do the dots rise as my eyes move along the axis?

Now we shall explore the relationship between weight and height in a dataset, using Python, Pandas, and Jupyter Notebook, to understand the scatterplot visually.

In the notebook, we use a dataset to examine how the height (y-axis) of a group of school students depends on their weight (x-axis).
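A scatterplot of this kind might be produced along the following lines. This is a minimal sketch, not the original notebook: the values are made up for illustration, and the column names `weight_kg` and `height_cm` are assumptions. It needs the pandas and matplotlib packages.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample values for illustration only, not the article's dataset.
students = pd.DataFrame(
    {
        "weight_kg": [32, 35, 38, 41, 44, 47, 50, 54],
        "height_cm": [132, 137, 139, 144, 148, 150, 155, 158],
    }
)

# One dot per student: weight on the x-axis, height on the y-axis.
fig, ax = plt.subplots()
ax.scatter(students["weight_kg"], students["height_cm"])
ax.set_xlabel("Weight (kg)")
ax.set_ylabel("Height (cm)")
ax.set_title("Height versus weight")
fig.savefig("height_vs_weight.png")
```

Labelling both axes explicitly addresses the first reading tip above, since the audience can see at once which variable each axis represents.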

Best Fit Line

The line of best fit, or best-fit line ("trend" line), is a straight line that may pass through the center of the data points, through none of the points, or through all of the points on the scatterplot.

The equation of a straight line is:

y = mx + b

where m is the slope of the line and b is the y-intercept. We already have our x and y values, so now we need to calculate m and b. The formulas for these can be written as:

m = (mean(x)*mean(y) - mean(x*y)) / (mean(x)*mean(x) - mean(x*x))

b = mean(y) - m*mean(x)

In code, the denominator is computed as

denom = X.dot(X) - X.mean()*X.sum()

which is the denominator of m above multiplied by -N, where N is the number of data points, so the numerator has to be scaled by the same factor.
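The formulas for m and b translate almost directly into code. The sketch below uses only the Python standard library; the helper name `best_fit_line` and the sample weights and heights are made up for illustration and are not the dataset from the notebook.

```python
from statistics import mean


def best_fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    mean_x, mean_y = mean(xs), mean(ys)
    mean_xy = mean(x * y for x, y in zip(xs, ys))
    mean_xx = mean(x * x for x in xs)
    # Same form as the formulas in the text: both numerator and
    # denominator carry the same sign flip, so the ratio is the slope.
    m = (mean_x * mean_y - mean_xy) / (mean_x * mean_x - mean_xx)
    b = mean_y - m * mean_x
    return m, b


# Made-up sample values for illustration only, not the article's dataset.
weights = [30, 35, 40, 45, 50]
heights = [130, 136, 141, 147, 151]

m, b = best_fit_line(weights, heights)
print(f"height = {m:.2f} * weight + {b:.2f}")
```

For these sample values the fitted line is height = 1.06 * weight + 98.60. Note that the formula divides by zero if every x value is identical, since no unique line can be fitted in that case.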

[Figure: scatterplot of the data points with the best-fit line drawn in green]

The green line in the figure, which passes through the data points, is called the best-fit line of the data points.

Conclusion: When analyzing a process, a line drawn through the data points can be contentious. A fitted line may cause confusion if the trend in the underlying data is ambiguous.