5. Write the equation of the line. This equation can now be used to predict information that was not plotted in the scatter plot. For example, you can use the equation to find the total calories based upon 22 grams of fat.
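As a rough illustration of step 5, the sketch below builds a line from two chosen points and then uses it to predict a value at 22 grams of fat. The two points are hypothetical placeholders, not values read from the scatter plot, and the function names `line_through` and `predict` are made up for this example.

```python
def line_through(p1, p2):
    """Return slope m and intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope-intercept form")
    m = (y2 - y1) / (x2 - x1)   # rise over run
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b


def predict(m, b, x):
    """Evaluate y = m*x + b at the given x."""
    return m * x + b


# Hypothetical points (grams of fat, total calories), for illustration only.
m, b = line_through((10, 300), (30, 600))
calories_at_22 = predict(m, b, 22)
print(f"y = {m}x + {b}; predicted calories at 22 g of fat: {calories_at_22}")
```

Choosing a different pair of points would give a different slope and intercept, and therefore a different prediction.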
Different people may choose different points and arrive at different equations. All of these equations are "correct", but which one is actually the "best"? To determine the actual "best" fit, we will use a graphing calculator.

Graphing Calculator Solution:
Visualization and understanding with Python

One of my favorite niche charts is the scatterplot! If we work in Data Science and have a wide range of statistical analyses to perform, the scatterplot is a friendly tool. Scatterplots are extremely useful for focusing on the relationship between two numeric, quantitative series, and they are common in both technical and non-technical fields.

What is a scatterplot?

A scatterplot shows the relationship between two quantitative variables using the X and Y axes. These plots are more often used to understand data than to communicate it. Unlike line plots, scatterplots show dots to focus on individual data points. Scatterplots are best used to:

Read a Scatterplot

When using a scatterplot, we have to present the data wisely for our audience. We may need to break the data down to explain how to read it.

1. Scan each axis: When the data contains multiple variables, it may be difficult for our audience to determine which variable each axis represents.
2. Visualise section-wise: We can create sections by grouping the points into quadrants. This is an important way to see which natural breaks and groupings exist, and it helps us make sense of the comparison.
3. Identify the shape: While plotting, it is better to summarize the individual points into a unified shape. Some questions we have to ask ourselves:

In the above notebook, we use a dataset to understand how height (y-axis) depends on weight (x-axis) for a team of school students.

Best Fit Line

The line of best fit, or best-fit line ("trend" line), is a straight line that may pass through the center of the data points, none of the points, or all of the points on the scatterplot.
As we know, the equation of a straight line is:

y = m*x + b

where m is the slope of the line and b is the y-intercept:

m = (mean(x)*mean(y) - mean(x*y)) / (mean(x)^2 - mean(x^2))
b = mean(y) - m*mean(x)

In the code, the denominator is computed as denom = X.dot(X) - X.mean()*X.sum(). This equals n*(mean(x^2) - mean(x)^2), which is the denominator above multiplied by -n, so the numerator has to be scaled in the same way: X.dot(Y) - Y.mean()*X.sum().

The green line above, which passes through the data points, is called the best-fit line of the data points.

Conclusion:

When analyzing a process, a line drawn through the data points may be controversial. A fitted line may cause confusion if the trend of the underlying data is ambiguous.
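The slope and intercept formulas for the best-fit line can be sketched in plain Python as follows. This is a minimal illustration, not the code from the notebook: the function names are invented here, and the sample values are hypothetical, not the students' height and weight data. The second function uses the sum-based form of the denominator, `X.dot(X) - X.mean()*X.sum()`, written with ordinary sums so that no extra library is needed.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def best_fit_mean_form(xs, ys):
    """Slope and intercept using the formulas written with means."""
    mean_x, mean_y = mean(xs), mean(ys)
    mean_xy = mean([x * y for x, y in zip(xs, ys)])
    mean_xx = mean([x * x for x in xs])
    denom = mean_x * mean_x - mean_xx
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (mean_x * mean_y - mean_xy) / denom
    b = mean_y - m * mean_x
    return m, b


def best_fit_sum_form(xs, ys):
    """Same line, using the dot-product style denominator."""
    sum_x = sum(xs)
    dot_xx = sum(x * x for x in xs)               # plays the role of X.dot(X)
    dot_xy = sum(x * y for x, y in zip(xs, ys))   # plays the role of X.dot(Y)
    denom = dot_xx - mean(xs) * sum_x
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (dot_xy - mean(ys) * sum_x) / denom
    b = mean(ys) - m * mean(xs)
    return m, b


# Hypothetical sample values lying exactly on y = 2x + 1, for illustration only.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
print(best_fit_mean_form(xs, ys))
print(best_fit_sum_form(xs, ys))
```

Both forms describe the same line: the numerator and denominator of the second are each the first multiplied by -n, so the ratio is unchanged.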