What is scatter plot in statistics?

A 3D scatter plot allows the visualization of multivariate data. This scatter plot takes multiple scalar variables and uses them for different axes in phase space. The different variables are combined to form coordinates in the phase space and they are displayed using glyphs and coloured using another scalar variable.

A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates to display values for typically two variables for a set of data. If the points are coded (color/shape/size), one additional variable can be displayed. The data are displayed as a collection of points, each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis.

Overview[edit]

A scatter plot can be used either when one continuous variable is under the control of the experimenter and the other depends on it or when both continuous variables are independent. If a parameter exists that is systematically incremented and/or decremented by the other, it is called the control parameter or independent variable and is customarily plotted along the horizontal axis. The measured or dependent variable is customarily plotted along the vertical axis. If no dependent variable exists, either type of variable can be plotted on either axis and a scatter plot will illustrate only the degree of correlation (not causation) between two variables.

A scatter plot can suggest various kinds of correlations between variables with a certain confidence interval. For example, weight and height would be on the y-axis, and height would be on the x-axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated). If the dots' pattern slopes from lower left to upper right, it indicates a positive correlation between the variables being studied. If the pattern of dots slopes from upper left to lower right, it indicates a negative correlation. A line of best fit (alternatively called 'trendline') can be drawn to study the relationship between the variables. An equation for the correlation between the variables can be determined by established best-fit procedures. For a linear correlation, the best-fit procedure is known as linear regression and is guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is guaranteed to generate a correct solution for arbitrary relationships. A scatter plot is also very useful when we wish to see how two comparable data sets agree to show nonlinear relationships between variables. The ability to do this can be enhanced by adding a smooth line such as LOESS. Furthermore, if the data are represented by a mixture model of simple relationships, these relationships will be visually evident as superimposed patterns.

The scatter diagram is one of the seven basic tools of quality control.

Scatter charts can be built in the form of bubble, marker, or/and line charts.

Example[edit]

For example, to display a link between a person's lung capacity, and how long that person could hold their breath, a researcher would choose a group of people to study, then measure each one's lung capacity (first variable) and how long that person could hold their breath (second variable). The researcher would then plot the data in a scatter plot, assigning "lung capacity" to the horizontal axis, and "time holding breath" to the vertical axis.

A person with a lung capacity of 400 cl who held their breath for 21.7 s would be represented by a single dot on the scatter plot at the point (400, 21.7) in the Cartesian coordinates. The scatter plot of all the people in the study would enable the researcher to obtain a visual comparison of the two variables in the data set and will help to determine what kind of relationship there might be between the two variables.

For a set of data variables (dimensions) X1, X2, ... , Xk, the scatter plot matrix shows all the pairwise scatter plots of the variables on a single view with multiple scatterplots in a matrix format. For k variables, the scatterplot matrix will contain k rows and k columns. A plot located on the intersection of row and jth column is a plot of variables Xi versus Xj. This means that each row and column is one dimension, and each cell plots a scatter plot of two dimensions.

A generalized scatter plot matrix offers a range of displays of paired combinations of categorical and quantitative variables. A mosaic plot, fluctuation diagram, or faceted bar chart may be used to display two categorical variables. Other plots are used for one categorical and one quantitative variables.

Before we take up the discussion of linear regression and correlation, we need to examine a way to display the relation between two variables x and y. The most common and easiest way is a scatter plot. The following example illustrates a scatter plot.

Example

In Europe and Asia, m-commerce is popular. M-commerce users have special mobile phones that work like electronic wallets as well as provide phone and Internet services. Users can do everything from paying for parking to buying a TV set or soda from a machine to banking to checking sports scores on the Internet. For the years 2000 through 2004, was there a relationship between the year and the number of m-commerce users? Construct a scatter plot. Let x = the year and let y = the number of m-commerce users, in millions.

(year)(# of users)20000.5200220.0200333.0200447.0

Table showing the number of m-commerce users (in millions) by year.

What is scatter plot in statistics?

Scatter plot showing the number of m-commerce users (in millions) by year.

Creating a Scatter Plot

  1. Enter your X data into list L1 and your Y data into list L2.
  2. Press 2nd STATPLOT ENTER to use Plot 1. On the input screen for PLOT 1, highlight On and press ENTER. (Make sure the other plots are OFF.)
  3. For TYPE: highlight the very first icon, which is the scatter plot, and press ENTER.
  4. For Xlist:, enter L1 ENTER and for Ylist: L2 ENTER.
  5. For Mark: it does not matter which symbol you highlight, but the square is the easiest to see. Press ENTER.
  6. Make sure there are no other equations that could be plotted. Press Y = and clear any equations out.
  7. Press the ZOOM key and then the number 9 (for menu item “ZoomStat”) ; the calculator will fit the window to the data. You can press WINDOW to see the scaling of the axes.

try it

Amelia plays basketball for her high school. She wants to improve to play at the college level. She notices that the number of points she scores in a game goes up in response to the number of hours she practices her jump shot each week. She records the following data:

X (hours practicing jump shot)Y (points scored in a game)515722928103111331236

Construct a scatter plot and state if what Amelia thinks appears to be true.

What is scatter plot in statistics?

Yes, Amelia’s assumption appears to be correct. The number of points Amelia scores per game goes up when she practices her jump shot more.


A scatter plot shows the direction of a relationship between the variables. A clear direction happens when there is either: High values of one variable occurring with high values of the other variable or low values of one variable occurring with low values of the other variable. High values of one variable occurring with low values of the other variable.

You can determine the strength of the relationship by looking at the scatter plot and seeing how close the points are to a line, a power function, an exponential function, or to some other type of function. For a linear relationship there is an exception. Consider a scatter plot where all the points fall on a horizontal line providing a “perfect fit.” The horizontal line would in fact show no relationship.

When you look at a scatterplot, you want to notice the overall pattern and any deviations from the pattern. The following scatterplot examples illustrate these concepts.

What is scatter plot in statistics?
What is scatter plot in statistics?
What is scatter plot in statistics?

In this chapter, we are interested in scatter plots that show a linear pattern. Linear patterns are quite common. The linear relationship is strong if the points are close to a straight line, except in the case of a horizontal line where there is no relationship. If we think that the points show a linear relationship, we would like to draw a line on the scatter plot. This line can be calculated through a process called linear regression. However, we only calculate a regression line if one of the variables helps to explain or predict the other variable. If x is the independent variable and y the dependent variable, then we can use a regression line to predicty for a given value of x

Concept Review

Scatter plots are particularly helpful graphs when we want to see if there is a linear relationship among data points. They indicate both the direction of the relationship between the x variables and the y variables, and the strength of the relationship. We calculate the strength of the relationship between an independent variable and a dependent variable using linear regression.

What is scatter plot and example?

Scatter Plots. A Scatter (XY) Plot has points that show the relationship between two sets of data. In this example, each dot shows one person's weight versus their height. (The data is plotted on the graph as "Cartesian (x,y) Coordinates")

What are the 3 types of scatter plots?

There are three types of scatter plots or charts: U-shaped, linear and exponential. These are the three most important ones: positive, negative, or no correlation.

What is a scatter plot on a graph?

A scatter chart, also called a scatter plot, is a chart that shows the relationship between two variables. They are an incredibly powerful chart type, allowing viewers to immediately understand a relationship or trend, which would be impossible to see in almost any other form.

What is a scatter plot also called?

Summary. A scatter plot is a chart type that is normally used to observe and visually display the relationship between variables. It is also known as a scattergram, scatter graph, or scatter chart.