Businesses collect data. If your business is like most, you have a lot of data, yet only a small proportion of it is used in a meaningful way. Even so, it is not uncommon to find that the data required for a Business Improvement Project does not exist.
How much data should we collect? As a rule of thumb, 30 data points is a good starting point, although there are tools that make calculating sample sizes easy. The sample size also depends on the data type: continuous data (data measured on a continuous scale, e.g. weight or speed) requires smaller sample sizes than attribute data. Attribute data records whether an item has a quality characteristic (or attribute) that does or does not meet the product specification.
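If you prefer to calculate a sample size rather than rely on the rule of thumb, the standard normal-approximation formulas are a good place to start. One estimates a mean (continuous data) and the other estimates a proportion (attribute data), each to within a chosen margin of error. The sketch below is a minimal illustration, not the method built into any particular tool. The function names and the example inputs (a standard deviation of 2 g, a defect rate of 50%) are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_mean(sigma, margin, confidence=0.95):
    """Continuous data: n needed to estimate a mean to within +/- margin.

    Assumes the process standard deviation (sigma) is roughly known.
    """
    return math.ceil((z_value(confidence) * sigma / margin) ** 2)


def sample_size_proportion(p, margin, confidence=0.95):
    """Attribute data: n needed to estimate a proportion to within +/- margin.

    p = 0.5 gives the most conservative (largest) sample size.
    """
    return math.ceil(z_value(confidence) ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical inputs: bag weight with sigma = 2 g, estimated to within 1 g ...
print(sample_size_mean(sigma=2.0, margin=1.0))      # 16
# ... versus the share of defective bags, estimated to within 5 percentage points.
print(sample_size_proportion(p=0.5, margin=0.05))   # 385
```

The much larger attribute result is consistent with the point above that attribute data needs bigger samples. The exact numbers, however, depend on the margins and confidence level you choose.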
Suppose you want to investigate the quality of a bag of M&Ms. You could rate each M&M for defects such as chips in the candy coating, legibility of the “m” printed on each candy, or flat spots.
Attributes are often evaluated as either pass or fail, or are compared with visual standards.
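To make the pass/fail idea concrete, the minimal Python sketch below summarises hypothetical M&M inspection records into two common attribute measures: proportion defective and defects per unit. The records, the function name, and the rule that any single defect fails a candy are all assumptions made for illustration.

```python
# Hypothetical inspection results: one list per M&M, naming the defects found.
inspections = [
    [],
    ["chipped coating"],
    [],
    ["illegible m", "flat spot"],
    [],
    [],
    ["flat spot"],
    [],
]


def summarise_attribute_data(records):
    """Return (proportion defective, defects per unit).

    Assumption for this sketch: a candy fails if it has at least one defect.
    """
    units = len(records)
    defective = sum(1 for defects in records if defects)
    total_defects = sum(len(defects) for defects in records)
    return defective / units, total_defects / units


p_defective, dpu = summarise_attribute_data(inspections)
print(f"Proportion defective: {p_defective:.3f}")  # 3 of 8 candies fail
print(f"Defects per unit: {dpu:.3f}")              # 4 defects across 8 candies
```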
Once we have collected the data, we need to start analysing it. We usually complete two types of analysis: a Graphical Analysis and a Statistical Analysis. There are many graphical tools available, and which one to use depends on the type of data you have and the groups of data you want to compare.
Graphical Analysis
We always start with graphical analysis. It is important to ‘see’ what is happening in the data rather than relying on statistical analysis and numbers alone to tell us. Two data sets can have the same average and the same standard deviation, so the numbers suggest they behave the same, yet they can look very different when plotted on a time series graph.
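A quick way to convince yourself of this is to take one set of readings and record it in two different orders. The Python sketch below uses hypothetical data, and a crude text chart stands in for a proper time series graph in Excel or Minitab. Both series have an identical mean and standard deviation, but the chart shows that process B is drifting upwards while process A is not.

```python
from statistics import mean, stdev

# Hypothetical readings: the same twelve values recorded in two different orders.
process_a = [10, 12, 9, 11, 11, 9, 12, 10, 9, 11, 10, 12]  # bounces around a steady level
process_b = sorted(process_a)                               # same values, drifting upwards

for name, data in (("A", process_a), ("B", process_b)):
    print(f"Process {name}: mean = {mean(data):.2f}, stdev = {stdev(data):.2f}")

# A crude text time series chart: one row per period, bar length = reading.
for t, (a, b) in enumerate(zip(process_a, process_b), start=1):
    print(f"t={t:2d}  A {'#' * a:<12}  B {'#' * b}")
```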
5 Steps to complete a graphical analysis
1. Review the data performance over time. We like to start with a ‘Time Series Chart’, which shows how the data behaves over time. This is where Statistical Process Control (SPC) charts are particularly useful, because they allow us to differentiate between common cause and special cause variation. Common cause variation is the normal fluctuation that is always present in a process. Special cause variation, however, indicates that something ‘special’ or different has occurred in our process.
For instance, suppose an operator on a production line is not feeling well and is replaced by a less experienced operator, and this affects the quality of the process output. We would then see a deviation in the output. Because nobody expects this deviation, there is little chance that it, or its cause, would be noticed by casual observation. By using SPC charts we have a good chance of detecting it.
We want to look for deviations and then investigate why they occurred. Note that steps three and four below are not used for every data set. They are only used when the data types suit that analysis.
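The control chart idea in step 1 can be sketched with an individuals (I) chart. This minimal Python example uses the widely taught limits of centre ± 2.66 × average moving range, which corresponds to 3 sigma with sigma estimated as MR-bar / 1.128. The fill-weight data (a stable baseline, then the period after the less experienced operator takes over) and the function names are hypothetical. The sketch checks only one rule, a point beyond a control limit, whereas SPC software usually offers additional run rules.

```python
from statistics import mean


def individuals_limits(baseline):
    """Lower limit, centre line and upper limit for an individuals (I) chart.

    Sigma is estimated as MR-bar / 1.128, so the 3-sigma limits become
    centre +/- 2.66 * MR-bar, where MR-bar is the average moving range.
    """
    centre = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = mean(moving_ranges)
    return centre - 2.66 * mr_bar, centre, centre + 2.66 * mr_bar


def points_outside_limits(data, lcl, ucl):
    """Indices of points beyond a control limit (the simplest special-cause rule)."""
    return [i for i, x in enumerate(data) if x < lcl or x > ucl]


# Hypothetical fill weights (g): a stable baseline period, then the period
# after the less experienced operator takes over.
baseline = [50.2, 49.8, 50.1, 50.4, 49.9, 50.0, 50.3, 49.7, 50.1, 50.0]
new_shift = [50.1, 49.9, 50.2, 48.9, 49.0, 50.0]

lcl, centre, ucl = individuals_limits(baseline)
print(f"LCL = {lcl:.2f}, centre = {centre:.2f}, UCL = {ucl:.2f}")
print("Possible special causes at positions:", points_outside_limits(new_shift, lcl, ucl))
```

Computing the limits from a stable baseline, rather than from data that already contains the suspected special cause, keeps the deviation from widening the limits it is being judged against.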
2. Review the data’s Average and Standard Deviation. Next we look at the data distribution by creating a Histogram and calculating the mean and standard deviation, usually in Excel or Minitab. Typically we are comparing either two data sets, or a single data set against a target. By comparing the means and standard deviations we can determine how different our data sets are.
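As a rough Python equivalent of the Excel or Minitab output for step 2, the sketch below prints the mean, the standard deviation and the offset from a target for two data sets, along with a simple text histogram of each. The cycle-time data, the 30-minute target and the one-minute bin width are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

TARGET = 30.0  # hypothetical target cycle time, minutes

# Hypothetical cycle times (minutes) for two shifts.
shift_a = [29.5, 30.2, 31.0, 29.8, 30.4, 30.9, 29.6, 30.1, 30.7, 30.3]
shift_b = [31.8, 28.4, 33.0, 30.9, 27.6, 32.5, 29.9, 34.1, 28.8, 31.2]


def text_histogram(data, bin_width=1.0):
    """Count readings into fixed-width bins, print one bar per bin, return the counts."""
    counts = Counter(int(x // bin_width) for x in data)
    for b in sorted(counts):
        low = b * bin_width
        print(f"{low:5.1f} to {low + bin_width:5.1f} | {'*' * counts[b]}")
    return counts


for name, data in (("Shift A", shift_a), ("Shift B", shift_b)):
    m, s = mean(data), stdev(data)
    print(f"{name}: mean = {m:.2f} (target {TARGET}), stdev = {s:.2f}, offset = {m - TARGET:+.2f}")
    text_histogram(data)
```

Here both shifts sit close to the target on average, but shift B is far more spread out. That difference is exactly what comparing standard deviations and histograms is meant to reveal.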
3. Determine Differences: Pareto charts are a commonly used technique for determining differences. Given their wide acceptance and use, we assume readers are familiar with the Pareto Chart. However, Pareto Charts are not the only tool that can be used to determine differences.
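For reference, the table behind a Pareto Chart is simple to compute: sort the categories from most to least frequent and add a cumulative percentage. The Python sketch below uses hypothetical defect counts built on the M&M defect types mentioned earlier, and the function name is illustrative.

```python
# Hypothetical defect tallies from inspecting a batch of M&Ms.
defect_counts = {
    "flat spot": 9,
    "chipped coating": 42,
    "other": 6,
    "illegible m": 18,
}


def pareto_table(counts):
    """Sort categories from most to least frequent and add cumulative percentages."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, 100 * running / total))
    return rows


for category, count, cum_pct in pareto_table(defect_counts):
    print(f"{category:<16}{count:>4}{cum_pct:>8.1f}%")
```

With these made-up counts, the top two categories account for 80% of all defects, which is where an improvement project would normally focus first.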
Box plots are non-parametric: they display variation in samples of data without making any assumptions about the underlying statistical distribution. The spacing between the different parts of the box indicates the degree of dispersion (spread) and skewness in the data, and the plot also shows outliers. This is one of our favourite graphical tools. We are looking for differences between our data sets. For instance, we might be comparing monthly sales data for 3 different regions: Asia, North America and Europe. Any differences we find can guide our investigation into the reasons behind the distributions. Note: Box plots can be created for an individual variable or for a variable by group (sales region in the example above).
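The numbers a box plot draws can be computed directly. The Python sketch below works out the quartiles, the whisker ends and any outliers for each region, using the common 1.5 × IQR rule for outliers and `statistics.quantiles` (Python 3.8 and later) for the quartiles. The sales figures are hypothetical. Excel, Minitab and Python do not all use the same quartile convention, so the quartiles may differ slightly between tools.

```python
from statistics import quantiles

# Hypothetical monthly sales (thousands) for the three regions in the example.
sales = {
    "Asia": [120, 135, 128, 142, 131, 125, 138, 129, 133, 127, 210, 130],
    "North America": [150, 162, 158, 171, 149, 166, 175, 160, 155, 168, 172, 159],
    "Europe": [95, 110, 102, 98, 125, 104, 99, 112, 101, 107, 96, 103],
}


def box_plot_summary(data):
    """Quartiles, whisker ends and outliers using the common 1.5 x IQR rule."""
    q1, med, q3 = quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


for region, values in sales.items():
    print(region, box_plot_summary(values))
```

The single unusually high month in Asia is flagged as an outlier. That is the kind of difference worth investigating before drawing conclusions about the regions.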
4. Determine Relationships: Scatter Plots are used to investigate relationships when both the input and output variables are continuous. The data is displayed as a collection of points, each with the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. If a relationship exists, we would complete a correlation analysis or, better still, a regression analysis.
5. Determine Hypotheses: Using the findings from the Differences and Relationships graphical analysis, we now develop a Hypothesis Test Plan to test Hypotheses (or theories) about what is causing the relationships or differences in your data. Hypothesis testing uses statistics to assess whether the data provide enough evidence to support a given hypothesis. Examples of typical hypotheses (or theories) could be: Sales are higher in Region A than Region B; Shift B produces more than Shift A; The variation in paint thickness is greater using Spray Gun A than Spray Gun B; Expenditure during January to June is lower than July to December.
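As a sketch of testing the first example hypothesis (Sales are higher in Region A than Region B), the Python below runs a one-sided permutation test. This is a swapped-in technique. In Minitab you would more likely use a 2-sample t-test, but a permutation test needs only the standard library and does not assume normally distributed data. The sales figures, the function names and the choice of 10,000 resamples are hypothetical. A small p-value (conventionally below 0.05) suggests the observed difference is unlikely to be due to chance alone.

```python
import random


def mean_of(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_resamples=10_000, seed=1):
    """One-sided permutation test of the hypothesis mean(a) > mean(b).

    Returns the observed difference in means and an estimated p-value: the share
    of random relabellings whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so results are repeatable
    observed = mean_of(a) - mean_of(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = mean_of(pooled[:len(a)]) - mean_of(pooled[len(a):])
        if diff >= observed:
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical monthly sales (thousands) for Region A and Region B.
region_a = [52, 48, 55, 60, 57, 53, 58, 61, 54, 56]
region_b = [47, 50, 45, 49, 52, 46, 48, 51, 44, 50]

diff, p_value = permutation_test(region_a, region_b)
print(f"Observed difference: {diff:.1f}, one-sided p-value: {p_value:.4f}")
```

The logic is that if region made no difference to sales, shuffling the region labels should often produce differences as large as the one observed. When it rarely does, the data supports the hypothesis.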
Statistical Analysis is an area that many students find difficult. However, the theory is not as complicated as people anticipate, and the practical application is made easier by the software programs available today. We like to use Minitab.
If you apply the five steps outlined in this article, you will have a better understanding of what is happening in your process. It might force you to ask more questions, increasing your knowledge and getting you closer to understanding the critical levers driving performance in the process.