# Using Regression Analysis in market research

When measuring the health of customer relationships, three metrics are at the core of most studies: customer satisfaction, customer loyalty (likelihood of choosing supplier at next purchase) and customer advocacy (likelihood of recommending supplier to others).

However, these metrics alone are not enough. They provide a snapshot of customer health but don’t in and of themselves reveal how to improve the position. Two approaches can take our understanding to that next level.

One option is to ask customers directly why they are or aren’t satisfied, loyal or advocates. This can be revealing but often people struggle to provide accurate guidance on their motivations:

- They may never have contemplated their motivations, giving superficial responses
- They may find their motivations hard to articulate
- They may give undue weight to ‘rational’ factors such as price, especially in B2B markets

So rather than asking customers directly, an alternative approach is to apply a statistical method called Regression Analysis to *deduce* what really matters.

**Regression Analysis explained**

Regression Analysis comes in a variety of ‘flavours’ each best suited for a particular situation, e.g. Linear Regression, Stepwise Regression, Ridge Regression. Regardless of the flavour though, ‘variables’ – things that can vary or change – are always at the core. More specifically:

- The ‘dependent’ variable is the thing we’re interested in moving, e.g. customer satisfaction score or Net Promoter Score (NPS)
- ‘Independent’ variables are things that we think might drive a change in the dependent variable, e.g. we could hypothesise that high quality customer service leads to high levels of overall satisfaction

Regression Analysis looks for relationships between these variables. To do so it ‘freezes’ all independent variables bar one and then identifies the impact a change in this one variable has on the dependent variable. This is then repeated for each independent variable in turn. The result is that we’re able to identify the power of each independent variable in moving the dependent variable.
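As a rough illustration of what the method does, the sketch below fits a multiple linear regression by ordinary least squares using only the Python standard library. The ratings are made up, and overall satisfaction is deliberately constructed as 1 + 0.5 × product + 0.2 × support so that the fit should recover values close to those numbers. The helper names `solve` and `fit_ols` are hypothetical and not part of any statistics package.

```python
def solve(a, b):
    """Solve the linear system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to keep the arithmetic stable
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] that minimise the squared prediction error."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up ratings: (satisfaction with product, satisfaction with support)
ratings = [(8, 6), (5, 7), (9, 4), (3, 5), (7, 9), (6, 3)]
# Overall satisfaction built from a known rule so the result can be checked
overall = [1 + 0.5 * product + 0.2 * support for product, support in ratings]

intercept, b_product, b_support = fit_ols(ratings, overall)
print(round(intercept, 6), round(b_product, 6), round(b_support, 6))
```

Each fitted value is the estimated change in overall satisfaction for a one-point change in that rating with the other rating held constant, which is the 'freezing' idea described above. Real survey data is noisy, so in practice the fit is approximate rather than exact.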

**Interpreting the Regression Analysis output**

You could run this analysis yourself using software such as Excel or SPSS, or you might choose to use a professional statistician. Either way, you’ll need to interpret the output and four numbers are especially important here.

The first two numbers relate to the regression model itself:

- Is the model really telling us anything? The F-value measures the statistical significance of the model. Typically an F-value with a significance less than 0.05 is considered statistically meaningful and therefore we can be confident that the outputs from the analysis are not due to chance alone
- How accurate is the model? The R-Squared (or the Adjusted R-Squared) shows how much of the movement in the dependent variable is explained by the independent variables. For example, an R-Squared value of 0.8 means that 80% of the movement in the dependent variable can be explained by the independent variables tested. That means it would be highly predictive and could be said to be accurate
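For readers who want to see where these two model-level numbers come from, here is a minimal sketch of the standard formulas for R-Squared, Adjusted R-Squared and the F-statistic. The function names and the sample size of 100 are assumptions for illustration only. Turning the F-statistic into a significance level requires the F distribution, which the Python standard library does not provide, so that step is left to a statistics package.

```python
def r_squared(actual, predicted):
    """Share of the variation in the dependent variable explained by the model."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_y) ** 2 for a in actual)                # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalise R-Squared for the number of independent variables (k) given n responses."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic of a model with k independent variables and n responses."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Toy check: predictions that sit close to the actual scores
toy_r2 = r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])

# An R-Squared of 0.8 with four independent variables and a hypothetical 100 responses
example_adjusted = adjusted_r_squared(0.8, n=100, k=4)
example_f = f_statistic(0.8, n=100, k=4)
print(round(toy_r2, 2), round(example_adjusted, 3), round(example_f, 1))
```

The larger the F-statistic for a given sample size and number of variables, the smaller its significance value will be.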

The other two critical numbers when interpreting a Regression Analysis relate to each of the independent variables:

- Does the variable really matter? Like the F-value, the P-value is a measure of statistical significance, but this time it indicates if the effect of the independent variable (rather than the model as a whole) is statistically significant. Again, a value lower than 0.05 is what you’re looking for
- How much impact does the variable have? If multiple independent variables have been tested (as is often the case), the co-efficient tells you how much the dependent variable is expected to increase by when the independent variable under consideration increases by one and all other independent variables are held at the same value. Sometimes the co-efficient is replaced with a standardised co-efficient, which is expressed in standard deviations rather than raw units and so shows the relative contribution of each independent variable in moving the dependent variable
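The link between a raw co-efficient and a standardised one can be sketched in a few lines. This assumes the common definition in which the raw co-efficient is rescaled by the ratio of the two standard deviations. The function name and the sample values are hypothetical.

```python
from statistics import stdev


def standardised_coefficient(coefficient, x_values, y_values):
    """Rescale a raw co-efficient so it is expressed in standard deviations."""
    return coefficient * stdev(x_values) / stdev(y_values)


# Made-up example in which y is exactly half of x, so the raw co-efficient is 0.5
x_scores = [2, 4, 6, 8]
y_scores = [1, 2, 3, 4]
beta = standardised_coefficient(0.5, x_scores, y_scores)
print(round(beta, 6))
```

Because standardised co-efficients share a common unit, they can be compared even when the independent variables were measured on different scales.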

**Regression Analysis in market research – an example**

So that’s an overview of the theory. Let’s now take a look at Regression Analysis in action using a real-life example.

Our goal in this study for a supplier of business software was to advise them on how to improve levels of customer satisfaction. To do so, we first conducted a series of in-depth interviews with delighted, content and dissatisfied customers to identify all the things which could potentially influence levels of satisfaction. We complemented this with some internal workshops with customer-facing staff to tap into their beliefs about what makes customers happy.

Using these insights as a basis we then created a structured survey which, amongst other things, asked 350 customers to rate their satisfaction in three respects using a 1 – 10 scale:

- Overall satisfaction with the supplier
- Satisfaction in regard to four high-level factors – product quality, consultancy on product use, technical support and quality of the relationship
- Satisfaction in regard to various sub-areas within these high-level factors, e.g. we broke technical support down into things like speed of response, expertise of the call handler, attitude of the call handler and ease of solving the issue

We first wanted to test a critical assumption – does customer satisfaction actually matter? After all, in many markets customers will remain loyal even if unhappy because the cost or effort of change is too high relative to the benefit. To establish this, we ran a simple correlation analysis between overall satisfaction and claimed loyalty. This resulted in a correlation co-efficient (R) of 0.79, which suggests that there is indeed a positive relationship between the two (as a rule of thumb, a correlation of between 0.5 and 0.7 suggests a strong relationship and anything above 0.7 suggests a very strong relationship).
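A correlation check of this kind is straightforward to reproduce. The sketch below computes a Pearson correlation co-efficient and applies the rule of thumb quoted above. The scores are invented for illustration and are not the study's data, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation co-efficient between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def describe(r):
    """Apply the article's rule of thumb to the size of a correlation."""
    size = abs(r)
    if size > 0.7:
        return "very strong"
    if size >= 0.5:
        return "strong"
    return "below the rule-of-thumb threshold for strong"


# Invented 1-10 ratings for six respondents
satisfaction = [9, 7, 4, 8, 3, 6]
loyalty = [8, 8, 5, 9, 2, 5]
r = pearson_r(satisfaction, loyalty)
print(round(r, 2), describe(r))
print(describe(0.79))  # the value reported in the study
```

Note that a correlation only shows that the two scores move together; it does not by itself show that one causes the other.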

Confident that improving overall levels of customer satisfaction would most likely yield commercial benefits, we then needed to understand how to achieve this. Here enters Regression Analysis. Using ‘overall satisfaction’ as the dependent variable and the four high-level factors as independent variables, we first sought to identify where the broad focus should be.

Before interpreting the output of our analysis, we needed to establish if the model was reliable and accurate. It passed with flying colours on both counts:

- The significance of the F-value was 0.00000000004. Anything under 0.05 is significant so this result shows that the model is highly reliable
- The Adjusted R-Squared was 0.87. Again, that gives confidence as it means that the model explains 87% of the movement in overall satisfaction

Happy that the model was reliable and accurate, we then turned to what it told us. Let’s take a look at how the four high level factors turned out:

| High-level factor | Co-efficient | P-value |
| --- | --- | --- |
| Satisfaction with product | 0.46 | <0.05 |
| Satisfaction with relationship | 0.20 | <0.05 |
| Satisfaction with consultancy | 0.20 | <0.05 |
| Satisfaction with technical support | 0.09 | <0.05 |

We can see that all of the factors have some impact on overall satisfaction and the P-values (all under 0.05) show that this finding is significant in a statistical sense. It’s also clear that ensuring satisfaction with the product itself is absolutely critical – for every 1-point increase in satisfaction with the product on our 1 – 10 scale, overall satisfaction increases by almost half of one point (0.46). Contrast this with technical support, where the same 1-point increase only delivers a 0.09 boost in overall satisfaction – around a fifth of what a 1-point increase in product satisfaction would deliver.
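The comparison between factors is simple arithmetic on the co-efficients reported for the four high-level factors. A short sketch, with hypothetical helper names, makes the relative sizes explicit:

```python
# Co-efficients reported for the four high-level factors
coefficients = {
    "product": 0.46,
    "relationship": 0.20,
    "consultancy": 0.20,
    "technical support": 0.09,
}


def uplift(factor, points=1.0):
    """Expected change in overall satisfaction for a given rise in one factor."""
    return coefficients[factor] * points


def relative_impact(factor, baseline="product"):
    """Impact of one factor expressed as a fraction of the baseline factor's impact."""
    return coefficients[factor] / coefficients[baseline]


ratio = relative_impact("technical support")
print(uplift("product"), round(ratio, 2))  # 0.09 / 0.46 is roughly 0.2
```

On these figures, a 1-point gain in technical support satisfaction is worth roughly a fifth of the same gain in product satisfaction.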

We then ran a second Regression Analysis to identify how specifically to realise this gain. What areas of the product should we focus on to increase overall satisfaction?

Once again the first check was to make sure the generated model was accurate and reliable. With a significance of F well under 0.05 and an Adjusted R-Squared of 0.9, it was. The outputs for the six product factors tested were as follows:

| Product factor | Co-efficient | P-value |
| --- | --- | --- |
| Reliability | 0.32 | <0.05 |
| Functionality | 0.22 | <0.05 |
| Value for money | 0.17 | <0.05 |
| Ease of use | 0.10 | <0.05 |
| Ease of integration | 0.09 | <0.05 |

At this point we know an awful lot. We know that:

- The more satisfied a customer is, the more likely they are to remain a customer
- Satisfaction with the product itself is most powerful in driving overall satisfaction
- Satisfaction with the product is in turn driven by its reliability, functionality and value

We now need to look at one more thing – are there actually low levels of satisfaction in these areas and, if so, where is remedial action most needed? To establish this, we can plot the importance (as measured by the co-efficient) of the high-level factors and the sub-factors against the satisfaction of customers in these areas.

This exercise shows the value of looking at customer satisfaction in the context of what matters most. After all, whilst it would be ideal to excel in every single area, in real-life limited budgets and resources mean that investment needs to be prioritised.

If we’d simply measured satisfaction in the four high-level areas, the conclusion would be to focus on technical support as this is a clear area of weakness. However, having complemented this understanding with a Regression Analysis we can see that the investment should really be in improving product quality as this is far more influential in driving customer satisfaction (which in turn is linked to loyalty and therefore commercial success). Likewise, investments in improving product quality should focus on enhancing reliability even though ease of integration is poor.
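One way to sketch this prioritisation logic is to set each factor's importance against its current satisfaction score. The co-efficients below are the ones reported for the high-level factors; the mean satisfaction scores are hypothetical placeholders chosen only to mirror the pattern described (technical support weakest), and the 'potential gain' measure is an illustrative assumption rather than the method used in the study.

```python
# Importance: co-efficients reported for the four high-level factors
coefficients = {
    "product": 0.46,
    "relationship": 0.20,
    "consultancy": 0.20,
    "technical support": 0.09,
}

# HYPOTHETICAL mean satisfaction scores on the 1-10 scale (not from the study)
mean_satisfaction = {
    "product": 7.4,
    "relationship": 7.9,
    "consultancy": 7.6,
    "technical support": 6.1,
}

# Looking at satisfaction alone points to the lowest-scoring area
weakest = min(mean_satisfaction, key=mean_satisfaction.get)

# Looking at importance points to the factor with the largest co-efficient
most_influential = max(coefficients, key=coefficients.get)


def potential_gain(factor, ceiling=10.0):
    """Assumed measure: remaining headroom on the scale weighted by importance."""
    return (ceiling - mean_satisfaction[factor]) * coefficients[factor]


ranking = sorted(coefficients, key=potential_gain, reverse=True)
print(weakest, most_influential, ranking)
```

With these placeholder scores, satisfaction alone would point to technical support, while weighting by importance puts the product first, which is the pattern the analysis describes.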
