In this class, we will practice administering one of the classic questionnaires in usability, the SUS (System Usability Scale).
Questionnaires like the SUS and the NASA-TLX (Task Load Index) are widely used to assess how users perceive a user interface. They are useful for gathering numerical data from a medium to large group of users quickly. They are particularly useful for comparing different interactive systems, situations, or user types, because you can use statistical techniques to assess differences between samples.
In today’s tutorial, you will do a mock survey with a user interface and the SUS. You will collect data together with others in the class and calculate descriptive statistics, generate plots, and perform significance testing using Python.
NOTE: Bring your computer to class!
In this class, you will:
The tutor will bring up the pre-class responses on the big screen and lead you in a discussion. Some questions might be:
Your tutor will allocate a technology for you to evaluate, along with a task to complete using that technology. For example:
Your tutor will also give you:
- A participant identifier (participant ID)
- A paper copy of the SUS questionnaire
In pairs:
1. Take turns acting as the user and the researcher.
2. As the researcher:
   - Welcome the user to the study.
   - Ask for their consent to participate.
   - Give the task instructions (e.g., “Please complete the task, then fill in the SUS questionnaire to rate your experience”).
3. Make sure the user:
   - Completes the task.
   - Records their answers on the SUS questionnaire.
4. Ensure the participant ID is written on the completed questionnaire. You will need this for the data entry step.
Your tutor will provide a shared spreadsheet for the
whole class to enter results.
This will allow us to compare SUS scores across groups to see which
technology had better or worse usability.
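Before analysing the shared spreadsheet, it can help to check the entries for typos. The following is a minimal sketch of such a check, not part of the class materials. It assumes the spreadsheet is exported as CSV with a `group` column and item columns `SUS1` to `SUS10` (the names used by the analysis code in this tutorial); the `participant_id` column name, the sample rows, and the helper `find_entry_problems` are hypothetical.

```python
import csv
import io

# "group" and "SUS1".."SUS10" follow the analysis code in this tutorial;
# "participant_id" is a hypothetical name for the participant ID column.
SUS_ITEMS = [f"SUS{i}" for i in range(1, 11)]

# Two made-up rows standing in for the class spreadsheet (not real data).
SAMPLE_CSV = """participant_id,group,SUS1,SUS2,SUS3,SUS4,SUS5,SUS6,SUS7,SUS8,SUS9,SUS10
P01,A,4,2,5,1,4,2,5,1,4,2
P02,B,3,3,6,2,,4,3,3,2,4
"""


def find_entry_problems(csv_text):
    """Return (participant_id, item, value) for every invalid SUS response."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for item in SUS_ITEMS:
            value = (row.get(item) or "").strip()
            # Raw SUS responses must be whole numbers from 1 to 5.
            if value not in {"1", "2", "3", "4", "5"}:
                problems.append((row["participant_id"], item, value))
    return problems


problems = find_entry_problems(SAMPLE_CSV)
for pid, item, value in problems:
    print(f"{pid}: {item} has invalid value {value!r}")
```

In this made-up sample, participant P02 has an out-of-range answer for item 3 and a blank answer for item 5, so both are reported and can be checked against the paper questionnaire.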
Individually:
Go to Google Colaboratory and start a New Notebook: https://colab.google/
In Colab, drag the class data spreadsheet file into the Files pane.
In a new code cell, load your data into a DataFrame:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# --- Load ---
df = pd.read_csv("sus_dummy_two_groups.csv") # replace with your file name
SUS = [f"SUS{i}" for i in range(1, 11)]
df  # show the DataFrame

Recode the positively worded SUS items (items 1, 3, 5, 7, and 9) by subtracting 1 from each response, so that their values range from 0 (“Strongly Disagree”) to 4 (“Strongly Agree”):
POS = ["SUS1","SUS3","SUS5","SUS7","SUS9"]
df[POS] = df[POS] - 1

Reverse code the negatively worded items (for the SUS, these are the
even-numbered items: 2, 4, 6, 8, 10) by subtracting each response from 5.
This makes the scale consistent so that higher numbers always indicate
better usability.
NEG = ["SUS2","SUS4","SUS6","SUS8","SUS10"]
df[NEG] = 5 - df[NEG]

Calculate the SUS score for each participant
Make sure all 10 items have been recoded to the 0–4 scale before this
step. We’ll remove any rows with missing items, then sum the items
(0–40) and scale to 0–100.
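To see the arithmetic on a single questionnaire before applying it to the whole DataFrame, the recoding and scaling steps can be sketched in plain Python. The function name `sus_score` and the example answers are made up for illustration.

```python
def sus_score(responses):
    """Score one respondent's ten raw SUS answers (each 1-5) on the 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += answer - 1  # odd items are positively worded: 1..5 -> 0..4
        else:
            total += 5 - answer  # even items are negatively worded: 1..5 -> 4..0
    return total * 2.5  # recoded sum is 0-40, so multiply by 2.5 for 0-100


example = [4, 2, 5, 1, 4, 2, 5, 1, 4, 2]  # made-up answers to items 1..10
print(sus_score(example))  # 85.0
```

Here the odd items contribute 3 + 4 + 3 + 4 + 3 = 17 and the even items contribute 3 + 4 + 3 + 4 + 3 = 17, giving 34 × 2.5 = 85. The DataFrame code that follows does the same calculation for every row at once.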
# Remove rows with missing SUS items
df = df.dropna(subset=SUS)
# Sum (0–40) and scale to 0–100
df["SUS_score"] = df[SUS].sum(axis=1) * 2.5
# (Optional) quick check of the results
print(df["SUS_score"].describe())

Get the descriptive statistics
Find the minimum, maximum, mean, and standard deviation of the SUS
scores for each group.
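As a rough illustration of what these four numbers are, here is a standard-library sketch on made-up scores (the group labels and values are not class data). It uses the sample standard deviation (dividing by n − 1), which is also what pandas reports as `std`.

```python
import statistics

# Made-up SUS scores for two hypothetical groups (not class data).
scores_by_group = {
    "A": [85.0, 77.5, 90.0],
    "B": [70.0, 62.5, 72.5],
}

summary = {}
for group, scores in scores_by_group.items():
    summary[group] = {
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample SD (n - 1 in the denominator)
    }

for group, stats_for_group in summary.items():
    print(group, {k: round(v, 2) for k, v in stats_for_group.items()})
```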
# --- Descriptive statistics ---
print("\nDescriptive stats by group:")
print(df.groupby("group")["SUS_score"].describe().round(2))

Plot a histogram of your data
Look at the shape of the distribution for each group.
Is the data evenly spread, skewed, or clustered?
# --- Histogram ---
df["SUS_score"].hist(by=df["group"], bins=10, edgecolor="black", layout=(1, 2))
plt.suptitle("Distribution of SUS Scores by Group")
plt.show()

Create a boxplot
Compare the median, quartiles, and range of SUS scores for each
group.
Look for any outliers (points that sit far from the rest of the
data).
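By default, matplotlib boxplots (which pandas uses for plotting) draw a point as an outlier when it lies more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule by hand to made-up scores; the function name `boxplot_outliers` is hypothetical, and the quartile method is chosen so that it should match NumPy's default linear interpolation.

```python
import statistics


def boxplot_outliers(scores, whis=1.5):
    """Return the scores outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    # method="inclusive" interpolates linearly between data points.
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return [s for s in scores if s < lower or s > upper]


# Made-up SUS scores with one unusually low value (not class data).
scores = [30.0, 65.0, 70.0, 72.5, 75.0, 77.5, 80.0, 85.0]
print(boxplot_outliers(scores))  # [30.0]
```

For these scores, Q1 = 68.75 and Q3 = 78.125, so the fences sit at 54.69 and 92.19 and only the score of 30 falls outside them. An outlier flagged this way is a prompt to check the questionnaire and data entry, not a reason to delete the response automatically.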
# --- Boxplot ---
df.boxplot(column="SUS_score", by="group")
plt.title("SUS Scores by Group")
plt.suptitle("")
plt.ylabel("SUS (0–100)")
plt.show()

Compare the findings
Use Welch’s t-test to check whether there is a statistically significant
difference in SUS scores between the two groups.
The output will also show which group had the higher average
score.
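To show what the test computes, here is a hedged sketch of the Welch t statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula), using only the standard library and made-up scores. The function name `welch_t` is hypothetical. The sketch stops short of the p-value, which needs the t distribution and is left to `scipy.stats.ttest_ind` in the class code.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean: sample variance / n.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; usually not a whole number.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, dof


# Made-up SUS scores for two hypothetical groups (not class data).
group_a = [80.0, 85.0, 90.0, 75.0, 70.0]
group_b = [60.0, 65.0, 70.0, 55.0]
t_value, dof = welch_t(group_a, group_b)
print(f"t = {t_value:.2f}, df = {dof:.2f}")
```

The degrees of freedom are what appear in brackets when a result is reported as t(df) = value. Unlike the classic Student's t-test, Welch's version does not assume the two groups have equal variances, which is why the degrees of freedom are estimated from the data.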
Interpretation guide:
- p < 0.05: The difference is considered statistically significant (a difference this large would be unlikely if chance alone were at work).
- p >= 0.05: The difference is not statistically significant (it could be due to random variation).

# --- Between-groups comparison (Welch's t-test) ---
# Keep each group's name with its scores so labels and data cannot get mismatched
# (groupby sorts the group labels, whereas unique() returns them in order of appearance)
groups = {name: g["SUS_score"].dropna().values for name, g in df.groupby("group")}
if len(groups) == 2:
    group_names = list(groups)
    g1, g2 = groups.values()
    # Welch's t-test (does not assume equal variances)
    t = stats.ttest_ind(g1, g2, equal_var=False)
    # Means for each group
    mean_g1, mean_g2 = np.mean(g1), np.mean(g2)
    print(f"Welch's t-test: t = {t.statistic:.2f}, p = {t.pvalue:.3f}")
    print(f"Mean SUS for {group_names[0]}: {mean_g1:.2f}")
    print(f"Mean SUS for {group_names[1]}: {mean_g2:.2f}")
    # Interpret significance
    if t.pvalue < 0.05:
        print("Result: Statistically significant difference (p < 0.05).")
    else:
        print("Result: No statistically significant difference (p >= 0.05).")
    # Which group scored higher
    if mean_g1 > mean_g2:
        print(f"{group_names[0]} had higher usability scores.")
    elif mean_g2 > mean_g1:
        print(f"{group_names[1]} had higher usability scores.")
    else:
        print("Both groups had the same average score.")
else:
    print("Need exactly two groups for comparison.")

Summarise your findings in plain language
After running the t-test and checking your descriptive statistics, write
a short summary that anyone could understand.
Post your summary in the class thread!
Example reporting template:
The mean SUS score for Group 1 (Technology A) was 82.3, which falls in the “Excellent” range.
The mean SUS score for Group 2 (Technology B) was 71.5, which falls in the “Good” range.
A Welch's t-test found that the difference was statistically significant (t(24) = 2.30, p = 0.03), indicating that Technology A was rated as significantly more usable for the given task.
This suggests that, for this context and task, Technology A may offer a better user experience than Technology B.

Your tutor will lead you in a discussion about what you learned from using the SUS and the analysis process.
This activity is a quick introduction to using the
System Usability Scale (SUS) and basic statistical comparison in
Python.
In real usability studies, statistical tests have specific
conditions and assumptions that must be checked before deciding
which analysis is appropriate.
For example:
We have skipped these detailed checks in this exercise to focus on learning the mechanics of:
In practice, you should: