Table 2

Metrics used in comparison of condition-suggestion accuracy

Abbrev.	Full name	Description
M1 (%)	M1 (Matching-1) accuracy	% of cases where the top-1 condition-suggestion matches the gold-standard main diagnosis.7
M3 (%)	M3 (Matching-3) accuracy	% of cases where the top-3 condition-suggestions contain the gold-standard main diagnosis.7
M5 (%)	M5 (Matching-5) accuracy	% of cases where the top-5 condition-suggestions contain the gold-standard main diagnosis.
COMP (%)	Comprehensiveness	Ratio of the (number of gold standard differentials matched by the suggested differentials) to the (number of gold standard differentials for the vignette), expressed as a mean across all vignettes.13
RELE (%)	Relevance	Ratio of the (number of suggested differentials that match any of the gold standard differentials for the vignette) to the (number of differentials provided by the tested-GP or the symptom assessment app for the vignette), expressed as a mean across all vignettes.13
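
The definitions in Table 2 can be illustrated with a minimal sketch. It is not the study's actual scoring procedure: it assumes that a "match" is exact string equality between condition names, and that a vignette with no suggestions contributes a relevance of 0. The function names and the example vignettes are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def top_k_accuracy(suggestions: Sequence[List[str]],
                   gold_main: Sequence[str], k: int) -> float:
    """M-k (%): share of vignettes whose top-k suggestions contain the
    gold-standard main diagnosis. Exact string match is an assumption."""
    hits = sum(1 for s, g in zip(suggestions, gold_main) if g in s[:k])
    return 100.0 * hits / len(gold_main)


def comprehensiveness(suggestions: Sequence[List[str]],
                      gold_diffs: Sequence[List[str]]) -> float:
    """COMP (%): per vignette, matched gold differentials divided by all
    gold differentials; then averaged across vignettes."""
    ratios = [len(set(s) & set(g)) / len(set(g))
              for s, g in zip(suggestions, gold_diffs)]
    return 100.0 * sum(ratios) / len(ratios)


def relevance(suggestions: Sequence[List[str]],
              gold_diffs: Sequence[List[str]]) -> float:
    """RELE (%): per vignette, suggested differentials that match any gold
    differential divided by all suggested differentials; then averaged."""
    ratios = []
    for s, g in zip(suggestions, gold_diffs):
        if not s:
            # Assumption: an empty suggestion list scores 0 for relevance.
            ratios.append(0.0)
            continue
        gold = set(g)
        ratios.append(sum(1 for x in s if x in gold) / len(s))
    return 100.0 * sum(ratios) / len(ratios)


# Two hypothetical vignettes (illustrative data only, not from the study).
suggestions = [
    ["migraine", "tension headache", "sinusitis"],
    ["appendicitis", "gastritis"],
]
gold_main = ["tension headache", "appendicitis"]
gold_diffs = [
    ["tension headache", "migraine"],
    ["appendicitis", "gastroenteritis", "ovarian cyst", "urinary tract infection"],
]

m1 = top_k_accuracy(suggestions, gold_main, k=1)   # 50.0
m3 = top_k_accuracy(suggestions, gold_main, k=3)   # 100.0
comp = comprehensiveness(suggestions, gold_diffs)  # 62.5
rele = relevance(suggestions, gold_diffs)
```

In this example the first vignette matches both gold differentials (comprehensiveness 2/2) but only two of its three suggestions are relevant (2/3), which shows how COMP rewards coverage of the gold standard while RELE penalises irrelevant suggestions.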