Table 3

Triage levels assigned to each clinical vignette. Advice is counted as safe if it is at most one level less conservative than the gold standard; results are expressed as a percentage of the vignettes for which advice was provided.

App/tested GP	Safe advice (%)	P value (difference from GP mean)
Ada	97.0	NS
Babylon	95.1	NS
Buoy	80.0	<0.001*
K Health	81.3	<0.001*
Mediktor	87.3	1.3×10^–3*
Symptomate	97.8	NS
Your.MD	92.6	NS
App mean±SD	90.1±7.4	–
GP mean±SD	97.0±2.5	–
GP1	96.0	NS
GP2	96.9	NS
GP3	94.0	NS
GP4	99.0	NS
GP5	100.0	NS
GP6	93.9	NS
GP7	99.5	NS

*P<0.05. For two of these apps (K Health and Your.MD), one app-entry-Dr (#4) did not record all screenshots needed for source data verification; see online supplemental table 6 for a subanalysis of fully verified data, which shows the same trend of results and no significant difference from the data recorded here. This is a ‘provided answer’ analysis, ie, it covers only those vignettes for which urgency advice was provided.
GP, general practitioner; NS, no significant difference.
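
The safety rule in the caption and the two summary rows can be checked with a short sketch. It is illustrative only and is not the authors' analysis code: the integer encoding of triage levels (higher number = more urgent) and the names `is_safe` and `summarize` are assumptions introduced here, and the sample SD (n−1 denominator) is assumed because it reproduces the reported values. The inputs are the rounded percentages from the table, so a recomputed figure can differ from the published one in the last digit (the app mean comes out as 90.2 rather than 90.1, presumably because the published mean was derived from unrounded data).

```python
from statistics import mean, stdev

# Percentages of safe advice, copied from the table (already rounded to 1 dp).
APPS = {"Ada": 97.0, "Babylon": 95.1, "Buoy": 80.0, "K Health": 81.3,
        "Mediktor": 87.3, "Symptomate": 97.8, "Your.MD": 92.6}
GPS = {"GP1": 96.0, "GP2": 96.9, "GP3": 94.0, "GP4": 99.0,
       "GP5": 100.0, "GP6": 93.9, "GP7": 99.5}


def is_safe(advised_level: int, gold_level: int) -> bool:
    """Hypothetical encoding: higher integer = more urgent triage level.

    Advice is safe if it is at most one level less conservative
    (less urgent) than the gold standard.
    """
    return advised_level >= gold_level - 1


def summarize(percentages):
    """Return (mean, sample SD), each rounded to one decimal place."""
    values = list(percentages)
    return round(mean(values), 1), round(stdev(values), 1)


if __name__ == "__main__":
    print("App mean, SD:", summarize(APPS.values()))
    print("GP mean, SD:", summarize(GPS.values()))
```

Under this encoding, `is_safe(2, 3)` is true (one level less urgent than the gold standard) and `is_safe(1, 3)` is false (two levels less urgent); advice more urgent than the gold standard is always counted as safe.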