Safety of the triage levels assigned to each clinical vignette, where safe is defined as at most one level less conservative than the gold standard, expressed as a percentage of the vignettes for which advice was provided.
App/tested GP | Percentage of safe advice | P value (difference to GP mean) |
Ada | 97.0 | NS |
Babylon | 95.1 | NS |
Buoy | 80.0 | <0.001* |
K Health | 81.3 | <0.001* |
Mediktor | 87.3 | 1.3×10⁻³* |
Symptomate | 97.8 | NS |
Your.MD | 92.6 | NS |
App mean±SD | 90.1±7.4 | – |
GP mean±SD | 97.0±2.5 | – |
GP1 | 96.0 | NS |
GP2 | 96.9 | NS |
GP3 | 94.0 | NS |
GP4 | 99.0 | NS |
GP5 | 100.0 | NS |
GP6 | 93.9 | NS |
GP7 | 99.5 | NS |
*P<0.05. For two of these apps (K Health and Your.MD), one app-entry-Dr (#4) did not record all screenshots needed for source data verification; see online supplemental table 6 for a subanalysis of fully verified data, which shows the same trend of results and no significant difference from the data recorded here. This is a ‘provided answer’ analysis, that is, it covers only those vignettes for which urgency advice was provided.
GP, general practitioner; NS, no significant difference.
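
The safety definition in the caption can be illustrated with a minimal sketch. It assumes, purely for illustration, that triage levels are encoded as integers in which a higher number means more urgent (more conservative) advice, and that a vignette with no advice is recorded as None. The function names and the encoding are hypothetical and are not taken from the study.

```python
from typing import Iterable, Optional, Tuple


def is_safe(advised_level: int, gold_level: int) -> bool:
    """Advice is safe if it is at most one level less conservative than gold.

    Assumption: higher integer = more urgent = more conservative.
    """
    return advised_level >= gold_level - 1


def percent_safe(pairs: Iterable[Tuple[Optional[int], int]]) -> float:
    """Percentage of safe advice among vignettes where advice was provided.

    Each pair is (advised_level, gold_level); advised_level is None when
    no urgency advice was given, and such vignettes are excluded
    (a 'provided answer' analysis).
    """
    answered = [(a, g) for a, g in pairs if a is not None]
    if not answered:
        raise ValueError("no vignettes with advice provided")
    safe_count = sum(is_safe(a, g) for a, g in answered)
    return 100.0 * safe_count / len(answered)


# Hypothetical example data, not from the study.
example = [(3, 3), (2, 3), (1, 3), (None, 2), (4, 2)]
result = percent_safe(example)
```

In this made-up example, four vignettes received advice and three of them fall within one level of the gold standard or above it, so the function returns 75.0. Over-conservative advice counts as safe under this definition.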