RT Journal Article
SR Electronic
T1 Cohort profile for development of machine learning models to predict healthcare-related adverse events (Demeter): clinical objectives, data requirements for modelling and overview of data set for 2016–2018
JF BMJ Open
JO BMJ Open
FD British Medical Journal Publishing Group
SP e070929
DO 10.1136/bmjopen-2022-070929
VO 13
IS 8
A1 Artemova, Svetlana
A1 von Schenck, Ursula
A1 Fa, Rui
A1 Stoessel, Daniel
A1 Nowparast Rostami, Hadiseh
A1 Madiot, Pierre-Ephrem
A1 Januel, Jean-Marie
A1 Pagonis, Daniel
A1 Landelle, Caroline
A1 Gallouche, Meghann
A1 Cancé, Christophe
A1 Olive, Frederic
A1 Moreau-Gaudry, Alexandre
A1 Prieur, Sigurd
A1 Bosson, Jean-Luc
YR 2023
UL http://bmjopen.bmj.com/content/13/8/e070929.abstract
AB Purpose: In-hospital healthcare-related adverse events (HAEs) are a major concern for hospitals worldwide. In high-income countries, approximately 1 in 10 patients experience HAEs associated with their hospital stay. Estimating each patient's individual risk of an HAE as accurately as possible is one of the first steps towards improving patient outcomes. Risk assessment can enable healthcare providers to direct resources to the patients in greatest need by adapting processes and procedures. Electronic health data facilitate the application of machine-learning methods to risk analysis. We aim, first, to reveal correlations between HAE occurrence and patients’ characteristics and/or the procedures they undergo during their hospitalisation and, second, to build models that allow early identification of patients at elevated risk of an HAE.
Participants: 143 865 adult patients hospitalised at Grenoble Alpes University Hospital (France) between 1 January 2016 and 31 December 2018.
Findings to date: In this set-up phase of the project, we describe the preconditions for big data analysis using machine-learning methods.
We present an overview of the retrospective, de-identified, multisource data for a 3-year period, extracted from the hospital’s Clinical Data Warehouse, together with social determinants of health data from the National Institute of Statistics and Economic Studies, to be used for machine learning (artificial intelligence) training and validation. The information system will not require medical staff to provide any supplementary information or evaluation for risk assessment.
Future plans: We are using this data set to develop predictive models for several general HAEs, including secondary intensive care admission, prolonged hospital stay, 7-day and 30-day re-hospitalisation, nosocomial bacterial infection, hospital-acquired venous thromboembolism, and in-hospital mortality.
No data are available. The data in this cohort data set are not currently publicly available: although de-identified, they contain the complete individual medical files of all patients included in the data set and therefore, for ethical and data privacy reasons, cannot be exported to others. Nevertheless, procedures governing the conditions of access to selected de-identified data for academic use by third parties (under a signed data access agreement) are currently under discussion. Requests should be made to Grenoble Alpes University Hospital (contact Professor Jean-Luc Bosson: jlbosson@chu-grenoble.fr).