*Result*: Artificial Intelligence Models for Predicting Triage in Emergency Departments: Seven-Month Retrospective Comparative Study of Natural Language Processing, Large Language Model, and Joint Embedding Predictive Architectures.

Title:
Artificial Intelligence Models for Predicting Triage in Emergency Departments: Seven-Month Retrospective Comparative Study of Natural Language Processing, Large Language Model, and Joint Embedding Predictive Architectures.
Authors:
Lansiaux E; Emergency Department, Lille University Hospital, Lille, France.
Azzouz R; Centre Antipoison, Lille University Hospital, Lille, France.; ULR 2694-METRICS, Lille University, Lille, France.
Chazard E; ULR 2694-METRICS, Lille University, Lille, France.; Department of Public Health, EA 2694, Lille University, Lille, France.
Vromant A; Emergency Department, Hôpital Pitié-Salpêtrière, Assistance Publique des Hôpitaux de Paris, Paris, France.
Wiel E; Emergency Department, Lille University Hospital, Lille, France.; ULR 2694-METRICS, Lille University, Lille, France.
Source:
JMIR medical informatics [JMIR Med Inform] 2026 Mar 10; Vol. 14, pp. e83318. Date of Electronic Publication: 2026 Mar 10.
Publication Type:
Journal Article; Comparative Study
Language:
English
Journal Info:
Publisher: JMIR Publications; Country of Publication: Canada; NLM ID: 101645109; Publication Model: Electronic; Cited Medium: Internet; ISSN: 2291-9694 (Electronic); Linking ISSN: 22919694; NLM ISO Abbreviation: JMIR Med Inform; Subsets: MEDLINE
Imprint Name(s):
Original Publication: Toronto : JMIR Publications, [2013]-
Contributed Indexing:
Keywords: FRENCH scale; Joint Embedding Predictive Architecture; artificial intelligence; clinical decision support; emergency department; large language model; natural language processing; triage
Entry Date(s):
Date Created: 20260310; Date Completed: 20260311; Latest Revision: 20260310
Update Code:
20260311
DOI:
10.2196/83318
PMID:
41805589
Database:
MEDLINE

*Further Information*

*Background: Triage errors in emergency departments (EDs), including undertriage and overtriage, pose significant risks to patient safety and resource allocation. With increasing patient volumes and staffing challenges, artificial intelligence (AI) integration into triage protocols has gained attention as a potential solution.
Objective: This study aims to develop and compare 3 AI models (natural language processing [NLP], large language model [LLM], and Joint Embedding Predictive Architecture [JEPA]) for predicting triage outcomes according to the French Emergency Nurses Classification in Hospital (FRENCH) scale and to assess their performance relative to nurse triage and clinical expert consensus.
Methods: We conducted a retrospective analysis of prospectively collected data from adult patients triaged at Roger Salengro Hospital ED (Lille, France) over 7 months (June-December 2024). Three AI models were developed: TRIAGEMASTER (NLP with Doc2Vec + MLP), URGENTIAPARSE (LLM with FlauBERT + Extreme Gradient Boosting [XGBoost]), and EMERGINET (JEPA with variance-invariance-covariance regularization). Of 73,236 ED visits, 657 (0.90%) had complete audio recordings and structured data. Data were split 80:20 into training and validation sets with stratification. Gold-standard labels were established by senior clinician consensus (minimum 5 years of ED experience). The primary outcome was concordance with the gold-standard FRENCH triage level, assessed using weighted κ, Spearman correlation, F1-score, area under the receiver operating characteristic (AUC-ROC) curve, mean absolute error (MAE), and root mean square error (RMSE). Secondary analyses evaluated Groupes d'Etude Multicentrique des Services d'Accueil (GEMSA) prediction and performance by input data type.
Results: URGENTIAPARSE demonstrated superior performance, with a composite z score of 2.514 compared with EMERGINET (0.438), TRIAGEMASTER (-3.511), and nurse triage (-4.343). URGENTIAPARSE achieved an F1-score of 0.900 (95% CI 0.876-0.924), an AUC-ROC of 0.879 (95% CI 0.851-0.907), a weighted κ of 0.800 (P<.001), a Spearman correlation of 0.802 (P<.001), an MAE of 0.228, and an RMSE of 0.790. Exact agreement was 90.0%, with near-agreement (+1 or -1 level) of 92.8%. However, training showed perfect accuracy (1.0) with poor validation performance (~0.5), indicating overfitting. EMERGINET achieved moderate performance (F1-score=0.731, AUC-ROC=0.686), while TRIAGEMASTER and nurse triage performed poorly (F1-score=0.618 and 0.303, respectively). For GEMSA prediction, URGENTIAPARSE maintained superiority (κ=0.863, Spearman=0.864, P<.001). Class 1 (highest acuity) was underrepresented (4/657, 0.61%), limiting undertriage risk assessment.
Conclusions: The LLM-based architecture (URGENTIAPARSE) demonstrated the highest accuracy for ED triage prediction among the tested models, outperforming traditional NLP, JEPA, and current nurse triage practices. However, severe overfitting, extreme selection bias (657/73,236, 0.90%, inclusion), a monocentric design, and sparse high-acuity representation limit clinical applicability. Before deployment, the model requires regularization, external validation across diverse EDs, prospective testing, and comprehensive safety evaluation, particularly for undertriage detection. Integration of AI triage support systems shows promise but demands rigorous validation, bias mitigation, and transparent uncertainty quantification to ensure patient safety.
(©Edouard Lansiaux, Ramy Azzouz, Emmanuel Chazard, Amélie Vromant, Eric Wiel. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 10.03.2026.)*
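
Illustrative note: the abstract reports ordinal agreement between predicted and gold-standard triage levels using weighted κ, MAE, RMSE, exact agreement, and near-agreement. The following is a minimal sketch of how such metrics are commonly computed, not the authors' code. It assumes integer levels 1 to 5 purely for illustration and quadratic κ weights by default; the abstract does not state the weighting scheme. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def weighted_kappa(truth, pred, levels, power=2):
    """Cohen's weighted kappa; power=1 is linear, power=2 is quadratic (assumed)."""
    n = len(truth)
    k = len(levels)
    index = {level: i for i, level in enumerate(levels)}
    observed = Counter((index[t], index[p]) for t, p in zip(truth, pred))
    row = Counter(index[t] for t in truth)  # gold-standard marginals
    col = Counter(index[p] for p in pred)   # prediction marginals
    disagreement = 0.0  # observed weighted disagreement
    expected = 0.0      # chance-expected weighted disagreement
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            disagreement += w * observed[(i, j)] / n
            expected += w * row[i] * col[j] / (n * n)
    return 1.0 - disagreement / expected


def mae(truth, pred):
    """Mean absolute error between ordinal levels."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


def rmse(truth, pred):
    """Root mean square error between ordinal levels."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def agreement(truth, pred, tolerance=0):
    """Share of cases whose predicted level is within `tolerance` of the truth."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(truth, pred))
    return hits / len(truth)


# Toy labels, invented for illustration only (not study data).
LEVELS = [1, 2, 3, 4, 5]
gold = [1, 2, 3, 3, 4, 5, 2, 3, 4, 5]
model = [1, 2, 3, 4, 4, 5, 2, 3, 3, 5]

if __name__ == "__main__":
    print("weighted kappa:", round(weighted_kappa(gold, model, LEVELS), 3))
    print("MAE:", mae(gold, model))
    print("RMSE:", round(rmse(gold, model), 3))
    print("exact agreement:", agreement(gold, model))
    print("near-agreement (within 1 level):", agreement(gold, model, tolerance=1))
```

With only 4 of 657 cases in class 1, aggregate scores like these are driven almost entirely by the lower-acuity levels, which is why the abstract treats undertriage risk as not yet assessable.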