Research · Feasibility Study

Multi-Source School-Health Signal Integration for Early Detection of Pediatric Mental-Health Deterioration

A synthetic-data feasibility study

12,000 synthetic students · four signal streams, one of them real-time · every claim cited to real literature

Abstract

School nurses, billing systems, and attendance offices each see a different facet of a child in distress: recurrent stomachaches [1], a behavioral-health referral that never converts to a visit [2], a creeping pattern of absence [3]. Anyone holding those records could fuse them. The signal almost no one holds is the language of the encounter itself, captured in near-real-time as it is spoken [4].
On a fully synthetic cohort of 12,000 students calibrated to national prevalence [5], we fuse the three record streams with a fourth, a per-encounter distress-language score, and ask what the real-time stream adds. The answer is most of the warning. The records anyone could assemble are weak and late on their own (records-only 30-day AUROC 0.59, median lead just 15 days). The real-time encounter signal lifts mean lead time by 38.84 days (95% CI [35.28, 42.45], paired p = 1.7e-63) and 60-day AUROC from 0.52 to 0.74. The full model reaches a 30-day AUROC of 0.84, squarely in the band of real EHR crisis prediction [6], and detects 77% of crises a median of 60 days out. The encounter is also the only stream that exposes localized contagion: a community event seen as a school-level surge before the cluster of crises that follows, invisible to a records-only model. The records give little; the encounter, and fusing it for coverage, is the result.

[1] Shannon RA, Bergren MD, Matthews A (2010). Frequent visitors: Somatization in school-age children and implications for school nurses. Children with medically unexplained recurrent somatic complaints (headache, stomachache) are disproportionate users of school-health resources; these complaints are linked to anxiety, depression, adverse childhood experiences, and school stress.
[2] Grupp-Phelan J, Delgado SV, Kelleher KJ (2007). Failure of psychiatric referrals from the pediatric emergency department. Only ~11% of children screening positive for mental-health problems completed psychiatric follow-up; referral non-completion is a large, measurable gap, even when access barriers are removed.
[3] Viner RM, Pearce A, Hope S (2026). The impact of school absence on mental health in children and young people: Analysis of an English national birth cohort. Persistent absence (>10% of the school year) associated with roughly doubled odds of later mental-health problems (Millennium Cohort; OR ~2.0-2.26 across ages 7/11/14).
[4] Tierney AA, Gayre G, Hoberman B, Mattern B, Ballesca M, Kipnis P, et al. (2024). Ambient Artificial Intelligence Scribes to Alleviate the Burden of Clinical Documentation. Establishes the mechanism our approach depends on: ambient AI scribes transcribe patient-clinician conversations in real time at scale (millions of encounters at Kaiser Permanente), making encounter language capturable; the study evaluates documentation burden, not distress scoring.
[5] Centers for Disease Control and Prevention (2024). Data and Statistics on Children's Mental Health (National Survey of Children's Health, 2022-2023). US base rates, ages 3-17: anxiety 11%, behavior/conduct 8%, depression 4%; ~21% ever diagnosed with a mental/emotional/behavioral condition. Underlying instrument: NSCH (CAHMI/HRSA).
[6] Su C, Aseltine R, Doshi R, Chen K, Rogers SC, Wang F (2020). Machine learning for suicide risk prediction in children and adolescents with electronic health records. Longitudinal EHR (41,721 patients aged 10-18); models reaching AUROC 0.81-0.86 across prediction windows, detecting 53-62% of positive subjects at 90% specificity. The realism tether for our planted-signal strength.

The gap

A signal that exists, scattered across systems nobody joins

Youth mental-health emergencies have climbed steeply: emergency-department visits for mental-health reasons among young people rose from 7.7% to 13.1% of all visits between 2011 and 2020, and fewer than one in five of those visits were evaluated by a mental-health professional [7]. Many children return in crisis within months [8]. The tragedy is that the months before a crisis are rarely silent: the child is often already visible to the adults around them, just not to any single system that could act.

[7] Bommersbach TJ, McKean AJ, Olfson M, Rhee TG (2023). National trends in mental health-related emergency department visits among youth, 2011-2020. Youth mental-health ED visits rose from 7.7% to 13.1% of all visits; suicide-related visits ~5x; fewer than 20% were evaluated by a mental-health professional.
[8] Cushing AM, Liberman DB, Pham PK, et al. (2023). Mental health revisits at US pediatric emergency departments. 13.2% of pediatric mental-health ED patients revisit within 6 months; MH ED visits grew 8.0%/yr vs 1.5%/yr for others; supports the escalation/recurrence framing and the value of earlier detection.

Foundation-model healthcare has moved quickly: Claude for Healthcare launched in January 2026 as regulated workflow software [9], building on an industry interoperability pledge [10]. Yet the pediatric, behavioral, school-health corner of the map remains the province of academic centers and point solutions. The bottleneck is not models; it is a feedback signal. This study is about one such signal that has gone unmeasured: the pre-crisis trajectory written across a school's own records.

[9] Anthropic (2026). Claude for Healthcare. Claude for Healthcare launched January 11, 2026 (HIPAA-ready; CMS Coverage Database / ICD-10 / NPI / PubMed integrations; prior-authorization review skill); positioned as regulated workflow software, not a chatbot.
[10] Anthropic (2025). Anthropic signs CMS Health Tech Ecosystem pledge to advance healthcare interoperability. Anthropic signed the CMS Health Tech Ecosystem pledge (July 30, 2025), positioning the Model Context Protocol as the interoperability bridge.

The four streams

Somatization, the referral gap, withdrawal — and the words themselves

Nurse visits. Children who somatize (recurrent headache, abdominal pain, fatigue) are disproportionate visitors to the school health office, and those complaints track anxiety and depression [1], prospectively, years ahead [11]. The complaint a nurse codes as R51 (the ICD-10 code for headache) may be the earliest written trace of distress [12].

[11] Shanahan L, Zucker N, Copeland WE, Costello EJ, Angold A (2015). Childhood somatic complaints predict generalized anxiety and depressive disorders during young adulthood in a community sample. Childhood somatic complaints prospectively predict adult generalized anxiety and depressive disorders (Great Smoky Mountains Study): somatic markers as early, not merely concurrent, signals.
[12] Egger HL, Costello EJ, Erkanli A, Angold A (1999). Somatic complaints and psychopathology in children and adolescents: Stomach aches, musculoskeletal pains, and headaches. Maps which somatic complaints associate with which disorders (stomachaches/headaches ↔ anxiety in girls; musculoskeletal pain ↔ depression); justifies the R-series symptom selection.

Billing & referrals. When a system does notice, it refers, but the referral often never converts to a completed visit. In one pediatric-ED study only about one in nine children who screened positive completed psychiatric follow-up [2]. The kind of gap matters: after a psychiatric hospitalization, youth who missed a 7-day follow-up carried roughly double the suicide risk over the next six months, and the highest-risk youth were the least likely to attend [13] [14]. We keep this the weakest of the signals and anchor it to that acute post-discharge continuity, because generic therapy non-completion does not robustly predict worse outcomes [15].

[13] Brent DA, Goldstein TR, Benton TD (2020). Bridging gaps in follow-up appointments after hospitalization and youth suicide. Among 139,694 Medicaid youth after psychiatric hospitalization, attending a 7-day follow-up appointment was associated with about half the suicide risk over the next 6 months vs non-attenders, and the highest-risk youth were least likely to attend; this justifies a small, non-zero predictive value for a referral/care-completion-gap signal anchored to acute post-discharge continuity.
[14] Hugunin J, Davis M, Larkin C, Baek J, Skehan B, Lapane KL (2022). Established outpatient care and follow-up after acute psychiatric service use among youth and young adults. Lacking established outpatient care strongly predicted failure to obtain timely follow-up after a youth psychiatric hospitalization (aOR up to 2.81) or ED visit (aOR up to 4.06); only 28.6-42.7% received 7-day follow-up; evidence that care-continuity gaps are a measurable risk marker.
[15] O'Keeffe S, Martin P, Target M, Midgley N (2019). Prognostic implications for adolescents with depression who drop out of psychological treatment. Caveat: found insufficient evidence that adolescents who dropped out of depression psychotherapy had worse outcomes than completers, so generic therapy non-completion does not robustly predict harm, and our referral-gap signal is kept small and anchored to acute/post-discharge continuity rather than routine dropout.

Attendance. Chronic absenteeism (missing at least 10% of school days, the federal threshold [16]) is associated with roughly doubled odds of later mental-health problems in a national birth cohort [3], and internalizing symptoms predict the severity of that absence [17]. Withdrawal is often the earliest, quietest of the record streams.

[16] U.S. Department of Education (2024). Chronic Absenteeism. Chronic absenteeism = missing at least 10% of school days (~18 days/year), excused or unexcused; the ESSA / Dept. of Education threshold (not a CDC definition).
[17] Fornander MJ, Kearney CA (2020). Internalizing symptoms as predictors of school absenteeism severity at multiple levels: Ensemble and classification and regression tree analysis. Internalizing symptoms and somatic complaints predict school-absenteeism severity; empirical grounding for the planted somatic→attendance temporal ordering.
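
As a worked instance of the 10% threshold: over a 180-day school year (an assumed length, implied by the cited figure of roughly 18 days per year), the flag trips at 18 missed days. A minimal sketch; the function name is hypothetical and not part of the study's code.

```python
def is_chronically_absent(days_missed: int, days_enrolled: int,
                          threshold: float = 0.10) -> bool:
    """True when the share of enrolled days missed meets the threshold.

    Excused and unexcused absences both count toward days_missed.
    """
    if days_enrolled <= 0:
        raise ValueError("days_enrolled must be positive")
    return days_missed / days_enrolled >= threshold


# Assumed 180-day school year: 18 missed days is exactly 10%.
at_threshold = is_chronically_absent(18, 180)
just_below = is_chronically_absent(17, 180)
```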

Encounter language. The first three streams are records: anyone who held them could join them. The fourth is different: the language a child uses in the visit itself. Linguistic markers track internalizing distress with surprising specificity, including elevated absolutist words in anxiety, depression, and suicidal-ideation text [18] and heightened first-person-singular focus in the depression-vulnerable [19], and everyday language can predict a depression diagnosis in the months before it is recorded [20]. In the clinic, NLP over unstructured notes surfaces adolescent risk that structured codes miss [21], and the affective tone of a note carries prognostic signal of its own [22]. What makes this newly usable at scale is ambient documentation: AI scribes now transcribe the patient–clinician conversation in real time across millions of encounters [4]. We distill each encounter into a single distress-language score (a severity-ordered marker, neutral → somatic → withdrawal → hopelessness → absolutist) that rises earliest of the four, and that no records-only system can reconstruct after the fact.

[18] Al-Mosaiwi M, Johnstone T (2018). In an Absolute State: Elevated Use of Absolutist Words Is a Marker Specific to Anxiety, Depression, and Suicidal Ideation. Grounds our use of absolutist words as the most-severe tier of the distress-language score, since their elevated use was shown to be a marker specific to anxiety, depression, and suicidal-ideation forums (online forum text, not clinical encounters).
[19] Rude SS, Gortner EM, Pennebaker JW (2004). Language use of depressed and depression-vulnerable college students. Supports the self-focus/withdrawal markers in our distress-language score by showing that currently- and formerly-depressed college students used more first-person-singular pronouns in writing samples (student essays, not clinical encounters).
[20] Eichstaedt JC, Smith RJ, Merchant RM, Ungar LH, Crutchley P, Preotiuc-Pietro D, et al. (2018). Facebook language predicts depression in medical records. Supports the premise that everyday language predicts a forthcoming depression diagnosis, with accuracy rising in the months before onset, while noting this is population-level prediction from social-media (Facebook) posts rather than clinical-encounter language or a deployed tool.
[21] Carson NJ, Mullin B, Sanchez MJ, Lu F, Yang K, Menezes M, et al. (2019). Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records. Anchors the claim that NLP over unstructured adolescent clinical notes surfaces behavioral-health risk beyond structured codes, by recovering note phrases associated with past-year suicide attempt in hospitalized adolescents (outcome is suicidal behavior, in a small inpatient sample).
[22] McCoy TH Jr, Castro VM, Roberson AM, Snapper LA, Perlis RH (2016). Improving Prediction of Suicide and Accidental Death After Discharge From General Hospitals With Natural Language Processing. Supports scoring affective/sentiment language in clinical notes as a risk signal, by showing positive-valence language in narrative discharge notes was associated with ~30% lower post-discharge suicide risk (adult general-hospital discharges, not adolescent telehealth encounters).
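
One way to read a continuous score against the severity ordering is to cut the unit interval into five equal bands, one per tier. The sketch below does that; the equal-width band edges and the function name are illustrative assumptions, not the study's calibration.

```python
# Severity tiers in the order given in the text, least to most severe.
TIERS = ("neutral", "somatic", "withdrawal", "hopelessness", "absolutist")


def tier_for_score(score: float) -> str:
    """Map a distress-language score in [0, 1] to its severity tier.

    Assumes five equal-width bands; a score of exactly 1.0 is placed
    in the top tier rather than overflowing past it.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    index = min(int(score * len(TIERS)), len(TIERS) - 1)
    return TIERS[index]


example_tier = tier_for_score(0.59)
```

Under these assumed bands a score of 0.59 falls in the withdrawal tier.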

And the encounter is more than the child's words: it carries the clinician's own read of them. A provider's brief mood-and-safety assessment is itself an expert label, and targeted, adaptive clinical screens flag which youth actually need a mental-health intervention far more accurately (AUROC ~0.87) than a blunt, retrospective ACE score (~0.58) [23]. The Lead-Time Telescope below surfaces that pairing, the child's language and the clinician's labeling note, as you scrub the timeline.

[23] King CA, Brent D, Grupp-Phelan J, et al. (2021). Prospective development and validation of the Computerized Adaptive Screen for Suicidal Youth. An adaptive, clinician-/tablet-administered screen (CASSY) predicts youth suicide attempts within ~3 months at AUC ~0.87-0.89, flagging who needs a mental-health intervention far more accurately than a blunt, retrospective ACE score (individual-level prediction AUC ~0.58). Grounds the clinician's encounter assessment as a targeted predictive 'label'.

The data

Synthetic by design, calibrated to reality

No real student appears anywhere in this study. We generate 12,000 synthetic students over three academic years, following the open-source Synthea idiom for synthetic EHRs [24] and validating against the fidelity / utility / privacy framework of EHR-Safe [25]. Diagnosed-condition rates are calibrated to the National Survey of Children's Health [26] and match within 0.53 percentage points (anxiety 11%, depression 4%, conduct 8%, ADHD 11%) [5]. Crucially, crisis is driven by a deterioration process that surfaces in the streams, not by demographics: a classifier given only age, sex, income proxy, and prior diagnoses predicts crisis at AUROC 0.50 (chance). Each encounter also yields a synthetic distress-language score (a lagged child of the same latent, never of the label), and a subset of schools carry community events that sweep correlated clusters of crises, the structure we test for below. To keep the encounter honest, some students simply “talk distressed” at baseline without heading to crisis, so the language stream is a confounded signal, not a giveaway. The signal lives in behavior over time, which is exactly the thesis.

[24] Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, Duffett C, Dube K, Gallagher T, McLachlan S (2018). Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. The open-source synthetic-patient generator (Apache-2.0) whose Generic Module Framework idioms and FHIR/CSV output our generator follows; an authored GMF module ships as a credibility artifact.
[25] Yoon J, Mizrahi M, Ghalaty NF, et al. (2023). EHR-Safe: Generating high-fidelity and privacy-preserving synthetic electronic health records. The three-axis validation framework we adopt: fidelity (KS / distribution), utility (train-on-synthetic/test-on-real), privacy (membership-inference, re-identification, attribute-inference).
[26] U.S. Census Bureau / HRSA MCHB (2024). National Survey of Children's Health (NSCH): public-use datasets. The nationally representative survey whose 2022-23 weighted prevalence estimates calibrate the synthetic cohort's marginals.
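
The causal structure described above (a latent deterioration process, an encounter score that is a lagged and noisy child of that latent, and baseline "talkers" with no latent at all) can be sketched as follows. Every name and parameter here (the linear ramp, the 120-day ramp length, the 7-day lag, the 0.8 gain, the noise level) is a hypothetical stand-in chosen for illustration, not a value from the study's generator.

```python
import random


def latent_deterioration(day, onset_day, ramp_days=120):
    """Latent state: 0 before onset, rising linearly to 1 over ramp_days.

    onset_day=None models a student who never deteriorates.
    """
    if onset_day is None or day < onset_day:
        return 0.0
    return min(1.0, (day - onset_day) / ramp_days)


def encounter_score(day, onset_day, baseline, rng, lag_days=7, noise_sd=0.05):
    """Distress-language score for one encounter, clipped to [0, 1].

    The score depends only on the lagged latent plus noise and a
    per-student baseline; the crisis label never enters it.
    """
    latent = latent_deterioration(day - lag_days, onset_day)
    raw = baseline + (1.0 - baseline) * 0.8 * latent + rng.gauss(0.0, noise_sd)
    return max(0.0, min(1.0, raw))


rng = random.Random(0)
# A student whose deterioration begins on day 100: quiet early, elevated late.
deteriorating = [encounter_score(d, 100, baseline=0.1, rng=rng) for d in (50, 230)]
# A baseline "talker": elevated language throughout, but no latent process.
baseline_talker = [encounter_score(d, None, baseline=0.5, rng=rng) for d in (50, 230)]
```

Because the talker's score is elevated without any latent behind it, a model cannot treat a high score alone as a giveaway; it has to read the change over time.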

What is assumed versus cited. The direction of each stream (that it precedes deterioration) is grounded in real prospective evidence: somatic complaints years ahead [11], chronic absence at roughly double the later-MH odds [3], everyday language months before a recorded diagnosis [20]. The specific lead-time offsets (how many days before the crisis each stream turns on) are synthetic design parameters, not measured values; no literature establishes a per-day pre-crisis onset for these signals, and we do not present these day-counts as findings. We make the conservative choice deliberately: the record streams are modeled as late and weak (they deviate only weeks before the crisis, so we never claim early warning we cannot cite), while the encounter is modeled as the early signal, on the strength of the months-ahead text-prediction evidence and the fact that it is a direct, real-time report rather than a lagging administrative trace. Every internal evaluation parameter (the training landmarks, the alert threshold) is a methodological choice, not a clinical number.

Method

Lead time, not hindsight

Every arm of the analysis uses the same recipe; only the input streams change. Models are scored out-of-fold with patient-level splits, on features drawn from strictly trailing windows, at landmarks pooled across several horizons before each (real or onset-matched) event, so a late stream (the records, which deviate only weeks out) and an early one (the encounter) are both learnable in one model [27]. We audit against the timing / association / aggregation leakage taxonomy [28] (the crisis code itself can never enter a feature), and a shift-invariance unit test gates every release. The hero metric is the distribution of lead time at a fixed false-alarm operating point [29]; we count a lead only when the alert is sustained as the event approaches, so a weak or noisy stream cannot earn a phantom long lead from a single spurious threshold crossing. Fidelity metrics follow the synthetic-data literature [30].

[27] Do D-K, Rockenschaub P, Boie SD, Kumpf O, Volk H-D, Balzer F, von Dincklage F, Lichtner G (2026). The impact of evaluation strategy on sepsis prediction model performance metrics in intensive care data: Retrospective cohort study. Evaluation-strategy choices (continuous-windowed vs fixed-horizon, onset-matching, silencing window) move early-warning metrics more than the model does; the basis for our continuous + onset-matched + silenced evaluation. (Domain: ICU sepsis; cited for methodology.)
[28] Davis SE, Matheny ME, Balu S, Sendak MP (2023). A framework for understanding label leakage in machine learning for health care. The timing / association / aggregation label-leakage taxonomy our feature construction and patient-level split are audited against.
[29] Gupta A, Chauhan R, Saravanan G, Shreekumar A (2024). Improving sepsis prediction in intensive care with SepsisAI: A clinical decision support system with a focus on minimizing false alarms. Template for reporting lead-time as a distribution and selecting a false-alarm-controlled operating point (warning vs alert tiers). Adult ICU study (Beth Israel + Emory). Cited only for its reporting methodology (lead-time as a distribution plus a false-alarm-controlled operating point via multi-warning windowing), not as pediatric evidence.
[30] van der Schaar Lab (2023). synthcity: a library for generating and evaluating synthetic tabular data. Reference taxonomy for synthetic-data metrics (KS / Jensen-Shannon / MMD fidelity, distinguishability detection-AUC, TSTR utility, privacy, survival-KM distance).
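
The sustained-alert rule can be made concrete with a short sketch. It assumes the strictest reading, namely that the alert must hold on every day from its start through the last day before the event; the study's exact tolerance for brief dips is not stated, so treat this rule and the function name as assumptions.

```python
def sustained_lead_days(scores, threshold):
    """Lead time in days under a strict sustained-alert rule.

    scores[i] is the model's risk on day i, and the event falls on the
    day after the last score. Walking backwards from the event, count
    the unbroken run of days with risk above threshold. Returns None
    when the model is not alerting on the final pre-event day.
    """
    lead = 0
    for score in reversed(scores):
        if score <= threshold:
            break
        lead += 1
    return lead or None


# The isolated early crossing on day 0 earns no credit; only the final
# unbroken run of three alerting days counts.
noisy_then_sustained = sustained_lead_days([0.9, 0.1, 0.2, 0.8, 0.9, 0.95], 0.7)
single_spurious = sustained_lead_days([0.9, 0.1], 0.7)
```

A first-crossing rule would have credited the first series with a six-day lead; the sustained rule credits three.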

Results

Records give weeks; the encounter gives months

15d · Records-only lead · weak & late · AUROC 0.59
+38.84d · Real-time encounter lift · vs. records-only · 95% CI [35.28, 42.45]
60d · Full-model lead · detects 77% of crises
0.84 · Full AUROC @ 30 days · in the real EHR-prediction band

On its own, each record stream sees little. Alone, attendance, nurse visits, and the referral gap each detect only a minority of crises, and only a couple of weeks ahead (Table 1); a records-only fusion — everything anyone holding the records could build — lands at AUROC 0.59 with a median of just 15 days of warning and 41% of crises caught. That is the ceiling for the records. The real-time encounter changes the picture: alone it reaches AUROC 0.83 and a median 75 days of lead. The full model — encounter plus records — detects 77% of crises (vs 41% for records-only) at a median 60 days: statistically tied with the encounter on lead while catching materially more children — a coverage gain of +9 percentage points over the best single stream (95% CI [6, 12] pp). A Shapley decomposition of lead time names the division of labor: the encounter contributes essentially all of it (Shapley 37.41d), while the record streams contribute coverage — the children who never present in language — not lead.
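
For readers unfamiliar with the decomposition, exact Shapley values over four streams need only 16 subset evaluations. The sketch below uses a toy value function with made-up numbers (38 days from the encounter, plus 3 days if any record stream is present); it illustrates the mechanics only and does not reproduce the study's lead times.

```python
from itertools import combinations
from math import factorial

STREAMS = ("encounter", "attendance", "nurse", "billing")


def toy_mean_lead(subset):
    """Hypothetical mean lead (days) for a model built on `subset`."""
    lead = 38.0 if "encounter" in subset else 0.0
    if subset - {"encounter"}:  # at least one record stream is present
        lead += 3.0
    return lead


def shapley_values(players, value):
    """Exact Shapley values: weighted average marginal contribution."""
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = frozenset(combo)
                total += weight * (value(s | {player}) - value(s))
        result[player] = total
    return result


phi = shapley_values(STREAMS, toy_mean_lead)
```

In this toy game the encounter is credited with 38 days and each record stream with 1 day, and the credits sum to the full-model value of 41, as the efficiency property requires.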

Figure 1. The encounter owns the lead; records are weak and late. Reverse Kaplan–Meier curves: the fraction of crises already detected as a function of how many days before the crisis the alert fires. The full model (violet) and the encounter alone (blue) detect crises far earlier than the records-only fusion (dashed grey) or any single record stream, which stay low and near the event. Held-out, onset-matched, fixed false-alarm operating point; lead counted only while the alert is sustained.
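
The quantity plotted in a curve of this kind can be sketched as an empirical fraction: of all crises, how many already carried a sustained alert at least h days out. The version below is a simplification that ignores censoring, which a full reverse Kaplan–Meier estimator would handle; the function name and the sample leads are hypothetical.

```python
def detected_fraction_curve(leads, horizons):
    """Fraction of crises detected at least h days ahead, per horizon h.

    leads holds one sustained lead time (in days) per crisis, or None
    when the crisis was never detected. Undetected crises stay in the
    denominator, so the curve tops out at the detection rate.
    """
    n = len(leads)
    if n == 0:
        raise ValueError("need at least one crisis")
    return {
        h: sum(1 for lead in leads if lead is not None and lead >= h) / n
        for h in horizons
    }


# Four hypothetical crises: three detected, one missed.
curve = detected_fraction_curve([75, 60, 14, None], horizons=(0, 30, 60, 90))
```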

Signal sources	Median lead	Detection	AUROC @30d	AUROC @60d	AUROC @90d
Encounter	75d	67%	0.832	0.762	0.657
Attendance	15d	34%	0.579	0.508	0.520
Nurse	15d	18%	0.510	0.517	0.523
Billing	30d	21%	0.517	0.503	0.501
Records only (no encounter)	15d	41%	0.590	0.516	0.527
Full fusion (+ encounter)	60d	77%	0.837	0.742	0.655

Table 1. Held-out, patient-level, onset-matched. Lead time and detection at a fixed 90%-specificity operating point. The records-only fusion (what any health system could build) and the full fusion (with Gale's real-time encounter) are the last two rows; the gap between them is the encounter's contribution.
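
A fixed 90%-specificity operating point means the alert threshold is chosen on non-crisis scores so that at most 10% of them alert, and detection is then read off at that threshold. A minimal sketch, with hypothetical function names and toy scores:

```python
import math


def threshold_at_specificity(negative_scores, specificity=0.90):
    """Smallest observed score t such that at least `specificity` of the
    negatives satisfy score <= t (an alert fires only when score > t)."""
    ordered = sorted(negative_scores)
    if not ordered:
        raise ValueError("need at least one negative score")
    # round() guards against float noise such as 0.9 * 10 -> 9.000000000000002
    k = math.ceil(round(specificity * len(ordered), 9)) - 1
    return ordered[max(k, 0)]


def detection_rate(positive_scores, threshold):
    """Share of crisis cases whose score exceeds the alert threshold."""
    return sum(s > threshold for s in positive_scores) / len(positive_scores)


negatives = [i / 10 for i in range(10)]  # toy non-crisis scores 0.0 .. 0.9
threshold = threshold_at_specificity(negatives)
caught = detection_rate([0.85, 0.95, 0.5, 0.7], threshold)
```

Fixing the false-alarm rate first is what makes detection and lead time comparable across rows of the table: every arm pays the same alert budget.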

The table is the summary; the story is in the trajectory. Below, scrub a single synthetic student and watch the full model's risk climb toward a crisis it has not yet seen — months ahead, driven by the real-time encounter. The default student presents only in the encounter: the child a records-only model never sees at all.

[Interactive timeline, shown at 2024-04-30 (crisis / present day); the axis runs from earlier in the school record on the left to the crisis on the right. Lanes and readouts at that date: Fused model, estimated risk over time: 12.0%. Encounter, distress language (real-time): 0.6. Attendance, absences (trailing 30d): 0.0, quiet for this student. Nurse, somatic visits (trailing 30d): 0.0, quiet for this student. Billing, referrals (cumulative): 0.0, quiet for this student.]

Encounter · school counselor · 2024-04-30 · distress 0.59 · withdrawal

Patient

“Everybody keeps saying I seem different. I don't laugh at stuff anymore. Things that used to be funny just aren't.”

Behavioral-health provider · School Counselor, LPC

Hana, 15, peer-observed change in affect with self-reported loss of capacity for enjoyment. Teachers note reduced participation and a 'muted' presence. Affect flat in session. Normalizing a mood screen as routine care; offered to sit together with the PHQ-A and gently asked whether the heaviness has ever made her wish she wasn't around.

Illustrative & synthetic: no real encounter exists. The provider's note is the expert label Gale uniquely captures, increasingly a therapist's read, as Gale's care is mostly behavioral health. The model reads a per-encounter distress-language score, not these words.

The dark vertical line is the scrubber: drag it (or anywhere on any chart, or the slider below) and all five charts move together. The red line is the crisis; the solid black line marks the model's first sustained alert. This child would have been flagged 77 days before the crisis. Each encounter below shows the child's words and the provider's labeling note, often a therapist's, since Gale's care is mostly behavioral health. Synthetic student; held-out model score.

Figure 2. The Lead-Time Telescope. One synthetic student at a time. The violet line is the held-out full-model risk, the dashed line the alert threshold, the red line the crisis. The four lanes are the underlying streams — note how a child may present in only one, and the encounter-language readout shows what the encounter stream is reading. The default student presents only in the encounter: a records-only model never sees them.

[Interactive stream selector, default state. Signal streams: full fusion (+ encounter) · 77% of crises caught · 60d median lead · AUROC 0.837. All four streams: the full system. The encounter sets the lead; the records add the coverage.]

A counter-intuitive but central point: adding streams does not buy earlier warning — the encounter already owns the lead, because the record streams are late. What they buy is coverage: toggle each one on and the curve lifts toward the dashed full-fusion line as more children — the ones who don't present in language — get caught (watch crises caught, not the lead). Turn the encounter off and you land on the records-only model: a weak, late warning. (Held-out, onset-matched evaluation, recomputed live per subset.)

Figure 3. Re-run the ablation yourself. Toggle the signal streams off and watch the lead-time curve fall. Turn off the encounter and you land on the records-only model any health system could build — weak and late; the gap to the dashed full-fusion line is exactly what Gale's real-time stream adds. The same held-out, onset-matched evaluation behind Figure 1, recomputed live per subset.

The real-time advantage

Anyone can fuse the records. Only the visit yields the language.

The three record streams are, in principle, joinable by anyone who holds them — a health system, a district, a state. The encounter is not: it exists only at the moment of care, and only if someone is capturing it. That is the structural advantage. Holding the records-only model fixed and adding the real-time distress-language score lifts mean lead time by 38.84 days (95% CI [35.28, 42.45], paired p = 1.7e-63) and raises 60-day AUROC from 0.52 to 0.74. It is also the earliest stream — the largest single share of the lead-time gain (Shapley 37.41d) — because language shifts before a referral is written or an absence accumulates.
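
A paired comparison of this kind works on per-crisis differences in lead time between the two models. The text does not say how the interval was computed; the sketch below uses a percentile bootstrap as one common, assumed approach, with a hypothetical function name and toy data.

```python
import random
import statistics


def paired_bootstrap_ci(lead_without, lead_with, n_boot=2000, alpha=0.05, seed=0):
    """Mean paired difference in lead time with a percentile-bootstrap CI.

    The two lists are aligned per crisis. Resampling is over crises, so
    each pair stays intact in every bootstrap replicate.
    """
    diffs = [b - a for a, b in zip(lead_without, lead_with)]
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), (lower, upper)


# Toy leads in days for five crises, records-only vs records plus encounter.
mean_lift, (ci_low, ci_high) = paired_bootstrap_ci(
    [10, 15, 0, 20, 5], [50, 40, 45, 55, 60]
)
```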

A calibration note, in the interest of the scrutiny this study invites: the real-world evidence for clinical text is that it adds incremental value over structured data [21] [22], not the large lift our synthetic encounter signal shows. We plant a deliberately strong real-time signal to make the architecture legible; the defensible real-world claim is narrower: that encounter language is earlier, richer, and uniquely real-time, not that it dominates the record. The mechanism that makes it newly capturable at population scale, ambient AI documentation [4], is, however, already real.

Localized contagion

A community event, seen as a school-level surge

Adolescent distress is not independent across children. Suicide and self-harm cluster in space and time [31], spread through peer ties and suggestion [32], and respond to shared external events [33], which is why communities are advised to watch for and respond to clusters as they emerge [34]. We model this directly: a fraction of schools experience a community event that transiently lifts distress across their students, producing a correlated cluster of crises.
Detecting that surge early is a spatiotemporal-scan problem [35], and it is exactly the kind of signal that lives in the real-time stream, not the records.
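To make the scan idea concrete, here is a minimal sketch of a purely temporal, single-school simplification of Kulldorff's Poisson scan statistic [35]. It slides windows of up to four weeks over one school's weekly counts and scores each window with the Poisson log-likelihood ratio. The function names, the four-week cap, the flat baseline, and the weekly counts are all illustrative assumptions, not the study's detector or data. The Monte Carlo replication a real scan needs to attach a p-value is omitted.

```python
import math

def poisson_llr(observed, expected, total):
    """Kulldorff-style Poisson log-likelihood ratio for one window.

    `expected` must be scaled so that expected counts sum to `total`.
    Only excesses (observed > expected) score above zero.
    """
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside

def best_temporal_window(counts, baseline, max_len=4):
    """Return (llr, start_week, end_week) of the highest-scoring window."""
    total = sum(counts)
    scale = total / sum(baseline)
    expected = [b * scale for b in baseline]
    best = (0.0, None, None)
    for end in range(len(counts)):
        for length in range(1, max_len + 1):
            start = end - length + 1
            if start < 0:
                break
            c = sum(counts[start:end + 1])
            e = sum(expected[start:end + 1])
            llr = poisson_llr(c, e, total)
            if llr > best[0]:
                best = (llr, start, end)
    return best

# Hypothetical weekly counts of high-distress encounters at one school,
# with a surge planted in weeks 6-8 (zero-indexed) and a flat baseline.
weekly_counts = [2, 3, 2, 2, 3, 2, 9, 11, 10, 3]
baseline = [1.0] * len(weekly_counts)

llr, start, end = best_temporal_window(weekly_counts, baseline)
print(f"most anomalous window: weeks {start}-{end}, LLR={llr:.2f}")
```

In a multi-school setting the same score would be computed per school (or per group of neighbouring schools) and the maximum compared against its distribution under simulated null data.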

Figure 4 legend: full model (with encounter) vs. records-only model. Headline statistics: median surge lead (full) 21 d; event clusters detected (full) 88%; detected (records-only) 0%.

One synthetic school with a community event; the curve is its weekly risk index. Across all 8 event schools, the encounter-inclusive model detected the emerging cluster a median of 21 days before the first crisis, while the records-only model — flat, dashed — detected 0% of them. The community surge lives in the real-time language, not the records.

Figure 4. Catching the cluster before it crests. The weekly school-level risk index for one synthetic school with a community event. The full model (blue) surges as the event propagates through encounter language; the records-only model (dashed) stays flat. Across all 8 event schools, the full model detected the emerging cluster a median of 21 days before the first crisis; records-only detected none.
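The two headline figures in the caption (share of event clusters detected and median lead) can be computed from per-school dates roughly as follows. This is a sketch under stated assumptions: "detected" is taken to mean that the first alert fires strictly before the school's first crisis, and the school labels and dates are made up for illustration, not drawn from the study.

```python
from datetime import date
from statistics import median

def surge_lead_summary(first_alert, first_crisis):
    """Return (share of event schools detected, median lead in days).

    first_alert:  school -> date of first surge alert, or None if never raised
    first_crisis: school -> date of the first crisis in the event cluster
    """
    leads = []
    for school, crisis_day in first_crisis.items():
        alert_day = first_alert.get(school)
        # Assumption: an alert only counts if it precedes the first crisis.
        if alert_day is not None and alert_day < crisis_day:
            leads.append((crisis_day - alert_day).days)
    detected_share = len(leads) / len(first_crisis)
    median_lead = median(leads) if leads else None
    return detected_share, median_lead

# Hypothetical event schools (labels and dates are illustrative only).
crises = {
    "A": date(2025, 3, 31),
    "B": date(2025, 4, 30),
    "C": date(2025, 5, 31),
    "D": date(2025, 6, 15),
}
alerts = {
    "A": date(2025, 3, 21),
    "B": date(2025, 4, 10),
    "C": date(2025, 5, 1),
    "D": None,  # no alert raised
}

share, lead = surge_lead_summary(alerts, crises)
print(f"detected {share:.0%} of event schools, median lead {lead} days")
```

Reporting the median only over detected schools, alongside the detected share, keeps a missed cluster from silently vanishing from the lead-time figure.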

The operational point is that one alert is addressable to two audiences at once: the school (a counseling surge, a postvention plan) and the health system (clinicians prepared for what is coming through the door). A records-only model cannot raise it — by the time absences and referrals accumulate into a detectable cluster, the crises have already happened. The language moves first.

Discussion

A pre-crisis signal that does not exist in any model's training set

The reason this matters beyond one school district is the shape of the data. Clinical outcomes are slow, delayed, and siloed; the feedback loop that made coding models improve so quickly — write code, run the tests, learn — has no obvious analogue in medicine. But a school's own encounters contain a dense, high-frequency, pre-crisis trajectory: visits, referrals, and attendance — and, captured in real time, the language of the encounter itself — observed weekly, with a downstream outcome. That is a feedback signal of a kind the current healthcare-AI stack does not have. This study is a feasibility argument that the signal is real, learnable, and — when fused — early.

(Framing note: the coding-feedback analogy above is the authors' synthesis, not a claim attributed to any company.)

Ethics & limitations

What a model like this could get wrong

A system that watches children's nurse visits, absences, and discipline records is not neutral. It risks surveillance, labeling, and false-positive stigma, and it can amplify existing inequities in who gets an absence excused, who gets referred, and who gets disciplined. At the chosen operating point the model still raises roughly 4.7 alerts per true crisis caught; those are real children a counselor would review. We report this number, and the model's mild over-confidence (calibration slope 0.81), rather than hide them: communicating uncertainty in numbers, not just words, does not substantially erode trust [36]. Most fundamentally: this is a feasibility demonstration on synthetic data. It is not a medical device, it makes no claim of clinical validity, and it should route any real decision to a human, never act on a child automatically.
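For readers unfamiliar with the calibration slope quoted above, the sketch below shows the usual way it is estimated: regress the observed outcome on the logit of the predicted probability with a logistic model and read off the coefficient. A slope below 1 means the predictions are too extreme, which is the sense of "over-confidence" here. The study does not state how its 0.81 was computed, so this is the standard definition rather than the authors' code; the simulated cohort, the planted slope of 0.8, and all names are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def calibration_slope(pred_probs, outcomes, iters=15):
    """Fit outcome ~ sigmoid(a + b * logit(pred)) by Newton's method.

    Returns (intercept a, slope b). Perfect calibration gives a=0, b=1.
    """
    xs = [logit(min(max(p, 1e-6), 1.0 - 1e-6)) for p in pred_probs]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            mu = sigmoid(a + b * x)
            w = mu * (1.0 - mu)
            g0 += y - mu          # gradient w.r.t. intercept
            g1 += (y - mu) * x    # gradient w.r.t. slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Simulated, deliberately over-confident model: the true log-odds are only
# 0.8 times the predicted log-odds (illustrative numbers, not study data).
random.seed(0)
pred_logits = [random.gauss(-1.5, 1.2) for _ in range(20000)]
preds = [sigmoid(x) for x in pred_logits]
outcomes = [1 if random.random() < sigmoid(0.8 * x) else 0 for x in pred_logits]

intercept, slope = calibration_slope(preds, outcomes)
print(f"calibration intercept={intercept:.2f}, slope={slope:.2f}")
```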

Of 100 students the model would flag at this operating point, about 21 are truly heading toward a crisis; the rest are false alarms a human must review. We show this, rather than hide it.
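The "about 21 of 100" figure follows directly from the alert burden reported above, assuming "4.7 alerts per true crisis caught" counts all alerts (true and false) per true positive. Under that reading the positive predictive value is simply the reciprocal. The function name below is illustrative.

```python
def ppv_from_alerts_per_catch(alerts_per_true_crisis):
    """Positive predictive value when every `alerts_per_true_crisis`
    alerts (true and false together) contain one true crisis."""
    return 1.0 / alerts_per_true_crisis

ppv = ppv_from_alerts_per_catch(4.7)
true_per_100 = round(100 * ppv)
false_per_100 = 100 - true_per_100
print(f"PPV={ppv:.3f}: {true_per_100} true alerts, {false_per_100} false alarms per 100 flags")
# prints "PPV=0.213: 21 true alerts, 79 false alarms per 100 flags"
```

Had the 4.7 counted only false alerts per true crisis, the PPV would instead be 1/5.7, or about 18%, so the two statements in the text agree only under the first reading.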

References

Every number above, sourced

1. Shannon RA, Bergren MD, Matthews A (2010). Frequent visitors: Somatization in school-age children and implications for school nurses. The Journal of School Nursing, 26(3): 169-182. doi:10.1177/1059840509356777 ✓ Children with medically unexplained recurrent somatic complaints (headache, stomachache) are disproportionate users of school-health resources; these complaints are linked to anxiety, depression, adverse childhood experiences, and school stress.
2. Grupp-Phelan J, Delgado SV, Kelleher KJ (2007). Failure of psychiatric referrals from the pediatric emergency department. BMC Emergency Medicine, 7: 12. doi:10.1186/1471-227X-7-12 ✓ Only ~11% of children screening positive for mental-health problems completed psychiatric follow-up; referral non-completion is a large, measurable gap, even when access barriers are removed.
3. Viner RM, Pearce A, Hope S (2026). The impact of school absence on mental health in children and young people: Analysis of an English national birth cohort. PLOS ONE, 20. doi:10.1371/journal.pone.0336137 ✓ Persistent absence (>10% of the school year) associated with roughly doubled odds of later mental-health problems (Millennium Cohort; OR ~2.0-2.26 across ages 7/11/14).
4. Tierney AA, Gayre G, Hoberman B, Mattern B, Ballesca M, Kipnis P, et al (2024). Ambient Artificial Intelligence Scribes to Alleviate the Burden of Clinical Documentation. NEJM Catalyst, 5(3). doi:10.1056/CAT.23.0404 ✓ Establishes the mechanism our approach depends on: ambient AI scribes transcribe patient-clinician conversations in real time at scale (millions of encounters at Kaiser Permanente), making encounter language capturable; the study evaluates documentation burden, not distress scoring.
5. Centers for Disease Control and Prevention (2024). Data and Statistics on Children's Mental Health (National Survey of Children's Health, 2022-2023). cdc.gov. ✓ US base rates, ages 3-17: anxiety 11%, behavior/conduct 8%, depression 4%; ~21% ever diagnosed with a mental/emotional/behavioral condition. Underlying instrument: NSCH (CAHMI/HRSA).
6. Su C, Aseltine R, Doshi R, Chen K, Rogers SC, Wang F (2020). Machine learning for suicide risk prediction in children and adolescents with electronic health records. Translational Psychiatry, 10: 413. doi:10.1038/s41398-020-01100-0 ✓ Longitudinal EHR (41,721 patients aged 10-18); models reaching AUROC 0.81-0.86 across prediction windows, detecting 53-62% of positive subjects at 90% specificity. THE realism tether for our planted-signal strength.
7. Bommersbach TJ, McKean AJ, Olfson M, Rhee TG (2023). National trends in mental health-related emergency department visits among youth, 2011-2020. JAMA, 329(17): 1469-1477. doi:10.1001/jama.2023.4809 ✓ Youth mental-health ED visits rose from 7.7% to 13.1% of all visits; suicide-related visits ~5x; fewer than 20% were evaluated by a mental-health professional.
8. Cushing AM, Liberman DB, Pham PK, et al. (2023). Mental health revisits at US pediatric emergency departments. JAMA Pediatrics, 177(2): 168-176. doi:10.1001/jamapediatrics.2022.4885 ✓ 13.2% of pediatric mental-health ED patients revisit within 6 months; MH ED visits grew 8.0%/yr vs 1.5%/yr for others; supports escalation/recurrence framing and the value of earlier detection.
9. Anthropic (2026). Claude for Healthcare. anthropic.com/news. ✓ Claude for Healthcare launched January 11, 2026 (HIPAA-ready; CMS Coverage Database / ICD-10 / NPI / PubMed integrations; prior-authorization review skill), positioned as regulated workflow software, not a chatbot.
10. Anthropic (2025). Anthropic signs CMS Health Tech Ecosystem pledge to advance healthcare interoperability. anthropic.com/news. ✓ Anthropic signed the CMS Health Tech Ecosystem pledge (July 30, 2025), positioning the Model Context Protocol as the interoperability bridge.
11. Shanahan L, Zucker N, Copeland WE, Costello EJ, Angold A (2015). Childhood somatic complaints predict generalized anxiety and depressive disorders during young adulthood in a community sample. Psychological Medicine, 45(8): 1721-1730. doi:10.1017/S0033291714002840 ✓ Childhood somatic complaints prospectively predict adult generalized anxiety and depressive disorders (Great Smoky Mountains Study); somatic markers as early, not merely concurrent, signals.
12. Egger HL, Costello EJ, Erkanli A, Angold A (1999). Somatic complaints and psychopathology in children and adolescents: Stomach aches, musculoskeletal pains, and headaches. Journal of the American Academy of Child & Adolescent Psychiatry, 38(7): 852-860. doi:10.1097/00004583-199907000-00015 ✓ Maps which somatic complaints associate with which disorders (stomachaches/headaches ↔ anxiety in girls; musculoskeletal pain ↔ depression); justifies the R-series symptom selection.
13. Brent DA, Goldstein TR, Benton TD (2020). Bridging gaps in follow-up appointments after hospitalization and youth suicide. JAMA Network Open. doi:10.1001/jamanetworkopen.2019.17468 ✓ Among 139,694 Medicaid youth after psychiatric hospitalization, attending a 7-day follow-up appointment was associated with about HALF the suicide risk over the next 6 months vs non-attenders, and the highest-risk youth were least likely to attend, justifying a small, non-zero predictive value for a referral/care-completion-gap signal anchored to acute post-discharge continuity.
14. Hugunin J, Davis M, Larkin C, Baek J, Skehan B, Lapane KL (2022). Established outpatient care and follow-up after acute psychiatric service use among youth and young adults. Psychiatric Services. doi:10.1176/appi.ps.202100469 ✓ Lacking established outpatient care strongly predicted failure to obtain timely follow-up after a youth psychiatric hospitalization (aOR up to 2.81) or ED visit (aOR up to 4.06); only 28.6-42.7% received 7-day follow-up; evidence that care-continuity gaps are a measurable risk marker.
15. O'Keeffe S, Martin P, Target M, Midgley N (2019). Prognostic implications for adolescents with depression who drop out of psychological treatment. Journal of the American Academy of Child & Adolescent Psychiatry. doi:10.1016/j.jaac.2018.11.019 ✓ CAVEAT: found insufficient evidence that adolescents who dropped out of depression psychotherapy had worse outcomes than completers, so generic therapy non-completion does NOT robustly predict harm, and our referral-gap signal is kept small and anchored to acute/post-discharge continuity rather than routine dropout.
16. U.S. Department of Education (2024). Chronic Absenteeism. ed.gov (Supporting Students). ✓ Chronic absenteeism = missing at least 10% of school days (~18 days/year), excused or unexcused; this is the ESSA / Dept. of Education threshold (NOT a CDC definition).
17. Fornander MJ, Kearney CA (2020). Internalizing symptoms as predictors of school absenteeism severity at multiple levels: Ensemble and classification and regression tree analysis. Frontiers in Psychology. ✓ Internalizing symptoms and somatic complaints predict school-absenteeism severity; empirical grounding for the planted somatic→attendance temporal ordering.
18. Al-Mosaiwi M, Johnstone T (2018). In an Absolute State: Elevated Use of Absolutist Words Is a Marker Specific to Anxiety, Depression, and Suicidal Ideation. Clinical Psychological Science, 6(4): 529-542. doi:10.1177/2167702617747074 ✓ Grounds our use of absolutist words as the most-severe tier of the distress-language score, since their elevated use was shown to be a marker specific to anxiety, depression, and suicidal-ideation forums (online forum text, not clinical encounters).
19. Rude SS, Gortner EM, Pennebaker JW (2004). Language use of depressed and depression-vulnerable college students. Cognition & Emotion, 18(8): 1121-1133. doi:10.1080/02699930441000030 ✓ Supports the self-focus/withdrawal markers in our distress-language score by showing that currently- and formerly-depressed college students used more first-person-singular pronouns in writing samples (student essays, not clinical encounters).
20. Eichstaedt JC, Smith RJ, Merchant RM, Ungar LH, Crutchley P, Preotiuc-Pietro D, et al (2018). Facebook language predicts depression in medical records. Proceedings of the National Academy of Sciences, 115(44): 11203-11208. doi:10.1073/pnas.1802331115 ✓ Supports the premise that everyday language predicts a forthcoming depression diagnosis, with accuracy rising in the months before onset, while noting this is population-level prediction from social-media (Facebook) posts rather than clinical-encounter language or a deployed tool.
21. Carson NJ, Mullin B, Sanchez MJ, Lu F, Yang K, Menezes M, et al (2019). Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records. PLOS ONE, 14(2): e0211116. doi:10.1371/journal.pone.0211116 ✓ Anchors the claim that NLP over unstructured adolescent clinical notes surfaces behavioral-health risk beyond structured codes, by recovering note phrases associated with past-year suicide attempt in hospitalized adolescents (outcome is suicidal behavior, in a small inpatient sample).
22. McCoy TH Jr, Castro VM, Roberson AM, Snapper LA, Perlis RH (2016). Improving Prediction of Suicide and Accidental Death After Discharge From General Hospitals With Natural Language Processing. JAMA Psychiatry, 73(10): 1064-1071. doi:10.1001/jamapsychiatry.2016.2172 ✓ Supports scoring affective/sentiment language in clinical notes as a risk signal, by showing positive-valence language in narrative discharge notes was associated with ~30% lower post-discharge suicide risk (adult general-hospital discharges, not adolescent telehealth encounters).
23. King CA, Brent D, Grupp-Phelan J, et al (2021). Prospective development and validation of the Computerized Adaptive Screen for Suicidal Youth. JAMA Psychiatry, 78(5): 540-549. doi:10.1001/jamapsychiatry.2020.4576 ✓ An adaptive, clinician-/tablet-administered screen (CASSY) predicts youth suicide attempts within ~3 months at AUC ~0.87-0.89, flagging who needs a mental-health intervention far more accurately than a blunt, retrospective ACE score (individual-level prediction AUC ~0.58). Grounds the clinician's encounter assessment as a targeted predictive 'label'.
24. Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, Duffett C, Dube K, Gallagher T, McLachlan S (2018). Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. Journal of the American Medical Informatics Association, 25(3): 230-238. doi:10.1093/jamia/ocx079 ✓ The open-source synthetic-patient generator (Apache-2.0) whose Generic Module Framework idioms and FHIR/CSV output our generator follows; an authored GMF module ships as a credibility artifact.
25. Yoon J, Mizrahi M, Ghalaty NF, et al. (2023). EHR-Safe: Generating high-fidelity and privacy-preserving synthetic electronic health records. npj Digital Medicine, 6: 141. doi:10.1038/s41746-023-00888-7 ✓ The three-axis validation framework we adopt: fidelity (KS / distribution), utility (train-on-synthetic/test-on-real), privacy (membership-inference, re-identification, attribute-inference).
26. U.S. Census Bureau / HRSA MCHB (2024). National Survey of Children's Health (NSCH): public-use datasets. census.gov. ✓ The nationally representative survey whose 2022-23 weighted prevalence estimates calibrate the synthetic cohort's marginals.
27. Do D-K, Rockenschaub P, Boie SD, Kumpf O, Volk H-D, Balzer F, von Dincklage F, Lichtner G (2026). The impact of evaluation strategy on sepsis prediction model performance metrics in intensive care data: Retrospective cohort study. Journal of Medical Internet Research, 28. ✓ Evaluation-strategy choices (continuous-windowed vs fixed-horizon, onset-matching, silencing window) move early-warning metrics more than the model does; the basis for our continuous + onset-matched + silenced evaluation. (Domain: ICU sepsis; cited for methodology.)
28. Davis SE, Matheny ME, Balu S, Sendak MP (2023). A framework for understanding label leakage in machine learning for health care. Journal of the American Medical Informatics Association. ✓ The timing / association / aggregation label-leakage taxonomy our feature construction and patient-level split are audited against.
29. Gupta A, Chauhan R, Saravanan G, Shreekumar A (2024). Improving sepsis prediction in intensive care with SepsisAI: A clinical decision support system with a focus on minimizing false alarms. PLOS Digital Health. ✓ Template for reporting lead-time as a distribution and selecting a false-alarm-controlled operating point (warning vs alert tiers). ⚠ ADULT ICU study (Beth Israel + Emory). Cited ONLY for its reporting methodology: lead-time as a distribution plus a false-alarm-controlled operating point via multi-warning windowing. NOT cited as pediatric evidence.
30. van der Schaar Lab (2023). synthcity: a library for generating and evaluating synthetic tabular data. GitHub (vanderschaarlab/synthcity). ✓ Reference taxonomy for synthetic-data metrics (KS / Jensen-Shannon / MMD fidelity, distinguishability detection-AUC, TSTR utility, privacy, survival-KM distance).
31. Haw C, Hawton K, Niedzwiedz C, Platt S (2013). Suicide clusters: a review of risk factors and mechanisms. Suicide and Life-Threatening Behavior, 43(1): 97-108. doi:10.1111/j.1943-278X.2012.00130.x ✓ Anchors that point (spatiotemporal) and mass suicide clusters are a real phenomenon, disproportionately affecting adolescents/young adults, justifying our model of localized correlated crisis clusters.
32. Abrutyn S, Mueller AS (2014). Are Suicidal Behaviors Contagious in Adolescence? Using Longitudinal Data to Examine Suicide Suggestion. American Sociological Review, 79(2): 211-227. doi:10.1177/0003122413519445 ✓ Anchors the social-contagion assumption behind community spread by showing, in longitudinal adolescent data, that exposure to a role model's suicide attempt triggers new suicidal thoughts/attempts, with effects fading over time and stronger in girls.
33. Niederkrotenthaler T, Voracek M, Herberth A, Till B, Strauss M, Etzersdorfer E, et al (2010). Role of media reports in completed and prevented suicide: Werther v. Papageno effects. British Journal of Psychiatry, 197(3): 234-243. doi:10.1192/bjp.bp.109.074633 ✓ Supports that a shared external event can transiently shift suicide risk across a population (the Werther/Papageno media-contagion finding), an analogue to the school-event-driven distress surge we model (mechanism is media exposure, not peer ties).
34. Ivey-Stephenson AZ, Ballesteros MF, Trinh E, Stone DM, Crosby AE (2024). CDC Guidance for Community Response to Suicide Clusters, United States, 2024. MMWR Supplements (Centers for Disease Control and Prevention), 73(2): 17-26. doi:10.15585/mmwr.su7302a3 ✓ Supports the operational framing that communities (including schools) detect and respond to suicide clusters (groups of suicides/attempts closer in time and space than expected), the real-world counterpart to our school-level surge detector.
35. Kulldorff M (1997). A spatial scan statistic. Communications in Statistics - Theory and Methods, 26(6): 1481-1496. doi:10.1080/03610929708831995 ✓ Provides the established spatiotemporal cluster-detection methodology (the spatial scan statistic, basis of SaTScan) that our school-level real-time surge scan operationalizes for detecting localized crisis clusters.
36. van der Bles AM, van der Linden S, Freeman ALJ, Mitchell J, Galvao AB, Zaval L, Spiegelhalter DJ (2019). Communicating uncertainty about facts, numbers and science. Royal Society Open Science, 6(5): 181870. doi:10.1098/rsos.181870 ✓ Numeric uncertainty ranges do not substantially erode trust, whereas verbal-only uncertainty does; the basis for our words-AND-numbers risk communication.

36 sources, numbered by first appearance. Every entry verified 2026-06-11 against PubMed / PMC / publisher pages (214 in the full bibliography).