Introduction

Close to 200,000 United States (U.S.) military service members transition back to civilian life each year1. Many of these transitioning service members (TSMs) report feeling unprepared for the challenges of returning to civilian life, including finding employment, managing finances, and addressing health issues2,3,4. Prior longitudinal studies found that approximately one-third of TSMs were not employed about 9 months after separation5 and that 5% experienced homelessness within 12 months of separation6, and a U.S. Department of Veterans Affairs (VA) report documented a suicide rate of 46.2 per 100,000 in the first year after separation7, underscoring the substantial challenges and risks faced during the transition back to civilian life. The U.S. Department of Defense (DoD) and VA have developed numerous transition support programs to address these problems8,9. However, most of these programs have unclear effects10, and most are delivered at the same intensity to all TSMs regardless of differences in risk. This approach is almost certainly suboptimal, as TSMs at the highest risk often have complex needs that are not adequately addressed at the relatively low level of intensity of existing transitional services programs2,11. More intensive programs, given their greater costs, would need to be targeted at high-risk TSMs to be cost-effective. The U.S. Government Accountability Office acknowledged this in a recent report calling for more effective screening prior to separation to determine risk of post-separation negative outcomes12.

Researchers from the Study to Assess Risk and Resilience in Servicemembers – Longitudinal Study (STARRS-LS), a longitudinal epidemiological-neurobiological study of U.S. Army soldiers13, addressed this need by developing machine learning risk prediction models for some of these outcomes in a sample of U.S. Army soldiers who had recently separated from service. The predictors were self-report survey data obtained shortly before separation. These models, which were able to identify TSMs at high risk of homelessness6 and nonfatal suicide attempts (SA)14, are currently being used within VA to target more intensive interventions to TSMs at high predicted risk of these outcomes15,16. Because suicide death has a low base rate, however, the relatively small STARRS-LS survey sample could not support development of a suicide death model. It was possible, though, for the STARRS-LS team to use Army and DoD administrative data available at the time of separation to predict suicide deaths identified in the National Death Index (NDI) for all soldiers who separated from service over the years 2010–2019. As shown in a previous report, the 10% of TSMs with the highest predicted risk in this suicide death model accounted for a cross-validated 34.1% of observed suicide deaths over the three years after separation17.

Despite widespread concern about high TSM unemployment, we are unaware of any prior effort to develop a machine learning model using pre-separation data to predict post-separation unemployment. Nor has any research been carried out to identify TSMs at elevated risk of more than one of the above outcomes, even though evidence exists that these outcomes are significantly intercorrelated18,19,20,21 and DoD/VA outreach programs attempt to address all these problems. A better understanding of intersecting risk would help optimize outreach efforts, as the most effective interventions would presumably differ depending on multi-outcome risk profiles.

In the current report, we present the results of attempts to update previously developed STARRS-LS machine learning models for post-separation homelessness and SA. The previous models for these outcomes were based on fewer waves of survey data than we now have available. We also attempted to develop a model for post-separation unemployment. The models were all developed to predict outcomes over a three-year risk horizon after separation. We then imputed predicted probabilities of suicide death over the same risk horizon using the previously developed STARRS-LS administrative data model. We then cross-classified predictions from the significant models to investigate patterns of joint risk.

Methods

Samples

STARRS-LS is an extension of the earlier Army STARRS initiative, which included three surveys with active duty soldiers between 2011 and 2014 based on group in-person self-administration22: (1) the New Soldier Study (2011–2012) of n = 38,733 soldiers during their first week on active duty; (2) the All Army Study (2011–2013) of n = 25,088 soldiers throughout the Army, including soldiers deployed to Afghanistan who were surveyed as they transitioned through Kuwait for mid-tour leave; and (3) the Pre-Post Deployment Study (2012–2014) of n = 8566 soldiers in three Brigade Combat Teams who were surveyed shortly before deployment to Afghanistan and then again after returning from this deployment. Further details regarding field procedures have been described elsewhere23,24. The Human Subjects Committees of the University of Michigan, the Uniformed Services University of the Health Sciences, and the Army Medical Research and Materiel Command approved all recruitment, consent, and field procedures. A total of n = 72,387 respondents across these three surveys consented to link their deidentified survey data with Army administrative data. Research was performed in accordance with the Declaration of Helsinki.

A probability sample of the respondents from the baseline Army STARRS surveys who consented to administrative data linkage was recruited to participate in the first STARRS-LS survey in 2016–2018 (LS1). LS1 over-sampled segments of the baseline samples of special interest to the Army (Special Operations soldiers, women, members of the activated Army National Guard and Reserve, soldiers who reported suicidality, and soldiers with evidence based on either self-report or administrative data of other clinically significant mental health problems). Attempts were then made to re-interview the n = 14,508 participants from LS1 (35.6% response rate) in 2018–2019 (LS2; n = 12,156, 83.7% conditional response rate among LS1 respondents), 2020–2022 (LS3; n = 11,119, 76.7% conditional response rate among LS1 respondents), and 2022–2024 (LS4; n = 10,830, 74.6% conditional response rate among LS1 respondents). As detailed elsewhere25, calibration weights were applied to the initial Army STARRS samples to adjust for differential probabilities of participation. Additional calibration weights were then applied to each wave of STARRS-LS to adjust for differential probabilities of participation13. STARRS-LS procedures have been described in more detail elsewhere26.

For the current study, the models for all three outcomes were developed only among LS respondents who were in the Regular Army (i.e., not in the activated Army National Guard or Army Reserve) at the time of their initial Army STARRS survey and who had subsequently separated or retired from active service in the Regular Army (whether or not they subsequently joined the National Guard or Reserve or another active branch). We also limited analysis to the subset of respondents who participated in LS3 or LS4, as a key question was asked for the first time in LS3 (and subsequently LS4) about the month and year the soldier was last on active duty in the Regular Army, based on a discovery in the course of earlier analyses that administrative data on separation dates were unreliable. Survey predictors of the outcomes were based on the most recent survey completed prior to the date of separation. In the case of respondents who separated before LS1, we required separation to be no more than three years after the individual's last Army STARRS survey to be included in the analysis. The final sample comprised 7188 respondents (see Supplementary Fig. 1 for details regarding person-level sample selection).

Given that post-separation unemployment was not a focus of attention in designing the LS surveys, we did not have retrospective reports in the survey comparable to those about homelessness and SA regarding whether respondents were unemployed in the first, second, or third year after separation. However, we did have information about current employment status in each of the LS surveys. We consequently developed a dataset to predict current unemployment at the time of survey among respondents who had been separated for 36 months or less at the time of their most recent LS survey. This analysis included a stacked sample of n = 6119 respondents who separated within either 0–12 (n = 1844), 13–24 (n = 2038), or 25–36 (n = 2237) months of an LS survey. These three conditional datasets were stacked to form a single dataset for the purposes of increasing statistical power. We then disaggregated results to evaluate model fit in each of the three subsamples.

The models for post-separation homelessness and SA required the respondent to have separated at least 12 months before their latest LS survey to avoid bias due to right censoring. We then estimated discrete-time survival models with person-years as the unit of analysis27 for the self-reported occurrence of the outcome in the first 12 months after separation among respondents who had been separated at least 12 months before the survey, in the second year (i.e., 13–24 months) after separation among respondents who had been separated at least 24 months before the survey and reported not having the outcome in the first 12 months after separation, and in the third year (i.e., 25–36 months) after separation among respondents who had been separated at least 36 months before the survey and reported not having the outcome in the first 24 months after separation. Datasets were stacked to increase statistical power to predict these relatively uncommon outcomes and results were subsequently disaggregated to evaluate model fit in each of the three person-years.
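The stacked person-year construction described above can be sketched as follows. This is a minimal illustration, not STARRS-LS code; the field names (`months_separated`, `event_year`) are hypothetical:

```python
def person_year_records(respondents):
    """Expand respondents into person-year records for a discrete-time
    survival analysis of a post-separation outcome over a 3-year horizon.

    Each respondent dict has:
      months_separated : months between separation and latest survey
      event_year       : 1, 2, or 3 if the outcome occurred in that year
                         after separation, else None
    A person-year for year t is contributed only if the respondent was
    observed for at least 12*t months (avoiding right censoring) and was
    event-free through year t-1.
    """
    rows = []
    for i, r in enumerate(respondents):
        for t in (1, 2, 3):
            if r["months_separated"] < 12 * t:
                break  # year t not fully observed: right-censored
            if r["event_year"] is not None and r["event_year"] < t:
                break  # outcome already occurred in an earlier year
            rows.append({"id": i, "year": t,
                         "event": int(r["event_year"] == t)})
    return rows
```

The three conditional person-year strata are then simply the `year == 1`, `year == 2`, and `year == 3` rows of the stacked dataset.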

For post-separation homelessness, n = 6904 LS respondents were included in the analysis who left active Army service at least one year before their most recent LS survey. Smaller numbers left service at least two years before their most recent survey without experiencing homelessness in the first year of separation (n = 6158) and at least three years before their most recent survey without experiencing homelessness in the first two years after separation (n = 5602). Thirty-three respondents were excluded because we were unable to determine if homelessness occurred within 36 months after separation.

For post-separation SA, n = 6929 LS respondents were included in the analysis who left active Army service at least one year before their most recent LS survey. Smaller numbers left service at least two years before their most recent survey without a SA in their first year of separation (n = 6521) and at least three years before their most recent survey without a SA in the first two years of separation (n = 6191). Eight respondents were excluded because we were unable to determine if their SA was pre- or post-separation.

Measures

Self-reported unemployment in the LS surveys was assessed by asking respondents if they were currently: “employed, temporarily laid off, on sick leave or short-term disability, on long-term or permanent disability, on parental leave, unemployed and looking for work, unemployed and not looking for work, a homemaker, or retired.” Respondents could select multiple responses. Responses of “unemployed and looking for work” were coded as unemployed. A separate analysis was carried out to predict reports of unemployment whether or not looking for work, but results are not reported here because they were very similar to those for unemployed and looking for work.

To assess homelessness, respondents were asked if they were ever homeless since leaving Regular Army service. Homelessness was defined as “not having stable housing that you either own, rent, or stay in as part of a household” based on the VA Homeless Emergency Assistance and Rapid Transition to Housing’s definition and the VA Homeless Screening Clinical Reminder28,29. Respondents who endorsed homelessness received follow-up questions to gather information about episodes and timing of homelessness in relation to time of separation.

Self-reported suicide attempt (SA) was assessed with a question adapted from the Columbia-Suicide Severity Rating Scale30: “Did you ever make a suicide attempt (i.e., purposefully hurt yourself with at least some intention to die) at any time since your last survey?” Respondents who answered “yes” were then asked age of first SA, number of lifetime SAs, recency of last SA, and in the case of the LS surveys, age of first SA since their prior STARRS or STARRS-LS survey. Information about month and year of separation was combined with information about the respondent's birthdate and reports about the ages of SAs to estimate whether reported SAs occurred before separation and in each of the first three years after separation. In cases of uncertainty involving overlap in months of a given age across two different years since leaving, we rounded down (i.e., 6 months was assigned to the earlier year). Previous studies found that self-reports capture about two-thirds of the SAs detected either by self-reports or medical records31,32.

Pre-separation predictors included 130 indicators of five broad categories of variables: socio-demographics, Army career, psychopathological risk factors (self-injurious thoughts and behaviors, mental disorders, personality), physical health, and stressors (adverse childhood experiences, other lifetime traumatic events, chronic stressors) (Supplementary Table 1). Predictors came from baseline Army STARRS surveys and STARRS-LS surveys. A few career variables (e.g., number of combat deployments) were also included from administrative datasets, but only if they could also be assessed via self-report to guarantee that models using these predictors could be used in real-time to target TSMs for preventive interventions based on responses to surveys administered during the DoD Transitional Assistance Program (TAP33). Information available only in administrative data systems was not used in the predictor set because Army and DoD administrative data systems are not currently set up to be accessed in real time.

Analysis Methods

Analysis was carried out from November 2024 to April 2025 using machine learning (ML) methods. The models were person-level for unemployment and person-year discrete-time survival models for homelessness and SA27. Most studies that use ML to facilitate targeting preventive interventions either use a single algorithm34 or try several different algorithms and choose the one with the best prediction accuracy35. We instead used the Super Learner (SL) ensemble ML method36,37 to predict each of the three outcomes. SL uses stacked generalization to pool across multiple algorithms by generating a weight for each algorithm in a user-specified collection ("ensemble") to create a composite predicted outcome score that is guaranteed in expectation to perform at least as well as the best component algorithm in the ensemble. This optimization guarantee is according to a pre-specified criterion which, in our case, was non-negative least squares (minimizing MSE)36,37.
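As a concrete illustration of the stacked-generalization step, the sketch below fits non-negative least squares weights to cross-validated base-learner predictions. The data and the two learner columns are synthetic; a full Super Learner would also generate these columns by cross-validating each base learner:

```python
import numpy as np
from scipy.optimize import nnls

def super_learner_weights(Z, y):
    """Metalearner step of Super Learner: given a matrix Z of
    cross-validated predicted probabilities (one column per base
    learner) and a binary outcome y, find non-negative weights
    minimizing ||Z w - y||^2, then normalize them to sum to 1."""
    w, _ = nnls(Z, y.astype(float))
    return w / w.sum() if w.sum() > 0 else w

# Synthetic example: one informative learner, one pure-noise learner.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 500)
informative = np.clip(y + rng.normal(0, 0.3, 500), 0, 1)
noise = rng.uniform(0, 1, 500)
w = super_learner_weights(np.column_stack([informative, noise]), y)
# The informative learner receives nearly all of the ensemble weight.
```

The composite predicted score for a new observation is then the weighted average of the base learners' predictions using these weights.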

Consistent with recommendations38, we used a diverse set of algorithms in the ensemble to capture nonlinearities and interactions and to reduce the risk of misspecification39,40 (Supplementary Table 2). Hyperparameter tuning was achieved by including individual algorithms multiple times in the ensemble with different hyperparameter values and allowing SL to weight relative importance across this range rather than using an external grid search or random search procedure (Supplementary Table 3). We compared the fit of SL to lasso penalized regression models to determine if the more complex SL approach was more accurate than a simpler lasso approach.

We attempted to reduce the risk of over-fitting in three ways: (i) by excluding from the predictor set dichotomous variables with fewer than 10 observed cases of the outcome; (ii) by estimating the univariable 10-fold cross-validated (10F-CV) area under the receiver operating characteristic curve (AU-ROC) of each remaining predictor and excluding predictors with a lower bound of the AU-ROC 95% confidence interval less than 0.51; and (iii) by restricting the number of predictors included in each algorithm to 5%, 10%, or 20% of the number of respondents with the outcome. Each learner in the ensemble was estimated three times, corresponding to each of the three numbers of predictors, which were selected separately within folds. Predictor selection for linear models was carried out using lasso and for other models using random forests.
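Step (ii), the univariable cross-validated AU-ROC screen, might look like the following sketch. The fold-based confidence interval (fold mean minus 1.96 standard errors) is a simplification standing in for whatever interval estimator was actually used:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def passes_screen(x, y, lower=0.51, n_splits=10, seed=0):
    """Screen a single predictor: estimate its 10-fold cross-validated
    univariable AU-ROC and retain it only if an approximate 95% CI
    lower bound exceeds `lower` (0.51 in the paper)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for _, test_idx in skf.split(x.reshape(-1, 1), y):
        # Univariable "model": the predictor itself serves as the risk score.
        aucs.append(roc_auc_score(y[test_idx], x[test_idx]))
    aucs = np.array(aucs)
    return aucs.mean() - 1.96 * aucs.std(ddof=1) / np.sqrt(n_splits) > lower
```

Applied to each candidate variable, this removes predictors whose discrimination is indistinguishable from chance before any multivariable model is fit.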

Nested 10F-CV was used to evaluate model performance. This was done rather than the more typical use of separate training and test samples because of the relatively small overall sample size. Model discrimination for individual algorithms in the ensemble as well as for the overall ensemble was then evaluated with AU-ROC41, while model calibration was evaluated using the integrated calibration index (ICI)42. Overall model accuracy (i.e., combining discrimination and calibration) was evaluated using the Brier score. It is noteworthy that the Brier score, although sensitive to prevalence, can be used legitimately in the way we used it here as an overall measure combining discrimination and calibration across models predicting a single outcome in a single sample43.
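The accuracy and calibration metrics can be sketched as follows. The moving-window smoother below is a simple stand-in for the loess smoother used in the ICI as originally proposed, so the resulting numbers are illustrative:

```python
import numpy as np

def brier(p, y):
    """Brier score: mean squared difference between predicted
    probability and the observed binary outcome (lower is better)."""
    return float(np.mean((np.asarray(p) - np.asarray(y)) ** 2))

def ici(p, y, bandwidth=0.1):
    """Integrated calibration index: the average absolute distance
    between each predicted probability and a smoothed estimate of the
    observed event rate at that predicted probability. A perfectly
    calibrated model has ICI near 0."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    smoothed = np.array([y[np.abs(p - pi) <= bandwidth].mean() for pi in p])
    return float(np.mean(np.abs(smoothed - p)))
```

For a well-calibrated model the smoothed observed rate tracks the predicted probabilities, while a model whose predictions are systematically shifted (e.g., all halved) produces a much larger ICI.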

For each outcome, we then divided the sample into 20 risk ventiles (i.e., 20 subsamples, each consisting of 5% of respondents) based on cross-validated predicted risk and calculated both conditional and cumulative sensitivity (SN; the proportion of all respondents with the outcome who are in and across ventiles of predicted risk) and positive predictive value (PPV; prevalence of the outcome within and across ventiles of predicted risk). As we would expect SN to be 5% within each ventile purely by chance, we focused on ventiles with SN of approximately 10% as those with a meaningful concentration of risk. Inspection of CV SN across these ventiles allowed us to define high-risk strata for each outcome.
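A minimal sketch of the ventile SN/PPV computation (variable names illustrative):

```python
import numpy as np

def ventile_table(p, y, n_bins=20):
    """Rank respondents by predicted risk, cut the ranking into ventiles
    (highest-risk ventile first), and compute per-ventile sensitivity
    (share of all cases falling in the ventile) and PPV (case prevalence
    within the ventile), plus their cumulative top-down versions."""
    order = np.argsort(-np.asarray(p))        # highest predicted risk first
    y = np.asarray(y)[order]
    bins = np.array_split(y, n_bins)
    total = y.sum()
    sn = np.array([b.sum() / total for b in bins])
    ppv = np.array([b.mean() for b in bins])
    cum_cases = np.cumsum([b.sum() for b in bins])
    cum_n = np.cumsum([len(b) for b in bins])
    return {"sn": sn, "ppv": ppv,
            "cum_sn": np.cumsum(sn), "cum_ppv": cum_cases / cum_n}
```

Under a chance-level model each ventile's SN hovers near 5%; an informative model concentrates cases, pushing the top ventiles' SN and PPV well above the sample average.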

We then applied the coefficients from the previously developed STARRS-LS suicide death model17 to our sample. This model used administrative variables as predictors of NDI suicides in the population of nearly 1 million Army soldiers who separated from active service since 2010.

We then calculated cross-tabulations of predicted risk strata at the individual level across all outcomes. And, finally, the relative importance of predictors used to develop the models was evaluated using the model-agnostic Shapley Additive Explanations (SHAP) method44, a general-purpose approach to examine predictor importance in any ML prediction model by calculating marginal contribution to overall model accuracy. However, as causal interpretation of such associations is hazardous, we have relegated the discussion of this part of the analysis to the Supplementary Materials.
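The paper computes SHAP values via XGBoost; as a self-contained illustration of what a SHAP value is, the sketch below uses the closed form for a linear model with independent features, where each feature's contribution is its coefficient times its deviation from the feature mean, and the contributions sum to the prediction's deviation from the average prediction (SHAP's additivity property):

```python
import numpy as np

def linear_shap(beta, X, x):
    """SHAP values for one prediction of a linear model
    f(z) = b0 + z @ beta, under feature independence:
    phi_j = beta_j * (x_j - mean_j).
    Additivity: f(x) = mean(f(X)) + sum_j phi_j."""
    return beta * (x - X.mean(axis=0))

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
beta, b0 = np.array([2.0, -1.0, 0.5]), 0.3
x = X[0]
phi = linear_shap(beta, X, x)
```

Tree-based SHAP generalizes this idea by averaging each feature's marginal contribution over all orderings in which features can be added to the model.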

The Super Learner and Lasso models were estimated in R version 4.4.145. SHAP values were estimated in the XGBoost R package46. Data management and calculations of cross-validated estimates of AU-ROC, SN, PPV, ICI, and Brier scores were carried out in SAS version 9.447.

Results

Sample Composition

Although sample composition differed across outcomes (unemployment n = 6119 persons; homelessness n = 18,664 person-years; SA n = 19,641 person-years) for reasons described above, general sample characteristics were similar (Supplementary Table 4). Across the samples, the great majority of respondents were male (84–85%), Non-Hispanic White (63–66%), and 28+ years of age (53–63%). Most respondents had only a high school diploma (66–71%) and were either currently married (61–67%) or never married (27–34%) at the time of separation from service. Most had either no history of combat deployment (44–46%) or only one deployment (31–32%) and were of senior enlisted rank (74–81%) at the time of separation.

Outcome prevalence

An average of 10.3% of respondents who were in their first through third years after separation at the time of an LS survey reported being unemployed. As mentioned previously, unlike the other survey outcomes, only the point prevalence of unemployment at the time of survey was assessed rather than interval prevalence. Homelessness and SA, in comparison, were assessed retrospectively over the period since separation. Homelessness was reported as occurring among 3.2% of respondents on average per year in the first three years after separation, and SA among 1.0% of respondents on average per year over the same three-year period. Prevalence was consistently highest in the first year after separation (15.6% unemployment, 6.4% homelessness, 1.6% SA) and declined thereafter (Table 1).

Table 1 Prevalence of the outcomesa

Model results

The performance of the best random forest model was generally either comparable to or better than other algorithms or the entire stacked-generalization ensemble in predicting unemployment (Table 2). In contrast, the simple lasso model was generally either comparable to more complex algorithms or the entire stacked-generalization ensemble in predicting homelessness and SA (Table 2). We consequently focus below on random forest results for unemployment and lasso results for homelessness and SA.

Table 2 Estimates of model performance for unemployment, homelessness, and suicide attempta

We examined model performance predicting outcomes in the first year after separation as well as pooled across three years to determine if model performance decreased with increasing risk horizon. Models using data pooled across all three years after separation generally performed as well as models using only data from the first year (Table 3); thus, we focus below on models with a 3-year risk horizon to increase statistical power in analyses.

Table 3 Estimates of model performance for models estimated on all three years, when applied to outcomes in individual years, and models estimated on Year 1, when applied only to outcomes in Year 1a

The cross-validated AU-ROC of the unemployment model was 0.60. Inspection of the top predicted risk ventiles (Table 4) showed that none reached the approximately 10% SN threshold for a meaningful concentration of risk (observed SN of 9.7–8.5%, versus the 5.0% expected by chance). Based on this result, we concluded that post-separation unemployment could not be predicted accurately with our data. Unemployment was therefore not considered in subsequent analyses of overlapping risk.

Table 4 Observed 12-month self-reported unemployment, homelessness, and suicide attempt by the ventiles of predicted risk in the total samples

The cross-validated AU-ROC of the homelessness model was 0.68. Inspection of the predicted risk ventiles (Table 4) showed that only the top two had elevated SN (15.7–11.0%), with approximately one-quarter (26.6%) of respondents with post-separation homelessness found among this 10% at the highest predicted risk. Twelve-month homelessness prevalence was 8.5% in this high-risk segment of the sample. Based on this result, we defined this top 10% of respondents as having high predicted risk of homelessness.

The cross-validated AU-ROC of the SA model was 0.78. Inspection of the predicted risk ventiles (Table 4) showed very elevated SN in the top ventile (26.6%) and an elevated average SN of 11.4% across the next three ventiles, with 60.9% of respondents with post-separation SA found among this 20% at highest predicted risk. Twelve-month SA prevalence was 3.2% in this high-risk segment of the sample. Based on this result, we defined this top 20% of respondents as having high predicted risk of SA.

Overlap of high-risk across outcomes

As noted above, we imputed the predicted risk of post-separation suicide death based on a previously developed model17. The 10% of TSMs at the highest predicted risk of suicide death were defined as high-risk. It is noteworthy that there were no documented suicide deaths in the STARRS-LS sample in the three years after separation. However, this is not surprising given the comparatively small size of the sample.

Given the thresholds for defining high-risk status (i.e., the 10% of respondents at highest predicted risk of homelessness and the 20% at highest predicted risk of SA), we would expect 35% of the sample to be classified high-risk for at least one of these outcomes if the three predictions were completely unrelated (i.e., 100% – [100–20%] x [100–10%] x [100–10%]). At the other extreme, only 20% of the sample would be classified as high-risk of any outcome if the 10% predicted to be high-risk for homelessness and the 10% predicted to be high-risk for suicide death were both subsumed in the 20% predicted to be high-risk for SA.
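The two bounds can be checked with a few lines of arithmetic:

```python
# Marginal high-risk thresholds: 20% (SA), 10% (homelessness),
# 10% (suicide death).
p_sa, p_home, p_death = 0.20, 0.10, 0.10

# Upper bound: if the three risk rankings were completely unrelated,
# the share flagged for at least one outcome is the complement of
# being flagged for none.
independent = 1 - (1 - p_sa) * (1 - p_home) * (1 - p_death)  # ~0.352

# Lower bound: if both 10% groups were fully nested within the 20%
# SA group, the union would just be the largest group.
fully_nested = max(p_sa, p_home, p_death)  # 0.20
```

The observed 28.1% falls between these bounds, consistent with positively correlated but far from fully overlapping risk rankings.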

The actual proportion of the sample classified as high-risk for at least one outcome was 28.1% (Table 5). This reflects statistically significant (p < .001) but substantively modest Pearson correlations between all pairs of predicted risk dichotomies: r = 0.37 for homelessness and SA, r = 0.23 for homelessness and suicide death, and r = 0.20 for SA and suicide death. Included in the 28.1% were 18.3% predicted to be high-risk for only one outcome (2.5% homelessness, 11.2% SA, 4.6% suicide death), accounting for (i.e., SN) 23.5% of homelessness and 28.8% of SA, and 9.9% predicted to be high-risk for multiple outcomes (4.5% homelessness and SA, 1.0% homelessness and suicide death, 2.4% SA and suicide death, 2.0% all three), accounting for 29.9% of homelessness and 48.6% of SA.

Table 5 Intersection of high predicted risk of homelessness, suicide attempt, and suicide death among STARRS-LS respondents who were separated as of LS4

It is noteworthy that lift, defined as the ratio of SN in the subgroup to the proportion of participants in the subgroup48, varied meaningfully across the different high-risk subgroups. Using the common rule of thumb that lift of 2.0+ is clinically significant49, clinically significant lift for homelessness was found among respondents with high predicted risk of homelessness whether alone (2.6) or in conjunction with other outcomes (2.8–6.1), while clinically significant lift for SA was found only among respondents with a combination of high predicted risk of SA and at least one other outcome (3.7–7.2). Lift for SA was considerably lower among respondents with high predicted risk only of SA (1.8).
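Lift is simply sensitivity divided by subgroup share. Using the top SA risk ventile from Table 4 (SN of 26.6% concentrated in 5% of the sample) as a worked example:

```python
def lift(sensitivity, subgroup_share):
    """Ratio of the proportion of cases captured by a subgroup to the
    proportion of the sample in that subgroup. A lift of 1.0 is no
    better than chance; 2.0+ is the rule of thumb for clinical
    significance cited in the text."""
    return sensitivity / subgroup_share

top_sa_ventile_lift = lift(0.266, 0.05)  # = 5.32
```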

Important predictors of post-separation outcomes

It is hazardous to focus too closely on predictor importance in machine learning models for two reasons: because "importance" is difficult to sort out when data-driven methods are used to select predictors from a larger set of often highly correlated variables and the associations under study are nonadditive; and because the most important predictors are not necessarily the most important causal variables. There is nonetheless interest in gaining some broad understanding of predictors, which we did by looking at SHAPP values for predictor classes, as well as for the most important individual predictors in each class. This was done both by looking at predictors in the overall homelessness and SA models and by estimating new models for high risk of only one outcome (separate models for high risk of homelessness-only, SA-only, and suicide death-only) and multiple outcomes (a single model for high risk of 2–3 outcomes). An explanation of the method is provided in the Supplementary Materials, and detailed results are presented in Supplementary Figures 2–7. Only a very brief overview is presented here.

Psychopathological variables were the most important class of predictors for most models (SHAPP = 65.1–87.2%) and were second most important for overall homelessness (SHAPP = 75.4%), but less important for high risk of homelessness-only (SHAPP = 35.4%) (Supplementary Table 5). A wide variety of psychopathological predictors emerged as important, with prior psychopathology predicting increased risk of the outcomes in the great majority of cases. A notable exception was that a prior nonfatal SA was associated with reduced risk of being at high risk only of suicide death. Army career characteristics, in comparison, were the most important class of predictors of overall homelessness (SHAPP = 102.9%), with leaving the Army at a young age, low rank at discharge, fewer months on active duty, and receiving a general (rather than honorable) discharge the most important predictors (SHAPP = 4.8–35.8%). Army career characteristics also were an important class of predictors of high risk of homelessness-only, suicide death-only, and 2+ outcomes (SHAPP = 39.0–48.1%). Other important classes of predictors were socio-demographics for high risk of homelessness-only and SA-only (SHAPP = 46.8–53.1%) and stressors for overall homelessness (SHAPP = 61.5%).

Discussion

The results of this study expand on previous STARRS-LS work by presenting updated prediction models for post-separation homelessness and SA using a larger sample and longer follow-up period than in previous reports6,14. We also attempted to develop a model for post-separation unemployment, which we did not do previously. Finally, we examined overlap in risk across multiple outcomes of interest to DoD and VA. Comparison of high predicted risk across outcomes showed that approximately 10% of TSMs are at high risk for two or more of the outcomes considered, and another 18% for only one of these outcomes. Being at high risk of multiple outcomes was associated with a greater lift for both homelessness and SA than when risk was elevated for only one outcome.

These results have direct implications for transition support programs in three ways. First, the results reiterate the potential value of machine learning risk prediction models in identifying the small proportion of TSMs who account for most instances of homelessness, SA, and suicide death in the years immediately following separation from military service. With accurate identification, transition support programs can move away from sole reliance on costly yet suboptimal low-intensity programs delivered to all TSMs and instead allocate more resources and services to TSMs at high predicted risk. It is also noteworthy that this approach could spare TSMs at low risk from receiving unnecessary interventions. Such applications of risk prediction models are already recognized in pilot programs designed to address risk for suicide16 and homelessness6 in TSMs.

Second, the results provide greater understanding of post-separation risk than in prior studies by highlighting a subgroup of TSMs at elevated risk for multiple adverse outcomes and others at elevated risk of only one of the outcomes. Our results suggest that as many as 10% of TSMs may be at elevated risk of multiple outcomes. Consistent with previous research18,19,21, there were moderate correlations in risk of homelessness, SA, and suicide death. Observed patterns in SHAPP values suggest that TSMs at high risk of multiple outcomes likely have unique needs relative to their peers at risk of single outcomes. Therefore, if transition support programs are designed with sole consideration of single outcomes, the programs risk under-addressing the needs of TSMs with intersecting risks. It might be, for example, that relatively intensive wrap-around case management makes most sense for TSMs at risk of multiple outcomes, whereas those only at risk of suicide death might need means restriction focused on firearms safety (given that firearms are overwhelmingly the most common means of suicide among TSMs). Those only at risk of SA might need suicide-focused psychotherapy. And those only at risk of homelessness might need assistance with financial planning.

We found that groups with intersecting risks comprise fewer TSMs than groups with single risks, albeit with higher lift, highlighting the importance of identifying multi-risk subgroups in the design and delivery of transition support programs. Tailoring the structure and/or nature of transition support to the specific risks of TSMs could provide a more cost-efficient means of mitigating those risks than current programs. For example, TSMs at high risk only of suicide death or SA might be appropriate for connection to suicide-focused interventions, such as lethal means counseling and/or psychotherapy50,51, whereas those at high risk only of homelessness may be more appropriate for supported connection to housing support interventions6. TSMs at risk for multiple outcomes would presumably require more complex care coordination, possibly making it important to provide integrated case management designed for such individuals.
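For readers less familiar with the term, "lift" is conventionally defined as the ratio of the outcome rate within a flagged subgroup to the base rate in the full sample; the formula below is the standard definition rather than a detail specific to our analyses:

\[
\mathrm{lift}(g) \;=\; \frac{\Pr(\text{outcome} \mid \text{TSM in subgroup } g)}{\Pr(\text{outcome})}
\]

A lift of 3 for a subgroup thus indicates that its members experience the outcome at three times the overall rate, which is why small multi-risk subgroups can still be high-priority targets for intervention.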

Third, the detailed results regarding predictors, presented in the Supplementary Materials, highlight important differences in predictors of risk across outcomes, with the most salient predictors of SA and suicide death being distinct and, at least in some cases, opposite (e.g., a positive association of service in the reserve component with SA risk, but a negative association with risk of suicide death). This is an important finding given that many efforts to mitigate TSM suicide risk are guided by research on post-separation SA due to the low base rate of suicide death7. Some prior research is consistent with the differences we found between predictors of SA and suicide death52,53,54. To mitigate risk effectively across outcomes, transition support programs need to be guided by research that explicitly focuses on the common and unique risks of the targeted outcomes. Reliance on outcome proxies (e.g., assessment of SA or suicidal ideation as a proxy for risk of suicide death) will contribute to continued under-identification of risk and potential misallocation of risk mitigation resources.

As mentioned previously, we should not focus too closely on individual predictors. However, given that previous research suggests prior SAs are a risk factor for suicide death52,55, one finding worth highlighting is the negative association between prior SA and high risk only of suicide death. This may be related to the fact that an especially high proportion of suicide decedents with military experience die by firearm, and prior research shows that suicide decedents who died by firearm are less likely than those who died by another method to have a prior SA56. A useful avenue for future research might be to examine more nuanced characteristics of these high-risk subgroups (e.g., access to firearms, access to treatment, social support) to better understand associations among the various high-risk profiles of TSMs.

As noted in the introduction, we developed the above models to provide guidance in targeting preventive interventions for high-risk TSMs based on the assumption that the outcomes are sufficiently uncommon, and the intervention costs sufficiently high, that it would not be cost-effective to provide these interventions to all TSMs. There is also the issue that some interventions for these outcomes might create burdens for the individuals who receive them, including possible stigma. Those costs need to be included in a comprehensive cost-benefit analysis of implementing the interventions.

Some critics of developing models like those presented here have argued that suicide-related behaviors (SRBs) are too rare to make interventions based on such models cost-effective, given that only a small proportion of the individuals defined as having "high" risk experience the outcome even when the models are significant57,58. However, this criticism is ad hoc rather than based on a formal analysis comparing the cost of intervening with enough people to prevent a single case of the negative outcome against the value (either from a societal perspective or from the perspective of the organization implementing the intervention) of preventing that one case. Formal analyses of this sort in other areas of medicine sometimes find that screening and intervening with high-risk cases can be cost-effective even for disorders much less common than SRBs. For example, newborn screening for certain rare metabolic and endocrine disorders has been shown to be beneficial for infants with predicted risks as low as 0.5% (compared to 5.6% for SA and 10.1% for homelessness in the top risk categories of our models), given that the interventions are relatively inexpensive and are effective in preventing severe, irreversible lifelong disorders59. Similarly, use of statins, which reduces coronary heart disease (CHD) by about 30%60, is judged to be cost-effective for patients with an annual CHD risk of about 3%61.
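The statin example can be made concrete with standard number-needed-to-treat (NNT) arithmetic; the figures below simply restate the cited estimates rather than introduce new data:

\[
\mathrm{ARR} = 0.03 \times 0.30 = 0.009, \qquad \mathrm{NNT} = \frac{1}{\mathrm{ARR}} = \frac{1}{0.009} \approx 111
\]

That is, treating roughly 111 patients at a 3% annual CHD risk with an intervention that reduces that risk by 30% prevents about one event per year. Whether this is cost-effective depends on the cost of treating those 111 patients relative to the value of the averted event, which is precisely the comparison missing from the ad hoc criticism.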

We did not attempt to carry out a cost-effectiveness analysis for any of our models because three types of critical information required for such analyses are missing for these outcomes: (i) information about the costs of the best-practice preventive interventions that might be used for these outcomes; (ii) information about the effectiveness of these interventions; and (iii) information about the financial value placed by the Army or the Veterans Health Administration on preventing one instance of these outcomes. An experimental evaluation designed to obtain information of the first sort for one such proposed intervention to prevent post-separation SRBs is currently underway62. Formal analyses of this sort for other SRB models have shown that preventive interventions targeted with models like those presented here would be cost-effective over a plausible range of assumptions about costs and benefits17,63,64. Information from empirical evaluations of proposed interventions should be required before large-scale implementation to justify using models like those presented here to target interventions for high-risk TSMs.

The results reported here should be interpreted in the context of several limitations. First, although we had a relatively large sample size (n = 7188), the base rates of the outcomes were low. Future studies using larger samples are needed to increase statistical power and model accuracy. Second, the data considered here were not collected at the time of separation but were instead based on retrospective reports, which could have introduced bias into estimates. Third, the unemployment outcome was assessed in a much less fine-grained way than the other outcomes, which might help explain our failure to develop a useful prediction model for post-separation unemployment. Finally, we made no attempt to determine whether model parameters were stable over time, although such model updating would be required if the models were implemented on a long-term basis65.

This is the first study we are aware of to examine the intersection of risk for post-separation homelessness, SA, and suicide death. We found that many of the TSMs predicted to be at high risk of one such outcome were not at high predicted risk of the others. This is an important result because many machine learning prediction analyses purporting to target patients for suicide prevention are based on models that predict nonfatal suicide attempts66,67. The latter is often the only model that can be estimated because nonfatal suicide attempts are sufficiently common to serve as a model outcome, whereas suicide deaths are much less common. As shown here, though, the individuals identified as being at high risk of these two outcomes overlap only partially. In the presence of multiple models, as we have here, this creates a challenge, as well as an opportunity, for interventionists to determine whether different interventions are needed for individuals with different high-risk profiles and to decide on the optimal allocation of intervention resources across all such profiles.