ABSTRACT
Transitioning service members (TSMs) leaving military service have high risks of unemployment, homelessness, nonfatal suicide attempt (SA), and suicide death. Data from n = 7188 recently separated TSMs from the U.S. Army were used to update previously developed models for post-separation homelessness and SA based on data at the time of separation and to develop a new unemployment model. Predicted probabilities of suicide from a model developed elsewhere were imputed for comparison purposes. Cross-validated predictions were significant for the homelessness (AU-ROC = 0.68) and SA (AU-ROC = 0.78) models but not the unemployment model (AU-ROC = 0.60). Elevated cross-validated risk was found for the 10% of TSMs at the highest predicted risk of homelessness (SN = 26.6%), the 20% for SA (SN = 60.9%), and the 10% for suicide death (SN = 34.1%). Overall, 28% of TSMs were in the highest-risk categories for at least one outcome and 10% for more than one. Findings regarding incomplete overlap highlight the complexities of risk targeting when multiple outcomes are of interest.
Introduction
Close to 200,000 United States (U.S.) military service members transition back to civilian life each year1. Many of these transitioning service members (TSMs) report feeling unprepared for the challenges of returning to civilian life, including finding employment, managing finances, and addressing health issues2,3,4. Prior longitudinal studies found that approximately one-third of TSMs were not employed about 9 months after separation5 and that 5% experienced homelessness within 12 months of separation6, and a U.S. Department of Veterans Affairs (VA) report documented a suicide rate of 46.2 per 100,000 in the first year after separation7, underscoring the substantial challenges and risks faced during the transition back to civilian life. The U.S. Department of Defense (DoD) and VA have developed numerous transition support programs to address these problems8,9. However, most of these programs have unclear effects10, and most are applied with the same intensity to all TSMs without regard to differences in risk. This approach is almost certainly suboptimal, as TSMs at the highest risk often experience complex needs that are not adequately addressed by programs at the relatively low level of intensity of existing transitional services programs2,11. But more intensive programs, given their greater costs, would need to be targeted at high-risk TSMs to make their use cost-effective. The U.S. Government Accountability Office acknowledged this in a recent report calling for more effective screening prior to separation to determine risk of post-separation negative outcomes12.
Researchers from the Study to Assess Risk and Resilience in Servicemembers – Longitudinal Study (STARRS-LS), a longitudinal epidemiological-neurobiological study of U.S. Army soldiers13, addressed this need by developing machine learning risk prediction models for some of these outcomes in a sample of U.S. Army soldiers who recently separated from service. The predictors were self-report survey data obtained shortly before separation. These models, which were able to identify TSMs at high risk of homelessness6 and nonfatal suicide attempts (SA)14, are currently being used within VA to target more intensive interventions to TSMs at high predicted risk of these outcomes15,16. Due to the low base rates of suicide death, though, the relatively small number of TSMs in the STARRS-LS survey data could not be used to develop a suicide death model. However, it was possible for the STARRS-LS team to use Army and DoD administrative data available at the time of separation to predict suicide deaths identified in the National Death Index (NDI) for all soldiers who separated from service over the years 2010–2019. As shown in a previous report, the 10% of TSMs with the highest predicted risk in this suicide death model accounted for a cross-validated 34.1% of observed suicide deaths over the three years after separation17.
Despite widespread concern about high TSM unemployment, we are unaware of any prior effort to develop a machine learning model using pre-separation data to predict post-separation unemployment. Nor has any research been carried out to identify TSMs at elevated risk of more than one of the above outcomes, even though evidence exists that these outcomes are significantly intercorrelated18,19,20,21 and DoD/VA outreach programs attempt to address all these problems. A better understanding of intersecting risk would help optimize outreach efforts, as the most effective interventions would presumably differ depending on multi-outcome risk profiles.
In the current report, we present the results of attempts to update previously developed STARRS-LS machine learning models for post-separation homelessness and SA. The previous models for these outcomes were based on fewer waves of survey data than we now have available. We also attempted to develop a model for post-separation unemployment. The models were all developed to predict outcomes over a three-year risk horizon after separation. We then imputed predicted probabilities of suicide death over the same risk horizon using the previously developed STARRS-LS administrative data model. We then cross-classified predictions from the significant models to investigate patterns of joint risk.
Methods
Samples
STARRS-LS is an extension of the earlier Army STARRS initiative, which included three surveys with active duty soldiers between 2011 and 2014 based on group in-person self-administration22: (1) the New Soldier Study (2011–2012) of n = 38,733 soldiers during their first week on active duty; (2) the All Army Study (2011–2013) of n = 25,088 soldiers throughout the Army, including soldiers deployed to Afghanistan who were surveyed as they transitioned through Kuwait for mid-tour leave; and (3) the Pre-Post Deployment Study (2012–2014) of n = 8566 soldiers in three Combat Brigade Teams who were surveyed shortly before deployment to Afghanistan and then again after returning from this deployment. Further details regarding field procedures have been described elsewhere23,24. The Human Subjects Committees of the University of Michigan, the Uniformed Services University of the Health Sciences, and the Army Medical Research and Materiel Command approved all recruitment, consent, and field procedures. A total of n = 72,387 respondents across these three surveys consented to link their deidentified survey data with Army administrative data. Research was performed in accordance with the Declaration of Helsinki.
A probability sample of the respondents from the baseline Army STARRS surveys who consented to administrative data linkage was recruited to participate in the first STARRS-LS survey in 2016–2018 (LS1). LS1 over-sampled segments of the baseline samples of special interest to the Army (Special Operations soldiers, women, members of the activated Army National Guard and Reserve, soldiers who reported suicidality, and soldiers with evidence based on either self-report or administrative data of other clinically significant mental health problems). Attempts were then made to re-interview the n = 14,508 participants from LS1 (35.6% response rate) in 2018–2019 (LS2; n = 12,156, 83.7% conditional response rate among LS1 respondents), 2020–2022 (LS3; n = 11,119, 76.7% conditional response rate among LS1 respondents), and 2022–2024 (LS4; n = 10,830, 74.6% conditional response rate among LS1 respondents). As detailed elsewhere25, calibration weights were applied to the initial Army STARRS samples to adjust for differential probabilities of participation. Additional calibration weights were then applied to each wave of STARRS-LS to adjust for differential probabilities of participation13. STARRS-LS procedures have been described in more detail elsewhere26.
For the current study, the models for all three outcomes were developed only among LS respondents who were in the Regular Army (i.e., not in the activated Army National Guard or Army Reserve) at the time of their initial Army STARRS survey and who had subsequently separated or retired from active service in the Regular Army (whether or not they subsequently joined the National Guard or Reserve or another active branch). We also limited analysis to the subset of respondents who participated in LS3 or LS4, as a key question was asked for the first time in LS3 (and subsequently LS4) about the month and year the soldier was last on active duty in the Regular Army, based on a discovery in the course of earlier analyses that administrative data on this variable were unreliable. Survey predictors of the outcomes were based on the most recent survey completed prior to the date of separation. In the case of respondents who separated before LS1, we required separation to be no more than three years after the individual’s last Army STARRS survey to be included in the analysis. The resulting sample included 7188 respondents (see Supplementary Fig. 1 for details regarding person-level sample selection).
Given that post-separation unemployment was not a focus of attention in designing the LS surveys, we did not have retrospective reports in the survey comparable to those about homelessness and SA regarding whether respondents were unemployed in the first, second, or third year after separation. However, we did have information about current employment status in each of the LS surveys. We consequently developed a dataset to predict current unemployment at the time of survey among respondents who had been separated for 36 months or less at the time of their most recent LS survey. This analysis included a stacked sample of n = 6119 respondents who separated within either 0–12 (n = 1844), 13–24 (n = 2038), or 25–36 (n = 2237) months of an LS survey. These three conditional datasets were stacked to form a single dataset for the purposes of increasing statistical power. We then disaggregated results to evaluate model fit in each of the three subsamples.
The models for post-separation homelessness and SA required the respondent to have separated at least 12 months before their latest LS survey to avoid bias due to right censoring. We then estimated discrete-time survival models with person-years as the unit of analysis27 for the self-reported occurrence of the outcome in the first 12 months after separation among respondents who had been separated at least 12 months before the survey, in the second year (i.e., 13–24 months) after separation among respondents who had been separated at least 24 months before the survey and reported not having the outcome in the first 12 months after separation, and in the third year (i.e., 25–36 months) after separation among respondents who had been separated at least 36 months before the survey and reported not having the outcome in the first 24 months after separation. Datasets were stacked to increase statistical power to predict these relatively uncommon outcomes and results were subsequently disaggregated to evaluate model fit in each of the three person-years.
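The person-year expansion described above can be sketched as follows. This is a minimal illustration of the discrete-time survival setup, not the STARRS-LS code; the function and field names are ours.

```python
# Sketch of the discrete-time survival data structure: each respondent
# contributes one record per post-separation year in which they were
# observable (surveyed late enough) and still event-free at the start
# of the year. Hypothetical names; not the STARRS-LS codebase.

def person_year_rows(respondent_id, years_since_separation, outcome_year):
    """Expand one respondent into stacked person-year records.

    years_since_separation: full years between separation and latest survey.
    outcome_year: 1, 2, or 3 if the outcome occurred in that year, else None.
    """
    rows = []
    for year in (1, 2, 3):
        if years_since_separation < year:
            break  # right-censored: surveyed too soon to observe this year
        if outcome_year is not None and outcome_year < year:
            break  # left the risk set after the event occurred
        rows.append({
            "id": respondent_id,
            "year": year,
            "event": int(outcome_year == year),
        })
    return rows

# A respondent surveyed 3+ years out who reported homelessness in year 2
# contributes rows for years 1 (event = 0) and 2 (event = 1) only.
rows = person_year_rows("A01", years_since_separation=3, outcome_year=2)
```

Stacking the resulting records across respondents yields the person-year datasets whose sizes are reported in the Results (e.g., 18,664 person-years for homelessness).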
For post-separation homelessness, n = 6904 LS respondents were included in the analysis who left active Army service at least one year before their most recent LS survey. Smaller numbers left service at least two years before their most recent survey without experiencing homelessness in the first year of separation (n = 6158) and at least three years before their most recent survey without experiencing homelessness in the first two years after separation (n = 5602). Thirty-three respondents were excluded because we were unable to determine if homelessness occurred within 36 months after separation.
For post-separation SA, n = 6929 LS respondents were included in the analysis who left active Army service at least one year before their most recent LS survey. Smaller numbers left service at least two years before their most recent survey without a SA in their first year of separation (n = 6521) and at least three years before their most recent survey without a SA in the first two years of separation (n = 6191). Eight respondents were excluded because we were unable to determine if their SA was pre- or post-separation.
Measures
Self-reported unemployment in the LS surveys was assessed by asking respondents if they were currently: “employed, temporarily laid off, on sick leave or short-term disability, on long-term or permanent disability, on parental leave, unemployed and looking for work, unemployed and not looking for work, a homemaker, or retired.” Respondents could select multiple responses. Responses of “unemployed and looking for work” were coded as unemployed. A separate analysis was carried out to predict reports of unemployment whether or not looking for work, but results are not reported here because they were very similar to those for unemployed and looking for work.
To assess homelessness, respondents were asked if they were ever homeless since leaving Regular Army service. Homelessness was defined as “not having stable housing that you either own, rent, or stay in as part of a household” based on the VA Homeless Emergency Assistance and Rapid Transition to Housing’s definition and the VA Homeless Screening Clinical Reminder28,29. Respondents who endorsed homelessness received follow-up questions to gather information about episodes and timing of homelessness in relation to time of separation.
Self-reported suicide attempt (SA) was assessed with a question adapted from the Columbia-Suicide Severity Rating Scale30: “Did you ever make a suicide attempt (i.e., purposefully hurt yourself with at least some intention to die) at any time since your last survey?” Respondents who answered “yes” were then asked age of first SA, number of lifetime SAs, recency of last SA, and in the case of the LS surveys, age of first SA since their prior STARRS or STARRS-LS survey. Information about month and year of separation was combined with information about the respondent's birthdate and reports about the ages of SAs to estimate whether reported SAs occurred before separation and in each of the first three years after separation. In cases of uncertainty involving overlap in months of a given age across two different years since leaving, we rounded down (i.e., 6 months was assigned to the earlier year). Previous studies found that self-reports capture about two-thirds of the SAs detected either by self-reports or medical records31,32.
Pre-separation predictors included 130 indicators of five broad categories of variables: socio-demographics, Army career, psychopathological risk factors (self-injurious thoughts and behaviors, mental disorders, personality), physical health, and stressors (adverse childhood experiences, other lifetime traumatic events, chronic stressors) (Supplementary Table 1). Predictors came from baseline Army STARRS surveys and STARRS-LS surveys. A few career variables (e.g., number of combat deployments) were also included from administrative datasets, but only if they could also be assessed via self-report to guarantee that models using these predictors could be used in real-time to target TSMs for preventive interventions based on responses to surveys administered during the DoD Transitional Assistance Program (TAP33). Information available only in administrative data systems was not used in the predictor set because Army and DoD administrative data systems are not currently set up to be accessed in real time.
Analysis methods
Analysis was carried out November 2024 to April 2025 using machine learning (ML) methods. The models were person-level for unemployment and person-year discrete-time survival models for homelessness and SA27. Most studies that use ML to facilitate targeting preventive interventions either use a single algorithm34 or try several different algorithms and choose the one with the best prediction accuracy35. We instead used the Super Learner (SL) ensemble ML method36,37 to predict each of the three outcomes. SL uses stacked generalization to pool across multiple algorithms by generating a weight for each algorithm in a user-specified collection (“ensemble”) to create a composite predicted outcome score that is guaranteed in expectation to perform at least as well as the best component algorithm in the ensemble. This optimization guarantee is according to a pre-specified criterion which, in our case, was non-negative least squares (minimizing MSE)36,37.
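The weighting principle behind SL can be sketched in a few lines. This is our illustration, not the authors' R code: the projected-gradient loop is a simple stand-in for the non-negative least squares solver, and the three "learners" are simulated predictions rather than fitted algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400).astype(float)   # observed 0/1 outcome
# Hypothetical cross-validated predictions from three candidate learners:
# one informative, one pure noise, one intercept-only.
Z = np.column_stack([
    np.clip(0.7 * y + 0.15 + 0.1 * rng.random(400), 0, 1),
    rng.random(400),
    np.full(400, y.mean()),
])

# Non-negative least squares by projected gradient descent: minimize
# ||Z w - y||^2 subject to w >= 0, then normalize to convex weights.
w = np.full(Z.shape[1], 1.0 / Z.shape[1])
for _ in range(5000):
    grad = Z.T @ (Z @ w - y) / len(y)
    w = np.clip(w - 0.5 * grad, 0.0, None)
w = w / w.sum()              # ensemble weights summing to 1
ensemble_pred = Z @ w        # composite predicted risk score
```

In this toy setting nearly all of the weight lands on the informative learner, which is the behavior that gives SL its guarantee of performing at least as well in expectation as the best component algorithm.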
Consistent with recommendations38, we used a diverse set of algorithms in the ensemble to capture nonlinearities and interactions and to reduce the risk of misspecification39,40 (Supplementary Table 2). Hyperparameter tuning was achieved by including individual algorithms multiple times in the ensemble with different hyperparameter values and allowing SL to weight relative importance across this range rather than using an external grid search or random search procedure (Supplementary Table 3). We compared the fit of SL to lasso penalized regression models to determine if the more complex SL approach was more accurate than a simpler lasso approach.
We attempted to reduce risk of over-fitting in three ways: (i) By excluding from the predictor set dichotomous variables with fewer than 10 observed cases of the outcome; (ii) By estimating the univariable 10-fold cross-validated (10F-CV) area under the receiver operating characteristic curve (AU-ROC) of each remaining predictor and excluding predictors with a lower bound of the AU-ROC 95% confidence interval less than 0.51; and (iii) By restricting the number of predictors included in each algorithm to either 5%, 10%, or 20% of the number of respondents with the outcome. Each learner in the ensemble was estimated three times, corresponding to each of the three numbers of predictors, which were selected separately within folds. Predictor selection for linear models was carried out using lasso and for other models using random forests.
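The univariable screening step (ii) can be illustrated in miniature. The AU-ROC here is computed via its rank-based (Mann-Whitney) form; for simplicity the point estimate stands in for the cross-validated 95% CI lower bound the paper actually uses, and the example predictors are hypothetical.

```python
import numpy as np

def au_roc(score, y):
    """Rank-based AU-ROC, equivalent to the Mann-Whitney U statistic.
    Ties in the score are not rank-averaged here, which is adequate for
    continuous predictors."""
    score, y = np.asarray(score, float), np.asarray(y, int)
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Screening rule: retain predictors clearing the 0.51 bar.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
predictors = {
    "informative": np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]),
    "reversed":    np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]),
}
kept = [name for name, x in predictors.items() if au_roc(x, y) > 0.51]
```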
Nested 10F-CV was used to evaluate model performance. This was done rather than the more typical use of separate training and test samples because of the relatively small overall sample size. Model discrimination for individual algorithms in the ensemble as well as for the overall ensemble was then evaluated with AU-ROC41, while model calibration was evaluated using the integrated calibration index (ICI)42. Overall model accuracy (i.e., combining discrimination and calibration) was evaluated using the Brier score. It is noteworthy that the Brier score, although sensitive to prevalence, can be used legitimately in the way we used it here as an overall measure combining discrimination and calibration across models predicting a single outcome in a single sample43.
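For concreteness, the accuracy metrics can be sketched as follows. The Brier score is computed exactly as defined; the binned calibration error is only a crude stand-in for the ICI, which uses a smoothed calibration curve rather than equal-count bins.

```python
import numpy as np

def brier(p, y):
    """Brier score: mean squared distance between predicted probability
    and the 0/1 outcome (lower is better; combines discrimination and
    calibration)."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    return float(np.mean((p - y) ** 2))

def binned_calibration_error(p, y, n_bins=10):
    """Crude binned analogue of the ICI: mean absolute gap between mean
    predicted and mean observed risk across equal-count bins ordered by
    predicted risk. The ICI proper smooths the calibration curve instead."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    order = np.argsort(p)
    gaps = [abs(p[b].mean() - y[b].mean())
            for b in np.array_split(order, n_bins) if len(b)]
    return float(np.mean(gaps))
```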
For each outcome, we then divided the sample into 20 risk ventiles (i.e., 20 subsamples, each consisting of 5% of respondents) based on cross-validated predicted risk and calculated both conditional and cumulative sensitivity (SN; the proportion of all respondents with the outcome who are in and across ventiles of predicted risk) and positive predictive value (PPV; prevalence of the outcome within and across ventiles of predicted risk). As we would expect SN to be 5% within each ventile purely by chance, we focused on ventiles with SN of approximately 10% as those with meaningful concentration of risk. Inspection of CV SN across these ventiles allowed us to define high-risk strata for each outcome.
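The cumulative version of these ventile metrics can be sketched as follows (our illustration of the definitions above, with hypothetical inputs).

```python
import numpy as np

def ventile_metrics(pred, y, n_bins=20):
    """Cumulative sensitivity and PPV across predicted-risk ventiles,
    accumulated from the highest-risk ventile downward."""
    order = np.argsort(np.asarray(pred, float))[::-1]  # highest risk first
    y_sorted = np.asarray(y, float)[order]
    bins = np.array_split(y_sorted, n_bins)            # ~5% of sample each
    cum_cases = np.cumsum([b.sum() for b in bins])
    cum_n = np.cumsum([len(b) for b in bins])
    sensitivity = cum_cases / y_sorted.sum()  # share of all cases captured
    ppv = cum_cases / cum_n                   # outcome prevalence so far
    return sensitivity, ppv

# Toy example: 20 respondents, predictions perfectly rank the 2 cases.
pred = np.linspace(1.0, 0.0, 20)
y = np.array([1] * 2 + [0] * 18)
sens, ppv = ventile_metrics(pred, y)
```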
We then applied the coefficients from the previously developed STARRS-LS suicide death model17 to our sample. This model used administrative variables as predictors of NDI suicides in the population of nearly 1 million Army soldiers who separated from active service since 2010.
We then calculated cross-tabulations of predicted risk strata at the individual level across all outcomes. And, finally, the relative importance of predictors used to develop the models was evaluated using the model-agnostic Shapley Additive Explanations (SHAP) method44, a general-purpose approach to examine predictor importance in any ML prediction model by calculating marginal contribution to overall model accuracy. However, as causal interpretation of such associations is hazardous, we have relegated the discussion of this part of the analysis to the Supplementary Materials.
The Super Learner and Lasso models were estimated in R version 4.4.145. SHAP values were estimated in the XGBoost R package46. Data management and calculations of cross-validated estimates of AU-ROC, SN, PPV, ICI, and Brier scores were carried out in SAS version 9.447.
Results
Sample composition
Although sample composition differed across outcomes (unemployment n = 6119 persons; homelessness n = 18,664 person-years; SA n = 19,641 person-years) for reasons described above, general sample characteristics were similar (Supplementary Table 4). Across the samples, the great majority of respondents were male (84–85%), Non-Hispanic White (63–66%), and 28+ years of age (53–63%). Most respondents had only a high school diploma (66–71%) and were either currently married (61–67%) or never married (27–34%) at the time of separation from service. Most had either no history of combat deployment (44–46%) or only one deployment (31–32%) and were of senior enlisted rank (74–81%) at the time of separation.
Outcome prevalence
An average of 10.3% of respondents who were in their first through third years after separation at the time of an LS survey reported being unemployed. As mentioned previously, unlike the other survey outcomes, only the point prevalence of unemployment at the time of survey was assessed rather than interval prevalence. Homelessness and SA, in comparison, were assessed retrospectively over the period since separation. Homelessness was reported as occurring among 3.2% of respondents on average per year in the first three years after separation, and SA among 1.0% of respondents on average per year over the same three-year period. Prevalence was consistently highest in the first year after separation (15.6% unemployment, 6.4% homelessness, 1.6% SA) and declined thereafter (Table 1).
Model results
The performance of the best random forest model was generally either comparable to or better than other algorithms or the entire stacked-generalization ensemble in predicting unemployment (Table 2). In contrast, the simple lasso model was generally either comparable to more complex algorithms or the entire stacked-generalization ensemble in predicting homelessness and SA (Table 2). We consequently focus below on random forest results for unemployment and lasso results for homelessness and SA.
We examined model performance predicting outcomes in the first year after separation as well as pooled across three years to determine if model performance decreased with increasing risk horizon. Models using data pooled across all three years after separation generally performed as well as models using only data from the first year (Table 3); thus, we focus below on models with a 3-year risk horizon to increase statistical power in analyses.
The cross-validated AU-ROC of the unemployment model was 0.60. Inspection of the predicted high-risk ventiles (Table 4) showed that none had an observed SN (range 9.7–8.5%) meaningfully higher than the 5.0% expected by chance. Based on this result, we concluded that post-separation unemployment could not be predicted accurately with our data. Unemployment was therefore not considered in subsequent analyses of overlapping risk.
The cross-validated AU-ROC of the homelessness model was 0.68. Inspection of the predicted risk ventiles (Table 4) showed that only the top two had elevated SN (15.7–11.0%), with approximately one-quarter (26.6%) of respondents with post-separation homelessness found among this 10% at the highest predicted risk. Twelve-month homelessness prevalence was 8.5% in this high-risk segment of the sample. Based on this result, we defined these top 10% of respondents as having high predicted risk of homelessness.
The cross-validated AU-ROC of the SA model was 0.78. Inspection of the predicted risk ventiles (Table 4) showed very elevated SN in the top ventile (26.6%) and an elevated average SN of 11.4% across the next three ventiles, with 60.9% of respondents with post-separation SA found among this 20% at highest predicted risk. Twelve-month SA prevalence was 3.2% in this high-risk segment of the sample. Based on this result, we defined these top 20% of respondents as having high predicted risk of SA.
Overlap of high-risk across outcomes
As noted above, we imputed the predicted risk of post-separation suicide death based on a previously developed model17. The 10% of TSMs at the highest predicted risk of suicide death were defined as high-risk. It is noteworthy that there were no documented suicide deaths in the STARRS-LS sample in the three years after separation. However, this is not surprising given the comparatively small size of the sample.
Given the thresholds for defining high-risk status (i.e., the 10% of respondents at highest predicted risk of homelessness and the 20% at highest predicted risk of SA), we would expect 35.2% of the sample to be classified high-risk for at least one of these outcomes if the three predictions were completely unrelated (i.e., 100% − [100% − 20%] × [100% − 10%] × [100% − 10%]). At the other extreme, only 20% of the sample would be classified as high-risk of any outcome if the 10% predicted to be high-risk for homelessness and the 10% predicted to be high-risk for suicide death were both subsumed in the 20% predicted to be high-risk for SA.
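The two bounds above follow from elementary probability and can be verified directly:

```python
# Upper bound: if the three high-risk flags (20% SA, 10% homelessness,
# 10% suicide death) were statistically independent, the share of TSMs
# flagged for at least one outcome would be the complement of the
# probability of carrying no flag at all.
p_any_independent = 1 - (1 - 0.20) * (1 - 0.10) * (1 - 0.10)  # = 0.352

# Lower bound: if both 10% groups were fully nested inside the 20% SA
# group, only 20% of the sample would carry any flag.
p_any_nested = max(0.20, 0.10, 0.10)
```

The observed figure of 28.1% falls between these bounds, consistent with positively correlated but far from fully overlapping risks.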
The actual proportion of the sample classified as high-risk for at least one outcome was 28.1% (Table 5). This reflects statistically significant (p < .001) but substantively modest Pearson correlations between all pairs of predicted risk dichotomies: r = 0.37 for homelessness and SA, r = 0.23 for homelessness and suicide death, and r = 0.20 for SA and suicide death. Included in the 28.1% were 18.3% predicted to be high-risk for only one outcome (2.5% homelessness, 11.2% SA, 4.6% suicide death), accounting for (i.e., SN) 23.5% of homelessness and 28.8% of SA, and 9.9% predicted to be high-risk for multiple outcomes (4.5% homelessness and SA, 1.0% homelessness and suicide death, 2.4% SA and suicide death, 2.0% all three), accounting for 29.9% of homelessness and 48.6% of SA.
It is noteworthy that lift, defined as the ratio of SN in the subgroup to the proportion of participants in the subgroup48, varied meaningfully across the different high-risk subgroups. Using the common rule of thumb that lift of 2.0+ is clinically significant49, clinically significant lift for homelessness was found among respondents with high predicted risk of homelessness whether alone (2.6) or in conjunction with other outcomes (2.8–6.1), while clinically significant lift for SA was found only among respondents with a combination of high predicted risk of SA and at least one other outcome (3.7–7.2). Lift for SA was considerably lower among respondents with high predicted risk only of SA (1.8).
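The lift statistic used above reduces to a single ratio; the numbers in this sketch are hypothetical rather than taken from Table 5.

```python
def lift(sensitivity, group_share):
    """Lift: the share of all outcome cases captured by a subgroup divided
    by the subgroup's share of the sample. A value of 1.0 indicates no
    concentration of risk; the rule of thumb cited in the text treats
    2.0+ as clinically significant."""
    return sensitivity / group_share

# Hypothetical illustration: a subgroup holding 10% of TSMs that captures
# 25% of all cases of an outcome has lift 2.5.
example = lift(0.25, 0.10)
```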
Important predictors of post-separation outcomes
It is hazardous to focus too closely on predictor importance in machine learning models for two reasons: “importance” is difficult to sort out when data-driven methods are used to select predictors from a larger set of often highly correlated variables and the associations under study are nonadditive; and the most important predictors are not necessarily the most important causal variables. There is nonetheless interest in gaining some broad understanding of predictors, which we did by examining SHAP values for predictor classes as well as for the most important individual predictors in each class. This was done both by looking at predictors in the overall homelessness and SA models and by estimating new models for high risk of only one outcome (separate models for high risk of homelessness-only, SA-only, and suicide death-only) and multiple outcomes (a single model for high risk of 2–3 outcomes). An explanation of the method is provided in the Supplementary Materials, and detailed results are presented in Supplementary Figs. 2–7. Only a very brief overview is presented here.
Psychopathological variables were the most important class of predictors for most models (SHAP = 65.1–87.2%) and were second most important for overall homelessness (SHAP = 75.4%), but less important for high risk of homelessness-only (SHAP = 35.4%) (Supplementary Table 5). A wide variety of psychopathological predictors emerged as important, with prior psychopathology predicting increased risk of the outcomes in the great majority of cases. A notable exception was that a prior nonfatal SA was associated with reduced risk of being at high risk only of suicide death. Army career characteristics, in comparison, were the most important class of predictors of overall homelessness (SHAP = 102.9%), with leaving the Army at a young age, low rank at discharge, fewer months on active duty, and receiving a general (rather than honorable) discharge the most important predictors (SHAP = 4.8–35.8%). Army career characteristics also were an important class of predictors of high risk of homelessness-only, suicide death-only, and 2+ outcomes (SHAP = 39.0–48.1%). Other important classes of predictors were socio-demographics for high risk of homelessness-only and SA-only (SHAP = 46.8–53.1%) and stressors for overall homelessness (SHAP = 61.5%).
Discussion
The results of this study expand on previous STARRS-LS work by presenting updated prediction models for post-separation homelessness and SA using a larger sample and longer follow-up period than in previous reports6,14. We also attempted to develop a model for post-separation unemployment, which we did not do previously. Finally, we examined overlap in risk across multiple outcomes of interest to DoD and VA. Comparison of high predicted risk across outcomes showed that approximately 10% of TSMs are at high risk for two or more of the outcomes considered, and another 18% of only one of these outcomes. Being at high risk of multiple outcomes was associated with a greater lift for both homelessness and SA than when risk was elevated for only one outcome.
These results have direct implications for transition support programs in three ways. First, the results reiterate the potential value of machine learning risk prediction models in identifying the small proportion of TSMs who account for most instances of homelessness, SA, and suicide death in the years immediately following separation from military service. Through accurate identification, transition support programs can move away from sole reliance on costly and suboptimal low-intensity programs delivered to all TSMs and instead allocate more resources and services to TSMs at high predicted risk. It is also noteworthy that this approach could spare TSMs at low risk from receiving unnecessary interventions. Such applications of risk prediction models are already recognized in pilot programs designed to address risk for suicide16 and homelessness6 in TSMs.
Second, the results provide greater understanding of post-separation risk than in prior studies by highlighting a subgroup of TSMs at elevated risk for multiple adverse outcomes and others at elevated risk of only one of the outcomes. Our results suggest that as many as 10% of TSMs may be at elevated risk of multiple outcomes. Consistent with previous research18,19,21, there were moderate correlations in risk of homelessness, SA, and suicide death. Observed patterns in SHAP values suggest that TSMs at high risk of multiple outcomes likely have unique needs relative to their peers at risk of single outcomes. Therefore, if transition support programs are designed with sole consideration of single outcomes, the programs risk under-addressing the needs of TSMs with intersecting risks. It might be, for example, that relatively intensive wrap-around case management makes most sense for TSMs at risk of multiple outcomes, whereas those only at risk of suicide death might need means restriction focused on firearms safety (given that firearms are overwhelmingly the most common means of suicide among TSMs). Those only at risk of SA might need suicide-focused psychotherapy. And those only at risk of homelessness might need assistance with financial planning.
We found that groups having intersecting risks comprise a smaller number of TSMs than groups with single risks, although with higher lift, highlighting the importance of identifying multi-risk subgroups in the design and delivery of transition support programs. It is possible that tailoring the structure and/or nature of transition support to the risks of TSMs could provide a more cost-efficient means of mitigating these risks than current programs. For example, TSMs at high risk only of suicide death or suicide attempt might be appropriate for connection to suicide-focused interventions, such as lethal means counseling and/or psychotherapy50,51, whereas those at high risk only of homelessness may be more appropriate for supported connection to housing support interventions6. TSMs at risk for multiple outcomes would presumably require more complex care coordination, possibly making it important to provide special types of integrated case management for such individuals.
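The sensitivity and lift figures underlying statements like these can be computed directly from cross-validated predicted probabilities. The sketch below is purely illustrative, using synthetic data rather than STARRS-LS data, and the function names are our own; it shows how the share of all cases captured in a top-risk stratum (sensitivity) and the case-rate enrichment in that stratum (lift) are derived.

```python
import random

def concentration_metrics(p, y, top_frac=0.10):
    """Sensitivity and lift for the top_frac of predicted risk.

    p: predicted probabilities; y: 0/1 observed outcomes.
    Sensitivity = share of all cases falling in the top-risk group.
    Lift = case rate in the top-risk group / overall case rate.
    """
    # Rank individuals from highest to lowest predicted risk.
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    n_top = max(1, int(round(top_frac * len(p))))
    top = order[:n_top]
    cases_top = sum(y[i] for i in top)
    cases_all = sum(y)
    sensitivity = cases_top / cases_all
    lift = (cases_top / n_top) / (cases_all / len(y))
    return sensitivity, lift

# Toy illustration: outcome probability rises with the risk score.
random.seed(0)
p = [random.random() for _ in range(10000)]
y = [1 if random.random() < 0.02 + 0.10 * s else 0 for s in p]
sens, lift = concentration_metrics(p, y, top_frac=0.10)
print(f"top-decile sensitivity = {sens:.2f}, lift = {lift:.2f}")
```

Because the synthetic outcome rate grows with the score, the top decile is enriched relative to the overall base rate, which is exactly the property a targeting threshold exploits.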
Third, the detailed results regarding predictors, presented in the Supplementary Materials, highlight important differences in predictors of risk across outcomes, with the most salient predictors of SA and suicide death being distinct and, at least in some cases, opposite (e.g., a positive association of service in the reserve component with SA risk, but a negative association with risk of suicide death). This is an important finding given that many efforts to mitigate TSM suicide risk are guided by research on post-separation SA due to the low base rate of suicide death7. There is some prior research consistent with the existence of the differences we found in the predictors of SA and suicide death52,53,54. To mitigate risk effectively across outcomes, transition support programs need to be guided by research that explicitly focuses on common and unique risks of the targeted outcomes. Reliance on outcome proxies (e.g., assessment of SA or suicidal ideation as a proxy for risk of suicide death) will contribute to continued under-identification of risk and potential misallocation of risk mitigation resources.
As mentioned previously, we should not focus too closely on individual predictors. However, given that previous research suggests that prior SAs are a risk factor for suicide death52,55, one finding about predictors is worth highlighting: the negative association between prior SA and high risk only of suicide death. This may be related to the fact that an especially high proportion of suicide decedents with military experience die by firearms, and prior research shows that suicide decedents who die by firearms are less likely than those who die by other methods to have a prior SA56. A useful avenue of future research might be to examine more nuanced characteristics of these high-risk subgroups (e.g., access to firearms, access to treatment, social support) to better understand associations among the various high-risk profiles of TSMs.
As noted in the introduction, we developed the above models to provide guidance in targeting preventive interventions for high-risk TSMs, based on the assumption that the outcomes are sufficiently uncommon, and the intervention costs sufficiently high, that it would not be cost-effective to provide these interventions to all TSMs. There is also the issue that some interventions for these outcomes might impose burdens on the individuals who receive them, including possible stigma. Those costs need to be included in any comprehensive cost-benefit analysis of implementing the interventions.
Some critics of developing models like those presented here have argued that suicide-related behaviors (SRBs) are too rare to make interventions based on such models cost-effective, given that only a small proportion of the individuals defined as having “high” risk experience the outcome even when the models are significant57,58. However, this criticism is ad hoc rather than based on a formal analysis comparing the costs of intervening with enough people to prevent a single case of the negative outcome to the value (either from a societal perspective or from the perspective of the organization implementing the intervention) of preventing that single instance of the outcome. Formal analyses of this sort in other areas of medicine sometimes find that screening and intervening with high-risk cases can be cost-effective even for disorders much less common than SRBs. For example, newborn screening for certain rare metabolic and endocrine disorders has been shown to be beneficial for infants with predicted risks as low as 0.5% (compared to 5.6% for SA and 10.1% for homelessness in the top risk categories of our models), given that the interventions are relatively inexpensive and are effective in preventing severe, irreversible lifelong disorders59. Similarly, use of statins, which reduces coronary heart disease (CHD) by about 30%60, is judged to be cost-effective for patients with an annual CHD risk of about 3%61.
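The statin comparison in this paragraph rests on standard number-needed-to-treat arithmetic, which can be made explicit. The sketch below is our illustrative addition, not part of the cited analyses: it computes the NNT implied by a baseline risk and a relative risk reduction, and a hypothetical break-even per-person cost given a value placed on preventing one event.

```python
def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """People who must receive the intervention to prevent one event."""
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    return 1.0 / absolute_risk_reduction

def breakeven_cost_per_person(baseline_risk, rrr, value_of_prevented_event):
    """Max per-person intervention cost at which targeting breaks even.

    value_of_prevented_event is a stand-in for the (unknown) value an
    organization places on preventing one instance of the outcome.
    """
    return value_of_prevented_event / number_needed_to_treat(baseline_risk, rrr)

# Statin example from the text: ~3% annual CHD risk, ~30% relative reduction.
nnt = number_needed_to_treat(0.03, 0.30)
print(f"NNT = {nnt:.0f}")  # about 111 people treated per event prevented
```

The same arithmetic explains why screening can remain cost-effective at the 0.5% predicted-risk levels cited for newborn screening: when per-person intervention cost is low enough relative to the value of a prevented event, even a large NNT clears the break-even threshold.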
We did not attempt to carry out an analysis of cost-effectiveness for any of our models because three types of critical information required for such analyses are missing for these outcomes: (i) information about the costs of the best-practices preventive interventions that might be used for these outcomes; (ii) information about the effectiveness of these interventions; and (iii) information about the financial value placed by the Army or the Veterans Health Administration on preventing one instance of these outcomes. An experimental evaluation designed to obtain information of the first sort is currently underway for one such proposed intervention to prevent post-separation SRBs62. Formal analyses of this sort for other SRB models have shown that preventive interventions targeted with models like those presented here would be cost-effective over a plausible range of assumptions about costs and benefits17,63,64. Information from empirical evaluations of proposed interventions should be required to justify the use of models like those presented here to target interventions for high-risk TSMs before large-scale implementation.
The results reported here should be interpreted in the context of several limitations. First, although we had a relatively large sample size (n = 7188), the base rates of the outcomes were low. Future studies using larger samples are needed to increase statistical power and model accuracy. Second, the data considered here were not collected at the time of separation but were instead based on retrospective reports. This could have introduced bias into estimates. Third, the unemployment model was much less fine-grained than the other models. This might help explain our failure to develop a useful prediction model for post-separation unemployment. Finally, we made no attempt to determine whether model parameters were stable over time, although this type of model updating would be required if the models were implemented on a long-term basis65.
This is the first study we are aware of to examine the intersection of risk for post-separation homelessness, suicide attempt, and suicide death. We found that many of the TSMs predicted to be at high risk of one such outcome were not at high predicted risk of the others. This is an important result because many machine learning prediction analyses purporting to target patients for suicide prevention are based on models that predict nonfatal suicide attempts66,67. The latter is often the only model that can be estimated because nonfatal suicide attempts are sufficiently common to be used as an outcome in a model, whereas suicide deaths are much less common. As shown here, though, the individuals identified as at high risk of these two outcomes overlap only partially. In the presence of multiple models, as we have here, this creates a challenge, as well as an opportunity, for interventionists to determine whether different interventions are needed for individuals with different high-risk profiles and to decide optimal allocation of intervention resources across all such profiles.
Data availability
STARRS-LS Wave 1, Wave 2, Wave 3, and Wave 4 data, as well as data from the earlier Army STARRS New Soldier Study (NSS), All Army Study (AAS) and Pre and Post Deployment studies (PPDS), are available through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan (https://www.icpsr.umich.edu/web/ICPSR/studies/35197). STARRS-LS and Army STARRS data are restricted from general dissemination, meaning that a confidential data use agreement must be established prior to access. Researchers interested in gaining access to the data can submit their applications via ICPSR’s online Restricted Contracting System. Guidelines for applying for access to this data can be found under the data and documentation tab at the above URL. The STARRS Historical Administrative Data Study (HADS) data are not available for public release.
Code availability
Code used to produce the statistical output is available at https://github.com/ihwang0829/starrsls_badoutcome_paper/. Because the data used in this study are not publicly available and require DoD clearance to access, we cannot post the source code used to read in the Army administrative data and create the 2000+ variables used in this analysis.
References
U.S. Government Accountability Office (GAO). Transitioning Servicemembers: Information on Military Employment Assistance Centers. https://www.gao.gov/assets/gao-19-438r.pdf (2019).
Edwards, E. R. et al. Veteran suicide thoughts and attempts during the transition from military service to civilian life: qualitative insights. Death Stud. 1–13 (2024).
Vogt, D. et al. Changes in the health and broader well-being of U.S. veterans in the first three years after leaving military service: overall trends and group differences. Soc. Sci. Med. 294, 114702 (2022).
Parker, K., Igielnik, R., Barroso, A. & Cilluffo, A. The American Veteran Experience in the Post 9/11 Generation: Readjusting to Civilian Life. https://www.pewresearch.org/social-trends/2019/09/10/readjusting-to-civilian-life/ (2019).
Vogt, D. S. et al. U.S. Military Veterans’ health and well-being in the first year after service. Am. J. Prev. Med. 58, 352–60 (2020).
Tsai, J. et al. Predicting homelessness among transitioning U.S. Army soldiers. Am. J. Prev. Med. 66, 999–1007 (2024).
U.S. Department of Veterans Affairs. 2024 National Veteran Suicide Prevention Annual Report Part 1 of 2: In-Depth Reviews. https://www.mentalhealth.va.gov/docs/data-sheets/2024/2024-Annual-Report-Part-1-of-2_508.pdf (2024).
Kamarck, K. N. Military Transition Assistance Program (TAP): An Overview (Congressional Research Service, Library of Congress, Washington, D.C., 2018).
U.S. Department of Veterans Affairs. VA Solid Start. https://benefits.va.gov/transition/solid-start.asp.com (2020).
U.S. Department of Veterans Affairs; Office of Transition and Economic Development (TED). Post-Separation Transition Assistance Program (TAP) Assessment (PSTAP): 2019 Cross-Sectional Survey Report https://benefits.va.gov/TRANSITION/docs/pstap-assessment.pdf (2020).
Stumpp, N. E. & Sauer-Zavala, S. Evidence-based strategies for treatment personalization: a review. Cogn. Behav. Pract. 29, 902–13 (2022).
U.S. Government Accountability Office (GAO). Military to Civilian Transition: Actions Needed to Ensure Effective Mental Health Screening at Separation. https://www.gao.gov/products/gao-25-107205 (2025).
Naifeh, J. A. et al. The Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS): progress toward understanding suicide among soldiers. Mol. Psychiatry 24, 34–48 (2019).
Kearns, J. C. et al. A practical risk calculator for suicidal behavior among transitioning US Army soldiers: results from the Study to Assess Risk and Resilience in Servicemembers-Longitudinal Study (STARRS-LS). Psychol. Med. 53, 7096–105 (2023).
Lykins, A. HEARTH project helps high-risk Veterans transition to civilian life. https://news.va.gov/132175/hearth-helps-high-risk-veterans-transition/ (2024).
U.S. Department of Veterans Affairs. Department of Veterans Affairs Fiscal Year 2024 Annual Evaluation Plan. https://department.va.gov/wp-content/uploads/2023/04/va-annual-evaluation-plan-2024.pdf (2024).
Kennedy, C. J. et al. Predicting suicides among US Army soldiers after leaving active service. JAMA Psychiatry 81, 1215–24 (2024).
Culhane, D., Szymkowiak, D. & Schinka, J. A. Suicidality and the onset of homelessness: evidence for a temporal association from VHA treatment records. Psychiatr. Serv. 70, 1049–52 (2019).
Edwards, E. R. et al. Understanding risk in younger Veterans: risk and protective factors associated with suicide attempt, homelessness, and arrest in a nationally representative Veteran sample. Mil. Psychol. 34, 175–86 (2022).
O’Connor, K. et al. Unemployment and co-occurring disorders among homeless veterans. J. Dual Diagn. 9, 134–8 (2013).
Schinka, J. A., Leventhal, K. C., Lapcevic, W. A. & Casey, R. Mortality and cause of death in younger homeless veterans. Public Health Rep. 133, 177–81 (2018).
Ursano, R. J. et al. The Army study to assess risk and resilience in servicemembers (Army STARRS). Psychiatry 77, 107–19 (2014).
Heeringa, S. G. et al. Field procedures in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Int. J. Methods Psychiatr. Res. 22, 276–87 (2013).
Kessler, R. C. et al. Design of the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Int. J. Methods Psychiatr. Res. 22, 267–75 (2013).
Kessler, R. C. et al. Response bias, weighting adjustments, and design effects in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Int. J. Methods Psychiatr. Res. 22, 288–302 (2013).
Stanley, I. H. et al. Predicting suicide attempts among U.S. Army soldiers after leaving active duty using information available before leaving active duty: results from the Study to Assess Risk and Resilience in Servicemembers-Longitudinal Study (STARRS-LS). Mol. Psychiatry 27, 1631–9 (2022).
Suresh, K., Severn, C. & Ghosh, D. Survival prediction models: an introduction to discrete-time modeling. BMC Med. Res. Methodol. 22, 207 (2022).
Department of Housing and Urban Development. Homeless Emergency Assistance and Rapid Transition to Housing: Defining “Homeless”. https://www.federalregister.gov/documents/2011/12/05/2011-30942/homeless-emergency-assistance-and-rapid-transition-to-housing-defining-homeless (2011).
Montgomery, A. E. et al. Universal screening for homelessness and risk for homelessness in the Veterans Health Administration. Am. J. Public Health 103, S210–1 (2013).
Posner, K. et al. The Columbia-Suicide Severity Rating Scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am. J. Psychiatry 168, 1266–77 (2011).
Chu, C. et al. A test of the interpersonal theory of suicide in a large, representative, retrospective and prospective study: results from the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Behav. Res. Ther. 132, 103688 (2020).
Lee, D. J. et al. A longitudinal study of risk factors for suicide attempts among Operation Enduring Freedom and Operation Iraqi Freedom veterans. Depress Anxiety 35, 609–18 (2018).
Transition Assistance Program. Welcome to DoD TAP. https://www.dodtap.mil/dodtap/app/home (2025).
Jiang, T. et al. Using machine learning to predict suicide in the 30 days after discharge from psychiatric hospital in Denmark. Br. J. Psychiatry 219, 440–7 (2021).
Saxe, G. N., Ma, S., Ren, J. & Aliferis, C. Machine learning methods to predict child posttraumatic stress: a proof of concept study. BMC Psychiatry 17, 223 (2017).
Polley, E., et al. SuperLearner: super learner prediction [R package]. version 2.0-29. https://cran.r-project.org/web/packages/SuperLearner/SuperLearner.pdf (2024).
Polley, E., Rose, S. & van der Laan, M. Super learning. In: van der Laan, M. J. & Rose, S. (eds) Targeted Learning: Causal Inference for Observational and Experimental Data. 43–66 (Springer New York, 2011).
LeDell, E., van der Laan, M. J. & Petersen, M. AUC-maximizing ensembles through metalearning. Int. J. Biostat. 12, 203–18 (2016).
Kabir, M. F. & Ludwig, S. A. Enhancing the performance of classification using super learning. Data-Enabled Discov. Appl. 3, 1–13 (2019).
Kennedy, C.J. Guide to SuperLearner. https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html (2017).
Hajian-Tilaki, K. Receiver Operating Characteristic (ROC) Curve analysis for medical diagnostic test evaluation. Casp. J. Intern. Med. 4, 627–35 (2013).
Austin, P. C. & Steyerberg, E. W. The Integrated Calibration Index (ICI) and related metrics for quantifying the calibration of logistic regression models. Stat. Med. 38, 4051–65 (2019).
Assel, M., Sjoberg, D. D. & Vickers, A. J. The Brier score does not evaluate the clinical utility of diagnostic tests or prediction models. Diagn. Progn. Res. 1, 19 (2017).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS) (Curran Associates Inc., Red Hook, NY, 2017).
R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. (2021).
Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association for Computing Machinery, New York, NY, 2016).
SAS Institute Inc. SAS® Software, Version 9.4 (2023).
Tufféry, S. Data Mining and Statistics for Decision Making. 1st ed: 720 (Wiley, 2011).
Walsh, C. G., Ribeiro, J. D. & Franklin, J. C. Predicting risk of suicide attempts over time through machine learning. Clin. Psychol. Sci. 5, 457–69 (2017).
Spitzer, E. G. et al. A systematic review of lethal means safety counseling interventions: impacts on safety behaviors and self-directed violence. Epidemiol. Rev. 46, 1–22 (2024).
van Ballegooijen, W. et al. Suicidal ideation and suicide attempts after direct or indirect psychotherapy: a systematic review and meta-analysis. JAMA Psychiatry 82, 31–7 (2025).
Beautrais, A. L. Suicides and serious suicide attempts: two populations or one?. Psychol. Med. 31, 837–45 (2001).
Schafer, K. M. et al. Suicidal ideation, suicide attempts, and suicide death among Veterans and service members: a comprehensive meta-analysis of risk factors. Mil. Psychol. 34, 129–46 (2022).
Ge, F. et al. Contrasting risk profiles for suicide attempt and suicide using Danish registers and genetic data. JAMA Psychiatry 83, 32–42 (2025).
World Health Organization. Suicide. https://www.who.int/news-room/fact-sheets/detail/suicide (2025).
Ammerman, B. A. & Reger, M. A. Evaluation of prevention efforts and risk factors among veteran suicide decedents who died by firearm. Suicide Life Threat. Behav. 50, 679–87 (2020).
Belsher, B. E. et al. Prediction models for suicide attempts and deaths: a systematic review and simulation. JAMA Psychiatry 76, 642–51 (2019).
Luk, J. W. et al. From everyday life predictions to suicide prevention: clinical and ethical considerations in suicide predictive analytic tools. J. Clin. Psychol. 78, 137–48 (2022).
Kwon, C. & Farrell, P. M. The magnitude and challenge of false-positive newborn screening test results. Arch. Pediatr. Adolesc. Med. 154, 714–8 (2000).
LaRosa, J. C., He, J. & Vupputuri, S. Effect of statins on risk of coronary disease: a meta-analysis of randomized controlled trials. JAMA 282, 2340–6 (1999).
Ward, S. et al. A systematic review and economic evaluation of statins for the prevention of coronary events. In: Walley T, editor. NIHR Health Technology Assessment programme: Executive Summaries. Southampton, UK: (NIHR Journals Library, 2007).
Geraci, J. C. et al. Supporting servicemembers and veterans during their transition to civilian life using certified sponsors: a three-arm randomized controlled trial. Psychol. Serv. 20, 248–59 (2023).
Naifeh, J. A. et al. Predicting suicide attempts among US Army soldiers using information available at the time of periodic health assessments. Nat. Ment. Health 3, 242–52 (2025).
Ross, E. L. et al. Accuracy requirements for cost-effective suicide risk prediction among primary care patients in the US. JAMA Psychiatry 78, 642–50 (2021).
Binuya, M. A. E. et al. Methodological guidance for the evaluation and updating of clinical prediction models: a systematic review. BMC Med. Res. Methodol. 22, 316 (2022).
Bayramli, I. et al. Predictive structured-unstructured interactions in EHR models: A case study of suicide prediction. NPJ Digit. Med. 5, 15 (2022).
Wang, H. et al. Suicide risk prediction for Korean adolescents based on machine learning. Sci. Rep. 15, 14921 (2025).
Acknowledgements
The Army STARRS Team consists of Co-Principal Investigators: Robert J. Ursano, MD (Uniformed Services University) and Murray B. Stein, MD, MPH (University of California San Diego and VA San Diego Healthcare System). Site Principal Investigators: James Wagner, PhD (University of Michigan) and Ronald C. Kessler, PhD (Harvard Medical School). Army scientific consultant/liaison: Kenneth Cox, MD, MPH (Office of the Assistant Secretary of the Army (Manpower and Reserve Affairs)). Other team members: Pablo A. Aliaga, MA (Uniformed Services University); David M. Benedek, MD (Uniformed Services University); Laura Campbell-Sills, PhD (University of California San Diego); Carol S. Fullerton, PhD (Uniformed Services University); Nancy Gebler, MA (University of Michigan); Meredith House, BA (University of Michigan); Paul E. Hurwitz, MPH (Uniformed Services University); Sonia Jain, PhD (University of California San Diego); Tzu-Cheg Kao, PhD (Uniformed Services University); Lisa Lewandowski-Romps, PhD (University of Michigan); Alex Luedtke, PhD (University of Washington and Fred Hutchinson Cancer Research Center); Holly Herberman Mash, PhD (Uniformed Services University); James A. Naifeh, PhD (Uniformed Services University); Matthew K. Nock, PhD (Harvard University); Nur Hani Zainal, PhD (Harvard Medical School); Nancy A. Sampson, BA (Harvard Medical School); and Alan M. Zaslavsky, PhD (Harvard Medical School). Army STARRS was sponsored by the Department of the Army and funded under cooperative agreement number U01MH087981 (2009-2015) with the National Institute of Mental Health (NIMH). Subsequently, STARRS-LS was sponsored and funded by the Department of Defense (USUHS grant number HU0001-15-2-0004). Edwards was supported in part by the United States Department of Veterans Affairs, Clinical Sciences Research and Development Service (CSR&D) VA-STARRS Researcher-in-Residence Program (Project SPR-002-24F).
This work was also supported in part by the United States Department of Veterans Affairs, Clinical Sciences Research and Development Service (CSR&D) project VA-STARRS 22. The contents are solely the responsibility of the authors and do not necessarily represent the views of the NIMH, the Department of the Army, the Department of Defense, the Department of Veterans Affairs, or the United States government.
Author information
Authors and Affiliations
Contributions
S.B. carried out study conceptualization, methodology, project administration, resources, supervision, validation, visualization, writing – original draft, and writing – review and editing. E.R.E. contributed to conceptualization, methodology, project administration, resources, supervision, validation, writing – original draft, and writing – review and editing. J.C.G. contributed to conceptualization, methodology, and writing – review and editing. S.M.G. contributed to project administration, resources, and visualization. I.H. contributed to data curation, formal analysis, investigation, supervision, validation, and writing – review and editing. C.J.K., H.L., and A.L. each contributed to conceptualization, investigation, supervision, and writing – review and editing. N.A.S. contributed to data curation, investigation, methodology, project administration, and writing – review and editing. D.B. contributed to formal analysis, funding acquisition, and writing – review and editing. V.C. contributed to funding acquisition, investigation, and writing – review and editing. J.A.N., M.K.N., and M.B.S. each contributed to conceptualization, funding acquisition, and writing – review and editing. J.W. contributed to data curation, funding acquisition, investigation, project administration, and writing – review and editing. R.J.U. contributed to conceptualization, funding acquisition, investigation, project administration, and writing – review and editing. R.C.K. contributed to conceptualization, funding acquisition, investigation, methodology, project administration, supervision, validation, visualization, writing – original draft, and writing – review and editing. All authors critically revised the manuscript for important intellectual content and approved the final version for submission.
Corresponding author
Ethics declarations
Competing interests
In the past 3 years, Dr. Kessler was a consultant for Cambridge Health Alliance, Child Mind Institute, Massachusetts General Hospital, RallyPoint LLC., Sage Therapeutics, University of Michigan, and University of North Carolina. He has stock options in Cerebral Inc., Mirah, PYM (Prepare Your Mind), and Verisense Health. He has an ownership interest in Menssano LLC. In the past 3 years, Dr. Stein received consulting income from Actelion, Acadia Pharmaceuticals, Aptinyx, atai Life Sciences, Boehringer Ingelheim, Bionomics, BioXcel Therapeutics, Clexio, EmpowerPharm, Engrail Therapeutics, GW Pharmaceuticals, Janssen, Jazz Pharmaceuticals, and Roche/Genentech. He has stock options in Oxeia Biopharmaceuticals and EpiVario. He is paid for his editorial work on Biological Psychiatry (Deputy Editor) and UpToDate (Co-Editor-in-Chief for Psychiatry). The rest of the authors have declared no other conflicts of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Borowski, S., Edwards, E.R., Geraci, J.C. et al. Joint models targeting U.S. Army soldiers at high-risk of post-separation unemployment, homelessness, and suicide-related behaviors. npj Mental Health Res 5, 10 (2026). https://doi.org/10.1038/s44184-026-00192-8