3.1 Demand forecasting performance and vulnerability effects
The first set of results, and the ones that set the tone for everything else, came from the forecasting module. When we broke observed and predicted PPE demand into low-, medium-, and high-SVI strata, the pattern was not subtle. High-vulnerability areas did not simply have “more demand.” They had more volatile demand, and the uncertainty around that demand behaved differently too. In the forecasting plots (Figure 1), mean demand levels rose with vulnerability, but so did variance. The high-SVI series looked less like a smooth curve and more like a sequence of abrupt shifts: spikes, pullbacks, then another rise. That shape matters operationally because it is precisely what breaks fixed replenishment schedules and static allocation formulas.
The uncertainty bands tell an even more consequential story. The 90% prediction intervals in Figure 1 widen as SVI increases, meaning the system is less confident about what will happen next in more socially vulnerable regions. At first, it is tempting to interpret that as “the model is worse in those areas.” But that framing misses the point. The more plausible interpretation is that the underlying demand process is genuinely more unstable in high-vulnerability settings—because of access frictions, facility constraints, mobility barriers, and episodic surges in need. That is consistent with the broader equity literature showing that vulnerable communities often experience less consistent access and greater disruption during emergencies (Alphonso et al., 2024; Khazanchi et al., 2024).
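To illustrate how interval width can be compared across strata, the following is a minimal sketch rather than the forecasting module used in the study. It assumes a 90% interval is built by adding the 5th and 95th percentiles of past forecast errors to a point forecast; the function name `empirical_interval`, the point forecast of 1,000 units, and the residual values are all hypothetical.

```python
from statistics import quantiles

def empirical_interval(point_forecast, residuals):
    """Empirical 90% interval: forecast plus the 5th and 95th residual percentiles."""
    cuts = quantiles(residuals, n=20, method="inclusive")  # 19 cut points, 5% apart
    lower, upper = cuts[0], cuts[-1]  # 5th and 95th percentiles
    return point_forecast + lower, point_forecast + upper

# Hypothetical past forecast errors (observed minus predicted), in units/day.
# These numbers are invented for illustration and are not study data.
residuals = {
    "low": [-30, -20, -10, -5, 0, 5, 10, 15, 20, 30],
    "high": [-90, -60, -30, -15, 0, 15, 30, 45, 60, 90],
}

widths = {}
for stratum, res in residuals.items():
    lo, hi = empirical_interval(1000.0, res)
    widths[stratum] = hi - lo
    print(f"{stratum}-SVI 90% interval: [{lo:.1f}, {hi:.1f}], width {widths[stratum]:.1f}")
```

With the same point forecast, the stratum with larger historical errors gets a proportionally wider band, which is the pattern Figure 1 describes.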

Figure 1. Observed versus predicted PPE demand across low- and high-SVI regions. Solid lines indicate observed demand; dashed lines indicate model forecasts. Shaded regions represent 90% prediction intervals, highlighting greater demand variability and uncertainty in high-SVI regions.

Figure 2. Comparison of average fill rates across SVI strata under baseline and equity-aware RL allocation models. The proposed model improves service levels in high-SVI regions while maintaining overall system performance.
This connects to a methodological argument that is easy to say but hard to truly operationalize: deterministic demand estimates can become actively misleading during crisis environments, particularly if uncertainty is not spatially homogeneous. Dautel et al. (2024) emphasize the reliability challenge of medical resource demand models in epidemic contexts; our results echo that caution, but in a very practical way. If the forecast uncertainty is systematically higher in high-SVI regions, then any allocation policy that uses a single global safety-stock logic—or assumes the same forecast error behavior everywhere—will predictably under-allocate to those high-SVI regions. In other words, this is not just a forecasting accuracy issue. It is an equity issue hiding inside model error structure (Figure 1; Table 4).
There is also a strong implementation implication here. Price et al. (2024) describe how forecasting was embedded into real replenishment activities in West Virginia under dynamic conditions; our findings reinforce why that embedding must remain probabilistic and iterative. If demand can change quickly and unpredictably—especially in vulnerable regions—then forecasting cannot be a one-time planning input. It has to be a living signal that continuously feeds allocation (Price et al., 2024). That logic, in a sense, is the foundation for why a closed-loop architecture is not “nice to have” but necessary (Table 5).
3.2 Equity outcomes of allocation policies
Once the forecasting behavior was clear, the allocation results became easier to interpret—and also harder to ignore. When we compared fill rates across low-, medium-, and high-SVI regions under baseline policies versus the proposed equity-aware RL policy, the baseline pattern was almost painfully familiar: fill rates declined as vulnerability increased (Table 6; Figure 2, Figure 6). In practical terms, the system served low-SVI regions better, more consistently, and with fewer interruptions. High-SVI regions, by contrast, bore a larger share of partial fulfillment, delayed replenishment, and shortage exposure.
This is not just an abstract computational finding. It mirrors what the pandemic revealed about PPE distribution inequities: shortages persisted unevenly across regions and facility types, not solely because the nation “did not have enough,” but because allocation and distribution failed to adapt fairly under constraints (Rubashkin et al., 2023). It also aligns with broader vaccination equity findings—where high-vulnerability counties experience compounded barriers to access and uptake, including travel burden and structural frictions (Khazanchi et al., 2024; Woolfork et al., 2024).
The proposed RL-based policy shifted that gradient. Under the equity-aware model, fill rates in high-SVI regions improved substantially while maintaining strong service levels in low- and medium-SVI regions (Table 6; Figure 2). Importantly, the improvement did not come from simplistic equalization (e.g., “everyone gets the same percentage”). Instead, the model learned state-dependent decisions—allocations that responded to evolving inventory, forecasted demand, lead times, and vulnerability-weighted service objectives. That distinction matters. It suggests the equity gain is not a moral appeal layered on top of logistics, but a learned operational behavior emerging from the objective and feedback loop.
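One way a vulnerability-weighted service objective could be written is sketched below. The functional form, the name `equity_weighted_reward`, and the `equity_weight` parameter are assumptions made for illustration; the source does not specify the reward actually used by the RL agent.

```python
from typing import Sequence

def equity_weighted_reward(
    demand: Sequence[float],
    allocated: Sequence[float],
    svi: Sequence[float],
    equity_weight: float = 1.0,
) -> float:
    """Negative vulnerability-weighted unmet demand (higher is better).

    Each unit of unmet demand in a region is penalized by
    (1 + equity_weight * SVI), so shortfalls in high-SVI regions cost more.
    """
    penalty = 0.0
    for d, a, s in zip(demand, allocated, svi):
        unmet = max(d - a, 0.0)  # over-allocation earns no credit
        penalty += (1.0 + equity_weight * s) * unmet
    return -penalty

# Two regions with equal demand but different (hypothetical) SVI scores.
demand = [100.0, 100.0]
svi = [0.1, 0.9]

short_low = equity_weighted_reward(demand, [80.0, 100.0], svi)   # shortfall in low-SVI region
short_high = equity_weighted_reward(demand, [100.0, 80.0], svi)  # shortfall in high-SVI region
print(short_low, short_high)
```

The same 20-unit shortfall is penalized more heavily when it falls on the high-SVI region, so a policy trained against an objective of this shape has a reason to protect those regions without enforcing equal percentages everywhere.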
One useful way to think about this is to contrast equity-as-reporting with equity-as-control. In many empirical equity analyses, vulnerability measures are used post hoc: we look at who had worse outcomes and then describe the disparity (Alphonso et al., 2024; Khazanchi et al., 2024; Woolfork et al., 2024). The model here takes a different posture. It uses vulnerability as a decision signal. That is a meaningful shift—from “measure the gap” to “control the gap.” In doing so, the results extend the operational evidence in Hu et al. (2023), where algorithmic allocation was used in King County to translate requests and constraints into distribution actions. Our findings suggest that when equity weighting and sequential learning are added to that kind of constrained allocation setting, disparities can be reduced without breaking feasibility (Hu et al., 2023) (Table 5).
3.3 System resilience and shortage mitigation
Equity outcomes are important, but in emergency logistics there is always a second question—sometimes an anxious one: “Okay, but does it make the system fragile?” The resilience results help answer that.
Cumulative shortage days over time (Figure 3) show a clear divergence between baseline allocation and RL-based reallocation. Baseline policies—especially those that are static or rule-based—accumulate shortage days steadily. Once shortages begin, they tend to persist. That makes intuitive sense: a plan-centric policy can be “correct” for a short horizon and still fail when conditions shift. It does not adapt quickly enough to break shortage momentum.

Figure 3. Cumulative shortage days over time under baseline and RL-based allocation policies. RL-based reallocation reduces shortage accumulation and improves system responsiveness under dynamic demand conditions.

Figure 4. Observed and forecasted resource demand across SVI tiers. Solid lines indicate observed demand; dashed lines indicate forecasts; shaded bands show 90% prediction intervals, demonstrating higher volatility in high-SVI regions.
The RL-based policy, by contrast, slowed shortage accumulation and often bent the curve downward relative to baseline (Figure 3; Table 3). The most telling aspect is not merely that shortages were lower at the end of the horizon, but that the RL policy intervened earlier—rebalancing resources before shortages became entrenched. This is where the closed-loop architecture becomes visible in outcomes (Figure 5). Because the agent observes evolving system states and learns from feedback, it can take anticipatory actions rather than reactive ones (Table 5).
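For readers who want to reproduce the shape of a curve like Figure 3 from their own data, the sketch below computes cumulative shortage days from daily demand and fulfillment series. The series are invented to show the mechanics, and the "adaptive" series is a stand-in for any policy that rebalances early; it is not output from the RL agent.

```python
from itertools import accumulate

def cumulative_shortage_days(demand, fulfilled):
    """Running count of days on which fulfilled supply fell short of demand."""
    shortage_flags = [1 if f < d else 0 for d, f in zip(demand, fulfilled)]
    return list(accumulate(shortage_flags))

# Hypothetical six-day demand surge in one region (units/day).
demand = [100, 120, 150, 150, 130, 110]
static_plan = [100, 100, 100, 100, 100, 100]  # fixed allocation, never revised
adaptive = [100, 110, 150, 150, 130, 110]     # catches up after one short day

static_curve = cumulative_shortage_days(demand, static_plan)
adaptive_curve = cumulative_shortage_days(demand, adaptive)
print("static:  ", static_curve)
print("adaptive:", adaptive_curve)
```

The static plan keeps adding a shortage day for as long as demand stays above the fixed allocation, while the adaptive series flattens once the allocation catches up. That flattening is what "breaking shortage momentum" looks like in a cumulative plot.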
These resilience gains align closely with the broader resource-sharing and dynamic rebalancing literature. Keyvanshokooh et al. (2024) show that data-driven resource sharing can reduce equipment needs and costs compared with non-sharing approaches while remaining compatible with real-time decision constraints. Our results suggest a similar logic holds for PPE-like resources: dynamic reallocation reduces shortage persistence because it treats the network as a coupled system rather than as isolated regions (Keyvanshokooh et al., 2024). From an operations perspective, this is exactly the kind of “strategic flexibility” that plan-centric optimization often struggles to capture under nonstationary shocks (Dey et al., 2024; Kiss & Elhedhli, 2024).
It is also worth noting that this is one of the places where RL’s conceptual fit becomes practical. Wu et al. (2025) emphasize that healthcare operations often involve sequential decisions and feedback; shortages are a textbook example of delayed consequences. The results here reinforce that if you treat allocation as a sequence—rather than a single optimization snapshot—you can reduce shortage accumulation over time (Wu et al., 2025).
3.4 Efficiency–equity trade-offs
There is a familiar objection to equity-aware allocation: “Fine, but it will cost more.” We took that concern seriously because emergency logistics operates under hard constraints—transportation capacity, storage limits, cold chain requirements (for vaccines), lead times, and sometimes the ability to acquire capacity from third parties (Kiss & Elhedhli, 2024). If equity-aware policies produce large cost penalties, they may be politically or operationally unacceptable even if ethically appealing.
The comparative results suggest a more nuanced reality. We observed substantial improvements in equity (fill-rate parity across SVI strata) and resilience (reduced shortage days) without an undue increase in overall logistics cost (Table 3). Some additional overhead existed—especially associated with reallocation/transshipment actions—but that overhead was offset by reduced shortage persistence and fewer “emergency-like” responses later in the horizon. Put differently, the system spent a bit more effort earlier to avoid paying much more later.
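The size of the trade-off can be checked directly from the values reported in Table 5. The short calculation below only restates those reported numbers as relative changes; the dictionary keys and the helper `pct_change` are naming choices made here.

```python
# Values copied from Table 5.
table5 = {
    "rule_based": {"cost_musd": 12.4, "shortage_days": 18.2, "high_svi_fill": 71.4},
    "static_opt": {"cost_musd": 11.1, "shortage_days": 11.6, "high_svi_fill": 80.3},
    "rl_equity": {"cost_musd": 11.6, "shortage_days": 4.9, "high_svi_fill": 93.1},
}

def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

rl = table5["rl_equity"]
changes = {}
for name in ("rule_based", "static_opt"):
    base = table5[name]
    changes[name] = {
        "cost_pct": pct_change(rl["cost_musd"], base["cost_musd"]),
        "shortage_pct": pct_change(rl["shortage_days"], base["shortage_days"]),
        "high_svi_fill_pts": rl["high_svi_fill"] - base["high_svi_fill"],
    }
    c = changes[name]
    print(f"RL-equity vs {name}: cost {c['cost_pct']:+.1f}%, "
          f"shortage days {c['shortage_pct']:+.1f}%, "
          f"high-SVI fill rate {c['high_svi_fill_pts']:+.1f} pts")
```

Relative to static optimization, the proposed policy's reported cost is about 4.5% higher while shortage days are about 57.8% lower and the high-SVI fill rate is 12.8 percentage points higher. Relative to the rule-based baseline, reported cost is about 6.5% lower.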
This is an important point because it undermines a simplistic “equity vs. efficiency” framing. The pandemic optimization literature—especially in vaccine and PPE logistics—has shown how fragile systems generate hidden costs when plans fail: expediting, emergency procurement, ad hoc redistribution, and service breakdowns (Dey et al., 2024). Woolfork et al. (2024) similarly emphasize that inequity is not just an ethical failure; it can be a structural driver of poorer population outcomes, which then feeds back into system strain. Our findings are consistent with that idea: when equity is integrated into decision-making early, the system may actually become more efficient in the broader sense—because it reduces crisis amplification loops (Dey et al., 2024; Woolfork et al., 2024).
3.5 Interpreting why the RL + geospatial approach worked
At this point, the question becomes: why did the proposed method perform better, beyond the fact that “RL is adaptive”? The answer is not just RL. It is the combination of (i) probabilistic forecasting, (ii) geospatial/equity weighting, and (iii) sequential policy learning.
First, probabilistic forecasts mattered because uncertainty was heterogeneously distributed (Figure 1; Table 4). If the model had produced point forecasts only, the allocation layer would have had less warning about surge risk in high-SVI areas. Second, geospatial equity modeling mattered because “need” is not purely epidemiological or purely inventory-based; it is mediated by access burdens. That is precisely what empirical work on vaccination access and uptake demonstrates—travel time and vulnerability alter realized coverage, not just theoretical supply (Khazanchi et al., 2024; Alphonso et al., 2024). Third, RL mattered because it converted these signals into adaptive rules—policies that change as the state changes, rather than plans that assume the future behaves like the past (Wu et al., 2025).
Jayaraman et al. (2024) caution that RL in healthcare requires careful implementation and evaluation due to high-stakes outcomes; our results support the view that when RL is constrained, policy-based, and embedded in realistic logistics constraints, it can produce credible operational improvements rather than unstable “black box” behavior (Jayaraman et al., 2024) (Table 5).
3.6 Practical and policy implications
From a policy standpoint, the findings point to a shift in how preparedness tools should be designed. The experience of PPE allocation during COVID-19 already suggested that allocation is a computational problem under policy constraints, not merely procurement (Hu et al., 2023). Our results extend that insight by showing that equity can be operationalized—explicitly—inside the computational problem.
This matters because inequity is not a side effect that disappears once total supply increases. Rubashkin et al. (2023) showed that PPE needs and shortages persisted unevenly across geographies and facility types, sometimes even after peak periods. Woolfork et al. (2024) similarly demonstrate that disparities persisted across time and place during the vaccination effort. These patterns imply that “more inventory” is not enough; the distribution logic must change too (Rubashkin et al., 2023; Woolfork et al., 2024).
In practical terms, the closed-loop architecture (Table 5) offers a blueprint for public health agencies: integrate demand forecasting with allocation; incorporate vulnerability and access measures as decision inputs; and update allocations repeatedly as conditions change. This is consistent with the broader movement in pandemic logistics toward decision systems that are policy-centric and adaptive rather than tied to a fixed-horizon plan (Dey et al., 2024; Wu et al., 2025).
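The three steps in that blueprint can be sketched as a simple loop. This is a deliberately simplified illustration: exponential smoothing stands in for the probabilistic forecaster, and a proportional, SVI-weighted heuristic stands in for the learned RL policy. The function `run_closed_loop`, its parameters, and the example numbers are all hypothetical.

```python
def run_closed_loop(demand_by_day, svi, supply_per_day, alpha=0.5, equity_weight=0.5):
    """Forecast, allocate, observe, update; repeated once per day.

    demand_by_day: list of per-region demand lists (all values assumed positive).
    svi: per-region vulnerability scores in [0, 1].
    """
    forecast = list(demand_by_day[0])  # seed forecasts with the first observation
    allocations, fill_rates = [], []
    for observed in demand_by_day[1:]:
        # 1. Allocate: split supply in proportion to vulnerability-weighted forecasts.
        scores = [f * (1.0 + equity_weight * s) for f, s in zip(forecast, svi)]
        total = sum(scores)
        allocation = [supply_per_day * sc / total for sc in scores]
        allocations.append(allocation)
        # 2. Observe: record the realized fill rate in each region.
        fill_rates.append([min(a, d) / d for a, d in zip(allocation, observed)])
        # 3. Update: fold the new observation into the forecast.
        forecast = [alpha * d + (1 - alpha) * f for d, f in zip(observed, forecast)]
    return allocations, fill_rates

# Two regions; demand in the high-SVI region (index 1) climbs over three days.
demand_by_day = [[100.0, 100.0], [100.0, 140.0], [100.0, 180.0]]
svi = [0.2, 0.8]
allocations, fill_rates = run_closed_loop(demand_by_day, svi, supply_per_day=220.0)
for day, (alloc, fill) in enumerate(zip(allocations, fill_rates), start=1):
    print(day, [round(a, 1) for a in alloc], [round(f, 3) for f in fill])
```

Because the forecast is refreshed every day, the share sent to the region with rising demand grows from one day to the next. A one-time plan built from the first day's data would not make that adjustment.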
3.7 Limitations and future work
A fair reading of these results also requires acknowledging boundaries. First, the analysis is regional. That is valuable for national preparedness, but it can hide within-region inequities—differences between facilities, neighborhoods, or subpopulations. Second, while SVI and accessibility metrics are powerful, they are still proxies. There are equity dimensions that are harder to quantify—trust, language barriers, documentation concerns, and local infrastructure reliability—that may influence realized access. Third, RL in high-stakes settings raises governance and accountability questions. Jayaraman et al. (2024) emphasize the importance of careful evaluation and deployment; translating a policy model into real public health workflows would require transparent objectives, auditable constraints, and stakeholder oversight (Jayaraman et al., 2024).
Future work could extend the framework in three directions. One is facility-level or even sub-county modeling, which would align with the “last-mile equity” emphasis in geospatial work (Alphonso et al., 2024; Khazanchi et al., 2024). A second direction is multi-resource allocation (PPE, drugs, vaccines simultaneously), which introduces coupling and substitution complexities. A third direction is deeper integration of capacity acquisition uncertainty—especially relevant when storage, routing, or third-party capacity can be acquired under disruption (Kiss & Elhedhli, 2024).
3.8 Summary of the evidence
Pulling the threads together: the results show that (1) demand uncertainty is spatially structured and correlated with vulnerability (Figure 1; Table 4), (2) baseline allocation policies reproduce vulnerability gradients in service levels (Figure 2; Table 6), (3) equity-aware RL policies narrow those gaps while maintaining strong overall fill rates (Figure 2; Table 6), and (4) the same policies improve resilience by limiting cumulative shortage days over time (Figure 3; Table 3). Taken together, the findings support the central claim that equitable preparedness is not simply a matter of higher stockpiles—it is a matter of learning, adapting, and explicitly treating equity as a decision objective rather than a post-hoc report (Hu et al., 2023; Wu et al., 2025; Woolfork et al., 2024).
Table 3. Performance Metrics
| Category | Metric | Definition |
| --- | --- | --- |
| Efficiency | Fill rate | Proportion of demand satisfied |
| Efficiency | Cost | Transportation and holding cost |
| Resilience | Shortage days | Days with unmet demand |
| Equity | SVI-parity index | Service level variance across SVI tiers |
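The definitions in Table 3 translate into short functions. The sketch below is one possible reading of those definitions, not the study's evaluation code: it assumes the fill rate is aggregated over the whole horizon, that fulfillment above demand earns no credit, and that the SVI-parity index is a population variance (Table 3 does not say whether population or sample variance is used). The tier-level service levels in the example are invented.

```python
from statistics import pvariance

def fill_rate(demand, fulfilled):
    """Proportion of total demand satisfied; fulfillment above demand is capped."""
    satisfied = sum(min(f, d) for d, f in zip(demand, fulfilled))
    return satisfied / sum(demand)

def shortage_days(demand, fulfilled):
    """Number of days with unmet demand."""
    return sum(1 for d, f in zip(demand, fulfilled) if f < d)

def svi_parity_index(service_level_by_tier):
    """Variance of service levels across SVI tiers (population variance assumed)."""
    return pvariance(list(service_level_by_tier.values()))

# Hypothetical example values.
example_fill = fill_rate([100, 100], [100, 50])
example_shortage = shortage_days([100, 100], [100, 50])
example_parity = svi_parity_index({"low": 0.95, "medium": 0.90, "high": 0.85})
print(example_fill, example_shortage, round(example_parity, 6))
```

A parity index of zero would mean every SVI tier receives the same service level; larger values indicate a wider spread between tiers.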
Table 4. Baseline descriptive statistics for critical resource demand, supply chain timing, and equity-related variables. Results demonstrate significant demand heterogeneity and access inequality, supporting probabilistic forecasting and equity-aware allocation approaches (Dautel et al., 2024; Khazanchi et al., 2024).
| Variable | Mean | Std. Dev. | Min | Max |
| --- | --- | --- | --- | --- |
| Daily PPE demand (units) | 1,420 | 615 | 180 | 4,950 |
| Vaccine demand (doses/day) | 860 | 402 | 95 | 3,210 |
| Drug replenishment demand (units/day) | 1,105 | 530 | 140 | 3,880 |
| Average lead time (days) | 6.8 | 2.1 | 2 | 14 |
| County SVI score | 0.52 | 0.21 | 0.08 | 0.96 |
| Avg. travel time to facility (minutes) | 27.4 | 11.9 | 6.3 | 68.7 |
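A quick way to compare the heterogeneity of these variables on a common scale is the coefficient of variation (standard deviation divided by mean). The calculation below uses only the means and standard deviations reported in Table 4; the coefficient of variation itself is an added summary, not a statistic reported in the table.

```python
# (mean, standard deviation) pairs copied from Table 4.
table4 = {
    "Daily PPE demand (units)": (1420.0, 615.0),
    "Vaccine demand (doses/day)": (860.0, 402.0),
    "Drug replenishment demand (units/day)": (1105.0, 530.0),
    "Average lead time (days)": (6.8, 2.1),
    "County SVI score": (0.52, 0.21),
    "Avg. travel time to facility (minutes)": (27.4, 11.9),
}

# Coefficient of variation: dispersion relative to the mean (unitless).
cv = {name: std / mean for name, (mean, std) in table4.items()}

for name, value in cv.items():
    print(f"{name:<40} CV = {value:.2f}")
```

The three demand series have coefficients of variation between roughly 0.43 and 0.48, so their standard deviations are a little under half of their means. Lead time is somewhat less dispersed at about 0.31.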
Table 5. Comparative Performance: Baseline vs. Proposed Framework
| Metric | Rule-Based | Static Optimization | Proposed RL-Equity |
| --- | --- | --- | --- |
| Avg. fill rate (%) | 82.6 | 88.9 | 95.4 |
| Shortage days (avg.) | 18.2 | 11.6 | 4.9 |
| High-SVI fill rate (%) | 71.4 | 80.3 | 93.1 |
| Equity gap (fill-rate variance) | 0.142 | 0.081 | 0.019 |
| Avg. logistics cost ($M) | 12.4 | 11.1 | 11.6 |
Table 6. Scenario Stress-Test Results
| Scenario | Shortage Reduction (%) | Equity Gap Reduction (%) |
| --- | --- | --- |
| Baseline demand | 63.1 | 78.4 |
| Pandemic surge (+40%) | 54.7 | 69.2 |
| Transport disruption | 48.9 | 61.5 |
And perhaps the more uncomfortable takeaway is this: if forecast uncertainty and access barriers concentrate in vulnerable regions, then “neutral” allocation rules are not neutral. They will reliably miss the places where the system is least stable. The proposed framework—probabilistic forecasting, geospatial equity weighting, and RL-based sequential control—offers one practical path toward breaking that pattern.