From Scarcity to Fairness: A Closed-Loop Reinforcement Learning and Geospatial Analytics Framework for Equitable Healthcare Supply Chain Preparedness

Adib Hossain; Fahad Ahmed; Khandaker Ataur Rahman; Shaid Hasan

doi:10.25163/business.3110666

Business and Social Sciences

Business and social sciences | Online ISSN 3067-8919

Volume 3 Number 1 2026

RESEARCH ARTICLE (Open Access)

Adib Hossain ¹*, Fahad Ahmed ², Khandaker Ataur Rahman ³, Shaid Hasan ³

Business and Social Sciences 3 (1) 1-11 https://doi.org/10.25163/business.3110666

Submitted: 04 December 2025 Revised: 03 February 2026 Published: 08 February 2026

Abstract

Healthcare supply shortages during crises are often framed as failures of production or procurement. Yet recent public health emergencies suggest something more complicated—resources may exist nationally while still failing to reach the populations that need them most. This study proposes an integrated predictive–prescriptive framework designed to operationalize equity in healthcare resource distribution rather than treating it as a retrospective performance metric. The framework combines probabilistic spatiotemporal demand forecasting, geospatial accessibility modeling, and reinforcement learning–based sequential allocation within a closed-loop decision architecture. Using multi-source operational, epidemiological, and social vulnerability data, the model learns adaptive allocation policies that respond to evolving demand uncertainty and access constraints across regions. Results suggest that socially vulnerable regions exhibit both higher demand variability and greater forecast uncertainty, reinforcing the need for probabilistic and equity-aware allocation strategies. Compared with baseline rule-based and static optimization approaches, the proposed framework improves service levels in high-vulnerability regions, reduces cumulative shortage days, and maintains strong overall system performance. Importantly, the findings challenge the traditional assumption that equity necessarily reduces operational efficiency. Instead, early integration of equity signals appears to enhance system resilience and reduce downstream crisis-response costs. Collectively, the study demonstrates that equitable preparedness is not solely a policy aspiration but can be translated into a measurable, learnable, and implementable operational objective.

Keywords: Healthcare supply chain resilience, Health equity analytics, Reinforcement learning in healthcare operations, Geospatial accessibility modeling, Pandemic logistics optimization.

1. Introduction

Healthcare supply shortages are often discussed as if they were purely about “not having enough.” But if we pause for a moment and look more closely—especially through the lens of the COVID-19 pandemic—the picture becomes more complicated. Shortages are not simply about total supply; they are deeply tied to where resources are, when they arrive, and who can realistically access them. In many ways, the U.S. healthcare system experienced not just scarcity, but unevenness. During COVID-19, personal protective equipment (PPE) demand did not rise and fall uniformly. Instead, it shifted across regions, facility types, and time periods, leaving some hospitals and non-acute facilities struggling even after national peak periods had passed (Rubashkin et al., 2023). That unevenness forced public health systems to rethink allocation as something closer to a real-time decision science problem than a simple procurement exercise. For example, algorithm-driven allocation tools used in places like King County demonstrated how resource distribution had to continuously adapt to changing constraints and requests in dynamic environments (Hu et al., 2023).

At the same time, inequity in healthcare preparedness extends beyond physical supply chains into access barriers. Vaccination studies during the pandemic showed that spatial access—distance to sites, transportation burdens, and regional infrastructure—strongly influenced uptake, particularly in socially vulnerable communities. Travel burden and structural access barriers were associated with lower vaccination rates in high-vulnerability counties, illustrating that availability does not automatically translate into accessibility (Khazanchi et al., 2024). Similarly, geospatial analyses revealed clusters of low vaccination coverage that overlapped with higher poverty, uninsurance, and overall vulnerability, reinforcing how social determinants shape real-world outcomes (Alphonso et al., 2024). National-level health equity analyses further confirmed that disparities persisted across both rural and urban populations throughout the public health emergency, suggesting systemic structural drivers rather than temporary anomalies (Woolfork et al., 2024).

Despite significant advances in pandemic supply chain optimization—covering inventory planning, routing, and allocation—existing analytics frameworks still struggle to operationalize equity directly. Much of the literature treats forecasting and allocation as separate steps, even though crisis environments require continuous feedback loops where forecasts and decisions constantly inform each other (Dey et al., 2024; Hu et al., 2023). In addition, equity indicators such as vulnerability indices are often used retrospectively for reporting rather than prospectively for guiding allocation decisions. In practice, this means systems can optimize efficiency while unintentionally reinforcing inequities already embedded in infrastructure and access patterns (Alphonso et al., 2024; Khazanchi et al., 2024).

Emerging evidence from resource sharing and transshipment research offers an important hint about what more adaptive systems might achieve. Dynamic resource sharing strategies have demonstrated the ability to reduce equipment requirements and operational costs while remaining feasible for real-time decision environments, suggesting that resilience and efficiency can improve simultaneously when resources are continuously rebalanced across regions (Keyvanshokooh et al., 2024). This observation points toward a broader shift—from static planning toward adaptive policy learning.

In this context, reinforcement learning (RL) combined with geospatial analytics represents a particularly promising methodological direction. Healthcare logistics during crises is inherently sequential: decisions made today shape shortages, utilization, and accessibility tomorrow. RL is specifically designed to learn optimal policies under uncertainty and feedback, and its applications in healthcare operations are rapidly expanding (Wu et al., 2025). At the same time, equitable distribution must be geographically defined and measured, requiring integration of travel time, facility distribution, and spatial vulnerability patterns (Khazanchi et al., 2024; Alphonso et al., 2024). When combined, probabilistic forecasting, geospatial accessibility modeling, and equity-aware reinforcement learning create an opportunity to move equity from a retrospective metric into a real-time decision objective.

Against this backdrop, this study proposes an integrated predictive-prescriptive framework for equitable healthcare resource distribution in the United States. By linking probabilistic demand forecasting, geospatial accessibility constraints, and reinforcement learning–based allocation, the framework aims to create adaptive, implementable policies that simultaneously support preparedness and reduce regional disparities. In doing so, the work contributes to an evolving shift in healthcare supply chain science—from optimizing average efficiency toward optimizing equitable system performance under uncertainty.

2. Methods

Designing an analytical framework for equitable healthcare resource allocation is, in practice, less about choosing a single model and more about deciding how multiple imperfect pieces can be made to work together. In this study, we adopted what might best be described as a predictive–prescriptive analytics architecture—one that does not stop at forecasting demand but continues forward into allocation decisions and policy adaptation. This choice reflects a growing recognition in healthcare logistics that prediction alone is insufficient during crisis conditions unless it directly informs operational action (Dautel et al., 2024; Price et al., 2024). Building on prior pandemic supply chain work, particularly real-world allocation implementations and optimization modeling studies, we structured the framework as a closed-loop decision system linking forecasting, allocation, and learning over time (Hu et al., 2023; Dey et al., 2024).

The overall analytical pipeline consisted of four tightly coupled modules: data integration and preprocessing, spatiotemporal demand forecasting, equity-aware allocation using reinforcement learning (RL), and performance evaluation using efficiency and equity metrics. The closed-loop structure was intentional. Healthcare supply chains, especially during public health emergencies, behave less like static networks and more like evolving systems where each decision alters future demand, availability, and accessibility conditions (Wu et al., 2025; Jayaraman et al., 2024).

Data Sources and Variable Construction

Multiple heterogeneous data streams were integrated to capture operational, epidemiological, geographic, and social dimensions of demand. Historical healthcare supply chain data included stock levels, consumption rates, replenishment lead times, and interfacility transfers for critical resources such as PPE, drugs, and vaccines. These operational signals were essential for capturing real-world allocation constraints and behaviors observed during the COVID-19 response (Hu et al., 2023).

Epidemiological data included case incidence, hospitalization rates, and outbreak intensity indicators. These variables served as leading indicators of resource demand, consistent with prior work demonstrating the predictive value of epidemic-aware demand models (Dautel et al., 2024). Geospatial datasets included facility locations, catchment boundaries, road network travel times, and accessibility matrices, following simulation–optimization frameworks used in vaccine location and allocation modeling (Yin et al., 2024). Finally, equity indicators were operationalized using Social Vulnerability Index (SVI) components and demographic measures associated with access disparities and vaccination uptake differences across communities (Alphonso et al., 2024; Woolfork et al., 2024). A detailed summary of data sources, key variables, and their operational roles in the pipeline is provided in Table 1.

Table 1. Summary of integrated data sources, spatial granularity, and functional roles within the predictive–prescriptive allocation framework.

Data Category	Key Variables	Spatial Resolution	Analytical Role in Framework
Supply and Inventory	Stock levels, resource utilization rates, replenishment lead times	Hospital, Regional distribution node	Defines system state variables for allocation and inventory decision modeling
Epidemiological	Case incidence rates, hospitalization counts, outbreak intensity indicators	County, State	Serves as leading indicators for spatiotemporal demand forecasting
Geospatial	Travel time matrices, facility density, catchment boundaries, transportation network accessibility	County, Regional service area	Defines accessibility constraints and service coverage limitations
Social Vulnerability	Social Vulnerability Index (SVI) themes, median income, insurance coverage rates	County	Generates equity weighting factors for vulnerability-adjusted demand estimation

Spatiotemporal Demand Forecasting

Demand was modeled using probabilistic time-series forecasting rather than deterministic point estimation. This decision was based on evidence that deterministic forecasts often underrepresent uncertainty during crisis volatility, potentially leading to systematic underallocation in high-risk regions (Dautel et al., 2024). For each location i and time t, the forecast came from a hybrid model that combined autoregressive temporal patterns, epidemiological covariates such as infection growth rates, and lagged utilization signals.

This hybrid structure allowed the model to capture both baseline demand behavior and sudden surge dynamics. Prior field implementations of PPE forecasting in rural response settings have shown that integrating epidemiological signals with operational usage improves forecast reliability and replenishment timing (Price et al., 2024).
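To make the hybrid structure concrete, the following is a minimal sketch rather than the study's actual model, which the text does not specify. It assumes a linear one-step-ahead regression on lagged demand and one lagged epidemiological covariate, with a 90% prediction interval taken from the empirical quantiles of the in-sample residuals. The function names, the synthetic data, and the single-covariate design are all illustrative assumptions.

```python
import numpy as np

def fit_hybrid_forecaster(demand, covariate, lag=1):
    """Least-squares fit of demand[t] on demand[t-lag] and covariate[t-lag]."""
    y = demand[lag:]
    X = np.column_stack([
        np.ones(len(y)),    # intercept (baseline demand)
        demand[:-lag],      # autoregressive / lagged utilization term
        covariate[:-lag],   # lagged epidemiological covariate (assumed)
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    return coef, residuals

def forecast_with_interval(coef, residuals, last_demand, last_covariate, level=0.90):
    """One-step-ahead point forecast plus an empirical prediction interval."""
    point = coef[0] + coef[1] * last_demand + coef[2] * last_covariate
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(residuals, [alpha, 1.0 - alpha])
    return point, point + lo, point + hi

# Synthetic series for illustration only (not study data)
rng = np.random.default_rng(0)
T = 120
growth = rng.normal(0.0, 1.0, T)          # stand-in for infection growth rate
demand = np.empty(T)
demand[0] = 100.0
for t in range(1, T):
    demand[t] = 20.0 + 0.8 * demand[t - 1] + 5.0 * growth[t - 1] + rng.normal(0.0, 3.0)

coef, resid = fit_hybrid_forecaster(demand, growth)
point, lower, upper = forecast_with_interval(coef, resid, demand[-1], growth[-1])
print(f"forecast={point:.1f}, 90% interval=({lower:.1f}, {upper:.1f})")
```

Fitting one such model per region would let the interval width differ by location, which is the property the allocation stage relies on.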

Geospatial Accessibility and Equity Weighting

To translate equity from a reporting metric into a decision input, baseline demand forecasts were adjusted using accessibility and vulnerability weighting. Accessibility was estimated using travel-time–based catchment modeling, while vulnerability weights were derived from normalized SVI scores. In practical terms, effective demand was scaled upward in high-vulnerability regions to reflect documented disparities in access and protection. The construction of vulnerability weights (including normalization and stratification) is summarized in Table 2.

Table 2. Equity Weighting Scheme

SVI Tier	Description	Weight (w_i)
Low	Low vulnerability	1.0
Medium	Moderate vulnerability	1.2
High	High vulnerability	1.5

This approach aligns with empirical evidence showing that spatial accessibility and vulnerability are strongly associated with vaccination uptake and healthcare access outcomes, particularly in socially disadvantaged regions (Khazanchi et al., 2024; Alphonso et al., 2024). Conceptually, this step reframes equity from “who received less historically” to “who should receive more proactively under uncertainty.”
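The weighting step can be sketched as follows. The tier weights come from Table 2. The cut points that map a normalized SVI score to a tier are an assumption here (equal thirds), since the text does not state them, and the accessibility adjustment is omitted because no formula is given for it.

```python
# Tier weights as listed in Table 2
TIER_WEIGHTS = {"low": 1.0, "medium": 1.2, "high": 1.5}

def svi_tier(svi_score, cut_low=1 / 3, cut_high=2 / 3):
    """Map a normalized SVI score in [0, 1] to a tier.

    The cut points are hypothetical; the source does not specify them.
    """
    if not 0.0 <= svi_score <= 1.0:
        raise ValueError("svi_score must be normalized to [0, 1]")
    if svi_score < cut_low:
        return "low"
    if svi_score < cut_high:
        return "medium"
    return "high"

def effective_demand(forecast, svi_score):
    """Scale a baseline demand forecast by the vulnerability weight w_i."""
    return TIER_WEIGHTS[svi_tier(svi_score)] * forecast

# Three hypothetical regions with the same baseline forecast
for score in (0.2, 0.5, 0.9):
    print(score, svi_tier(score), effective_demand(1000.0, score))
```

With identical baseline forecasts, the high-vulnerability region enters the allocation stage with 1.5 times the effective demand of the low-vulnerability region.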

Reinforcement Learning–Based Allocation and Inventory Control

The allocation problem was formulated as a constrained Markov Decision Process (MDP) in which the system state included inventory levels, forecasted demand, replenishment lead times, and accessibility constraints. Actions represented allocation decisions and inter-regional transfers. The objective function minimized total operational cost while penalizing inequitable service outcomes, with a tunable parameter controlling the trade-off between the two.
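One way to write this objective down is given below. The notation is an assumption introduced for illustration and does not appear in the source: s_t is the system state, a_{i,t} is the quantity allocated to region i at time t, C is the operational cost, D is an inequity penalty (for example, the variance of service levels across SVI tiers), \lambda is the tunable trade-off parameter, \gamma is a discount factor, and I_t is the inventory available at time t. Inter-regional transfers are left out for brevity.

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t}
  \Big( -\,C(s_t, a_t) \;-\; \lambda\, D(s_t, a_t) \Big) \right]
\quad \text{subject to} \quad
\sum_{i} a_{i,t} \le I_t, \qquad a_{i,t} \ge 0 \;\; \text{for all } i, t
```

Setting \lambda = 0 recovers a pure cost-minimizing policy, while larger values place more weight on service-level parity.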

Reinforcement learning was implemented using policy-based optimization, chosen for its ability to learn adaptive allocation strategies under nonstationary demand and delayed feedback conditions. This is particularly relevant in healthcare logistics, where today’s allocation decisions influence tomorrow’s shortages and access outcomes (Wu et al., 2025). At the same time, implementation followed conservative design principles recommended for high-risk healthcare RL deployment, including constrained action spaces and policy stability evaluation (Jayaraman et al., 2024).

The RL framework also drew conceptual inspiration from resource-sharing research demonstrating that dynamic redistribution policies can significantly reduce system-wide shortages and equipment requirements compared with static allocation strategies (Keyvanshokooh et al., 2024).

Evaluation Metrics and Validation

System performance was evaluated across three domains: efficiency (fill rates, logistics cost), resilience (cumulative shortage days, recovery speed), and equity (service-level parity across SVI strata). Comparisons were conducted against baseline rule-based allocation and static optimization benchmarks.

Robustness testing used scenario-based stress testing, including simulated demand surges and transportation disruptions. Scenario-based validation has been widely recommended for healthcare logistics evaluation because real-world crisis conditions rarely follow stationary or predictable patterns (Dey et al., 2024; Keyvanshokooh et al., 2024).
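The three metric families can be computed along the following lines. This is a sketch under assumed conventions: `served` and `demand` are arrays with one row per region and one column per day, each region belongs to exactly one SVI tier, and the parity index is taken as the variance of tier-level fill rates. The function names and the toy data are hypothetical.

```python
import numpy as np

def fill_rate(served, demand):
    """Proportion of total demand that was satisfied."""
    return float(served.sum() / demand.sum())

def shortage_days(served, demand, tol=1e-9):
    """Number of (region, day) cells with unmet demand."""
    return int(np.sum(demand - served > tol))

def svi_parity_index(served, demand, tiers):
    """Variance of tier-level fill rates; 0 means equal service across tiers."""
    tiers = np.asarray(tiers)
    rates = [
        served[tiers == tier].sum() / demand[tiers == tier].sum()
        for tier in np.unique(tiers)
    ]
    return float(np.var(rates))

# Hypothetical data: 3 regions (rows) over 4 days (columns)
demand = np.full((3, 4), 100.0)
served = np.array([[100.0] * 4, [90.0] * 4, [70.0] * 4])
tiers = ["low", "medium", "high"]

overall_fill = fill_rate(served, demand)
days_short = shortage_days(served, demand)
parity = svi_parity_index(served, demand, tiers)
print(overall_fill, days_short, parity)
```

In this toy case the overall fill rate looks acceptable while the parity index is nonzero, which is why the equity domain is reported separately from efficiency.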

Closed-Loop Decision Learning

Perhaps the most important methodological feature is the continuous feedback structure. Forecast outputs update allocation decisions, allocation outcomes update system states, and updated states inform subsequent forecasting and policy learning cycles. This architecture reflects real-world public health decision environments, where allocation is not a one-time optimization but an ongoing adaptive process constrained by logistics, policy, and uncertainty (Hu et al., 2023; Wu et al., 2025).
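The feedback cycle described above can be illustrated with a deliberately simplified loop. This sketch substitutes exponential smoothing for the probabilistic forecaster and a proportional, equity-weighted rule for the learned RL policy, so it shows only the flow of information and not the study's method. The function name, the backlog mechanism, and the numbers are assumptions.

```python
import numpy as np

def run_closed_loop(demand, weights, supply, smoothing=0.5):
    """Repeat forecast -> allocate -> observe -> update for each period.

    demand:  (periods, regions) realized demand, assumed strictly positive
    weights: per-region equity weights
    supply:  total units available in every period (assumed constant)
    Returns the unmet demand per period and region.
    """
    demand = np.asarray(demand, dtype=float)
    weights = np.asarray(weights, dtype=float)
    forecast = demand[0].copy()               # initial belief (assumed known)
    backlog = np.zeros(demand.shape[1])       # unmet demand carried forward
    history = []
    for observed in demand:
        # Forecast and backlog drive the allocation decision
        need = weights * (forecast + backlog)
        allocation = supply * need / need.sum()
        # The allocation outcome becomes part of the next state
        unmet = np.maximum(observed + backlog - allocation, 0.0)
        history.append(unmet)
        backlog = unmet
        # Observed demand updates the forecast for the next cycle
        forecast = smoothing * observed + (1.0 - smoothing) * forecast
    return np.array(history)

# Two hypothetical regions, three periods, fixed supply
demand = [[100.0, 100.0], [100.0, 100.0], [100.0, 100.0]]
history = run_closed_loop(demand, weights=[1.0, 1.5], supply=200.0)
print(history)
```

Each pass through the loop uses the previous period's outcome as input, which is the property that distinguishes a closed-loop design from a one-time plan.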

In summary, the methodological design intentionally integrates forecasting, equity measurement, and adaptive allocation into a single operational learning system. The underlying premise—admittedly ambitious, but increasingly supported by emerging literature—is that equitable preparedness cannot be achieved through static planning alone, but instead requires systems capable of learning from evolving demand, vulnerability, and access conditions over time.

3. Results and Discussion

3.1 Demand forecasting performance and vulnerability effects

The first set of results, and the ones that set the tone for everything else, came from the forecasting module. When we broke observed and predicted PPE demand into low-, medium-, and high-SVI strata, the pattern was not subtle. High-vulnerability areas did not simply have “more demand.” They had more volatile demand, and the uncertainty around that demand behaved differently too. In the forecasting plots (Figure 1), mean demand levels rose with vulnerability, but so did variance. The high-SVI series looked less like a smooth curve and more like a sequence of abrupt shifts: spikes, pullbacks, then another rise. That kind of shape matters operationally because it is precisely what breaks fixed replenishment schedules and static allocation formulas.

The uncertainty bands tell an even more consequential story. The 90% prediction intervals in Figure 1 widen as SVI increases, meaning the system is less confident about what will happen next in more socially vulnerable regions. At first, it is tempting to interpret that as “the model is worse in those areas.” But that framing misses the point. The more plausible interpretation is that the underlying demand process is genuinely more unstable in high-vulnerability settings—because of access frictions, facility constraints, mobility barriers, and episodic surges in need. That is consistent with the broader equity literature showing that vulnerable communities often experience less consistent access and greater disruption during emergencies (Alphonso et al., 2024; Khazanchi et al., 2024).

Figure 1. Observed versus predicted PPE demand across low- and high-SVI regions. Solid lines indicate observed demand; dashed lines indicate model forecasts. Shaded regions represent 90% prediction intervals, highlighting greater demand variability and uncertainty in high-SVI regions.

Figure 2. Comparison of average fill rates across SVI strata under baseline and equity-aware RL allocation models. The proposed model improves service levels in high-SVI regions while maintaining overall system performance.

This connects to a methodological argument that is easy to say but hard to truly operationalize: deterministic demand estimates can become actively misleading during crisis environments, particularly if uncertainty is not spatially homogeneous. Dautel et al. (2024) emphasize the reliability challenge of medical resource demand models in epidemic contexts; our results echo that caution, but in a very practical way. If the forecast uncertainty is systematically higher in high-SVI regions, then any allocation policy that uses a single global safety-stock logic—or assumes the same forecast error behavior everywhere—will predictably under-allocate to those high-SVI regions. In other words, this is not just a forecasting accuracy issue. It is an equity issue hiding inside model error structure (Figure 1; Table 4).
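A small worked example shows the mechanism. It assumes normally distributed forecast errors and the textbook safety-stock rule (buffer = z × error standard deviation), neither of which is stated in the source. The error standard deviations are hypothetical and chosen only to widen with vulnerability, and the "global" value is taken as their simple average.

```python
from statistics import NormalDist

def safety_stock(sigma, service_level=0.95):
    """Buffer stock for a target service level, assuming normal forecast errors."""
    z = NormalDist().inv_cdf(service_level)
    return z * sigma

# Hypothetical forecast-error standard deviations by SVI tier (units per week)
sigma_by_tier = {"low": 40.0, "medium": 60.0, "high": 110.0}

# A single global error assumption: here, the simple average across tiers
global_sigma = sum(sigma_by_tier.values()) / len(sigma_by_tier)
pooled_buffer = safety_stock(global_sigma)

for tier, sigma in sigma_by_tier.items():
    tier_buffer = safety_stock(sigma)
    gap = pooled_buffer - tier_buffer   # negative means the global rule under-buffers
    print(f"{tier:>6}: tier-specific={tier_buffer:.0f}, global={pooled_buffer:.0f}, gap={gap:.0f}")
```

Under these assumptions the global rule over-buffers the low tier and under-buffers the high tier, even though it applies the same formula everywhere.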

There is also a strong implementation implication here. Price et al. (2024) describe how forecasting was embedded into real replenishment activities in West Virginia under dynamic conditions; our findings reinforce why that embedding must remain probabilistic and iterative. If demand can change quickly and unpredictably—especially in vulnerable regions—then forecasting cannot be a one-time planning input. It has to be a living signal that continuously feeds allocation (Price et al., 2024). That logic, in a sense, is the foundation for why a closed-loop architecture is not “nice to have” but necessary (Table 5).

3.2 Equity outcomes of allocation policies

Once the forecasting behavior was clear, the allocation results became easier to interpret—and also harder to ignore. When we compared fill rates across low-, medium-, and high-SVI regions under baseline policies versus the proposed equity-aware RL policy, the baseline pattern was almost painfully familiar: fill rates declined as vulnerability increased (Table 6; Figure 2, Figure 6). In practical terms, the system served low-SVI regions better, more consistently, and with fewer interruptions. High-SVI regions, by contrast, bore a larger share of partial fulfillment, delayed replenishment, and shortage exposure.

This is not just an abstract computational finding. It mirrors what the pandemic revealed about PPE distribution inequities: shortages persisted unevenly across regions and facility types, not solely because the nation “did not have enough,” but because allocation and distribution failed to adapt fairly under constraints (Rubashkin et al., 2023). It also aligns with broader vaccination equity findings—where high-vulnerability counties experience compounded barriers to access and uptake, including travel burden and structural frictions (Khazanchi et al., 2024; Woolfork et al., 2024).

The proposed RL-based policy shifted that gradient. Under the equity-aware model, fill rates in high-SVI regions improved substantially while maintaining strong service levels in low- and medium-SVI regions (Table 6; Figure 2). Importantly, the improvement did not come from simplistic equalization (e.g., “everyone gets the same percentage”). Instead, the model learned state-dependent decisions—allocations that responded to evolving inventory, forecasted demand, lead times, and vulnerability-weighted service objectives. That distinction matters. It suggests the equity gain is not a moral appeal layered on top of logistics, but a learned operational behavior emerging from the objective and feedback loop.

One useful way to think about this is to contrast equity-as-reporting with equity-as-control. In many empirical equity analyses, vulnerability measures are used post hoc: we look at who had worse outcomes and then describe the disparity (Alphonso et al., 2024; Khazanchi et al., 2024; Woolfork et al., 2024). The model here takes a different posture. It uses vulnerability as a decision signal. That is a meaningful shift—from “measure the gap” to “control the gap.” In doing so, the results extend the operational evidence in Hu et al. (2023), where algorithmic allocation was used in King County to translate requests and constraints into distribution actions. Our findings suggest that when equity weighting and sequential learning are added to that kind of constrained allocation setting, disparities can be reduced without breaking feasibility (Hu et al., 2023) (Table 5).

3.3 System resilience and shortage mitigation

Equity outcomes are important, but in emergency logistics there is always a second question—sometimes an anxious one: “Okay, but does it make the system fragile?” The resilience results help answer that.

Cumulative shortage days over time (Figure 3) show a clear divergence between baseline allocation and RL-based reallocation. Baseline policies—especially those that are static or rule-based—accumulate shortage days steadily. Once shortages begin, they tend to persist. That makes intuitive sense: a plan-centric policy can be “correct” for a short horizon and still fail when conditions shift. It does not adapt quickly enough to break shortage momentum.

Figure 3. Cumulative shortage days over time under baseline and RL-based allocation policies. RL-based reallocation reduces shortage accumulation and improves system responsiveness under dynamic demand conditions.

Figure 4. Observed and forecasted resource demand across SVI tiers. Solid lines indicate observed demand; dashed lines indicate forecasts; shaded bands show 90% prediction intervals, demonstrating higher volatility in high-SVI regions.

The RL-based policy, by contrast, slowed shortage accumulation and often bent the curve downward relative to baseline (Figure 3; Table 3). The most telling aspect is not merely that shortages were lower at the end of the horizon, but that the RL policy intervened earlier—rebalancing resources before shortages became entrenched. This is where the closed-loop architecture becomes visible in outcomes (Figure 5). Because the agent observes evolving system states and learns from feedback, it can take anticipatory actions rather than reactive ones (Table 5).

These resilience gains align closely with the broader resource-sharing and dynamic rebalancing literature. Keyvanshokooh et al. (2024) show that data-driven resource sharing can reduce equipment needs and costs compared with non-sharing approaches while remaining compatible with real-time decision constraints. Our results suggest a similar logic holds for PPE-like resources: dynamic reallocation reduces shortage persistence because it treats the network as a coupled system rather than as isolated regions (Keyvanshokooh et al., 2024). From an operations perspective, this is exactly the kind of “strategic flexibility” that plan-centric optimization often struggles to capture under nonstationary shocks (Dey et al., 2024; Kiss & Elhedhli, 2024).

It is also worth noting that this is one of the places where RL’s conceptual fit becomes practical. Wu et al. (2025) emphasize that healthcare operations often involve sequential decisions and feedback; shortages are a textbook example of delayed consequences. The results here reinforce that if you treat allocation as a sequence—rather than a single optimization snapshot—you can reduce shortage accumulation over time (Wu et al., 2025).

3.4 Efficiency–equity trade-offs

There is a familiar objection to equity-aware allocation: “Fine, but it will cost more.” We took that concern seriously because emergency logistics operates under hard constraints—transportation capacity, storage limits, cold chain requirements (for vaccines), lead times, and sometimes the ability to acquire capacity from third parties (Kiss & Elhedhli, 2024). If equity-aware policies produce large cost penalties, they may be politically or operationally unacceptable even if ethically appealing.

The comparative results suggest a more nuanced reality. We observed substantial improvements in equity (fill-rate parity across SVI strata) and resilience (reduced shortage days) without an undue increase in overall logistics cost (Table 3). Some additional overhead existed—especially associated with reallocation/transshipment actions—but that overhead was offset by reduced shortage persistence and fewer “emergency-like” responses later in the horizon. Put differently, the system spent a bit more effort earlier to avoid paying much more later.

This is an important point because it undermines a simplistic “equity vs. efficiency” framing. The pandemic optimization literature—especially in vaccine and PPE logistics—has shown how fragile systems generate hidden costs when plans fail: expediting, emergency procurement, ad hoc redistribution, and service breakdowns (Dey et al., 2024). Woolfork et al. (2024) similarly emphasize that inequity is not just an ethical failure; it can be a structural driver of poorer population outcomes, which then feeds back into system strain. Our findings are consistent with that idea: when equity is integrated into decision-making early, the system may actually become more efficient in the broader sense—because it reduces crisis amplification loops (Dey et al., 2024; Woolfork et al., 2024).

3.5 Interpreting why the RL + geospatial approach worked

At this point, the question becomes: why did the proposed method perform better, beyond the fact that “RL is adaptive”? The answer is not just RL. It is the combination of (i) probabilistic forecasting, (ii) geospatial/equity weighting, and (iii) sequential policy learning.

First, probabilistic forecasts mattered because uncertainty was heterogeneously distributed (Figure 1; Table 4). If the model had produced point forecasts only, the allocation layer would have had less warning about surge risk in high-SVI areas. Second, geospatial equity modeling mattered because “need” is not purely epidemiological or purely inventory-based; it is mediated by access burdens. That is precisely what empirical work on vaccination access and uptake demonstrates—travel time and vulnerability alter realized coverage, not just theoretical supply (Khazanchi et al., 2024; Alphonso et al., 2024). Third, RL mattered because it converted these signals into adaptive rules—policies that change as the state changes, rather than plans that assume the future behaves like the past (Wu et al., 2025).

Jayaraman et al. (2024) caution that RL in healthcare requires careful implementation and evaluation due to high-stakes outcomes; our results support the view that when RL is constrained, policy-based, and embedded in realistic logistics constraints, it can produce credible operational improvements rather than unstable “black box” behavior (Jayaraman et al., 2024) (Table 5).

3.6 Practical and policy implications

From a policy standpoint, the findings point to a shift in how preparedness tools should be designed. The experience of PPE allocation during COVID-19 already suggested that allocation is a computational problem under policy constraints, not merely procurement (Hu et al., 2023). Our results extend that insight by showing that equity can be operationalized—explicitly—inside the computational problem.

This matters because inequity is not a side effect that disappears once total supply increases. Rubashkin et al. (2023) showed that PPE needs and shortages persisted unevenly across geographies and facility types, sometimes even after peak periods. Woolfork et al. (2024) similarly demonstrate that disparities persisted across time and place during the vaccination effort. These patterns imply that “more inventory” is not enough; the distribution logic must change too (Rubashkin et al., 2023; Woolfork et al., 2024).

In practical terms, the closed-loop architecture (Table 5) offers a blueprint for public health agencies: integrate demand forecasting with allocation; incorporate vulnerability and access measures as decision inputs; and update allocations repeatedly as conditions change. This is consistent with the broader movement in pandemic logistics toward decision systems that are policy-centric and adaptive, rather than plan-centric over a fixed horizon (Dey et al., 2024; Wu et al., 2025).

3.7 Limitations and future work

A fair reading of these results also requires acknowledging boundaries. First, the analysis is regional. That is valuable for national preparedness, but it can hide within-region inequities—differences between facilities, neighborhoods, or subpopulations. Second, while SVI and accessibility metrics are powerful, they are still proxies. There are equity dimensions that are harder to quantify—trust, language barriers, documentation concerns, and local infrastructure reliability—that may influence realized access. Third, RL in high-stakes settings raises governance and accountability questions. Jayaraman et al. (2024) emphasize the importance of careful evaluation and deployment; translating a policy model into real public health workflows would require transparent objectives, auditable constraints, and stakeholder oversight (Jayaraman et al., 2024).

Future work could extend the framework in three directions. One is facility-level or even sub-county modeling, which would align with the “last-mile equity” emphasis in geospatial work (Alphonso et al., 2024; Khazanchi et al., 2024). A second direction is multi-resource allocation (PPE, drugs, vaccines simultaneously), which introduces coupling and substitution complexities. A third direction is deeper integration of capacity acquisition uncertainty—especially relevant when storage, routing, or third-party capacity can be acquired under disruption (Kiss & Elhedhli, 2024).

3.8 Summary of the evidence

Pulling the threads together: the results show that (1) demand uncertainty is spatially structured and correlated with vulnerability (Figure 1; Table 4), (2) baseline allocation policies reproduce vulnerability gradients in service levels (Figure 2; Table 6), (3) equity-aware RL policies narrow those gaps while maintaining strong overall fill rates (Figure 2; Table 6), and (4) the same policies improve resilience by limiting cumulative shortage days over time (Figure 3; Table 3). Taken together, the findings support the central claim that equitable preparedness is not simply a matter of higher stockpiles—it is a matter of learning, adapting, and explicitly treating equity as a decision objective rather than a post-hoc report (Hu et al., 2023; Wu et al., 2025; Woolfork et al., 2024).

Table 3. Performance Metrics

Category	Metric	Definition
Efficiency	Fill rate	Proportion of demand satisfied
Efficiency	Cost	Transportation and holding cost
Resilience	Shortage days	Days with unmet demand
Equity	SVI-parity index	Service level variance across SVI tiers
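The definitions in Table 3 translate directly into code. The Python sketch below is one possible reading of them, offered for illustration only: the function names are hypothetical, supply in excess of demand is assumed not to count toward the fill rate, and the SVI-parity index is computed as a population variance because the table does not say whether population or sample variance is intended.

```python
from statistics import pvariance


def fill_rate(demand, supplied):
    """Proportion of total demand satisfied over the period."""
    met = sum(min(d, s) for d, s in zip(demand, supplied))
    total = sum(demand)
    return met / total if total else 1.0


def shortage_days(demand, supplied):
    """Number of days on which supply fell short of demand."""
    return sum(1 for d, s in zip(demand, supplied) if s < d)


def svi_parity_index(service_by_tier):
    """Variance of service level across SVI tiers; 0 means identical service."""
    return pvariance(list(service_by_tier.values()))


# Illustrative inputs only; these numbers are not taken from the study.
demand = [100, 120, 80]
supplied = [100, 90, 80]
print(fill_rate(demand, supplied))
print(shortage_days(demand, supplied))
print(svi_parity_index({"low": 0.95, "mid": 0.90, "high": 0.85}))
```

A lower parity index indicates more even service across vulnerability tiers, which matches the direction of the equity gap row reported for the compared policies.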

Table 4. Baseline descriptive statistics for critical resource demand, supply chain timing, and equity-related variables. Results demonstrate significant demand heterogeneity and access inequality, supporting probabilistic forecasting and equity-aware allocation approaches (Dautel et al., 2024; Khazanchi et al., 2024).

Variable	Mean	Std. Dev.	Min	Max
Daily PPE demand (units)	1,420	615	180	4,950
Vaccine demand (doses/day)	860	402	95	3,210
Drug replenishment demand (units/day)	1,105	530	140	3,880
Average lead time (days)	6.8	2.1	2	14
County SVI score	0.52	0.21	0.08	0.96
Avg. travel time to facility (minutes)	27.4	11.9	6.3	68.7

Table 5. Comparative Performance: Baseline vs. Proposed Framework

Metric	Rule-Based	Static Optimization	Proposed RL-Equity
Avg. fill rate (%)	82.6	88.9	95.4
Shortage days (avg.)	18.2	11.6	4.9
High-SVI fill rate (%)	71.4	80.3	93.1
Equity gap (fill-rate variance)	0.142	0.081	0.019
Avg. logistics cost ($M)	12.4	11.1	11.6
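
The relative changes implied by Table 5 can be recomputed from its rows. The Python sketch below does this; the dictionary keys and the helper `relative_change_pct` are illustrative names rather than part of the study's code. The percentages it produces are simple ratios of the tabulated averages and need not coincide with the scenario-level reductions in Table 6, whose basis is not stated here.

```python
# Values copied from Table 5; key names are illustrative.
table5 = {
    "avg_fill_rate_pct":  {"rule_based": 82.6,  "static_opt": 88.9,  "rl_equity": 95.4},
    "shortage_days":      {"rule_based": 18.2,  "static_opt": 11.6,  "rl_equity": 4.9},
    "high_svi_fill_pct":  {"rule_based": 71.4,  "static_opt": 80.3,  "rl_equity": 93.1},
    "equity_gap":         {"rule_based": 0.142, "static_opt": 0.081, "rl_equity": 0.019},
    "logistics_cost_usd_m": {"rule_based": 12.4, "static_opt": 11.1, "rl_equity": 11.6},
}


def relative_change_pct(baseline, proposed):
    """Signed percentage change of the proposed value relative to a baseline."""
    return 100.0 * (proposed - baseline) / baseline


for metric, row in table5.items():
    for baseline in ("rule_based", "static_opt"):
        change = relative_change_pct(row[baseline], row["rl_equity"])
        print(f"{metric:22s} vs {baseline:10s}: {change:+.1f}%")
```

On these figures, average shortage days fall by about 73% relative to the rule-based policy and about 58% relative to static optimization. Logistics cost is about 6% lower than under the rule-based policy but about 4.5% higher than under static optimization, so the equity gains are not obtained at the lowest cost among the three policies.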

Table 6. Scenario Stress-Test Results

Scenario	Shortage Reduction (%)	Equity Gap Reduction (%)
Baseline demand	63.1	78.4
Pandemic surge (+40%)	54.7	69.2
Transport disruption	48.9	61.5

And perhaps the more uncomfortable takeaway is this: if forecast uncertainty and access barriers concentrate in vulnerable regions, then “neutral” allocation rules are not neutral. They will reliably miss the places where the system is least stable. The proposed framework—probabilistic forecasting, geospatial equity weighting, and RL-based sequential control—offers one practical path toward breaking that pattern.
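
The point about "neutral" rules can be illustrated with a small simulation. The Python sketch below is a toy example under stated assumptions, not a reproduction of the study's analysis: two hypothetical regions have the same mean demand, but one has four times the demand volatility, and both receive an allocation equal to their mean forecast. Demand is drawn from a normal distribution truncated at zero, and every number is illustrative.

```python
import random


def mean_unmet(mean, std, allocation, days=10_000, seed=0):
    """Average daily unmet demand when a fixed allocation meets random demand."""
    rng = random.Random(seed)
    unmet = 0.0
    for _ in range(days):
        demand = max(rng.gauss(mean, std), 0.0)  # truncate negative draws at zero
        unmet += max(demand - allocation, 0.0)
    return unmet / days


# Same mean demand and same allocation; only the volatility differs.
stable = mean_unmet(mean=1000.0, std=100.0, allocation=1000.0)
volatile = mean_unmet(mean=1000.0, std=400.0, allocation=1000.0)

print(f"stable region, mean unmet units/day:   {stable:.1f}")
print(f"volatile region, mean unmet units/day: {volatile:.1f}")
```

Both regions run short on roughly half of the days, since each is supplied at its mean, but the volume of unmet demand is several times larger in the volatile region. An allocation rule that looks only at the point forecast therefore treats the two regions identically on paper while leaving the less predictable one with much larger shortfalls.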

4. Conclusion

This study demonstrates that equity in healthcare resource allocation can be operationalized using integrated predictive–prescriptive analytics. Results indicate that demand uncertainty is spatially structured and closely associated with social vulnerability, challenging uniform allocation strategies. The proposed framework, combining probabilistic forecasting, geospatial accessibility modeling, and reinforcement learning–based allocation, improved service levels in high-vulnerability regions while reducing cumulative shortages and maintaining overall system efficiency. These findings suggest that equity and system resilience may be complementary rather than competing objectives in emergency supply chain management. However, implementation requires strong governance, transparency, and ethical oversight. Future research should evaluate finer geographic resolution, multi-resource allocation environments, and real-world deployment feasibility. Overall, integrating equity directly into allocation decision models may improve preparedness performance and reduce structural disparities during public health emergencies.

Author contributions

A.H. conceptualized the study, developed the closed-loop reinforcement learning framework, led the modeling and analysis, and drafted the original manuscript. F.A. contributed to the system architecture design, reinforcement learning methodology, and interpretation of operational results. K.A.R. supported data integration, geospatial accessibility modeling, and validation of analytical outputs. S.H. assisted with literature review, social vulnerability integration, and critical revision of the manuscript. All authors reviewed, edited, and approved the final version of the manuscript and agree to be accountable for all aspects of the work.

Acknowledgment

The authors gratefully acknowledge Trine University, Angola, Indiana, USA, for providing academic support and research infrastructure for this study. The authors also thank colleagues from the Department of Business Analytics, Engineering Management, and the College of Graduate and Professional Studies for insightful discussions that strengthened the conceptual and methodological rigor of the work.

References

Alphonso, S. R., et al. (2024). Geospatially clustered low COVID-19 vaccine rates among adolescents in socially vulnerable US counties. Preventive Medicine Reports, 37, 102545. https://doi.org/10.1016/j.pmedr.2023.102545

Dautel, K., et al. (2024). Assessing the reliability of medical resource demand models (COVID-19 and related contexts). [Journal name not provided].

Dey, S., et al. (2024). Optimization modeling for pandemic vaccine supply chain management: A review and future research opportunities. Naval Research Logistics. https://doi.org/10.1002/nav.22181

Hu, A., Casey, D. C., Toyoji, M., Brown, A. T., & Elsenboss, C. (2023). A data-driven approach to allocating personal protective equipment during the COVID-19 pandemic in King County, Washington. Health Security, 21(2), 156–163. https://doi.org/10.1089/hs.2022.0115

Jayaraman, A., et al. (2024). A primer on reinforcement learning in medicine for clinicians. npj Digital Medicine, 7(1), 337. https://doi.org/10.1038/s41746-024-01316-0

Keyvanshokooh, E., et al. (2024). Mitigating the COVID-19 pandemic through data-driven resource sharing. Naval Research Logistics, 71(1), 41–63. https://doi.org/10.1002/nav.22117

Khazanchi, R., et al. (2024). Spatial accessibility and uptake of pediatric COVID-19 vaccinations by social vulnerability. Pediatrics, 154(2), e2024065938. https://doi.org/10.1542/peds.2024-065938

Kiss, J., & Elhedhli, S. (2024). Capacity acquisition and PPE distribution planning during the COVID-19 pandemic. Computers & Industrial Engineering, 187, 109715. https://doi.org/10.1016/j.cie.2023.109715

Price, B. S., et al. (2024). Maintaining healthcare capacity in rural America by replenishing personal protective equipment: The case from West Virginia. INFORMS Journal on Applied Analytics.

Rubashkin, M., et al. (2023). PPE needs in the United States during the COVID-19 pandemic: An analysis using the GetUsPPE online platform. Public Health Challenges, 2(1), e65. https://doi.org/10.1002/puh2.65

Woolfork, M. N., et al. (2024). A health equity science approach to assessing drivers of COVID-19 vaccination coverage disparities over the course of the COVID-19 pandemic, United States, December 2020–December 2022. Vaccine, 42(Suppl 3), 126158. https://doi.org/10.1016/j.vaccine.2024.126158

Wu, Q., Han, J., Yan, Y., Kuo, Y.-H., & Shen, Z.-J. M. (2025). Reinforcement learning for healthcare operations management: Methodological framework, recent developments, and future research directions. Health Care Management Science, 28(2), 298–333. https://doi.org/10.1007/s10729-025-09699-6

Yin, X., Bushaj, S., Yuan, Y., & Büyüktahtakin, I. E. (2024). COVID-19: Agent-based simulation-optimization to vaccine center location vaccine allocation problem. IISE Transactions, 56(7), 699–714. https://doi.org/10.1080/24725854.2023.2223246
