Interpretable Machine Learning Data Modeling for Liver Disease Risk Profiling: Insights from the Indian Liver Patient Dataset

Kamruzzaman Mithu; Shahanara Begum; Md Nesar Uddin; Mohammad Nurul Huda

doi:10.25163/data.1110695

Data Modeling

Mathematical sciences

Volume 1 Number 1 2026

RESEARCH ARTICLE (Open Access)

Data Modeling 1 (1) 1-8 https://doi.org/10.25163/data.1110695

Submitted: 29 December 2025 Revised: 10 March 2026 Accepted: 16 March 2026 Published: 18 March 2026

Abstract

Liver disease remains a significant global health concern, yet identifying subtle or latent risk factors from clinical data remains challenging. In this study, we explore these less obvious patterns using the Indian Liver Patient Dataset (ILPD), a widely used benchmark of 583 records covering patients both with and without liver disease. Rather than focusing solely on predictive accuracy, we aimed to balance performance with interpretability, an aspect that is arguably equally critical in clinical contexts. The dataset was preprocessed by label encoding the categorical variables and normalizing the continuous features. Multiple supervised machine learning models were then evaluated to determine the most suitable approach. Among them, Logistic Regression was the most consistent performer, achieving a test accuracy of approximately 71% while also providing probabilistic outputs that lend themselves to clinical interpretation. To better understand the model’s decision-making process, SHAP (SHapley Additive exPlanations) was used for feature attribution. This analysis identified Total Proteins, Age, and Albumin as the most influential predictors of liver disease within the dataset. These findings align, to some extent, with established clinical indicators, which lends credibility to the model’s outputs. Overall, the study demonstrates that interpretable machine learning can offer meaningful insights into liver disease risk while maintaining transparency. By translating predictions into individualized risk profiles, the approach supports more informed and human-centric healthcare decisions, in line with emerging Industry 5.0 principles.

Keywords: Liver disease prediction, Machine learning, SHAP interpretability, Risk factor analysis.
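The workflow summarized in the abstract (label encoding, normalization, Logistic Regression, SHAP attribution) can be illustrated with a minimal sketch. This is not the authors' implementation. It uses only the Python standard library and runs on hypothetical synthetic records rather than the real ILPD. The `Gender` column, the data-generating rule, the 160/40 split, and every function name are assumptions made for illustration. The attribution step uses the closed-form SHAP value for a linear model under assumed feature independence, which is the weight times the feature's deviation from its training mean, measured in log-odds. In practice one would more likely reach for scikit-learn and the `shap` package.

```python
import math
import random

FEATURES = ["Gender", "Age", "Total_Proteins", "Albumin"]  # assumed column subset

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_toy_records(n=200, seed=0):
    """Generate hypothetical records; these are NOT the real ILPD data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gender = rng.choice(["Female", "Male"])
        age = rng.uniform(20, 80)
        proteins = rng.gauss(6.5, 1.0)
        albumin = rng.gauss(3.2, 0.8)
        # Assumed synthetic rule: risk rises with age and falls with albumin.
        z = 0.04 * (age - 50) - 1.0 * (albumin - 3.2)
        rows.append(([gender, age, proteins, albumin], 1 if rng.random() < sigmoid(z) else 0))
    return rows

def encode(rows):
    """Label-encode the categorical Gender column as 0/1."""
    X = [[1.0 if r[0] == "Male" else 0.0, r[1], r[2], r[3]] for r, _ in rows]
    return X, [label for _, label in rows]

def fit_scaler(X):
    """Mean and standard deviation of the continuous columns (indices 1..3)."""
    n = len(X)
    means = [sum(x[j] for x in X) / n for j in range(1, 4)]
    stds = [math.sqrt(sum((x[j + 1] - means[j]) ** 2 for x in X) / n) for j in range(3)]
    return means, stds

def scale(X, means, stds):
    """Z-score the continuous columns; leave the encoded Gender column as is."""
    return [[x[0]] + [(x[j + 1] - means[j]) / stds[j] for j in range(3)] for x in X]

def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b

def linear_shap(w, x, background_means):
    """Closed-form SHAP values for a linear model (log-odds scale)."""
    return [wj * (xj - mj) for wj, xj, mj in zip(w, x, background_means)]

X_all, y_all = encode(make_toy_records())
X_train_raw, X_test_raw = X_all[:160], X_all[160:]
y_train, y_test = y_all[:160], y_all[160:]

# Fit the scaler on the training split only, then apply it to both splits.
means, stds = fit_scaler(X_train_raw)
X_train, X_test = scale(X_train_raw, means, stds), scale(X_test_raw, means, stds)
w, b = train_logreg(X_train, y_train)

def logit(x):
    return b + sum(wj * xj for wj, xj in zip(w, x))

accuracy = sum((sigmoid(logit(x)) >= 0.5) == bool(t)
               for x, t in zip(X_test, y_test)) / len(y_test)

# Individual risk profile for one held-out (synthetic) patient.
background = [sum(x[j] for x in X_train) / len(X_train) for j in range(len(FEATURES))]
patient = X_test[0]
phi = linear_shap(w, patient, background)
base_logit = logit(background)
risk = sigmoid(logit(patient))

print(f"test accuracy: {accuracy:.2f}, predicted risk: {risk:.2f}")
for name, value in sorted(zip(FEATURES, phi), key=lambda p: -abs(p[1])):
    print(f"{name:>15}: {value:+.3f}")
```

The attributions are additive: the baseline log-odds plus the sum of the per-feature values equals the model's log-odds for that patient. This is the property that lets a single prediction be read as an individualized risk profile.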
