A Systematic Literature Review of Distributed Data Warehouse Architectures

Authors

  • Lama AlOud, Department of Software Engineering, King Saud University, Saudi Arabia
  • Ohoud Alharbi, Department of Software Engineering, King Saud University, Saudi Arabia

DOI:

https://doi.org/10.54489/ijtim.v5i2.567

Keywords:

Federated Data Warehousing, Data Mesh, Data Lakehouse Architecture, Systematic Literature Review, Data Warehouse Architecture, Distributed Data Warehouse

Abstract

This paper presents a systematic literature review (SLR) of distributed data warehouse (DDW) architectures, addressing challenges in governance, security, scalability, and real-time analytics. Conducted in accordance with the PRISMA 2020 guidelines, the review synthesizes 29 peer-reviewed studies published between 2020 and 2025. It identifies four major architectural themes: security-oriented, federated and data mesh-oriented, data lakehouse-based, and real-time/streaming-enabled architectures. These themes address recurring challenges such as data privacy, organizational autonomy, governance of diverse data types, and low-latency analytics. The review highlights a trend toward multi-paradigm designs that combine several architectural principles to balance autonomy, governance, performance, and security. Finally, it outlines future research directions in autonomous architectures, AI-driven metadata management, and empirical evaluation of hybrid DDW models.

References

Pörtner, L., Möske, R., & Riel, A. (2023). Data Ecosystem for Industrial Product-Service Systems (IPS2) Based on a Decentralized Data Architecture. Procedia CIRP, 119, 1228–1233. https://doi.org/10.1016/j.procir.2023.02.190

Thantilage, R. D., Le-Khac, N.-A., & Kechadi, M.-T. (2023). Healthcare data security and privacy in Data Warehouse architectures. Informatics in Medicine Unlocked, 39, 101270. https://doi.org/10.1016/j.imu.2023.101270

Vestues, K., Hanssen, G. K., Mikalsen, M., Buan, T. A., & Conboy, K. (2022). Agile Data Management in NAV: A Case Study. In V. Stray, K.-J. Stol, M. Paasivaara, & P. Kruchten (Eds.), Agile Processes in Software Engineering and Extreme Programming (Vol. 445, pp. 220–235). Springer International Publishing. https://doi.org/10.1007/978-3-031-08169-9_14

Fugkeaw, S., & Hak, L. (2024). PPAC-CDW: A Privacy-Preserving Access Control Scheme With Fast OLAP Query and Efficient Revocation for Cloud Data Warehouse. IEEE Access, 12, 78743–78758. https://doi.org/10.1109/ACCESS.2024.3408221

Bergers, J., Shi, Z., Korsmit, K., & Zhao, Z. (2021). DWH-DIM: A Blockchain Based Decentralized Integrity Verification Model for Data Warehouses. In 2021 IEEE International Conference on Blockchain (Blockchain) (pp. 221–228). IEEE. https://doi.org/10.1109/Blockchain53845.2021.00037

AlMeghari, M., Taha, S., Elmahdy, H., & Shen, X. (2021). A proposed authentication and group-key distribution model for data warehouse signature, DWS framework. Egyptian Informatics Journal, 22(3), 245–255. https://doi.org/10.1016/j.eij.2020.09.002

Butakova, M. A., Chernov, A. V., Savvas, I. K., & Garani, G. (2020). Data Warehouse Design for Security Applications Using Distributed Ontology-Based Knowledge Representation. In I. Kotenko, C. Badica, V. Desnitsky, D. El Baz, & M. Ivanovic (Eds.), Intelligent Distributed Computing XIII (Vol. 868, pp. 140–145). Springer International Publishing. https://doi.org/10.1007/978-3-030-32258-8_16

Vadim, B., Dmitry, K., & Alexander, M. (2020). Intelligent Information Search Method Based on a Compositional Ontological Approach. In S. O. Kuznetsov, A. I. Panov, & K. S. Yakovlev (Eds.), Artificial Intelligence (Vol. 12412, pp. 371–381). Springer International Publishing. https://doi.org/10.1007/978-3-030-59535-7_27

Barnes, C., et al. (2022). The Biomedical Research Hub: A Federated Platform for Patient Research Data. Journal of the American Medical Informatics Association, 29(4), 619–625. https://doi.org/10.1093/jamia/ocab247

Loukiala, A., Joutsenlahti, J.-P., Raatikainen, M., Mikkonen, T., & Lehtonen, T. (2021). Migrating from a Centralized Data Warehouse to a Decentralized Data Platform Architecture. In L. Ardito, A. Jedlitschka, M. Morisio, & M. Torchiano (Eds.), Product-Focused Software Process Improvement (Vol. 13126, pp. 36–48). Springer International Publishing. https://doi.org/10.1007/978-3-030-91452-3_3

Rosenau, L., & Ingenerf, J. (2024). Structured Queries to AQL: Querying OpenEHR Data Leveraging a FHIR-Based Infrastructure for Federated Feasibility Queries. Studies in Health Technology and Informatics. https://doi.org/10.3233/SHTI230922

Ghane, K. (2020). Big Data Pipeline with ML-Based and Crowd Sourced Dynamically Created and Maintained Columnar Data Warehouse for Structured and Unstructured Big Data. In 2020 3rd International Conference on Information and Computer Technologies (ICICT) (pp. 60–67). IEEE. https://doi.org/10.1109/ICICT50521.2020.00018

Rossini, E., Bicocchi, N., Hadjidimitriou, N. S., Pietri, M., Picone, M., & Mamei, M. (2024). Towards a Distributed Data Mesh Model for the IoT-Edge-Cloud Continuum in Smart Cities. In 2024 IEEE/ACM Symposium on Edge Computing (SEC) (pp. 383–388). IEEE. https://doi.org/10.1109/SEC62691.2024.00041

Ryffel, T., et al. (2025). Federated Analysis With Differential Privacy in Oncology Research: Longitudinal Observational Study Across Hospital Data Warehouses. JMIR Medical Informatics, 13, e59685. https://doi.org/10.2196/59685

Silva, D., et al. (2024). Review of open-source software for developing heterogeneous data management systems for bioinformatics applications. Bioinformatics Advances, 5(1), vbaf168. https://doi.org/10.1093/bioadv/vbaf168

Harby, A. A., & Zulkernine, F. (2025). Data Lakehouse: A Survey and Experimental Study. Information Systems, 127, 102460. https://doi.org/10.1016/j.is.2024.102460

Levandoski, J., et al. (2024). BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse. In Companion of the 2024 International Conference on Management of Data (pp. 334–346). ACM. https://doi.org/10.1145/3626246.3653388

Kalmuk, D., et al. (2024). Native Cloud Object Storage in Db2 Warehouse: Implementing a Fast and Cost-Efficient Cloud Storage Architecture. In Companion of the 2024 International Conference on Management of Data (pp. 188–200). ACM. https://doi.org/10.1145/3626246.3653393

Tripathi, A., Waqas, A., Venkatesan, K., Yilmaz, Y., & Rasool, G. (2024). Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets. Sensors, 24(5), 1634. https://doi.org/10.3390/s24051634

Published

2026-01-19

How to Cite

L. AlOud and O. Alharbi, “A Systematic Literature Review of Distributed Data Warehouse Architectures,” Int. J. TIM, vol. 5, no. 2, pp. 82–89, Jan. 2026, doi: 10.54489/ijtim.v5i2.567.
