Dynamic weighted cluster-sampling: An optimized cohesive method for improving data quality in the context of big data

Benabderrahmane MOUTASSEM, Laouni DJAFRI, Abdelkader GAFOUR

Abstract

In data mining, imbalanced big data has emerged as a critical challenge: classes are disproportionately distributed within large datasets. This imbalance often yields biased models that underperform on minority classes, compromising the overall effectiveness of predictive analytics. Standard machine learning algorithms may struggle to classify underrepresented instances accurately, producing predictions that reflect majority-class tendencies rather than the true underlying patterns. This work presents a novel hybrid approach that combines clustering and sampling methods to mitigate these challenges in imbalanced big data classification. The proposed approach aims to reduce data volume, enhance veracity (improving performance metrics), and shorten execution time, while preserving essential attributes and ensuring data reliability. The results show that the approach achieves higher accuracy, AUC, F1-score, and G-mean than scenarios without data balancing strategies. We also evaluate the proposed method against current methods in the field using large imbalanced datasets. The method reaches an accuracy approaching 100%, with improvements ranging from 17% to 22% across all performance metrics assessed, which underscores its effectiveness for imbalanced big data classification.
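To illustrate why the abstract reports F1-score and G-mean alongside accuracy, the following minimal Python sketch computes these metrics from binary confusion-matrix counts using their standard definitions. It is not the authors' evaluation code; the function name `imbalance_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return accuracy, F1 and G-mean from binary confusion-matrix counts.

    The minority class is treated as the positive class.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Recall (sensitivity): share of minority instances that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of majority instances that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # G-mean: geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


# Hypothetical counts (an assumption, not data from the paper):
# 10 minority and 990 majority instances, with a classifier that
# predicts the majority class for every instance.
majority_only = imbalance_metrics(tp=0, fn=10, fp=0, tn=990)
print(majority_only)
```

In this hypothetical case the accuracy is 0.99 even though no minority instance is detected, while F1 and G-mean both drop to 0, which is the kind of majority-class bias the abstract describes.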

Authors

Benabderrahmane MOUTASSEM
Laouni DJAFRI
laouni.djafri@univ-tiaret.dz (Primary Contact)
Abdelkader GAFOUR
MOUTASSEM, B., DJAFRI, L., & GAFOUR, A. (2025). Dynamic weighted cluster-sampling: An optimized cohesive method for improving data quality in the context of big data. International Journal of Innovative Research and Scientific Studies, 8(3), 1703–1720. https://doi.org/10.53894/ijirss.v8i3.6878
