THE IMPACT OF DEMOGRAPHIC COMPOSITION IN UTKFACE AND APPA-REAL DATASETS ON FAIRNESS IN AGE ESTIMATION MODELS
Abstract
This paper analyzes the impact of demographic composition on fairness in facial age estimation models trained on the UTKFace and APPA-REAL datasets. Building on previously published empirical results, the study provides a theoretical and analytical interpretation of how dataset imbalance affects model bias. Through comparative evaluation of group-wise performance metrics, including Mean Absolute Error, Standard Deviation, Disparate Impact, and Equality of Opportunity, the paper introduces the concept of a Distributional Fairness Baseline (DFB) as a diagnostic framework for separating dataset-driven bias from model-induced bias. The analysis reveals that fairness is primarily a function of the representativeness and internal structure of the training data rather than of model architecture. Contrary to common assumptions, full data equalization through oversampling does not necessarily enhance equity and may even amplify disparities due to overfitting and redundancy. Instead, moderate redistribution, particularly controlled undersampling of dominant groups, often achieves a better balance between accuracy and fairness. These findings emphasize that equitable model performance depends on both quantitative and qualitative diversity within datasets, establishing data design as the central determinant of fairness in automated age estimation systems.
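As an illustration of the group-wise evaluation and the controlled undersampling described above, the following is a minimal Python sketch and not the procedure used in the paper. All function names, the 5-year tolerance, and the keep fraction are hypothetical. Because age estimation is a regression task, the sketch assumes that Disparate Impact is computed on a thresholded outcome (a prediction counts as favorable if it falls within the tolerance of the true age); the abstract does not specify this definition, and the DFB itself is not reproduced here.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def groupwise_errors(y_true, y_pred, groups):
    """Return {group: (MAE, population std of absolute errors)}."""
    errs = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        errs[g].append(abs(t - p))
    return {g: (mean(e), pstdev(e)) for g, e in errs.items()}


def within_tolerance_rate(y_true, y_pred, groups, tol=5.0):
    """Share of predictions per group within `tol` years of the true age.

    Assumption: this thresholded outcome stands in for the 'favorable
    outcome' needed by a Disparate Impact ratio in a regression setting.
    """
    hits = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        hits[g].append(1.0 if abs(t - p) <= tol else 0.0)
    return {g: mean(h) for g, h in hits.items()}


def disparate_impact(rates, reference):
    """Ratio of each group's rate to the reference group's rate.

    Assumes the reference group's rate is nonzero.
    """
    ref = rates[reference]
    return {g: r / ref for g, r in rates.items()}


def undersample_dominant(groups, dominant, keep_fraction, seed=0):
    """Return sample indices after keeping only `keep_fraction` of the
    dominant group; all other groups are kept in full."""
    rng = random.Random(seed)
    dom_idx = [i for i, g in enumerate(groups) if g == dominant]
    keep = set(rng.sample(dom_idx, round(len(dom_idx) * keep_fraction)))
    return [i for i, g in enumerate(groups) if g != dominant or i in keep]
```

In such a setup, one would train on the indices returned by `undersample_dominant` for several values of `keep_fraction` and compare the per-group MAE and the Disparate Impact ratio across runs, rather than equalizing group sizes outright.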
