Voprosy statistiki

Application of Random Forest Method to Identify Determinants of Wages

https://doi.org/10.34023/2313-6383-2025-32-5-30-36

Abstract

The article analyzes salary predictors using data on job vacancies published in open access on the Internet portal of the federal state information system of the Federal Service for Labor and Employment, «Work in Russia». It examines which characteristics of a vacancy determine its salary level and make it one of the highest-paying on the labor market. To identify these determinants, the author uses the random forest machine learning method.
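
As a minimal sketch of how a random forest can relate vacancy attributes to salary, the Python example below uses scikit-learn on synthetic data. The category values (regions, professional fields, schedules), the salary rule and every number are invented for illustration and do not come from the «Work in Russia» data; the helper names make_synthetic_vacancies and build_model are hypothetical. The categorical attributes are one-hot encoded before the forest is fitted, which is one common choice and not necessarily the author's.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical category levels; the salary effects below are invented.
REGIONS = np.array(["Moscow", "Tyumen Oblast", "Kursk Oblast", "Altai Republic"])
FIELDS = np.array(["IT", "Construction", "Education", "Retail"])
SCHEDULES = np.array(["full-time", "shift", "part-time"])


def make_synthetic_vacancies(n=600, seed=0):
    """Return toy vacancy attributes X (region, field, schedule) and salaries y."""
    rng = np.random.RandomState(seed)
    r = rng.randint(0, len(REGIONS), size=n)
    f = rng.randint(0, len(FIELDS), size=n)
    s = rng.randint(0, len(SCHEDULES), size=n)
    y = (50000
         + np.array([40000, 25000, 0, -5000])[r]
         + np.array([15000, 8000, 0, -3000])[f]
         + np.array([5000, 3000, -10000])[s]
         + rng.normal(0, 3000, size=n))
    X = np.column_stack([REGIONS[r], FIELDS[f], SCHEDULES[s]]).astype(object)
    return X, y


def build_model(random_state=0):
    """One-hot encode the three categorical columns, then fit a random forest."""
    encode = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), [0, 1, 2])]
    )
    forest = RandomForestRegressor(n_estimators=200, random_state=random_state)
    return Pipeline([("encode", encode), ("forest", forest)])


X, y = make_synthetic_vacancies()
model = build_model().fit(X, y)

# Predict salaries for two hypothetical new vacancies.
new_vacancies = np.array([["Moscow", "IT", "full-time"],
                          ["Altai Republic", "Retail", "part-time"]], dtype=object)
print(model.predict(new_vacancies).round(-2))
```

Each tree is grown on a bootstrap sample of vacancies, and the forest's prediction is the average over trees, which smooths out the noise of any single tree.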

The predictive quality of the random forest model is assessed primarily with the mean absolute error (MAE), which is expressed in the same units as the dependent variable and is therefore easy to interpret. In addition, the mean squared error (MSE) and the coefficient of determination (R²), which shows the proportion of the variance of the dependent variable explained by the predictors, were calculated. To improve the accuracy of the model, its hyperparameters were tuned with the RandomizedSearchCV procedure from scikit-learn, which evaluates a fixed number of randomly sampled hyperparameter combinations with cross-validation instead of trying every possible combination.
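
The evaluation and tuning steps can be sketched with scikit-learn as follows, again on invented data in which region, field and schedule are integer codes produced by a hypothetical make_synthetic_vacancies helper. The hyperparameter candidates, the number of sampled combinations (n_iter=10), the 3-fold cross-validation and the 75/25 train/test split are assumptions for illustration; the article's actual search space is not given in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_synthetic_vacancies(n=600, seed=0):
    """Toy integer codes for (region, field, schedule) plus salaries from an invented rule."""
    rng = np.random.RandomState(seed)
    X = np.column_stack([rng.randint(0, 4, n), rng.randint(0, 4, n), rng.randint(0, 3, n)])
    y = (50000
         + np.array([40000, 25000, 0, -5000])[X[:, 0]]
         + np.array([15000, 8000, 0, -3000])[X[:, 1]]
         + np.array([5000, 3000, -10000])[X[:, 2]]
         + rng.normal(0, 3000, size=n))
    return X, y


X, y = make_synthetic_vacancies()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

pipeline = Pipeline([
    ("encode", OneHotEncoder(handle_unknown="ignore")),  # codes are categories, not magnitudes
    ("forest", RandomForestRegressor(random_state=0)),
])

# Illustrative candidate values (lists are sampled uniformly by RandomizedSearchCV).
param_distributions = {
    "forest__n_estimators": [50, 100, 200],
    "forest__max_depth": [None, 5, 10, 20],
    "forest__min_samples_leaf": [1, 2, 5],
    "forest__max_features": [1.0, "sqrt"],
}

# Evaluate 10 randomly sampled combinations with 3-fold CV, ranked by MAE,
# then refit the best one on the whole training set (refit=True by default).
search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=10, cv=3,
    scoring="neg_mean_absolute_error", random_state=0,
)
search.fit(X_train, y_train)

pred = search.predict(X_test)
mae = mean_absolute_error(y_test, pred)  # same units as y (salary)
mse = mean_squared_error(y_test, pred)   # squared units; penalizes large errors more
r2 = r2_score(y_test, pred)              # share of salary variance explained
print(search.best_params_)
print(f"MAE={mae:.0f}  MSE={mse:.0f}  R2={r2:.3f}")
```

Ranking candidates by MAE keeps the tuning criterion in salary units; passing scoring="r2" would instead select on the share of explained variance.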

The results showed that the characteristic with the greatest impact on salary is the region in which the vacancy is posted, followed, to a lesser but still substantial extent, by the professional field and the work schedule. The findings may help explain the factors that shape competitive salaries and inform the design of effective support measures on the labor market.
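
One way to obtain a per-variable ranking of this kind is permutation importance: each original column is shuffled in turn and the resulting increase in MAE on held-out data is recorded. The sketch uses scikit-learn's permutation_importance on invented data from a hypothetical make_synthetic_vacancies helper, with hypothetical column names. The abstract does not state which importance measure the author used, and the ranking printed here reflects only the made-up salary rule in the toy generator, not any empirical result.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

FEATURES = ["region", "professional_field", "schedule"]  # hypothetical column names


def make_synthetic_vacancies(n=600, seed=0):
    """Toy integer codes for (region, field, schedule) plus salaries from an invented rule."""
    rng = np.random.RandomState(seed)
    X = np.column_stack([rng.randint(0, 4, n), rng.randint(0, 4, n), rng.randint(0, 3, n)])
    y = (50000
         + np.array([40000, 25000, 0, -5000])[X[:, 0]]
         + np.array([15000, 8000, 0, -3000])[X[:, 1]]
         + np.array([5000, 3000, -10000])[X[:, 2]]
         + rng.normal(0, 3000, size=n))
    return X, y


X, y = make_synthetic_vacancies()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = Pipeline([
    ("encode", OneHotEncoder(handle_unknown="ignore")),
    ("forest", RandomForestRegressor(n_estimators=200, random_state=0)),
]).fit(X_train, y_train)

# Shuffle one original column at a time (before one-hot encoding), so each
# variable gets a single score. With a neg-MAE scorer, importance equals
# permuted MAE minus baseline MAE, i.e. how much the error grows.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0,
    scoring="neg_mean_absolute_error",
)
ranking = sorted(zip(FEATURES, result.importances_mean), key=lambda t: -t[1])
for name, increase in ranking:
    print(f"{name:20s} MAE increase when shuffled: {increase:,.0f}")
```

Shuffling before one-hot encoding gives one score per variable; the forest's impurity-based feature_importances_ would instead be reported per dummy column and would need to be aggregated by variable.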

About the Author

A. A. Salmina
Russian Presidential Academy of National Economy and Public Administration (RANEPA)
Russian Federation

Alla A. Salmina – Cand. Sci. (Sociol.), Senior Researcher, Center «Institute for Social Analysis and Forecasting», Institute for Applied Economic Studies

82, Vernadskogo Ave., Moscow, 119571

For citations:


Salmina A.A. Application of Random Forest Method to Identify Determinants of Wages. Voprosy statistiki. 2025;32(5):30-36. (In Russ.) https://doi.org/10.34023/2313-6383-2025-32-5-30-36


ISSN 2313-6383 (Print)
ISSN 2658-5499 (Online)