Abstract:
Spontaneous learner migration is an ongoing concern in South African schools, posing challenges to educational planning and resource allocation. This phenomenon refers to learners transitioning prematurely to alternative learning spaces without prior planning. It is more prevalent in rural provinces of South Africa, including Limpopo, where urban schools tend to be more affluent and better resourced than rural schools. The complex interplay of personal, environmental, and socio-economic factors driving learner migration decisions often complicates predictive efforts, necessitating robust computational models for improved understanding and decision-making.
This study used a longitudinal dataset drawn from Limpopo Education Management Information System (EMIS) records spanning ten years (2011–2020) to identify biographical and structural variables that influence learner migration. These variables were used to develop three theory-based learner-migration indices: likelihood of migration, reason for migration, and distance of migration. The study applied the Cross-Industry Standard Process for Data Mining (CRISP-DM) to guide the technical data-mining process and Design Science Research (DSR) to provide a broader framework that positions learner-migration computational models as reusable artefacts for educational planners. This methodological framework was grounded in Ravenstein's and Everett Lee's theories of migration and Hein de Haas's aspiration–capability framework.
Building on this methodological foundation, Feature Selection (FS) was performed using four techniques (Boruta, RPART, AdaBoost.M1, and J48) to determine salient input features for the predictive models. Boruta produced the most consistent feature importance scores, with a variance of 19.85 compared to 21.39 (RPART), 24.60 (AdaBoost.M1), and 24.60 (J48). The learner migration indices were optimised using the Social Ski-Driver (SSD) and the Culture Algorithm (CA). Both optimisers achieved comparable results, with the average F1 score for the three indices consistently surpassing 0.8 on the ten-year learner migration dataset. The CA-derived hyperparameter set was selected for the final model because of its low variance in the F1 score weights of the three indices and its strong alignment with the convergence principles of the Berger-Tal multidisciplinary framework on the exploration-exploitation trade-off.
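The selection rule described above can be illustrated with a minimal sketch. It assumes that each optimiser yields one F1 score per index and that the preferred hyperparameter set is the one whose scores vary least across the three indices, provided its mean F1 exceeds 0.8. The function name `select_optimiser` and all numeric values are hypothetical placeholders, not results from the study, and the sketch does not model the alignment with the Berger-Tal framework.

```python
from statistics import mean, pvariance

# Hypothetical per-index F1 scores for each optimiser.
# Placeholder values for illustration only, not results from the study.
f1_scores = {
    "SSD": {"likelihood": 0.90, "reason": 0.80, "distance": 0.85},
    "CA": {"likelihood": 0.86, "reason": 0.84, "distance": 0.85},
}


def select_optimiser(scores, min_mean_f1=0.8):
    """Pick the optimiser whose F1 scores vary least across the indices.

    Only optimisers whose mean F1 exceeds min_mean_f1 are considered.
    """
    eligible = {}
    for name, per_index in scores.items():
        values = list(per_index.values())
        if mean(values) > min_mean_f1:
            eligible[name] = values
    if not eligible:
        raise ValueError("no optimiser exceeds the minimum mean F1 score")
    # Population variance across the indices; lower means more balanced.
    return min(eligible, key=lambda name: pvariance(eligible[name]))


chosen = select_optimiser(f1_scores)
```

With these placeholder scores both optimisers have the same mean F1, so the choice is decided by variance alone.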
While previous studies on learner migration have primarily emphasised external factors such as a school's poverty ranking, curriculum performance, the language of instruction, and legislative frameworks as the drivers of migration, this study reveals that migration is also influenced by biographical factors such as learner age, gender, home language, and socio-economic status. These insights may have direct implications for educational policy development and resource allocation strategies, offering a balanced understanding of migration dynamics. The developed models, together with their indices and metrics, may support education planners in responding proactively to learner migration challenges.