Predictive working tool for early identification of ‘at risk’ students
This paper explores the variables that may influence persistence or dropout of students at the Open Polytechnic. These include socio-demographic variables such as age, gender, ethnicity, education, work status, and disability as well as variables related to the study environment such as course faculty.
Authors
- Zlatko J. Kovačić, Associate Professor
Phone: + 64 4 913 5777
Email: Zlatko.Kovacic@openpolytechnic.ac.nz
- John Steven Green, Senior Lecturer
Phone: + 64 4 913 5724
Email: John.Green@openpolytechnic.ac.nz
Date - July 2010
Executive summary
This paper explores the variables that may influence persistence or dropout of students at the Open Polytechnic. These include socio-demographic variables such as age, gender, ethnicity, education, work status, and disability as well as variables related to the study environment such as course faculty (School of Business, School of Information and Social Sciences and Workplace Learning and Development), programme (Bachelor of Business, Bachelor of Applied Science and Bachelor of Arts), level (Level 5, 6 and 7), block (Trimester 1, Trimester 2 and Trimester 3) and offer type (Distance, Blended and Online).
We sought to determine the extent to which data captured by the enrolment form could help us to identify future successful and unsuccessful students before the course began. This would enable us to guide students on their course choices and to focus additional support on those students statistically more likely to fail. All too often students enrol on courses at a level too high for their current skills and find themselves at risk of failing.
Data from 2006 to 2009, covering over 19,400 enrolled students and stored in the Open Polytechnic student management system, were used to perform a quantitative analysis of study outcome. Using various data mining techniques, the most important factors for student success were identified and typical profiles of successful and unsuccessful students were constructed. For the Open Polytechnic, the student most likely to be successful is European, female, and holds University Entrance or an overseas qualification; such a student will pass with a probability of 0.921. The students with the greatest number of indicators of failing are Māori or Pacific Island students studying a level 5 course in the Bachelor of Applied Science. They will fail with a probability of 0.751.
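As a rough illustration of how a profile probability of this kind can be read, the sketch below groups enrolment records by a profile and reports the share of students in each group who passed. This is also how a classification tree estimates the probability attached to a leaf. It is not the authors' procedure: the function name, the field names (`course_level`, `university_entrance`, `outcome`) and the eight records are all hypothetical and chosen only to keep the arithmetic visible.

```python
from collections import defaultdict


def pass_rate_by_profile(records, profile_keys):
    """Group records by profile and return {profile: (students, pass_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # profile -> [students, passes]
    for record in records:
        profile = tuple(record[key] for key in profile_keys)
        counts[profile][0] += 1
        if record["outcome"] == "pass":
            counts[profile][1] += 1
    return {profile: (n, passes / n) for profile, (n, passes) in counts.items()}


# Hypothetical enrolment records (NOT data from the study).
records = [
    {"course_level": 5, "university_entrance": False, "outcome": "fail"},
    {"course_level": 5, "university_entrance": False, "outcome": "fail"},
    {"course_level": 5, "university_entrance": False, "outcome": "pass"},
    {"course_level": 5, "university_entrance": False, "outcome": "fail"},
    {"course_level": 7, "university_entrance": True, "outcome": "pass"},
    {"course_level": 7, "university_entrance": True, "outcome": "pass"},
    {"course_level": 7, "university_entrance": True, "outcome": "pass"},
    {"course_level": 7, "university_entrance": True, "outcome": "fail"},
]

rates = pass_rate_by_profile(records, ["course_level", "university_entrance"])
for profile, (students, rate) in sorted(rates.items()):
    print(profile, students, rate)
```

A probability estimated this way is only as reliable as the number of students behind it, so in practice the group size should be reported alongside the rate.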
The empirical results show that the most important factors separating successful from unsuccessful students were, in order of importance: ethnicity, course level, secondary school qualification (highest level of achievement held from a secondary school), programme and age.
- Ethnicity is not something a student can change or an institution can influence, but the institution may advise on the most appropriate study option for that student, i.e. distance, online or contact study. Would some students be better served by studying in a contact institution, or if in a contact institution by distance?
- The course level factor is deceptive, as it might suggest that students studying a lower level course are more likely to succeed. In fact the reverse is true. Students on higher level courses have already proven themselves in lower level courses and are therefore more likely to succeed. This makes the result relatively predictable, and closely related to the third factor, secondary school qualification.
- Previous academic success is a strong indicator of future academic success and has been used in the UK by University Matriculation Boards for decades.
- Advising a student that a different degree programme might increase their chances of academic success appears to be fraught with difficulties but in a larger institution with more choices of programme it may be quite appropriate.
- Like ethnicity, age is not something a student or institution can modify, but it should come as no surprise that younger, less mature students generally have less motivation to succeed than older, more mature students.
These results have several implications for academic and administrative staff. The consequences of identifying a student as potentially unsuccessful must be considered. If the student is told how they have been categorised, what effect might this have on their self-esteem and subsequent motivation? In tough economic times, should the organisation refuse to enrol students statistically unlikely to pass the course? Or should it allocate further resources to support those students, with no guarantee that this support will be effective?
Classification using discriminant functions was the most accurate overall, but it required the consideration of more factors and was less accurate in identifying ‘at risk’ students. The CART classification tree was the most accurate in identifying ‘at risk’ students. Regardless of the method used, our results suggest that using enrolment data alone is only moderately successful in separating successful from unsuccessful students.
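The distinction between overall accuracy and accuracy in identifying ‘at risk’ students can be made concrete with a small sketch. The function below reports both measures for a set of predictions. The function name, the labels and the ten outcomes are hypothetical and are not results from the study; `model_a` and `model_b` simply stand in for two competing classifiers.

```python
def accuracy_summary(actual, predicted, at_risk_label="fail"):
    """Return (overall accuracy, share of at-risk students correctly flagged)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")
    pairs = list(zip(actual, predicted))
    correct = sum(a == p for a, p in pairs)
    at_risk_total = sum(a == at_risk_label for a in actual)
    at_risk_hits = sum(a == at_risk_label and p == at_risk_label for a, p in pairs)
    overall = correct / len(actual)
    at_risk_rate = at_risk_hits / at_risk_total if at_risk_total else float("nan")
    return overall, at_risk_rate


# Hypothetical outcomes for ten students (NOT data from the study).
actual = ["pass"] * 7 + ["fail"] * 3

# Model A rarely predicts failure: high overall accuracy, few at-risk students found.
model_a = ["pass"] * 7 + ["pass", "pass", "fail"]
# Model B flags more students: lower overall accuracy, more at-risk students found.
model_b = ["pass"] * 5 + ["fail"] * 2 + ["fail", "fail", "pass"]

summary_a = accuracy_summary(actual, model_a)
summary_b = accuracy_summary(actual, model_b)
print("Model A:", summary_a)
print("Model B:", summary_b)
```

In this toy case Model A is correct for 8 of 10 students but finds only 1 of the 3 who failed, while Model B is correct for 7 of 10 but finds 2 of the 3. When most students pass, the two measures can rank the same pair of models differently, so which one to prioritise depends on whether the aim is overall classification or early identification.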
It is essential to recognise that, while this model will effectively separate successful and unsuccessful students with a good level of accuracy, the results are specific to the population analysed. The results would need to be updated regularly, with each passing trimester, and re-derived for any different student population. This would allow each unique student body to be modelled and a checklist used to identify potentially unsuccessful students prior to enrolment.
This study is limited in three main ways that future research can perhaps address. Firstly, our research is based on enrolment data only. Leaving out other important factors (academic achievement, number of courses completed, motivation, financial aid, etc.) that may affect study outcome could distort the results obtained with the models used. For example, including the mark for the first course assignment or, even better, a pre-entry test would probably improve the predictive accuracy of the models. More attributes could be included to obtain prediction models with lower misclassification errors. However, a model that relies on data collected after the course has started would no longer be a tool for pre-enrolment, i.e. early, identification of ‘at risk’ students.
Secondly, the timeline should be included in the analysis. We would need to follow those students who failed the course, as well as students who transferred or withdrew. Some of them may re-enrol in one of the next semesters and might successfully complete the course at the second or third attempt. Tracking Fail and Lost students and their study outcomes in subsequent semesters would make modelling their behaviour more accurate.
Thirdly, from a methodological point of view, alternatives to logistic regression and discriminant analysis should be considered. The prime candidate for this data set is a neural network. We may also consider other classification tree models such as exhaustive CHAID and QUEST, as well as random forests and ensembles of models.
This work is published under the Creative Commons 3.0 New Zealand Attribution Non-commercial Share Alike Licence (BY-NC-SA). Under this licence you are free to copy, distribute, display and perform the work as well as to remix, tweak, and build upon this work noncommercially, as long as you credit the author/s and license your new creations under the identical terms.