Abstract

This study evaluated the performance of machine learning algorithms in classifying mental workload among 74 undergraduate students (aged 19–22) during a controlled multitasking experiment conducted on the LabintheWild platform. Immediately following the task, participants self-reported their perceived workload using the NASA Task Load Index (NASA-TLX), rating six dimensions (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration) on a 0–20 scale. Using stratified 10-fold cross-validation, K-Nearest Neighbors (KNN) achieved the highest accuracy (88.67%) and recall (90.67%), outperforming Naive Bayes (84.67%) and Extra Trees (80.67%). One-way ANOVA revealed that Mental Demand (F = 30.58, p < 0.001), Effort, Performance, and Frustration significantly differentiated workload levels, whereas Physical Demand and Temporal Demand did not (p > 0.05). The study addresses a critical research gap: the comparative evaluation of multiple ML classifiers on NASA-TLX data in multitasking contexts among young adults. KNN demonstrated strong generalization (its learning curves converged) and robustness on small datasets. These findings validate NASA-TLX as a reliable input for automated workload classification, with applications in adaptive learning systems, task allocation, and user experience optimization.
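The evaluation protocol described above (stratified 10-fold cross-validation of a KNN classifier on six NASA-TLX scores) can be sketched as follows. This is a minimal illustrative implementation, not the study's actual code: the synthetic data, class labels, neighbor count, and distance metric are all assumptions, so the accuracy it prints has no relation to the reported results.

```python
# Sketch of stratified 10-fold cross-validation of a K-Nearest Neighbors
# classifier, as in the study's evaluation protocol. Data is synthetic.
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Classify x by majority vote among its k nearest training points
    (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def stratified_kfold_accuracy(X, y, k_folds=10, k_neighbors=5, seed=0):
    """Stratified k-fold CV: each fold preserves the class proportions of y."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k_folds)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):  # deal indices round-robin per class
            folds[j % k_folds].append(i)
    accs = []
    for fold in folds:
        test = set(fold)
        tr_X = [X[i] for i in range(len(X)) if i not in test]
        tr_y = [y[i] for i in range(len(y)) if i not in test]
        correct = sum(
            knn_predict(tr_X, tr_y, X[i], k_neighbors) == y[i] for i in fold
        )
        accs.append(correct / len(fold))
    return sum(accs) / len(accs)


# Synthetic NASA-TLX-like data: two hypothetical workload classes with
# shifted mean scores across the six 0-20 dimensions.
rng = random.Random(42)
X, y = [], []
for label, center in [("low", 5.0), ("high", 15.0)]:
    for _ in range(74):
        X.append([rng.gauss(center, 2.0) for _ in range(6)])
        y.append(label)

acc = stratified_kfold_accuracy(X, y)
print(f"Mean CV accuracy: {acc:.2f}")
```

Stratification matters here because each fold must mirror the overall class balance; with plain random folds on a small sample, a fold can over- or under-represent a workload level and distort the per-fold accuracy.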