Automatic Question Generation for Bahasa Indonesia Examination using CopyNet

In educational institutions, educators are responsible for assessing students’ grasp of the material through examinations. Creating exam questions, even low-level factoid questions, is time-consuming, especially for inexperienced educators. To ease this burden, this study builds a sequence-to-sequence model based on CopyNet and exploits its copying mechanism to automatically generate factoid questions in Bahasa Indonesia. The Indonesian records in the TyDi QA dataset are used as the model input. GRU and Bi-GRU are employed as the CopyNet encoder, while LSTM is used as the CopyNet decoder. The model with the GRU encoder achieves BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 0.28, 0.19, 0.14, 0.10, and 0.32, respectively. The model with the Bi-GRU encoder achieves BLEU1, BLEU2, BLEU3, BLEU4, and ROUGE-L scores of 0.26, 0.17, 0.12, 0.09, and 0.30, respectively. Both models achieve low scores in absolute terms, although their BLEU scores are on par with previous work. Further examination found that the generated questions fall short of semantic and syntactic correctness. Future work is encouraged to add more records to the dataset and to use a more advanced architecture such as CopyBERT to improve model performance. Despite these results, this study shows that CopyNet, primarily designed for text summarization or single-turn dialogue, can be adapted to factoid question generation.
Authors:
Matthew Martianus Henry, Gregorius Natanael Elwirehardja, Bens Pardamean
2024 9th International Conference on Computer Science and Computational Intelligence (ICCSCI)