Automatic Smart Crawling on Twitter for Weather Information in Indonesia
Twitter is a popular resource for analyzing social interactions and for text data mining, but collecting Twitter users’ geolocation automatically remains a problem. To address this problem, the research proposes a Support Vector Machine (SVM) model that can be used to design an automatic smart crawling system on Twitter. Twint, a Python-based Twitter scraping program, is used to crawl data based on keywords related to the weather in Indonesia. Null geolocations are filled using aliases generated from the way Indonesians report locations in Indonesia in their tweets. Automated smart crawling using the SVM model achieves an accuracy of 85%.
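The alias-based filling of null geolocations described above can be illustrated with a minimal sketch. It is not the authors' implementation: the alias table, the field names `text` and `geo`, and the function names are all hypothetical, and the paper's actual aliases and its SVM step are not reproduced here.

```python
import re
from typing import Optional

# Hypothetical alias table (an assumption for illustration only):
# informal place names mapped to a canonical city name.
ALIASES = {
    "jkt": "Jakarta",
    "jakarta": "Jakarta",
    "bdg": "Bandung",
    "bandung": "Bandung",
    "jogja": "Yogyakarta",
    "yogya": "Yogyakarta",
    "sby": "Surabaya",
    "surabaya": "Surabaya",
}


def infer_location(text: str) -> Optional[str]:
    """Return the canonical name of the first alias found in the text."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in ALIASES:
            return ALIASES[token]
    return None


def fill_null_geolocation(tweet: dict) -> dict:
    """Return a copy of the tweet whose empty 'geo' field is filled
    from an alias mentioned in the tweet text, when one is found."""
    filled = dict(tweet)
    if not filled.get("geo"):
        filled["geo"] = infer_location(filled.get("text", ""))
    return filled
```

For example, `fill_null_geolocation({"text": "Hujan deras di jkt sore ini", "geo": None})` fills `geo` with `"Jakarta"`, while a tweet that already carries a geolocation is left unchanged.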
Authors:
Kartika Purwandari, Reza Bayu Perdana, Join W.C. Sigalingging, Reza Rahutomo, and Bens Pardamean
8th International Conference on Computer Science and Computational Intelligence, ICCSCI 2023