Sentiment Classification Analysis On Phone Identity Card Data Leaks Issues On Twitter


Muh Ichlasul Amal
Elsa Syafira Rahmasita
Edward Suryaputra
Nur Aini Rakhmawati


Technological developments pose major threats to the privacy and security of personal data. In September 2022, 1.3 billion SIM card registration records containing users' personal data were leaked and uploaded to the dark web, and Indonesians voiced their opinions on the issue on Twitter. This study analyzes the word distribution of public opinion on Twitter about the issue and classifies its sentiment. Sentiment classification was carried out with a machine learning approach using four methods, namely Random Forest, Logistic Regression, Support Vector Machine, and the IndoBERT model, which were compared to determine which one performs best. The crawling process yielded 957 tweets, of which 609 were labeled and used to train the four models. The labeled data are imbalanced: tweets with positive sentiment are far fewer than those in the other classes. Frequently used terms in the tweets include SIM card, data SIM (SIM data), bocor data (data leak), miliar data (billion data), and kominfo (the Indonesian Ministry of Communication and Informatics). Support Vector Machine achieved the best performance, with an F1-score of 0.81, followed by Random Forest (0.78), IndoBERT (0.76), and Logistic Regression (0.74). Class imbalance and the small amount of training data lowered IndoBERT's performance relative to the other algorithms. Authorities can use these results to evaluate their data security policies by taking the opinions of the Indonesian public into account.
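The abstract reports frequent terms and F1-scores but does not describe the preprocessing or say how the F1-scores are averaged across classes. The following is a minimal sketch under those unknowns, not the authors' pipeline. It counts bigrams on lowercased, whitespace-tokenized text with a hypothetical helper `top_bigrams`, and it computes per-class, macro-averaged, and support-weighted F1 in plain Python (the helpers `f1_per_class`, `macro_f1`, and `weighted_f1` are also hypothetical). The toy tweets and labels are invented for illustration and are not the study's data.

```python
from collections import Counter


def top_bigrams(tweets, k=5):
    """Return the k most frequent adjacent word pairs across tweets.

    Hypothetical helper: assumes lowercasing and whitespace tokenization only;
    the study's real preprocessing (stopwords, slang, stemming) is unknown.
    """
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        counts.update(zip(tokens, tokens[1:]))  # adjacent pairs = bigrams
    return [(" ".join(pair), n) for pair, n in counts.most_common(k)]


def f1_per_class(y_true, y_pred):
    """Return {label: F1}, where F1 = 2PR / (P + R) for each class."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: a small class counts as much as a large one."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by each class's support (count) in y_true."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Invented toy data for illustration only.
tweets = [
    "data SIM bocor lagi",
    "kominfo harus usut data SIM bocor",
    "bocor data miliar data SIM",
]
print(top_bigrams(tweets, 2))  # [('data sim', 3), ('sim bocor', 2)]

# Imbalanced toy labels: the single positive tweet is misclassified as negative.
y_true = ["neg"] * 6 + ["neu"] * 3 + ["pos"]
y_pred = ["neg"] * 6 + ["neu"] * 3 + ["neg"]
print(round(macro_f1(y_true, y_pred), 3))     # 0.641
print(round(weighted_f1(y_true, y_pred), 3))  # 0.854
```

In this toy case, 9 of 10 predictions are correct and the support-weighted F1 stays near 0.85, but the macro F1 falls to about 0.64 because the lone positive class scores 0. This illustrates why, with a minority positive class like the one in this study, the choice of averaging can noticeably change how models compare.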




How to Cite
M. I. Amal, E. S. Rahmasita, E. Suryaputra, and N. A. Rakhmawati, “Sentiment Classification Analysis On Phone Identity Card Data Leaks Issues On Twitter”, JuTISI, vol. 8, no. 3, pp. 645–, Dec. 2022.