Building an Indonesian Word Topic Dataset from Twitter Using TF-IDF & Cosine Similarity
Social media is more popular than any other type of web application. Indonesians spend an average of 3 hours and 15 minutes per day on social media, which generates a substantial flow of information. Although information retrieval research on social media data is common, only a small share of it focuses on content in the Indonesian language. This research aims to build an Indonesian-language topic dataset from Twitter data. The method uses TF-IDF to build the data and cosine similarity to group the Tweets. In our tests, the system produced fairly accurate results, peaking at 64% when Tweets were processed in batches of 200.
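To make the two techniques named in the abstract concrete, the sketch below weights each Tweet's terms with TF-IDF and then groups Tweets whose vectors have a high cosine similarity. It is a minimal illustration, not the authors' implementation: the abstract does not specify the tokenisation, the TF-IDF variant, the grouping procedure, or any threshold, so whitespace tokenisation, the plain `tf * log(N / df)` weighting, the greedy grouping, the 0.2 threshold, and the sample Tweets are all assumptions made here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(docs, threshold=0.2):
    """Greedily assign each document to the first group whose first member
    is similar enough; otherwise start a new group.

    The threshold is an illustrative value, not one taken from the article.
    """
    vectors = tfidf_vectors(docs)
    groups = []  # each group is a list of document indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine_similarity(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical sample Tweets, invented for this example.
tweets = [
    "harga bbm naik lagi",
    "harga bbm naik hari ini",
    "timnas indonesia menang",
    "timnas indonesia menang besar",
]
print(group_by_similarity(tweets))
```

With these four sample Tweets, the two fuel-price Tweets share three terms and land in one group, while the two football Tweets land in another. Terms that appear in every document receive a weight of zero under this weighting, which is why common words contribute nothing to the similarity.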
How to Cite
K. A. Nugraha and D. Sebastian, “Pembentukan Dataset Topik Kata Bahasa Indonesia pada Twitter Menggunakan TF-IDF & Cosine Similarity”, JuTISI, vol. 4, no. 3, pp. 376–386, Dec. 2018.
This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium.