Pembentukan Dataset Topik Kata Bahasa Indonesia pada Twitter Menggunakan TF-IDF & Cosine Similarity

Main Article Content

Kristian Adi Nugraha
Danny Sebastian

Abstract

Social media is the most popular type of platform compared with other web applications. Indonesians spend an average of 3 hours and 15 minutes per day on social media, which generates a substantial flow of information. Although information retrieval research on social media data is common, only a small share of it focuses on social media content in the Indonesian language. This research aims to build an Indonesian-language topic dataset from Twitter data. The methods used are TF-IDF to form the data and cosine similarity to group the Twitter data. In our tests, the system produced fairly accurate results, with a best result of 64% when processing 200 Tweets at a time.
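
As an illustration of the two techniques named in the abstract, the following is a minimal sketch of TF-IDF weighting followed by cosine similarity between Tweets. It is not the authors' implementation: the abstract does not state the preprocessing steps, the exact TF-IDF variant, or the grouping threshold, so the whitespace tokenizer, the `tf * log(N / df)` weighting, the function names, and the three sample Tweets are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: relative term frequency times log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample Tweets (invented for illustration, not from the paper).
tweets = [
    "macet parah di jalan tol pagi ini",
    "jalan tol macet lagi pagi ini",
    "harga cabai naik di pasar",
]
tokens = [tweet.split() for tweet in tweets]  # naive whitespace tokenizer
vectors = tf_idf_vectors(tokens)

sim_traffic = cosine_similarity(vectors[0], vectors[1])  # two traffic Tweets
sim_mixed = cosine_similarity(vectors[0], vectors[2])    # traffic vs. prices
```

In this sketch the two traffic-related Tweets score a higher similarity than the traffic and price pair, which is the property a grouping step would rely on. How the scores are turned into topic groups (for example, a fixed similarity threshold) is left open here because the abstract does not specify it.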

Article Details

How to Cite
[1]
K. A. Nugraha and D. Sebastian, “Pembentukan Dataset Topik Kata Bahasa Indonesia pada Twitter Menggunakan TF-IDF & Cosine Similarity”, JuTISI, vol. 4, no. 3, pp. 376–386, Dec. 2018.
Section
Articles
