Pembentukan Dataset Topik Kata Bahasa Indonesia pada Twitter Menggunakan TF-IDF & Cosine Similarity

Kristian Adi Nugraha; Danny Sebastian

Published: Dec 21, 2018

Kristian Adi Nugraha

Fakultas Teknologi Informasi, Universitas Kristen Duta Wacana

Danny Sebastian

Fakultas Teknologi Informasi, Universitas Kristen Duta Wacana

Abstract

Social media is used more heavily than any other type of web application. Indonesians spend an average of 3 hours and 15 minutes per day on social media, which generates a substantial flow of information. Although information retrieval research on social media data is common, only a small number of studies focus on social media content in the Indonesian language. Our research aims to build an Indonesian-language topic dataset from Twitter data. The methods used are TF-IDF for forming the data and cosine similarity for grouping the Twitter data. Based on the tests we conducted, the system produces fairly accurate results, with its best accuracy of 64% obtained when processing every 200 Tweets.
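
To make the two techniques named in the abstract concrete, the following is a minimal sketch of TF-IDF weighting followed by cosine similarity, written with the Python standard library only. It is not the authors' implementation: the function names, the whitespace tokenization, the unsmoothed IDF formula log(N / df), and the three sample Tweets are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: tf = count / document length, idf = log(N / df).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An all-zero vector is treated as similar to nothing.
    return dot / (norm_a * norm_b)


# Made-up sample Tweets (illustrative only, not taken from the paper's data).
tweets = [
    "harga bensin naik lagi".split(),
    "harga bensin turun".split(),
    "timnas menang di final".split(),
]

vectors = tf_idf_vectors(tweets)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

In this sketch the first two Tweets share the terms "harga" and "bensin", so `sim_related` is greater than zero, while the third Tweet shares no terms with the first and `sim_unrelated` is zero. A grouping step could then assign Tweets to the same topic when their similarity exceeds a chosen threshold; the threshold itself is not stated in the abstract.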

How to Cite

[1]

K. A. Nugraha and D. Sebastian, “Pembentukan Dataset Topik Kata Bahasa Indonesia pada Twitter Menggunakan TF-IDF & Cosine Similarity”, JuTISI, vol. 4, no. 3, pp. 376–386, Dec. 2018.

Issue

Vol. 4 No. 3 (2018): JuTISI

Section

Articles

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium.
