Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine


Oscar Karnalim

Abstract

Using byte code as the information source is a novel approach that enables a Java archive search engine to be built without relying on any resources other than the Java archive itself [1]. Unfortunately, its effectiveness is limited, since some relevant documents may not be retrieved because of vocabulary mismatch. In this research, a vector space model (VSM) is extended with semantic relatedness to overcome the vocabulary mismatch issue in the Java archive search engine. To find the most effective retrieval model, several variant equations for the retrieval model are also proposed and evaluated, including summing up all related terms, substituting a non-existing term with its most related term, logarithmic normalization, context-specific relatedness, and low-rank query-related retrieved documents. In general, semantic relatedness improves recall at the cost of reduced precision. We also propose a scheme that takes advantage of relatedness without being affected by its disadvantage: standard VSM plus treating non-retrieved documents as low-rank retrieved documents scored with semantic relatedness. This scheme ensures that relatedness-based scores are always ranked below standard exact-match scores. It yields 1.754% higher effectiveness than our standard VSM.
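The two-tier scheme described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the raw-tf times log(N/df) weighting, the `RELATEDNESS` table, the toy `DOCS`, and the helper names (`relatedness_score`, `search`) are all hypothetical, and the relatedness step uses only the "substitute a non-existing term with its most related term" variant. Documents with a non-zero exact-match cosine score are ranked first; the remaining documents are scored by semantic relatedness and appended after them.

```python
import math
from collections import Counter

# Hypothetical term-pair relatedness scores in [0, 1]. The paper's actual
# relatedness measure is not given here; this table is only a stand-in.
RELATEDNESS = {
    frozenset(("read", "reader")): 0.8,
    frozenset(("file", "stream")): 0.5,
}

# Toy documents: identifier terms assumed to be extracted from class byte code.
DOCS = [
    ["read", "file", "stream"],
    ["write", "file", "buffer"],
    ["parse", "xml", "document"],
    ["open", "reader", "stream"],
]


def relatedness(a, b):
    """Identical terms are fully related; otherwise look up the table."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset((a, b)), 0.0)


def tfidf_vectors(docs):
    """Raw term frequency times log(N / df) for each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(q, d):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_d = math.sqrt(sum(w * w for w in d.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def relatedness_score(query_terms, doc_vec):
    """Substitute each query term with its most related document term,
    weighted by that relatedness, then compare with cosine similarity."""
    substituted = Counter()
    for q in query_terms:
        best_term, best_rel = None, 0.0
        for t in doc_vec:
            r = relatedness(q, t)
            if r > best_rel:
                best_term, best_rel = t, r
        if best_term is not None:
            substituted[best_term] += best_rel
    return cosine(substituted, doc_vec)


def search(query_terms, docs):
    """Return (doc_id, kind, score) hits: exact VSM matches first, then
    relatedness-only matches, so the latter never outrank the former."""
    vectors, idf = tfidf_vectors(docs)
    query_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_terms).items()}
    exact, related = [], []
    for doc_id, vec in enumerate(vectors):
        score = cosine(query_vec, vec)
        if score > 0:
            exact.append((doc_id, "exact", score))
        else:
            # Non-retrieved document: fall back to semantic relatedness.
            score = relatedness_score(query_terms, vec)
            if score > 0:
                related.append((doc_id, "related", score))
    exact.sort(key=lambda hit: hit[2], reverse=True)
    related.sort(key=lambda hit: hit[2], reverse=True)
    return exact + related


if __name__ == "__main__":
    for hit in search(["read", "file"], DOCS):
        print(hit)
```

For the query `read file`, documents 0 and 1 match exactly, while document 3 is reached only through relatedness (`read` to `reader`, `file` to `stream`). Its relatedness score (about 0.74) is numerically higher than document 1's exact score (about 0.15), yet it is still ranked last, which is the property the scheme is meant to guarantee. Document 2 shares no exact or related terms and is not returned.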

Article Details

How to Cite
O. Karnalim, “Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine”, JuTISI, vol. 1, no. 2, Aug. 2015.
Section
Articles