EVALUATION OF TEXT CLASSIFICATION FEATURE EXTRACTION FOR IMPROVING CLASSIFICATION ACCURACY USING NAIVE BAYES

  • Aji Priyambodo Institut Teknologi dan Bisnis Semarang
  • Prihati Prihati Institut Teknologi dan Bisnis Semarang
Keywords: Text classification, feature extraction, Count Vectorizer, Naive Bayes

Abstract

Classification is one of the most widely used techniques in machine learning. Text classification is the process of assigning data to predefined groups or classes; in most cases, it uses labeled training data to learn the rules that are then applied to classify test data into those predefined groups. This study proposes using CountVectorizer as the feature extraction method for Indonesian text classification and compares it with TF-IDF term weighting at its three feature levels: character level, word level, and n-gram level. Each feature extraction method is combined with a Naive Bayes classifier and applied to the BPPPTIndToEngCorpusHalfM dataset. Classification performance is compared using 10-fold cross-validation and a 90:10 train/test split, and evaluated with the F1-score and AUC. The authors hope the results will be good enough to serve as a reference for further development with other methods. The study obtained an F1-score of 0.93 and an AUC of 0.95.
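The feature extraction comparison described in the abstract can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code: the name CountVectorizer matches scikit-learn's class, but here raw counts, TF-IDF weights, the three analyzers, and multinomial Naive Bayes are reimplemented by hand. The n-gram ranges (word unigrams plus bigrams, character 2- to 3-grams), the smoothed idf formula ln((1 + n) / (1 + df)) + 1, the absence of L2 normalization, the smoothing value `alpha=1.0`, and the toy sentences and labels (`olahraga`, `ekonomi`) are all assumptions chosen for illustration; none come from the paper or from BPPPTIndToEngCorpusHalfM.

```python
import math
import re
from collections import Counter

# --- Three feature levels (analyzers); n ranges are illustrative assumptions ---
def word_level(text):
    # Word level: lowercase word tokens
    return re.findall(r"\w+", text.lower())

def ngram_level(text, n_min=1, n_max=2):
    # N-gram level: contiguous word n-grams (unigrams + bigrams here)
    toks = word_level(text)
    return [" ".join(toks[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(toks) - n + 1)]

def char_level(text, n_min=2, n_max=3):
    # Character level: character n-grams, spaces included
    t = text.lower()
    return [t[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(t) - n + 1)]

# --- Weighting: raw counts (CountVectorizer-style) or TF-IDF ---
def fit_idf(docs, analyzer):
    # Smoothed idf learned on training documents only: ln((1 + n) / (1 + df)) + 1
    n = len(docs)
    df = Counter(term for d in docs for term in set(analyzer(d)))
    return {term: math.log((1 + n) / (1 + k)) + 1 for term, k in df.items()}

def vectorize(doc, analyzer, idf=None):
    counts = Counter(analyzer(doc))
    if idf is None:
        return dict(counts)  # raw term counts
    # TF-IDF: count * idf; terms unseen in training are dropped (no L2 norm)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

# --- Multinomial Naive Bayes with add-alpha (Laplace) smoothing ---
class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.vocab = {t for x in X for t in x}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = Counter()
            for x in rows:
                totals.update(x)  # sums counts or TF-IDF weights per term
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {t: math.log((totals[t] + self.alpha) / denom)
                               for t in self.vocab}
        return self

    def log_scores(self, x):
        # log P(c) + sum_t x_t * log P(t | c); out-of-vocabulary terms are ignored
        return {c: self.log_prior[c] + sum(v * self.log_lik[c][t]
                                           for t, v in x.items() if t in self.vocab)
                for c in self.classes}

    def predict(self, x):
        scores = self.log_scores(x)
        return max(scores, key=scores.get)

def classify(train, doc, analyzer, use_tfidf=False):
    # Fit the chosen weighting and Naive Bayes on (text, label) pairs, predict one doc
    texts = [text for text, _ in train]
    labels = [label for _, label in train]
    idf = fit_idf(texts, analyzer) if use_tfidf else None
    X = [vectorize(t, analyzer, idf) for t in texts]
    model = MultinomialNB().fit(X, labels)
    return model.predict(vectorize(doc, analyzer, idf))

# Hypothetical toy data, NOT taken from BPPPTIndToEngCorpusHalfM
TRAIN = [
    ("tim sepak bola menang pertandingan", "olahraga"),
    ("pemain bola mencetak gol", "olahraga"),
    ("harga saham naik di bursa", "ekonomi"),
    ("bank menaikkan suku bunga", "ekonomi"),
]
```

For example, `classify(TRAIN, "gol pemain tim", word_level)` returns `"olahraga"`, and swapping in `char_level` or `ngram_level`, with or without `use_tfidf=True`, changes only the feature representation while the classifier stays the same, which mirrors the kind of comparison the abstract describes.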
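The evaluation protocol named in the abstract (10-fold cross-validation, a 90:10 split, F1-score, and AUC) can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the authors' evaluation code: it treats the task as binary with a chosen positive class, computes F1 for that class only (the abstract does not say whether a macro, micro, or weighted average was used for multiple classes), and computes AUC through its rank interpretation, the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function names and the fixed `seed` are hypothetical.

```python
import random

def kfold_indices(n, k=10, seed=0):
    # Shuffle once, then deal indices round-robin into k disjoint folds
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        # Fold i is the test set; the remaining k - 1 folds form the training set
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

def split_90_10(n, test_ratio=0.1, seed=0):
    # Single shuffled hold-out split (90% train, 10% test by default)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - int(round(n * test_ratio))
    return idx[:cut], idx[cut:]

def f1_score(y_true, y_pred, positive):
    # F1 = harmonic mean of precision and recall for the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores, positive):
    # AUC as P(score of random positive > score of random negative), ties = 0.5
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With a Naive Bayes model, one possible score for `roc_auc` is the difference between the positive and negative class log scores; F1 only needs the hard predictions. The pairwise AUC loop is quadratic in the number of examples, which is fine for a sketch but would be replaced by a sort-based rank computation on a large corpus.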

Published
2020-07-01
How to Cite
[1] Aji Priyambodo and Prihati Prihati, “EVALUASI EKSTRAKSI FITUR KLASIFIKASI TEKS UNTUK PENINGKATAN AKURASI KLASIFIKASI MENGGUNAKAN NAIVE BAYES”, ELKOM, vol. 13, no. 1, pp. 159-175, Jul. 2020.