EVALUATION OF FEATURE EXTRACTION FOR TEXT CLASSIFICATION TO IMPROVE CLASSIFICATION ACCURACY USING NAIVE BAYES

Aji Priyambodo; Prihati Prihati

doi:10.51903/elkom.v13i1.277

Aji Priyambodo, Institut Teknologi dan Bisnis Semarang
Prihati Prihati, Institut Teknologi dan Bisnis Semarang

Keywords: Text classification, feature extraction, Count Vectorizer, Naive Bayes

Abstract

Classification is one of the most widely used techniques in machine learning. Text classification is the process of assigning data to predefined groups or classes. In most cases, it uses labeled training data to learn the rules that are then used to assign test data to those classes. This study proposes CountVectorizer as the feature extraction method for Indonesian text classification and compares it with TF-IDF term weighting at three feature levels: character level, word level, and n-gram level. Each feature extraction method is paired with a Naive Bayes classifier and applied to the BPPPTIndToEngCorpusHalfM dataset. Classification performance is compared using 10-fold cross-validation and a 90:10 data split, and is evaluated with the F1-score and AUC. The aim is to obtain results good enough to serve as a reference for further development with other methods. The F1-score obtained in this study was 0.93 and the AUC was 0.95.
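
To make the pipeline described in the abstract more concrete, the following is a minimal, dependency-free sketch of count-based feature extraction at word, word n-gram, and character n-gram granularity, paired with a multinomial Naive Bayes classifier. It is not the authors' implementation. The six training sentences, the two test sentences, the labels `olahraga` and `ekonomi`, and all function and class names are hypothetical and chosen only for illustration. The sketch uses raw counts only (TF-IDF reweighting is omitted) and a single fixed train/test split rather than 10-fold cross-validation or the 90:10 split.

```python
import math
from collections import Counter


def extract_features(text, level="word", n=2):
    """Turn a text into a bag of counted features at one of three levels."""
    words = text.lower().split()
    if level == "word":
        tokens = words
    elif level == "ngram":  # word n-grams
        tokens = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    elif level == "char":  # character n-grams over the normalized text
        s = " ".join(words)
        tokens = [s[i:i + n] for i in range(len(s) - n + 1)]
    else:
        raise ValueError(f"unknown level: {level}")
    return Counter(tokens)


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, bags, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.feature_counts = {c: Counter() for c in self.classes}
        for bag, label in zip(bags, labels):
            self.feature_counts[label].update(bag)
            self.vocab.update(bag)
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels))
                          for c in self.classes}
        self.totals = {c: sum(self.feature_counts[c].values())
                       for c in self.classes}
        return self

    def predict(self, bag):
        vocab_size = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for feat, count in bag.items():
                if feat not in self.vocab:
                    continue  # skip features never seen during training
                p = (self.feature_counts[c][feat] + 1) / (self.totals[c] + vocab_size)
                score += count * math.log(p)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def f1_score(y_true, y_pred, positive):
    """F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the dataset used in the study.
TRAIN = [
    ("tim sepak bola menang pertandingan", "olahraga"),
    ("pemain mencetak gol di pertandingan", "olahraga"),
    ("pelatih memilih pemain untuk tim", "olahraga"),
    ("harga saham naik di pasar", "ekonomi"),
    ("bank menaikkan suku bunga", "ekonomi"),
    ("inflasi menekan harga pasar", "ekonomi"),
]
TEST = [
    ("pemain tim mencetak gol", "olahraga"),
    ("harga saham dan suku bunga", "ekonomi"),
]


def evaluate(level):
    """Train on TRAIN, predict TEST, and return (predictions, F1)."""
    train_bags = [extract_features(text, level) for text, _ in TRAIN]
    model = MultinomialNaiveBayes().fit(train_bags, [y for _, y in TRAIN])
    preds = [model.predict(extract_features(text, level)) for text, _ in TEST]
    return preds, f1_score([y for _, y in TEST], preds, positive="olahraga")


if __name__ == "__main__":
    for level in ("word", "ngram", "char"):
        print(level, evaluate(level))
```

In practice, a library such as scikit-learn would probably be a better fit than hand-written code, since it provides `CountVectorizer`, `TfidfVectorizer`, `MultinomialNB`, and cross-validation utilities. The sketch is meant only to show how the choice of feature level changes the input that the Naive Bayes classifier sees.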