Classification of Japanese Documents and Ranking of Representative Documents by Using the Characteristic of the Frequencies of Words

Authors
Jun Kimura, Yasunari Yoshitomi, Masayoshi Tabuse
Corresponding Author
Jun Kimura
Available Online 1 December 2015.
DOI
https://doi.org/10.2991/jrnal.2015.2.3.10
Keywords
Clustering, Document classification, Extraction of representative document, Frequency of nouns.
Abstract
We developed a method for classifying Japanese documents and ranking representative documents by using the characteristic of the frequencies of nouns. The representative document of a class is defined as the document whose feature vector is closest to the center of gravity of that class in the feature vector space, among all documents belonging to the class. Representative documents are ranked in descending order of the number of documents belonging to their class.
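The selection and ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each document already has a numeric feature vector (for example, noun frequencies), that class labels have already been assigned by some clustering step, and that closeness is measured by Euclidean distance. The abstract specifies none of these details, and the function name `rank_representatives` is hypothetical.

```python
import numpy as np

def rank_representatives(features, labels):
    """Return (class_label, representative_doc_index, class_size) tuples,
    ordered from the largest class to the smallest.

    Assumptions (not stated in the abstract): Euclidean distance, and
    class labels supplied by a separate clustering step.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    results = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)       # indices of documents in class c
        centroid = features[members].mean(axis=0)   # center of gravity of the class
        dists = np.linalg.norm(features[members] - centroid, axis=1)
        rep = int(members[np.argmin(dists)])        # document closest to the centroid
        results.append((c.item(), rep, len(members)))
    # Rank representatives by descending class size.
    results.sort(key=lambda r: -r[2])
    return results

# Toy data: documents 0-3 form class 1, documents 4-6 form class 0.
features = [[1, 0], [2, 0], [3, 0], [10, 0], [0, 5], [0, 6], [0, 10]]
labels = [1, 1, 1, 1, 0, 0, 0]
ranking = rank_representatives(features, labels)
print(ranking)  # → [(1, 2, 4), (0, 5, 3)]
```

In this toy example, class 1 has four documents and is ranked first. Its representative is document 2, the one nearest the class centroid.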

Copyright
© 2013, the Authors. Published by ALife Robotics Corp. Ltd
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).