Apprentissage d'un espace de concepts de mots pour une nouvelle représentation des données textuelles

Translated title of the contribution: Learning in the space of term clusters for a new textual data representation

Young min Kim, Jean François Pessiot, Massih Reza Amini, Patrick Gallinari

Research output: Contribution to conference › Paper

Abstract

We present in this paper an unsupervised learning method for dimensionality reduction of text data. The technique is based on the hypothesis that terms co-occurring in the same contexts with the same frequency are semantically related. On the basis of this hypothesis, we first find term clusters using a classification variant of the EM algorithm (CEM). Documents are then represented in the space of these term clusters. We evaluate this method on the task of document clustering and show the effectiveness of our approach on two standard classification collections, WebKB and Reuters.
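A minimal sketch of the two steps described in the abstract, under assumptions that this record does not state: a "context" is taken to be a document, each term is modelled by its count vector over documents, and term clusters are found with Classification EM (CEM) on a mixture of multinomials with additive smoothing. The function names (`cem_term_clusters`, `project_documents`, `_init_partition`), the farthest-first initialisation and the smoothing parameter `alpha` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def _init_partition(X, n_clusters):
    """Hypothetical farthest-first initialisation on L2-normalised term profiles."""
    P = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    seeds = [0]  # start from the first term
    for _ in range(1, n_clusters):
        # Pick the term least similar (cosine) to its closest existing seed.
        sim = (P @ P[seeds].T).max(axis=1)
        seeds.append(int(sim.argmin()))
    # Assign every term to its most similar seed.
    return (P @ P[seeds].T).argmax(axis=1)


def cem_term_clusters(X, n_clusters, n_iter=100, alpha=1e-2):
    """Cluster the rows (terms) of a term-by-document count matrix X with
    Classification EM on a mixture of multinomials (assumed model)."""
    X = np.asarray(X, dtype=float)
    n_terms, n_docs = X.shape
    labels = _init_partition(X, n_clusters)
    for _ in range(n_iter):
        # M-step: smoothed multinomial parameters and priors of each cluster,
        # estimated from the current hard partition of the terms.
        counts = np.zeros((n_clusters, n_docs))
        np.add.at(counts, labels, X)
        theta = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * n_docs)
        sizes = np.bincount(labels, minlength=n_clusters)
        log_pi = np.log((sizes + alpha) / (n_terms + alpha * n_clusters))
        # E-step: log-posterior of each cluster for each term, up to a per-term constant.
        scores = X @ np.log(theta).T + log_pi
        # C-step: assign each term to its most probable cluster.
        new_labels = scores.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable: CEM has converged
        labels = new_labels
    return labels


def project_documents(X, labels, n_clusters):
    """Represent each document by the total count of its terms in each cluster."""
    X = np.asarray(X, dtype=float)
    Z = np.zeros((n_clusters, X.shape[1]))
    np.add.at(Z, labels, X)
    return Z.T  # shape: (n_docs, n_clusters)


if __name__ == "__main__":
    # Toy term-by-document counts: terms 0-2 occur in documents 0-1, terms 3-5 in documents 2-3.
    X = [[3, 2, 0, 0], [2, 3, 0, 0], [4, 4, 0, 0],
         [0, 0, 3, 2], [0, 0, 2, 3], [0, 0, 4, 4]]
    labels = cem_term_clusters(X, n_clusters=2)
    print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
    print(project_documents(X, labels, 2).tolist())  # [[9.0, 0.0], [9.0, 0.0], [0.0, 9.0], [0.0, 9.0]]
```

In this sketch each document's term-count vector (one dimension per term) is replaced by a much shorter vector with one dimension per term cluster, which a document clustering algorithm could then take as input. The hard C-step is what distinguishes CEM from standard EM, which would keep soft posterior weights for every term instead of a single cluster label.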

Original language: French
Pages: 119-134
Number of pages: 16
Publication status: Published - 2008 Dec 1
Event: 5th Conference on Information Retrieval and Applications, CORIA 2008 - Tregastel, France
Duration: 2008 Mar 12 → 2008 Mar 14

Conference

Country: France
City: Tregastel
Period: 08/03/12 → 08/03/14


Keywords

  • Document clustering
  • Term clustering
  • Unsupervised learning

Cite this

Kim, Y. M., Pessiot, J. F., Amini, M. R., & Gallinari, P. (2008). Apprentissage d'un espace de concepts de mots pour une nouvelle représentation des données textuelles. 119-134. Paper presented at 5th Conference on Information Retrieval and Applications, CORIA 2008, Tregastel, France.