Prefix-Suffix trees: A novel scheme for compact representation of large datasets

Radhika M. Pai, V. S. Ananthanarayana

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper we propose a novel scheme called Prefix-Suffix trees for compact storage of patterns in data mining; it forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction takes less space and hence forms a compact storage of patterns. Further, we propose a clustering algorithm based on this storage and show experimentally that this type of storage reduces space and time requirements. This has been established on large data sets of handwritten numerals, namely the OCR data, the MNIST data and the USPS data. The proposed algorithm is compared with other similar algorithms, and the efficacy of our scheme is thus established.

Original language: English
Title of host publication: Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings
Pages: 316-323
Number of pages: 8
Volume: 4815 LNCS
Publication status: Published - 2007
Event: 2nd International Conference on Pattern Recognition and Machine Intelligence, PReMI 2007 - Kolkata, India
Duration: 18-12-2007 to 22-12-2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4815 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Pattern Recognition and Machine Intelligence, PReMI 2007
Country: India
City: Kolkata
Period: 18-12-07 to 22-12-07

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Pai, R. M., & Ananthanarayana, V. S. (2007). Prefix-Suffix trees: A novel scheme for compact representation of large datasets. In Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings (Vol. 4815 LNCS, pp. 316-323). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4815 LNCS).