Prefix-Suffix trees

A novel scheme for compact representation of large datasets

Radhika M. Pai, V. S. Ananthanarayana

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper we propose a novel scheme called Prefix-Suffix trees for compact storage of patterns in data mining; it forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction takes less space and hence forms a compact storage of patterns. Further, we propose a clustering algorithm based on this storage and show experimentally that this type of storage reduces both space and time. This has been established on large data sets of handwritten numerals, namely the OCR data, the MNIST data and the USPS data. The proposed algorithm is compared with other similar algorithms, and the efficacy of our scheme is thus established.
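
As an illustration of the general idea only, the following is a minimal sketch of an ordinary prefix tree (trie) built in a single pass over the data. It is not the authors' Prefix-Suffix tree, whose construction the abstract does not describe; the names `Node`, `build_prefix_tree`, `count_nodes` and the toy patterns are hypothetical. It only shows how sharing common prefixes can reduce the number of stored items.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One trie node; `count` is how many patterns pass through it."""
    children: dict = field(default_factory=dict)
    count: int = 0


def build_prefix_tree(patterns):
    """Insert every pattern in one pass over the data (a single scan)."""
    root = Node()
    for pattern in patterns:
        node = root
        for item in pattern:
            # Reuse the child if this prefix was already seen, else create it.
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def count_nodes(node):
    """Number of nodes below `node`, i.e. items stored after prefix sharing."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy patterns: each is a sorted tuple of feature indices.
patterns = [(1, 2, 3, 7), (1, 2, 3, 8), (1, 2, 5), (4, 6)]
tree = build_prefix_tree(patterns)
raw_items = sum(len(p) for p in patterns)  # items stored without sharing
stored_items = count_nodes(tree)           # items stored in the trie
print(raw_items, stored_items)
```

In this toy input the three patterns that start with `1, 2` share those nodes, so the tree stores fewer items than the raw list. A plain trie shares only prefixes; how the proposed scheme additionally handles suffixes is left to the paper.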

Original language: English
Title of host publication: Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings
Pages: 316-323
Number of pages: 8
Volume: 4815 LNCS
Publication status: Published - 2007
Event: 2nd International Conference on Pattern Recognition and Machine Intelligence, PReMI 2007 - Kolkata, India
Duration: 18-12-2007 to 22-12-2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4815 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Pattern Recognition and Machine Intelligence, PReMI 2007
Country: India
City: Kolkata
Period: 18-12-07 to 22-12-07

Fingerprint

Suffix Tree
Prefix
Large Data Sets
Data mining
Optical character recognition
Clustering algorithms
Decision making
Cluster Analysis
Numeral
Clustering Algorithm
Datasets
Efficacy
Abstraction
Requirements

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Pai, R. M., & Ananthanarayana, V. S. (2007). Prefix-Suffix trees: A novel scheme for compact representation of large datasets. In Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings (Vol. 4815 LNCS, pp. 316-323). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4815 LNCS).
Pai, Radhika M.; Ananthanarayana, V. S. / Prefix-Suffix trees: A novel scheme for compact representation of large datasets. Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings. Vol. 4815 LNCS 2007. pp. 316-323 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{513d1d136e524a819b3fdf4b93ffd9fd,
title = "Prefix-Suffix trees: A novel scheme for compact representation of large datasets",
abstract = "An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper we propose a novel scheme called Prefix-Suffix trees for compact storage of patterns in data mining; it forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction takes less space and hence forms a compact storage of patterns. Further, we propose a clustering algorithm based on this storage and show experimentally that this type of storage reduces both space and time. This has been established on large data sets of handwritten numerals, namely the OCR data, the MNIST data and the USPS data. The proposed algorithm is compared with other similar algorithms, and the efficacy of our scheme is thus established.",
author = "Pai, {Radhika M.} and Ananthanarayana, {V. S.}",
year = "2007",
language = "English",
isbn = "3540770453",
volume = "4815",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "316--323",
booktitle = "Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings",

}

Pai, RM & Ananthanarayana, VS 2007, Prefix-Suffix trees: A novel scheme for compact representation of large datasets. in Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings. vol. 4815 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4815 LNCS, pp. 316-323, 2nd International Conference on Pattern Recognition and Machine Intelligence, PReMI 2007, Kolkata, India, 18-12-07.

Prefix-Suffix trees: A novel scheme for compact representation of large datasets. / Pai, Radhika M.; Ananthanarayana, V. S.

Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings. Vol. 4815 LNCS 2007. p. 316-323 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4815 LNCS).


TY  - GEN
T1  - Prefix-Suffix trees
T2  - A novel scheme for compact representation of large datasets
AU  - Pai, Radhika M.
AU  - Ananthanarayana, V. S.
PY  - 2007
Y1  - 2007
AB  - An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper we propose a novel scheme called Prefix-Suffix trees for compact storage of patterns in data mining; it forms an abstraction of the patterns and is generated from the data in a single scan. This abstraction takes less space and hence forms a compact storage of patterns. Further, we propose a clustering algorithm based on this storage and show experimentally that this type of storage reduces both space and time. This has been established on large data sets of handwritten numerals, namely the OCR data, the MNIST data and the USPS data. The proposed algorithm is compared with other similar algorithms, and the efficacy of our scheme is thus established.
UR  - http://www.scopus.com/inward/record.url?scp=38149103161&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=38149103161&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 3540770453
SN  - 9783540770459
VL  - 4815 LNCS
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 316
EP  - 323
BT  - Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings
ER  -

Pai RM, Ananthanarayana VS. Prefix-Suffix trees: A novel scheme for compact representation of large datasets. In Pattern Recognition and Machine Intelligence - Second International Conference, PReMI 2007, Proceedings. Vol. 4815 LNCS. 2007. p. 316-323. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).