A novel data structure for efficient representation of large data sets in data mining

Radhika M. Pai, V. S. Ananthanarayana

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel data structure called the Prefix-Postfix structure (PP-structure), an abstraction of the data that can be built by scanning the database only once. We prove that this structure is compact, complete, and incremental, and is therefore suitable for representing dynamic databases. Further, we propose a clustering algorithm that uses this structure. The proposed algorithm is tested on several real-world datasets and is shown to be both space-efficient and time-efficient for large datasets without sacrificing accuracy. We compare our algorithm with other algorithms and show its effectiveness.
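
The abstract does not define the PP-structure itself, so the following is only a minimal sketch of the general idea it names: a prefix-sharing tree built in a single scan, which stays compact (shared prefixes are stored once), complete (every original pattern can be recovered), and incremental (new patterns are inserted without a rebuild). This is a plain prefix tree, not the authors' structure, and it does not share postfixes. The names `Node`, `PrefixTree`, and `build` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One tree node; `count` is how many stored patterns end exactly here."""
    children: dict = field(default_factory=dict)
    count: int = 0


class PrefixTree:
    """Generic prefix tree (an assumption for illustration, not the PP-structure)."""

    def __init__(self):
        self.root = Node()
        self.size = 0  # number of nodes, excluding the root

    def insert(self, pattern):
        """Add one pattern; patterns with a common prefix share nodes."""
        node = self.root
        for item in pattern:
            if item not in node.children:
                node.children[item] = Node()
                self.size += 1
            node = node.children[item]
        node.count += 1

    def patterns(self):
        """Yield every stored pattern, once per occurrence (completeness)."""
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            for _ in range(node.count):
                yield prefix
            for item, child in node.children.items():
                stack.append((child, prefix + (item,)))


def build(database):
    """Build the tree by reading each pattern of the database exactly once."""
    tree = PrefixTree()
    for pattern in database:
        tree.insert(pattern)
    return tree
```

For example, `build([(1, 2, 3), (1, 2, 4), (1, 2, 3), (5,)])` stores the shared prefix `(1, 2)` once, and calling `insert` afterwards extends the same tree without rescanning the earlier patterns.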

Original language: English
Title of host publication: Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006
Pages: 547-552
Number of pages: 6
DOIs: https://doi.org/10.1109/ADCOM.2006.4289952
Publication status: Published - 2006
Event: 14th International Conference on Advanced Computing and Communications, ADCOM 2006 - Surathkal, India
Duration: 20-12-2006 to 23-12-2006

Conference

Conference: 14th International Conference on Advanced Computing and Communications, ADCOM 2006
Country: India
City: Surathkal
Period: 20-12-06 to 23-12-06

Fingerprint

Data mining
Data structures
abstraction
Clustering algorithms
decision making process
Decision making
Scanning
time

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Communication

Cite this

Pai, R. M., & Ananthanarayana, V. S. (2006). A novel data structure for efficient representation of large data sets in data mining. In Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006 (pp. 547-552). [4289952] https://doi.org/10.1109/ADCOM.2006.4289952
Pai, Radhika M. ; Ananthanarayana, V. S. / A novel data structure for efficient representation of large data sets in data mining. Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006. 2006. pp. 547-552
@inproceedings{4fde9641672a43ccb643c4068f0da0a6,
title = "A novel data structure for efficient representation of large data sets in data mining",
abstract = "An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel data structure called the Prefix-Postfix structure (PP-structure), an abstraction of the data that can be built by scanning the database only once. We prove that this structure is compact, complete, and incremental, and is therefore suitable for representing dynamic databases. Further, we propose a clustering algorithm that uses this structure. The proposed algorithm is tested on several real-world datasets and is shown to be both space-efficient and time-efficient for large datasets without sacrificing accuracy. We compare our algorithm with other algorithms and show its effectiveness.",
author = "Pai, {Radhika M.} and Ananthanarayana, {V. S.}",
year = "2006",
doi = "10.1109/ADCOM.2006.4289952",
language = "English",
isbn = "142440715X",
pages = "547--552",
booktitle = "Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006",

}

Pai, RM & Ananthanarayana, VS 2006, A novel data structure for efficient representation of large data sets in data mining. in Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006., 4289952, pp. 547-552, 14th International Conference on Advanced Computing and Communications, ADCOM 2006, Surathkal, India, 20-12-06. https://doi.org/10.1109/ADCOM.2006.4289952

A novel data structure for efficient representation of large data sets in data mining. / Pai, Radhika M.; Ananthanarayana, V. S.

Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006. 2006. p. 547-552 4289952.

TY - GEN

T1 - A novel data structure for efficient representation of large data sets in data mining

AU - Pai, Radhika M.

AU - Ananthanarayana, V. S.

PY - 2006

Y1 - 2006

AB - An important goal in data mining is to generate an abstraction of the data. Such an abstraction helps reduce the time and space requirements of the overall decision-making process. It is also important that the abstraction be generated from the data in a small number of scans. In this paper, we propose a novel data structure called the Prefix-Postfix structure (PP-structure), an abstraction of the data that can be built by scanning the database only once. We prove that this structure is compact, complete, and incremental, and is therefore suitable for representing dynamic databases. Further, we propose a clustering algorithm that uses this structure. The proposed algorithm is tested on several real-world datasets and is shown to be both space-efficient and time-efficient for large datasets without sacrificing accuracy. We compare our algorithm with other algorithms and show its effectiveness.

UR - http://www.scopus.com/inward/record.url?scp=38149107420&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38149107420&partnerID=8YFLogxK

U2 - 10.1109/ADCOM.2006.4289952

DO - 10.1109/ADCOM.2006.4289952

M3 - Conference contribution

AN - SCOPUS:38149107420

SN - 142440715X

SN - 9781424407156

SP - 547

EP - 552

BT - Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006

ER -

Pai RM, Ananthanarayana VS. A novel data structure for efficient representation of large data sets in data mining. In Proceedings - 2006 14th International Conference on Advanced Computing and Communications, ADCOM 2006. 2006. p. 547-552. 4289952 https://doi.org/10.1109/ADCOM.2006.4289952