Zone-based structural feature extraction for script identification from Indian documents

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

Automatic identification of the script in a document image supports many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. This paper proposes a zone-based structural feature extraction scheme for recognizing South Indian scripts along with English and Hindi. Document images are segmented into lines, each line image is divided into zones, and structural features are extracted. A total of 37 features are extracted at the first level and then reduced to an optimal number of features using wrapper and filter selection approaches. K-nearest neighbor and support vector machine classifiers are used for classification. A classification accuracy of 100% is achieved on the optimal feature set.
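
The pipeline outlined in the abstract (line segmentation, zoning, per-zone features, nearest-neighbor classification) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' method: lines are found with a horizontal projection profile, zones are equal-height bands, the only feature is ink density per zone (a stand-in for the paper's 37 structural features, which the abstract does not list), and the function names and labels are hypothetical.

```python
from collections import Counter
from math import dist


def segment_lines(image):
    """Split a binary image (list of rows, 1 = ink) into text lines.

    Assumption: lines are separated by at least one row without ink
    (horizontal projection profile equal to zero).
    """
    lines, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # first inked row of a new line
        elif not has_ink and start is not None:
            lines.append(image[start:y])   # blank row closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append(image[start:])
    return lines


def split_zones(line, n_zones=3):
    """Divide a line image into n_zones horizontal bands.

    Equal-height bands are an assumption; the abstract does not state
    how the zones are defined.
    """
    h = len(line)
    bounds = [i * h // n_zones for i in range(n_zones + 1)]
    return [line[bounds[i]:bounds[i + 1]] for i in range(n_zones)]


def zone_features(line, n_zones=3):
    """Placeholder feature vector: fraction of ink pixels in each zone."""
    feats = []
    for zone in split_zones(line, n_zones):
        total = sum(len(row) for row in zone)
        ink = sum(sum(row) for row in zone)
        feats.append(ink / total if total else 0.0)
    return feats


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors closest to query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy 9x4 image containing two text lines separated by a blank row.
image = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
lines = segment_lines(image)
train = [(zone_features(lines[0]), "script_A"),   # hypothetical labels
         (zone_features(lines[1]), "script_B")]
predicted = knn_predict(train, [0.3, 0.9, 0.5], k=1)
```

A real system would replace the density feature with structural descriptors and add the feature selection step; the sketch only shows how zoning turns each line image into a fixed-length vector that a classifier can consume.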

Original language: English
Title of host publication: 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010
Pages: 420-425
Number of pages: 6
DOI: https://doi.org/10.1109/ICIINFS.2010.5578668
Publication status: Published - 2010
Event: 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010 - Mangalore, Karnataka, India
Duration: 29-07-2010 → 01-08-2010

Conference

Conference: 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010
Country: India
City: Mangalore, Karnataka
Period: 29-07-10 → 01-08-10

Fingerprint

Feature extraction
Online searching
Optical character recognition
Support vector machines
Classifiers

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management
  • Industrial and Manufacturing Engineering

Cite this

Gopakumar, R., Subbareddy, N. V., Makkithaya, K., & Dinesh Acharya, U. (2010). Zone-based structural feature extraction for script identification from Indian documents. In 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010 (pp. 420-425). [5578668] https://doi.org/10.1109/ICIINFS.2010.5578668
@inproceedings{0eae24e2efe8414d84f1c74915051ae5,
title = "Zone-based structural feature extraction for script identification from Indian documents",
author = "Rajesh Gopakumar and Subbareddy, {N. V.} and Krishnamoorthi Makkithaya and {Dinesh Acharya}, U.",
year = "2010",
doi = "10.1109/ICIINFS.2010.5578668",
language = "English",
isbn = "9781424466535",
pages = "420--425",
booktitle = "2010 5th International Conference on Industrial and Information Systems, ICIIS 2010",

}
