Neural network based system for script identification in Indian documents

S. Basavaraj Patil, N. V. Subbareddy

Research output: Contribution to journal › Article

45 Citations (Scopus)

Abstract

The paper describes a neural network-based script identification system that can be used in the machine reading of documents written in English, Hindi and Kannada language scripts. Script identification is a basic requirement in the automation of document processing in multi-script, multilingual environments. The system includes a feature extractor and a modular neural network. The feature extractor consists of two stages. In the first stage, the document image is dilated using 3 x 3 masks in the horizontal, vertical, right diagonal, and left diagonal directions. In the second stage, the average pixel distribution is computed for each of the resulting images. The modular network is a combination of separately trained feedforward neural network classifiers, one for each script. The system recognizes 64 x 64 pixel document images. At the next level, the system is modified to work on single-word document images in the same three scripts. The modified system includes a pre-processor, a modified feature extractor and a probabilistic neural network classifier. The pre-processor segments the multi-script, multilingual document into individual words. The feature extractor receives these word-document images of variable size and still produces the discriminative features used by the probabilistic neural classifier. Experiments are conducted on a manually developed database of document images of size 64 x 64 pixels and on a database of individual words in the three scripts. The results are very encouraging and prove the effectiveness of the approach.
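
The two-stage feature extractor described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the actual mask coefficients nor a precise definition of "average pixel distribution", so the sketch below makes two assumptions: the masks are 3 x 3 line-shaped structuring elements, and the feature for each direction is the fraction of foreground pixels in the dilated image. The names MASKS, dilate and directional_features are hypothetical and do not come from the paper.

```python
import numpy as np

# Assumed 3 x 3 directional masks; the abstract does not list the real ones.
MASKS = {
    "horizontal": np.array([[0, 0, 0], [1, 1, 1], [0, 0, 0]], dtype=bool),
    "vertical": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=bool),
    "right_diagonal": np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=bool),
    "left_diagonal": np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=bool),
}


def dilate(image, mask):
    """Binary dilation of a 2-D boolean image by a 3 x 3 mask (zero padded)."""
    h, w = image.shape
    padded = np.pad(image, 1, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together the shifted copies of the image selected by the mask.
    for dr in range(3):
        for dc in range(3):
            if mask[dr, dc]:
                out |= padded[dr:dr + h, dc:dc + w]
    return out


def directional_features(image):
    """Return one feature per direction: the mean foreground density
    of the image after dilation with that direction's mask (assumed
    interpretation of 'average pixel distribution')."""
    image = np.asarray(image, dtype=bool)
    return np.array([dilate(image, m).mean() for m in MASKS.values()])


if __name__ == "__main__":
    # A horizontal stroke responds differently to each direction.
    stroke = np.zeros((5, 5), dtype=bool)
    stroke[2, :] = True
    print(dict(zip(MASKS, directional_features(stroke))))
```

Because each feature is an average over the whole image, the resulting vector has a fixed length regardless of image size, which is one plausible way a feature extractor could accept word images of variable size; whether the paper's modified extractor works this way is not stated in the abstract.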

Original language: English
Pages (from-to): 83-97
Number of pages: 15
Journal: Sadhana - Academy Proceedings in Engineering Sciences
Volume: 27
Issue number: PART 1
Publication status: Published - 01-12-2002

Fingerprint

Classifiers
Pixels
Neural networks
Identification (control systems)
Feedforward neural networks
Masks
Automation
Processing
Experiments

All Science Journal Classification (ASJC) codes

  • General

Cite this

@article{f3327c225e434ee6850d767eb495b9a7,
title = "Neural network based system for script identification in Indian documents",
abstract = "The paper describes a neural network-based script identification system which can be used in the machine reading of documents written in English, Hindi and Kannada language scripts. Script identification is a basic requirement in automation of document processing, in multi-script, multi-lingual environments. The system developed includes a feature extractor and a modular neural network. The feature extractor consists of two stages. In the first stage the document image is dilated using 3 x 3 masks in horizontal, vertical, right diagonal, and left diagonal directions. In the next stage, average pixel distribution is found in these resulting images. The modular network is a combination of separately trained feedforward neural network classifiers for each script. The system recognizes 64 x 64 pixel document images. In the next level, the system is modified to perform on single word-document images in the same three scripts. Modified system includes a pre-processor, modified feature extractor and probabilistic neural network classifier. Pre-processor segments the multi-script multi-lingual document into individual words. The feature extractor receives these word-document images of variable size and still produces the discriminative features employed by the probabilistic neural classifier. Experiments are conducted on a manually developed database of document images of size 64 x 64 pixels and on a database of individual words in the three scripts. The results are very encouraging and prove the effectiveness of the approach.",
author = "{Basavaraj Patil}, S. and Subbareddy, {N. V.}",
year = "2002",
month = "12",
day = "1",
language = "English",
volume = "27",
pages = "83--97",
journal = "Sadhana - Academy Proceedings in Engineering Sciences",
issn = "0256-2499",
publisher = "Springer India",
number = "PART 1",

}

Neural network based system for script identification in Indian documents. / Basavaraj Patil, S.; Subbareddy, N. V.

In: Sadhana - Academy Proceedings in Engineering Sciences, Vol. 27, No. PART 1, 01.12.2002, p. 83-97.


TY - JOUR

T1 - Neural network based system for script identification in Indian documents

AU - Basavaraj Patil, S.

AU - Subbareddy, N. V.

PY - 2002/12/1

Y1 - 2002/12/1

N2 - The paper describes a neural network-based script identification system which can be used in the machine reading of documents written in English, Hindi and Kannada language scripts. Script identification is a basic requirement in automation of document processing, in multi-script, multi-lingual environments. The system developed includes a feature extractor and a modular neural network. The feature extractor consists of two stages. In the first stage the document image is dilated using 3 x 3 masks in horizontal, vertical, right diagonal, and left diagonal directions. In the next stage, average pixel distribution is found in these resulting images. The modular network is a combination of separately trained feedforward neural network classifiers for each script. The system recognizes 64 x 64 pixel document images. In the next level, the system is modified to perform on single word-document images in the same three scripts. Modified system includes a pre-processor, modified feature extractor and probabilistic neural network classifier. Pre-processor segments the multi-script multi-lingual document into individual words. The feature extractor receives these word-document images of variable size and still produces the discriminative features employed by the probabilistic neural classifier. Experiments are conducted on a manually developed database of document images of size 64 x 64 pixels and on a database of individual words in the three scripts. The results are very encouraging and prove the effectiveness of the approach.


UR - http://www.scopus.com/inward/record.url?scp=0036464677&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036464677&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0036464677

VL - 27

SP - 83

EP - 97

JO - Sadhana - Academy Proceedings in Engineering Sciences

JF - Sadhana - Academy Proceedings in Engineering Sciences

SN - 0256-2499

IS - PART 1

ER -