Kannada morpheme segmentation using machine learning

Sachi Angle, B. Ashwath Rao, S. N. Muralikrishna

Research output: Contribution to journal › Article

Abstract

This paper addresses morpheme segmentation of Kannada words using supervised classification. We used a manually annotated Kannada treebank corpus that we recently developed. Kannada resembles other Dravidian languages in its morphological structure. It is an agglutinative language, so its words have complex morphological forms, each comprising a root and an optional set of suffixes. These suffixes carry meaning in addition to that of the root in a given context. This paper discusses extracting the morphemes of a word using Support Vector Machines for classification. Additional features representing properties of Kannada words were extracted, and each letter was classified into a label; together, these labels yield the morphological segmentation of the word. Several evaluation methods were considered, and an accuracy of 85.97% was achieved.
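The abstract frames segmentation as assigning a label to every letter of a word. The sketch below illustrates that framing under assumptions that are not taken from the paper: a binary label per character (1 means "a morpheme boundary follows this character"), a hypothetical feature set (a one-character window plus the candidate suffix to the right), a handful of toy, loosely transliterated noun forms such as `mane+galu` invented for illustration rather than drawn from the treebank, and a linear SVM trained with dual coordinate descent (the solver style used by LIBLINEAR) in place of whatever SVM library and kernel the authors actually used. The metric `char_accuracy` is likewise only one possible way to score such a tagger.

```python
import random
from typing import Dict, List, Tuple

Features = Dict[str, float]

# Toy, hypothetical training data: '+' marks a morpheme boundary.
TOY_TRAIN = ["mane+galu", "mara+galu", "kavi+galu", "huduga+ru",
             "mane", "mara", "kavi", "huduga"]


def to_labels(segmented: str) -> Tuple[str, List[int]]:
    """'mane+galu' -> ('manegalu', [0, 0, 0, 1, 0, 0, 0, 0])."""
    chars: List[str] = []
    labels: List[int] = []
    for ch in segmented:
        if ch == "+":
            labels[-1] = 1  # boundary after the previous character
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def from_labels(word: str, labels: List[int]) -> List[str]:
    """Cut the word after every character labelled 1 (a label on the last char is ignored)."""
    morphs, start = [], 0
    for i, lab in enumerate(labels[:-1]):
        if lab == 1:
            morphs.append(word[start:i + 1])
            start = i + 1
    morphs.append(word[start:])
    return morphs


def char_features(word: str, i: int) -> Features:
    """Hypothetical sparse features for position i of word."""
    prev_ch = word[i - 1] if i > 0 else "^"
    next_ch = word[i + 1] if i + 1 < len(word) else "$"
    return {
        "bias": 1.0,
        "cur=" + word[i]: 1.0,
        "prev=" + prev_ch: 1.0,
        "next=" + next_ch: 1.0,
        "rest=" + word[i + 1:]: 1.0,  # the would-be suffix if we cut here
    }


def dot(w: Features, x: Features) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def build_dataset(segmented_words: List[str]) -> Tuple[List[Features], List[int]]:
    """One example per character, with SVM targets in {-1, +1}."""
    X: List[Features] = []
    y: List[int] = []
    for s in segmented_words:
        word, labels = to_labels(s)
        for i, lab in enumerate(labels):
            X.append(char_features(word, i))
            y.append(1 if lab else -1)
    return X, y


def train_linear_svm(X: List[Features], y: List[int], C: float = 10.0,
                     passes: int = 200, seed: int = 0) -> Features:
    """L1-loss linear SVM via dual coordinate descent; the bias is an ordinary feature."""
    rng = random.Random(seed)
    w: Features = {}
    alpha = [0.0] * len(X)
    q = [sum(v * v for v in x.values()) for x in X]  # diagonal Q_ii = ||x_i||^2
    order = list(range(len(X)))
    for _ in range(passes):
        rng.shuffle(order)
        for i in order:
            grad = y[i] * dot(w, X[i]) - 1.0           # dual gradient for alpha_i
            new = min(max(alpha[i] - grad / q[i], 0.0), C)  # project onto [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                for k, v in X[i].items():               # keep w = sum alpha_j y_j x_j
                    w[k] = w.get(k, 0.0) + delta * y[i] * v
    return w


def segment(w: Features, word: str) -> List[str]:
    labels = [1 if dot(w, char_features(word, i)) > 0 else 0 for i in range(len(word))]
    return from_labels(word, labels)


def char_accuracy(w: Features, segmented_words: List[str]) -> float:
    """Fraction of characters whose boundary label is predicted correctly."""
    X, y = build_dataset(segmented_words)
    correct = sum((dot(w, x) > 0) == (t == 1) for x, t in zip(X, y))
    return correct / len(y)


if __name__ == "__main__":
    w = train_linear_svm(*build_dataset(TOY_TRAIN))
    print(char_accuracy(w, TOY_TRAIN))
    print(segment(w, "pustakagalu"))  # a word not in the toy training set
```

A linear model over sparse indicator features keeps the example small; a real system over Kannada script would also need to decide what a "letter" is (code point versus akshara), which this sketch sidesteps by using Latin transliteration.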

Original language: English
Pages (from-to): 45-49
Number of pages: 5
Journal: International Journal of Engineering and Technology(UAE)
Volume: 7
Issue number: 2
DOI: 10.14419/ijet.v7i2.31.13395
Publication status: Published - 01-01-2018
Externally published: Yes


All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Computer Science (miscellaneous)
  • Environmental Engineering
  • Chemical Engineering(all)
  • Engineering(all)
  • Hardware and Architecture

Cite this

Angle, Sachi ; Ashwath Rao, B. ; Muralikrishna, S. N. / Kannada morpheme segmentation using machine learning. In: International Journal of Engineering and Technology(UAE). 2018 ; Vol. 7, No. 2. pp. 45-49.
@article{036609c7969844ad950186ae07661fee,
title = "Kannada morpheme segmentation using machine learning",
abstract = "This paper addresses and targets morpheme segmentation of Kannada words using supervised classification. We have used manually annotated Kannada treebank corpus, which is recently developed by us. Kannada bears resemblance to other Dravidian languages in morphological structure. It is an agglutinative language, hence its words have complex morphological form with each word comprising of a root and an optional set of suffixes. These suffixes carry additional meaning, apart from the root word in a context. This paper discusses the extraction of morphemes of a word by using Support Vector Machines for Classification. Additional features representing the properties of the Kannada words were extracted and the different letters were classified into labels that result in the morphological segmentation of the word. Various methods for evaluation were considered and an accuracy of 85.97{\%} was achieved.",
author = "Sachi Angle and {Ashwath Rao}, B. and Muralikrishna, {S. N.}",
year = "2018",
month = "1",
day = "1",
doi = "10.14419/ijet.v7i2.31.13395",
language = "English",
volume = "7",
pages = "45--49",
journal = "International Journal of Engineering and Technology(UAE)",
issn = "2227-524X",
publisher = "Science Publishing Corporation Inc",
number = "2",

}


TY  - JOUR
T1  - Kannada morpheme segmentation using machine learning
AU  - Angle, Sachi
AU  - Ashwath Rao, B.
AU  - Muralikrishna, S. N.
PY  - 2018/1/1
Y1  - 2018/1/1
N2  - This paper addresses and targets morpheme segmentation of Kannada words using supervised classification. We have used manually annotated Kannada treebank corpus, which is recently developed by us. Kannada bears resemblance to other Dravidian languages in morphological structure. It is an agglutinative language, hence its words have complex morphological form with each word comprising of a root and an optional set of suffixes. These suffixes carry additional meaning, apart from the root word in a context. This paper discusses the extraction of morphemes of a word by using Support Vector Machines for Classification. Additional features representing the properties of the Kannada words were extracted and the different letters were classified into labels that result in the morphological segmentation of the word. Various methods for evaluation were considered and an accuracy of 85.97% was achieved.
AB  - This paper addresses and targets morpheme segmentation of Kannada words using supervised classification. We have used manually annotated Kannada treebank corpus, which is recently developed by us. Kannada bears resemblance to other Dravidian languages in morphological structure. It is an agglutinative language, hence its words have complex morphological form with each word comprising of a root and an optional set of suffixes. These suffixes carry additional meaning, apart from the root word in a context. This paper discusses the extraction of morphemes of a word by using Support Vector Machines for Classification. Additional features representing the properties of the Kannada words were extracted and the different letters were classified into labels that result in the morphological segmentation of the word. Various methods for evaluation were considered and an accuracy of 85.97% was achieved.
UR  - http://www.scopus.com/inward/record.url?scp=85047843662&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85047843662&partnerID=8YFLogxK
U2  - 10.14419/ijet.v7i2.31.13395
DO  - 10.14419/ijet.v7i2.31.13395
M3  - Article
VL  - 7
SP  - 45
EP  - 49
JO  - International Journal of Engineering and Technology(UAE)
JF  - International Journal of Engineering and Technology(UAE)
SN  - 2227-524X
IS  - 2
ER  - 