MIT Manipal at ImageCLEF 2019 Visual Question Answering in Medical Domain

Abhishek Thanki, Krishnamoorthi Makkithaya

Research output: Contribution to journal › Conference article

Abstract

This paper describes the participation of MIT, Manipal in the ImageCLEF 2019 VQA-Med task. The goal of the task was to build a system that takes as input a medical image and a clinically relevant question, and generates a clinically relevant answer to the question by using the medical image. We explored a different approach compared to most VQA systems and focused on the answer generation part. We used an encoder-decoder architecture based on deep learning, where a CNN pre-trained on ImageNet was used to extract visual features from the input image, and word embeddings pre-trained on PubMed articles combined with a 2-layer LSTM were used to extract textual features from the question. The visual and textual features were integrated using simple element-wise multiplication. The integrated features were then passed into an LSTM decoder, which generated a natural-language answer. We submitted a total of 8 runs for this task, and the best model achieved a BLEU score of 0.462.
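To make the described pipeline concrete, below is a minimal sketch of such an encoder-decoder VQA model in PyTorch. It is an illustration under stated assumptions, not the authors' implementation: VGG16 stands in for the unspecified ImageNet-pretrained CNN, the word embeddings are randomly initialised rather than pre-trained on PubMed articles, and the class name MedVQASketch, layer sizes, and teacher-forced decoding are all hypothetical choices.

```python
# Minimal sketch (assumed details, not the paper's exact configuration) of an
# encoder-decoder medical VQA model: ImageNet CNN + 2-layer question LSTM,
# element-wise multiplication fusion, LSTM decoder for answer generation.
import torch
import torch.nn as nn
from torchvision import models


class MedVQASketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512):
        super().__init__()
        # Visual encoder: CNN pre-trained on ImageNet (VGG16 chosen here as an
        # assumption), with the classifier head replaced by pooling + projection.
        cnn = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.cnn = nn.Sequential(cnn.features, nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.img_proj = nn.Linear(512, hidden_dim)

        # Question encoder: word embeddings (in the paper, pre-trained on PubMed
        # articles; randomly initialised here) followed by a 2-layer LSTM.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.q_lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)

        # Answer decoder: an LSTM that generates the answer token by token.
        self.dec_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, image, question_ids, answer_ids):
        # image: (B, 3, 224, 224); question_ids / answer_ids: (B, T) token indices
        img_feat = self.img_proj(self.cnn(image))             # (B, hidden_dim)
        _, (q_h, _) = self.q_lstm(self.embedding(question_ids))
        q_feat = q_h[-1]                                       # (B, hidden_dim)

        # Fuse the two modalities by simple element-wise multiplication.
        fused = img_feat * q_feat                              # (B, hidden_dim)

        # Teacher-forced decoding: the fused features initialise the decoder state.
        h0 = fused.unsqueeze(0)                                # (1, B, hidden_dim)
        c0 = torch.zeros_like(h0)
        dec_out, _ = self.dec_lstm(self.embedding(answer_ids), (h0, c0))
        return self.out(dec_out)                               # (B, T, vocab_size)
```

In this sketch, MedVQASketch(vocab_size=10000)(images, question_ids, answer_ids) returns per-token logits of shape (batch, answer_length, vocab_size), which would be trained with token-level cross-entropy and decoded greedily (or with beam search) at inference time to produce the natural-language answer.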

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2380
Publication status: Published - 01-01-2019
Event: 20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019 - Lugano, Switzerland
Duration: 09-09-2019 to 12-09-2019

Fingerprint

Deep learning

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
