MIT Manipal at ImageCLEF 2019 Visual Question Answering in Medical Domain

Abhishek Thanki, Krishnamoorthi Makkithaya

Research output: Contribution to journal › Conference article › peer-review


This paper describes the participation of MIT, Manipal in the ImageCLEF 2019 VQA-Med task. The goal of the task was to build a system that takes as input a medical image and a clinically relevant question, and generates a clinically relevant answer to the question using the medical image. Our approach differs from most VQA systems in that it focuses on the answer generation part. We used a deep-learning encoder-decoder architecture: a CNN pre-trained on ImageNet was used to extract visual features from the input image, and word embeddings pre-trained on PubMed articles, combined with a 2-layer LSTM, were used to extract textual features from the question. The visual and textual features were integrated by simple element-wise multiplication, and the integrated features were then passed to an LSTM decoder, which generated a natural-language answer. We submitted a total of 8 runs for this task, and our best model achieved a BLEU score of 0.462.
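The element-wise fusion step can be illustrated with a minimal pure-Python sketch. The abstract does not say how the image and question vectors are brought to the same length before they are multiplied, so the affine `project` layer below is an assumption, as are the function names and the toy numbers. In the described system, the inputs would come from the ImageNet-pretrained CNN and the 2-layer LSTM question encoder, and the fused vector would be fed to the LSTM decoder.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(features: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine projection W @ x + b into a shared feature size.

    Hypothetical: the abstract does not describe how dimensions are matched.
    """
    if any(len(row) != len(features) for row in weights):
        raise ValueError("each weight row must match the input feature length")
    if len(bias) != len(weights):
        raise ValueError("bias length must match the number of weight rows")
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def fuse_elementwise(visual: Vector, textual: Vector) -> Vector:
    """Element-wise (Hadamard) product of two equal-length feature vectors."""
    if len(visual) != len(textual):
        raise ValueError("visual and textual features must have the same length")
    return [v * t for v, t in zip(visual, textual)]


if __name__ == "__main__":
    # Toy stand-ins: a 3-d "image" vector and a 2-d "question" vector.
    image_features = [1.0, 2.0, 3.0]
    question_features = [2.0, -1.0]

    # Project the image vector down to the question vector's size (2).
    image_in_shared_space = project(
        image_features,
        weights=[[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],
        bias=[0.0, 0.0],
    )

    fused = fuse_elementwise(image_in_shared_space, question_features)
    print(fused)  # [2.0, -2.5]
```

Element-wise multiplication adds no parameters of its own and keeps the fused vector the same size as its inputs, which is why such a projection (or matching encoder output sizes) is needed beforehand.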

Original language: English
Journal: CEUR Workshop Proceedings
Publication status: Published - 01-01-2019
Event: 20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019 - Lugano, Switzerland
Duration: 09-09-2019 to 12-09-2019

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

