This paper describes the participation of MIT, Manipal in the ImageCLEF 2019 VQA-Med task. The goal of the task was to build a system that takes as input a medical image and a clinically relevant question, and generates a clinically relevant answer to the question by using the medical image. We explored a different approach compared to most VQA systems and focused on the answer generation part. We used a encoder-decoder architecture based on deep learning where a pre-trained CNN on ImageNet was used to extract visual features from input image, a combination of pre-trained word embedding on pub-med articles along with a 2-layer LSTM was used to extract textual features from the question. Both visual and textual features were integrated using a simple element-wise multiplication technique. The integrated features were then passed into a LSTM decoder which then generated a natural language answer. We submitted a total of 8 runs for this task and the best model achieved a BLEU score of 0.462.
|Journal||CEUR Workshop Proceedings|
|Publication status||Published - 01-01-2019|
|Event||20th Working Notes of CLEF Conference and Labs of the Evaluation Forum, CLEF 2019 - Lugano, Switzerland|
Duration: 09-09-2019 → 12-09-2019
All Science Journal Classification (ASJC) codes
- Computer Science(all)