The GPU has emerged as a platform that offloads computation-intensive work from the CPU and performs numerical computations in less time. One such mathematical operation is matrix multiplication. The matrix is one of the fundamental mathematical objects used in scientific computing, with applications in fields such as computer graphics, analysis of electrical circuits, computer networks, DNA sequence comparison, and protein structure prediction. This work presents a comparative analysis of scalar matrix multiplication in three modes: (i) a sequential implementation in C, (ii) a parallel implementation using OpenCL, and (iii) a parallel implementation using MPI. The testbed comprises input matrices ranging in size from 100 × 100 to 800 × 12,800. We observe that parallel execution in OpenCL outperforms both MPI and sequential C for large matrices. In contrast, sequential C outperforms both MPI and OpenCL for small matrices. In addition, we find that the OpenCL program attains a speedup of 9×. We therefore conclude that parallel execution is more efficient for computationally large inputs and is hence a potentially useful approach for computationally demanding problems, including NP-complete problems.
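To make the sequential baseline and the speedup metric concrete, the following is a minimal sketch in C. It assumes that the operation is the ordinary matrix product computed with a naive triple loop over row-major, double-precision arrays; the abstract does not specify the kernel, the storage layout, or the data type. The names `matmul_seq` and `speedup` are hypothetical and are not taken from this work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sequential baseline (assumption: naive triple loop).
 * Computes C = A * B, where A is n x k, B is k x m, and C is n x m.
 * All matrices are stored in row-major order as flat arrays. */
void matmul_seq(const double *a, const double *b, double *c,
                size_t n, size_t k, size_t m)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < m; ++j) {
            double sum = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (size_t p = 0; p < k; ++p)
                sum += a[i * k + p] * b[p * m + j];
            c[i * m + j] = sum;
        }
    }
}

/* Speedup as conventionally defined: sequential time divided by
 * parallel time, both measured in the same unit. */
double speedup(double t_seq, double t_par)
{
    assert(t_par > 0.0);
    return t_seq / t_par;
}
```

The inner loop performs k multiply-add operations for each of the n × m output elements, so the work grows as n · k · m. This is consistent with the observation that the fixed overheads of a parallel runtime pay off only once the matrices are large enough.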