Recently, deep hierarchically learned models (such as CNNs) have achieved superior performance in various computer vision tasks, but limited attention has been paid to biometrics so far. This is mainly because deep learning typically requires a large amount of training data, owing to the huge number of parameters to be tuned by the learning algorithm, whereas the number of samples available in biometrics is limited and is not enough to train a CNN efficiently. How can an end-to-end deep learning network be designed to match biometric features when the number of training samples is limited? To address this problem, we propose a new way to design an end-to-end deep neural network that works in two major steps: first, an autoencoder is trained to learn domain-specific features; then, a Siamese network is trained with a triplet loss function for matching. A publicly available vein image data set is used as a case study to justify our proposal. We observe that the transformations learned by such a network provide domain-specific and highly discriminative vascular features. Subsequently, the corresponding traits are matched using a multimodal pipelined end-to-end network in which the convolutional layers are pre-trained in an unsupervised fashion as an autoencoder. Thorough experimental studies suggest that the proposed framework consistently outperforms several state-of-the-art vein recognition approaches.
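
For reference, a Siamese network trained with a triplet loss is commonly optimized with an objective of the following form. This is the standard formulation, given only as an illustrative sketch: the embedding function $f$, the anchor, positive, and negative samples $x_a$, $x_p$, $x_n$, and the margin $\alpha > 0$ are notation assumed for this example, and the exact loss used in the proposed network may differ.

\[
\mathcal{L}(x_a, x_p, x_n) = \max\bigl(0,\; \lVert f(x_a) - f(x_p) \rVert_2^2 - \lVert f(x_a) - f(x_n) \rVert_2^2 + \alpha \bigr)
\]

Under this form, the loss vanishes once the anchor is closer to the positive (same identity) than to the negative (different identity) by at least the margin $\alpha$, measured in squared Euclidean distance between embeddings.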