Shahed University
Video captioning using boosted and parallel Long Short-Term Memory networks
Masoomeh Nabati | Alireza Behrad
URL :
http://research.shahed.ac.ir/WSR/WebPages/Report/PaperView.aspx?PaperID=127043
Date :
2019/10/11
Publish in :
Computer Vision and Image Understanding
DOI :
https://doi.org/10.1016/j.cviu.2019.102840
Link :
https://www.sciencedirect.com/science/article/pii/S1077314218301632?dgcid=rss_sd_all
Keywords :
Video captioning, Boosted and parallel LSTMs, AdaBoost algorithm
Abstract :
Video captioning and its integration with deep learning is one of the most challenging issues in the field of machine vision and artificial intelligence. In this paper, a new boosted and parallel architecture is proposed for video captioning using Long Short-Term Memory (LSTM) networks. The proposed architecture comprises two LSTM layers and a word selection module. The first LSTM layer is responsible for encoding frame features extracted by a pre-trained deep Convolutional Neural Network (CNN). The second LSTM layer uses a novel architecture for video captioning, leveraging several decoding LSTMs in a parallel, boosted configuration. This layer, called the Boosted and Parallel LSTM (BP-LSTM) layer, is constructed by iteratively training LSTM networks with a special variant of the AdaBoost algorithm during the training phase. During the testing phase, the outputs of the BP-LSTMs are combined concurrently using the maximum probability criterion and the word selection module. We tested the proposed algorithm on two well-known video captioning datasets and compared the results with state-of-the-art algorithms. The results show that the proposed architecture considerably improves the accuracy of the generated sentences.
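The abstract's maximum-probability combination step can be illustrated with a minimal sketch. Here each parallel decoder is assumed to emit a probability vector over the vocabulary at a given time step, and the word selection module picks the word with the highest probability across all decoders. The function name, shapes, and combination details are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_word(parallel_probs):
    """Combine the outputs of parallel decoders with a maximum-probability
    criterion: return the vocabulary index of the word whose probability is
    highest across all decoders at the current time step.

    `parallel_probs`: list of per-decoder probability vectors over the
    vocabulary (hypothetical interface; the paper's exact rule may differ).
    """
    stacked = np.stack(parallel_probs)            # (n_decoders, vocab_size)
    flat_idx = np.argmax(stacked)                 # global maximum entry
    _, word_idx = np.unravel_index(flat_idx, stacked.shape)
    return int(word_idx)

# Toy example: 3 parallel decoders, 4-word vocabulary.
probs = [
    np.array([0.10, 0.60, 0.20, 0.10]),
    np.array([0.30, 0.20, 0.40, 0.10]),
    np.array([0.05, 0.70, 0.15, 0.10]),
]
print(select_word(probs))  # -> 1 (the 0.70 entry in the third decoder wins)
```

In practice such a selection would run once per decoding step, feeding the chosen word back into each decoder to generate the next one.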