
LSTM的季节性注意力及在文本情感分类中的应用

刘华玲, 何轶辉   

  1. 上海对外经贸大学统计与信息学院, 上海 201620
  • 收稿日期:2022-05-18 修回日期:2022-11-07 发布日期:2023-05-18
  • Corresponding author: LIU Hualing, Email: liuhl@suibe.edu.cn
  • Funding:
    Supported by the Major Program of the National Social Science Fund of China (21ZDA105), the Shanghai Planning Project of Philosophy and Social Science (2018BJB023), and the Postgraduate Research and Innovation Cultivation Project of Shanghai University of International Business and Economics (2022-030800-07).

刘华玲, 何轶辉. LSTM的季节性注意力及在文本情感分类中的应用[J]. 系统科学与数学, 2023, 43(4): 1002-1020.

LIU Hualing, HE Yihui. LSTM Altered by Seasonal Attention and Its Application in Text Sentiment Classification[J]. Journal of Systems Science and Mathematical Sciences, 2023, 43(4): 1002-1020.

LSTM Altered by Seasonal Attention and Its Application in Text Sentiment Classification

LIU Hualing, HE Yihui   

  1. School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai 201620
  • Received:2022-05-18 Revised:2022-11-07 Published:2023-05-18
The vanishing-gradient problem of the Long Short-Term Memory (LSTM) network in sequence modeling reduces the model's accuracy in time-series prediction tasks, especially in medium- and long-term multi-step prediction, and weakens its attention to key information in the sequence context. The root cause of the vanishing gradient is that the LSTM's gated memory mechanism loses control over the gradient back-propagated through the recurrent layer. We therefore adjust the gating structure of the recurrent layer and train the modified model specifically on sequences containing particular components (such as seasonal components), so that the improved LSTM attends to seasonal components in sequence prediction tasks. Building on the LSTM model, this paper replaces the existing single-branch forget gate with a dual-branch seasonal gate and introduces the range of the input sequence as the selector that chooses between the two branches, yielding the Seasonal LSTM (S-LSTM). In a binary sentiment-classification experiment on the IMDB English movie-review dataset, a single-layer S-LSTM improves prediction accuracy by 9.8% over a single-layer LSTM.
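The abstract describes the core modification only at a high level: the single forget gate becomes a two-branch "seasonal gate", and the range (max minus min) of the input is used as the selector between the branches. The sketch below shows one way such a cell could look; the threshold rule, weight shapes, and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SeasonalLSTMCell:
    """Minimal sketch of an LSTM cell whose forget gate is replaced by a
    dual-branch seasonal gate. The branch is chosen by comparing the range
    (max - min) of the current input against a threshold; this selection
    rule and all parameter names are assumptions for illustration."""

    def __init__(self, input_size, hidden_size, range_threshold=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d = input_size + hidden_size
        # Two forget-gate branches instead of one, plus the usual gates.
        self.Wf1 = rng.normal(0, 0.1, (hidden_size, d))  # high-range branch
        self.Wf2 = rng.normal(0, 0.1, (hidden_size, d))  # low-range branch
        self.Wi = rng.normal(0, 0.1, (hidden_size, d))   # input gate
        self.Wc = rng.normal(0, 0.1, (hidden_size, d))   # candidate state
        self.Wo = rng.normal(0, 0.1, (hidden_size, d))   # output gate
        self.threshold = range_threshold

    def step(self, x, h, c):
        z = np.concatenate([h, x])
        # Selector: the range of the input decides which branch fires.
        Wf = self.Wf1 if (x.max() - x.min()) > self.threshold else self.Wf2
        f = sigmoid(Wf @ z)        # seasonal (forget) gate
        i = sigmoid(self.Wi @ z)   # input gate
        g = np.tanh(self.Wc @ z)   # candidate cell state
        o = sigmoid(self.Wo @ z)   # output gate
        c = f * c + i * g          # standard LSTM state update
        h = o * np.tanh(c)
        return h, c
```

Except for the branch selection, the update equations are those of a standard LSTM, so the cell can be dropped into an ordinary recurrent loop over a token-embedding sequence.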

[1] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8):1735-1780.
[2] Lai G K, Chang W C, Yang Y M, et al. Modeling long- and short-term temporal patterns with deep neural networks. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018, 95-104.
[3] 罗广诚, 郜家珏, 蔡文学. 基于GRA-LSTM与SARIMA组合模型的季节性时间序列预测. 智能计算机与应用, 2021, 11(6):195-200. (Luo G C, Gao J J, Cai W X. Seasonal time series prediction based on combination model of GRA-LSTM and SARIMA. Intelligent Computer and Applications, 2021, 11(6):195-200.)
[4] 唐晓彬, 董曼茹, 张瑞. 基于机器学习LSTM$\&$US模型的消费者信心指数预测研究. 统计研究, 2020, 37(7):104-115. (Tang X B, Dong M R, Zhang R. Research on the prediction of consumer confidence index based on machine learning LSTM&US model. Statistical Research, 2020, 37(7):104-115.)
[5] 邸浩, 赵学军, 张自力. 基于EEMD-LSTM-Adaboost的商品价格预测. 统计与决策, 2018, 34(13):72-76. (Di H, Zhao X J, Zhang Z L. Commodity price forecasting based on EEMD-LSTM-Adaboost. Statistics & Decision, 2018, 34(13):72-76.)
[6] 杨青, 王晨蔚. 基于深度学习LSTM神经网络的全球股票指数预测研究. 统计研究, 2019, 36(3):65-77. (Yang Q, Wang C W. A study on forecast of global stock indices based on deep LSTM neural network. Statistical Research, 2019, 36(3):65-77.)
[7] Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations, ICLR, 2015, 1-15.
[8] 邱锡鹏. 神经网络与深度学习. 北京:机械工业出版社, 2020. (Qiu X P. Neural Networks and Deep Learning. Beijing:China Machine Press, 2020.)
[9] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS, 2017, 6000-6010.
[10] 梁斌, 刘全, 徐进, 等. 基于多注意力卷积神经网络的特定目标情感分析. 计算机研究与发展, 2017, 54(8):1724-1735. (Liang B, Liu Q, Xu J, et al. Aspect-based sentiment analysis based on multi-attention CNN. Journal of Computer Research and Development, 2017, 54(8):1724-1735.)
[11] 邱宁佳, 周思丞, 丛琳, 等. 改进CNN的多通道语义合成情感分类模型研究. 计算机工程与应用, 2019, 55(23):136-141. (Qiu N J, Zhou S C, Cong L, et al. Research on multi-channel semantic fusion emotion classification model based on CNN. Computer Engineering and Applications, 2019, 55(23):136-141.)
[12] 程艳, 尧磊波, 张光河, 等. 基于注意力机制的多通道CNN和BiGRU的文本情感倾向性分析. 计算机研究与发展, 2020, 57(12):2583-2595. (Cheng Y, Yao L B, Zhang G H, et al. Text sentiment orientation analysis of multi-channels CNN and BiGRU based on attention mechanism. Journal of Computer Research and Development, 2020, 57(12):2583-2595.)
[13] 贺波, 马静, 李驰. 基于融合特征的商品文本分类方法研究. 情报理论与实践, 2020, 43(11):162-168. (He B, Ma J, Li C. Research on commodity text classification based on fusion features. Information Studies:Theory & Application, 2020, 43(11):162-168.)
[14] 滕金保, 孔韦韦, 田乔鑫, 等. 基于CNN和LSTM的多通道注意力机制文本分类模型. 计算机工程与应用, 2021, 57(23):154-162. (Teng J B, Kong W W, Tian Q X, et al. Multi-channel attention mechanism text classification model based on CNN and LSTM. Computer Engineering and Applications, 2021, 57(23):154-162.)
[15] 万家山, 吴云志. 基于深度学习的文本分类方法研究综述. 天津理工大学学报, 2021, 37(2):41-47. (Wan J S, Wu Y Z. Review of text classification research based on deep learning. Journal of Tianjin University of Technology, 2021, 37(2):41-47.)
[16] Yang Z, Yang D, Dyer C, et al. Hierarchical attention networks for document classification. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies, 2016, 1480-1489.
[17] Fan H, Mei X, Prokhorov D, et al. Multi-level contextual RNNs with attention model for scene labeling. IEEE Transactions on Intelligent Transportation Systems, 2018, 19(11):3475-3485.
[18] Kim Y, Denton C, Hoang L, et al. Structured attention networks. Proceedings of 5th International Conference on Learning Representations, 2017, 1-21.
[19] 袁和金, 张旭, 牛为华, 等. 融合注意力机制的多通道卷积与双向GRU模型的文本情感分析研究. 中文信息学报, 2019, 33(10):109-118. (Yuan H J, Zhang X, Niu W H, et al. Sentiment analysis based on multi-channel convolution and bi-directional GRU with attention mechanism. Journal of Chinese Information Processing, 2019, 33(10):109-118.)
[20] 宁尚明, 滕飞, 李天瑞. 基于多通道自注意力机制的电子病历实体关系抽取. 计算机学报, 2020, 43(5):916-929. (Ning S M, Teng F, Li T R. Multi-channel self-attention mechanism for relation extraction in clinical records. Chinese Journal of Computers, 2020, 43(5):916-929.)
[21] Fu X, Yang J, Li J, et al. Lexicon-enhanced LSTM with attention for general sentiment analysis. IEEE Access, 2018, 6(1):71884-71891.
[22] Bengio Y, Ducharme R, Vincent P, et al. A neural probabilistic language model. Journal of Machine Learning Research, 2003, 3(2):1137-1155.
[23] Mikolov T, Sutskever I, Chen K, et al. Distributed representations of words and phrases and their compositionality. Proceedings of the 26th International Conference on Neural Information Processing Systems, 2013, 3111-3119.
[24] Liu Z D, Zhou W G, Li H Q. AB-LSTM:Attention-based bidirectional LSTM model for scene text detection. ACM Transactions on Multimedia Computing Communications and Applications, 2019, 15(4):1-23.
[25] 王红, 史金钏, 张志伟. 基于注意力机制的LSTM的语义关系抽取. 计算机应用研究, 2018, 35(5):1417-1420, 1440. (Wang H, Shi J C, Zhang Z W. Text semantic relation extraction of LSTM based on attention mechanism. Application Research of Computers, 2018, 35(5):1417-1420, 1440.)
[26] Li M, Miao Z, Xu W. ACRNN-based attention-seq2seq model with fusion feature for automatic labanotation generation. Neurocomputing, 2021, 454(1):430-440.
[27] Bensalah N, Ayad H, Adib A, et al. CRAN:A hybrid CNN-RNN attention-based model for Arabic machine translation. Networking, Intelligent Systems and Security, NISS, 2021, 87-102.
[28] 杨兴锐, 赵寿为, 张如学, 等. 结合自注意力和残差的BiLSTM-CNN文本分类模型. 计算机工程与应用, 2022, 58(3):172-180. (Yang X R, Zhao S W, Zhang R X, et al. BiLSTM-CNN classification model based on self-attention and residual network. Computer Engineering and Applications, 2022, 58(3):172-180.)
[29] 谢润忠, 李烨. 基于BERT和双通道注意力的文本情感分类模型. 数据采集与处理,2020, 35(4):642-652. (Xie R X, Li Y. Text sentiment classification model based on BERT and dual channel attention. Journal of Data Acquisition and Processing, 2020, 35(4):642-652.)
[30] Devlin J, Chang M W, Lee K, et al. BERT:Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2019, 4171-4186.
[31] Xu K, Ba J, Kiros R, et al. Show, attend and tell:Neural image caption generation with visual attention. International Conference on Machine Learning, PMLR, 2015, 2048-2057.
[32] Shih S Y, Sun F K, Lee H Y. Temporal pattern attention for multivariate time series forecasting. Machine Learning, 2019, 108(8-9):1421-1441.
[33] 陆超红. 基于多通道循环卷积神经网络的文本分类方法. 计算机应用与软件, 2020, 37(8):282-288. (Lu C H. Text classification based on multichannel recurrent convolutional neural network. Computer Applications and Software, 2020, 37(8):282-288.)
[34] 曲宗希, 沙勇忠, 李雨桐. 基于灰狼优化与多机器学习的重大传染病集合预测研究——以COVID-19疫情为例. 数据分析与知识发现, 2022, 6(8):122-133. (Qu Z X, Sha Y Z, Li Y T. Predicting major infectious diseases based on grey wolf optimization and multi-machine learning:Case study of COVID-19. Data Analysis and Knowledge Discovery, 2022, 6(8):122-133.)
[35] Karim F, Majumdar S, Darabi H, et al. Multivariate LSTM-FCNs for time series classification. Neural Networks, 2019, 116(1):237-245.
[36] Wang S, Manning C D. Baselines and bigrams:Simple, good sentiment and topic classification. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, 2012, 90-94.
[37] Wang X, Jiang W, Luo Z. Combination of convolutional and recurrent neural network for sentiment analysis of short texts. Proceedings of Conference on COLING, 2016, 2428-2437.
[38] Khalil K, Eldash O, Kumar A, et al. Economic LSTM approach for recurrent neural networks. IEEE Transactions on Circuits and Systems II:Express Briefs, 2019, 66(11):1885-1889.
[39] Lei T, Zhang Y, Wang S I, et al. Simple recurrent units for highly parallelizable recurrence. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP, 2018, 4470-4481.