Data-Driven Dynamic Output Feedback Nash Strategy for Multi-Player Non-Zero-Sum Games

XIE Kedi, LU Maobin, DENG Fang, SUN Jian, CHEN Jie

Journal of Systems Science & Complexity ›› 2025, Vol. 38 ›› Issue (2): 597-612. DOI: 10.1007/s11424-025-4535-3

Abstract

This paper investigates the multi-player non-zero-sum game problem for unknown linear continuous-time systems with unmeasurable states. Using only input and output data, a data-driven learning control approach is proposed to estimate the N-tuple of dynamic output feedback control policies that forms a Nash equilibrium solution to the multi-player non-zero-sum game problem. In particular, the explicit form of the dynamic output feedback Nash strategy is constructed by embedding the internal dynamics and solving coupled algebraic Riccati equations. Coupled policy-iteration-based iterative learning equations are established to estimate the N-tuple of feedback control gains without prior knowledge of the system matrices. Finally, an example illustrates the effectiveness of the proposed approach.
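
The abstract's ingredients can be made concrete in the standard linear-quadratic setting (this page's excerpt does not reproduce the paper's equations; the following is the textbook full-information form, shown for orientation). For dynamics $\dot{x} = Ax + \sum_{j=1}^{N} B_j u_j$ and player costs $J_i = \int_0^\infty \big( x^\top Q_i x + \sum_{j=1}^{N} u_j^\top R_{ij} u_j \big)\,\mathrm{d}t$, the coupled algebraic Riccati equations read

$$
0 = A_c^\top P_i + P_i A_c + Q_i + \sum_{j=1}^{N} P_j B_j R_{jj}^{-1} R_{ij} R_{jj}^{-1} B_j^\top P_j,
\qquad
A_c = A - \sum_{j=1}^{N} B_j R_{jj}^{-1} B_j^\top P_j,
$$

with Nash feedback policies $u_i^\ast = -K_i x$ and gains $K_i = R_{ii}^{-1} B_i^\top P_i$, $i = 1, \dots, N$.

Below is a minimal sketch of the coupled policy iteration that such equations admit, for two players, assuming a known model and full state feedback; the paper's actual method is data-driven and output-feedback, and all matrices here are hypothetical placeholders. Each evaluation step solves coupled Lyapunov (not Riccati) equations, which is the structural property that policy-iteration-based learning schemes exploit.

```python
# Sketch: model-based coupled policy iteration for a two-player
# non-zero-sum LQ game. Hypothetical illustration only -- the paper's
# method estimates the gains from input/output data without A, B1, B2.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical plant: x_dot = A x + B1 u1 + B2 u2 (open-loop stable,
# so the zero gains below are admissible initial policies).
A  = np.array([[-1.0,  1.0],
               [ 0.0, -2.0]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])

Q1, Q2   = np.eye(2), 2.0 * np.eye(2)        # state weights of players 1, 2
R11, R22 = np.eye(1), np.eye(1)              # own-input weights
R12, R21 = 0.5 * np.eye(1), 0.5 * np.eye(1)  # cross weights on the rival's input

K1, K2 = np.zeros((1, 2)), np.zeros((1, 2))  # initial stabilizing gains

for _ in range(100):
    Ac = A - B1 @ K1 - B2 @ K2               # closed loop under current policies
    # Policy evaluation: coupled Lyapunov equations
    #   Ac' P_i + P_i Ac + Q_i + K_i' R_ii K_i + K_j' R_ij K_j = 0
    P1 = solve_continuous_lyapunov(Ac.T, -(Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2))
    P2 = solve_continuous_lyapunov(Ac.T, -(Q2 + K2.T @ R22 @ K2 + K1.T @ R21 @ K1))
    # Policy improvement: K_i = R_ii^{-1} B_i' P_i
    K1n = np.linalg.solve(R11, B1.T @ P1)
    K2n = np.linalg.solve(R22, B2.T @ P2)
    if max(np.linalg.norm(K1n - K1), np.linalg.norm(K2n - K2)) < 1e-10:
        K1, K2 = K1n, K2n
        break
    K1, K2 = K1n, K2n

print("K1 =", K1)
print("K2 =", K2)
```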

Key words

Adaptive dynamic programming / non-zero-sum games / output feedback / policy-iteration

Cite this article

XIE Kedi, LU Maobin, DENG Fang, SUN Jian, CHEN Jie. Data-Driven Dynamic Output Feedback Nash Strategy for Multi-Player Non-Zero-Sum Games. Journal of Systems Science & Complexity, 2025, 38(2): 597-612. https://doi.org/10.1007/s11424-025-4535-3

Funding

This research was supported by the National Key R&D Program of China under Grant No. 2021ZD0112600, the National Natural Science Foundation of China under Grant No. 62373058, the Beijing Natural Science Foundation under Grant No. L233003, the National Science Fund for Distinguished Young Scholars of China under Grant No. 62025301, the Postdoctoral Fellowship Program of CPSF under Grant No. GZC20233407, and the Basic Science Center Programs of NSFC under Grant No. 62088101.