Academic Journal

Reinforcement learning approach for resource allocation in humanitarian logistics.

Bibliographic Details
Title: Reinforcement learning approach for resource allocation in humanitarian logistics.
Authors: Yu, Lina1 (AUTHOR) yulina@cueb.edu.cn; Zhang, Canrong2 (AUTHOR) crzhang@sz.tsinghua.edu.cn; Jiang, Jingyan2 (AUTHOR) jiangjingyan@tsinghua.edu.cn; Yang, Huasheng1,3 (AUTHOR) yyhhss06@gmail.com; Shang, Huayan1 (AUTHOR) shanghuayan@126.com
Source: Expert Systems with Applications. Jul2021, Vol. 173, pN.PAG-N.PAG. 1p.
Subject Terms: *RESOURCE allocation, *REINFORCEMENT learning, *HEURISTIC programming, *DYNAMIC programming, *NONLINEAR programming, *REWARD (Psychology)
Abstract:
• A Q-learning algorithm (QL) is proposed to solve the resource allocation problem.
• A dynamic programming method and a heuristic algorithm are provided to prove the effectiveness of the QL.
• Suggestions are given on how to apply the QL algorithm in practical situations.
When a disaster strikes, it is important to allocate limited disaster relief resources to those in need. This paper considers the allocation of resources in humanitarian logistics using three critical performance indicators: efficiency, effectiveness and equity. Three separate costs are considered to represent these metrics, namely, the accessibility-based delivery cost, the starting state-based deprivation cost, and the terminal penalty cost. A mixed-integer nonlinear programming model with multiple objectives and multiple periods is proposed. A Q-learning algorithm, a type of reinforcement learning method, is developed to address the complex optimization problem. The principles of the proposed algorithm, including the learning agent and its actions, the environment and its states, and reward functions, are presented in detail. The parameter settings of the proposed algorithm are also discussed in the experimental section. In addition, the solution quality of the proposed algorithm is compared with that of the exact dynamic programming method and a heuristic algorithm. The experimental results show that the efficiency of the algorithm is better than that of the dynamic programming method and the accuracy of the algorithm is higher than that of the heuristic algorithm. Moreover, the Q-learning algorithm provides close to or even optimal solutions to the resource allocation problem by adjusting the value of the training episode K in practical applications. [ABSTRACT FROM AUTHOR]
Database: Academic Search Index
Description
ISSN: 0957-4174
DOI: 10.1016/j.eswa.2021.114663
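
Illustrative note: the abstract describes a Q-learning agent whose actions allocate relief resources, whose states track the condition of the affected areas, and whose reward combines a delivery cost, a deprivation cost and a terminal penalty, trained over K episodes. The following is a minimal tabular Q-learning sketch of that general idea, not the authors' model or implementation. Everything specific in it is an assumption made for illustration: the two-area, three-period instance, the cost values, the quadratic cost functions, the rule that an unserved area's deprivation level rises by one per period, and all names (`DELIVERY_COST`, `step`, `q_learning`, `greedy_cost`).

```python
import random
from collections import defaultdict

# Hypothetical toy instance (not taken from the paper).
DELIVERY_COST = [1.0, 2.0]   # assumed accessibility-based delivery cost per area
N_AREAS = len(DELIVERY_COST)
HORIZON = 3                  # number of allocation periods
MAX_LEVEL = 4                # cap on an area's deprivation level


def deprivation_cost(levels):
    # Assumed convex cost of the deprivation levels at the start of a period.
    return sum(level ** 2 for level in levels)


def terminal_penalty(levels):
    # Assumed penalty on the deprivation left after the final period.
    return 2.0 * sum(level ** 2 for level in levels)


def step(levels, t, action):
    """Serve area `action` in period t; return (next_levels, reward, done)."""
    cost = DELIVERY_COST[action] + deprivation_cost(levels)
    # The served area resets to 0; every other area gets one level worse.
    nxt = tuple(0 if i == action else min(level + 1, MAX_LEVEL)
                for i, level in enumerate(levels))
    done = (t + 1 == HORIZON)
    if done:
        cost += terminal_penalty(nxt)
    return nxt, -cost, done   # reward is the negative total cost


def q_learning(episodes, alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over `episodes` runs."""
    rng = random.Random(seed)
    q = defaultdict(float)    # keyed by (levels, period, action), default 0.0
    for _ in range(episodes):
        levels, t, done = (1,) * N_AREAS, 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(N_AREAS)
            else:
                action = max(range(N_AREAS), key=lambda a: q[(levels, t, a)])
            nxt, reward, done = step(levels, t, action)
            best_next = 0.0 if done else max(
                q[(nxt, t + 1, a)] for a in range(N_AREAS))
            # Standard Q-learning update toward the one-step bootstrapped target.
            q[(levels, t, action)] += alpha * (
                reward + gamma * best_next - q[(levels, t, action)])
            levels, t = nxt, t + 1
    return q


def greedy_cost(q):
    """Total cost incurred by following the greedy policy of a learned table."""
    levels, t, done, total = (1,) * N_AREAS, 0, False, 0.0
    while not done:
        action = max(range(N_AREAS), key=lambda a: q[(levels, t, a)])
        levels, reward, done = step(levels, t, action)
        total -= reward
        t += 1
    return total
```

A call such as `greedy_cost(q_learning(episodes=5000))` trains the table and then evaluates the resulting greedy allocation plan. The `episodes` argument plays the role that the abstract assigns to the number of training episodes K: a small value may leave the table far from converged, while a larger value gives the agent more chances to refine its estimates.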