To enable intelligent decision-making in dynamic scheduling and reconfiguration, we study, for the first time, intelligent scheduling and reconfiguration of a reconfigurable flow line (RFL) with dynamic job arrivals using deep reinforcement learning (DRL). A system architecture for intelligent scheduling and reconfiguration in smart manufacturing is proposed, and a mathematical model is established to minimise the total tardiness cost. In addition, a DRL system for scheduling and reconfiguration is proposed, with state features, actions, and rewards designed for the scheduling and reconfiguration agents. The advantage actor-critic (A2C) algorithm is then adapted to solve the studied problem. The training curves show that the A2C-based agents effectively learn to generate increasingly better solutions for unseen instances. Test results show that the A2C-based approach outperforms two traditional meta-heuristics, iterated greedy (IG) and the genetic algorithm (GA), by a large margin in both solution quality and CPU time. Specifically, the A2C-based approach outperforms IG and GA by 57.43% and 88.30%, respectively, while using only 0.46% and 2.20% of their respective CPU times. The trained model generates a scheduling or reconfiguration decision within 1.47 ms, which is almost instantaneous and sufficient for real-time optimisation. Our work shows the promise of DRL for intelligent scheduling and reconfiguration.
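The abstract names A2C but does not show what it computes. The following is a minimal, hedged sketch of the generic A2C update targets for one rollout of a single agent. It is not the authors' implementation. The function names `a2c_targets` and `a2c_losses`, the discount `gamma`, the weight `value_coef`, and the rollout numbers are all hypothetical. The paper's state features, actions, and reward design are not reproduced here, and the usual entropy bonus is omitted.

```python
from typing import List, Tuple


def a2c_targets(rewards: List[float], values: List[float],
                bootstrap_value: float,
                gamma: float = 0.99) -> Tuple[List[float], List[float]]:
    """Discounted n-step returns R_t and advantages A_t = R_t - V(s_t)."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = [0.0] * len(rewards)
    running = bootstrap_value  # critic's V(s_T); pass 0.0 if the episode ended
    # Walk the rollout backwards: R_t = r_t + gamma * R_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Positive A_t means the chosen action did better than the critic expected.
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages


def a2c_losses(log_probs: List[float], advantages: List[float],
               returns: List[float], values: List[float],
               value_coef: float = 0.5) -> Tuple[float, float]:
    """Actor (policy-gradient) loss and weighted critic (MSE) loss."""
    n = len(log_probs)
    # Actor: -mean(log pi(a_t | s_t) * A_t), with A_t treated as a constant.
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # Critic: mean squared error between V(s_t) and the return target R_t.
    critic_loss = sum((r - v) ** 2 for r, v in zip(returns, values)) / n
    return actor_loss, value_coef * critic_loss


# Hypothetical 3-step rollout of one agent (made-up numbers).
returns, advantages = a2c_targets([1.0, 0.0, 2.0], [0.5, 0.5, 0.5],
                                  bootstrap_value=1.0, gamma=0.5)
print(returns)     # [1.625, 1.25, 2.5]
print(advantages)  # [1.125, 0.75, 2.0]
```

In a real agent, `log_probs` and `values` would be tensors in an autodiff framework, so that gradients flow into the actor and critic networks. Plain floats are used here only to keep the sketch self-contained and to show the arithmetic.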
This work is published in the International Journal of Production Research (2021): 1-18.