Comparison of PPO-DRL and A2C-DRL Algorithms for MPPT in Photovoltaic Systems via Buck-Boost Converter
Abstract
This research investigates the effectiveness of two deep reinforcement learning algorithms, Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C), in achieving maximum power point tracking (MPPT) for photovoltaic (PV) systems via a Buck-Boost converter. The algorithms were trained and evaluated under varying environmental conditions, including different levels of irradiance and temperature. The results are presented through duty cycle heatmaps, power output heatmaps, and performance curves for power, voltage, and current. The PPO algorithm demonstrated stable and consistent control across all scenarios, maintaining a nearly constant duty cycle and achieving high power output. In contrast, A2C exhibited more adaptive control behavior, adjusting the duty cycle in response to environmental changes, but delivered lower power output under weak irradiance. Overall, PPO outperformed A2C in stability, accuracy, and ability to reach the optimal operating point, making it the more suitable choice for MPPT in PV systems under dynamic conditions.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.