BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260619T114000EDT-5397OVStxF@132.216.98.100
DTSTAMP:20260619T154000Z
DESCRIPTION:Informal Systems Seminar (ISS)\, Centre for Intelligent Machines (CIM) and Groupe d'Etudes et de Recherche en Analyse des Decisions (GERAD)\n\nSpeaker: Amit Sinha\n\n** Note that this is a hybrid event.\n** This seminar will be projected at McConnell 437 at McGill University.\n\nZoom Link\nMeeting ID: 845 1388 1004\nPasscode: VISS\n\nAbstract: The traditional approach to POMDPs is to convert them into fully observed MDPs by considering a belief state as an information state. However\, a belief-state based approach requires perfect knowledge of the system dynamics and is therefore not applicable in the learning setting where the system model is unknown. Various approaches to circumvent this limitation have been proposed in the literature. A unified treatment of these approaches involves considering the 'agent state'\, which is a model-free\, recursively updateable function of the observation history. Some examples of an agent state include frame stacking and recurrent neural networks. Since the agent state is model-free\, it is used to adapt standard RL algorithms to POMDPs. However\, standard RL algorithms like Q-learning learn a deterministic stationary policy. Since the agent state is not an information state\, we cannot apply the same results as for MDPs\, and thus we must first consider what happens with the different policy classes: stationary/non-stationary and deterministic/stochastic. Our main thesis\, which we illustrate via examples\, is that because the agent state is not an information state\, non-stationary agent-state based policies can outperform stationary ones. To leverage this feature\, we propose PASQL (periodic agent-state based Q-learning)\, which is a variant of agent-state-based Q-learning that learns periodic policies. By combining ideas from periodic Markov chains and stochastic approximation\, we rigorously establish that PASQL converges to a cyclic limit and characterize the approximation error of the converged periodic policy. Finally\, we present a numerical experiment to highlight the salient features of PASQL and demonstrate the benefit of learning periodic policies over stationary policies.\n\nAffiliation: Amit Sinha is a PhD candidate in the Department of Electrical and Computer Engineering\, McGill University.\n
DTSTART:20240704T140000Z
DTEND:20240704T150000Z
LOCATION:Zames Seminar Room\, MC 437\, McConnell Engineering Building\, CA\, QC\, Montreal\, H3A 0E9\, 3480 rue University
SUMMARY:Periodic agent-state based Q-learning for POMDPs
URL:/cim/channels/event/periodic-agent-state-based-q-learning-pomdps-357836
END:VEVENT
END:VCALENDAR