BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20260619T114000EDT-5397OVStxF@132.216.98.100
DTSTAMP:20260619T154000Z
DESCRIPTION:Informal Systems Seminar (ISS)\, Centre for Intelligent Machine
 s (CIM) and Groupe d'Etudes et de Recherche en Analyse des Decisions (GERA
 D)\n\nSpeaker: Amit Sinha\n\n** Note that this is a hybrid event.\n** This
  seminar will be projected at McConnell 437 at McGill University.\n\nZoom
  Link\nMeeting ID: 845 1388 1004\nPasscode: VISS\n\nAbstract: The tr
 aditional approach to POMDPs is to convert them into fully observed MDPs b
 y considering a belief state as an information state. However\, a belief-s
 tate based approach requires perfect knowledge of the system dynamics and 
 is therefore not applicable in the learning setting where the system model
  is unknown. Various approaches to circumvent this limitation have been pr
 oposed in the literature. A unified treatment of these approaches involves
  considering the 'agent state'\, which is a model-free\, recursively updat
 eable function of the observation history. Some examples of an agent state
  include frame stacking and recurrent neural networks. Since the agent sta
 te is model-free\, it is used to adapt standard RL algorithms to POMDPs. H
 owever\, standard RL algorithms like Q-learning learn a deterministic stat
 ionary policy. Since the agent state is not an information state\, we cann
 ot apply the same results for MDPs and thus\, we must first consider what 
 happens with the different policy classes: stationary/non-stationary and d
 eterministic/stochastic. Our main thesis\, which we illustrate via
  examples\, is that because the agent state is not an information state\,
  non-stationary agent-state based policies can outperform stationary
  ones. To leverage this
  feature\, we propose PASQL (periodic agent-state based Q-learning)\, whic
 h is a variant of agent-state-based Q-learning that learns periodic polici
 es. By combining ideas from periodic Markov chains and stochastic approxim
 ation\, we rigorously establish that PASQL converges to a cyclic limit and
  characterize the approximation error of the converged periodic policy. Fi
 nally\, we present a numerical experiment to highlight the salient feature
 s of PASQL and demonstrate the benefit of learning periodic policies over 
 stationary policies.\n\nAffiliation: Amit Sinha is a PhD candidate in the
  Department of Electrical and Computer Engineering\, McGill University.\n
DTSTART:20240704T140000Z
DTEND:20240704T150000Z
LOCATION:Zames Seminar Room\, MC 437\, McConnell Engineering Building\, CA\
 , QC\, Montreal\, H3A 0E9\, 3480 rue University
SUMMARY:Periodic agent-state based Q-learning for POMDPs
URL:/cim/channels/event/periodic-agent-state-based-q-l
 earning-pomdps-357836
END:VEVENT
END:VCALENDAR