<html><strong>PhD position in Epistemic Reasoning and Learning<br />Institut de Recherche en Informatique de Toulouse (IRIT)<br />Toulouse University<br />France</strong><br /><br />The proposed PhD thesis will be developed in the context of the ANR project entitled “Learning through epistemic reinforcement” (EpiRL), which was accepted in July 2022 and will be carried out between 2023 and 2027. The PhD thesis will start by the end of 2023 and will be funded on a three-year contract with a gross salary of approximately 2000€ per month.<br /><br /><strong>Description of the research project</strong><br />The need for an integration of machine learning (ML) and knowledge representation (KR) has been widely emphasized in the artificial intelligence (AI) community. According to Valiant (2003), a key challenge for computer science is to come up with an integration of the two most fundamental phenomena of intelligence, namely, the ability to learn from experience and the ability to reason from what has been learned. The project will focus on the integration of epistemic reasoning and reinforcement learning. The integration will address two aspects:<br />- how to include a representation of the agents’ knowledge and beliefs in the description of a state, as used in an agent’s reward function and in the state transition function;<br />- how to combine an agent’s capacity to attribute beliefs to other agents and to reason strategically with its capacity to form predictions about future events and other agents’ future actions based on its past experience.<br />To this aim, we plan to combine concepts and methods from epistemic logic and planning (Fagin et al., 1995; Bolander & Andersen, 2011; Bolander et al., 2015), single-agent reinforcement learning (Sutton & Barto, 2018), multi-agent reinforcement learning (Fudenberg & Levine, 1998; Tuyls & Weiss, 2012), and the epistemic theory of convention (Lewis, 1969). We expect the kind of integration proposed in the project to be relevant for AI applications in human-machine interaction, given the importance of combining reasoning and learning as well as prediction and explanation for such applications.<br /><br /><strong>Integration</strong><br />The traditional models used in reinforcement learning are Markov decision processes (MDPs) and their multi-agent extension, so-called Markov games. The first aspect of integration will consist in adding information about the agents’ epistemic attitudes (i.e., their knowledge and beliefs) to the state description in an MDP representing the interaction between the agents. The importance of this integration is evident in the context of conversational interaction, in which agents interact through communication. In such a context, the result of an agent’s communicative action depends not only on the properties of the environment but also on the interlocutor’s cognitive state. Therefore, the probabilistic state transition function in the MDP should take the latter into consideration in order to improve and speed up the agents’ learning. For example, an agent <em>a</em> (Ann) might want to persuade another agent <em>b</em> (Bob) that cycling to work is better than driving. The first agent has different persuasive strategies at its disposal. It could leverage the fact that using the bike is less expensive than using the car, or emphasize the health benefits of cycling. It could also invoke ethical reasons (e.g., pollution reduction).
The success of agent <em>a</em>’s communicative action in persuading agent <em>b</em> depends on agent <em>a</em>’s beliefs about the actions’ executability preconditions (e.g., whether there are bike lanes to get from home to work) and on agent <em>b</em>’s preference ordering over the different outcomes (e.g., economic benefit, health benefit, environmental protection). The second aspect of integration will consist in exploiting an agent’s inferential capability to discard those actions that, according to the agent’s beliefs (i.e., what the agent can infer), would violate certain norms or constraints if they were executed in a given state. The agent will not need to learn the value of those actions; it will simply exclude them from the action selection process. In order to achieve the integration at the implementation level, and not simply at the conceptual level, we will combine a reinforcement learning algorithm such as Q-learning with a decision procedure for satisfiability checking or model checking for the logical language used to represent the agents’ epistemic attitudes.
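<br /><br />As a purely illustrative sketch of this last point, the following Python fragment shows epsilon-greedy Q-learning in which, at every step, actions are first filtered by an inference step over the agent’s beliefs. It rests on simplifying assumptions that are not part of the project description: a hypothetical deterministic grid environment stands in for the MDP, and a simple membership test on a set of believed facts (<code>violates_constraint</code>) stands in for the satisfiability or model-checking procedure; all names are illustrative.<br />
<pre><code>import random
from collections import defaultdict

class Grid:
    """Hypothetical 2 x 3 grid world; the goal is the bottom-right cell (1, 2)."""
    actions = ("up", "down", "left", "right")

    def reset(self):
        return (0, 0)

    def step(self, state, action):
        row, col = state
        if action == "up":
            row = max(row - 1, 0)
        elif action == "down":
            row = min(row + 1, 1)
        elif action == "left":
            col = max(col - 1, 0)
        elif action == "right":
            col = min(col + 1, 2)
        nxt = (row, col)
        done = nxt == (1, 2)
        return nxt, (1.0 if done else -0.1), done

def violates_constraint(belief_base, env, state, action):
    # Stand-in for a satisfiability / model-checking call on the epistemic
    # language: the action is discarded if the agent's beliefs entail that
    # executing it in this state leads to a cell believed to be forbidden.
    # (The toy transition function is deterministic and known to the agent.)
    nxt, _, _ = env.step(state, action)
    return "forbidden" + str(nxt) in belief_base

def admissible(belief_base, env, state):
    acts = [a for a in env.actions if not violates_constraint(belief_base, env, state, a)]
    return acts or list(env.actions)  # fall back if everything were pruned

def q_learning(env, belief_base, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(100):  # step cap per episode
            acts = admissible(belief_base, env, state)
            # Epsilon-greedy choice restricted to admissible actions: no value
            # is ever learned for actions pruned by epistemic inference.
            if random.random() >= eps:
                action = max(acts, key=lambda a: Q[(state, a)])
            else:
                action = random.choice(acts)
            nxt, reward, done = env.step(state, action)
            target = reward if done else reward + gamma * max(
                Q[(nxt, a)] for a in admissible(belief_base, env, nxt))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
            if done:
                break
    return Q

# The agent believes cell (0, 1) is forbidden, never tries to enter it,
# and learns the detour through the bottom row instead.
Q = q_learning(Grid(), belief_base={"forbidden(0, 1)"})
</code></pre>
The only point of the sketch is the placement of the inference step: admissible actions are computed before action selection, so no Q-value ever needs to be estimated for actions the agent can already rule out by reasoning.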
<br /><br /><strong>Methodology</strong><br />The integration proposed in the project will rely on a formal language for representing the epistemic attitudes of agents. Such a language will be interpreted on a semantics that exploits knowledge bases (Lorini 2018, 2019, 2020; Lorini & Romero 2019; Lorini & Song 2023; Lorini & Rapion 2022). It will be used for specifying the agents’ communicative goals and actions as well as the information about the agents’ knowledge and beliefs to be used for describing a state in an MDP. The major advantage of the knowledge base semantics is its succinctness, which makes it well-suited for formal verification and planning specification in real applications. It has been successfully used in the ANR project CoPains on “Cognitive Planning in Persuasive Multimodal Communication” (2019-2023, https://www.irit.fr/CoPains/), in which a planning module for a virtual conversational agent was proposed (Fernandez Davila et al. 2021, 2022; Lorini et al. 2022).<br /><br /><strong>References</strong><br />
T. Bolander, M. B. Andersen (2011). Epistemic planning for single- and multi-agent systems. Journal of Applied Non-Classical Logics, 21(1):9–34.<br />
T. Bolander, M. Holm Jensen, F. Schwarzentruber (2015). Complexity Results in Epistemic Planning. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), AAAI Press, pp. 2791–2797.<br />
J. L. Fernandez Davila, D. Longin, E. Lorini, F. Maris (2021). A Simple Framework for Cognitive Planning. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21), AAAI Press, pp. 6331–6339.<br />
J. L. Fernandez Davila, D. Longin, E. Lorini, F. Maris (2022). An Implemented System for Cognitive Planning. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence (ICAART 2022), SCITEPRESS, pp. 492–499.<br />
R. Fagin, J. Y. Halpern, Y. Moses, M. Vardi (1995). Reasoning about Knowledge. MIT Press, Cambridge.<br />
D. Fudenberg, D. K. Levine (1998). The Theory of Learning in Games. MIT Press, Cambridge.<br />
D. K. Lewis (1969). Convention: A Philosophical Study. Harvard University Press, Cambridge.<br />
E. Lorini (2020). Rethinking epistemic logic with belief bases. Artificial Intelligence, 282.<br />
E. Lorini (2018). In Praise of Belief Bases: Doing Epistemic Logic without Possible Worlds. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18), AAAI Press, pp. 1915–1922.<br />
E. Lorini, E. Perrotin, F. Schwarzentruber (2022). Epistemic Actions: Comparing Multi-agent Belief Bases with Action Models. In Proceedings of the 19th International Conference on Principles of Knowledge Representation and Reasoning (KR 2022).<br />
E. Lorini, N. Sabouret, B. Ravenet, J. Fernandez Davila, C. Clavel (2022). Cognitive Planning in Motivational Interviewing. In Proceedings of the 14th International Conference on Agents and Artificial Intelligence (ICAART 2022), SCITEPRESS, pp. 508–517.<br />
E. Lorini, P. Song (2023). A Computationally Grounded Logic of Awareness. Journal of Logic and Computation, https://doi.org/10.1093/logcom/exac035.<br />
E. Lorini, F. Romero (2019). Decision Procedures for Epistemic Logic Exploiting Belief Bases. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2019), ACM, pp. 944–952.<br />
E. Lorini, E. Rapion (2022). Logical Theories of Collective Attitudes and the Belief Base Perspective. In Proceedings of the 21st International Conference on Autonomous Agents and MultiAgent Systems (AAMAS 2022), IFAAMAS, pp. 833–841.<br />
R. S. Sutton, A. G. Barto (2018). Reinforcement Learning: An Introduction (second edition). MIT Press, Cambridge.<br />
K. Tuyls, G. Weiss (2012). Multiagent Learning: Basics, Challenges, and Prospects. AI Magazine, 33(3):41.<br />
L. G. Valiant (2003). Three Problems in Computer Science. Journal of the ACM, 50(1):96–99.<br /><br /><strong>Candidate profile</strong><br />The PhD is at the intersection of logic, game theory, and machine learning. The ideal candidate should have a strong mathematical background and a master’s degree in Logic, Computer Science, or Mathematics. Ideally, they should be familiar with propositional logic, epistemic and temporal logics, as well as with basic notions of game theory and machine learning.<br /><br /><strong>PhD supervisor</strong><br />The PhD supervisor is Emiliano Lorini, CNRS research director at the Institut de Recherche en Informatique de Toulouse (IRIT). See https://www.irit.fr/~Emiliano.Lorini/ for more information.<br /><br /><strong>How to apply</strong><br />Please email your detailed CV, a motivation letter, and transcripts of your bachelor’s and master’s degrees to Emiliano.Lorini@irit.fr. Samples of the candidate’s published research and reference letters will be a plus.<br /><br /><strong>APPLICATION DEADLINE FOR FULL CONSIDERATION: October 20th, 2023.</strong><br /></html>