<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<font face="Lucida Grande">Hi everyone,<br>
<br>
I wanted to share the initial reviews we received on our ICAPS
submission (which I've also attached). Based on the reviews, I
think the paper is unlikely to be accepted, so we are working to
see whether we can get some new results for an IJCAI submission.
We are making good progress on developing hierarchical learning
methods for AMDPs, but we need to (a) move to larger/more complex
domains, (b) develop some theoretical analysis (complexity,
correctness, convergence), and (c) work on more AMDP-specific
hierarchy learning techniques (right now we are using an
off-the-shelf method called HierGen that works well but may not
necessarily find the best hierarchy for an AMDP representation).
<br>
<br>
I'd be very interested to talk more about how this relates to the
work that's happening at Brown, and to hear any feedback/ideas you
might have about this work.<br>
<br>
Michael/Stephanie, could we maybe set up a time for the three of
us to have a teleconference? I'll be on vacation next week but
the week after that would be good. Possible times for me -- Mon
1/29 before 11:30am, between 1-2, or after 4pm; Wed 1/31 before
10am or after 2pm; Thu 2/1 between 11-1:30 or 3-4; Fri 2/2 any
time.<br>
<br>
BTW, these are the Brown students who are on this list. Please
let me know if anyone should be added or removed.<br>
<br>
<a class="moz-txt-link-abbreviated" href="mailto:carl_trimbach@brown.edu">carl_trimbach@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:christopher_grimm@brown.edu">christopher_grimm@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:david_abel@brown.edu">david_abel@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:dilip.arumugam@gmail.com">dilip.arumugam@gmail.com</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:edward_c_williams@brown.edu">edward_c_williams@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:jun_ki_lee@brown.edu">jun_ki_lee@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:kcaluru@brown.edu">kcaluru@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:lsw@brown.edu">lsw@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:lucas_lehnert@brown.edu">lucas_lehnert@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:melrose_roderick@brown.edu">melrose_roderick@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:miles_holland@brown.edu">miles_holland@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:nakul_gopalan@brown.edu">nakul_gopalan@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:oberlin@cs.brown.edu">oberlin@cs.brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:sam_saarinen@brown.edu">sam_saarinen@brown.edu</a><br>
<a class="moz-txt-link-abbreviated" href="mailto:siddharth_karamcheti@brown.edu">siddharth_karamcheti@brown.edu</a><br>
<br>
Marie<br>
</font>
<div class="moz-forward-container"><br>
<br>
-------- Forwarded Message --------
<table class="moz-email-headers-table" cellspacing="0"
cellpadding="0" border="0">
<tbody>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Subject:
</th>
<td>ICAPS 2018 review response (submission [*NUMBER*])</td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Date: </th>
<td>Thu, 11 Jan 2018 14:59:19 +0100</td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">From: </th>
<td>ICAPS 2018 <a class="moz-txt-link-rfc2396E" href="mailto:icaps2018@easychair.org"><icaps2018@easychair.org></a></td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">To: </th>
<td>Marie desJardins <a class="moz-txt-link-rfc2396E" href="mailto:mariedj@umbc.edu"><mariedj@umbc.edu></a></td>
</tr>
</tbody>
</table>
<br>
<br>
<pre>Dear Marie,

Thank you for your submission to ICAPS 2018. The ICAPS 2018 review
response period starts now and ends on January 13.

During this time, you will have access to the current state of your
reviews and have the opportunity to submit a response. Please keep in
mind the following during this process:

* Most papers have a so-called placeholder review, which was
necessary to give the discussion leaders access to the reviewer
discussion. Some of these reviews list questions that already came
up during the discussion and which you may address in your response,
but in all cases the (usually enthusiastic) scores are meaningless and
you should ignore them. Placeholder reviews are clearly indicated as
such in the review.

* Almost all papers have three reviews. Some may have four. A small
number of papers are missing one review. We hope to get that
review completed in the next day. We apologize for this.

* The deadline for entering a response is January 13th (at 11:59pm
UTC-12, i.e., anywhere in the world).

* Responses must be submitted through EasyChair.

* Responses are limited to 1000 words in total. You can only enter
one response, not one per review.

* You will not be able to change your response after it is submitted.

* The response must focus on any factual errors in the reviews and any
questions posed by the reviewers. Try to be as concise and to the
point as possible.

* The review response period is an opportunity to react to the
reviews, but not a requirement to do so. Thus, if you feel the reviews
are accurate and the reviewers have not asked any questions, then you
do not have to respond.

* The reviews are as submitted by the PC members, without much
coordination between them. Thus, there may be inconsistencies.
Furthermore, these are not the final versions of the reviews. The
reviews can later be updated to take into account the discussions at
the program committee meeting, and we may find it necessary to solicit
other outside reviews after the review response period.

* The program committee will read your responses carefully and
take this information into account during the discussions. On the
other hand, the program committee may not directly respond to your
responses in the final versions of the reviews.

The reviews on your paper are attached to this letter. To submit your
response, log on to the EasyChair Web page for ICAPS 2018 and
select your submission from the menu.
----------------------- REVIEW 1 ---------------------
PAPER: 46
TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
Significance: 2 (modest contribution or average impact)
Soundness: 3 (correct)
Scholarship: 3 (excellent coverage of related work)
Clarity: 3 (well written)
Reproducibility: 3 (authors describe the implementation and domains in sufficient detail)
Overall evaluation: 1 (weak accept)
Reviewer's confidence: 2 (medium)
Suitable for a demo?: 1 (no)
Nominate for Best Paper Award: 1 (no)
Nominate for Best Student Paper Award (if eligible): 1 (no)
[Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
[Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
[Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
[Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
----------- Review -----------
The paper proposes a method for learning abstract Markov decision processes (AMDPs) from demonstration trajectories and model-based reinforcement learning. Experiments show that the method is more effective than the baseline.
On the positive side, a complete method for learning AMDPs is given and is shown to work on the problems used in the experiments. The proposed model-based reinforcement learning method, based on R-MAX, is also shown to outperform the baseline R-MAXQ.
On the negative side, the method for learning the hierarchy, HierGen, is taken from prior work, leaving the adaptation of R-MAX to learn with a hierarchy as the main algorithmic novelty. No convergence proof for the learning method is provided, although it is empirically shown to outperform the baseline R-MAXQ. The experiments are done on toy problems, indicating that the method is probably not ready for more demanding practical problems.
Overall, I am inclined to vote weak accept. The problem is difficult, so I think that the work does represent progress, although it is not yet compelling.
----------------------- REVIEW 2 ---------------------
PAPER: 46
TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
Significance: 2 (modest contribution or average impact)
Soundness: 3 (correct)
Scholarship: 2 (relevant literature cited but could be expanded)
Clarity: 3 (well written)
Reproducibility: 3 (authors describe the implementation and domains in sufficient detail)
Overall evaluation: -1 (weak reject)
Reviewer's confidence: 4 (expert)
Suitable for a demo?: 2 (maybe)
Nominate for Best Paper Award: 1 (no)
Nominate for Best Student Paper Award (if eligible): 1 (no)
[Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
[Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
[Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
[Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
----------- Review -----------
The authors introduce a reinforcement learning algorithm for AMDPs that learns a hierarchical structure and a set of hierarchical models. To learn the hierarchical structure, they rely on an existing algorithm called HierGen. This algorithm extracts causal structure from a set of expert trajectories in a factored state environment.
While R-AMDP outperforms R-MAXQ on the two toy problems, I think there is a lot more work to do to show that R-AMDP is a good basis for developing more general algorithms. First, it would be nice to examine the computational complexity of R-AMDP (rather than just the empirical comparison in Figure 3). Second, what if R-AMDP is just getting lucky in the two toy tasks presented? Maybe there are other problems where R-AMDP performs poorly. Further, stopping the plots at 50 or 60 trials may be misleading, since R-AMDP could be converging to a suboptimal but pretty good policy early on. It’s also not clear that R-AMDP can be scaled to huge state or action spaces. Does the hierarchical structure discovered by HierGen lend itself to transfer when the dynamics change? It would be nice to have a more rigorous analysis of R-AMDP and a longer discussion of its potential pitfalls (when should we expect it to succeed and when should it fail?). There is a hint of this in the discussion about HierGen’s inability to distinguish between correlation and causation.
While reading the abstract, I expected the contribution to be in learning the hierarchy. The authors should probably change the abstract to avoid this confusion.
----------------------- REVIEW 3 ---------------------
PAPER: 46
TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
Significance: 3 (substantial contribution or strong impact)
Soundness: 3 (correct)
Scholarship: 3 (excellent coverage of related work)
Clarity: 3 (well written)
Reproducibility: 5 (code and domains (whichever apply) are already publicly available)
Overall evaluation: 3 (strong accept)
Reviewer's confidence: 4 (expert)
Suitable for a demo?: 3 (yes)
Nominate for Best Paper Award: 1 (no)
Nominate for Best Student Paper Award (if eligible): 1 (no)
[Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
[Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
[Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
[Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
----------- Review -----------
This is only a placeholder review. Please ignore it.
----------------------- REVIEW 4 ---------------------
PAPER: 46
TITLE: Learning Abstracted Models and Hierarchies of Markov Decision Processes
AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie Milani, Shane Parr and Marie desJardins
Significance: 2 (modest contribution or average impact)
Soundness: 2 (minor inconsistencies or small fixable errors)
Scholarship: 3 (excellent coverage of related work)
Clarity: 1 (hard to follow)
Reproducibility: 2 (some details missing but still appears to be replicable with some effort)
Overall evaluation: -1 (weak reject)
Reviewer's confidence: 3 (high)
Suitable for a demo?: 2 (maybe)
Nominate for Best Paper Award: 1 (no)
Nominate for Best Student Paper Award (if eligible): 1 (no)
[Applications track ONLY]: Importance and novelty of the application: 6 (N/A (not an Applications track paper))
[Applications track ONLY]: Importance of planning/scheduling technology to the solution of the problem: 5 (N/A (not an Applications track paper))
[Applications track ONLY] Maturity: 7 (N/A (not an Applications track paper))
[Robotics track ONLY]: Balance of Robotics and Automated Planning and Scheduling: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Evaluation on physical platforms/simulators: 6 (N/A (not a Robotics track paper))
[Robotics Track ONLY]: Significance of the contribution: 6 (N/A (not a Robotics track paper))
----------- Review -----------
The paper describes an approach for learning abstract models for hierarchies of AMDPs. These hierarchies are similar to, if not exactly the same as, those used by frameworks such as MAXQ, where each task in the hierarchy is an MDP with actions corresponding to child tasks. Prior AMDP work apparently uses hand-specified models of each task/AMDP, which are directly used for planning. This paper extends that work by learning the models of each task/AMDP. This is done using R-MAX at each task. There is no discussion of convergence guarantees of the approach. Apparently convergence must occur in a bottom-up way. Experiments are shown in two domains and with two hierarchies in one of the domains (Taxi). The approach appears to learn more efficiently than a prior approach, R-MAXQ. The exact reasons for the increased efficiency were not clear to me from the paper.
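As a rough illustration of the per-task learning loop described above, here is a minimal Python sketch of R-MAX-style model learning for a single task; the class and names are hypothetical, for illustration only, and not the paper's actual implementation:

from collections import defaultdict

class RMaxTaskModel:
    """R-MAX-style model for one task: act optimistically until each
    (state, action) pair has been tried at least m times."""

    def __init__(self, actions, m=5, r_max=1.0):
        self.actions = actions
        self.m = m                            # known-ness threshold
        self.r_max = r_max                    # optimistic reward bound
        self.counts = defaultdict(int)        # (s, a) -> visit count
        self.next_counts = defaultdict(int)   # (s, a, s') -> count
        self.reward_sum = defaultdict(float)  # (s, a) -> total reward

    def update(self, s, a, r, s_next):
        # Record one observed transition at this level of the hierarchy.
        self.counts[(s, a)] += 1
        self.next_counts[(s, a, s_next)] += 1
        self.reward_sum[(s, a)] += r

    def known(self, s, a):
        return self.counts[(s, a)] >= self.m

    def reward(self, s, a):
        # Unknown pairs get the optimistic R_max reward, which drives
        # the planner to explore them.
        if not self.known(s, a):
            return self.r_max
        return self.reward_sum[(s, a)] / self.counts[(s, a)]

    def transition_prob(self, s, a, s_next):
        if not self.known(s, a):
            # Model unknown pairs as an R_max self-loop, the standard
            # R-MAX optimism trick.
            return 1.0 if s_next == s else 0.0
        # Empirical transition probabilities once (s, a) is known.
        return self.next_counts[(s, a, s_next)] / self.counts[(s, a)]

In a hierarchy, one such model would presumably be kept per task, with a parent-level transition recorded whenever a child subtask terminates.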
The paper is well-written at a high level, but the more technical and formal descriptions could be improved quite a bit. For example, the key object, the AMDP, is only described informally (the tuple is not described in detail). Most of the paper is written quite informally. Another example is that Table 1 talks about "max planner rollouts", but I didn't see where rollouts are used anywhere in the algorithm description.
After reading the abstract and introduction, I expected that a big part of the contribution would be about actually learning the hierarchy. However, that does not seem to be the case. Rather, an off-the-shelf approach is used to learn hierarchies, which are then plugged into the proposed algorithm for learning the models of tasks. Further, this is only tried for one of the two experimental domains. The abstract and introduction should be clearer about the contributions of the paper.
Overall, I was unclear about what to take away from the paper. The main contribution is apparently Algorithm 1, which uses R-MAX to learn the models of each AMDP in a given hierarchy. Perhaps this is a novel algorithm, but it feels more like a baseline, in the sense that it is the first thing one might try given the problem setup. I may not be appreciating some type of complexity that makes this less straightforward than it appears. This baseline approach would have been more interesting if some form of convergence result were provided, similar to what was provided for R-MAXQ.
The experiments show that R-AMDP learns faster and is more computationally efficient than R-MAXQ. I was unable to get a good understanding of why this is the case, likely because I was not able to revisit the R-MAXQ algorithm, and it is not described in detail in this paper. The authors do try to explain the reasons for the performance improvement, but I was unable to follow them exactly. My best guess based on the discussion is that R-MAXQ does not try to exploit the state abstraction provided for each task by the hierarchy ("R-MAXQ must compute a model over all possible future states in a planning envelope after each action"). Is this the primary reason, or is there some other reason? Adding the ability to exploit abstractions in R-MAXQ seems straightforward, though maybe I'm missing something.
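The state-abstraction guess can be made concrete with a small sketch building on the hypothetical model above (again, illustrative names only): if each task pools experience over abstract states phi(s) rather than ground states, each (state, action) pair reaches the known-ness threshold with far fewer samples:

def make_abstract_update(model, phi):
    """Wrap a per-task R-MAX model so experience is pooled over
    abstract states phi(s) instead of ground states. Many ground
    states map to one abstract state, so each (abstract state,
    action) pair reaches the known-ness threshold m much sooner."""
    def update(s, a, r, s_next):
        model.update(phi(s), a, r, phi(s_next))
    return update

# Example (hypothetical state keys): in Taxi, a navigation subtask
# might only care about the taxi's position, ignoring the passenger
# and destination variables.
def phi(state):
    return (state["taxi_x"], state["taxi_y"])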
------------------------------------------------------
Best wishes,
Gabi Röger and Sven Koenig
ICAPS 2018 program chairs
</pre>
</div>
</body>
</html>