[Robot-learning] Fwd: ICAPS 2018 review response (submission [*NUMBER*])
Littman, Michael
mlittman at cs.brown.edu
Mon Jan 22 12:29:58 EST 2018
I can do 2/2 @ 11am (or several other times).
On Mon, Jan 22, 2018 at 11:38 AM, Marie desJardins <mariedj at umbc.edu> wrote:
> Fri 2/2 would be good for a phone call with Stefanie and Michael. (Does
> that work for both of you? -- if so, what time is good? I'm fairly
> unconstrained.)
>
> Fri 2/2 won't work for a larger group meeting, though -- John will be at
> AAAI. I'll be traveling on Fri 2/9 but John should be back by then, so
> maybe we could plan a joint group meeting that day -- do you still have
> your regular meetings on Fridays?
>
> Marie
>
>
> On 1/21/18 11:23 AM, Stefanie Tellex wrote:
>
> I agree, for after the winter deadlines.
>
> Stefanie
>
> On 01/21/2018 04:51 AM, Lawson Wong wrote:
>
> So have kcaluru at brown.edu and miles_holland at brown.edu.
>
> The reviews look like typical planning-community reviews -- generally
> sensible requests but clearly impossible to accomplish within the page
> limit. I guess it's generally hard to please planning reviewers unless
> there are some theoretical results. Review 2 actually reads a little like
> one that Nakul got for his paper...
>
> I don't know if Michael and Stefanie have answered separately regarding a
> meeting; it certainly sounds helpful to continue discussing (AMDP)
> hierarchy learning. Both the IJCAI and RSS deadlines are that week (1/31
> and 2/1, respectively), so if possible it may be best to meet after those
> deadlines, such as on Fri 2/2 -- unless the intent was to discuss before
> the IJCAI deadline.
>
> -Lawson
>
>
> On Sat, Jan 20, 2018 at 6:04 AM, Littman, Michael <mlittman at cs.brown.edu> wrote:
>
> christopher_grimm at brown.edu has graduated.
>
>
> On Fri, Jan 19, 2018 at 12:06 PM, Marie desJardins <mariedj at cs.umbc.edu> wrote:
>
> Hi everyone,
>
> I wanted to share the initial reviews we received on our ICAPS
> submission (which I've also attached). Based on the reviews, I
> think the paper is unlikely to be accepted, so we are working to
> see whether we can get some new results for an IJCAI submission.
> We are making good progress on developing hierarchical learning
> methods for AMDPs but we need to (a) move to larger/more complex
> domains, (b) develop some theoretical analysis (complexity,
> correctness, convergence), and (c) work on more AMDP-specific
> hierarchy learning techniques (right now we are using an
> off-the-shelf method called HierGen that works well but may not
> necessarily find the best hierarchy for an AMDP representation).
>
> I'd be very interested to talk more about how this relates to
> the work that's happening at Brown, and to hear any
> feedback/ideas you might have about this work.
>
> Michael/Stefanie, could we maybe set up a time for the three of
> us to have a teleconference? I'll be on vacation next week but
> the week after that would be good. Possible times for me -- Mon
> 1/29 before 11:30am, between 1-2, or after 4pm; Wed 1/31 before
> 10am or after 2pm; Thu 2/1 between 11-1:30 or 3-4; Fri 2/2 any
> time.
>
> BTW, these are the Brown students who are on this list. Please
> let me know if anyone should be added or removed.
>
> carl_trimbach at brown.edu
> christopher_grimm at brown.edu
> david_abel at brown.edu
> dilip.arumugam at gmail.com
> edward_c_williams at brown.edu
> jun_ki_lee at brown.edu
> kcaluru at brown.edu
> lsw at brown.edu
> lucas_lehnert at brown.edu
> melrose_roderick at brown.edu
> miles_holland at brown.edu
> nakul_gopalan at brown.edu
> oberlin at cs.brown.edu
> sam_saarinen at brown.edu
> siddharth_karamcheti at brown.edu
>
> Marie
>
>
> -------- Forwarded Message --------
> Subject: ICAPS 2018 review response (submission [*NUMBER*])
> Date: Thu, 11 Jan 2018 14:59:19 +0100
> From: ICAPS 2018 <icaps2018 at easychair.org>
> To: Marie desJardins <mariedj at umbc.edu>
>
>
>
> Dear Marie,
>
> Thank you for your submission to ICAPS 2018. The ICAPS 2018 review
> response period starts now and ends on January 13.
>
> During this time, you will have access to the current state of your
> reviews and have the opportunity to submit a response. Please keep in
> mind the following during this process:
>
> * Most papers have a so-called placeholder review, which was
>   necessary to give the discussion leaders access to the reviewer
>   discussion. Some of these reviews list questions that already came
>   up during the discussion, which you may address in your response;
>   in all cases, the (usually enthusiastic) scores are meaningless and
>   you should ignore them. Placeholder reviews are clearly indicated
>   as such in the review.
>
> * Almost all papers have three reviews. Some may have four. A very
>   low number of papers are missing one review. We hope to get that
>   review completed in the next day. We apologize for this.
>
> * The deadline for entering a response is January 13th (at 11:59pm
>   UTC-12, i.e. anywhere in the world).
>
> * Responses must be submitted through EasyChair.
>
> * Responses are limited to 1000 words in total. You can only enter
>   one response, not one per review.
>
> * You will not be able to change your response after it is
>   submitted.
>
> * The response must focus on any factual errors in the reviews and
>   any questions posed by the reviewers. Try to be as concise and as
>   to the point as possible.
>
> * The review response period is an opportunity to react to the
>   reviews, but not a requirement to do so. Thus, if you feel the
>   reviews are accurate and the reviewers have not asked any
>   questions, then you do not have to respond.
>
> * The reviews are as submitted by the PC members, without much
>   coordination between them. Thus, there may be inconsistencies.
>   Furthermore, these are not the final versions of the reviews. The
>   reviews can later be updated to take into account the discussions
>   at the program committee meeting, and we may find it necessary to
>   solicit other outside reviews after the review response period.
>
> * The program committee will read your responses carefully and take
>   this information into account during the discussions. On the other
>   hand, the program committee may not directly respond to your
>   responses in the final versions of the reviews.
>
> The reviews on your paper are attached to this letter. To submit
> your response, log on to the EasyChair Web page for ICAPS 2018 and
> select your submission on the menu.
>
> ----------------------- REVIEW 1 ---------------------
> PAPER: 46
> TITLE: Learning Abstracted Models and Hierarchies of Markov
> Decision Processes
> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
> Milani, Shane Parr and Marie desJardins
>
> Significance: 2 (modest contribution or average impact)
> Soundness: 3 (correct)
> Scholarship: 3 (excellent coverage of related work)
> Clarity: 3 (well written)
> Reproducibility: 3 (authors describe the implementation and
> domains in sufficient detail)
> Overall evaluation: 1 (weak accept)
> Reviewer's confidence: 2 (medium)
> Suitable for a demo?: 1 (no)
> Nominate for Best Paper Award: 1 (no)
> Nominate for Best Student Paper Award (if eligible): 1 (no)
> [Applications track ONLY]: Importance and novelty of the
> application: 6 (N/A (not an Applications track paper))
> [Applications track ONLY]: Importance of planning/scheduling
> technology to the solution of the problem: 5 (N/A (not an Applications
> track paper))
> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
> track paper))
> [Robotics track ONLY]: Balance of Robotics and Automated Planning
> and Scheduling: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Evaluation on physical
> platforms/simulators: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
> (not a Robotics track paper))
>
> ----------- Review -----------
> The paper proposes a method for learning abstract Markov decision
> processes (AMDPs) from demonstration trajectories and model-based
> reinforcement learning. Experiments show that the method is more
> effective than the baseline.
>
> On the positive side, a complete method for learning AMDPs is given
> and is shown to work on the problems used in the experiments. The
> proposed model-based reinforcement learning method based on R-MAX is
> also shown to outperform the baseline R-MAXQ.
>
> On the negative side, the method for learning the hierarchy,
> HierGen, is taken from prior work, leaving the adaptation of R-MAX to
> learn with a hierarchy as the main algorithmic novelty. No convergence
> proof for the learning method is provided, although it is empirically
> shown to outperform the baseline R-MAXQ. The experiments are done on toy
> problems, indicating that the method is probably not ready for more
> demanding practical problems.
>
> Overall, I am inclined to vote weak accept. The problem is
> difficult, so I think that the work does represent progress, although it is
> not yet compelling.
>
> ----------------------- REVIEW 2 ---------------------
> PAPER: 46
> TITLE: Learning Abstracted Models and Hierarchies of Markov
> Decision Processes
> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
> Milani, Shane Parr and Marie desJardins
>
> Significance: 2 (modest contribution or average impact)
> Soundness: 3 (correct)
> Scholarship: 2 (relevant literature cited but could be expanded)
> Clarity: 3 (well written)
> Reproducibility: 3 (authors describe the implementation and
> domains in sufficient detail)
> Overall evaluation: -1 (weak reject)
> Reviewer's confidence: 4 (expert)
> Suitable for a demo?: 2 (maybe)
> Nominate for Best Paper Award: 1 (no)
> Nominate for Best Student Paper Award (if eligible): 1 (no)
> [Applications track ONLY]: Importance and novelty of the
> application: 6 (N/A (not an Applications track paper))
> [Applications track ONLY]: Importance of planning/scheduling
> technology to the solution of the problem: 5 (N/A (not an Applications
> track paper))
> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
> track paper))
> [Robotics track ONLY]: Balance of Robotics and Automated Planning
> and Scheduling: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Evaluation on physical
> platforms/simulators: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
> (not a Robotics track paper))
>
> ----------- Review -----------
> The authors introduce a reinforcement learning algorithm for AMDPs
> that learns a hierarchical structure and a set of hierarchical models. To
> learn the hierarchical structure, they rely on an existing algorithm called
> HierGen. This algorithm extracts causal structure from a set of expert
> trajectories in a factored state environment.
>
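> To make this concrete, here is a toy sketch of the kind of trajectory
> analysis involved (my own illustration with made-up names; the actual
> HierGen algorithm performs a much finer-grained causal analysis than
> this):
>
>     from collections import defaultdict
>
>     def action_effects(trajectories):
>         """Record which state variables each action ever changes.
>
>         A crude stand-in for the structure-extraction step: each
>         trajectory is a list of (state, action, next_state) triples,
>         with states given as dicts from variable name to value. Any
>         variable that changes when an action is taken is treated as
>         an effect of that action.
>         """
>         effects = defaultdict(set)
>         for trajectory in trajectories:
>             for state, action, next_state in trajectory:
>                 for var, value in state.items():
>                     if next_state.get(var) != value:
>                         effects[action].add(var)
>         return dict(effects)
>
>     # In Taxi, e.g., 'north' would map to {'taxi_y'} and 'pickup' to
>     # {'in_taxi'}. Note that a variable that merely co-varies with an
>     # action is also picked up -- the correlation/causation issue
>     # mentioned below.
>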
> While R-AMDP outperforms R-MAXQ on the two toy problems, I think
> there is a lot more work to do to show that R-AMDP is a good basis for
> developing more general algorithms. First, it would be nice to examine
> the computational complexity of R-AMDP (rather than just the empirical
> comparison in Figure 3). Second, what if R-AMDP is just getting lucky on
> the two toy tasks presented? Maybe there are other problems where R-AMDP
> performs poorly. Further, stopping the plots at 50 or 60 trials may be
> misleading, since R-AMDP could be converging early on to a suboptimal
> but pretty good policy. It's also not clear that R-AMDP can be scaled to
> huge state or action spaces. Does the hierarchical structure discovered
> by HierGen lend itself to transfer when the dynamics change? It would be
> nice to have a more rigorous analysis of R-AMDP and a longer discussion
> of its potential pitfalls (when should we expect it to succeed and when
> should it fail?). There is a hint of this in the discussion of HierGen's
> inability to distinguish between correlation and causation.
>
> While reading the abstract, I expected the contribution to be in
> learning the hierarchy. The authors should probably revise the abstract
> to avoid this confusion.
>
> ----------------------- REVIEW 3 ---------------------
> PAPER: 46
> TITLE: Learning Abstracted Models and Hierarchies of Markov
> Decision Processes
> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
> Milani, Shane Parr and Marie desJardins
>
> Significance: 3 (substantial contribution or strong impact)
> Soundness: 3 (correct)
> Scholarship: 3 (excellent coverage of related work)
> Clarity: 3 (well written)
> Reproducibility: 5 (code and domains (whichever apply) are already
> publicly available)
> Overall evaluation: 3 (strong accept)
> Reviewer's confidence: 4 (expert)
> Suitable for a demo?: 3 (yes)
> Nominate for Best Paper Award: 1 (no)
> Nominate for Best Student Paper Award (if eligible): 1 (no)
> [Applications track ONLY]: Importance and novelty of the
> application: 6 (N/A (not an Applications track paper))
> [Applications track ONLY]: Importance of planning/scheduling
> technology to the solution of the problem: 5 (N/A (not an Applications
> track paper))
> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
> track paper))
> [Robotics track ONLY]: Balance of Robotics and Automated Planning
> and Scheduling: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Evaluation on physical
> platforms/simulators: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
> (not a Robotics track paper))
>
> ----------- Review -----------
> This is only a placeholder review. Please ignore it.
>
> ----------------------- REVIEW 4 ---------------------
> PAPER: 46
> TITLE: Learning Abstracted Models and Hierarchies of Markov
> Decision Processes
> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
> Milani, Shane Parr and Marie desJardins
>
> Significance: 2 (modest contribution or average impact)
> Soundness: 2 (minor inconsistencies or small fixable errors)
> Scholarship: 3 (excellent coverage of related work)
> Clarity: 1 (hard to follow)
> Reproducibility: 2 (some details missing but still appears to be
> replicable with some effort)
> Overall evaluation: -1 (weak reject)
> Reviewer's confidence: 3 (high)
> Suitable for a demo?: 2 (maybe)
> Nominate for Best Paper Award: 1 (no)
> Nominate for Best Student Paper Award (if eligible): 1 (no)
> [Applications track ONLY]: Importance and novelty of the
> application: 6 (N/A (not an Applications track paper))
> [Applications track ONLY]: Importance of planning/scheduling
> technology to the solution of the problem: 5 (N/A (not an Applications
> track paper))
> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
> track paper))
> [Robotics track ONLY]: Balance of Robotics and Automated Planning
> and Scheduling: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Evaluation on physical
> platforms/simulators: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
> (not a Robotics track paper))
>
> ----------- Review -----------
> The paper describes an approach for learning abstract models and
> hierarchies for hierarchies of AMDPs. These hierarchies are similar, if not
> exactly the same, as those used by frameworks such as MAXQ, where each task
> in the hierarchy is an MDP with actions corresponding to child tasks. Prior
> AMDP work apparently uses hand-specified models of each task/AMDP, which
> are directly used for planning. This paper extends that work by learning
> the models of each task/AMDP. This is done using RMAX at each task. There
> is not a discussion of convergence guarantees of the approach. Apparently
> convergence must occur in a bottom-up way. Experiments are shown in two
> domains and with two hierarchies in one of the domains (Taxi). The approach
> appears to learn more efficiently than a prior approach R-MAXQ. The exact
> reasons for the increased efficiency were not exactly clear based on my
> understanding from the paper.
>
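> To state my reading concretely, here is a rough sketch of a generic
> count-based R-MAX model estimator of the kind I understand each
> task/AMDP to use (the class name, the threshold m, and r_max are my own
> illustrative choices, not the paper's actual implementation;
> hierarchy-specific details such as child-task termination are omitted):
>
>     from collections import defaultdict
>
>     class RMaxTaskModel:
>         """Count-based transition/reward model for one task in the
>         hierarchy. A (state, action) pair tried fewer than m times is
>         "unknown" and treated as maximally rewarding, which is what
>         drives exploration in R-MAX."""
>
>         def __init__(self, m=5, r_max=1.0):
>             self.m = m                      # known-ness threshold
>             self.r_max = r_max              # optimistic reward
>             self.counts = defaultdict(int)  # (s, a) -> visit count
>             self.next_counts = defaultdict(lambda: defaultdict(int))
>             self.reward_sums = defaultdict(float)
>
>         def update(self, s, a, r, s_next):
>             """Record one experienced transition."""
>             self.counts[(s, a)] += 1
>             self.next_counts[(s, a)][s_next] += 1
>             self.reward_sums[(s, a)] += r
>
>         def model(self, s, a):
>             """Return (expected reward, {s': probability}) for planning."""
>             n = self.counts[(s, a)]
>             if n < self.m:
>                 # Unknown pairs look optimally rewarding and self-looping.
>                 return self.r_max, {s: 1.0}
>             probs = {s2: c / n for s2, c in self.next_counts[(s, a)].items()}
>             return self.reward_sums[(s, a)] / n, probs
>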
> The paper is well-written at a high level, but the more technical
> and formal descriptions could be improved quite a bit. For example, the
> key object, the AMDP, is only described informally (the tuple is not
> described in detail). Most of the paper is written quite informally.
> Another example is that Table 1 talks about "max planner rollouts", but
> I didn't see where rollouts are used anywhere in the algorithm
> description.
>
> After reading the abstract and introduction, I expected that a big
> part of the contribution would be about actually learning the
> hierarchy. However, that does not seem to be the case. Rather, an
> off-the-shelf approach is used to learn hierarchies and is then plugged
> into the proposed algorithm for learning the models of tasks. Further,
> this is only tried in one of the two experimental domains. The abstract
> and introduction should be clearer about the contributions of the paper.
>
> Overall, I was unclear about what to learn from the paper. The
> main contribution is apparently Algorithm 1, which uses R-MAX to learn
> the models of each AMDP in a given hierarchy. Perhaps this is a novel
> algorithm, but it feels like more of a baseline, in the sense that it is
> the first thing one might try given the problem setup. I may not be
> appreciating some kind of complexity that makes this less
> straightforward. This baseline approach would have been more interesting
> if some form of convergence result were provided, similar to what was
> provided for R-MAXQ.
>
>
> The experiments show that R-AMDP learns faster and is more
> computationally efficient than R-MAXQ. I was unable to get a good
> understanding of why this is the case, likely because R-MAXQ is not
> described in detail in this paper and I was not able to revisit the
> original algorithm. The authors do try to explain the reasons for the
> performance improvement, but I was unable to follow them exactly. My
> best guess based on the discussion is that R-MAXQ does not exploit the
> state abstraction provided for each task by the hierarchy ("R-MAXQ must
> compute a model over all possible future states in a planning envelope
> after each action"). Is this the primary reason, or is there some other
> reason? Adding the ability to exploit abstractions in R-MAXQ seems
> straightforward, though maybe I'm missing something.
>
> ------------------------------------------------------
>
> Best wishes,
> Gabi Röger and Sven Koenig
> ICAPS 2018 program chairs
>
>
> --
> Dr. Marie desJardins
> Associate Dean for Academic Affairs
> College of Engineering and Information Technology
> University of Maryland, Baltimore County
> 1000 Hilltop Circle
> Baltimore MD 21250
>
> Email: mariedj at umbc.edu
> Voice: 410-455-3967
> Fax: 410-455-3559
>
> _______________________________________________
> Robot-learning mailing list
> Robot-learning at cs.umbc.edu
> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>
>