[Robot-learning] Fwd: ICAPS 2018 review response (submission [*NUMBER*])
Littman, Michael
mlittman at cs.brown.edu
Mon Jan 22 12:29:58 EST 2018
I can do 2/2 @ 11am (or several other times).
On Mon, Jan 22, 2018 at 11:38 AM, Marie desJardins <mariedj at umbc.edu> wrote:
> Fri 2/2 would be good for a phone call with Stefanie and Michael. (Does
> that work for both of you? -- if so, what time is good? I'm fairly
> unconstrained.)
>
> Fri 2/2 won't work for a larger group meeting, though -- John will be at
> AAAI. I'll be traveling on Fri 2/9 but John should be back by then, so
> maybe we could plan a joint group meeting that day -- do you still have
> your regular meetings on Fridays?
>
> Marie
>
>
> On 1/21/18 11:23 AM, Stefanie Tellex wrote:
>
> I agree, for after the winter deadlines.
>
> Stefanie
>
> On 01/21/2018 04:51 AM, Lawson Wong wrote:
>
> So have kcaluru at brown.edu and miles_holland at brown.edu.
>
> The reviews look like typical planning-community reviews -- generally
> sensible requests but clearly impossible to accomplish within the page
> limit. I guess it's generally hard to please planning reviewers unless
> there are some theoretical results. Review 2 actually reads a little like
> one that Nakul got for his paper...
>
> I don't know if Michael and Stefanie have answered separately regarding a
> meeting; it certainly sounds helpful to continue discussing (AMDP)
> hierarchy learning. Both the IJCAI and RSS deadlines are that week (1/31
> and 2/1, respectively), so if possible it may be best to meet after those
> deadlines, such as on Fri 2/2 -- unless the intent was to discuss before
> the IJCAI deadline.
>
> -Lawson
>
>
> On Sat, Jan 20, 2018 at 6:04 AM, Littman, Michael <mlittman at cs.brown.edu> wrote:
>
> christopher_grimm at brown.edu has graduated.
>
>
> On Fri, Jan 19, 2018 at 12:06 PM, Marie desJardins <mariedj at cs.umbc.edu> wrote:
>
> Hi everyone,
>
> I wanted to share the initial reviews we received on our ICAPS
> submission (which I've also attached). Based on the reviews, I
> think the paper is unlikely to be accepted, so we are working to
> see whether we can get some new results for an IJCAI submission.
> We are making good progress on developing hierarchical learning
> methods for AMDPs but we need to (a) move to larger/more complex
> domains, (b) develop some theoretical analysis (complexity,
> correctness, convergence), and (c) work on more AMDP-specific
> hierarchy learning techniques (right now we are using an
> off-the-shelf method called HierGen that works well but may not
> necessarily find the best hierarchy for an AMDP representation).
>
> I'd be very interested to talk more about how this relates to
> the work that's happening at Brown, and to hear any
> feedback/ideas you might have about this work.
>
> Michael/Stefanie, could we maybe set up a time for the three of
> us to have a teleconference? I'll be on vacation next week but
> the week after that would be good. Possible times for me -- Mon
> 1/29 before 11:30am, between 1-2, or after 4pm; Wed 1/31 before
> 10am or after 2pm; Thu 2/1 between 11-1:30 or 3-4; Fri 2/2 any
> time.
>
> BTW, these are the Brown students who are on this list. Please
> let me know if anyone should be added or removed.
>
> carl_trimbach at brown.edu
> christopher_grimm at brown.edu
> david_abel at brown.edu
> dilip.arumugam at gmail.com
> edward_c_williams at brown.edu
> jun_ki_lee at brown.edu
> kcaluru at brown.edu
> lsw at brown.edu
> lucas_lehnert at brown.edu
> melrose_roderick at brown.edu
> miles_holland at brown.edu
> nakul_gopalan at brown.edu
> oberlin at cs.brown.edu
> sam_saarinen at brown.edu
> siddharth_karamcheti at brown.edu
>
> Marie
>
>
> -------- Forwarded Message --------
> Subject: ICAPS 2018 review response (submission [*NUMBER*])
> Date: Thu, 11 Jan 2018 14:59:19 +0100
> From: ICAPS 2018 <icaps2018 at easychair.org>
> To: Marie desJardins <mariedj at umbc.edu>
>
>
>
> Dear Marie,
>
> Thank you for your submission to ICAPS 2018. The ICAPS 2018 review
> response period starts now and ends on January 13.
>
> During this time, you will have access to the current state of your
> reviews and have the opportunity to submit a response. Please keep in
> mind the following during this process:
>
> * Most papers have a so-called placeholder review, which was
>   necessary to give the discussion leaders access to the reviewer
>   discussion. Some of these reviews list questions that already came
>   up during the discussion, which you may address in your response;
>   in all cases, the (usually enthusiastic) scores are meaningless and
>   you should ignore them. Placeholder reviews are clearly indicated
>   as such in the review.
>
> * Almost all papers have three reviews. Some may have four. A very
>   low number of papers are missing one review. We hope to get that
>   review completed in the next day. We apologize for this.
>
> * The deadline for entering a response is January 13th (at 11:59pm
>   UTC-12, i.e. anywhere in the world).
>
> * Responses must be submitted through EasyChair.
>
> * Responses are limited to 1000 words in total. You can only enter
>   one response, not one per review.
>
> * You will not be able to change your response after it is
>   submitted.
>
> * The response must focus on any factual errors in the reviews and
>   any questions posed by the reviewers. Try to be as concise and as
>   to the point as possible.
>
> * The review response period is an opportunity to react to the
>   reviews, but not a requirement to do so. Thus, if you feel the
>   reviews are accurate and the reviewers have not asked any
>   questions, then you do not have to respond.
>
> * The reviews are as submitted by the PC members, without much
>   coordination between them. Thus, there may be inconsistencies.
>   Furthermore, these are not the final versions of the reviews. The
>   reviews can later be updated to take into account the discussions
>   at the program committee meeting, and we may find it necessary to
>   solicit other outside reviews after the review response period.
>
> * The program committee will read your responses carefully and take
>   this information into account during the discussions. On the other
>   hand, the program committee may not directly respond to your
>   responses in the final versions of the reviews.
>
> The reviews on your paper are attached to this letter. To submit
> your response, log on to the EasyChair Web page for ICAPS 2018 and
> select your submission on the menu.
>
> ----------------------- REVIEW 1 ---------------------
> PAPER: 46
> TITLE: Learning Abstracted Models and Hierarchies of Markov
> Decision Processes
> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
> Milani, Shane Parr and Marie desJardins
>
> Significance: 2 (modest contribution or average impact)
> Soundness: 3 (correct)
> Scholarship: 3 (excellent coverage of related work)
> Clarity: 3 (well written)
> Reproducibility: 3 (authors describe the implementation and
> domains in sufficient detail)
> Overall evaluation: 1 (weak accept)
> Reviewer's confidence: 2 (medium)
> Suitable for a demo?: 1 (no)
> Nominate for Best Paper Award: 1 (no)
> Nominate for Best Student Paper Award (if eligible): 1 (no)
> [Applications track ONLY]: Importance and novelty of the
> application: 6 (N/A (not an Applications track paper))
> [Applications track ONLY]: Importance of planning/scheduling
> technology to the solution of the problem: 5 (N/A (not an Applications
> track paper))
> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
> track paper))
> [Robotics track ONLY]: Balance of Robotics and Automated Planning
> and Scheduling: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Evaluation on physical
> platforms/simulators: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
> (not a Robotics track paper))
>
> ----------- Review -----------
> The paper proposes a method for learning abstract Markov decision
> processes (AMDPs) from demonstration trajectories and model-based
> reinforcement learning. Experiments show that the method is more
> effective than the baseline.
>
> On the positive side, a complete method for learning AMDPs is given
> and is shown to work on the problems used in the experiments. The
> proposed model-based reinforcement learning method based on R-MAX is
> also shown to outperform the baseline R-MAXQ.
>
> On the negative side, the method for learning the hierarchy,
> HierGen, is taken from prior work, leaving the adaptation of R-MAX to
> learn with a hierarchy as the main algorithmic novelty. No convergence
> proof for the learning method is provided, although it is empirically
> shown to outperform the baseline R-MAXQ. The experiments are done on toy
> problems, indicating that the method is probably not ready for more
> demanding practical problems.
>
> Overall, I am inclined to vote weak accept. The problem is
> difficult, so I think that the work does represent progress, although it is
> not yet compelling.
>
> ----------------------- REVIEW 2 ---------------------
> PAPER: 46
> TITLE: Learning Abstracted Models and Hierarchies of Markov
> Decision Processes
> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
> Milani, Shane Parr and Marie desJardins
>
> Significance: 2 (modest contribution or average impact)
> Soundness: 3 (correct)
> Scholarship: 2 (relevant literature cited but could be expanded)
> Clarity: 3 (well written)
> Reproducibility: 3 (authors describe the implementation and
> domains in sufficient detail)
> Overall evaluation: -1 (weak reject)
> Reviewer's confidence: 4 (expert)
> Suitable for a demo?: 2 (maybe)
> Nominate for Best Paper Award: 1 (no)
> Nominate for Best Student Paper Award (if eligible): 1 (no)
> [Applications track ONLY]: Importance and novelty of the
> application: 6 (N/A (not an Applications track paper))
> [Applications track ONLY]: Importance of planning/scheduling
> technology to the solution of the problem: 5 (N/A (not an Applications
> track paper))
> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
> track paper))
> [Robotics track ONLY]: Balance of Robotics and Automated Planning
> and Scheduling: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Evaluation on physical
> platforms/simulators: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
> (not a Robotics track paper))
>
> ----------- Review -----------
> The authors introduce a reinforcement learning algorithm for AMDPs
> that learns a hierarchical structure and a set of hierarchical models. To
> learn the hierarchical structure, they rely on an existing algorithm called
> HierGen. This algorithm extracts causal structure from a set of expert
> trajectories in a factored state environment.
>
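> To make this concrete, here is a toy sketch of the kind of trajectory
> analysis involved (my own illustration with made-up names; the actual
> HierGen algorithm performs a much finer-grained causal analysis than
> this):
>
>     from collections import defaultdict
>
>     def action_effects(trajectories):
>         """Record which state variables each action ever changes.
>
>         A crude stand-in for the structure-extraction step: each
>         trajectory is a list of (state, action, next_state) triples,
>         with states given as dicts from variable name to value. Any
>         variable that changes when an action is taken is treated as
>         an effect of that action.
>         """
>         effects = defaultdict(set)
>         for trajectory in trajectories:
>             for state, action, next_state in trajectory:
>                 for var, value in state.items():
>                     if next_state.get(var) != value:
>                         effects[action].add(var)
>         return dict(effects)
>
>     # In Taxi, e.g., 'north' would map to {'taxi_y'} and 'pickup' to
>     # {'in_taxi'}. Note that a variable that merely co-varies with an
>     # action is also picked up -- the correlation/causation issue
>     # mentioned below.
>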
> While R-AMDP outperforms R-MAXQ on the two toy problems, I think
> there is a lot more work to do to show that R-AMDP is a good basis for
> developing more general algorithms. First, it would be nice to examine
> the computational complexity of R-AMDP (rather than just the empirical
> comparison in Figure 3). Second, what if R-AMDP is just getting lucky on
> the two toy tasks presented? Maybe there are other problems where R-AMDP
> performs poorly. Further, stopping the plots at 50 or 60 trials may be
> misleading, since R-AMDP could be converging early on to a suboptimal
> but pretty good policy. It's also not clear that R-AMDP can be scaled to
> huge state or action spaces. Does the hierarchical structure discovered
> by HierGen lend itself to transfer when the dynamics change? It would be
> nice to have a more rigorous analysis of R-AMDP and a longer discussion
> of its potential pitfalls (when should we expect it to succeed and when
> should it fail?). There is a hint of this in the discussion of HierGen's
> inability to distinguish between correlation and causation.
>
> While reading the abstract, I expected the contribution to be in
> learning the hierarchy. The authors should probably revise the abstract
> to avoid this confusion.
>
> ----------------------- REVIEW 3 ---------------------
> PAPER: 46
> TITLE: Learning Abstracted Models and Hierarchies of Markov
> Decision Processes
> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
> Milani, Shane Parr and Marie desJardins
>
> Significance: 3 (substantial contribution or strong impact)
> Soundness: 3 (correct)
> Scholarship: 3 (excellent coverage of related work)
> Clarity: 3 (well written)
> Reproducibility: 5 (code and domains (whichever apply) are already
> publicly available)
> Overall evaluation: 3 (strong accept)
> Reviewer's confidence: 4 (expert)
> Suitable for a demo?: 3 (yes)
> Nominate for Best Paper Award: 1 (no)
> Nominate for Best Student Paper Award (if eligible): 1 (no)
> [Applications track ONLY]: Importance and novelty of the
> application: 6 (N/A (not an Applications track paper))
> [Applications track ONLY]: Importance of planning/scheduling
> technology to the solution of the problem: 5 (N/A (not an Applications
> track paper))
> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
> track paper))
> [Robotics track ONLY]: Balance of Robotics and Automated Planning
> and Scheduling: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Evaluation on physical
> platforms/simulators: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
> (not a Robotics track paper))
>
> ----------- Review -----------
> This is only a placeholder review. Please ignore it.
>
> ----------------------- REVIEW 4 ---------------------
> PAPER: 46
> TITLE: Learning Abstracted Models and Hierarchies of Markov
> Decision Processes
> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
> Milani, Shane Parr and Marie desJardins
>
> Significance: 2 (modest contribution or average impact)
> Soundness: 2 (minor inconsistencies or small fixable errors)
> Scholarship: 3 (excellent coverage of related work)
> Clarity: 1 (hard to follow)
> Reproducibility: 2 (some details missing but still appears to be
> replicable with some effort)
> Overall evaluation: -1 (weak reject)
> Reviewer's confidence: 3 (high)
> Suitable for a demo?: 2 (maybe)
> Nominate for Best Paper Award: 1 (no)
> Nominate for Best Student Paper Award (if eligible): 1 (no)
> [Applications track ONLY]: Importance and novelty of the
> application: 6 (N/A (not an Applications track paper))
> [Applications track ONLY]: Importance of planning/scheduling
> technology to the solution of the problem: 5 (N/A (not an Applications
> track paper))
> [Applications track ONLY] Maturity: 7 (N/A (not an Applications
> track paper))
> [Robotics track ONLY]: Balance of Robotics and Automated Planning
> and Scheduling: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Evaluation on physical
> platforms/simulators: 6 (N/A (not a Robotics track paper))
> [Robotics Track ONLY]: Significance of the contribution: 6 (N/A
> (not a Robotics track paper))
>
> ----------- Review -----------
> The paper describes an approach for learning abstract models and
> hierarchies for hierarchies of AMDPs. These hierarchies are similar, if not
> exactly the same, as those used by frameworks such as MAXQ, where each task
> in the hierarchy is an MDP with actions corresponding to child tasks. Prior
> AMDP work apparently uses hand-specified models of each task/AMDP, which
> are directly used for planning. This paper extends that work by learning
> the models of each task/AMDP. This is done using RMAX at each task. There
> is not a discussion of convergence guarantees of the approach. Apparently
> convergence must occur in a bottom-up way. Experiments are shown in two
> domains and with two hierarchies in one of the domains (Taxi). The approach
> appears to learn more efficiently than a prior approach R-MAXQ. The exact
> reasons for the increased efficiency were not exactly clear based on my
> understanding from the paper.
>
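> To state my reading concretely, here is a rough sketch of a generic
> count-based R-MAX model estimator of the kind I understand each
> task/AMDP to use (the class name, the threshold m, and r_max are my own
> illustrative choices, not the paper's actual implementation;
> hierarchy-specific details such as child-task termination are omitted):
>
>     from collections import defaultdict
>
>     class RMaxTaskModel:
>         """Count-based transition/reward model for one task in the
>         hierarchy. A (state, action) pair tried fewer than m times is
>         "unknown" and treated as maximally rewarding, which is what
>         drives exploration in R-MAX."""
>
>         def __init__(self, m=5, r_max=1.0):
>             self.m = m                      # known-ness threshold
>             self.r_max = r_max              # optimistic reward
>             self.counts = defaultdict(int)  # (s, a) -> visit count
>             self.next_counts = defaultdict(lambda: defaultdict(int))
>             self.reward_sums = defaultdict(float)
>
>         def update(self, s, a, r, s_next):
>             """Record one experienced transition."""
>             self.counts[(s, a)] += 1
>             self.next_counts[(s, a)][s_next] += 1
>             self.reward_sums[(s, a)] += r
>
>         def model(self, s, a):
>             """Return (expected reward, {s': probability}) for planning."""
>             n = self.counts[(s, a)]
>             if n < self.m:
>                 # Unknown pairs look optimally rewarding and self-looping.
>                 return self.r_max, {s: 1.0}
>             probs = {s2: c / n for s2, c in self.next_counts[(s, a)].items()}
>             return self.reward_sums[(s, a)] / n, probs
>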
> The paper is well-written at a high level, but the more technical
> and formal descriptions could be improved quite a bit. For example, the
> key object, the AMDP, is only described informally (the tuple is not
> described in detail). Most of the paper is written quite informally.
> Another example is that Table 1 talks about "max planner rollouts", but
> I didn't see where rollouts are used anywhere in the algorithm
> description.
>
> After reading the abstract and introduction, I expected that a big
> part of the contribution would be about actually learning the
> hierarchy. However, that does not seem to be the case. Rather, an
> off-the-shelf approach is used to learn hierarchies and is then plugged
> into the proposed algorithm for learning the models of tasks. Further,
> this is only tried in one of the two experimental domains. The abstract
> and introduction should be clearer about the contributions of the paper.
>
> Overall, I was unclear about what to learn from the paper. The
> main contribution is apparently Algorithm 1, which uses R-MAX to learn
> the models of each AMDP in a given hierarchy. Perhaps this is a novel
> algorithm, but it feels like more of a baseline, in the sense that it is
> the first thing one might try given the problem setup. I may not be
> appreciating some kind of complexity that makes this less
> straightforward. This baseline approach would have been more interesting
> if some form of convergence result were provided, similar to what was
> provided for R-MAXQ.
>
>
> The experiments show that R-AMDP learns faster and is more
> computationally efficient than R-MAXQ. I was unable to get a good
> understanding of why this is the case, likely because R-MAXQ is not
> described in detail in this paper and I was not able to revisit the
> original algorithm. The authors do try to explain the reasons for the
> performance improvement, but I was unable to follow them exactly. My
> best guess based on the discussion is that R-MAXQ does not exploit the
> state abstraction provided for each task by the hierarchy ("R-MAXQ must
> compute a model over all possible future states in a planning envelope
> after each action"). Is this the primary reason, or is there some other
> reason? Adding the ability to exploit abstractions in R-MAXQ seems
> straightforward, though maybe I'm missing something.
>
> ------------------------------------------------------
>
> Best wishes,
> Gabi Röger and Sven Koenig
> ICAPS 2018 program chairs
>
>
> --
> Dr. Marie desJardins
> Associate Dean for Academic Affairs
> College of Engineering and Information Technology
> University of Maryland, Baltimore County
> 1000 Hilltop Circle
> Baltimore MD 21250
>
> Email: mariedj at umbc.edu
> Voice: 410-455-3967
> Fax: 410-455-3559
>
> _______________________________________________
> Robot-learning mailing list
> Robot-learning at cs.umbc.edu
> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>
>