[Robot-learning] Fwd: ICAPS 2018 review response (submission [*NUMBER*])
Stefanie Tellex
stefie10 at cs.brown.edu
Mon Jan 22 15:18:56 EST 2018
I can do 2/2 any time in the afternoon.
Stefanie
On 01/22/2018 11:38 AM, Marie desJardins wrote:
> Fri 2/2 would be good for a phone call with Stefanie and Michael. (Does
> that work for both of you? -- if so, what time is good? I'm fairly
> unconstrained.)
>
> Fri 2/2 won't work for a larger group meeting, though -- John will be at
> AAAI. I'll be traveling on Fri 2/9 but John should be back by then, so
> maybe we could plan a joint group meeting that day -- do you still have
> your regular meetings on Fridays?
>
> Marie
>
>
> On 1/21/18 11:23 AM, Stefanie Tellex wrote:
>> I agree, for after the winter deadlines.
>>
>> Stefanie
>>
>> On 01/21/2018 04:51 AM, Lawson Wong wrote:
>>> So have kcaluru at brown.edu and miles_holland at brown.edu
>>>
>>> The reviews look like typical planning-community reviews -- generally
>>> sensible requests but clearly impossible to accomplish within the
>>> page limit. I guess it's generally hard to please planning reviewers
>>> unless there are some theoretical results. Review 2 actually reads a
>>> little like one that Nakul got for his paper...
>>>
>>> I don't know if Michael and Stefanie have answered separately
>>> regarding a meeting; it certainly sounds helpful to continue
>>> discussing (AMDP) hierarchy learning. Both the IJCAI and RSS
>>> deadlines are on that week (1/31 and 2/1 respectively), so if
>>> possible it may be best to meet after those deadlines, such as on Fri
>>> 2/2 -- unless the intent was to discuss before the IJCAI deadline.
>>>
>>> -Lawson
>>>
>>>
>>> On Sat, Jan 20, 2018 at 6:04 AM, Littman, Michael
>>> <mlittman at cs.brown.edu> wrote:
>>>
>>> christopher_grimm at brown.edu has graduated.
>>>
>>>
>>> On Fri, Jan 19, 2018 at 12:06 PM, Marie desJardins
>>> <mariedj at cs.umbc.edu> wrote:
>>>
>>> Hi everyone,
>>>
>>> I wanted to share the initial reviews we received on our ICAPS
>>> submission (which I've also attached). Based on the reviews, I
>>> think the paper is unlikely to be accepted, so we are working to
>>> see whether we can get some new results for an IJCAI submission.
>>> We are making good progress on developing hierarchical learning
>>> methods for AMDPs but we need to (a) move to larger/more complex
>>> domains, (b) develop some theoretical analysis (complexity,
>>> correctness, convergence), and (c) work on more AMDP-specific
>>> hierarchy learning techniques (right now we are using an
>>> off-the-shelf method called HierGen that works well but may not
>>> necessarily find the best hierarchy for an AMDP representation).
>>>
>>> I'd be very interested to talk more about how this relates to
>>> the work that's happening at Brown, and to hear any
>>> feedback/ideas you might have about this work.
>>>
>>> Michael/Stefanie, could we maybe set up a time for the three of
>>> us to have a teleconference? I'll be on vacation next week but
>>> the week after that would be good. Possible times for me -- Mon
>>> 1/29 before 11:30am, between 1-2, or after 4pm; Wed 1/31 before
>>> 10am or after 2pm; Thu 2/1 between 11-1:30 or 3-4; Fri 2/2
>>> any time.
>>>
>>> BTW, these are the Brown students who are on this list. Please
>>> let me know if anyone should be added or removed.
>>>
>>> carl_trimbach at brown.edu
>>> christopher_grimm at brown.edu
>>> david_abel at brown.edu
>>> dilip.arumugam at gmail.com
>>> edward_c_williams at brown.edu
>>> jun_ki_lee at brown.edu
>>> kcaluru at brown.edu
>>> lsw at brown.edu
>>> lucas_lehnert at brown.edu
>>> melrose_roderick at brown.edu
>>> miles_holland at brown.edu
>>> nakul_gopalan at brown.edu
>>> oberlin at cs.brown.edu
>>> sam_saarinen at brown.edu
>>> siddharth_karamcheti at brown.edu
>>>
>>> Marie
>>>
>>>
>>> -------- Forwarded Message --------
>>> Subject: ICAPS 2018 review response (submission [*NUMBER*])
>>> Date: Thu, 11 Jan 2018 14:59:19 +0100
>>> From: ICAPS 2018 <icaps2018 at easychair.org>
>>> To: Marie desJardins <mariedj at umbc.edu>
>>>
>>>
>>>
>>> Dear Marie,
>>>
>>> Thank you for your submission to ICAPS 2018. The ICAPS 2018 review
>>> response period starts now and ends on January 13.
>>>
>>> During this time, you will have access to the current state of your
>>> reviews and have the opportunity to submit a response. Please keep
>>> in mind the following during this process:
>>>
>>> * Most papers have a so-called placeholder review, which was
>>> necessary to give the discussion leaders access to the reviewer
>>> discussion. Some of these reviews list questions that already came
>>> up during the discussion, which you may address in your response;
>>> in all cases, though, the (usually enthusiastic) scores are
>>> meaningless and you should ignore them. Placeholder reviews are
>>> clearly indicated as such in the review.
>>>
>>> * Almost all papers have three reviews; some may have four. A very
>>> small number of papers are missing one review. We hope to get that
>>> review completed in the next day. We apologize for this.
>>>
>>> * The deadline for entering a response is January 13th (at 11:59pm
>>> UTC-12, i.e., anywhere in the world).
>>>
>>> * Responses must be submitted through EasyChair.
>>>
>>> * Responses are limited to 1000 words in total. You can only enter
>>> one response, not one per review.
>>>
>>> * You will not be able to change your response after it is
>>> submitted.
>>>
>>> * The response must focus on any factual errors in the reviews and
>>> any questions posed by the reviewers. Try to be as concise and to
>>> the point as possible.
>>>
>>> * The review response period is an opportunity to react to the
>>> reviews, not a requirement to do so. Thus, if you feel the reviews
>>> are accurate and the reviewers have not asked any questions, you
>>> do not have to respond.
>>>
>>> * The reviews are as submitted by the PC members, without much
>>> coordination between them. Thus, there may be inconsistencies.
>>> Furthermore, these are not the final versions of the reviews. The
>>> reviews can later be updated to take into account the discussions
>>> at the program committee meeting, and we may find it necessary to
>>> solicit other outside reviews after the review response period.
>>>
>>> * The program committee will read your responses carefully and take
>>> this information into account during the discussions. On the other
>>> hand, the program committee may not directly respond to your
>>> responses in the final versions of the reviews.
>>>
>>> The reviews on your paper are attached to this letter. To submit
>>> your response, log on to the EasyChair Web page for ICAPS 2018 and
>>> select your submission on the menu.
>>>
>>> ----------------------- REVIEW 1 ---------------------
>>> PAPER: 46
>>> TITLE: Learning Abstracted Models and Hierarchies of Markov
>>> Decision Processes
>>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
>>> Milani, Shane Parr and Marie desJardins
>>>
>>> Significance: 2 (modest contribution or average impact)
>>> Soundness: 3 (correct)
>>> Scholarship: 3 (excellent coverage of related work)
>>> Clarity: 3 (well written)
>>> Reproducibility: 3 (authors describe the implementation and
>>> domains in sufficient detail)
>>> Overall evaluation: 1 (weak accept)
>>> Reviewer's confidence: 2 (medium)
>>> Suitable for a demo?: 1 (no)
>>> Nominate for Best Paper Award: 1 (no)
>>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>>> [Applications track ONLY]: Importance and novelty of the
>>> application: 6 (N/A (not an Applications track paper))
>>> [Applications track ONLY]: Importance of planning/scheduling
>>> technology to the solution of the problem: 5 (N/A (not an
>>> Applications track paper))
>>> [Applications track ONLY] Maturity: 7 (N/A (not an
>>> Applications track paper))
>>> [Robotics track ONLY]: Balance of Robotics and Automated
>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>> [Robotics Track ONLY]: Evaluation on physical
>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>> [Robotics Track ONLY]: Significance of the contribution: 6
>>> (N/A (not a Robotics track paper))
>>>
>>> ----------- Review -----------
>>> The paper proposes a method for learning abstract Markov
>>> decision processes (AMDPs) from demonstration trajectories and
>>> model-based reinforcement learning. Experiments show that the
>>> method is more effective than the baseline.
>>>
>>> On the positive side, a complete method for learning AMDPs is
>>> given and is shown to work on the problems used in the experiments.
>>> The proposed model-based reinforcement learning method based on
>>> R-MAX is also shown to outperform the baseline R-MAXQ.
>>>
>>> On the negative side, the method for learning the hierarchy,
>>> HierGen, is taken from prior work, leaving the adaptation of R-MAX
>>> to learn with a hierarchy as the main algorithmic novelty. No
>>> convergence proof for the learning method is provided, although it
>>> is empirically shown to outperform the baseline R-MAXQ. The
>>> experiments are done on toy problems, indicating that the method is
>>> probably not ready for more demanding practical problems.
>>>
>>> Overall, I am inclined to vote weak accept. The problem is
>>> difficult, so I think that the work does represent progress, although
>>> it is not yet compelling.
>>>
>>> ----------------------- REVIEW 2 ---------------------
>>> PAPER: 46
>>> TITLE: Learning Abstracted Models and Hierarchies of Markov
>>> Decision Processes
>>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
>>> Milani, Shane Parr and Marie desJardins
>>>
>>> Significance: 2 (modest contribution or average impact)
>>> Soundness: 3 (correct)
>>> Scholarship: 2 (relevant literature cited but could be expanded)
>>> Clarity: 3 (well written)
>>> Reproducibility: 3 (authors describe the implementation and
>>> domains in sufficient detail)
>>> Overall evaluation: -1 (weak reject)
>>> Reviewer's confidence: 4 (expert)
>>> Suitable for a demo?: 2 (maybe)
>>> Nominate for Best Paper Award: 1 (no)
>>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>>> [Applications track ONLY]: Importance and novelty of the
>>> application: 6 (N/A (not an Applications track paper))
>>> [Applications track ONLY]: Importance of planning/scheduling
>>> technology to the solution of the problem: 5 (N/A (not an
>>> Applications track paper))
>>> [Applications track ONLY] Maturity: 7 (N/A (not an
>>> Applications track paper))
>>> [Robotics track ONLY]: Balance of Robotics and Automated
>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>> [Robotics Track ONLY]: Evaluation on physical
>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>> [Robotics Track ONLY]: Significance of the contribution: 6
>>> (N/A (not a Robotics track paper))
>>>
>>> ----------- Review -----------
>>> The authors introduce a reinforcement learning algorithm for
>>> AMDPs that learns a hierarchical structure and a set of hierarchical
>>> models. To learn the hierarchical structure, they rely on an existing
>>> algorithm called HierGen. This algorithm extracts causal structure
>>> from a set of expert trajectories in a factored state environment.
>>>
>>> While R-AMDP outperforms R-MAXQ on the two toy problems, I
>>> think there is a lot more work to do to show that R-AMDP is a good
>>> basis for developing more general algorithms. First, it would be
>>> nice to examine the computational complexity of R-AMDP (rather than
>>> just the empirical comparison in Figure 3). Second, what if R-AMDP
>>> is just getting lucky in the two toy tasks presented? Maybe there
>>> are other problems where R-AMDP performs poorly. Further, stopping
>>> the plots at 50 or 60 trials may be misleading, since R-AMDP could
>>> be converging to a suboptimal but pretty good policy early on. It's
>>> also not clear that R-AMDP can be scaled to huge state or action
>>> spaces. Does the hierarchical structure discovered by HierGen lend
>>> itself to transfer when the dynamics change? It would be nice to
>>> have a more rigorous analysis of R-AMDP and a longer discussion of
>>> its potential pitfalls (when should we expect it to succeed and
>>> when should it fail?). There is a hint of this in the discussion
>>> about HierGen's inability to distinguish between correlation and
>>> causation.
>>>
>>> While reading the abstract I expected the contribution to be
>>> in learning the hierarchy. The authors should probably change the
>>> abstract to avoid this confusion.
>>>
>>> ----------------------- REVIEW 3 ---------------------
>>> PAPER: 46
>>> TITLE: Learning Abstracted Models and Hierarchies of Markov
>>> Decision Processes
>>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
>>> Milani, Shane Parr and Marie desJardins
>>>
>>> Significance: 3 (substantial contribution or strong impact)
>>> Soundness: 3 (correct)
>>> Scholarship: 3 (excellent coverage of related work)
>>> Clarity: 3 (well written)
>>> Reproducibility: 5 (code and domains (whichever apply) are
>>> already publicly available)
>>> Overall evaluation: 3 (strong accept)
>>> Reviewer's confidence: 4 (expert)
>>> Suitable for a demo?: 3 (yes)
>>> Nominate for Best Paper Award: 1 (no)
>>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>>> [Applications track ONLY]: Importance and novelty of the
>>> application: 6 (N/A (not an Applications track paper))
>>> [Applications track ONLY]: Importance of planning/scheduling
>>> technology to the solution of the problem: 5 (N/A (not an
>>> Applications track paper))
>>> [Applications track ONLY] Maturity: 7 (N/A (not an
>>> Applications track paper))
>>> [Robotics track ONLY]: Balance of Robotics and Automated
>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>> [Robotics Track ONLY]: Evaluation on physical
>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>> [Robotics Track ONLY]: Significance of the contribution: 6
>>> (N/A (not a Robotics track paper))
>>>
>>> ----------- Review -----------
>>> This is only a placeholder review. Please ignore it.
>>>
>>> ----------------------- REVIEW 4 ---------------------
>>> PAPER: 46
>>> TITLE: Learning Abstracted Models and Hierarchies of Markov
>>> Decision Processes
>>> AUTHORS: Matthew Landen, John Winder, Shawn Squire, Stephanie
>>> Milani, Shane Parr and Marie desJardins
>>>
>>> Significance: 2 (modest contribution or average impact)
>>> Soundness: 2 (minor inconsistencies or small fixable errors)
>>> Scholarship: 3 (excellent coverage of related work)
>>> Clarity: 1 (hard to follow)
>>> Reproducibility: 2 (some details missing but still appears to
>>> be replicable with some effort)
>>> Overall evaluation: -1 (weak reject)
>>> Reviewer's confidence: 3 (high)
>>> Suitable for a demo?: 2 (maybe)
>>> Nominate for Best Paper Award: 1 (no)
>>> Nominate for Best Student Paper Award (if eligible): 1 (no)
>>> [Applications track ONLY]: Importance and novelty of the
>>> application: 6 (N/A (not an Applications track paper))
>>> [Applications track ONLY]: Importance of planning/scheduling
>>> technology to the solution of the problem: 5 (N/A (not an
>>> Applications track paper))
>>> [Applications track ONLY] Maturity: 7 (N/A (not an
>>> Applications track paper))
>>> [Robotics track ONLY]: Balance of Robotics and Automated
>>> Planning and Scheduling: 6 (N/A (not a Robotics track paper))
>>> [Robotics Track ONLY]: Evaluation on physical
>>> platforms/simulators: 6 (N/A (not a Robotics track paper))
>>> [Robotics Track ONLY]: Significance of the contribution: 6
>>> (N/A (not a Robotics track paper))
>>>
>>> ----------- Review -----------
>>> The paper describes an approach for learning abstract models
>>> and hierarchies for AMDPs. These hierarchies are similar, if not
>>> identical, to those used by frameworks such as MAXQ, where each
>>> task in the hierarchy is an MDP with actions corresponding to child
>>> tasks. Prior AMDP work apparently uses hand-specified models of
>>> each task/AMDP, which are directly used for planning. This paper
>>> extends that work by learning the models of each task/AMDP, using
>>> R-MAX at each task. There is no discussion of convergence
>>> guarantees of the approach. Apparently convergence must occur in a
>>> bottom-up way. Experiments are shown in two domains, with two
>>> hierarchies in one of the domains (Taxi). The approach appears to
>>> learn more efficiently than a prior approach, R-MAXQ. The reasons
>>> for the increased efficiency were not entirely clear to me from the
>>> paper.
>>>
>>> The paper is well-written at a high level, but the more
>>> technical and formal descriptions could be improved quite a bit.
>>> For example, the key object, the AMDP, is only described informally
>>> (the tuple is not described in detail). Most of the paper is
>>> written quite informally. Another example is that Table 1 mentions
>>> "max planner rollouts", but I didn't see where rollouts are used
>>> anywhere in the algorithm description.
>>>
>>> After reading the abstract and introduction, I expected that
>>> a big part of the contribution would be about actually learning the
>>> hierarchy. However, that does not seem to be the case. Rather, an
>>> off-the-shelf approach is used to learn hierarchies and then plugged
>>> into the proposed algorithm for learning the models of tasks.
>>> Further, this is only tried for one of the two experimental domains.
>>> The abstract and introduction should be more clear about the
>>> contributions of the paper.
>>>
>>> Overall, I was unclear about what to learn from the paper.
>>> The main contribution is apparently Algorithm 1, which uses R-MAX
>>> to learn the models of each AMDP in a given hierarchy. Perhaps this
>>> is a novel algorithm, but it feels like more of a baseline, in the
>>> sense that it is the first thing one might try given the problem
>>> setup. I may not be appreciating some complexity that makes this
>>> less straightforward. This baseline approach would have been more
>>> interesting if some form of convergence result were provided,
>>> similar to what was provided for R-MAXQ.
>>>
>>>
>>> The experiments show that R-AMDP learns faster and is more
>>> computationally efficient than R-MAXQ. I was unable to get a good
>>> understanding of why this was the case, likely because I could not
>>> revisit the R-MAXQ algorithm and it is not described in detail in
>>> this paper. The authors do try to explain the reasons for the
>>> performance improvement, but I was unable to follow exactly. My
>>> best guess based on the discussion is that R-MAXQ does not try to
>>> exploit the state abstraction provided for each task by the
>>> hierarchy ("R-MAXQ must compute a model over all possible future
>>> states in a planning envelope after each action"). Is this the
>>> primary reason, or is there some other reason? Adding the ability
>>> to exploit abstractions in R-MAXQ seems straightforward, though
>>> maybe I'm missing something.
>>>
>>> ------------------------------------------------------
>>>
>>> Best wishes,
>>> Gabi Röger and Sven Koenig
>>> ICAPS 2018 program chairs
>>>
>>>
>>> _______________________________________________
>>> Robot-learning mailing list
>>> Robot-learning at cs.umbc.edu
>>> https://lists.cs.umbc.edu/mailman/listinfo/robot-learning
>>>
>>
>
> --
> Dr. Marie desJardins
> Associate Dean for Academic Affairs
> College of Engineering and Information Technology
> University of Maryland, Baltimore County
> 1000 Hilltop Circle
> Baltimore MD 21250
>
> Email: mariedj at umbc.edu
> Voice: 410-455-3967
> Fax: 410-455-3559
>
>
>