ARS Reinforcement Learning using Gymnasium
ARS - Coursework Guide – 24/25
Version History
1.0 29/09/24 First version.
1.1 Fleshed out marking criteria for task 2 report.
Summary
Title: Reinforcement Learning using Gymnasium environments
Hand-in: Programs AND a written report will need to be submitted online via Moodle. Check the module’s Moodle page for the precise deadline.
Late policy: The coursework deadlines (task 1 and task 2) are absolute. Late submissions are subject to a 5% deduction of the overall coursework mark per day.
Informal Description
The coursework consists of two tasks as described below. Your aim is to build several reinforcement learning agents and to design, implement and run several basic research-based experiments. You will hand in software and a report that discusses your work on these tasks. Briefly, task 1 is about implementing some basic RL prototypes (with noise injection and basic modularity) for your chosen environment(s) and identifying key literature, gaps, and research questions, whereas task 2 is about designing, developing and running experiments based on the research questions identified in task 1.
Aims and Outcomes
- If you take the labs seriously, at the end of the semester you should be:
  o comfortable with implementing and modifying reinforcement learning agents,
  o capable of adapting your RL solutions to different kinds of robotic problems with well-defined states, actions and rewards,
  o comfortable with neural network approaches for the mapping of complex high-dimensional states to actions (if you choose to use neural network based RL solutions),
  o comfortable with setting up experiments pertaining to noise and studying and mitigating its impact,
  o comfortable with designing modular AI solutions,
  o capable of scanning the literature in order to understand modern RL techniques, and incorporating/extending these in your own solutions,
  o capable of identifying gaps, and/or weaknesses/limitations in state-of-the-art research, and using this to define research questions for guiding your research,
  o capable of studying and evaluating algorithm performance objectively,
  o capable of designing innovative algorithms and experiments, and reporting the results of these in a clear and well-structured manner.
Rough Timetable
Laboratory notes
- You will work individually.
- We need to start working hard from the very first day to make the most of the lab sessions. In the first week you will learn the basics of Gymnasium, will experiment with several environments, and will even try some small heuristics on simple control problems (e.g. cartpole).
Rough time estimation:
  o Total hours: 20 credits ≈ 200 hours
  o Subtract lectures (22 hours) and labs (20 hours) = 200 – 42 = 158
  o Divide the remainder by 12 weeks = 158 / 12 ≈ 13 hours per week for everything else, e.g.: studying, researching, reading, thinking, coding, testing, analyzing, writing.
Getting Started
Preliminary steps
- Check the following three main Gymnasium resources:
  o Farama’s general documentation page for Gymnasium.
  o Basic usage page in the above documentation.
  o Gymnasium GitHub page – includes installation instructions.
- Install Gymnasium. For the purpose of the coursework it is sufficient to work with the “classic control” set of environments, however do feel free to install and use other categories of environments (e.g. MuJoCo and Atari), if you wish.
- Go through the Basic Usage page.
You can install Gymnasium on your own machines, or in your local directory in UNM’s HPC, or you can also use Google Colaboratory. Please note that in the past there were ways to render environments properly in Colab (e.g. have a look at this tutorial), however this may change from time to time. For an example of a Jupyter notebook for the cartpole example, refer to the module’s Moodle page. I suggest not bothering with rendering, except for some exercises, since performance metrics are the key concern.
As mentioned, if you want to use any of the MuJoCo environments you can. DeepMind recently bought MuJoCo and made it open source, which means there are no more licensing issues. You are not required to use MuJoCo, but if you really want to, you are welcome to install it and get the environments set up.
To see what environments are available use:
    import gymnasium as gym
    print(gym.envs.registry.keys())
To better understand some Gymnasium environments consult this Wiki or scroll to “environments” in the Gymnasium’s GitHub page, and search for your environment. For example, for the cart pole environment have a look at this page.
Try to come up with some heuristic solutions for Cart Pole
Try to come up with some simple heuristics to keep the pole up based on your understanding of the environment. You can start from and modify the (failing) heuristic example provided in the Moodle page (i.e. sol-H1-cart-pole-v0).
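As a starting point for such heuristics, here is a minimal sketch in Python. It is not the Moodle example. It assumes the standard CartPole observation layout [cart position, cart velocity, pole angle, pole angular velocity] and the actions 0 (push left) and 1 (push right). The function names and the velocity_weight value are hypothetical and only for illustration.

```python
def angle_heuristic(observation):
    """Push the cart toward the side the pole is leaning to."""
    _, _, pole_angle, _ = observation
    return 1 if pole_angle > 0 else 0


def angle_velocity_heuristic(observation, velocity_weight=0.5):
    """Also react to how fast the pole is falling.

    velocity_weight is an illustrative guess that you would tune by hand.
    """
    _, _, pole_angle, pole_velocity = observation
    return 1 if pole_angle + velocity_weight * pole_velocity > 0 else 0


def run_episode(env, policy, max_steps=500):
    """Run one episode on a Gymnasium-style env and return the total reward.

    env must provide reset() -> (observation, info) and
    step(action) -> (observation, reward, terminated, truncated, info).
    """
    observation, _info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    return total_reward
```

With Gymnasium installed, something like run_episode(gym.make("CartPole-v1"), angle_velocity_heuristic) should let you compare the two rules over several episodes.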
Difficult? Let's see whether reinforcement learning helps.
Have a look at a Q-learning solution. Example: s1cart-pole-v0-sol1. Try to run the code.
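For orientation, and independently of the Moodle example, the core of a tabular Q-learning agent can be sketched as follows. This is a minimal illustration under stated assumptions, not the provided solution. The continuous observation is discretized into bins so it can index a table. All function names, the bin_width, and the values of alpha, gamma and epsilon are hypothetical choices.

```python
import random
from collections import defaultdict


def discretize(observation, bin_width=0.25):
    """Map a continuous observation to a hashable tuple of bin indices."""
    return tuple(int(round(x / bin_width)) for x in observation)


def epsilon_greedy(q_table, state, n_actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = q_table[state]
    return max(range(n_actions), key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])


def train(env, n_actions, episodes=200, epsilon=0.1, seed=0):
    """Train on a Gymnasium-style env and return the learned Q-table."""
    rng = random.Random(seed)
    q_table = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        observation, _info = env.reset()
        state = discretize(observation)
        done = False
        while not done:
            action = epsilon_greedy(q_table, state, n_actions, epsilon, rng)
            observation, reward, terminated, truncated, _info = env.step(action)
            next_state = discretize(observation)
            # Bootstrapping stops only on true termination, not on a time limit.
            q_update(q_table, state, action, reward, next_state, terminated)
            state = next_state
            done = terminated or truncated
    return q_table
```

The bin width controls the trade-off between table size and resolution, so it is worth treating as an experimental parameter rather than a fixed constant.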
- Read the example code. Try to understand it as much as possible, although note that it will only fully make sense once we have done Q-Learning in the lectures.
Task Description
- Requirements for Task 1:
  o Title. Prototypes, literature, gaps, and research questions.
  o Prototypes:
    ▪ Environment selection. Select two environments to work on throughout the whole assignment. Select one environment from within the control category (e.g. CartPole-v1) and one environment from any category (including the control one). Please recall that different environments may impose significant changes to your reinforcement learning solutions since, for example, they may involve continuous action spaces, or other representational differences. To simplify matters you might want to constrain yourself to environments with discrete action spaces.
    ▪ Core method required: reinforcement learning. If you want to use other methods for other integrated modules, that is fine.
    ▪ Additional requirements: (1) noise injection at the inputs and/or outputs, (2) some modularity (e.g. RL component and denoising component).
    ▪ Aim: for each environment develop at least one viable proof of concept based on RL.
  o Literature:
    ▪ Steps:
      - Explore the recent RL literature in relation to the topic of noise [...] papers will be your “core/seed” papers, you should still study the literature more broadly (i.e. your report should cite other papers apart from the core papers).
      - Select your gaps for further investigation. Justify your choices.
      - Design at least 2 research questions based on your selected gaps.
    ▪ Aim: clearly outline 1-3 selected papers, overall gaps, selected gaps, and research questions. Note that it is crucial for the papers, gaps and research questions to be 100% credible, i.e.: (1) the papers must be recent and good, (2) the gaps must be genuine open problems, and (3) the research questions must sit squarely in the gaps and must point in useful directions.
    ▪ Constraint 1: Every student must have a different set of core papers and/or a different set of gaps and/or a different set of research questions (RQs). Once a student has defined their selected papers, gaps, and RQs, they must email them to me, in order for me to check and approve them. Please note that this process will operate on a “first come first served” basis. Please also note that if two students share the same papers, they can still be different in terms of the chosen gaps or RQs, however, it is preferable if all elements are distinct.
    ▪ Constraint 2: The selected research questions must include, or focus on, [...]
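The additional prototype requirements for task 1 (noise injection at the inputs and/or outputs, plus some modularity) could be organized as small interchangeable modules around the RL policy. The sketch below is one possible arrangement and not a prescribed design. Gaussian sensor noise, a moving-average denoiser, and random action replacement are assumptions chosen for illustration, and all class and function names are hypothetical.

```python
import random
from collections import deque


class GaussianObservationNoise:
    """Input noise: add zero-mean Gaussian noise to every observation value."""

    def __init__(self, sigma, seed=None):
        self.sigma = sigma
        self.rng = random.Random(seed)

    def __call__(self, observation):
        return [x + self.rng.gauss(0.0, self.sigma) for x in observation]


class MovingAverageDenoiser:
    """Denoising module: average the last `window` noisy observations."""

    def __init__(self, window=3):
        self.history = deque(maxlen=window)

    def reset(self):
        """Clear the history; call this at the start of every episode."""
        self.history.clear()

    def __call__(self, observation):
        self.history.append(list(observation))
        n = len(self.history)
        return [sum(column) / n for column in zip(*self.history)]


class RandomActionNoise:
    """Output noise: with some probability, replace the chosen discrete
    action with a uniformly random one (which may equal the original)."""

    def __init__(self, probability, n_actions, seed=None):
        self.probability = probability
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def __call__(self, action):
        if self.rng.random() < self.probability:
            return self.rng.randrange(self.n_actions)
        return action


def act(observation, sensor_noise, denoiser, policy, actuator_noise):
    """Pipeline: true observation -> noise -> denoiser -> policy -> noisy action."""
    noisy = sensor_noise(observation)
    cleaned = denoiser(noisy)
    return actuator_noise(policy(cleaned))
```

Because every module is a plain callable, each experimental condition (for example noise with and without the denoiser) only changes which objects are passed to act, which keeps internal comparisons consistent.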
- Requirements for Task 2:
  o Title. Research questions and experiments.
  o Environment selection. You must use the same two environments you selected for task 1.
  o Core method required: reinforcement learning. As before, if you want to use other methods for other integrated modules, that is fine.
  o Goals. Keywords: novel experiments and insights. The aim of this task is for you to design, develop, run, and analyze experiments that address the research questions you listed in task 1. The main tasks would be: (1) design experiments [...] assess [...] answered the research questions, (6) either proceed back to step 1 with adjustments to the experiments/solutions, or proceed with additional experiments (depending on time and completion status). Document your findings.
- Requirements for all tasks (i.e. tasks 1 and 2):
  o Performance. Define one or more valid performance measures, apart from the default/compulsory one, i.e.: the average number of episodes needed before solving a problem (see below for more information).
  o Evaluation. Run your experiments and report your results for both of your chosen environments consistently.
  o Four I’s. Try to maximize your work along the following dimensions: (1) informedness (i.e. it is based on a solid understanding of the literature), (2) innovativeness (i.e. novel), (3) inventiveness (i.e. not technically trivial), (4) impactfulness (e.g. generates new knowledge).
  o Core themes. The core themes for both tasks are: (1) reinforcement learning, (2) noise, (3) modularity. Please note that the research questions can be exclusively about noise, or modularity, or both, however, the models must always include elements of noise and modularity.
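The compulsory performance measure, the average number of episodes needed before the problem counts as solved, depends entirely on how "solved" is defined. The sketch below makes one possible definition explicit: the mean return over a sliding window of recent episodes first reaches a threshold. The window length, the threshold, and the function names are assumptions for illustration. You would fix your own values and apply them consistently across all experimental conditions.

```python
from statistics import mean


def episodes_to_solve(episode_returns, threshold, window=100):
    """Return the 1-based episode count at which the mean return over the
    last `window` episodes first reaches `threshold`, or None if never."""
    for end in range(window, len(episode_returns) + 1):
        if mean(episode_returns[end - window:end]) >= threshold:
            return end
    return None


def average_episodes_to_solve(runs, threshold, window=100):
    """Average the measure over independent runs (e.g. different seeds).

    Returns (average over the runs that solved the problem, or None if no
    run did; number of runs that solved it). Reporting both avoids hiding
    runs that never solved the problem.
    """
    counts = [episodes_to_solve(r, threshold, window) for r in runs]
    solved = [n for n in counts if n is not None]
    return (mean(solved) if solved else None), len(solved)
```

For example, with window=2 and threshold=10, the returns [0, 0, 10, 10, 10] count as solved after 4 episodes, because episodes 3 and 4 are the first pair whose mean reaches 10.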
- Demo. Show and explain the performance of your solutions, and the results of your experiments.
Performance Evaluation
- Since you will be injecting noise into your sensor data and/or actions, your results are not directly comparable to solutions on external leaderboards (e.g.: https://github.com/openai/gym/wiki/Leaderboard). Your focus will be on internal comparisons (i.e. your own experimental conditions) and innovation.
- One key performance measure that you should recall is the number of episodes required before solving the problem. In other words, here you are interested in the speed of learning. Care must be taken in being explicit and consistent regarding what constitutes having solved the problem.
Assessment – Overall
Component: Task 1 - demo
Marks (100): 5
Description: Demo of work so far.
Main Criteria: [...]

Component: [...]
Marks (100): [...]
Description: [...] pages) summarizing task 1
Main Criteria: Are the core papers (1-3) well explained? Are the overall gaps well identified and explained? Are the selected gaps justified properly? Are the research questions grounded in the gaps, and are they clear, concrete, and heading in the right direction?

Component: Task 2 - demo
Marks (100): 5
Description: Demo of work so far.
Main Criteria: Evidence of understanding of the base code. Good explanation of gaps, question, experimental design, results, analyses, and conclusions. Solid argumentation vis-à-vis the 4 I’s. Strong justifications and arguments. Clear communication.

Component: Task 2 - paper
Marks (100): 50
Description: Mini-conference paper (4 pages) summarizing all of the work done on both tasks.
Main Criteria: Are the structure, grammar and argumentation of the paper/report good? Are the introduction, background, methods, results and analyses clear, comprehensive and insightful? Does the paper show critical and creative thinking?

Component: Task 2 - software
Marks (100): 20
Description: Multiple files organized with a clear structure.
Main Criteria: Is the code complete? Is the code well-designed, clean, elegant, and well commented? Is the code complex/challenging enough?

Assessment Criteria for the Report (task 1) and Paper (task 2)
1st an excellent, well-written report/paper demonstrating extensive understanding and good insight.
2:1 a comprehensive, well-written report/paper demonstrating thorough understanding and some insight.
2:2 a competent report/paper demonstrating good understanding of the implementation.
3rd an adequate report/paper covering all specified topics at a basic level of understanding.
F an inadequate report/paper failing to cover the specified topics.
Report guide (task 1)
- The report for task 1 has no fixed format, as long as it is well structured and well organized. The only constraint is that it should be 1-2 pages long. No appendices are allowed, and to be fair to all, no material on page 3 onwards (if you exceed 2 pages) will be included in the assessment. The font size of the main text should not be smaller than 11.
- This report will exclusively focus on: (1) a very brief summary of your prototypes, (2) brief summaries of your selected core papers, and why they were chosen, (3) lengthier explanations on the weaknesses/gaps of the papers, (4) an explanation and justification of your selected gaps, and (5) an explanation and justification of your research questions, and how they are grounded in the gaps.
Paper Guide (task 2)
You should design your final report as a conference paper. The paper should contain:
- [8 marks] Introduction (about 1 page). Brief explanation of the motivation and main concepts, a problem statement, an extremely brief overview of the key papers and their gaps, the research questions, and a brief summary of your main contributions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.
- [8 marks] Background (about 0.5 pages). Brief overview of the field and the key papers closely related to your work (this will include the core 1-3 papers and other relevant papers). The core selected papers, with their gaps and why they were chosen, must be clearly explained. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.
- [8 marks] Methods (about 1 page). A detailed and concise description of how you implemented task 2 (e.g. algorithms and experimental design). Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation.
- [10 marks] Results (about 1 page). An overview of your key results encompassing performance measures and other results leading to insights about the problem and/or your solutions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness.
- [10 marks] Discussion (about 0.5 pages). Your interpretation of the results, your conclusions, and proposed future work. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.
- [6 marks] References & Appendices (not included in the word count). Key marking criteria: (1) Consistency of references, (2) Comprehensiveness of references, (3) Structure and clarity of appendices, (4) Insightfulness of appendices.
Note: Writing a concise report/paper is a core part of the assignment. The total number of pages for the paper (i.e. main sections, excluding references and appendices) cannot exceed 4 pages (with a minimum page margin of 2.5cm on each side), using single line spacing, a two-column format, and a minimum font size of 11.