COMP9414 24T2
Artificial Intelligence
Assignment 2 - Reinforcement Learning
Due: Week 9, Wednesday, 24 July 2024, 11:55 PM.
1 Problem context
Taxi Navigation with Reinforcement Learning: In this assignment, you are asked to implement the Q-learning and SARSA methods for a taxi navigation problem. To run your experiments and test your code, you should use the Gym library, an open-source Python library for developing and comparing reinforcement learning algorithms. You can install Gym on your computer with the following command in your command prompt:
pip install gym
In the taxi navigation problem, there are four designated locations in the
grid world indicated by R(ed), G(reen), Y(ellow), and B(lue). When the
episode starts, one taxi starts off at a random square and the passenger is
at a random location (one of the four specified locations). The taxi drives
to the passenger’s location, picks up the passenger, drives to the passenger’s
destination (another one of the four specified locations), and then drops off
the passenger. Once the passenger is dropped off, the episode ends. To show
the taxi grid world environment, you can use the following code:

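import gym

# Gym >= 0.26 API: render_mode is passed to make(), and reset() returns (observation, info)
env = gym.make("Taxi-v3", render_mode="ansi").env
state, info = env.reset()
rendered_env = env.render()  # in "ansi" mode this is a string
print(rendered_env)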
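The observation returned by `env.reset()` (and later by `env.step()`) is a single integer between 0 and 499, so a tabular agent needs a 500 × 6 Q-table. The sketch below is a minimal illustration, assuming the Taxi-v3 encoding in which the taxi row and column, the passenger location (0 to 3 for R, G, Y, B, or 4 while in the taxi) and the destination index are packed into one number. The helper names `encode`, `decode` and `describe` are hypothetical; Gym's Taxi environment offers comparable `encode`/`decode` methods on `env.unwrapped`.

```python
# Hypothetical helpers mirroring the (assumed) Taxi-v3 state encoding:
# 5 taxi rows x 5 taxi columns x 5 passenger locations x 4 destinations.
LOCATIONS = ["R", "G", "Y", "B"]  # index order assumed to match Taxi-v3

N_STATES = 5 * 5 * 5 * 4  # 500 discrete states
N_ACTIONS = 6             # the six actions listed in Table 1


def encode(taxi_row, taxi_col, pass_loc, dest_idx):
    """Pack the four state components into a single integer."""
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode(state):
    """Inverse of encode: return (taxi_row, taxi_col, pass_loc, dest_idx)."""
    dest_idx = state % 4
    state //= 4
    pass_loc = state % 5
    state //= 5
    taxi_col = state % 5
    taxi_row = state // 5
    return taxi_row, taxi_col, pass_loc, dest_idx


def describe(state):
    """Human-readable summary of an encoded state."""
    row, col, pass_loc, dest_idx = decode(state)
    passenger = "in taxi" if pass_loc == 4 else LOCATIONS[pass_loc]
    return (f"taxi at ({row}, {col}), passenger {passenger}, "
            f"destination {LOCATIONS[dest_idx]}")


print(describe(328))  # taxi at (3, 1), passenger Y, destination R
```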
In order to render the environment, there are three modes: “human”, “rgb_array”, and “ansi”. The “human” mode visualizes the environment in a way suitable for human viewing; the output is a graphical window that displays the current state of the environment (see Fig. 1). The “rgb_array” mode provides the environment’s state as an RGB image; the output is a numpy array representing that image. The “ansi” mode provides a text-based representation of the environment’s state; the output is a string that depicts the current state of the environment using ASCII characters (see Fig. 2).
Figure 1: “human” mode presentation for the taxi navigation problem in
Gym library.
You are free to choose the presentation mode between “human” and
“ansi”, but for simplicity, we recommend “ansi” mode. Based on the given
description, there are six discrete deterministic actions that are presented in
Table 1.
Figure 2: “ansi” mode presentation for the taxi navigation problem in the Gym library. Gold represents the taxi location, blue is the pickup location, and purple is the drop-off location.
Table 1: Six possible actions in the taxi navigation environment.
Action                 Action number
Move South             0
Move North             1
Move East              2
Move West              3
Pickup Passenger       4
Drop off Passenger     5
For this assignment, you need to implement the Q-learning and SARSA algorithms for the taxi navigation environment. The main objective of this assignment is for the agent (taxi) to learn how to navigate the grid world and deliver the passenger in the minimum possible number of steps. To accomplish the learning task, you should empirically determine the hyperparameters of your algorithm, e.g., the learning rate α, the exploration parameters (such as ε or T), and the discount factor γ. Your agent should be penalized -1 for each step it takes, receive a +20 reward for delivering the passenger, and incur a -10 penalty for executing the “pickup” and “drop-off” actions illegally. You should try different exploration parameters to find the best balance between exploration and exploitation.
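Both algorithms share the same tabular update, Q(s, a) ← Q(s, a) + α[target − Q(s, a)], and differ only in the target: Q-learning uses r + γ max_a' Q(s', a'), whereas SARSA uses r + γ Q(s', a') for the action a' actually selected next. The exploration parameters ε and T typically correspond to ε-greedy and softmax (Boltzmann) action selection. The following is a minimal sketch of both pieces, assuming the Q-table is stored as a list of per-state lists indexed `Q[state][action]`; all function names are hypothetical.

```python
import math
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def softmax_action(q_row, temperature, rng=random):
    """Boltzmann exploration: P(a) is proportional to exp(Q(s, a) / T)."""
    m = max(q_row)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_row]
    return rng.choices(range(len(q_row)), weights=weights, k=1)[0]


def q_learning_update(Q, s, a, r, s_next, done, alpha, gamma):
    """Off-policy target: bootstrap from the greedy value of the next state."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    """On-policy target: bootstrap from the action actually chosen next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])
```

In a SARSA loop, the next action is chosen with the same exploration rule before calling `sarsa_update` and is then executed on the following step; in Q-learning the next action can be chosen after the update, since the target does not depend on it.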
As an outcome, you should plot the accumulated reward per episode and
the number of steps taken by the agent in each episode for at least 1000
learning episodes for both the Q-learning and SARSA algorithms. Examples
of these two plots are shown in Figures 3–6. Please note that the provided
plots are just examples and, therefore, your plots will not be exactly like the
provided ones, as the learning parameters will differ for your algorithm.
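Recording the two quantities to plot only requires accumulating the reward and counting the steps inside the training loop. Below is a minimal Q-learning sketch, assuming an environment that follows the Gym ≥ 0.26 API (`reset()` returning `(observation, info)` and `step()` returning `(observation, reward, terminated, truncated, info)`). The function name `train_q_learning` is hypothetical, and its default hyperparameters are placeholders rather than tuned values. A SARSA version would differ only in selecting the next action before the update and using it in the target.

```python
import random


def train_q_learning(env, n_states, n_actions, episodes=1000, alpha=0.1,
                     gamma=0.9, epsilon=0.1, max_steps=200, seed=None):
    """Tabular Q-learning that also records reward and steps per episode.

    Assumes the Gym >= 0.26 API: reset() -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    episode_rewards, episode_steps = [], []
    for _ in range(episodes):
        state, _info = env.reset()
        total_reward, steps = 0.0, 0
        while steps < max_steps:
            # Epsilon-greedy behaviour policy.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            next_state, reward, terminated, truncated, _info = env.step(action)
            # Q-learning target: greedy value of the next state (none if terminal).
            target = reward if terminated else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            total_reward += reward
            steps += 1
            state = next_state
            if terminated or truncated:
                break
        episode_rewards.append(total_reward)
        episode_steps.append(steps)
    return Q, episode_rewards, episode_steps
```

With Taxi-v3 this might be called as `train_q_learning(env, 500, 6)`. The returned `episode_rewards` and `episode_steps` lists can then be plotted against the episode index, for example with matplotlib.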
Figure 3: Q-learning reward.
Figure 4: Q-learning steps.
Figure 5: SARSA reward.
Figure 6: SARSA steps.
After training your algorithms, you should save your Q-values. Using your saved Q-tables, both the Q-learning and SARSA agents will be tested on at least 100 random grid-world scenarios with the same characteristics as the taxi environment, using the greedy action selection method. Therefore, your Q-tables will not be updated during testing.
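The assignment leaves the Q-table file format open. One possible choice is sketched below, assuming the table is a list of per-state lists of floats, which the standard-library `json` module can store directly (a NumPy array saved with `numpy.save` would work equally well). The greedy selection used at test time only reads the table and never updates it. The function names are hypothetical.

```python
import json


def save_q_table(Q, path):
    """Write a Q-table (a list of per-state lists of action values) to JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(Q, f)


def load_q_table(path):
    """Read a Q-table previously written by save_q_table."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def greedy_action(Q, state):
    """Greedy selection for testing: pick the highest-valued action, no update."""
    row = Q[state]
    return max(range(len(row)), key=row.__getitem__)
```

For example, `save_q_table(Q, "q_learning_qtable.json")` after training and `Q = load_q_table("q_learning_qtable.json")` before testing; the file name is only illustrative.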
Your code should be able to visualize the trained agent for both the Q-learning and SARSA algorithms. This means you should render the “Taxi-v3” environment (you can use the “ansi” mode) and run your trained agent from a random position. You should present the steps your agent takes and how the reward changes from one state to the next. An example of the visualized agent is shown in Fig. 7, where only the first six steps of the taxi are displayed.
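Because the “ansi” mode returns each frame as a string, visualizing the trained agent reduces to printing `env.render()` after every greedy step together with the reward received. The sketch below assumes an environment created with `render_mode="ansi"` that uses the Gym ≥ 0.26 `reset()`/`step()` signatures. `visualize_episode` and `ACTION_NAMES` are hypothetical names, and the action labels come from Table 1.

```python
# Labels from Table 1, keyed by action number.
ACTION_NAMES = {0: "Move South", 1: "Move North", 2: "Move East",
                3: "Move West", 4: "Pickup Passenger", 5: "Drop off Passenger"}


def visualize_episode(env, Q, max_steps=100):
    """Run one greedy episode, printing each frame, action and reward.

    Assumes env was created with render_mode="ansi" (render() returns a string)
    and follows the Gym >= 0.26 reset()/step() API.
    """
    state, _info = env.reset()
    total_reward = 0.0
    print(env.render())
    for step in range(1, max_steps + 1):
        row = Q[state]
        action = max(range(len(row)), key=row.__getitem__)  # greedy, no update
        state, reward, terminated, truncated, _info = env.step(action)
        total_reward += reward
        print(f"Step {step}: {ACTION_NAMES.get(action, action)}, "
              f"reward = {reward}, accumulated reward = {total_reward}")
        print(env.render())
        if terminated or truncated:
            break
    return total_reward
```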
2 Testing and discussing your code
As part of the assignment evaluation, your tutor will test your code with you during a discussion in the Week 10 tutorial session. The assignment has a total of 25 marks. The discussion is mandatory; therefore, any assignment that has not been discussed with a tutor will not be marked.
Figure 7: The first six steps of a trained agent (taxi) based on the Q-learning algorithm.
Before your discussion session, you should prepare the necessary code for this purpose by loading your Q-table and the “Taxi-v3” environment. You should be able to calculate the average number of steps per episode and the average accumulated reward (for a maximum of 100 steps per episode) over the test episodes, using the greedy action selection method.
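The two test statistics can be computed with a loop that only reads the Q-table. The following is a minimal sketch, assuming a Gym ≥ 0.26 style environment and a Q-table indexed as `Q[state][action]`; `evaluate` is a hypothetical name, and each episode is capped at 100 steps as described above.

```python
def evaluate(env, Q, episodes=100, max_steps=100):
    """Average steps and average accumulated reward over greedy test episodes.

    The Q-table is only read, never updated. Assumes the Gym >= 0.26 API.
    """
    total_steps, total_reward = 0, 0.0
    for _ in range(episodes):
        state, _info = env.reset()
        steps, episode_reward = 0, 0.0
        while steps < max_steps:  # cap each test episode at max_steps
            row = Q[state]
            action = max(range(len(row)), key=row.__getitem__)  # greedy action
            state, reward, terminated, truncated, _info = env.step(action)
            steps += 1
            episode_reward += reward
            if terminated or truncated:
                break
        total_steps += steps
        total_reward += episode_reward
    return total_steps / episodes, total_reward / episodes
```

For example, `avg_steps, avg_reward = evaluate(env, Q)` on the Taxi-v3 environment gives the pair that is compared against the thresholds for each algorithm.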
You are expected to propose and build your algorithms for the taxi navigation task. You will receive marks for each of the subsections shown in Table 2. Beyond what is required in the previous section, you may include any other outcome that highlights particular aspects of your work when testing and discussing your code with your tutor.
For both the Q-learning and SARSA algorithms, your tutor will consider the average accumulated reward and the average number of steps over the test episodes, with a maximum of 100 steps per episode. For your Q-learning algorithm, the agent should take at most 13 steps per episode on average and obtain an average accumulated reward of at least 7. For your SARSA algorithm, the agent should take at most 15 steps per episode on average and obtain an average accumulated reward of at least 5. For each algorithm, results worse than these thresholds will result in 0 marks for that specific section.
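As a quick sanity check on these thresholds, the return of an episode follows directly from the reward scheme. Assuming, as in Gym's Taxi-v3, that the delivering drop-off returns +20 in place of the usual -1 and each illegal pickup or drop-off returns -10 in place of -1, an episode of n steps that ends with a delivery and contains k illegal actions earns 20 − (n − 1 − k) − 10k. With k = 0 this is 21 − n, so 13 steps give 8 and 15 steps give 6. The helper names below are hypothetical.

```python
def episode_return(n_steps, illegal_actions=0):
    """Return of an episode of n_steps that ends with a successful drop-off.

    Assumed rewards: +20 for the delivering step, -10 for each illegal
    pickup/drop-off, and -1 for every other step.
    """
    ordinary_steps = n_steps - 1 - illegal_actions
    return 20 - ordinary_steps - 10 * illegal_actions


# (maximum average steps, minimum average accumulated reward) from the text above
THRESHOLDS = {"q-learning": (13, 7), "sarsa": (15, 5)}


def meets_threshold(algorithm, avg_steps, avg_reward):
    """True when both averages satisfy the thresholds for the given algorithm."""
    max_steps, min_reward = THRESHOLDS[algorithm]
    return avg_steps <= max_steps and avg_reward >= min_reward


print(episode_return(13), episode_return(15))  # 8 6
```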
Finally, you will receive 1 mark for code readability for each task, and your tutor will also give you up to 5 marks for each task depending on your level of code understanding, as follows: 5 (Outstanding), 4 (Great), 3 (Fair), 2 (Low), 1 (Deficient), 0 (No answer).
Table 2: Marks for each task.
Task                                                                                    Marks
Results obtained from agent learning
  Accumulated rewards and steps per episode plots for Q-learning algorithm              2 marks
  Accumulated rewards and steps per episode plots for SARSA algorithm                   2 marks
Results obtained from testing the trained agent
  Average accumulated rewards and average steps per episode for Q-learning algorithm    2.5 marks
  Average accumulated rewards and average steps per episode for SARSA algorithm         2.5 marks
  Visualizing the trained agent for Q-learning algorithm                                2 marks
  Visualizing the trained agent for SARSA algorithm                                     2 marks
Code understanding and discussion
  Code readability for Q-learning algorithm                                             1 mark
  Code readability for SARSA algorithm                                                  1 mark
  Code understanding and discussion for Q-learning algorithm                            5 marks
  Code understanding and discussion for SARSA algorithm                                 5 marks
Total marks                                                                             25 marks
3 Submitting your assignment
The assignment must be done individually. You must submit your assignment solution via Moodle. The submission should be a single .zip file containing three files: the .ipynb Jupyter notebook and your saved Q-tables for Q-learning and SARSA (you can choose the format for the Q-tables). Remember that your Q-table files will be loaded during your discussion session to run the test episodes. Therefore, your submitted Python code should also include a script to perform these tests. Additionally, your code should include short text descriptions to help markers better understand it. Please be mindful that providing clean, easy-to-read code is part of your assignment.
Please indicate your full name and your zID at the top of the file as a comment. You can submit as many times as you like before the deadline; later submissions overwrite earlier ones. After submitting your file, it is good practice to take a screenshot of the submission for future reference.
Late submission penalty: UNSW has a standard late submission penalty of 5% per day deducted from your mark, capped at five days from the assessment deadline; after that, students cannot submit the assignment.
4 Deadline and questions
Deadline: Week 9, Wednesday 24 July 2024, 11:55pm. Please use the forum on Moodle to ask questions related to the project. We will prioritise questions asked in the forum. However, you should not share your code there, to avoid making it public and enabling plagiarism. If your question requires sharing code, use the course email cs9414@cse.unsw.edu.au as an alternative.
Although we try to answer questions as quickly as possible, we might take up to 1 or 2 business days to reply; therefore, last-minute questions might not be answered in time.
For any questions regarding the discussion sessions, please contact your tutor directly. Your tutor's email address is listed in Table 3.
5 Plagiarism policy
Your program must be entirely your own work. Plagiarism detection software
might be used to compare submissions pairwise (including submissions for
any similar projects from previous years) and serious penalties will be applied,
particularly in the case of repeat offences.
Do not copy from others. Do not allow anyone to see your code.
Please refer to the UNSW Policy on Academic Honesty and Plagiarism if you
require further clarification on this matter.