

Posted: 2024-03-12



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every student MUST include the following statement, together with his/her signature, in the
submitted homework.
I declare that the assignment submitted on Elearning system is original
except for source material explicitly acknowledged, and that the same or
related material has not been previously submitted for another course. I
also acknowledge that I am aware of University policy and regulations on
honesty in academic work, and of the disciplinary guidelines and
procedures applicable to breaches of such policy and regulations, as
contained in the website
http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must
be created COMPLETELY by oneself ALONE. A student may not share ANY written work or
pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has
discussed or worked with. If the answer includes content from any other source, the
student MUST STATE THE SOURCE. Failure to do so is cheating and will result in
sanctions. Copying answers from someone else is cheating even if one lists their name(s) on
the homework.
If there is information you need to solve a problem, but the information is not stated in the
problem, try to find the data somewhere. If you cannot find it, state what data you need,
make a reasonable estimate of its value, and justify any assumptions you make. You will be
graded not only on whether your answer is correct, but also on whether you have done an
intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.
Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of
Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in
books from books.google.com along with some statistics.
In this question, you will use only the Google Books 1-gram data, so the "bigram" field below
holds a single word. Please go to References [1] and [2] to download the two datasets. Each
line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 the word "circumvallate" occurred 335 times overall, in 91 distinct
books, and in 1979 it occurred 261 times, in 95 distinct books.
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop
cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on
the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7]
to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per
year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared.
Assume the data set contains all the 1-grams in the last 100 years, and the above
records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298,
instead of
(335 + 261) / 100 = 5.96
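The averaging rule in the note above can be checked on a small sample before writing the Pig script. The following is only a minimal sketch in Python (not the Pig script that the question asks for); the function name `average_per_year` is hypothetical, and it assumes each input line follows the TAB-separated format given earlier.

```python
from collections import defaultdict

def average_per_year(lines):
    """Return {word: average match_count over the years in which it appears}."""
    totals = defaultdict(int)   # word -> sum of match_count
    years = defaultdict(set)    # word -> distinct years in which the word appears
    for line in lines:
        word, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        totals[word] += int(match_count)
        years[word].add(int(year))
    # The denominator is the number of years with a record, not a fixed span.
    return {word: totals[word] / len(years[word]) for word in totals}

sample = [
    "circumvallate\t1978\t335\t91",
    "circumvallate\t1979\t261\t95",
]
print(average_per_year(sample))  # {'circumvallate': 298.0}
```

Counting distinct years per word (rather than dividing by a fixed number such as 100) is what yields 298 instead of 5.96 for the example.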
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences
per year, along with their corresponding average values, sorted in descending order.
If multiple bigrams have the same average value, you may output any of them (that is,
break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word counting example shown in the lecture notes
of Pig. You can use the code there and just make some minor changes to perform
this task.
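As a sanity check for part (d), the selection of the highest averages can be prototyped on a handful of values. This is a minimal Python sketch, not the required Pig script; the function name `top_n` and the sample averages are made up for illustration.

```python
import heapq

def top_n(averages, n=20):
    """Return the n (word, average) pairs with the largest averages, in descending order."""
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])

averages = {"alpha": 12.5, "beta": 298.0, "gamma": 40.0}  # made-up values
print(top_n(averages, n=2))  # [('beta', 298.0), ('gamma', 40.0)]
```

Ties are resolved arbitrarily here, which the question explicitly permits.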
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance
between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your
Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive
2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop
cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with
the same datasets stored in the HDFS. Rerun the Pig script in this cluster and
compare the performance between Pig and Hive in terms of overall run-time and
explain your observation.
Hints:
● Hive stores its tables on HDFS, and those locations need to be created first
(-p creates any missing parent directories and does not fail if the directory exists):
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small
subset of the data instead of the whole data set. Once your Hive commands/scripts
work as desired, you can then run them on the complete data set.
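For the run-time comparison in part (b), one possible way to measure overall wall-clock time is to wrap each script invocation in a small timer. The sketch below is in Python and is only an illustration; the helper name `time_command` is hypothetical, and it assumes the command to be timed is available on the PATH of the machine where it runs.

```python
import subprocess
import time

def time_command(cmd):
    """Run cmd (a list of arguments) to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the command fails
    return time.perf_counter() - start
```

For example, `time_command(["pig", "-f", "q1.pig"])` and `time_command(["hive", "-f", "q2.hql"])` would time the two scripts, where `q1.pig` and `q2.hql` are placeholder file names. Running each script several times on the same cluster and reporting the spread is likely to give a fairer comparison than a single run.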
Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims at grouping users with similar interests, behaviors,
actions, or general patterns, has drawn a lot of attention in the machine learning field. In
this homework, you will implement a similar-users-detection algorithm for an online movie
rating system. Basically, users who give similar scores to the same movies may have common
tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this
homework, the similarity between a given pair of users (e.g. A and B) is measured as the
total number of movies both A and B have watched divided by the total number of
movies watched by either A or B. The following is the formal definition of similarity: Let
M(A) be the set of all the movies user A has watched. Then the similarity between user A
and user B is defined as:
\[ \mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**) \]
where |S| means the cardinality of set S.
(Note: if |M(A) ∪ M(B)| = 0, we set the similarity to be 0.)
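The definition (**) can be illustrated with a few lines of Python. This is a minimal sketch rather than part of the required Pig solution; the function name `similarity` and the sample movieIDs are made up.

```python
def similarity(movies_a, movies_b):
    """Similarity (**) of two sets of movieIDs; 0 when the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)

a = {1, 2, 3}        # movies watched by user A (made-up IDs)
b = {2, 3, 4, 5}     # movies watched by user B (made-up IDs)
print(similarity(a, b))  # 0.4
```

Here the two users share 2 movies out of 5 watched by either of them, so the similarity is 2 / 5.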
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented
by its unique userID and each movie is represented by its unique movieID. The format of the
data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the
cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google
Cloud/AWS [5, 6, 7].
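Before writing the Pig program, the overall logic can be prototyped on a toy input. The Python sketch below is one possible approach under stated assumptions, not the expected solution: it groups users by movie, counts co-watched movies per user pair, and then derives the similarity (**) from those counts. The function name `top_k_similar` and the sample ratings are hypothetical. Pairs with no movie in common have similarity 0 and are simply omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations

def top_k_similar(ratings, k):
    """ratings: iterable of (userID, movieID). Returns {user: [(other_user, similarity), ...]}."""
    movies_of = defaultdict(set)   # user -> set of movies watched
    users_of = defaultdict(set)    # movie -> set of users who watched it
    for user, movie in ratings:
        movies_of[user].add(movie)
        users_of[movie].add(user)

    # Count, for every pair of users, the movies both have watched.
    common = Counter()
    for users in users_of.values():
        for a, b in combinations(sorted(users), 2):
            common[(a, b)] += 1

    # |A ∪ B| = |A| + |B| - |A ∩ B|, so the similarity follows from the counts.
    neighbours = defaultdict(list)
    for (a, b), both in common.items():
        either = len(movies_of[a]) + len(movies_of[b]) - both
        sim = both / either
        neighbours[a].append((b, sim))
        neighbours[b].append((a, sim))

    # Keep the k most similar users; ties are broken by the smaller userID.
    return {u: sorted(v, key=lambda p: (-p[1], p[0]))[:k] for u, v in neighbours.items()}

ratings = [(1, 10), (1, 11), (2, 10), (2, 11), (2, 12), (3, 12)]  # made-up data
result = top_k_similar(ratings, k=1)
print(result[2])  # [(1, 0.6666666666666666)]
```

Enumerating pairs per movie avoids comparing users who share nothing, but a very popular movie still produces a number of pairs quadratic in its audience size, which is worth keeping in mind for the larger dataset.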
(a) [10 marks] For each pair of users in each of the datasets [3] and [4], output the
number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the
list of the 10 pairs of users having the largest number of movies watched by
both users in the pair within the corresponding dataset. The format of your
answer should be as follows: