

Date: 2024-03-11



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and commands/scripts in one SINGLE PDF file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you will use only the Google Books bigram (1-gram) data. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, in 91 (95) distinct books.
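To make the record layout concrete, here is a minimal Python sketch that parses one TAB-separated line into typed fields. The names `NgramRecord` and `parse_line` are hypothetical and only illustrate the four-column format described above; they are not part of the assignment or of Pig.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One line of the n-gram file (hypothetical helper type)."""
    ngram: str
    year: int
    match_count: int
    volume_count: int


def parse_line(line: str) -> NgramRecord:
    """Split one TAB-separated line into its four typed fields."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count), int(volume_count))


record = parse_line("circumvallate\t1978\t335\t91\n")
print(record.ngram, record.year, record.match_count, record.volume_count)
```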
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams in the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298,
instead of
(335 + 261) / 100 = 5.96
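The averaging rule can be checked on the example records with a small Python sketch. This is only an illustration of the arithmetic, not a Pig solution; the function name `average_per_year` and the tuple layout are assumptions made for the example. Years are collected in a set so that a year is counted once even if a word has more than one record for it.

```python
from collections import defaultdict


def average_per_year(records):
    """records: iterable of (ngram, year, match_count) tuples.

    Returns {ngram: total matches / number of distinct years
    in which the ngram appears}.
    """
    totals = defaultdict(int)
    years = defaultdict(set)
    for ngram, year, match_count in records:
        totals[ngram] += match_count
        years[ngram].add(year)
    return {ngram: totals[ngram] / len(years[ngram]) for ngram in totals}


sample = [("circumvallate", 1978, 335), ("circumvallate", 1979, 261)]
print(average_per_year(sample))  # {'circumvallate': 298.0}
```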
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year, along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any one of them (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
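As a reference for what the output of part (d) should look like, the following Python sketch selects the k largest averages from a dictionary and returns them in descending order. The function name `top_k_by_average` and the toy values are hypothetical; ties are broken by whatever order `heapq.nlargest` happens to produce, which is allowed by the question.

```python
import heapq


def top_k_by_average(averages, k=20):
    """averages: {ngram: average occurrences per year}.

    Returns a list of (ngram, average) pairs, largest average first.
    """
    return heapq.nlargest(k, averages.items(), key=lambda item: item[1])


toy_averages = {"a": 3.0, "b": 10.0, "c": 7.5}
print(top_k_by_average(toy_averages, k=2))  # [('b', 10.0), ('c', 7.5)]
```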
Hints:
● This problem is very similar to the word counting example shown in the lecture notes of Pig. You can use the code there and just make some minor changes to perform this task.
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 on the master node of your Hadoop cluster.
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with the same datasets stored in HDFS. Rerun the Pig script on this cluster, compare the performance of Pig and Hive in terms of overall run time, and explain your observations.
Hints:
● Hive will store its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
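One way to obtain such a small subset is to copy only the first lines of a local data file before uploading it to HDFS. The Python sketch below does this; the function name `write_sample`, the file paths, and the default of 1000 lines are all assumptions chosen for illustration.

```python
from itertools import islice


def write_sample(src_path, dst_path, max_lines=1000):
    """Copy at most max_lines lines from src_path to dst_path.

    Returns the number of lines actually written.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        lines = list(islice(src, max_lines))
        dst.writelines(lines)
    return len(lines)
```

For example, `write_sample("ngrams_full.tsv", "ngrams_sample.tsv")` (hypothetical file names) would produce a 1000-line file that can be uploaded and queried quickly while the script is being debugged.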
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors, actions, or general patterns, has drawn a lot of attention in the machine learning field. In this homework, you will implement a similar-user detection algorithm for an online movie rating system. Users who give similar scores to the same movies may share common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦(𝐴, 𝐵) = |𝑀(𝐴)∩𝑀(𝐵)| / |𝑀(𝐴)∪𝑀(𝐵)| ...........(**)
where |S| means the cardinality of set S.
(Note: if |𝑀(𝐴)∪𝑀(𝐵)| = 0, we set the similarity to be 0.)
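The definition in (**), including the zero-union special case, can be written directly with Python sets. This is a minimal sketch for checking small examples by hand, not the Pig program the question asks for; the function name `similarity` and the toy movie IDs are assumptions.

```python
def similarity(movies_a: set, movies_b: set) -> float:
    """Similarity as in (**): |M(A) ∩ M(B)| / |M(A) ∪ M(B)|, or 0 if the union is empty."""
    union = movies_a | movies_b
    if not union:
        return 0.0
    return len(movies_a & movies_b) / len(union)


# Two movies in common ({2, 3}) out of four watched by either user.
print(similarity({1, 2, 3}, {2, 3, 4}))  # 0.5
```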
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2 or you can use the IE DIC or one provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in the dataset [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
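The counting in part (a) can be illustrated on a toy dataset with a short Python sketch. It groups users by movie and then counts every user pair once per shared movie, which mirrors a self-join on movieID followed by a group-by on the user pair. The function name `common_movie_counts` and the sample data are hypothetical, and this is not the required Pig script.

```python
from collections import Counter, defaultdict
from itertools import combinations


def common_movie_counts(ratings):
    """ratings: iterable of (user_id, movie_id) pairs.

    Returns a Counter mapping (user_a, user_b), with user_a < user_b,
    to the number of movies both users have watched.
    """
    viewers = defaultdict(set)  # movie_id -> set of users who watched it
    for user_id, movie_id in ratings:
        viewers[movie_id].add(user_id)

    counts = Counter()
    for users in viewers.values():
        # Each shared movie contributes 1 to every pair of its viewers.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return counts


toy_ratings = [(1, 10), (2, 10), (1, 20), (2, 20), (3, 20)]
pair_counts = common_movie_counts(toy_ratings)
print(pair_counts.most_common(10))
```

On real data the number of pairs per movie grows quadratically with its number of viewers, so this in-memory approach is only meant for verifying results on a small sample.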
(b) [20 marks] By modifying/extending part of your code in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
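A possible way to reason about part (b) is sketched below in Python. It assumes the standard inclusion-exclusion identity |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|, so the union size is obtained from per-user movie counts and the pair counts without building any union set. The function name `top_k_similar` and the toy data are hypothetical, and this is an illustration rather than the Pig program to be submitted.

```python
import heapq
from collections import Counter, defaultdict
from itertools import combinations


def top_k_similar(ratings, k=3):
    """ratings: iterable of (user_id, movie_id) pairs.

    Returns {user: [(similarity, other_user), ...]} with at most k entries,
    highest similarity first. Users sharing no movie with anyone are omitted,
    since all of their similarities are 0.
    """
    movies = defaultdict(set)   # user_id -> movies watched
    viewers = defaultdict(set)  # movie_id -> users who watched it
    for user_id, movie_id in ratings:
        movies[user_id].add(movie_id)
        viewers[movie_id].add(user_id)

    common = Counter()  # (a, b) -> |M(a) ∩ M(b)|
    for users in viewers.values():
        for a, b in combinations(sorted(users), 2):
            common[(a, b)] += 1

    neighbours = defaultdict(list)
    for (a, b), both in common.items():
        # Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B| (never 0 here).
        either = len(movies[a]) + len(movies[b]) - both
        sim = both / either
        neighbours[a].append((sim, b))
        neighbours[b].append((sim, a))

    return {user: heapq.nlargest(k, cands) for user, cands in neighbours.items()}


toy_ratings = [(1, 10), (2, 10), (1, 20), (2, 20), (3, 20)]
result = top_k_similar(toy_ratings, k=3)
print(result[1])  # [(1.0, 2), (0.5, 3)]
```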
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as
defined in (**), you can use the inclusion-exclusion principle, i.e.