Posted: 2024-03-11



IEMS 5730 Spring 2024 Homework 2
Release date: Feb 23, 2024
Due date: Mar 11, 2024 (Monday) 11:59:00 pm
We will discuss the solution soon after the deadline. No late homework will be accepted!
Every Student MUST include the following statement, together with his/her signature in the submitted homework.
I declare that the assignment submitted on Elearning system is original except for source material explicitly acknowledged, and that the same or related material has not been previously submitted for another course. I also acknowledge that I am aware of University policy and regulations on honesty in academic work, and of the disciplinary guidelines and procedures applicable to breaches of such policy and regulations, as contained in the website http://www.cuhk.edu.hk/policy/academichonesty/.
Signed (Student_________________________) Date:______________________________
Name_________________________________ SID_______________________________
Submission notice:
● Submit your homework via the elearning system.
● All students are required to submit this assignment.
General homework policies:
A student may discuss the problems with others. However, the work a student turns in must be created COMPLETELY by oneself ALONE. A student may not share ANY written work or pictures, nor may one copy answers from any source other than one’s own brain.
Each student MUST LIST on the homework paper the name of every person he/she has discussed or worked with. If the answer includes content from any other source, the student MUST STATE THE SOURCE. Failure to do so is cheating and will result in sanctions. Copying answers from someone else is cheating even if one lists their name(s) on the homework.
If there is information you need to solve a problem, but the information is not stated in the problem, try to find the data somewhere. If you cannot find it, state what data you need, make a reasonable estimate of its value, and justify any assumptions you make. You will be graded not only on whether your answer is correct, but also on whether you have done an intelligent analysis.
Submit your output, explanation, and your commands/scripts in one SINGLE PDF file.

 Q1 [20 marks + 5 Bonus marks]: Basic Operations of Pig
You are required to perform some simple analysis using Pig on the n-grams dataset of Google books. An ‘n-gram’ is a phrase with n words. The dataset lists all n-grams present in books from books.google.com along with some statistics.
In this question, you only use the Google Books 1-grams; the term "bigram" below refers to an entry in this dataset. Please go to References [1] and [2] to download the two datasets. Each line in these two files has the following format (TAB separated):
bigram year match_count volume_count
An example for 1-grams would be:
circumvallate 1978 335 91
circumvallate 1979 261 95
This means that in 1978 (1979), the word "circumvallate" occurred 335 (261) times overall, from 91 (95) distinct books.
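To make the record layout concrete, here is a minimal Python sketch that splits one TAB-separated line into typed fields. It is only an illustration of the format, not part of the required Pig solution; the names `NgramRecord` and `parse_record` are hypothetical.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One line of the n-gram file (field names follow the format above)."""
    ngram: str
    year: int
    match_count: int
    volume_count: int


def parse_record(line: str) -> NgramRecord:
    """Split a TAB-separated line into its four typed fields."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count), int(volume_count))


record = parse_record("circumvallate\t1978\t335\t91")
print(record.match_count, record.volume_count)  # 335 91
```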
(a) [Bonus 5 marks] Install Pig in your Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Pig 0.17.0 on the master node of your Hadoop cluster:
http://pig.apache.org/docs/r0.17.0/start.html#Pig+Setup
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7] to complete the following parts of the question:
(b) [5 marks] Upload these two files to HDFS and join them into one table.
(c) [5 marks] For each unique bigram, compute its average number of occurrences per year. In the above example, the result is:
circumvallate (335 + 261) / 2 = 298
Notes: The denominator is the number of years in which that word has appeared. Assume the data set contains all the 1-grams in the last 100 years, and the above records are the only records for the word ‘circumvallate’. Then the average value is:
(335 + 261) / 2 = 298
instead of
(335 + 261) / 100 = 5.96
(d) [10 marks] Output the 20 bigrams with the highest average number of occurrences per year along with their corresponding average values, sorted in descending order. If multiple bigrams have the same average value, write down any one you like (that is, break ties as you wish).
You need to write a Pig script to perform this task and save the output into HDFS.
Hints:
● This problem is very similar to the word counting example shown in the Pig lecture notes. You can use the code there and just make some minor changes to perform this task.
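The averaging rule in (c) and the ranking in (d) can be checked on a tiny sample before writing the Pig script. The following Python sketch is only a reference for the expected numbers, not the Pig solution itself. It assumes the records have already been reduced to (bigram, year, match_count) tuples, and the function names are hypothetical. Years are collected in a set so that the denominator counts distinct years.

```python
import heapq
from collections import defaultdict


def average_per_year(records):
    """records: iterable of (bigram, year, match_count) tuples."""
    totals = defaultdict(int)   # bigram -> sum of match_count
    years = defaultdict(set)    # bigram -> distinct years in which it appears
    for bigram, year, match_count in records:
        totals[bigram] += match_count
        years[bigram].add(year)
    return {bigram: totals[bigram] / len(years[bigram]) for bigram in totals}


def top_n(averages, n=20):
    """Return the n (bigram, average) pairs with the largest averages."""
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])


sample = [
    ("circumvallate", 1978, 335),
    ("circumvallate", 1979, 261),
    ("rare", 1990, 10),
]
averages = average_per_year(sample)
print(averages["circumvallate"])  # 298.0
print(top_n(averages, n=1))       # [('circumvallate', 298.0)]
```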
Q2 [20 marks + 5 bonus marks]: Basic Operations of Hive
In this question, you are asked to repeat Q1 using Hive and then compare the performance between Hive and Pig.
(a) [Bonus 5 marks] Install Hive on top of your own Hadoop cluster. You can reuse your Hadoop cluster from IEMS 5730 HW#0 and refer to the following link to install Hive 2.3.8 on the master node of your Hadoop cluster:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
Submit the screenshot(s) of your installation process.
If you choose not to do the bonus question in (a), you can use any well-installed Hadoop cluster, e.g., the IE DIC, or the Hadoop cluster provided by the Google Cloud/AWS [5, 6, 7].
(b) [20 marks] Write a Hive script to perform exactly the same task as that of Q1 with the same datasets stored in the HDFS. Rerun the Pig script in this cluster and compare the performance between Pig and Hive in terms of overall run-time and explain your observation.
Hints:
● Hive stores its tables on HDFS, and those locations need to be bootstrapped:
$ hdfs dfs -mkdir -p /tmp
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /tmp
$ hdfs dfs -chmod g+w /user/hive/warehouse
● While working with the interactive shell (or otherwise), you should first test on a small subset of the data instead of the whole data set. Once your Hive commands/scripts work as desired, you can then run them on the complete data set.
 
 Q3 [30 marks + 10 Bonus marks]: Similar Users Detection in the MovieLens Dataset using Pig
Similar-user detection, which aims to group users with similar interests, behaviors, actions, or general patterns, has drawn a lot of attention in the machine learning field. In this homework, you will implement a similar-user detection algorithm for an online movie rating system. Basically, users who give similar scores to the same movies may have common tastes or interests and can be grouped as similar users.
To detect similar users, we need to calculate the similarity between each user pair. In this homework, the similarity between a given pair of users (e.g. A and B) is measured as the total number of movies both A and B have watched divided by the total number of movies watched by either A or B. The following is the formal definition of similarity: Let M(A) be the set of all the movies user A has watched. Then the similarity between user A and user B is defined as:
$$\mathrm{Similarity}(A, B) = \frac{|M(A) \cap M(B)|}{|M(A) \cup M(B)|} \qquad (**)$$
where $|S|$ means the cardinality of set $S$.
(Note: if $|M(A) \cup M(B)| = 0$, we set the similarity to be 0.)
The following figure illustrates the idea:
Two datasets [3][4] with different sizes are provided by MovieLens. Each user is represented by its unique userID and each movie is represented by its unique movieID. The format of the data set is as follows:
<userID>, <movieID>
Write a program in Pig to detect the TOP K similar users for each user. You can use the cluster you built for Q1 and Q2, or you can use the IE DIC or one provided by the Google Cloud/AWS [5, 6, 7].
(a) [10 marks] For each pair of users in the dataset [3] and [4], output the number of movies they have both watched.
For your homework submission, you need to submit i) the Pig script and ii) the list of the 10 pairs of users having the largest number of movies watched by both users in the pair within the corresponding dataset. The format of your answer should be as follows:
<userID A>, <userID B>, <the number of movies both A and B have watched> //top 1
...
<userID X>, <userID Y>, <the number of movies both X and Y have watched> //top 10
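For a small input, the expected output of part (a) can be cross-checked with the Python sketch below. It is not the Pig script itself. It groups users by movie and counts every pair of users within each group, which is one common way to get the co-watch counts (in Pig this roughly corresponds to a self-join on movieID followed by a group on the user pair). The function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_watch_counts(ratings):
    """ratings: iterable of (user_id, movie_id) pairs.

    Returns a Counter keyed by (user_a, user_b) with user_a < user_b,
    counting the movies both users have watched.
    """
    viewers = defaultdict(set)  # movie_id -> set of users who watched it
    for user_id, movie_id in ratings:
        viewers[movie_id].add(user_id)

    counts = Counter()
    for users in viewers.values():
        # Every unordered pair of viewers of this movie shares one more movie.
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return counts


ratings = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (3, 30)]
counts = co_watch_counts(ratings)
print(counts.most_common(1))  # [((1, 2), 2)]
```

Note that the number of pairs grows quadratically with the number of viewers per movie, so this sketch is only meant for small test inputs.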
(b) [20 marks] By modifying/extending part of your code in part (a), find the Top-K (K=3) most similar users (as defined by Equation (**)) for every user in the datasets [3], [4]. If multiple users have the same similarity, you can just pick any three of them.
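One way to compute the similarity in (**) from per-user movie counts and co-watch counts is the inclusion-exclusion identity |M(A) ∪ M(B)| = |M(A)| + |M(B)| − |M(A) ∩ M(B)|. The Python sketch below illustrates this on a tiny sample so that Pig output can be cross-checked. It is not the Pig solution, and the function name and sample data are hypothetical. As a simplifying assumption, pairs with no movie in common (similarity 0) are not listed, so a user may end up with fewer than K neighbours.

```python
import heapq
from collections import Counter, defaultdict
from itertools import combinations


def top_k_similar(ratings, k=3):
    """ratings: iterable of (user_id, movie_id) pairs.

    Returns {user: [(similarity, other_user), ...]} with at most k entries,
    sorted by decreasing similarity.
    """
    movies = defaultdict(set)   # user_id -> movies watched by that user
    viewers = defaultdict(set)  # movie_id -> users who watched that movie
    for user_id, movie_id in ratings:
        movies[user_id].add(movie_id)
        viewers[movie_id].add(user_id)

    common = Counter()          # (a, b) -> |M(a) ∩ M(b)|, with a < b
    for users in viewers.values():
        for a, b in combinations(sorted(users), 2):
            common[(a, b)] += 1

    neighbours = defaultdict(list)
    for (a, b), both in common.items():
        # Inclusion-exclusion: |M(a) ∪ M(b)| = |M(a)| + |M(b)| - |M(a) ∩ M(b)|
        either = len(movies[a]) + len(movies[b]) - both
        similarity = both / either  # either >= both >= 1 here
        neighbours[a].append((similarity, b))
        neighbours[b].append((similarity, a))

    return {user: heapq.nlargest(k, scored) for user, scored in neighbours.items()}


ratings = [(1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (3, 30)]
result = top_k_similar(ratings, k=3)
print(result[1][0])  # (1.0, 2)
```

In this sample, users 1 and 2 watched exactly the same two movies, so their similarity is 2 / (2 + 2 − 2) = 1.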
(c)
Hint:
1. In part (b), to facilitate the computation of the similarity measure as defined in (**), you can use the inclusion-exclusion principle, i.e. $|M(A) \cup M(B)| = |M(A)| + |M(B)| - |M(A) \cap M(B)|$.