forked from lintool/bigdata-2018w
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathassignment2-451.html
150 lines (116 loc) · 5.86 KB
/
assignment2-451.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<meta name="description" content="Course homepage for CS 451/651 431/631 Data-Intensive Distributed Computing (Winter 2018) at the University of Waterloo">
<meta name="author" content="Jimmy Lin">
<title>Data-Intensive Distributed Computing</title>
<!-- Bootstrap core CSS -->
<link href="css/bootstrap.min.css" rel="stylesheet">
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<link href="css/ie10-viewport-bug-workaround.css" rel="stylesheet">
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<script src="js/ie-emulation-modes-warning.js"></script>
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
<style>
body {
padding-top: 60px; /* 60px to make the container go all the way to the bottom of the topbar */
}
</style>
</head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li><a href="index.html">Overview</a></li>
<li><a href="organization.html">Organization</a></li>
<li><a href="syllabus.html">Syllabus</a></li>
<li class="active"><a href="assignments.html">Assignments</a></li>
<li><a href="software.html">Software</a></li>
</ul>
</div><!--/.nav-collapse -->
</div>
</nav>
<div class="container">
<div class="page-header">
<div style="float: right"><img width="250" src="images/waterloo_logo.png" alt="University of Waterloo logo"/></div>
<h1>Assignments <br/><small>Data-Intensive Distributed Computing (Winter 2018)</small></h1>
</div>
<p>Note that there separate sets of assignments for CS 451/651 and CS
431/631. Make sure you work on the correct asssignments!</p>
<p><a href="assignments-451.html" class="btn btn-success btn-large">CS 451/651 Assignments</a></p>
<div class="subnav">
<ul class="nav nav-pills">
<li><a href="assignment0-451.html">0</a></li>
<li><a href="assignment1-451.html">1</a></li>
<li><a href="assignment2-451.html">2</a></li>
<li><a href="assignment3-451.html">3</a></li>
<li><a href="assignment4-451.html">4</a></li>
<li><a href="assignment5-451.html">5</a></li>
<li><a href="assignment6-451.html">6</a></li>
<li><a href="assignment7-451.html">7</a></li>
<li><a href="project-451.html">Final Project</a></li>
</ul>
</div>
<h3>Assignment 2: Counting in Spark <small>due 2:30pm January 30</small></h3>
<p>In this assignment you will do two things:</p>
<ol>
<li>"Port" the MapReduce implementations of the bigram frequency count
program from <a href="http://bespin.io">Bespin</a> over to Spark (in
Scala).</li>
<li>"Port" the MapReduce implementations
of <a href="assignment1-451.html">assignment 1</a> over to Spark (in Scala).</li>
</ol>
<h4 style="padding-top: 10px">Bigram Relative Frequency</h4>
<p>Your starting points
are <code>ComputeBigramRelativeFrequencyPairs</code>
and <code>ComputeBigramRelativeFrequencyStripes</code> in
package <code>io.bespin.java.mapreduce.bigram</code> (in Java).
You are welcome to build on the <code>BigramCount</code> (Scala)
implementation <a href="https://github.com/lintool/bespin/blob/master/src/main/scala/io/bespin/scala/spark/bigram/BigramCount.scala">here</a>
for tokenization and "boilerplate" code like command-line argument
parsing. To be consistent in tokenization, you should use the
<code>Tokenizer</code> trait
<a href="https://github.com/lintool/bespin/blob/master/src/main/scala/io/bespin/scala/util/Tokenizer.scala">here</a>.</p>
<p>Put your code in the
package <code>ca.uwaterloo.cs451.a2</code>. Since
you'll be writing Scala code, your source files should go
into <code>src/main/scala/ca/uwaterloo/cs451/a2/</code>.
The repository is designed so that Scala/Spark code will also
compile with the same Maven build command:</p>
<pre>
$ mvn clean package
</pre>
<p>Following the Java implementations, you will write both a "pairs"
and a "stripes" implementation in Spark. Not that although Spark has a
different API than MapReduce, the algorithmic concepts are still very
much applicable. Your pairs and stripes implementation should follow
the same logic as in the MapReduce implementations. In particular,
your program should only take one pass through the input data.</p>
<p style="padding-top: 20px"><a href="#">Back to top</a></p>
<div style="padding-bottom: 100px"></div>
</div><!-- /.container -->
<!-- Placed at the end of the document so the pages load faster -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
<script src="js/bootstrap.min.js"></script>
<!-- IE10 viewport hack for Surface/desktop Windows 8 bug -->
<script src="js/ie10-viewport-bug-workaround.js"></script>
</body>
</html>