diff --git a/README.md b/README.md
index 4dfa34037..577d1614e 100644
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ The python notebooks are written in [Jupyter](http://jupyter.org/).
   [github](https://github.com/dmlc/mxnet-notebooks/blob/master/python/outline.ipynb)
   or
   [nbviewer](http://nbviewer.jupyter.org/github/dmlc/mxnet-notebooks/blob/master/python/outline.ipynb). But
-  note that the former may be failed to render a page, while the latter has
+  note that the former may fail to render a page, while the latter has
   delays to view the recent changes.
 
 - **Run** We can run and modify these notebooks if both [mxnet](http://mxnet.io/get_started/index.html#setup-and-installation) and [jupyter](http://jupyter.org/) are
@@ -23,21 +23,21 @@ The python notebooks are written in [Jupyter](http://jupyter.org/).
 
   1.  Launch a g2.2xlarge or p2.2xlarge instance by using AMI `ami-fe217de9` on N. Virginia (us-east-1). This AMI is built by using  [this script](https://gist.github.com/mli/b64322f446b2043e3350ddcbfa5957be). Remember to open the TCP port 8888 in the security group.
 
-  2.  Once launch is succeed, setup the following variable with proper value
+  2.  Once the instance successfully launches, set up the following variables with proper values:
 
     ```bash
-      export HOSTNAME=ec2-107-22-159-132.compute-1.amazonaws.com
-      export PERM=~/Downloads/my.pem
+      export HOSTNAME=
+      export PERM=
     ```
 
-   3. Now we should be able to ssh to the machine by
+   3. Now, we should be able to ssh to the machine by
 
       ```bash
         chmod 400 $PERM
         ssh -i $PERM -L 8888:localhost:8888 ubuntu@HOSTNAME
       ```
 
-      Here we forward the EC2 machine's 8888 port into localhost.
+      Here we forward the EC2 machine's 8888 port to local port 8888.
 
    4. Clone this repo on the EC2 machine and run jupyter
 
@@ -47,7 +47,66 @@ The python notebooks are written in [Jupyter](http://jupyter.org/).
       ```
    	  We can optional run `~/update_mxnet.sh` to update MXNet to the newest version.
 
-   5. Now we are able to view and edit the notebooks on the browser using the URL: http://localhost:8888/tree/mxnet-notebooks/python/outline.ipynb
+   5. Now, we are able to view and edit the notebooks from the browser using the URL: http://localhost:8888/tree/mxnet-notebooks/python/outline.ipynb
+
+
+### Scala
+
+The Scala notebooks are written in [Jupyter](http://jupyter.org/) using [Jupyter-Scala Kernel V0.3.x](https://github.com/alexarchambault/jupyter-scala).
+
+- **Run** We can run and modify these notebooks if the [mxnet scala package](http://mxnet.io/get_started/index.html#setup-and-installation), [jupyter](http://jupyter.org/), and the Jupyter-Scala kernel are all installed. There are various options for the Jupyter Scala kernel; choose whichever you like.
+
+  If you have an AWS account, here is an easier way to run the notebooks:
+
+  1.  Launch a g2.2xlarge or p2.2xlarge instance by using AMI `ami-fe217de9` on N. Virginia (us-east-1). This AMI is built by using  [this script](https://gist.github.com/mli/b64322f446b2043e3350ddcbfa5957be). Remember to open the TCP port 8888 in the security group.
+
+  2.  Once the instance successfully launches, set up the following variables with proper values:
+
+    ```bash
+      export HOSTNAME=
+      export PERM=
+    ```
+
+   3. Now we should be able to ssh to the machine by
+
+      ```bash
+        chmod 400 $PERM
+        ssh -i $PERM -L 8888:localhost:8888 ubuntu@$HOSTNAME
+      ```
+
+      Here we forward the EC2 machine's 8888 port to local port 8888.
+
+    4. Install [Maven](https://gist.github.com/sebsto/19b99f1fa1f32cae5d00) and [Scala 2.11.8](https://www.scala-lang.org/files/archive/scala-2.11.8.rpm). Then go to the MXNet source directory and compile the Scala package by running `make scalapkg`. The compiled jar file is created in the `mxnet/scala-package/assembly/{your-architecture}/target` directory.
+
+    5. Install [coursier](https://github.com/coursier/coursier), a Scala library for fetching dependencies from Maven / Ivy repositories, as follows:
+
+	    On OS X, `brew install --HEAD paulp/extras/coursier`
+	    On Linux, 
+
+	    ```bash
+	      curl -L -o coursier https://git.io/vgvpD && chmod +x coursier && ./coursier --help
+	    ```
+
+	    Make sure the coursier launcher is available on the PATH.
+
+    6. Install [Jupyter-Scala Kernel V0.3.x](https://github.com/alexarchambault/jupyter-scala) by following the instructions given below: 
+
+      ```bash
+      	git clone -b 0.3.x https://github.com/alexarchambault/jupyter-scala.git
+      	cd jupyter-scala
+      	./jupyter-scala
+      ```
+
+      To check that the Scala kernel is installed, run `jupyter kernelspec list`.
+
+    7. Clone this repo on the EC2 machine and run jupyter
+
+      ```bash
+        git clone https://github.com/dmlc/mxnet-notebooks
+        jupyter notebook
+      ```
+
+    8. Now we are able to view and edit the notebooks from the browser using the URL: http://localhost:8888/tree/mxnet-notebooks/scala/. Choose the scala211 kernel if asked. In each notebook you want to run, add the mxnet-scala jar created in step 4 to the classpath with `classpath.addPath("jar-path")`.
 
 
 ## How to develop
diff --git a/scala/basic/dataIterator_scala.ipynb b/scala/basic/dataIterator_scala.ipynb
new file mode 100644
index 000000000..cce485429
--- /dev/null
+++ b/scala/basic/dataIterator_scala.ipynb
@@ -0,0 +1,1114 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Loading Data\n",
+    "In this tutorial we focus on how to feed data into a training or inference program. Most training and inference modules in MXNet accept data iterators, especially when reading large datasets from filesystems. MXNet uses an iterator to provide data to the neural network. Iterators do some preprocessing and generate batches for the neural network.\n",
+    "\n",
+    "MXNet provides basic iterators for MNIST and RecordIO images. To hide the cost of I/O, MXNet uses a prefetch strategy that enables parallelism for the learning process and data fetching. Data is automatically fetched by an independent thread. Here we discuss the API conventions and several provided iterators."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel's classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g.\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic Data Iterator\n",
+    "\n",
+    "MXNet's data iterator returns a batch of data in each `next` call. We first introduce what a data batch looks like and then how to write a basic data iterator.\n",
+    "\n",
+    "### Data Batch\n",
+    "A data batch often contains n examples and the corresponding labels. Here n is often called the batch size.\n",
+    "The following code defines a valid data batch that can be read by most training/inference modules."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.collection.immutable.ListMap\u001b[0m\n",
+       "defined \u001b[32mclass \u001b[36mDataBatch\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import scala.collection.immutable.ListMap\n",
+    "\n",
+    "class DataBatch(val data: IndexedSeq[NDArray],\n",
+    "                val label: IndexedSeq[NDArray],\n",
+    "                val index: IndexedSeq[Long],\n",
+    "                val pad: Int,\n",
+    "                // the key for the bucket that should be used for this batch,\n",
+    "                // for bucketing io only\n",
+    "                val bucketKey: AnyRef = null,\n",
+    "                // use ListMap to indicate the order of data/label loading\n",
+    "                // (must match the order of input data/label)\n",
+    "                private val providedData: ListMap[String, Shape] = null,\n",
+    "                private val providedLabel: ListMap[String, Shape] = null) {\n",
+    "  /**\n",
+    "   * Dispose its data and labels\n",
+    "   * The object shall never be used after it is disposed.\n",
+    "   */\n",
+    "  def dispose(): Unit = {\n",
+    "    if (data != null) {\n",
+    "      data.foreach(arr => if (arr != null) arr.dispose())\n",
+    "    }\n",
+    "    if (label != null) {\n",
+    "      label.foreach(arr => if (arr != null) arr.dispose())\n",
+    "    }\n",
+    "  }\n",
+    "\n",
+    "  // The name and shape of data\n",
+    "  def provideData: ListMap[String, Shape] = providedData\n",
+    "\n",
+    "  // The name and shape of label\n",
+    "  def provideLabel: ListMap[String, Shape] = providedLabel\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We explain what each attribute means:\n",
+    "- **data** is a list of NDArrays, each with a first dimension of length $n$. For example, if an example is an image with size $224 \\times 224$ and RGB channels, then the array shape should be (n, 3, 224, 224). Note that the image batch format used by MXNet is\n",
+    "\n",
+    "$$\\textrm{batch_size} \\times \\textrm{num_channel} \\times \\textrm{height} \\times \\textrm{width}$$ \n",
+    "\n",
+    "The channels are often in RGB order.\n",
+    "\n",
+    "Each array will be copied into a free variable of the Symbol later. The mapping from arrays to free variables should be given by the `provideData` attribute of the iterator, which will be discussed shortly.\n",
+    "- **label** is also a list of NDArray. Often each NDArray is a 1-dimensional array with shape (n,). For classification, each class is represented by an integer starting from 0.\n",
+    "- **pad** is an integer that shows how many examples in the batch are used merely for padding and should be ignored in the results. Nonzero padding is often used when we reach the end of the data and the total number of examples is not divisible by the batch size.\n",
+    "- **providedData** is a ListMap of name and shape of the data.\n",
+    "- **providedLabel** is a ListMap of name and shape of the label."
+   ]
+  },
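+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a rough illustration of how **pad** arises, the sketch below computes the number of batches and the padding of the final batch in plain Scala. The helper name `batchesAndPad` is hypothetical and not part of MXNet, and the sketch assumes the iterator pads the last batch rather than dropping it.\n",
+    "\n",
+    "```scala\n",
+    "// Hypothetical helper (not an MXNet API): given the total number of examples\n",
+    "// and the batch size, return (number of batches, pad of the last batch).\n",
+    "def batchesAndPad(numExamples: Int, batchSize: Int): (Int, Int) = {\n",
+    "  require(numExamples >= 0 && batchSize > 0)\n",
+    "  // round up so that a partially filled final batch is still counted\n",
+    "  val numBatches = (numExamples + batchSize - 1) / batchSize\n",
+    "  // slots in the final batch that hold no real example\n",
+    "  val pad = numBatches * batchSize - numExamples\n",
+    "  (numBatches, pad)\n",
+    "}\n",
+    "\n",
+    "batchesAndPad(60000, 100)  // (600, 0): every batch is full\n",
+    "batchesAndPad(60000, 128)  // (469, 32): the last batch has 32 padding examples\n",
+    "```\n",
+    "\n",
+    "With pad = 32 and a batch size of 128, only the first 96 results of the final batch correspond to real examples."
+   ]
+  },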
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "### Symbol and Data Variables\n",
+    "Before moving on to the iterator, we first look at how to find which variables in a Symbol are for input data. In MXNet, an operator (Symbol.*) has one or more input variables and output variables; some operators may have additional auxiliary variables for internal states. For an input variable of an operator, if we do not assign it the output of another operator when creating this operator, then this input variable is free. We need to assign it external data before running.\n",
+    "\n",
+    "The following code defines a simple multilayer perceptron (MLP) and then prints all free variables."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mnumClasses\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m10\u001b[0m\n",
+       "\u001b[36mdata\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@30a65ea0\n",
+       "\u001b[36mfc1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@200fe851\n",
+       "\u001b[36mact1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@28fd3448\n",
+       "\u001b[36mfc2\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@671608bf\n",
+       "\u001b[36mmlp\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@65c719a3\n",
+       "\u001b[36mres2_6\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mString\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\n",
+       "  \u001b[32m\"data\"\u001b[0m,\n",
+       "  \u001b[32m\"fc1_weight\"\u001b[0m,\n",
+       "  \u001b[32m\"fc1_bias\"\u001b[0m,\n",
+       "  \u001b[32m\"fc2_weight\"\u001b[0m,\n",
+       "  \u001b[32m\"fc2_bias\"\u001b[0m,\n",
+       "  \u001b[32m\"softmax_label\"\u001b[0m\n",
+       ")\n",
+       "\u001b[36mres2_7\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mString\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\u001b[32m\"softmax_output\"\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val numClasses = 10\n",
+    "\n",
+    "val data = Symbol.Variable(\"data\")\n",
+    "val fc1 = Symbol.FullyConnected(name = \"fc1\")()(Map(\"data\" -> data, \"num_hidden\" -> 64))\n",
+    "val act1 = Symbol.Activation(name = \"relu1\")()(Map(\"data\" -> fc1, \"act_type\" -> \"relu\"))\n",
+    "val fc2 = Symbol.FullyConnected(name = \"fc2\")()(Map(\"data\" -> act1, \"num_hidden\" -> numClasses))\n",
+    "val mlp = Symbol.SoftmaxOutput(name = \"softmax\")()(Map(\"data\" -> fc2))\n",
+    "\n",
+    "mlp.listArguments()\n",
+    "mlp.listOutputs()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As can be seen, we name a variable either by its operator's name if it is atomic (e.g. Symbol.Variable(\"data\")) or by the opname_varname convention. The varname often indicates what the variable is for:\n",
+    "\n",
+    "- weight : the weight parameters\n",
+    "- bias : the bias parameters\n",
+    "- output : the output\n",
+    "- label : input label\n",
+    "\n",
+    "From the above example, we now know that there are four variables for parameters and two for input data: data for the examples and softmax_label for the corresponding labels.\n",
+    "\n",
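+    "This split can be reproduced from the names alone. The plain-Scala sketch below copies the names printed by `listArguments()` above and assumes that parameters are exactly the names ending in `_weight` or `_bias`; the names `argNames`, `isParam`, `paramNames` and `inputNames` are illustrative only and not part of MXNet.\n",
+    "\n",
+    "```scala\n",
+    "// Argument names copied from the listArguments() output above.\n",
+    "val argNames = IndexedSeq(\"data\", \"fc1_weight\", \"fc1_bias\",\n",
+    "                          \"fc2_weight\", \"fc2_bias\", \"softmax_label\")\n",
+    "\n",
+    "// Assumption: learnable parameters follow the opname_weight / opname_bias convention.\n",
+    "def isParam(name: String): Boolean = name.endsWith(\"_weight\") || name.endsWith(\"_bias\")\n",
+    "\n",
+    "// partition keeps the original order within each group\n",
+    "val (paramNames, inputNames) = argNames.partition(isParam)\n",
+    "// paramNames: fc1_weight, fc1_bias, fc2_weight, fc2_bias\n",
+    "// inputNames: data, softmax_label\n",
+    "```\n",
+    "\n",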
+    "The following example defines a matrix factorization objective function with rank 10 for recommendation systems. It has three input variables: user for user IDs, item for item IDs, and score for the rating the user gives to the item."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mnumUsers\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m1000\u001b[0m\n",
+       "\u001b[36mnumItems\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m1000\u001b[0m\n",
+       "\u001b[36mk\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m10\u001b[0m\n",
+       "\u001b[36muser\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@50a64721\n",
+       "\u001b[36mitem\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@6f09fabd\n",
+       "\u001b[36mscore\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@6cdc9943\n",
+       "\u001b[36muser1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@2971e218\n",
+       "\u001b[36mitem1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@4b0c2758\n",
+       "\u001b[36mpred0\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@6bf0e096\n",
+       "\u001b[36mpred1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@636e7a2a\n",
+       "\u001b[36mpred2\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@4529257d\n",
+       "\u001b[36mpred\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@6f35ec67"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val numUsers = 1000\n",
+    "val numItems = 1000\n",
+    "val k = 10 \n",
+    "\n",
+    "// input\n",
+    "val user = Symbol.Variable(\"user\")\n",
+    "val item = Symbol.Variable(\"item\")\n",
+    "val score = Symbol.Variable(\"score\")\n",
+    "\n",
+    "// user feature lookup\n",
+    "val user1 = Symbol.Embedding()()(Map(\"data\" -> user, \"input_dim\" -> numUsers, \"output_dim\" -> k))\n",
+    "\n",
+    "// item feature lookup\n",
+    "val item1 = Symbol.Embedding()()(Map(\"data\" -> item, \"input_dim\" -> numItems, \"output_dim\" -> k))\n",
+    "\n",
+    "// predict by the inner product, which is elementwise product and then sum\n",
+    "val pred0 = user1 * item1\n",
+    "val pred1 = Symbol.sum_axis()()(Map(\"data\" -> pred0, \"axis\" -> 1))\n",
+    "val pred2 = Symbol.Flatten()()(Map(\"data\" -> pred1))\n",
+    "\n",
+    "// loss layer\n",
+    "val pred = Symbol.LinearRegressionOutput()()(Map(\"data\" -> pred2, \"label\" -> score))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Data Iterators\n",
+    "Now we are ready to show how to create a valid MXNet data iterator. An iterator should extend the DataIter class and override the following methods:\n",
+    "\n",
+    "- **reset()** method to restart reading from the beginning\n",
+    "- **provideData()** to return a ListMap of (String, Shape) pairs; each pair stores an input data variable name and its shape.\n",
+    "- **provideLabel()** method to return a ListMap of (String, Shape) pairs, which provides information about input labels.\n",
+    "- **getData()** and **getLabel()** methods for getting data and label of current batch.\n",
+    "- **getPad()** for getting the number of padding examples.\n",
+    "- **getIndex()** for getting the index of current batch.\n",
+    "- **next()** method to return a data batch.\n",
+    "\n",
+    "The following code defines a simple iterator that returns the same randomly generated data in every batch."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mdataGen\u001b[0m\n",
+       "defined \u001b[32mfunction \u001b[36mlabelGen\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "    def dataGen(dim: Array[Int]) : Array[Array[Float]] ={\n",
+    "        val r = new scala.util.Random(100)\n",
+    "        Array.fill(dim(0), dim(1)) { 2*r.nextFloat-1 }\n",
+    "    }\n",
+    "    \n",
+    "    def labelGen(lowLimit: Int, highLimit: Int, dim: Int) : Array[Float] ={\n",
+    "        val r = new scala.util.Random(100)\n",
+    "        val label = for (i <- lowLimit+1 to dim) yield r.nextInt(highLimit).toFloat\n",
+    "        label.toArray\n",
+    "    }"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mscala.collection.mutable.ArrayBuffer\u001b[0m\n",
+       "\u001b[36mnumBatches\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m10\u001b[0m\n",
+       "defined \u001b[32mclass \u001b[36mSimpleIter\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import scala.collection.mutable.ArrayBuffer\n",
+    "val numBatches: Int=10\n",
+    "\n",
+    "class SimpleIter(dataNames: String, dataShapes: Shape, dataGen: Array[Array[Float]],\n",
+    "                 labelNames: String, labelShapes: Shape, labelGen: Array[Float]) extends DataIter{\n",
+    "\n",
+    "    val _provideData = ListMap(dataNames -> dataShapes)\n",
+    "    val _provideLabel = ListMap(labelNames -> labelShapes)\n",
+    "    var curBatch = 0\n",
+    "\n",
+    "  // Get next data batch from iterator\n",
+    "  override def next(): DataBatch = {\n",
+    "    if (!hasNext) throw new NoSuchElementException\n",
+    "\n",
+    "      val data = Array(NDArray.array(dataGen.flatten.toArray, shape = dataShapes))\n",
+    "      val label = Array(NDArray.array(labelGen, shape = labelShapes))\n",
+    "      curBatch += 1\n",
+    "          \n",
+    "      new DataBatch(data=data,label=label, index=getIndex(), pad=getPad(), providedData=_provideData, providedLabel=_provideLabel)\n",
+    "  }    \n",
+    "    \n",
+    "  // reset the iterator    \n",
+    "  override def reset(): Unit = {\n",
+    "    curBatch = 0\n",
+    "  }\n",
+    "  // Check for next batch\n",
+    "  override def hasNext: Boolean = {\n",
+    "      curBatch < numBatches\n",
+    "  }\n",
+    "    \n",
+    "  override def batchSize: Int = dataShapes(0)\n",
+    "  // Get data of current batch\n",
+    "  override def getData(): IndexedSeq[NDArray] = IndexedSeq()\n",
+    "  // Get the index of current batch\n",
+    "  override def getIndex(): IndexedSeq[Long] = IndexedSeq[Long]()\n",
+    "  // Get label of current batch\n",
+    "  override def getLabel(): IndexedSeq[NDArray] = IndexedSeq()\n",
+    "  // Get the number of padding examples in current batch\n",
+    "  override def getPad(): Int = 0\n",
+    "  // The name and shape of data provided by this iterator\n",
+    "  override def provideData: ListMap[String, Shape] = _provideData\n",
+    "  // The name and shape of label provided by this iterator\n",
+    "  override def provideLabel: ListMap[String, Shape] = _provideLabel\n",
+    "\n",
+    "}"
+   ]
+  },
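+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `reset` / `hasNext` / `next` contract used by the iterator above can also be exercised without MXNet. The sketch below is a plain-Scala stand-in: `CountingIter` and `runEpoch` are hypothetical names, and the iterator yields batch indices instead of `DataBatch` objects.\n",
+    "\n",
+    "```scala\n",
+    "import scala.collection.mutable.ListBuffer\n",
+    "\n",
+    "// Hypothetical stand-in for a data iterator: yields batch indices only.\n",
+    "class CountingIter(numBatches: Int) extends Iterator[Int] {\n",
+    "  private var curBatch = 0\n",
+    "  // restart reading from the first batch\n",
+    "  def reset(): Unit = { curBatch = 0 }\n",
+    "  override def hasNext: Boolean = curBatch < numBatches\n",
+    "  override def next(): Int = {\n",
+    "    if (!hasNext) throw new NoSuchElementException\n",
+    "    curBatch += 1\n",
+    "    curBatch - 1\n",
+    "  }\n",
+    "}\n",
+    "\n",
+    "// One epoch: reset, then drain the iterator until hasNext is false.\n",
+    "def runEpoch(it: CountingIter): List[Int] = {\n",
+    "  it.reset()\n",
+    "  val seen = ListBuffer[Int]()\n",
+    "  while (it.hasNext) seen += it.next()\n",
+    "  seen.toList\n",
+    "}\n",
+    "\n",
+    "val it = new CountingIter(3)\n",
+    "runEpoch(it)  // List(0, 1, 2)\n",
+    "runEpoch(it)  // List(0, 1, 2) again, because reset() rewinds the iterator\n",
+    "```\n",
+    "\n",
+    "Calling `reset()` at the start of every epoch is what lets a training loop pass over the same data several times."
+   ]
+  },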
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we can feed the data iterator into a training problem. Here we use the Module class; more details about this class are discussed in module.ipynb."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.module.{FitParams, Module}\u001b[0m\n",
+       "\u001b[36mn\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m32\u001b[0m\n",
+       "\u001b[36mdata\u001b[0m: \u001b[32mSimpleIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mmod\u001b[0m: \u001b[32mmodule\u001b[0m.\u001b[32mModule\u001b[0m = ml.dmlc.mxnet.module.Module@4f52fe71"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet.module.{FitParams, Module}\n",
+    "\n",
+    "val n = 32\n",
+    "val data = new SimpleIter(\"data\", Shape(n,100), \n",
+    "                  dataGen(Array(n,100)),\n",
+    "                  \"softmax_label\", Shape(n), \n",
+    "                  labelGen(0, numClasses, n))\n",
+    "\n",
+    "val mod = new Module(mlp)\n",
+    "mod.fit(data, numEpoch=5)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "For the Symbol pred, in contrast, we need to provide three inputs: two for examples and one for the label. Refer to the MatrixFactorization tutorial to learn more.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## More Iterators\n",
+    "MXNet provides multiple efficient data iterators as follows:\n",
+    "\n",
+    "### MNISTIter\n",
+    "MNISTIter is an easy way to iterate over the MNIST dataset.\n",
+    "\n",
+    "**Parameters:**\n",
+    "\n",
+    "- \"image\" and \"label\" - Dataset Param: Paths to the MNIST image and label files.\n",
+    "- \"batch_size\" (int, optional, default='128') – Batch Param: Batch Size.\n",
+    "- \"shuffle\" - Augmentation Param: Whether to shuffle data.\n",
+    "- \"flat\" (boolean, optional, default=False) – Augmentation Param: Whether to flatten the data into 1D.\n",
+    "- \"seed\" (int, optional, default='0') – Augmentation Param: Random Seed.\n",
+    "- \"silent\" (boolean, optional, default=False) – Auxiliary Param: Whether to suppress printing of data info.\n",
+    "- \"num_parts\" (int, optional, default='1') – partition the data into multiple parts\n",
+    "- \"part_index\" (int, optional, default='0') – the index of the part to read\n",
+    "- \"prefetch_buffer\" (long (non-negative), optional, default=4) – Maximum number of batches to prefetch\n",
+    "- \"dtype\" ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'},optional, default='None') – Output data type. None means no change"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mparams\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mString\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"silent\"\u001b[0m -> \u001b[32m\"0\"\u001b[0m,\n",
+       "  \u001b[32m\"seed\"\u001b[0m -> \u001b[32m\"10\"\u001b[0m,\n",
+       "  \u001b[32m\"flat\"\u001b[0m -> \u001b[32m\"1\"\u001b[0m,\n",
+       "  \u001b[32m\"image\"\u001b[0m -> \u001b[32m\"data/train-images-idx3-ubyte\"\u001b[0m,\n",
+       "  \u001b[32m\"label\"\u001b[0m -> \u001b[32m\"data/train-labels-idx1-ubyte\"\u001b[0m,\n",
+       "  \u001b[32m\"shuffle\"\u001b[0m -> \u001b[32m\"1\"\u001b[0m,\n",
+       "  \u001b[32m\"data_shape\"\u001b[0m -> \u001b[32m\"(784,)\"\u001b[0m,\n",
+       "  \u001b[32m\"batch_size\"\u001b[0m -> \u001b[32m\"100\"\u001b[0m\n",
+       ")\n",
+       "\u001b[36mmnistPack\u001b[0m: \u001b[32mDataPack\u001b[0m = \u001b[33mMXDataPack\u001b[0m(\n",
+       "  ml.dmlc.mxnet.DataBatch@1c83152e,\n",
+       "  ml.dmlc.mxnet.DataBatch@18820148,\n",
+       "  ml.dmlc.mxnet.DataBatch@1410a0a5,\n",
+       "  ml.dmlc.mxnet.DataBatch@45dfe674,\n",
+       "  ml.dmlc.mxnet.DataBatch@4171b184,\n",
+       "  ml.dmlc.mxnet.DataBatch@497170a3,\n",
+       "  ml.dmlc.mxnet.DataBatch@58f5f4a0,\n",
+       "  ml.dmlc.mxnet.DataBatch@6223558c,\n",
+       "  ml.dmlc.mxnet.DataBatch@2e1235dd,\n",
+       "  ml.dmlc.mxnet.DataBatch@6ca4bcd4,\n",
+       "  ml.dmlc.mxnet.DataBatch@1b030514,\n",
+       "  ml.dmlc.mxnet.DataBatch@63f4bccd,\n",
+       "  ml.dmlc.mxnet.DataBatch@5c77d1b3,\n",
+       "  ml.dmlc.mxnet.DataBatch@15fc84f5,\n",
+       "  ml.dmlc.mxnet.DataBatch@16d418fb,\n",
+       "  ml.dmlc.mxnet.DataBatch@5000cc38,\n",
+       "  ml.dmlc.mxnet.DataBatch@321875c2,\n",
+       "  ml.dmlc.mxnet.DataBatch@43b29458,\n",
+       "  ml.dmlc.mxnet.DataBatch@75975f15,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mnBatch\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m600\u001b[0m\n",
+       "\u001b[36mbatchCount\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m600\u001b[0m\n",
+       "\u001b[36mmnistIter\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mprovideData\u001b[0m: \u001b[32mListMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mShape\u001b[0m] = \u001b[33mMap\u001b[0m(\u001b[32m\"data\"\u001b[0m -> (100,784))\n",
+       "\u001b[36mprovideLabel\u001b[0m: \u001b[32mListMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mShape\u001b[0m] = \u001b[33mMap\u001b[0m(\u001b[32m\"label\"\u001b[0m -> (100))\n",
+       "\u001b[36mres8_12\u001b[0m: \u001b[32mDataBatch\u001b[0m = ml.dmlc.mxnet.DataBatch@4e9f4579\n",
+       "\u001b[36mlabel0\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m9.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m9.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m6.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m6.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m9.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mdata0\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mres8_15\u001b[0m: \u001b[32mDataBatch\u001b[0m = ml.dmlc.mxnet.DataBatch@55a394de\n",
+       "\u001b[36mres8_16\u001b[0m: \u001b[32mDataBatch\u001b[0m = ml.dmlc.mxnet.DataBatch@6123a79e\n",
+       "\u001b[36mres8_17\u001b[0m: \u001b[32mDataBatch\u001b[0m = ml.dmlc.mxnet.DataBatch@639de812\n",
+       "\u001b[36mres8_19\u001b[0m: \u001b[32mDataBatch\u001b[0m = ml.dmlc.mxnet.DataBatch@b0fdedf\n",
+       "\u001b[36mlabel1\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m9.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m9.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m6.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m6.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m9.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mdata1\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val params = Map(\n",
+    "      \"image\" -> \"data/train-images-idx3-ubyte\",\n",
+    "      \"label\" -> \"data/train-labels-idx1-ubyte\",\n",
+    "      \"data_shape\" -> \"(784,)\",\n",
+    "      \"batch_size\" -> \"100\",\n",
+    "      \"shuffle\" -> \"1\",\n",
+    "      \"flat\" -> \"1\",\n",
+    "      \"silent\" -> \"0\",\n",
+    "      \"seed\" -> \"10\"\n",
+    "    )\n",
+    "\n",
+    "    val mnistPack = IO.MNISTPack(params)\n",
+    "\n",
+    "    val nBatch = 600\n",
+    "    var batchCount = 0\n",
+    "    for(batch <- mnistPack) {\n",
+    "      batchCount += 1\n",
+    "    }\n",
+    "\n",
+    "    // create DataIter\n",
+    "    val mnistIter = mnistPack.iterator\n",
+    "    // get the name and shape of data provided by this iterator \n",
+    "    val provideData = mnistIter.provideData\n",
+    "    // get the name and shape of label provided by this iterator \n",
+    "    val provideLabel = mnistIter.provideLabel\n",
+    "     \n",
+    "    // reset the iterator\n",
+    "    mnistIter.reset()\n",
+    "    batchCount = 0\n",
+    "    // check if iterator has next batch of data\n",
+    "    while (mnistIter.hasNext) {\n",
+    "      mnistIter.next()\n",
+    "      batchCount += 1\n",
+    "    }\n",
+    " \n",
+    "    mnistIter.reset()\n",
+    "    // get next data batch from iterator\n",
+    "    mnistIter.next()\n",
+    "    // get label of current batch\n",
+    "    val label0 = mnistIter.getLabel().head.toArray\n",
+    "    // get data of current batch\n",
+    "    val data0 = mnistIter.getData().head.toArray\n",
+    "    mnistIter.next()\n",
+    "    mnistIter.next()\n",
+    "    mnistIter.next()\n",
+    "    mnistIter.reset()\n",
+    "    mnistIter.next()\n",
+    "    val label1 = mnistIter.getLabel().head.toArray\n",
+    "    val data1 = mnistIter.getData().head.toArray\n",
+    "  "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### ImageRecordIter\n",
+    "ImageRecordIter iterates over image RecordIO files.\n",
+    "It reads batches of images from RecordIO files and offers a rich set of data augmentation options.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mparams\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mString\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"prefetch_buffer\"\u001b[0m -> \u001b[32m\"1\"\u001b[0m,\n",
+       "  \u001b[32m\"path_imgrec\"\u001b[0m -> \u001b[32m\"data/cifar/train.rec\"\u001b[0m,\n",
+       "  \u001b[32m\"mean_img\"\u001b[0m -> \u001b[32m\"data/cifar/cifar10_mean.bin\"\u001b[0m,\n",
+       "  \u001b[32m\"rand_mirror\"\u001b[0m -> \u001b[32m\"False\"\u001b[0m,\n",
+       "  \u001b[32m\"shuffle\"\u001b[0m -> \u001b[32m\"False\"\u001b[0m,\n",
+       "  \u001b[32m\"preprocess_threads\"\u001b[0m -> \u001b[32m\"4\"\u001b[0m,\n",
+       "  \u001b[32m\"rand_crop\"\u001b[0m -> \u001b[32m\"False\"\u001b[0m,\n",
+       "  \u001b[32m\"data_shape\"\u001b[0m -> \u001b[32m\"(3,28,28)\"\u001b[0m,\n",
+       "  \u001b[32m\"batch_size\"\u001b[0m -> \u001b[32m\"100\"\u001b[0m\n",
+       ")\n",
+       "\u001b[36mimgRecIter\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mnBatch\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m500\u001b[0m\n",
+       "\u001b[36mbatchCount\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m500\u001b[0m\n",
+       "\u001b[36mprovideData\u001b[0m: \u001b[32mListMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mShape\u001b[0m] = \u001b[33mMap\u001b[0m(\u001b[32m\"data\"\u001b[0m -> (100,3,28,28))\n",
+       "\u001b[36mprovideLabel\u001b[0m: \u001b[32mListMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mShape\u001b[0m] = \u001b[33mMap\u001b[0m(\u001b[32m\"label\"\u001b[0m -> (100))\n",
+       "\u001b[36mres9_9\u001b[0m: \u001b[32mDataBatch\u001b[0m = ml.dmlc.mxnet.DataBatch@5233832d\n",
+       "\u001b[36mlabel0\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m7.0F\u001b[0m,\n",
+       "  \u001b[32m6.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m7.0F\u001b[0m,\n",
+       "  \u001b[32m9.0F\u001b[0m,\n",
+       "  \u001b[32m6.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m5.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m5.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mdata0\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m11.52652F\u001b[0m,\n",
+       "  \u001b[32m10.147156F\u001b[0m,\n",
+       "  \u001b[32m8.638443F\u001b[0m,\n",
+       "  \u001b[32m7.039444F\u001b[0m,\n",
+       "  \u001b[32m6.5186005F\u001b[0m,\n",
+       "  \u001b[32m5.9982452F\u001b[0m,\n",
+       "  \u001b[32m6.3482666F\u001b[0m,\n",
+       "  \u001b[32m6.867447F\u001b[0m,\n",
+       "  \u001b[32m6.450226F\u001b[0m,\n",
+       "  \u001b[32m6.224579F\u001b[0m,\n",
+       "  \u001b[32m5.1456604F\u001b[0m,\n",
+       "  \u001b[32m5.121048F\u001b[0m,\n",
+       "  \u001b[32m6.208969F\u001b[0m,\n",
+       "  \u001b[32m7.3796997F\u001b[0m,\n",
+       "  \u001b[32m7.333359F\u001b[0m,\n",
+       "  \u001b[32m7.2532196F\u001b[0m,\n",
+       "  \u001b[32m6.3181F\u001b[0m,\n",
+       "  \u001b[32m5.5006866F\u001b[0m,\n",
+       "  \u001b[32m6.7429657F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val params = Map(\n",
+    "      \"path_imgrec\" -> \"data/cifar/train.rec\",\n",
+    "      \"mean_img\" -> \"data/cifar/cifar10_mean.bin\",\n",
+    "      \"rand_crop\" -> \"False\",\n",
+    "      \"rand_mirror\" -> \"False\",\n",
+    "      \"shuffle\" -> \"False\",\n",
+    "      \"data_shape\" -> \"(3,28,28)\",\n",
+    "      \"batch_size\" -> \"100\",\n",
+    "      \"preprocess_threads\" -> \"4\",\n",
+    "      \"prefetch_buffer\" -> \"1\"\n",
+    "    )\n",
+    "    val imgRecIter = IO.ImageRecordIter(params)\n",
+    "    val nBatch = 500\n",
+    "    var batchCount = 0\n",
+    "    // test provideData\n",
+    "    val provideData = imgRecIter.provideData\n",
+    "    val provideLabel = imgRecIter.provideLabel\n",
+    "    \n",
+    "    // Reset the iterator\n",
+    "    imgRecIter.reset()\n",
+    "    while (imgRecIter.hasNext) {\n",
+    "      imgRecIter.next()\n",
+    "      batchCount += 1\n",
+    "    }\n",
+    "\n",
+    "    imgRecIter.reset()\n",
+    "    // Get next batch of iterator\n",
+    "    imgRecIter.next()\n",
+    "    // Get label of current batch\n",
+    "    val label0 = imgRecIter.getLabel().head.toArray\n",
+    "    // Get data of current batch\n",
+    "    val data0 = imgRecIter.getData().head.toArray\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### ResizeIter\n",
+    "ResizeIter resizes a DataIter to a given number of batches per epoch. It may produce an incomplete batch in the middle of an epoch because of padding from the internal iterator.\n",
+    "\n",
+    "It takes the input arguments **dataIter** (the internal data iterator), **reSize** (the number of batches per epoch to resize to) and **resetInternal** (whether to reset the internal iterator on ResizeIter.reset), and returns the resized iterator.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.io.{NDArrayIter, ResizeIter, PrefetchingIter}\u001b[0m\n",
+       "\u001b[36mparams\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mString\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"silent\"\u001b[0m -> \u001b[32m\"0\"\u001b[0m,\n",
+       "  \u001b[32m\"seed\"\u001b[0m -> \u001b[32m\"10\"\u001b[0m,\n",
+       "  \u001b[32m\"flat\"\u001b[0m -> \u001b[32m\"1\"\u001b[0m,\n",
+       "  \u001b[32m\"image\"\u001b[0m -> \u001b[32m\"data/train-images-idx3-ubyte\"\u001b[0m,\n",
+       "  \u001b[32m\"label\"\u001b[0m -> \u001b[32m\"data/train-labels-idx1-ubyte\"\u001b[0m,\n",
+       "  \u001b[32m\"shuffle\"\u001b[0m -> \u001b[32m\"1\"\u001b[0m,\n",
+       "  \u001b[32m\"data_shape\"\u001b[0m -> \u001b[32m\"(784,)\"\u001b[0m,\n",
+       "  \u001b[32m\"batch_size\"\u001b[0m -> \u001b[32m\"100\"\u001b[0m\n",
+       ")\n",
+       "\u001b[36mmnistIter\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mnBatch\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m400\u001b[0m\n",
+       "\u001b[36mbatchCount\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m400\u001b[0m\n",
+       "\u001b[36mresizeIter\u001b[0m: \u001b[32mio\u001b[0m.\u001b[32mResizeIter\u001b[0m = empty iterator"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet.io.{NDArrayIter, ResizeIter, PrefetchingIter}\n",
+    "\n",
+    "val params = Map(\n",
+    "      \"image\" -> \"data/train-images-idx3-ubyte\",\n",
+    "      \"label\" -> \"data/train-labels-idx1-ubyte\",\n",
+    "      \"data_shape\" -> \"(784,)\",\n",
+    "      \"batch_size\" -> \"100\",\n",
+    "      \"shuffle\" -> \"1\",\n",
+    "      \"flat\" -> \"1\",\n",
+    "      \"silent\" -> \"0\",\n",
+    "      \"seed\" -> \"10\"\n",
+    "    )\n",
+    "\n",
+    "    val mnistIter = IO.MNISTIter(params)\n",
+    "    val nBatch = 400\n",
+    "    var batchCount = 0\n",
+    "\n",
+    "    // Resize a Mnist data iterator\n",
+    "    val resizeIter = new ResizeIter(mnistIter, nBatch, false)\n",
+    "\n",
+    "    while(resizeIter.hasNext) {\n",
+    "      resizeIter.next()\n",
+    "      batchCount += 1\n",
+    "    }\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### PrefetchingIter\n",
+    "\n",
+    "PrefetchingIter performs prefetching for other data iterators. It takes one or more DataIters and combines them with prefetching.\n",
+    "\n",
+    "This iterator creates another thread that calls next() and stores the data in memory. It can speed up data reading, at the cost of more memory usage."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mparams\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mString\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"silent\"\u001b[0m -> \u001b[32m\"0\"\u001b[0m,\n",
+       "  \u001b[32m\"seed\"\u001b[0m -> \u001b[32m\"10\"\u001b[0m,\n",
+       "  \u001b[32m\"flat\"\u001b[0m -> \u001b[32m\"1\"\u001b[0m,\n",
+       "  \u001b[32m\"image\"\u001b[0m -> \u001b[32m\"data/train-images-idx3-ubyte\"\u001b[0m,\n",
+       "  \u001b[32m\"label\"\u001b[0m -> \u001b[32m\"data/train-labels-idx1-ubyte\"\u001b[0m,\n",
+       "  \u001b[32m\"shuffle\"\u001b[0m -> \u001b[32m\"1\"\u001b[0m,\n",
+       "  \u001b[32m\"data_shape\"\u001b[0m -> \u001b[32m\"(784,)\"\u001b[0m,\n",
+       "  \u001b[32m\"batch_size\"\u001b[0m -> \u001b[32m\"100\"\u001b[0m\n",
+       ")\n",
+       "\u001b[36mmnistPack1\u001b[0m: \u001b[32mDataPack\u001b[0m = \u001b[33mMXDataPack\u001b[0m(\n",
+       "  ml.dmlc.mxnet.DataBatch@206a0ce5,\n",
+       "  ml.dmlc.mxnet.DataBatch@19b9b2f4,\n",
+       "  ml.dmlc.mxnet.DataBatch@60961087,\n",
+       "  ml.dmlc.mxnet.DataBatch@aa498fd,\n",
+       "  ml.dmlc.mxnet.DataBatch@7ad9b068,\n",
+       "  ml.dmlc.mxnet.DataBatch@2e2383d5,\n",
+       "  ml.dmlc.mxnet.DataBatch@7ee1acbe,\n",
+       "  ml.dmlc.mxnet.DataBatch@50acb0ef,\n",
+       "  ml.dmlc.mxnet.DataBatch@67411062,\n",
+       "  ml.dmlc.mxnet.DataBatch@55ce1a74,\n",
+       "  ml.dmlc.mxnet.DataBatch@2639c82f,\n",
+       "  ml.dmlc.mxnet.DataBatch@13272fcf,\n",
+       "  ml.dmlc.mxnet.DataBatch@7c0aefc9,\n",
+       "  ml.dmlc.mxnet.DataBatch@59325786,\n",
+       "  ml.dmlc.mxnet.DataBatch@31a2843f,\n",
+       "  ml.dmlc.mxnet.DataBatch@1bd18c93,\n",
+       "  ml.dmlc.mxnet.DataBatch@300e5c87,\n",
+       "  ml.dmlc.mxnet.DataBatch@7bcba367,\n",
+       "  ml.dmlc.mxnet.DataBatch@5e6d435d,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mmnistPack2\u001b[0m: \u001b[32mDataPack\u001b[0m = \u001b[33mMXDataPack\u001b[0m(\n",
+       "  ml.dmlc.mxnet.DataBatch@1f423ad2,\n",
+       "  ml.dmlc.mxnet.DataBatch@3bd5463b,\n",
+       "  ml.dmlc.mxnet.DataBatch@35e15e6d,\n",
+       "  ml.dmlc.mxnet.DataBatch@377e824,\n",
+       "  ml.dmlc.mxnet.DataBatch@dedd632,\n",
+       "  ml.dmlc.mxnet.DataBatch@1c18ad2a,\n",
+       "  ml.dmlc.mxnet.DataBatch@23b58af2,\n",
+       "  ml.dmlc.mxnet.DataBatch@1f3f6068,\n",
+       "  ml.dmlc.mxnet.DataBatch@7c0079fb,\n",
+       "  ml.dmlc.mxnet.DataBatch@25a8faac,\n",
+       "  ml.dmlc.mxnet.DataBatch@2a4516f1,\n",
+       "  ml.dmlc.mxnet.DataBatch@4e9d1ff1,\n",
+       "  ml.dmlc.mxnet.DataBatch@312d7878,\n",
+       "  ml.dmlc.mxnet.DataBatch@53b2996b,\n",
+       "  ml.dmlc.mxnet.DataBatch@51c2ef72,\n",
+       "  ml.dmlc.mxnet.DataBatch@7706102c,\n",
+       "  ml.dmlc.mxnet.DataBatch@2db2580c,\n",
+       "  ml.dmlc.mxnet.DataBatch@6a8cf510,\n",
+       "  ml.dmlc.mxnet.DataBatch@2c732e4c,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mnBatch\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m600\u001b[0m\n",
+       "\u001b[36mbatchCount\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m600\u001b[0m\n",
+       "\u001b[36mmnistIter1\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mmnistIter2\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mprefetchIter\u001b[0m: \u001b[32mPrefetchingIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mprovideData\u001b[0m: \u001b[32mListMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mShape\u001b[0m] = \u001b[33mMap\u001b[0m(\u001b[32m\"data1\"\u001b[0m -> (100,784), \u001b[32m\"data2\"\u001b[0m -> (100,784))\n",
+       "\u001b[36mprovideLabel\u001b[0m: \u001b[32mListMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mShape\u001b[0m] = \u001b[33mMap\u001b[0m(\u001b[32m\"label1\"\u001b[0m -> (100), \u001b[32m\"label2\"\u001b[0m -> (100))\n",
+       "\u001b[36mres11_12\u001b[0m: \u001b[32mDataBatch\u001b[0m = ml.dmlc.mxnet.DataBatch@1941f220\n",
+       "\u001b[36mlabel0\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m7.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m5.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m7.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m7.0F\u001b[0m,\n",
+       "  \u001b[32m9.0F\u001b[0m,\n",
+       "  \u001b[32m5.0F\u001b[0m,\n",
+       "  \u001b[32m6.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m7.0F\u001b[0m,\n",
+       "  \u001b[32m5.0F\u001b[0m,\n",
+       "  \u001b[32m8.0F\u001b[0m,\n",
+       "  \u001b[32m5.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mdata0\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val params = Map(\n",
+    "      \"image\" -> \"data/train-images-idx3-ubyte\",\n",
+    "      \"label\" -> \"data/train-labels-idx1-ubyte\",\n",
+    "      \"data_shape\" -> \"(784,)\",\n",
+    "      \"batch_size\" -> \"100\",\n",
+    "      \"shuffle\" -> \"1\",\n",
+    "      \"flat\" -> \"1\",\n",
+    "      \"silent\" -> \"0\",\n",
+    "      \"seed\" -> \"10\"\n",
+    "    )\n",
+    "\n",
+    "    val mnistPack1 = IO.MNISTPack(params)\n",
+    "    val mnistPack2 = IO.MNISTPack(params)\n",
+    "\n",
+    "    val nBatch = 600\n",
+    "    var batchCount = 0\n",
+    "\n",
+    "    val mnistIter1 = mnistPack1.iterator\n",
+    "    val mnistIter2 = mnistPack2.iterator\n",
+    "\n",
+    "    var prefetchIter = new PrefetchingIter(\n",
+    "        IndexedSeq(mnistIter1, mnistIter2),\n",
+    "        IndexedSeq(Map(\"data\" -> \"data1\"), Map(\"data\" -> \"data2\")),\n",
+    "        IndexedSeq(Map(\"label\" -> \"label1\"), Map(\"label\" -> \"label2\"))\n",
+    "    )\n",
+    "\n",
+    "    // Check for next batch\n",
+    "    while(prefetchIter.hasNext) {\n",
+    "      prefetchIter.next()\n",
+    "      batchCount += 1\n",
+    "    }\n",
+    "\n",
+    "    // The name and shape of data provided by this iterator\n",
+    "    val provideData = prefetchIter.provideData\n",
+    "    // The name and shape of label provided by this iterator\n",
+    "    val provideLabel = prefetchIter.provideLabel\n",
+    "\n",
+    "    prefetchIter.reset()\n",
+    "    prefetchIter.next()\n",
+    "    val label0 = prefetchIter.getLabel().head.toArray\n",
+    "    val data0 = prefetchIter.getData().head.toArray\n",
+    "\n",
+    "    prefetchIter.dispose()"
+   ]
+  },
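+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The background-thread idea described above can be sketched in plain Scala, without MXNet. This is only an illustration of the technique, not how PrefetchingIter is implemented: the class name `PrefetchSketch` and the `capacity` parameter are assumptions made for this example.\n",
+    "\n",
+    "```scala\n",
+    "import java.util.concurrent.ArrayBlockingQueue\n",
+    "\n",
+    "// Hypothetical helper, not part of MXNet: reads `underlying` on a background thread.\n",
+    "class PrefetchSketch[A](underlying: Iterator[A], capacity: Int = 2) extends Iterator[A] {\n",
+    "  // Bounded buffer of prefetched items; None marks the end of the stream\n",
+    "  private val queue = new ArrayBlockingQueue[Option[A]](capacity)\n",
+    "  private val worker = new Thread(new Runnable {\n",
+    "    def run(): Unit = {\n",
+    "      underlying.foreach(item => queue.put(Some(item)))\n",
+    "      queue.put(None)\n",
+    "    }\n",
+    "  })\n",
+    "  worker.setDaemon(true)\n",
+    "  worker.start()\n",
+    "  // Block until the first item (or the end marker) is available\n",
+    "  private var nextItem: Option[A] = queue.take()\n",
+    "\n",
+    "  def hasNext: Boolean = nextItem.isDefined\n",
+    "  def next(): A = {\n",
+    "    val item = nextItem.getOrElse(throw new NoSuchElementException)\n",
+    "    nextItem = queue.take()\n",
+    "    item\n",
+    "  }\n",
+    "}\n",
+    "\n",
+    "val prefetched = new PrefetchSketch(Iterator(1, 2, 3)).toList\n",
+    "```\n",
+    "\n",
+    "The bounded queue is what trades memory for speed: a larger `capacity` lets the reader run further ahead of the consumer, but keeps more items in memory."
+   ]
+  },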
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### NDArrayIter\n",
+    "\n",
+    "NDArrayIter iterates over NDArrays. NDArray is the basic ndarray/tensor-like data structure in MXNet.\n",
+    "It takes the following parameters:\n",
+    "- **data** (NDArrayIter supports single or multiple data and label)\n",
+    "- **label** (same as data, but not fed to the model during testing)\n",
+    "- **dataBatchSize** (batch size)\n",
+    "- **shuffle** (whether to shuffle the data)\n",
+    "- **lastBatchHandle** (\"pad\", \"discard\" or \"roll_over\"; how to handle the last batch)\n",
+    "\n",
+    "This iterator will pad, discard or roll over the last batch if the size of the data is not a multiple of the batch size. Roll over is intended for training and can cause problems if used for prediction."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mshape0\u001b[0m: \u001b[32mShape\u001b[0m = (1000,2,2)\n",
+       "\u001b[36mdata\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@bc757ea0, ml.dmlc.mxnet.NDArray@a1db81b0)\n",
+       "\u001b[36mshape1\u001b[0m: \u001b[32mShape\u001b[0m = (1000,1)\n",
+       "\u001b[36mlabel\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@fc450a2b)\n",
+       "\u001b[36mbatchData0\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@137c6494\n",
+       "\u001b[36mbatchData1\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@4ea5965d\n",
+       "\u001b[36mbatchLabel\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@85e2cd0f\n",
+       "\u001b[36mdataIter0\u001b[0m: \u001b[32mNDArrayIter\u001b[0m = empty iterator\n",
+       "\u001b[36mbatchCount\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m7\u001b[0m\n",
+       "\u001b[36mnBatch0\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m8\u001b[0m\n",
+       "\u001b[36mdataIter1\u001b[0m: \u001b[32mNDArrayIter\u001b[0m = empty iterator\n",
+       "\u001b[36mnBatch1\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m7\u001b[0m\n",
+       "\u001b[36mdataIter2\u001b[0m: \u001b[32mNDArrayIter\u001b[0m = empty iterator"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val shape0 = Shape(Array(1000, 2, 2))\n",
+    "    val data = IndexedSeq(NDArray.ones(shape0), NDArray.zeros(shape0))\n",
+    "    val shape1 = Shape(Array(1000, 1))\n",
+    "    val label = IndexedSeq(NDArray.ones(shape1))\n",
+    "    val batchData0 = NDArray.ones(Shape(Array(128, 2, 2)))\n",
+    "    val batchData1 = NDArray.zeros(Shape(Array(128, 2, 2)))\n",
+    "    val batchLabel = NDArray.ones(Shape(Array(128, 1)))\n",
+    "\n",
+    "    // lastBatchHandle = pad\n",
+    "    val dataIter0 = new NDArrayIter(data, label, 128, false, \"pad\")\n",
+    "    var batchCount = 0\n",
+    "    val nBatch0 = 8\n",
+    "    while(dataIter0.hasNext) {\n",
+    "      val tBatch = dataIter0.next()\n",
+    "      batchCount += 1\n",
+    "     }\n",
+    "\n",
+    "    // lastBatchHandle = discard\n",
+    "    val dataIter1 = new NDArrayIter(data, label, 128, false, \"discard\")\n",
+    "    val nBatch1 = 7\n",
+    "    batchCount = 0\n",
+    "    while(dataIter1.hasNext) {\n",
+    "      val tBatch = dataIter1.next()\n",
+    "      batchCount += 1\n",
+    "    }\n",
+    "\n",
+    "    // empty label (for prediction)\n",
+    "    val dataIter2 = new NDArrayIter(data = data, dataBatchSize = 128, lastBatchHandle = \"discard\")\n",
+    "    batchCount = 0\n",
+    "    while(dataIter2.hasNext) {\n",
+    "      val tBatch = dataIter2.next()\n",
+    "      batchCount += 1\n",
+    "    }\n"
+   ]
+  },
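+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The expected batch counts in the example above (`nBatch0 = 8` for \"pad\", `nBatch1 = 7` for \"discard\") follow from simple integer arithmetic on 1000 samples and a batch size of 128. A minimal sketch, assuming a hypothetical helper `numBatches` that is not part of MXNet and that leaves out \"roll_over\":\n",
+    "\n",
+    "```scala\n",
+    "// Hypothetical helper, not part of MXNet: number of batches in one epoch\n",
+    "def numBatches(numSamples: Int, batchSize: Int, lastBatchHandle: String): Int =\n",
+    "  lastBatchHandle match {\n",
+    "    // round up: the last, incomplete batch is padded to full size\n",
+    "    case \"pad\" => (numSamples + batchSize - 1) / batchSize\n",
+    "    // round down: the last, incomplete batch is dropped\n",
+    "    case \"discard\" => numSamples / batchSize\n",
+    "    case other => throw new IllegalArgumentException(s\"unsupported: $other\")\n",
+    "  }\n",
+    "\n",
+    "val padBatches = numBatches(1000, 128, \"pad\")         // 8\n",
+    "val discardBatches = numBatches(1000, 128, \"discard\") // 7\n",
+    "```"
+   ]
+  },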
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Implementation\n",
+    "Iterators can be implemented in either C++ or front-end languages such as Python. The C++ definition is at [include/mxnet/io.h](https://github.com/dmlc/mxnet/blob/master/include/mxnet/io.h), and all C++ implementations are located in [src/io](https://github.com/dmlc/mxnet/tree/master/src/io). These implementations rely heavily on [dmlc-core](https://github.com/dmlc/dmlc-core), which supports reading data from various data formats and filesystems.\n",
+    "\n",
+    "## Further Readings\n",
+    "- [Data loading API](http://mxnet.io/api/scala/io.html)\n",
+    "- [Design of efficient data format](http://mxnet.io/architecture/note_data_loading.html)"
+   ]
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/basic/image_io_scala.ipynb b/scala/basic/image_io_scala.ipynb
new file mode 100644
index 000000000..cce36e581
--- /dev/null
+++ b/scala/basic/image_io_scala.ipynb
@@ -0,0 +1,449 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Image Data IO\n",
+    "This tutorial explains how to prepare, load and train with image data in MXNet. All IO in MXNet is handled via IO.DataIter and its subclasses, which are explained [here](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/data.ipynb). In this tutorial we focus on how to use pre-built data iterators as well as custom iterators to process image data.\n",
+    "\n",
+    "There are two main ways of loading image data in MXNet:\n",
+    "- IO.ImageRecordIter: implemented in the backend (C++), less customizable but usable in all language bindings, loads from .rec files\n",
+    "- Custom iterator by inheriting IO.DataIter\n",
+    "\n",
+    "First, we explain the record io file format used by mxnet:\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## RecordIO\n",
+    "Record IO is the main file format used by MXNet for data IO. It supports reading and writing on various file systems, including distributed file systems like Hadoop HDFS and AWS S3. First, we download the Caltech 101 dataset, which contains 101 classes of objects, and convert it into record io format.\n",
+    "\n",
+    "Download and unzip the Image Dataset as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mMXNET_HOME\u001b[0m: \u001b[32mString\u001b[0m = \u001b[32m\"/home/ec2-user/src/mxnet\"\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// change this to your mxnet location\n",
+    "val MXNET_HOME = \"/home/ec2-user/src/mxnet\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36msys.process._\u001b[0m\n",
+       "\u001b[36mres2_1\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import sys.process._\n",
+    "\"wget http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz -P data/ -q\"!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres3\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "\"tar -xzf data/101_ObjectCategories.tar.gz -C data/\"!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's take a look at the data. As you can see, under the root folder every category has a subfolder.\n",
+    "\n",
+    "Now let's convert them into record io format. First we need to make a list that contains all the image files and their categories:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "BACKGROUND_Google 0\n",
+      "Faces 1\n",
+      "Faces_easy 2\n",
+      "Leopards 3\n",
+      "Motorbikes 4\n",
+      "accordion 5\n",
+      "airplanes 6\n",
+      "anchor 7\n",
+      "ant 8\n",
+      "barrel 9\n",
+      "bass 10\n",
+      "beaver 11\n",
+      "binocular 12\n",
+      "bonsai 13\n",
+      "brain 14\n",
+      "brontosaurus 15\n",
+      "buddha 16\n",
+      "butterfly 17\n",
+      "camera 18\n",
+      "cannon 19\n",
+      "car_side 20\n",
+      "ceiling_fan 21\n",
+      "cellphone 22\n",
+      "chair 23\n",
+      "chandelier 24\n",
+      "cougar_body 25\n",
+      "cougar_face 26\n",
+      "crab 27\n",
+      "crayfish 28\n",
+      "crocodile 29\n",
+      "crocodile_head 30\n",
+      "cup 31\n",
+      "dalmatian 32\n",
+      "dollar_bill 33\n",
+      "dolphin 34\n",
+      "dragonfly 35\n",
+      "electric_guitar 36\n",
+      "elephant 37\n",
+      "emu 38\n",
+      "euphonium 39\n",
+      "ewer 40\n",
+      "ferry 41\n",
+      "flamingo 42\n",
+      "flamingo_head 43\n",
+      "garfield 44\n",
+      "gerenuk 45\n",
+      "gramophone 46\n",
+      "grand_piano 47\n",
+      "hawksbill 48\n",
+      "headphone 49\n",
+      "hedgehog 50\n",
+      "helicopter 51\n",
+      "ibis 52\n",
+      "inline_skate 53\n",
+      "joshua_tree 54\n",
+      "kangaroo 55\n",
+      "ketch 56\n",
+      "lamp 57\n",
+      "laptop 58\n",
+      "llama 59\n",
+      "lobster 60\n",
+      "lotus 61\n",
+      "mandolin 62\n",
+      "mayfly 63\n",
+      "menorah 64\n",
+      "metronome 65\n",
+      "minaret 66\n",
+      "nautilus 67\n",
+      "octopus 68\n",
+      "okapi 69\n",
+      "pagoda 70\n",
+      "panda 71\n",
+      "pigeon 72\n",
+      "pizza 73\n",
+      "platypus 74\n",
+      "pyramid 75\n",
+      "revolver 76\n",
+      "rhino 77\n",
+      "rooster 78\n",
+      "saxophone 79\n",
+      "schooner 80\n",
+      "scissors 81\n",
+      "scorpion 82\n",
+      "sea_horse 83\n",
+      "snoopy 84\n",
+      "soccer_ball 85\n",
+      "stapler 86\n",
+      "starfish 87\n",
+      "stegosaurus 88\n",
+      "stop_sign 89\n",
+      "strawberry 90\n",
+      "sunflower 91\n",
+      "tick 92\n",
+      "trilobite 93\n",
+      "umbrella 94\n",
+      "watch 95\n",
+      "water_lilly 96\n",
+      "wheelchair 97\n",
+      "wild_cat 98\n",
+      "windsor_chair 99\n",
+      "wrench 100\n",
+      "yin_yang 101\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres4\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "\"python \"+MXNET_HOME+\"/tools/im2rec.py --list=1 --recursive=1 --shuffle=1 --test-ratio=0.2 data/caltech data/101_ObjectCategories\"!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The resulting list file is in the format index\\t(one or more labels)\\tpath. In this case there is only one label for each image, but you can modify the list to add more for multi-label training.\n",
+    "Then we can use this list to create our record io file:"
+   ]
+  },
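+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To make the list format concrete, here is a minimal sketch that parses one line of such a file in plain Scala. The `ListEntry` case class, the `parseListLine` helper and the sample line are hypothetical and only illustrate the index, labels, path layout described above; they are not part of MXNet.\n",
+    "\n",
+    "```scala\n",
+    "// Hypothetical types for illustration; not part of MXNet\n",
+    "case class ListEntry(index: Int, labels: Seq[Float], path: String)\n",
+    "\n",
+    "def parseListLine(line: String): ListEntry = {\n",
+    "  // Fields are tab-separated: index, one or more labels, then the image path\n",
+    "  val fields = line.split('\\t').map(_.trim)\n",
+    "  require(fields.length >= 3, \"expected index, at least one label, and a path\")\n",
+    "  val labels = fields.slice(1, fields.length - 1).map(_.toFloat).toSeq\n",
+    "  ListEntry(fields.head.toInt, labels, fields.last)\n",
+    "}\n",
+    "\n",
+    "// Made-up sample line with a single label\n",
+    "val entry = parseListLine(\"42\\t5.0\\taccordion/image_0001.jpg\")\n",
+    "```\n",
+    "\n",
+    "Everything between the first and the last field is treated as a label, so the same helper also handles lines that were edited for multi-label training."
+   ]
+  },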
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Creating .rec file from /home/ec2-user/mxnet-notebooks/scala/basic/data/caltech.lst in /home/ec2-user/mxnet-notebooks/scala/basic/data\n",
+      "time: 0.00183486938477  count: 0\n",
+      "time: 0.0827140808105  count: 1000\n",
+      "time: 0.0803511142731  count: 2000\n",
+      "time: 0.0845577716827  count: 3000\n",
+      "time: 0.0807552337646  count: 4000\n",
+      "time: 0.0814788341522  count: 5000\n",
+      "time: 0.0812129974365  count: 6000\n",
+      "time: 0.0810561180115  count: 7000\n",
+      "time: 0.0805099010468  count: 8000\n",
+      "time: 0.0889499187469  count: 9000\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres5\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "\"python \"+MXNET_HOME+\"/tools/im2rec.py --num-thread=4 --pass-through=1 data/caltech data/101_ObjectCategories\"!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The RecordIO files are now generated in the data folder."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## ImageRecordIter\n",
+    "IO.ImageRecordIter can be used for loading image data saved in record io format. It is available in all frontend languages, but as it's implemented in C++, it is less flexible.\n",
+    "\n",
+    "To use ImageRecordIter, simply create an instance by loading your record file:\n",
+    "\n",
+    "**Parameters**\n",
+    "- **path_imglist** (string, optional, default='') – Dataset Param: Path to image list.\n",
+    "- **path_imgrec** (string, optional, default='./data/imgrec.rec') – Dataset Param: Path to image record file.\n",
+    "- **aug_seq** (string, optional, default='aug_default') – Augmentation Param: the augmenter names representing the sequence of augmenters to be applied, separated by commas. Additional keyword parameters will be seen by these augmenters.\n",
+    "- **label_width** (int, optional, default='1') – Dataset Param: How many labels for an image.\n",
+    "- **data_shape** (Shape(tuple), required) – Dataset Param: Shape of each instance generated by the DataIter.\n",
+    "- **preprocess_threads** (int, optional, default='4') – Backend Param: Number of threads to do preprocessing.\n",
+    "- **verbose** (boolean, optional, default=True) – Auxiliary Param: Whether to output parser information.\n",
+    "- **num_parts** (int, optional, default='1') – partition the data into multiple parts\n",
+    "- **part_index** (int, optional, default='0') – the index of the part to read\n",
+    "- **shuffle_chunk_size** (long (non-negative), optional, default=0) – the size (MB) of the shuffle chunk; used with shuffle=True, it can enable global shuffling\n",
+    "- **shuffle_chunk_seed** (int, optional, default='0') – the seed for chunk shuffling\n",
+    "- **shuffle** (boolean, optional, default=False) – Augmentation Param: Whether to shuffle data.\n",
+    "- **seed** (int, optional, default='0') – Augmentation Param: Random Seed.\n",
+    "- **batch_size** (int (non-negative), required) – Batch Param: Batch size.\n",
+    "- **round_batch** (boolean, optional, default=True) – Batch Param: Use round robin to handle overflow batch.\n",
+    "- **prefetch_buffer** (long (non-negative), optional, default=4) – Backend Param: Number of prefetched parameters\n",
+    "- **dtype** ({None, 'float16', 'float32', 'float64', 'int32', 'uint8'}, optional, default='None') – Output data type. Leave as None to use the internal data iterator’s output type\n",
+    "- **resize** (int, optional, default='-1') – Augmentation Param: scale shorter edge to size before applying other augmentations.\n",
+    "- **rand_crop** (boolean, optional, default=False) – Augmentation Param: Whether to randomly crop the image\n",
+    "- **crop_y_start** (int, optional, default='-1') – Augmentation Param: Where to start the non-random crop on y.\n",
+    "- **crop_x_start** (int, optional, default='-1') – Augmentation Param: Where to start the non-random crop on x.\n",
+    "- **max_rotate_angle** (int, optional, default='0') – Augmentation Param: rotated randomly in [-max_rotate_angle, max_rotate_angle].\n",
+    "- **max_aspect_ratio** (float, optional, default=0) – Augmentation Param: denotes the max ratio of random aspect ratio augmentation.\n",
+    "- **max_shear_ratio** (float, optional, default=0) – Augmentation Param: denotes the max random shearing ratio.\n",
+    "- **max_crop_size** (int, optional, default='-1') – Augmentation Param: Maximum crop size.\n",
+    "- **min_crop_size** (int, optional, default='-1') – Augmentation Param: Minimum crop size.\n",
+    "- **max_random_scale** (float, optional, default=1) – Augmentation Param: Maximum scale ratio.\n",
+    "- **min_random_scale** (float, optional, default=1) – Augmentation Param: Minimum scale ratio.\n",
+    "- **max_img_size** (float, optional, default=1e+10) – Augmentation Param: Maximum image size after resizing.\n",
+    "- **min_img_size** (float, optional, default=0) – Augmentation Param: Minimum image size after resizing.\n",
+    "- **random_h** (int, optional, default='0') – Augmentation Param: Maximum random value of H channel in HSL color space.\n",
+    "- **random_s** (int, optional, default='0') – Augmentation Param: Maximum random value of S channel in HSL color space.\n",
+    "- **random_l** (int, optional, default='0') – Augmentation Param: Maximum random value of L channel in HSL color space.\n",
+    "- **rotate** (int, optional, default='-1') – Augmentation Param: Rotate angle.\n",
+    "- **fill_value** (int, optional, default='255') – Augmentation Param: Filled color value while padding.\n",
+    "- **inter_method** (int, optional, default='1') – Augmentation Param: 0-NN 1-bilinear 2-cubic 3-area 4-lanczos4 9-auto 10-rand.\n",
+    "- **pad** (int, optional, default='0') – Augmentation Param: Padding size.\n",
+    "- **mirror** (boolean, optional, default=False) – Augmentation Param: Whether to mirror the image.\n",
+    "- **rand_mirror** (boolean, optional, default=False) – Augmentation Param: Whether to mirror the image randomly.\n",
+    "- **mean_img** (string, optional, default='') – Augmentation Param: Mean Image to be subtracted.\n",
+    "- **mean_r** (float, optional, default=0) – Augmentation Param: Mean value on R channel.\n",
+    "- **mean_g** (float, optional, default=0) – Augmentation Param: Mean value on G channel.\n",
+    "- **mean_b** (float, optional, default=0) – Augmentation Param: Mean value on B channel.\n",
+    "- **mean_a** (float, optional, default=0) – Augmentation Param: Mean value on Alpha channel.\n",
+    "- **scale** (float, optional, default=1) – Augmentation Param: Scale in color space.\n",
+    "- **max_random_contrast** (float, optional, default=0) – Augmentation Param: Maximum ratio of contrast variation.\n",
+    "- **max_random_illumination** (float, optional, default=0) – Augmentation Param: Maximum value of illumination variation.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[36mdataIter\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mDataIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mbatch\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mDataBatch\u001b[0m = ml.dmlc.mxnet.DataBatch@1dee5185\n",
+       "\u001b[36mdata\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@496baa2a"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "\n",
+    "val dataIter = IO.ImageRecordIter(Map(\n",
+    "    \"path_imgrec\" -> \"data/caltech.rec\", // the target record file\n",
+    "    \"data_shape\" -> \"(3, 227, 227)\", // output data shape. A 227x227 region will be cropped from the original image.\n",
+    "    \"batch_size\" -> \"4\", // number of samples per batch\n",
+    "    \"resize\" -> \"256\" // resize the shorter edge to 256 before cropping\n",
+    "    // ... you can add more augmentation options here. See the parameter list above for all possible choices\n",
+    "    ))\n",
+    "\n",
+    "dataIter.reset()\n",
+    "val batch = dataIter.next()\n",
+    "val data = batch.data(0)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next Step\n",
+    "- [Record IO](https://github.com/dmlc/mxnet-notebooks/tree/master/scala/basic/record_io_scala.ipynb) Read and write RecordIO files with the Scala interface\n",
+    "- [Advanced Image IO](https://github.com/dmlc/mxnet-notebooks/tree/master/scala/basic/advanced_img_io_scala.ipynb) Advanced image IO for detection, segmentation, etc."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/basic/module_scala.ipynb b/scala/basic/module_scala.ipynb
new file mode 100644
index 000000000..4dce6d345
--- /dev/null
+++ b/scala/basic/module_scala.ipynb
@@ -0,0 +1,785 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Training and Inference Module\n",
+    "We modularized commonly used code for training and inference in the module (or mod for short) package. This package provides intermediate-level and high-level interfaces for executing predefined networks."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g.\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic Usage\n",
+    "### Preliminary\n",
+    "In this tutorial, we will use a simple multilayer perceptron for 10 classes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.module.{FitParams, Module}\u001b[0m\n",
+       "\u001b[36mdata\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@153695b4\n",
+       "\u001b[36mfc1\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@782359b1\n",
+       "\u001b[36mact1\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@69728e46\n",
+       "\u001b[36mfc2\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@6d9120e2\n",
+       "\u001b[36msoftmax\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@659455de"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import ml.dmlc.mxnet.module.{FitParams, Module}\n",
+    "\n",
+    "val data = Symbol.Variable(\"data\")\n",
+    "val fc1 = Symbol.FullyConnected(name = \"fc1\")()(Map(\"data\" -> data, \"num_hidden\" -> 64))\n",
+    "val act1 = Symbol.Activation(name = \"relu1\")()(Map(\"data\" -> fc1, \"act_type\" -> \"relu\"))\n",
+    "val fc2 = Symbol.FullyConnected(name = \"fc2\")()(Map(\"data\" -> act1, \"num_hidden\" -> 10))\n",
+    "val softmax = Symbol.SoftmaxOutput(name = \"softmax\")()(Map(\"data\" -> fc2))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "### Create Module\n",
+    "The most widely used module class is Module, which wraps a Symbol and one or more Executors.\n",
+    "\n",
+    "We construct a module by specifying:\n",
+    "\n",
+    "- symbol : the network Symbol\n",
+    "- contexts : the device (or a list of devices) for execution\n",
+    "- dataNames : the list of data variable names\n",
+    "- labelNames : the list of label variable names\n",
+    "\n",
+    "One can refer to data.ipynb for more explanations about the last two arguments. Here we have only one data input, named data, and one label, named softmax_label, which is derived automatically from the name softmax that we specified for the SoftmaxOutput operator."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.optimizer.SGD\u001b[0m\n",
+       "\u001b[36mmod\u001b[0m: \u001b[32mModule\u001b[0m = ml.dmlc.mxnet.module.Module@3110b89"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet.optimizer.SGD\n",
+    "\n",
+    "val mod = new Module(softmax, contexts=Context.cpu(), dataNames=Array(\"data\"), labelNames=Array(\"softmax_label\"))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create the data iterators, using the MNIST data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mbatchSize\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m2\u001b[0m\n",
+       "\u001b[36mtrainIter\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mevalIter\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val batchSize=2\n",
+    "\n",
+    "val trainIter = IO.MNISTIter(Map(\n",
+    "        \"image\" -> (\"data/train-images-idx3-ubyte\"),\n",
+    "        \"label\" -> (\"data/train-labels-idx1-ubyte\"),\n",
+    "        \"label_name\" -> \"softmax_label\",\n",
+    "        \"input_shape\" -> \"(784,)\",\n",
+    "        \"batch_size\" -> batchSize.toString,\n",
+    "        \"shuffle\" -> \"True\",\n",
+    "        \"flat\" -> \"True\", \"silent\" -> \"False\", \"seed\" -> \"10\"))\n",
+    "val evalIter = IO.MNISTIter(Map(\n",
+    "        \"image\" -> (\"data/t10k-images-idx3-ubyte\"),\n",
+    "        \"label\" -> (\"data/t10k-labels-idx1-ubyte\"),\n",
+    "        \"label_name\" -> \"softmax_label\",\n",
+    "        \"input_shape\" -> \"(784,)\",\n",
+    "        \"batch_size\" -> batchSize.toString,\n",
+    "        \"flat\" -> \"True\", \"silent\" -> \"False\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Train, Predict, and Evaluate\n",
+    "Modules provide high-level APIs for training, predicting and evaluating. To fit a module, simply call the fit function with some DataIters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "mod.fit(trainIter, \n",
+    "        evalData=scala.Option(evalIter),\n",
+    "        fitParams = new FitParams().setOptimizer(new SGD(learningRate = 0.1f, momentum = 0.9f, wd = 0.0001f)),\n",
+    "        numEpoch=5)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To predict with a module, simply call predict() with a DataIter. It will collect and return all the prediction results.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36my\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\n",
+       "  ml.dmlc.mxnet.NDArray@e4216872,\n",
+       "  ml.dmlc.mxnet.NDArray@b6c053f0,\n",
+       "  ml.dmlc.mxnet.NDArray@f3e68a11,\n",
+       "  ml.dmlc.mxnet.NDArray@b04c5734,\n",
+       "  ml.dmlc.mxnet.NDArray@e4635691,\n",
+       "  ml.dmlc.mxnet.NDArray@38f232b,\n",
+       "  ml.dmlc.mxnet.NDArray@f77fe955,\n",
+       "  ml.dmlc.mxnet.NDArray@eef4e2b3,\n",
+       "  ml.dmlc.mxnet.NDArray@e42c116c,\n",
+       "  ml.dmlc.mxnet.NDArray@acf7250f,\n",
+       "  ml.dmlc.mxnet.NDArray@e538781,\n",
+       "  ml.dmlc.mxnet.NDArray@199fdaef,\n",
+       "  ml.dmlc.mxnet.NDArray@cb3d9293,\n",
+       "  ml.dmlc.mxnet.NDArray@eaf6f77c,\n",
+       "  ml.dmlc.mxnet.NDArray@ca9ff00,\n",
+       "  ml.dmlc.mxnet.NDArray@c406c5ef,\n",
+       "  ml.dmlc.mxnet.NDArray@b670b4d8,\n",
+       "  ml.dmlc.mxnet.NDArray@d9194c50,\n",
+       "  ml.dmlc.mxnet.NDArray@2dd299,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mres7_1\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m5000\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val y = mod.predict(evalIter)\n",
+    "y.size"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When the prediction results might be too large to fit in memory, `predictEveryBatch` is another convenient API for prediction:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36morg.slf4j.LoggerFactory\u001b[0m\n",
+       "\u001b[36mpreds\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m]] = \u001b[33mArrayBuffer\u001b[0m(\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@bd4d9298),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@ca6e5090),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@3168f3d),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@ca102191),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@cdc7d946),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@c1c349f4),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@c8a05432),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@de269f8b),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@374bbd4),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@ae8dfe32),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@c0b6a5dd),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@e5b6dacb),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@1f207ca7),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@f46b7260),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@cb95d452),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@a95649d4),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@ede813e7),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@b1c8980f),\n",
+       "  \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@fcfea88b),\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import org.slf4j.LoggerFactory\n",
+    "\n",
+    "private val logger = LoggerFactory.getLogger(\"mnist\")   \n",
+    "val preds = mod.predictEveryBatch(evalIter)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "batch 0 accuracy 0.1135\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36maccSum\u001b[0m: \u001b[32mFloat\u001b[0m = \u001b[32m1.0F\u001b[0m\n",
+       "\u001b[36maccCnt\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m2\u001b[0m\n",
+       "\u001b[36mi\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m1\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// perform prediction and calculate accuracy manually\n",
+    "    evalIter.reset()\n",
+    "    var accSum = 0.0f\n",
+    "    var accCnt = 0\n",
+    "    var i = 0\n",
+    "    while (evalIter.hasNext) {\n",
+    "      // preds(i)(0) is the network output for batch i\n",
+    "\n",
+    "      val batch = evalIter.next()\n",
+    "      val predLabel: Array[Int] = NDArray.argmax_channel(preds(i)(0)).toArray.map(_.toInt)\n",
+    "      val label = batch.label(0).toArray.map(_.toInt)\n",
+    "      accSum += (predLabel zip label).map { case (py, y) =>\n",
+    "        if (py == y) 1 else 0\n",
+    "      }.sum\n",
+    "      accCnt += predLabel.length\n",
+    "      val (name, value) = mod.score(evalIter, new Accuracy).get // note: score() runs over the whole evalIter, so the loop ends after this pass\n",
+    "      println(\"batch \" + i + \" accuracy \" + value)\n",
+    "      i += 1\n",
+    "    }"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36macc\u001b[0m: \u001b[32mEvalMetric\u001b[0m = ml.dmlc.mxnet.Accuracy@7055caf9\n",
+       "\u001b[36mn\u001b[0m: \u001b[32mString\u001b[0m = \u001b[32m\"accuracy\"\u001b[0m\n",
+       "\u001b[36mv\u001b[0m: \u001b[32mFloat\u001b[0m = \u001b[32m0.1135F\u001b[0m\n",
+       "\u001b[36mres13_2\u001b[0m: (\u001b[32mString\u001b[0m, \u001b[32mFloat\u001b[0m) = \u001b[33m\u001b[0m(\u001b[32m\"accuracy\"\u001b[0m, \u001b[32m0.1135F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val acc = mod.score(evalIter, new Accuracy)\n",
+    "val (n,v) = acc.get\n",
+    "(n,v)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Save and Load\n",
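+    "We can save the module parameters after each training epoch, either by calling the `setEpochEndCallback` method of the `FitParams` object or, as in the cell below, by calling `saveCheckpoint` at the end of each epoch of a manual training loop."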
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mmodelPrefix\u001b[0m: \u001b[32mString\u001b[0m = \u001b[32m\"mx mlp\"\u001b[0m\n",
+       "\u001b[36mmetric\u001b[0m: \u001b[32mAccuracy\u001b[0m = ml.dmlc.mxnet.Accuracy@1f90e6f6"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// save a checkpoint at the end of every epoch of a manual training loop\n",
+    "val modelPrefix: String = \"mx mlp\"\n",
+    "// reuse the module `mod` trained above\n",
+    "val metric = new Accuracy()\n",
+    "\n",
+    "// train for 5 epochs\n",
+    "for (epoch <- 0 until 5) {\n",
+    "    // one pass over the training data\n",
+    "    while (trainIter.hasNext) {\n",
+    "        val batch = trainIter.next()\n",
+    "        mod.forward(batch)\n",
+    "        mod.updateMetric(metric, batch.label)\n",
+    "        mod.backward()\n",
+    "        mod.update()\n",
+    "    }\n",
+    "    // saveOptStates = true also saves the optimizer states\n",
+    "    val checkpoint = mod.saveCheckpoint(modelPrefix, epoch, saveOptStates = true)\n",
+    "\n",
+    "    val (name, value) = metric.get\n",
+    "    metric.reset()\n",
+    "    trainIter.reset()\n",
+    "}\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "To load the saved module parameters, call the `loadCheckpoint` function. You can specify the CPUs/GPUs you want to use via Context, and also a workLoadList, which helps distribute the workload across different CPUs/GPUs.\n",
+    "\n",
+    "The `loadCheckpoint` function creates a module from a previously saved checkpoint."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mloadModelEpoch\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m2\u001b[0m\n",
+       "\u001b[36mmod\u001b[0m: \u001b[32mModule\u001b[0m = ml.dmlc.mxnet.module.Module@7aaade12"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// Epoch to load\n",
+    "val loadModelEpoch = 2\n",
+    "// use loadOptimizerStates = true only if the checkpoint was saved with saveOptStates = true\n",
+    "val mod = Module.loadCheckpoint(modelPrefix, loadModelEpoch, loadOptimizerStates = true)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To initialize parameters, first bind the symbols to construct executors with the `bind` method. Then initialize the parameters and auxiliary states by calling the `initParams()` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "mod.bind(dataShapes = trainIter.provideData, labelShapes = Some(trainIter.provideLabel))\n",
+    "mod.initParams()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Get the current parameters using the `getParams` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36margParams\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"fc1_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@96840c2c,\n",
+       "  \u001b[32m\"fc2_bias\"\u001b[0m -> ml.dmlc.mxnet.NDArray@367f5b1c,\n",
+       "  \u001b[32m\"fc2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@34041eee,\n",
+       "  \u001b[32m\"fc1_bias\"\u001b[0m -> ml.dmlc.mxnet.NDArray@ea1ecde3\n",
+       ")\n",
+       "\u001b[36mauxParams\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m()"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val (argParams, auxParams) = mod.getParams"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now, assign parameter and auxiliary state values using the `setParams` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "mod.setParams(argParams, auxParams)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If we just want to resume training from a saved checkpoint, instead of calling setParams(), we can directly call fit(), passing the loaded parameters, so that fit() starts from those parameters instead of initializing randomly. We also set beginEpoch so that fit() knows we are resuming from a previously saved epoch."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mbeginEpoch\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m4\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val beginEpoch = 4\n",
+    "mod.fit(trainIter, \n",
+    "        evalData=scala.Option(evalIter),\n",
+    "        fitParams=new FitParams().setArgParams(argParams).\n",
+    "        setAuxParams(auxParams).setBeginEpoch(beginEpoch).\n",
+    "        setOptimizer(new SGD(learningRate = 0.1f, momentum = 0.9f, wd = 0.0001f)),\n",
+    "        numEpoch=5)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Module as a computation \"machine\"\n",
+    "We have already seen how to use a module for basic training and inference. Now we are going to show a more flexible usage of modules.\n",
+    "\n",
+    "A module represents a computation component. The design purpose of a module is to abstract a computation “machine” that accepts Symbol programs and data, on which we can then run forward, backward, update parameters, etc.\n",
+    "\n",
+    "We aim to make the APIs easy and flexible to use, especially in the case when we need to use imperative API to work with multiple modules (e.g. stochastic depth network).\n",
+    "\n",
+    "A module has several states:\n",
+    "\n",
+    "- **Initial state**. Memory is not allocated yet, not ready for computation yet.\n",
+    "- **Binded**. Shapes for inputs, outputs, and parameters are all known, memory allocated, ready for computation.\n",
+    "- **Parameter initialized**. For modules with parameters, doing computation before initializing the parameters might result in undefined outputs.\n",
+    "- **Optimizer installed**. An optimizer can be installed to a module. After this, the parameters of the module can be updated according to the optimizer after gradients are computed (forward-backward).\n",
+    "\n",
+    "The following code implements a simplified fit(). Here we use other components including initializer, optimizer, and metric, which are explained in other notebooks."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mmod\u001b[0m: \u001b[32mModule\u001b[0m = ml.dmlc.mxnet.module.Module@27cbfc5\n",
+       "\u001b[36mmetric\u001b[0m: \u001b[32mAccuracy\u001b[0m = ml.dmlc.mxnet.Accuracy@273f6196\n",
+       "\u001b[36mname\u001b[0m: \u001b[32mString\u001b[0m = \u001b[32m\"accuracy\"\u001b[0m\n",
+       "\u001b[36mvalue\u001b[0m: \u001b[32mFloat\u001b[0m = \u001b[32m0.10195F\u001b[0m\n",
+       "\u001b[36mres26_7\u001b[0m: (\u001b[32mString\u001b[0m, \u001b[32mFloat\u001b[0m) = \u001b[33m\u001b[0m(\u001b[32m\"accuracy\"\u001b[0m, \u001b[32m0.10195F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// initial state\n",
+    "val mod = new Module(softmax)\n",
+    "\n",
+    "// bind, tell the module the data and label shapes, so\n",
+    "// that memory could be allocated on the devices for computation\n",
+    "mod.bind(dataShapes=trainIter.provideData, labelShapes=Some(trainIter.provideLabel))\n",
+    "\n",
+    "// init parameters\n",
+    "mod.initParams(initializer=new Xavier(magnitude = 2f))\n",
+    "\n",
+    "// init optimizer\n",
+    "mod.initOptimizer(\"local\", new SGD(learningRate = 0.1f, momentum = 0.9f, wd = 0.0001f))\n",
+    "\n",
+    "// use accuracy as the metric\n",
+    "val metric = new Accuracy\n",
+    "\n",
+    "// train one epoch, i.e. one pass over the data iterator\n",
+    "while (trainIter.hasNext) {\n",
+    "    val batch = trainIter.next()\n",
+    "    mod.forward(batch)                     // compute predictions\n",
+    "    mod.updateMetric(metric, batch.label)  // accumulate prediction accuracy\n",
+    "    mod.backward()                         // compute gradients\n",
+    "    mod.update()                           // update parameters using SGD\n",
+    "}\n",
+    "\n",
+    "// training accuracy\n",
+    "val (name, value) = metric.get\n",
+    "(name, value)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Besides the operations, a module provides a lot of useful information.\n",
+    "\n",
+    "basic names:\n",
+    "- **dataNames**: list of strings indicating the names of the required data.\n",
+    "- **outputNames**: list of strings indicating the names of the outputs.\n",
+    "\n",
+    "state information\n",
+    "- **binded**: bool, indicating whether the memory buffers needed for computation have been allocated.\n",
+    "- **forTraining**: whether the module is bound for training (if bound).\n",
+    "- **paramsInitialized**: bool, indicating whether the parameters of this module have been initialized.\n",
+    "- **optimizerInitialized**: bool, indicating whether an optimizer is defined and initialized.\n",
+    "- **inputsNeedGrad**: bool, indicating whether gradients with respect to the input data are needed. Might be useful when implementing composition of modules.\n",
+    "\n",
+    "input/output information\n",
+    "- **dataShapes**: a list of (name, shape). In theory, since the memory is allocated, we could directly provide the data arrays. But in the case of data parallelization, the data arrays might not be of the same shape as viewed from the external world.\n",
+    "- **labelShapes**: a list of (name, shape). This might be [] if the module does not need labels (e.g. it does not contain a loss function at the top), or if the module is not bound for training.\n",
+    "- **outputShapes**: a list of (name, shape) for outputs of the module.\n",
+    "\n",
+    "parameters (for modules with parameters)\n",
+    "- **getParams()**: returns a tuple (argParams, auxParams). Each of these is a map from name to NDArray. These NDArrays always live on the CPU. The actual parameters used for computation might live on other devices (GPUs); this function retrieves (a copy of) the latest parameters.\n",
+    "\n",
+    "setup\n",
+    "- **bind()**: prepare environment for computation.\n",
+    "- **initOptimizer()**: install optimizer for parameter updating.\n",
+    "\n",
+    "computation\n",
+    "- **forward(dataBatch)**: forward operation.\n",
+    "- **backward(outGrads=None)**: backward operation.\n",
+    "- **update()**: update parameters according to installed optimizer.\n",
+    "- **getOutputs()**: get outputs of the previous forward operation.\n",
+    "- **getInputGrads()**: get the gradients with respect to the inputs computed in the previous backward operation.\n",
+    "- **updateMetric(metric, labels)**: update performance metric for the previous forward computed results.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres27_0\u001b[0m: (\u001b[32mIndexedSeq\u001b[0m[\u001b[32mDataDesc\u001b[0m], \u001b[32mIndexedSeq\u001b[0m[\u001b[32mDataDesc\u001b[0m], \u001b[32mIndexedSeq\u001b[0m[(\u001b[32mString\u001b[0m, \u001b[32mShape\u001b[0m)]) = \u001b[33m\u001b[0m(\n",
+       "  \u001b[33mVector\u001b[0m(DataDesc[data,(2,784),Float32,NCHW]),\n",
+       "  \u001b[33mVector\u001b[0m(DataDesc[softmax_label,(2),Float32,NCHW]),\n",
+       "  \u001b[33mArrayBuffer\u001b[0m(\u001b[33m\u001b[0m(\u001b[32m\"softmax_output\"\u001b[0m, (2,10)))\n",
+       ")\n",
+       "\u001b[36mres27_1\u001b[0m: (\u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m], \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m]) = \u001b[33m\u001b[0m(\n",
+       "  \u001b[33mMap\u001b[0m(\n",
+       "    \u001b[32m\"fc1_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@a073e593,\n",
+       "    \u001b[32m\"fc1_bias\"\u001b[0m -> ml.dmlc.mxnet.NDArray@15f069eb,\n",
+       "    \u001b[32m\"fc2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@fb482814,\n",
+       "    \u001b[32m\"fc2_bias\"\u001b[0m -> ml.dmlc.mxnet.NDArray@56ac2896\n",
+       "  ),\n",
+       "  \u001b[33mMap\u001b[0m()\n",
+       ")"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "(mod.dataShapes, mod.labelShapes, mod.outputShapes)\n",
+    "mod.getParams"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## More on Modules\n",
+    "Module simplifies the implementation of new modules. For example\n",
+    "- [SequentialModule](https://github.com/dmlc/mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/module/SequentialModule.scala) can chain multiple modules together\n",
+    "\n",
+    "See also [examples](https://github.com/dmlc/mxnet/tree/master/scala-package/examples/src/main/scala/ml/dmlc/mxnet/examples/module) for a list of code examples using the module API.\n",
+    "\n",
+    "## Implementation\n",
+    "The module is implemented in Scala, located at [scala/mxnet/module](https://github.com/dmlc/mxnet/tree/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/module)\n",
+    "\n",
+    "## Further Reading\n",
+    "[module API](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.module.Module)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/basic/ndarray_scala.ipynb b/scala/basic/ndarray_scala.ipynb
new file mode 100644
index 000000000..da4d84e8f
--- /dev/null
+++ b/scala/basic/ndarray_scala.ipynb
@@ -0,0 +1,1084 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# NDArray Tutorial\n",
+    "One of the main objects in MXNet is the multidimensional array provided by `ml.dmlc.mxnet.NDArray`. If you are familiar with the scientific computing Python package NumPy, NDArray is similar to numpy.ndarray in many aspects.\n",
+    "\n",
+    "## The Basics\n",
+    "A multidimensional array is a table of numbers of the same type. For example, the coordinates of a point in 3D space, [1, 2, 3], form a 1-dimensional array whose single dimension has a length of 3. The following is a 2-dimensional array. The length of the first dimension is 2, and the second dimension has a length of 3:\n",
+    "\n",
+    "    [[0, 1, 2]\n",
+    "     [3, 4, 5]]\n",
+    "\n",
+    "The array class is called NDArray. Some important attributes of an NDArray object are:\n",
+    "\n",
+    "- NDArray.shape: the dimensions of the array. It is a tuple of integers indicating the length of the array in each dimension. For a matrix with n rows and m columns, the shape will be (n, m).\n",
+    "- NDArray.dtype: a `DType` value describing the type of the elements.\n",
+    "- NDArray.size: the total number of elements in the array, which equals the product of the elements of shape.\n",
+    "- NDArray.context: the device on which this array is stored. A device can be the CPU or the i-th GPU."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to the classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Array Creation\n",
+    "An array can be created in multiple ways. For example, we can create an array from a regular Scala Array by using the `NDArray.array` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@52989830\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@a8674600\n",
+       "\u001b[36mres32_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m3.0F\u001b[0m)\n",
+       "\u001b[36mres32_4\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m2.0F\u001b[0m, \u001b[32m3.0F\u001b[0m, \u001b[32m4.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "// create a 1 x 3 array from a Scala array\n",
+    "val a = NDArray.array(Array(1, 2, 3), shape = Shape(1, 3))\n",
+    "// create a 2 x 3 array from a flat Scala array and a shape\n",
+    "val b = NDArray.array(Array(1, 2, 3, 2, 3, 4), shape = Shape(2, 3))\n",
+    "\n",
+    "b.at(0).toArray   \n",
+    "b.at(1).toArray   "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can specify the element type with the `dtype` option when using the `NDArray.zeros` and `NDArray.ones` methods, which accept a `DType` value. By default, Float32 is used.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@7794b9e0\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@4a2c917f\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@aa611987\n",
+       "\u001b[36mres2_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m)\n",
+       "\u001b[36mres2_4\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m)\n",
+       "\u001b[36mres2_5\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// create an int32 array\n",
+    "// val a = NDArray.array(Array(1, 2, 3, 2, 3, 4), shape = Shape(2, 3), dtype = DType.Int32)\n",
+    "// create a 64-bit float array\n",
+    "val a = NDArray.ones(Shape(1, 2), dtype = DType.Float64)\n",
+    "// create an unsigned 8-bit integer array\n",
+    "val b = NDArray.ones(Shape(1, 2), dtype = DType.UInt8)\n",
+    "// create an int32 array\n",
+    "val c = NDArray.ones(Shape(2, 3), dtype = DType.Int32)\n",
+    "a.toArray\n",
+    "b.toArray\n",
+    "c.toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If we only know the size but not the element values, there are several functions to create arrays with initial placeholder content."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@fc1b387d\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@9efb9cc0\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@990cec23\n",
+       "\u001b[36md\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@b3e7cd40\n",
+       "\u001b[36me\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@f081ae28"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// create a 2-dimensional array full of zeros with shape (2,3)\n",
+    "val a = NDArray.zeros(2,3)\n",
+    "// create an array of the same shape full of ones\n",
+    "val b = NDArray.ones(shape = Shape(2,3))\n",
+    "// create an array of the same shape with all elements set to 7\n",
+    "val c = NDArray.full(shape = Shape(2,3), 7)\n",
+    "// create an array of the same shape whose initial content is arbitrary\n",
+    "// and depends on the state of the memory\n",
+    "val d = NDArray.empty(2,3)\n",
+    "// create an array of the same shape and specify its context (CPU or GPU)\n",
+    "val e = NDArray.empty(ctx = Context.cpu(0), shape = Shape(2,3))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Printing Arrays\n",
+    "We often use the `toArray` method to flatten an array and print its contents. We can also use the `at` method to view the sub-NDArray at a given index along the first axis."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@9cd326fd\n",
+       "\u001b[36mres37_1\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m)\n",
+       "\u001b[36mres37_2\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m)\n",
+       "\u001b[36mres37_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val b = NDArray.ones(2,3)\n",
+    "b.toArray\n",
+    "b.at(0).toArray\n",
+    "b.at(1).toArray"
+   ]
+  },
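+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The relationship between `toArray` and `at` can be illustrated with a minimal plain-Scala sketch that makes no MXNet calls. It assumes the elements are laid out in row-major order, which is consistent with the outputs shown in this notebook. The helper `rowOf` and the names `flat` and `cols` are hypothetical and are not part of the NDArray API.\n",
+    "\n",
+    "```scala\n",
+    "// Flat, row-major contents of a 2 x 3 array, in the form toArray returns them\n",
+    "val flat = Array(1f, 2f, 3f, 2f, 3f, 4f)\n",
+    "val cols = 3\n",
+    "\n",
+    "// Hypothetical helper: select row i of the flat buffer, mirroring\n",
+    "// what at(i).toArray yields for a 2-dimensional array\n",
+    "def rowOf(i: Int): Array[Float] = flat.slice(i * cols, (i + 1) * cols)\n",
+    "\n",
+    "rowOf(0)  // Array(1.0, 2.0, 3.0)\n",
+    "rowOf(1)  // Array(2.0, 3.0, 4.0)\n",
+    "```\n",
+    "\n",
+    "Under this assumption, `at(i)` on a 2-dimensional array corresponds to a contiguous block of `cols` elements in the flattened output."
+   ]
+  },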
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic Operations\n",
+    "Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@d48c575b\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@88f7743d\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@e49f7c2b\n",
+       "\u001b[36md\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@db65845d\n",
+       "\u001b[36me\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@37da60b8\n",
+       "\u001b[36mf\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@c1ec10a5\n",
+       "\u001b[36mres26_6\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m3.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m3.0F\u001b[0m, \u001b[32m4.0F\u001b[0m, \u001b[32m3.0F\u001b[0m)\n",
+       "\u001b[36mg\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@f10318b1\n",
+       "\u001b[36mres26_8\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(2,3)\n",
+    "val b = NDArray.ones(2,3)\n",
+    "// elementwise plus\n",
+    "val c = a + b\n",
+    "// elementwise negation\n",
+    "val d = -c\n",
+    "// elementwise pow and sin\n",
+    "val e = NDArray.sin(NDArray.power(c,2))\n",
+    "// transpose \n",
+    "val f = NDArray.array(Array(1f, 2f, 4f, 3f, 3f, 3f), shape = Shape(2, 3))\n",
+    "f.T.toArray\n",
+    "// elementwise max\n",
+    "val g = NDArray.maximum(a, c)  \n",
+    "g.toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Matrix-matrix multiplication is done with the `dot` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@7e886011\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@f9e97357\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@679b15e7\n",
+       "\u001b[36mres28_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m11.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.array(Array(1f, 2f), shape = Shape(1, 2))\n",
+    "val b = NDArray.array(Array(3f, 4f), shape = Shape(2, 1))\n",
+    "val c = NDArray.dot(a, b)\n",
+    "c.toArray"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@e3ad4e21\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@2a80fe82\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@30ecb48e\n",
+       "\u001b[36mres30_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(2,2)\n",
+    "val b = a * a\n",
+    "val c = NDArray.dot(a,a)\n",
+    "c.toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The assignment operators such as += and *= act in place to modify an existing array rather than create a new one.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@de29a21a\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@cb35e13\n",
+       "\u001b[36mres31_2\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@168b3109\n",
+       "\u001b[36mres31_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(2,2)\n",
+    "val b = NDArray.ones(a.shape)\n",
+    "b += a\n",
+    "b.toArray"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@38dd4adb\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@70fe6b83\n",
+       "\u001b[36mres32_2\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@4fb25f28\n",
+       "\u001b[36mres32_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m4.0F\u001b[0m, \u001b[32m4.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(2, 1)\n",
+    "val b = a * 2\n",
+    "b *= b \n",
+    "b.toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Indexing and Slicing\n",
+    "The `slice` method applies along axis 0."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 58,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@cf9adb44\n",
+       "\u001b[36mres57_1\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m0.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m3.0F\u001b[0m, \u001b[32m4.0F\u001b[0m, \u001b[32m5.0F\u001b[0m)\n",
+       "\u001b[36mres57_2\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@3dff3ed6\n",
+       "\u001b[36mres57_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m0.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m4.0F\u001b[0m, \u001b[32m5.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.array(Array(0,1,2,3,4,5), shape= Shape(3,2))\n",
+    "a.toArray\n",
+    "a.slice(1).set(1f)\n",
+    "//a.slice(2).set(1f)\n",
+    "a.toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can also slice a particular axis with the `slice_axis` method. It takes the parameters array, axis, begin, and end."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 59,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36md\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@31891413\n",
+       "\u001b[36mres58_1\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m5.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val d = NDArray.slice_axis(a, 1, 1, 2)\n",
+    "d.toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Shape Manipulation\n",
+    "The shape of the array can be changed as long as the size remains the same."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@390ff8f6\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@fd662ff8\n",
+       "\u001b[36mres36_2\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m0.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m3.0F\u001b[0m, \u001b[32m4.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m6.0F\u001b[0m, \u001b[32m7.0F\u001b[0m, \u001b[32m8.0F\u001b[0m, \u001b[32m9.0F\u001b[0m, \u001b[32m10.0F\u001b[0m, \u001b[32m11.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.array( Array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23), shape = Shape(3,2,4))\n",
+    "val b = a.reshape(Array(2,3,4))\n",
+    "b.at(0).toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `concatenate` method stacks multiple arrays along the first dimension. (Their shapes must be the same.)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 65,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@e0d7ddce\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@c54191e4\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@d06a2142\n",
+       "\u001b[36mres64_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(2,3)\n",
+    "val b = NDArray.ones(2,3)*2\n",
+    "val c = NDArray.concatenate(a,b)\n",
+    "c.toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Reduce\n",
+    "We can reduce the array to a scalar"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@fc5e0047\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@2bd6715f\n",
+       "\u001b[36mres68_2\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m6.0F\u001b[0m)\n",
+       "\u001b[36mres68_3\u001b[0m: \u001b[32mFloat\u001b[0m = \u001b[32m6.0F\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(2,3)\n",
+    "val b = NDArray.sum(a)\n",
+    "b.toArray\n",
+    "NDArray.sum(a).toScalar "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "or along a particular axis"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 70,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@40e6052e\n",
+       "\u001b[36mres69_1\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m3.0F\u001b[0m, \u001b[32m3.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val c = NDArray.sum_axis(a, 1)\n",
+    "c.toArray"
+   ]
+  },
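+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a rough illustration of what the two reductions above compute, here is a minimal plain-Scala sketch that makes no MXNet calls. It models the 2 x 3 array of ones as nested Scala arrays; the names `ones`, `rowSums`, and `total` are hypothetical and only serve this example.\n",
+    "\n",
+    "```scala\n",
+    "// A 2 x 3 array of ones modelled as nested Scala arrays\n",
+    "val ones = Array(Array(1f, 1f, 1f), Array(1f, 1f, 1f))\n",
+    "\n",
+    "// Reducing along axis 1 collapses each row to a single value\n",
+    "val rowSums = ones.map(_.sum)   // Array(3.0, 3.0)\n",
+    "\n",
+    "// Reducing over every element gives one scalar\n",
+    "val total = ones.flatten.sum    // 6.0\n",
+    "```\n",
+    "\n",
+    "These values match the outputs of `NDArray.sum_axis(a, 1)` and `NDArray.sum(a)` shown above."
+   ]
+  },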
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Broadcast\n",
+    "We can also broadcast an array by duplicating its values. The following code broadcasts along axis 1."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 84,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@c504569d\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@332ded5e\n",
+       "\u001b[36mres83_2\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m0.0F\u001b[0m, \u001b[32m0.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m1.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m3.0F\u001b[0m, \u001b[32m3.0F\u001b[0m, \u001b[32m4.0F\u001b[0m, \u001b[32m4.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.array(Array(0,1,2,3,4,5), shape = Shape(6,1))\n",
+    "val b = NDArray.broadcast_to(a, (6,2))   \n",
+    "b.toArray\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "or broadcast along axes 1 and 2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 86,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@b2357fef\n",
+       "\u001b[36md\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@72316e87\n",
+       "\u001b[36mres85_2\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m0.0F\u001b[0m,\n",
+       "  \u001b[32m1.0F\u001b[0m,\n",
+       "  \u001b[32m2.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m5.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "  \u001b[32m4.0F\u001b[0m,\n",
+       "  \u001b[32m5.0F\u001b[0m,\n",
+       "  \u001b[32m3.0F\u001b[0m,\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val c = a.reshape(Shape(2,1,1,3))\n",
+    "val d = NDArray.broadcast_to(c, (2,2,2,3))\n",
+    "d.toArray"
+   ]
+  },
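+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The duplication performed by the first broadcast example (shape (6,1) to (6,2)) can be sketched in plain Scala without any MXNet calls. This assumes row-major flattening, consistent with the `toArray` output above; the names `column` and `broadcast` are hypothetical.\n",
+    "\n",
+    "```scala\n",
+    "// Flat contents of a 6 x 1 array\n",
+    "val column = Array(0f, 1f, 2f, 3f, 4f, 5f)\n",
+    "\n",
+    "// Broadcasting to 6 x 2 repeats the single value of each row twice;\n",
+    "// the result is kept flat in row-major order\n",
+    "val broadcast = column.flatMap(v => Array.fill(2)(v))\n",
+    "// Array(0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0, 5.0, 5.0)\n",
+    "```\n",
+    "\n",
+    "Only an axis of length 1 is duplicated here, which is why the (6,1) array can be expanded to (6,2)."
+   ]
+  },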
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Broadcast can be applied to operations such as * and +."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 106,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@9b321c7\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@7552b22d\n",
+       "\u001b[36md\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@55f16bad\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@32a2dd09\n",
+       "\u001b[36mres105_4\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(3,2)\n",
+    "val b = NDArray.ones(1,2)\n",
+    "val d = NDArray.broadcast_to(b, (3,2))\n",
+    "val c = a + d\n",
+    "c.toArray\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Copies\n",
+    "Data is NOT copied in normal assignment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 107,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@b867e334\n",
+       "\u001b[36md\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@e77830eb\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@b8e1a014\n",
+       "\u001b[36mres106_3\u001b[0m: \u001b[32mBoolean\u001b[0m = \u001b[32mtrue\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(2,2)\n",
+    "val d = NDArray.zeros(2,2)\n",
+    "val b = a  \n",
+    "a eq b"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The same holds when an NDArray is passed as a function argument.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 110,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mf\u001b[0m\n",
+       "\u001b[36mres109_1\u001b[0m: \u001b[32mBoolean\u001b[0m = \u001b[32mtrue\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def f(x: NDArray) ={  \n",
+    "    x\n",
+    "}\n",
+    "a eq f(a)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The copy method makes a deep copy of the array and its data. Note that == on NDArray compares contents, so it returns true for a copy as well.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 112,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@1eacd198\n",
+       "\u001b[36mres111_1\u001b[0m: \u001b[32mBoolean\u001b[0m = \u001b[32mtrue\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val b = a.copy()\n",
+    "b == a"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The above code allocates a new NDArray and then assigns it to b. We can use the copyTo method to avoid the additional memory allocation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 114,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@24498f4e\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@1bab121c\n",
+       "\u001b[36md\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@cb6f0c83\n",
+       "\u001b[36mres113_3\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@12fca855\n",
+       "\u001b[36mres113_4\u001b[0m: (\u001b[32mBoolean\u001b[0m, \u001b[32mBoolean\u001b[0m) = \u001b[33m\u001b[0m(\u001b[32mtrue\u001b[0m, \u001b[32mtrue\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val b = NDArray.ones(a.shape)\n",
+    "val c = b\n",
+    "val d = b\n",
+    "a.copyTo(d)\n",
+    "(c eq b, d eq b)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## The Advanced\n",
+    "There are some advanced features in mxnet.ndarray which make mxnet different from other libraries.\n",
+    "\n",
+    "## GPU Support\n",
+    "By default, operators are executed on the CPU. It is easy to switch to another computation resource, such as a GPU, if available. The device information is stored in ndarray.context. When MXNet is compiled with the flag USE_CUDA=1 and there is at least one Nvidia GPU card, we can make all computations run on GPU 0 by using Context.gpu(0), or simply Context.gpu(). If there are two or more GPUs, the second GPU is represented by Context.gpu(1)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 121,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mf\u001b[0m\n",
+       "\u001b[36mres120_1\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@16acc21b\n",
+       "defined \u001b[32mfunction \u001b[36mf1\u001b[0m\n",
+       "\u001b[36mres120_3\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@1650e1f4\n",
+       "\u001b[36mctx\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mContext\u001b[0m] = \u001b[33mArray\u001b[0m(gpu(0), gpu(1))"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def f() ={\n",
+    "    val a = NDArray.ones(100,100)\n",
+    "    val b = NDArray.ones(100,100)\n",
+    "    val c = a + b\n",
+    "    c\n",
+    "}\n",
+    "// by default, Context.cpu() is used\n",
+    "f()  \n",
+    "\n",
+    "// explicitly create the arrays on a given context; this example uses Context.cpu(0), and Context.gpu(0) would select the first GPU\n",
+    "def f1() ={\n",
+    "    val a = NDArray.ones(ctx=Context.cpu(0), shape=Shape(100,100))\n",
+    "    val b = NDArray.ones(ctx=Context.cpu(0), shape=Shape(100,100))\n",
+    "    val c = a + b\n",
+    "    c\n",
+    "}\n",
+    "f1()\n",
+    "\n",
+    "// you can also list the CPUs or GPUs you want to use in an array, like this\n",
+    "val ctx = Array(Context.gpu(0), Context.gpu(1))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can also explicitly specify the context when creating an array:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 122,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@a5e59bc"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(ctx=Context.cpu(0), shape=Shape(100,100))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Currently MXNet requires two arrays to sit on the same device for computation. There are several methods for copying data between devices."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 126,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@fa844d42\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@ae353f2a\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@dbacf13d\n",
+       "\u001b[36mres125_3\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@ae42e54e\n",
+       "\u001b[36md\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@a911584b\n",
+       "\u001b[36me\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@f29dc5fe\n",
+       "\u001b[36mres125_6\u001b[0m: (\u001b[32mNDArray\u001b[0m, \u001b[32mNDArray\u001b[0m) = \u001b[33m\u001b[0m(ml.dmlc.mxnet.NDArray@33e6380, ml.dmlc.mxnet.NDArray@e7a5cc45)\n",
+       "\u001b[36mres125_7\u001b[0m: (\u001b[32mContext\u001b[0m, \u001b[32mContext\u001b[0m) = \u001b[33m\u001b[0m(cpu(0), cpu(0))"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "\n",
+    "val a = NDArray.ones(ctx=Context.cpu(), shape= Shape(100,100))\n",
+    "val b = NDArray.ones(ctx=Context.gpu(), shape= Shape(100,100))\n",
+    "val c = NDArray.ones(ctx=Context.gpu(), shape= Shape(100,100))\n",
+    "a.copyTo(c)  // copy from CPU to GPU\n",
+    "val d = b + c\n",
+    "val e = b.asInContext(c.context) + c  // same as above\n",
+    "(d, e)\n",
+    "(d.context,e.context)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Serialize From/To (Distributed) Filesystems\n",
+    "You can use MXNet functions to save a list or dictionary of NDArrays to a file system and load it back.\n",
+    "\n",
+    "Besides a single NDArray, we can save and load a list as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 127,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@c250ba93\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@5d65799\n",
+       "\u001b[36mc\u001b[0m: (\u001b[32mArray\u001b[0m[\u001b[32mString\u001b[0m], \u001b[32mArray\u001b[0m[\u001b[32mNDArray\u001b[0m]) = \u001b[33m\u001b[0m(\u001b[33mArray\u001b[0m(), \u001b[33mArray\u001b[0m(ml.dmlc.mxnet.NDArray@c9910b3a, ml.dmlc.mxnet.NDArray@8175e844))\n",
+       "\u001b[36mres126_4\u001b[0m: (\u001b[32mArray\u001b[0m[\u001b[32mString\u001b[0m], \u001b[32mArray\u001b[0m[\u001b[32mNDArray\u001b[0m]) = \u001b[33m\u001b[0m(\u001b[33mArray\u001b[0m(), \u001b[33mArray\u001b[0m(ml.dmlc.mxnet.NDArray@a15d3fe0, ml.dmlc.mxnet.NDArray@6a840199))"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = NDArray.ones(2,3)\n",
+    "val b = NDArray.ones(5,6)               \n",
+    "NDArray.save(\"temp.ndarray\", Array(a,b))\n",
+    "val c = NDArray.load(\"temp.ndarray\")\n",
+    "c"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "or a dictionary (a Scala Map):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 128,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36md\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m(\u001b[32m\"A\"\u001b[0m -> ml.dmlc.mxnet.NDArray@f8a5cb1c, \u001b[32m\"B\"\u001b[0m -> ml.dmlc.mxnet.NDArray@3625fe0)\n",
+       "\u001b[36mc\u001b[0m: (\u001b[32mArray\u001b[0m[\u001b[32mString\u001b[0m], \u001b[32mArray\u001b[0m[\u001b[32mNDArray\u001b[0m]) = \u001b[33m\u001b[0m(\n",
+       "  \u001b[33mArray\u001b[0m(\u001b[32m\"A\"\u001b[0m, \u001b[32m\"B\"\u001b[0m),\n",
+       "  \u001b[33mArray\u001b[0m(ml.dmlc.mxnet.NDArray@c5142250, ml.dmlc.mxnet.NDArray@6d660c8a)\n",
+       ")\n",
+       "\u001b[36mres127_3\u001b[0m: (\u001b[32mArray\u001b[0m[\u001b[32mString\u001b[0m], \u001b[32mArray\u001b[0m[\u001b[32mNDArray\u001b[0m]) = \u001b[33m\u001b[0m(\n",
+       "  \u001b[33mArray\u001b[0m(\u001b[32m\"A\"\u001b[0m, \u001b[32m\"B\"\u001b[0m),\n",
+       "  \u001b[33mArray\u001b[0m(ml.dmlc.mxnet.NDArray@cf1d79d6, ml.dmlc.mxnet.NDArray@5fa44ac5)\n",
+       ")"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val d = Map(\"A\" -> a, \"B\" -> b)\n",
+    "NDArray.save(\"temp.ndarray\", d)\n",
+    "val c = NDArray.load(\"temp.ndarray\")\n",
+    "c"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If a distributed filesystem such as Amazon S3 or Hadoop HDFS is set up, we can directly save to and load from it:\n",
+    "\n",
+    "```scala\n",
+    "val from_file = NDArray.load(\"/path/to/array/file\")\n",
+    "val from_s3 = NDArray.load(\"s3://path/to/s3/array\")\n",
+    "val from_hdfs = NDArray.load(\"hdfs://path/to/hdfs/array\")\n",
+    "    \n",
+    "NDArray.save(\"s3://mybucket/mydata.ndarray\", Map(\"A\" -> a))  // if compiled with USE_S3=1\n",
+    "NDArray.save(\"hdfs:///users/myname/mydata.bin\", Map(\"B\" -> b))  // if compiled with USE_HDFS=1\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Further Reading\n",
+    "[NDArray API](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.NDArray): documentation for all NDArray methods."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/basic/optimizer_scala.ipynb b/scala/basic/optimizer_scala.ipynb
new file mode 100644
index 000000000..3cb9cf6c3
--- /dev/null
+++ b/scala/basic/optimizer_scala.ipynb
@@ -0,0 +1,225 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Optimizer\n",
+    "In gradient-based optimization algorithms, we update the parameters (or weights) using the gradients in each iteration. We call this updating function an Optimizer.\n",
+    "\n",
+    "The main method of an optimizer is update(weight, grad), which updates an NDArray weight using an NDArray gradient. Because a multi-layer neural network usually has more than one weight, we assign each weight a unique integer index. Furthermore, an optimizer may need space to store auxiliary state, such as momentum, so we also allow a user-defined state for updating. In summary, an optimizer has two major methods:\n",
+    "\n",
+    "- createState(index, weight): create auxiliary state for the index-th weight.\n",
+    "- update(index, weight, grad, state): update the index-th weight given the gradient and auxiliary state. The state can also be updated.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to the classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g.\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic Usage\n",
+    "### Create and Update\n",
+    "MXNet already implements several popular optimizers in [Optimizer.scala](https://github.com/dmlc/mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/Optimizer.scala). A convenient way to create one is by using new SGD(args). The following code creates a standard SGD updater, which performs\n",
+    "\n",
+    "```scala\n",
+    "weight = weight - learning_rate * grad\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import the optimizer you want to use as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.optimizer.SGD\u001b[0m\n",
+       "\u001b[36mopt\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32moptimizer\u001b[0m.\u001b[32mSGD\u001b[0m = ml.dmlc.mxnet.optimizer.SGD@4b5a5245"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import ml.dmlc.mxnet.optimizer.SGD\n",
+    "val opt = new SGD(learningRate=0.1f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then we can use the update function.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mgrad\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@a5d8d68a\n",
+       "\u001b[36mweight\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@ffb131d2\n",
+       "\u001b[36mindex\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m\n",
+       "\u001b[36mres2_4\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m0.89999F\u001b[0m, \u001b[32m0.89999F\u001b[0m, \u001b[32m0.89999F\u001b[0m, \u001b[32m0.89999F\u001b[0m, \u001b[32m0.89999F\u001b[0m, \u001b[32m0.89999F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val grad = NDArray.ones(2,3)\n",
+    "val weight = NDArray.ones(2,3)\n",
+    "val index = 0\n",
+    "opt.update(index, weight, grad, NDArray.empty(2,3))\n",
+    "weight.toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When momentum is non-zero, the SGD optimizer needs extra state. The state returned by createState is of type AnyRef, so we cast it to NDArray before printing its value.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mmomOpt\u001b[0m: \u001b[32mSGD\u001b[0m = ml.dmlc.mxnet.optimizer.SGD@143d6181\n",
+       "\u001b[36mindex\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m\n",
+       "\u001b[36mgrad\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@e409d01c\n",
+       "\u001b[36mweight\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@ac5011ac\n",
+       "\u001b[36mstate\u001b[0m: \u001b[32mAnyRef\u001b[0m = ml.dmlc.mxnet.NDArray@fbcb7abb\n",
+       "\u001b[36mres3_6\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m-0.10001F\u001b[0m, \u001b[32m-0.10001F\u001b[0m, \u001b[32m-0.10001F\u001b[0m, \u001b[32m-0.10001F\u001b[0m, \u001b[32m-0.10001F\u001b[0m, \u001b[32m-0.10001F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val momOpt = new SGD(learningRate = 0.1f, momentum = 0.01f)\n",
+    "val index = 0\n",
+    "val grad = NDArray.ones(2,3)\n",
+    "val weight = NDArray.ones(2,3)\n",
+    "val state = momOpt.createState(index, weight)\n",
+    "momOpt.update(index, weight, grad, state)\n",
+    "state.asInstanceOf[NDArray].toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "## Supported Optimizers\n",
+    "- [AdaDelta](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.AdaDelta)\n",
+    "- [AdaGrad](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.AdaGrad)\n",
+    "- [Adam](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.Adam)\n",
+    "- [SGD](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.SGD)\n",
+    "- [SGLD](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.SGLD)\n",
+    "- [DCASGD](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.DCASGD)\n",
+    "- [NAG](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.NAG)\n",
+    "- [RMSProp](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.RMSProp)\n",
+    "\n",
+    "You can set any of these optimizers while building a FeedForward network by using the `.setOptimizer(new SGD(...))` method."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Further Reading\n",
+    "[Optimizer](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.optimizer.AdaGrad)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/basic/predict_scala.ipynb b/scala/basic/predict_scala.ipynb
new file mode 100644
index 000000000..c0d9d0ac0
--- /dev/null
+++ b/scala/basic/predict_scala.ipynb
@@ -0,0 +1,386 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Predict and Extract Features with Pre-trained Models\n",
+    "This tutorial walks through how to use pre-trained models for prediction and feature extraction.\n",
+    "\n",
+    "## Download pre-trained models\n",
+    "A model often consists of two parts: the .json file specifying the neural network structure, and the .params file containing the binary parameters. The naming convention is name-symbol.json and name-epoch.params, where name is the model name and epoch is the epoch number.\n",
+    "\n",
+    "Here we download a 50-layer ResNet model pre-trained on ImageNet. Other models are available at http://data.mxnet.io/models/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to the classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g.\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import the necessary libraries:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.module.{FitParams, Module}\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.collection.immutable.ListMap\u001b[0m\n",
+       "\u001b[32mimport \u001b[36msys.process._\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import ml.dmlc.mxnet.module.{FitParams, Module}\n",
+    "import scala.collection.immutable.ListMap\n",
+    "import sys.process._"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Download the pre-trained ResNet model as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "     0K .......... .......... .......... .......... .......... 67%  153K 0s\n",
+      "    50K .......... .......... ....                            100%  148K=0.5s"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres2_0\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m\n",
+       "\u001b[36mres2_1\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "\"wget http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50-symbol.json -P model/ -q --show-progress\"!\n",
+    "\n",
+    "\"wget http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50-0000.params -P model/ -q --show-progress\"!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization\n",
+    "We first load the model into memory with loadCheckpoint. It returns the symbol definition of the neural network (see [symbol_scala.ipynb](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/symbol_scala.ipynb)), along with its argument parameters and auxiliary parameters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mresnet\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@56b5a235\n",
+       "\u001b[36margParamsResnet\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"stage1_unit3_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@dd62b3ac,\n",
+       "  \u001b[32m\"stage3_unit2_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@4869b270,\n",
+       "  \u001b[32m\"stage2_unit3_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@87d3d2dc,\n",
+       "  \u001b[32m\"stage4_unit3_bn1_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@84209042,\n",
+       "  \u001b[32m\"stage1_unit3_bn1_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@7a9f4c52,\n",
+       "  \u001b[32m\"stage3_unit1_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@d463a406,\n",
+       "  \u001b[32m\"stage2_unit1_bn3_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@b84d8eb5,\n",
+       "  \u001b[32m\"stage2_unit4_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@a7a69033,\n",
+       "  \u001b[32m\"stage2_unit4_bn3_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@ec41b811,\n",
+       "  \u001b[32m\"stage3_unit4_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@950aaf0c,\n",
+       "  \u001b[32m\"stage3_unit6_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@32fcc228,\n",
+       "  \u001b[32m\"stage2_unit2_bn3_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@994e9bfe,\n",
+       "  \u001b[32m\"stage3_unit2_bn3_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@5e97ef69,\n",
+       "  \u001b[32m\"stage1_unit2_bn2_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@d3354926,\n",
+       "  \u001b[32m\"stage4_unit1_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@98b68f03,\n",
+       "  \u001b[32m\"stage2_unit2_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@d16108a0,\n",
+       "  \u001b[32m\"stage3_unit6_bn2_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9bbc2c2d,\n",
+       "  \u001b[32m\"stage2_unit2_bn2_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@a1a93420,\n",
+       "  \u001b[32m\"stage1_unit2_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@67cb7cd0,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mauxParamsResnet\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"stage2_unit2_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@e7b7de4a,\n",
+       "  \u001b[32m\"stage3_unit6_bn2_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@5eeb2eeb,\n",
+       "  \u001b[32m\"stage2_unit2_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@e1932992,\n",
+       "  \u001b[32m\"stage3_unit1_bn2_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@6f2c9370,\n",
+       "  \u001b[32m\"stage1_unit3_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@b1fa3f55,\n",
+       "  \u001b[32m\"stage2_unit4_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@8e2a60bb,\n",
+       "  \u001b[32m\"stage2_unit4_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@d2129062,\n",
+       "  \u001b[32m\"stage3_unit1_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@7c521226,\n",
+       "  \u001b[32m\"stage2_unit3_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@a746a3a3,\n",
+       "  \u001b[32m\"stage2_unit1_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@942a806e,\n",
+       "  \u001b[32m\"stage1_unit2_bn1_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@91e5ad00,\n",
+       "  \u001b[32m\"stage4_unit3_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@2bc6bafd,\n",
+       "  \u001b[32m\"stage1_unit2_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9112eae5,\n",
+       "  \u001b[32m\"stage3_unit6_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@63ea781d,\n",
+       "  \u001b[32m\"stage3_unit5_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@56f8789e,\n",
+       "  \u001b[32m\"stage2_unit2_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9637b319,\n",
+       "  \u001b[32m\"stage2_unit1_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@bf9f5368,\n",
+       "  \u001b[32m\"stage1_unit1_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@2596a8e3,\n",
+       "  \u001b[32m\"stage3_unit3_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9140fe7c,\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "//val mod = Module.loadCheckpoint(\"model/resnet-50\", 0)\n",
+    "val (resnet, argParamsResnet, auxParamsResnet) = Model.loadCheckpoint(\"model/resnet-50\", 0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can visualize the neural network with `Visualization.plotNetwork` and save the result with the `dot.render` method. Specify the path where you want to save the visualization as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdot\u001b[0m: \u001b[32mVisualization\u001b[0m.\u001b[32mDot\u001b[0m = ml.dmlc.mxnet.Visualization$Dot@40fe88f2"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val dot = Visualization.plotNetwork(symbol = resnet, nodeAttrs = Map(\"shape\" -> \"oval\", \"fixedsize\" -> \"false\") )\n",
+    "dot.render(engine = \"dot\", fileName = \"resnet\", path = \"model/\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Both argument parameters and auxiliary parameters (e.g. mean/std in a batch normalization layer) are stored as a dictionary mapping a string name to an NDArray value (see [ndarray_scala.ipynb](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/ndarray_scala.ipynb)). The arguments consist of weights and biases.\n",
+    "\n",
+    "You can see the full output by using the `println` command."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres4\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"stage1_unit3_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9a4b8176,\n",
+       "  \u001b[32m\"stage3_unit2_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@6abf1f48,\n",
+       "  \u001b[32m\"stage2_unit3_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9df221ba,\n",
+       "  \u001b[32m\"stage4_unit3_bn1_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@75d224a8,\n",
+       "  \u001b[32m\"stage1_unit3_bn1_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@640dd0ce,\n",
+       "  \u001b[32m\"stage3_unit1_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@abdd2283,\n",
+       "  \u001b[32m\"stage2_unit1_bn3_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@b816e9c2,\n",
+       "  \u001b[32m\"stage2_unit4_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@7a388895,\n",
+       "  \u001b[32m\"stage2_unit4_bn3_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@f6083d12,\n",
+       "  \u001b[32m\"stage3_unit4_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@abfbce39,\n",
+       "  \u001b[32m\"stage3_unit6_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@52b0e4a3,\n",
+       "  \u001b[32m\"stage2_unit2_bn3_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@fce672e1,\n",
+       "  \u001b[32m\"stage3_unit2_bn3_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@77fe9b1f,\n",
+       "  \u001b[32m\"stage1_unit2_bn2_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@d9e38fd9,\n",
+       "  \u001b[32m\"stage4_unit1_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9ed3362a,\n",
+       "  \u001b[32m\"stage2_unit2_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@b17c306f,\n",
+       "  \u001b[32m\"stage3_unit6_bn2_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@5ff9b8cb,\n",
+       "  \u001b[32m\"stage2_unit2_bn2_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@c3f656a6,\n",
+       "  \u001b[32m\"stage1_unit2_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@af1280a8,\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "argParamsResnet"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The auxiliary parameters contain the moving mean and variance for the batch normalization layers:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres5\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"stage2_unit2_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@aa539dcc,\n",
+       "  \u001b[32m\"stage3_unit6_bn2_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@6e85d8da,\n",
+       "  \u001b[32m\"stage2_unit2_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@fdc34046,\n",
+       "  \u001b[32m\"stage3_unit1_bn2_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@8cbb07ea,\n",
+       "  \u001b[32m\"stage1_unit3_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@673a1594,\n",
+       "  \u001b[32m\"stage2_unit4_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@a2bb7ded,\n",
+       "  \u001b[32m\"stage2_unit4_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@dacdc026,\n",
+       "  \u001b[32m\"stage3_unit1_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@8cbcc0cc,\n",
+       "  \u001b[32m\"stage2_unit3_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@fe5da542,\n",
+       "  \u001b[32m\"stage2_unit1_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9a7dfe4b,\n",
+       "  \u001b[32m\"stage1_unit2_bn1_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@698017c1,\n",
+       "  \u001b[32m\"stage4_unit3_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@6ea56feb,\n",
+       "  \u001b[32m\"stage1_unit2_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@b32bb14c,\n",
+       "  \u001b[32m\"stage3_unit6_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@733075d0,\n",
+       "  \u001b[32m\"stage3_unit5_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@8a9da512,\n",
+       "  \u001b[32m\"stage2_unit2_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@fad6d5c4,\n",
+       "  \u001b[32m\"stage2_unit1_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@c8e406da,\n",
+       "  \u001b[32m\"stage1_unit1_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@f25cd69,\n",
+       "  \u001b[32m\"stage3_unit3_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@8bac6514,\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "auxParamsResnet"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next we create an executable module (see [module_scala.ipynb](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/module_scala.ipynb)) on the CPU. To use a different device, we just need to change the context, e.g. `Context.gpu(0)` for the first GPU and `Context.gpu(2)` for the 3rd GPU."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mmod\u001b[0m: \u001b[32mModule\u001b[0m = ml.dmlc.mxnet.module.Module@56ca816d"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val mod = new Module(resnet, contexts = Context.cpu())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The ResNet is trained with RGB images of size 224 x 224. The input data is fed through the variable `data`. We bind the module with the input shape and specify that it is only for prediction. The number 1 added before the image shape (3x224x224) means that we will predict only one image at a time. Next we set the loaded parameters. Now the module is ready to run."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/basic/record_io_scala.ipynb b/scala/basic/record_io_scala.ipynb
new file mode 100644
index 000000000..d0b31acc4
--- /dev/null
+++ b/scala/basic/record_io_scala.ipynb
@@ -0,0 +1,363 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Scala Record IO\n",
+    "In [image_io](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/image_io_scala.ipynb) we already learned how to pack images into the standard RecordIO format and load them with ImageRecordIter. This tutorial walks through the Scala interface for reading and writing RecordIO files. It can be useful when you need more control over the details of the data pipeline. For example, when you need to augment image and label together for detection and segmentation, or when you need a custom data iterator for triplet sampling and negative sampling.\n",
+    "\n",
+    "You can find relevant code [here](https://github.com/dmlc/mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/RecordIO.scala). There are two classes: [MXRecordIO](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.MXRecordIO), which supports sequential read and write, and [MXIndexedRecordIO](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.MXIndexedRecordIO), which supports random read and sequential write."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## MXRecordIO\n",
+    "First let's take a look at `MXRecordIO`. It takes the path to a RecordIO file and an `MXRecordIO.IOFlag` as input. The flag is `MXRecordIO.IORead` for reading and `MXRecordIO.IOWrite` for writing.\n",
+    "\n",
+    "We open a temporary file and write 5 strings to it with the `MXRecordIO.IOWrite` flag:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mjava.io._\u001b[0m\n",
+       "\u001b[36mfRec\u001b[0m: \u001b[32mjava\u001b[0m.\u001b[32mio\u001b[0m.\u001b[32mFile\u001b[0m = /var/folders/f4/gts7qnkx319_nv4176gbz4jjrjzb4y/T/tmpFile2805315252382756478.tmp\n",
+       "\u001b[36mN\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m5\u001b[0m\n",
+       "\u001b[36mwriter\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mMXRecordIO\u001b[0m = ml.dmlc.mxnet.MXRecordIO@28a470f1"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import java.io._\n",
+    "\n",
+    "val fRec = File.createTempFile(\"tmpFile\", \".tmp\")\n",
+    "val N = 5\n",
+    "\n",
+    "val writer = new MXRecordIO(fRec.getAbsolutePath, MXRecordIO.IOWrite)\n",
+    "for (i <- 0 until N) {\n",
+    "    writer.write(\"record_\"+i)\n",
+    "}\n",
+    "writer.close()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then we can read the records back by opening the same file with the `MXRecordIO.IORead` flag as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "record_0\n",
+      "record_1\n",
+      "record_2\n",
+      "record_3\n",
+      "record_4\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mreader\u001b[0m: \u001b[32mMXRecordIO\u001b[0m = ml.dmlc.mxnet.MXRecordIO@5bf33b34"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val reader = new MXRecordIO(fRec.getAbsolutePath, MXRecordIO.IORead)\n",
+    "for (i <- 0 until N) {\n",
+    "    val res = reader.read()\n",
+    "    println(res)\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## MXIndexedRecordIO\n",
+    "Sometimes you need random access for more complex tasks. `MXIndexedRecordIO` is designed for this. Here we create an indexed record file and a corresponding index file (both as temporary files):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mfIdxRec\u001b[0m: \u001b[32mFile\u001b[0m = /var/folders/f4/gts7qnkx319_nv4176gbz4jjrjzb4y/T/tmpIdxFile9045730139606611372.tmp\n",
+       "\u001b[36mfIdx\u001b[0m: \u001b[32mFile\u001b[0m = /var/folders/f4/gts7qnkx319_nv4176gbz4jjrjzb4y/T/tmpIdx2844802785206482836.tmp\n",
+       "\u001b[36mN\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m5\u001b[0m\n",
+       "\u001b[36mwriter\u001b[0m: \u001b[32mMXIndexedRecordIO\u001b[0m = ml.dmlc.mxnet.MXIndexedRecordIO@43acea84"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val fIdxRec = File.createTempFile(\"tmpIdxFile\", \".tmp\")\n",
+    "val fIdx = File.createTempFile(\"tmpIdx\", \".tmp\")\n",
+    "val N = 5\n",
+    "\n",
+    "val writer = new MXIndexedRecordIO(fIdx.getAbsolutePath, fIdxRec.getAbsolutePath, MXRecordIO.IOWrite)\n",
+    "for (i <- 0 until N) {\n",
+    "  writer.writeIdx(i, \"record_\"+i)\n",
+    "}\n",
+    "writer.close()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can then access records with keys:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "record_1\n",
+      "record_4\n",
+      "record_3\n",
+      "record_0\n",
+      "record_2\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mreader\u001b[0m: \u001b[32mMXIndexedRecordIO\u001b[0m = ml.dmlc.mxnet.MXIndexedRecordIO@35c87f7a\n",
+       "\u001b[36mkeys\u001b[0m: \u001b[32mList\u001b[0m[\u001b[32mInt\u001b[0m] = \u001b[33mList\u001b[0m(\u001b[32m1\u001b[0m, \u001b[32m4\u001b[0m, \u001b[32m3\u001b[0m, \u001b[32m0\u001b[0m, \u001b[32m2\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val reader = new MXIndexedRecordIO(fIdx.getAbsolutePath, fIdxRec.getAbsolutePath, MXRecordIO.IORead)\n",
+    "var keys = reader.keys().map(_.asInstanceOf[Int]).toList.sorted\n",
+    " //   assert(keys.zip(0 until N).forall(x => x._1 == x._2))\n",
+    "keys = scala.util.Random.shuffle(keys)\n",
+    "for (k <- keys) {\n",
+    "    val res = reader.readIdx(k)\n",
+    "    println(res)\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can list all keys with:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres5\u001b[0m: \u001b[32mIterable\u001b[0m[\u001b[32mAny\u001b[0m] = \u001b[33mSet\u001b[0m(0, 1, 2, 3, 4)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "reader.keys"
+   ]
+  },
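+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To build some intuition for why an index file enables random reads, here is a minimal plain-Scala sketch. It is not MXNet code and does not reproduce the actual RecordIO or index file layout. The names `sketchFile`, `offsets` and `readAt` are hypothetical, and the length-prefixed record format is an assumption made only for illustration:\n",
+    "\n",
+    "```scala\n",
+    "import java.io.{File, RandomAccessFile}\n",
+    "\n",
+    "// Hypothetical sketch, not MXNet code: length-prefixed records plus a key -> offset index.\n",
+    "val sketchFile = File.createTempFile(\"sketch\", \".rec\")\n",
+    "val offsets = scala.collection.mutable.Map.empty[Int, Long]\n",
+    "\n",
+    "// Sequential write: remember the byte offset where each record starts.\n",
+    "val out = new RandomAccessFile(sketchFile, \"rw\")\n",
+    "for (i <- 0 until 5) {\n",
+    "  val bytes = (\"record_\" + i).getBytes(\"UTF-8\")\n",
+    "  offsets(i) = out.getFilePointer\n",
+    "  out.writeInt(bytes.length) // length prefix\n",
+    "  out.write(bytes)           // payload\n",
+    "}\n",
+    "out.close()\n",
+    "\n",
+    "// Random read: seek straight to the stored offset of the requested key.\n",
+    "def readAt(key: Int): String = {\n",
+    "  val in = new RandomAccessFile(sketchFile, \"r\")\n",
+    "  try {\n",
+    "    in.seek(offsets(key))\n",
+    "    val buf = new Array[Byte](in.readInt())\n",
+    "    in.readFully(buf)\n",
+    "    new String(buf, \"UTF-8\")\n",
+    "  } finally in.close()\n",
+    "}\n",
+    "\n",
+    "println(readAt(3)) // prints record_3\n",
+    "```\n",
+    "\n",
+    "Writes stay sequential because each record is appended at the end of the file, while reads can jump to any key because its offset was stored at write time."
+   ]
+  },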
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Packing and Unpacking Data\n",
+    "Each record in a .rec file can contain arbitrary binary data, but machine learning data typically has a label/data structure. `MXRecordIO` also contains a few utility functions for packing such data, namely `pack` and `unpack`.\n",
+    "\n",
+    "### Binary Data\n",
+    "The `pack` and `unpack` methods store a 1d array of float labels together with binary data, as shown in the following example.\n",
+    "\n",
+    "The `IRHeader` class takes `flag`, `label`, `id` and `id2` as parameters.\n",
+    "\n",
+    "The `pack` method takes a header of type `IRHeader` (the header of the image record) and the string to pack as input parameters, and returns the resulting packed string.\n",
+    "\n",
+    "The `unpack` method takes a string buffer from `MXRecordIO.read` as input and returns the header of type `IRHeader` (the header of the image record) and the unpacked string."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "// Pack the same data string with two different headers into s1 and s2\n",
+    "def pack(header1: MXRecordIO.IRHeader, header2: MXRecordIO.IRHeader, data: String): Unit = {\n",
+    "    s1 = MXRecordIO.pack(header1, data)\n",
+    "    s2 = MXRecordIO.pack(header2, data)\n",
+    "}\n",
+    "\n",
+    "val data = \"data\"\n",
+    "val label1 = Array(1f)\n",
+    "var s1: String = null\n",
+    "var s2: String = null\n",
+    "val header1 = MXRecordIO.IRHeader(0, label1, 1, 0)\n",
+    "\n",
+    "val label2 = Array(1f, 2f, 3f)\n",
+    "val header2 = MXRecordIO.IRHeader(0, label2, 2, 0)\n",
+    "\n",
+    "pack(header1, header2, data)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mrHeader1\u001b[0m: \u001b[32mMXRecordIO\u001b[0m.\u001b[32mIRHeader\u001b[0m = \u001b[33mIRHeader\u001b[0m(\u001b[32m1\u001b[0m, \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m), \u001b[32m1\u001b[0m, \u001b[32m0\u001b[0m)\n",
+       "\u001b[36mrContent1\u001b[0m: \u001b[32mString\u001b[0m = \u001b[32m\"data\"\u001b[0m\n",
+       "\u001b[36mrHeader2\u001b[0m: \u001b[32mMXRecordIO\u001b[0m.\u001b[32mIRHeader\u001b[0m = \u001b[33mIRHeader\u001b[0m(\u001b[32m3\u001b[0m, \u001b[33mArray\u001b[0m(\u001b[32m1.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m3.0F\u001b[0m), \u001b[32m2\u001b[0m, \u001b[32m0\u001b[0m)\n",
+       "\u001b[36mrContent2\u001b[0m: \u001b[32mString\u001b[0m = \u001b[32m\"data\"\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// unpack\n",
+    "val (rHeader1, rContent1) = MXRecordIO.unpack(s1)\n",
+    "val (rHeader2, rContent2) = MXRecordIO.unpack(s2)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next Step\n",
+    "- [Advanced Image IO](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/advanced_img_io.ipynb) Advanced image IO for detection, segmentation, etc..."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/basic/symbol_scala.ipynb b/scala/basic/symbol_scala.ipynb
new file mode 100644
index 000000000..0dc8ac053
--- /dev/null
+++ b/scala/basic/symbol_scala.ipynb
@@ -0,0 +1,918 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Symbol Tutorial\n",
+    "Besides the tensor computation interface NDArray, another main object in MXNet is the Symbol provided by MXNet.Symbol. A symbol represents a multi-output symbolic expression. Symbols are composed of operators, such as simple matrix operations (e.g. “+”) or neural network layers (e.g. a convolution layer). An operator can take several input variables, produce more than one output variable, and have internal state variables. A variable can be either free, which we can bind with a value later, or an output of another symbol.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Symbol Composition\n",
+    "### Basic Operators\n",
+    "The following example composes a simple expression `a + b`. We first create the placeholders `a` and `b` with names using `Symbol.Variable`, and then construct the desired symbol by using the operator `+`. When the string name is not given during creation, MXNet will automatically generate a unique name for the symbol, which is the case for `c`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.Visualization\u001b[0m\n",
+       "\u001b[36ma\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@62b80558\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@4565f56d\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@345db42b\n",
+       "\u001b[36mres1_5\u001b[0m: (\u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m, \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m, \u001b[32mml\u001b[0m.\u001b[32mdmlc\u001b[0m.\u001b[32mmxnet\u001b[0m.\u001b[32mSymbol\u001b[0m) = \u001b[33m\u001b[0m(\n",
+       "  ml.dmlc.mxnet.Symbol@62b80558,\n",
+       "  ml.dmlc.mxnet.Symbol@4565f56d,\n",
+       "  ml.dmlc.mxnet.Symbol@345db42b\n",
+       ")"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import ml.dmlc.mxnet.Visualization\n",
+    "\n",
+    "val a = Symbol.Variable(\"a\")\n",
+    "val b = Symbol.Variable(\"b\")\n",
+    "val c = a + b\n",
+    "(a, b, c)"
+   ]
+  },
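+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a rough mental model of symbolic composition, the following plain-Scala sketch builds an expression first and binds values to its free variables only later. The types `Expr`, `Var` and `Add` and the helpers `freeVars` and `eval` are hypothetical and unrelated to how MXNet actually implements `Symbol`:\n",
+    "\n",
+    "```scala\n",
+    "// Hypothetical toy expression tree, for illustration only.\n",
+    "sealed trait Expr\n",
+    "case class Var(name: String) extends Expr\n",
+    "case class Add(left: Expr, right: Expr) extends Expr\n",
+    "\n",
+    "// Collect the names of the free variables of an expression.\n",
+    "def freeVars(e: Expr): Set[String] = e match {\n",
+    "  case Var(n)    => Set(n)\n",
+    "  case Add(l, r) => freeVars(l) ++ freeVars(r)\n",
+    "}\n",
+    "\n",
+    "// Evaluate an expression once every free variable is bound to a value.\n",
+    "def eval(e: Expr, env: Map[String, Double]): Double = e match {\n",
+    "  case Var(n)    => env(n)\n",
+    "  case Add(l, r) => eval(l, env) + eval(r, env)\n",
+    "}\n",
+    "\n",
+    "val expr = Add(Var(\"a\"), Var(\"b\"))               // nothing is computed yet\n",
+    "println(freeVars(expr))                          // prints Set(a, b)\n",
+    "println(eval(expr, Map(\"a\" -> 1.0, \"b\" -> 2.0))) // prints 3.0\n",
+    "```\n",
+    "\n",
+    "Separating construction from evaluation is what lets a framework inspect or optimize the whole expression before running it."
+   ]
+  },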
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Most NDArray operators can be applied to Symbol, for example:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36md\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@7e72d295\n",
+       "\u001b[36me\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@7695a017\n",
+       "\u001b[36mf\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@253b4644\n",
+       "\u001b[36mg\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@2ac34e90"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// element-wise multiplication\n",
+    "val d = a * b  \n",
+    "// matrix multiplication\n",
+    "val e = Symbol.dot()(a, b)()\n",
+    "// reshape\n",
+    "val f = Symbol.Reshape()(d+e)()  \n",
+    "// broadcast\n",
+    "val g = Symbol.broadcast_to()(f)()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Visualization:\n",
+    "\n",
+    "The MXNet Scala package uses a simplified implementation of the python-Graphviz library, based on: https://github.com/xflr6/graphviz/tree/master/graphviz. You can find the detailed [source code here](https://github.com/dmlc/mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/Visualization.scala).\n",
+    "\n",
+    "To visualize the network, create a folder to save the images or PDFs and provide its path to the `dot.render()` method as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdot\u001b[0m: \u001b[32mVisualization\u001b[0m.\u001b[32mDot\u001b[0m = ml.dmlc.mxnet.Visualization$Dot@3fc1c14"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val dot = Visualization.plotNetwork(symbol = g)\n",
+    "dot.render(engine = \"dot\", fileName = \"g\", path = \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Basic Neural Networks\n",
+    "Besides the basic operators, Symbol has a rich set of neural network layers. The following code constructs a two-layer fully connected neural network, whose structure we then visualize."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdata\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@4e147b90\n",
+       "\u001b[36mfc1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@449a676\n",
+       "\u001b[36mact1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@624e9263\n",
+       "\u001b[36mfc2\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@202a4b05\n",
+       "\u001b[36mnet\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@4d8f7c6b"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// Output may vary\n",
+    "val data = Symbol.Variable(\"data\")\n",
+    "val fc1 = Symbol.FullyConnected(name = \"fc1\")()(Map(\"data\" -> data, \"num_hidden\" -> 128))\n",
+    "val act1 = Symbol.Activation(name = \"relu1\")()(Map(\"data\" -> fc1, \"act_type\" -> \"relu\"))\n",
+    "val fc2 = Symbol.FullyConnected(name = \"fc2\")()(Map(\"data\" -> act1, \"num_hidden\" -> 10))\n",
+    "val net = Symbol.SoftmaxOutput(name = \"out\")()(Map(\"data\" -> fc2))"
+   ]
+  },
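+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A layer symbol typically introduces free variables of its own, such as a weight and a bias, in addition to the `data` input we passed in. The sketch below lists them for the first layer. It assumes the MXNet Scala jar is on the classpath and that `Symbol.listArguments()` is available in the installed version, and it makes no claim about the exact names printed:\n",
+    "\n",
+    "```scala\n",
+    "import ml.dmlc.mxnet._\n",
+    "\n",
+    "// Same first layer as in the cell above.\n",
+    "val data = Symbol.Variable(\"data\")\n",
+    "val fc1 = Symbol.FullyConnected(name = \"fc1\")()(Map(\"data\" -> data, \"num_hidden\" -> 128))\n",
+    "\n",
+    "// Print the free variables that must be bound before fc1 can be executed.\n",
+    "fc1.listArguments().foreach(println)\n",
+    "```\n",
+    "\n",
+    "Listing the arguments this way is a quick check that the layer names chosen above show up where expected in the parameter names."
+   ]
+  },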
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To visualize the network:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdot\u001b[0m: \u001b[32mVisualization\u001b[0m.\u001b[32mDot\u001b[0m = ml.dmlc.mxnet.Visualization$Dot@56525af2"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val dot = Visualization.plotNetwork(symbol = net)\n",
+    "dot.render(engine = \"dot\", fileName = \"net\", path = \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Modularized Construction for Deep Networks\n",
+    "For deep networks, such as Google Inception, constructing layer by layer is painful given the large number of layers. For these networks, we often modularize the construction. Taking Google Inception as an example, we can first define a factory function to chain the convolution layer, batch normalization layer, and ReLU activation layer together:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mConvFactory\u001b[0m\n",
+       "\u001b[36mprev\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@a763473\n",
+       "\u001b[36mconvComp\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@251e3edb\n",
+       "\u001b[36mshape\u001b[0m: \u001b[32mShape\u001b[0m = (128,3,28,28)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    " // Output may vary\n",
+    "def ConvFactory(data: Symbol, numFilter: Int, kernel: (Int, Int), stride: (Int, Int) = (1, 1),\n",
+    "      pad: (Int, Int) = (0, 0), name: String = \"\", suffix: String = \"\"): Symbol = {\n",
+    "    val conv = Symbol.Convolution(s\"conv_${name}${suffix}\")()(\n",
+    "        Map(\"data\" -> data, \"num_filter\" -> numFilter, \"kernel\" -> s\"$kernel\",\n",
+    "            \"stride\" -> s\"$stride\", \"pad\" -> s\"$pad\"))\n",
+    "      \n",
+    "    val bn = Symbol.BatchNorm(s\"bn_${name}${suffix}\")()(Map(\"data\" -> conv))\n",
+    "      \n",
+    "    val act = Symbol.Activation(s\"relu_${name}${suffix}\")()(\n",
+    "        Map(\"data\" -> bn, \"act_type\" -> \"relu\"))\n",
+    "    act\n",
+    "  }\n",
+    "\n",
+    "val prev = Symbol.Variable(\"PreviosOutput\")\n",
+    "val convComp = ConvFactory(data = prev, numFilter = 64, kernel = (7, 7), stride=(2, 2))\n",
+    "val shape = Shape(128, 3, 28, 28)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To visualize the network:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdot\u001b[0m: \u001b[32mVisualization\u001b[0m.\u001b[32mDot\u001b[0m = ml.dmlc.mxnet.Visualization$Dot@4958afa4"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val dot = Visualization.plotNetwork(symbol = convComp, title = \"ConvFactory\", shape = Map(\"PreviosOutput\" -> shape), \n",
+    "                                    nodeAttrs = Map(\"shape\" -> \"oval\", \"fixedsize\" -> \"false\"))\n",
+    "\n",
+    "dot.render(engine = \"dot\", fileName = \"ConvFactory\", path = \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then we define a function that constructs an Inception module based on `ConvFactory`:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mInceptionFactoryA\u001b[0m\n",
+       "\u001b[36mprev\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@21f75c67\n",
+       "\u001b[36min3a\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@775e917d"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def InceptionFactoryA(data: Symbol, num1x1: Int, num3x3red: Int, num3x3: Int,\n",
+    "      numd3x3red: Int, numd3x3: Int, pool: String, proj: Int, name: String): Symbol = {\n",
+    "    // 1x1\n",
+    "    val c1x1 = ConvFactory(data = data, numFilter = num1x1,\n",
+    "        kernel = (1, 1), name = s\"${name}_1x1\")\n",
+    "    // 3x3 reduce + 3x3\n",
+    "    val c3x3r = ConvFactory(data = data, numFilter = num3x3red,\n",
+    "        kernel = (1, 1), name = s\"${name}_3x3\", suffix = \"_reduce\")\n",
+    "    val c3x3 = ConvFactory(data = c3x3r, numFilter = num3x3,\n",
+    "        kernel = (3, 3), pad = (1, 1), name = s\"${name}_3x3\")\n",
+    "    // double 3x3 reduce + double 3x3\n",
+    "    val cd3x3r = ConvFactory(data = data, numFilter = numd3x3red,\n",
+    "        kernel = (1, 1), name = s\"${name}_double_3x3\", suffix = \"_reduce\")\n",
+    "    var cd3x3 = ConvFactory(data = cd3x3r, numFilter = numd3x3,\n",
+    "        kernel = (3, 3), pad = (1, 1), name = s\"${name}_double_3x3_0\")\n",
+    "    cd3x3 = ConvFactory(data = cd3x3, numFilter = numd3x3,\n",
+    "        kernel = (3, 3), pad = (1, 1), name = s\"${name}_double_3x3_1\")\n",
+    "    // pool + proj\n",
+    "    val pooling = Symbol.Pooling(s\"${pool}_pool_${name}_pool\")()(\n",
+    "        Map(\"data\" -> data, \"kernel\" -> \"(3, 3)\", \"stride\" -> \"(1, 1)\",\n",
+    "            \"pad\" -> \"(1, 1)\", \"pool_type\" -> pool))\n",
+    "    val cproj = ConvFactory(data = pooling, numFilter = proj,\n",
+    "        kernel = (1, 1), name = s\"${name}_proj\")\n",
+    "    // concat\n",
+    "    val concat = Symbol.Concat(s\"ch_concat_${name}_chconcat\")(c1x1, c3x3, cd3x3, cproj)()\n",
+    "    concat\n",
+    "  }\n",
+    "\n",
+    "\n",
+    "val prev = Symbol.Variable(\"PreviosOutput\")\n",
+    "val in3a = InceptionFactoryA(prev, 64, 64, 64, 64, 96, \"avg\", 32, \"in3a\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To visualize the network:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdot\u001b[0m: \u001b[32mVisualization\u001b[0m.\u001b[32mDot\u001b[0m = ml.dmlc.mxnet.Visualization$Dot@50a4da62"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val dot = Visualization.plotNetwork(symbol=in3a, shape = Map(\"PreviosOutput\" -> shape), nodeAttrs = Map(\"shape\" -> \"oval\", \"fixedsize\" -> \"false\"))\n",
+    "\n",
+    "dot.render(engine = \"dot\", fileName = \"InceptionFactoryA\", path = \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, we can obtain the whole network by chaining multiple inception modules. A complete example is available at [visualization example](https://github.com/dmlc/mxnet/tree/master/scala-package/examples/src/main/scala/ml/dmlc/mxnet/examples/visualization).\n",
+    "### Group Multiple Symbols\n",
+    "To construct neural networks with multiple loss layers, we can use mxnet.Symbol.Group to group multiple symbols together. The following example groups two outputs:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdata\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@27b69901\n",
+       "\u001b[36mfc1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@5b6d60d9\n",
+       "\u001b[36mnet\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@5535de50\n",
+       "\u001b[36mout1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@3593b1c3\n",
+       "\u001b[36mout2\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@26fe58e1\n",
+       "\u001b[36mgroup\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@16a59f4f\n",
+       "\u001b[36mres10_6\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mString\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\u001b[32m\"softmax_output\"\u001b[0m, \u001b[32m\"regression_output\"\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val data = Symbol.Variable(\"data\")\n",
+    "val fc1 = Symbol.FullyConnected(name = \"fc1\")()(Map(\"data\" -> data, \"num_hidden\" -> 128))\n",
+    "val net = Symbol.Activation(name = \"relu1\")()(Map(\"data\" -> fc1, \"act_type\" -> \"relu\"))\n",
+    "val out1 = Symbol.SoftmaxOutput(name = \"softmax\")()(Map(\"data\" -> net))\n",
+    "val out2 = Symbol.LinearRegressionOutput(\"regression\")()(Map(\"data\" -> net))\n",
+    "val group = Symbol.Group(out1,out2)\n",
+    "group.listOutputs()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Relations to NDArray\n",
+    "As we have seen, both Symbol and NDArray provide multi-dimensional array operations, such as c=a+b in MXNet. Users are sometimes unsure which one to use. We briefly clarify the difference here; a more detailed explanation is available [here](http://mxnet.io/architecture/program_model.html).\n",
+    "\n",
+    "NDArray provides an imperative-style programming interface, in which computations are evaluated statement by statement. Symbol is closer to declarative programming, in which we first declare the computation and then evaluate it with data. Other examples in this category include regular expressions and SQL.\n",
+    "\n",
+    "The pros for NDArray:\n",
+    "\n",
+    "- straightforward\n",
+    "- easy to work with other language features (for loops, if-else conditions, ...) and libraries (numpy, ...)\n",
+    "- easy to debug step by step\n",
+    "\n",
+    "The pros for Symbol:\n",
+    "\n",
+    "- provides almost all functionalities of NDArray, such as +, *, sin, and reshape\n",
+    "- provides a large number of neural network related operators such as Convolution, Activation, and BatchNorm\n",
+    "- provides automatic differentiation\n",
+    "- easy to construct and manipulate complex computations such as deep neural networks\n",
+    "- easy to save, load, and visualize\n",
+    "- easy for the backend to optimize the computation and memory usage\n",
+    "\n",
+    "The mixed programming tutorial shows how these two interfaces can be used together to develop a complete training program. This tutorial focuses on the usage of Symbol."
+   ]
+  },
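+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The contrast between the two styles can be sketched in plain Scala, without any MXNet calls. This is only an analogy under our own assumptions: the names `Expr`, `Var`, `Add`, and `eval` are hypothetical and are not part of the MXNet API. The imperative style computes a value immediately, while the declarative style first builds a description of the addition and evaluates it later, once the variables are bound to data.\n",
+    "\n",
+    "```scala\n",
+    "// Imperative style (NDArray-like): each statement is evaluated immediately.\n",
+    "val lhsData = Array(1f, 1f, 1f)\n",
+    "val rhsData = Array(2f, 2f, 2f)\n",
+    "val sumNow = lhsData.zip(rhsData).map { case (x, y) => x + y }\n",
+    "\n",
+    "// Declarative style (Symbol-like): first describe the computation ...\n",
+    "sealed trait Expr\n",
+    "case class Var(name: String) extends Expr\n",
+    "case class Add(lhs: Expr, rhs: Expr) extends Expr\n",
+    "\n",
+    "// ... then evaluate the description against concrete data.\n",
+    "def eval(e: Expr, env: Map[String, Array[Float]]): Array[Float] = e match {\n",
+    "  case Var(n)    => env(n)\n",
+    "  case Add(l, r) => eval(l, env).zip(eval(r, env)).map { case (x, y) => x + y }\n",
+    "}\n",
+    "\n",
+    "val declared = Add(Var(\"a\"), Var(\"b\"))  // nothing is computed yet\n",
+    "val sumLater = eval(declared, Map(\"a\" -> lhsData, \"b\" -> rhsData))  // same values as sumNow\n",
+    "```\n",
+    "\n",
+    "Because the declared expression is plain data, it can be inspected, saved, or optimized before it is run, which is the property the Symbol pros listed above rely on."
+   ]
+  },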
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Symbol Manipulation *\n",
+    "One important difference between Symbol and NDArray is that with Symbol we first declare the computation and then bind it with data to run.\n",
+    "\n",
+    "In this section we introduce the functions for manipulating a symbol directly. Note, however, that most of them are wrapped nicely by mx.module, so this section can be safely skipped."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Shape Inference\n",
+    "For each symbol, we can query its inputs (or arguments) and outputs. We can also infer the output shape given the input shapes, which facilitates memory allocation."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36margName\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mString\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\u001b[32m\"a\"\u001b[0m, \u001b[32m\"b\"\u001b[0m)\n",
+       "\u001b[36moutName\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mString\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\u001b[32m\"_plus0_output\"\u001b[0m)\n",
+       "\u001b[36margShape\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mShape\u001b[0m] = \u001b[33mVector\u001b[0m((2,3), (2,3))\n",
+       "\u001b[36moutShape\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mShape\u001b[0m] = \u001b[33mVector\u001b[0m((2,3))"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val argName = c.listArguments()  // get the names of the inputs\n",
+    "val outName = c.listOutputs()    // get the names of the outputs\n",
+    "val (argShape, outShape, _) = c.inferShape(Map(\"a\" -> Shape(2,3), \"b\" -> Shape(2,3)))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Bind with Data and Evaluate\n",
+    "The symbol c we constructed declares what computation should be run. To evaluate it, we first need to feed the arguments, namely the free variables, with data. We can do this with the bind method, which accepts a device context and a Map from free variable names to NDArrays and returns an executor. The executor provides the forward method for evaluation and the outputs attribute to get all results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "number of outputs = 1\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mex\u001b[0m: \u001b[32mExecutor\u001b[0m = ml.dmlc.mxnet.Executor@502139d3\n",
+       "\u001b[36mres12_3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m, \u001b[32m2.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val ex = c.bind(ctx=Context.cpu(), args=Map(\"a\" -> NDArray.ones(2,3), \n",
+    "                                \"b\" -> NDArray.ones(2,3)))\n",
+    "ex.forward()\n",
+    "println(\"number of outputs = \"+ ex.outputs.length)\n",
+    "ex.outputs(0).toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can evaluate the same symbol on a GPU with different data:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mex_gpu\u001b[0m: \u001b[32mExecutor\u001b[0m = ml.dmlc.mxnet.Executor@19e1e3a7\n",
+       "\u001b[36mres14_2\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m, \u001b[32m5.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val ex_gpu = c.bind(ctx=Context.gpu(), args=Map(\"a\" -> NDArray.ones(shape=Shape(3,4), Context.gpu(), dtype = DType.Float32)*2,\n",
+    "                                    \"b\" -> NDArray.ones(shape=Shape(3,4), Context.gpu(), dtype = DType.Float32)*3))\n",
+    "ex_gpu.forward()\n",
+    "ex_gpu.outputs(0).toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Load and Save\n",
+    "Similar to NDArray, we can serialize a Symbol object by using the save and load methods directly. Unlike the binary format chosen by NDArray, Symbol uses the more readable JSON format for serialization. The toJson method returns the JSON string."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{\n",
+      "  \"nodes\": [\n",
+      "    {\n",
+      "      \"op\": \"null\", \n",
+      "      \"name\": \"a\", \n",
+      "      \"inputs\": []\n",
+      "    }, \n",
+      "    {\n",
+      "      \"op\": \"null\", \n",
+      "      \"name\": \"b\", \n",
+      "      \"inputs\": []\n",
+      "    }, \n",
+      "    {\n",
+      "      \"op\": \"elemwise_add\", \n",
+      "      \"name\": \"_plus0\", \n",
+      "      \"inputs\": [[0, 0, 0], [1, 0, 0]]\n",
+      "    }\n",
+      "  ], \n",
+      "  \"arg_nodes\": [0, 1], \n",
+      "  \"node_row_ptr\": [0, 1, 2, 3], \n",
+      "  \"heads\": [[2, 0, 0]], \n",
+      "  \"attrs\": {\"mxnet_version\": [\"int\", 904]}\n",
+      "}\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mc2\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@234ea43a\n",
+       "\u001b[36mres15_3\u001b[0m: \u001b[32mBoolean\u001b[0m = \u001b[32mtrue\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "println(c.toJson)\n",
+    "c.save(\"symbol-c.json\")\n",
+    "val c2 = Symbol.load(\"symbol-c.json\")\n",
+    "c.toJson == c2.toJson"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Customized Symbol *\n",
+    "Most operators, such as Symbol.Convolution and Symbol.Reshape, are implemented in C++ for better performance. MXNet also allows users to write new operators in a frontend language such as Python or Scala, which often makes development and debugging much easier.\n",
+    "\n",
+    "To implement an operator in Scala, we just need to define the two computation methods, forward and backward, along with several methods for querying its properties, such as listArguments and inferShape.\n",
+    "\n",
+    "NDArray is the default argument type in both forward and backward, so we often implement the computation with NDArray operations as well.\n",
+    "\n",
+    "We first create a subclass of Operator.CustomOp and then define forward and backward."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mclass \u001b[36mSoftmax\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "  class Softmax(_param: Map[String, String]) extends CustomOp {\n",
+    "\n",
+    "    override def forward(isTrain: Boolean, req: Array[String],\n",
+    "      inData: Array[NDArray], outData: Array[NDArray], aux: Array[NDArray]): Unit = {\n",
+    "      val xShape = inData(0).shape\n",
+    "      val x = inData(0).toArray.grouped(xShape(1)).toArray\n",
+    "      val yArr = x.map { it =>\n",
+    "        val max = it.max\n",
+    "        val tmp = it.map(e => Math.exp(e.toDouble - max).toFloat)\n",
+    "        val sum = tmp.sum\n",
+    "        tmp.map(_ / sum)\n",
+    "      }.flatten\n",
+    "      val y = NDArray.empty(xShape, outData(0).context)\n",
+    "      y.set(yArr)\n",
+    "      this.assign(outData(0), req(0), y)\n",
+    "      y.dispose()\n",
+    "    }\n",
+    "\n",
+    "    override def backward(req: Array[String], outGrad: Array[NDArray],\n",
+    "      inData: Array[NDArray], outData: Array[NDArray],\n",
+    "      inGrad: Array[NDArray], aux: Array[NDArray]): Unit = {\n",
+    "      val l = inData(1).toArray.map(_.toInt)\n",
+    "      val oShape = outData(0).shape\n",
+    "      val yArr = outData(0).toArray.grouped(oShape(1)).toArray\n",
+    "      l.indices.foreach { i =>\n",
+    "        yArr(i)(l(i)) -= 1.0f\n",
+    "      }\n",
+    "      val y = NDArray.empty(oShape, inGrad(0).context)\n",
+    "      y.set(yArr.flatten)\n",
+    "      this.assign(inGrad(0), req(0), y)\n",
+    "      y.dispose()\n",
+    "    }\n",
+    "  }"
+   ]
+  },
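+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The per-row arithmetic inside forward and backward above can be checked in plain Scala, without MXNet. This is a minimal sketch under our own assumptions: `softmaxRow` and `softmaxGradRow` are hypothetical helper names, and each row is treated independently, as the class above does after grouping the flat array by row.\n",
+    "\n",
+    "```scala\n",
+    "// Numerically stable softmax over one row, mirroring forward above.\n",
+    "def softmaxRow(row: Array[Float]): Array[Float] = {\n",
+    "  val max = row.max  // subtracting the max avoids overflow in exp\n",
+    "  val exps = row.map(e => math.exp(e.toDouble - max).toFloat)\n",
+    "  val sum = exps.sum\n",
+    "  exps.map(_ / sum)\n",
+    "}\n",
+    "\n",
+    "// Gradient for one row with integer class `label`, mirroring backward above:\n",
+    "// the softmax output minus the one-hot encoding of the label.\n",
+    "def softmaxGradRow(probs: Array[Float], label: Int): Array[Float] = {\n",
+    "  val grad = probs.clone()\n",
+    "  grad(label) -= 1.0f\n",
+    "  grad\n",
+    "}\n",
+    "\n",
+    "val probs = softmaxRow(Array(1f, 2f, 3f))\n",
+    "val grad = softmaxGradRow(probs, label = 2)\n",
+    "println(probs.sum)  // close to 1.0\n",
+    "println(grad.sum)   // close to 0.0\n",
+    "```\n",
+    "\n",
+    "The probabilities in a row sum to one, so subtracting one at the label position leaves a gradient that sums to approximately zero."
+   ]
+  },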
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here we use CustomOp.assign to write the result into the destination NDArray according to the value of req, which can request either overwriting the destination or adding to it.\n",
+    "Next we create a subclass of Operator.CustomOpProp for querying the properties."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mclass \u001b[36mSoftmaxProp\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    " class SoftmaxProp(needTopGrad: Boolean = false)\n",
+    "    extends CustomOpProp(needTopGrad) {\n",
+    "\n",
+    "    override def listArguments(): Array[String] = Array(\"data\", \"label\")\n",
+    "\n",
+    "    override def listOutputs(): Array[String] = Array(\"output\")\n",
+    "\n",
+    "    override def inferShape(inShape: Array[Shape]):\n",
+    "      (Array[Shape], Array[Shape], Array[Shape]) = {\n",
+    "      val dataShape = inShape(0)\n",
+    "      val labelShape = Shape(dataShape(0))\n",
+    "      val outputShape = dataShape\n",
+    "      (Array(dataShape, labelShape), Array(outputShape), null)\n",
+    "    }\n",
+    "\n",
+    "    override def createOperator(ctx: String, inShapes: Array[Array[Int]],\n",
+    "      inDtypes: Array[Int]): CustomOp = new Softmax(this.kwargs)\n",
+    "  }"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, we can use Symbol.Custom with the registered name to use this operator:\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```scala\n",
+    "val mlp = Symbol.Custom(\"softmax\")()(Map(\"data\" -> fc3,\n",
+    "        \"label\" -> label, \"op_type\" -> \"softmax\"))\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Advanced Usages *\n",
+    "### Type Cast\n",
+    "MXNet uses 32-bit floats by default. Sometimes we want to use a lower-precision data type for a better accuracy-performance trade-off. For example, the Nvidia Tesla Pascal GPUs (e.g. P100) have improved 16-bit float performance, while GTX Pascal GPUs (e.g. GTX 1080) are fast on 8-bit integers.\n",
+    "\n",
+    "We can use the Symbol.Cast operator to convert the data type."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "(ListBuffer(Float32),ListBuffer(Float16))\n",
+      "(ListBuffer(Int32),ListBuffer(UInt8))"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@550368a3\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@63cbf626\n",
+       "\u001b[36margb\u001b[0m: \u001b[32mSeq\u001b[0m[\u001b[32mDType\u001b[0m.\u001b[32mDType\u001b[0m] = \u001b[33mListBuffer\u001b[0m(Float32)\n",
+       "\u001b[36moutb\u001b[0m: \u001b[32mSeq\u001b[0m[\u001b[32mDType\u001b[0m.\u001b[32mDType\u001b[0m] = \u001b[33mListBuffer\u001b[0m(Float16)\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@182f0324\n",
+       "\u001b[36margc\u001b[0m: \u001b[32mSeq\u001b[0m[\u001b[32mDType\u001b[0m.\u001b[32mDType\u001b[0m] = \u001b[33mListBuffer\u001b[0m(Int32)\n",
+       "\u001b[36moutc\u001b[0m: \u001b[32mSeq\u001b[0m[\u001b[32mDType\u001b[0m.\u001b[32mDType\u001b[0m] = \u001b[33mListBuffer\u001b[0m(UInt8)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = Symbol.Variable(\"data\")\n",
+    "val b = Symbol.Cast()()(Map(\"data\" -> a, \"dtype\" -> \"float16\"))\n",
+    "val (argb, outb, _) = b.inferType(Map(\"data\" -> DType.Float32))\n",
+    "println(argb, outb)\n",
+    "\n",
+    "val c = Symbol.Cast()()(Map(\"data\" -> a, \"dtype\" -> \"uint8\"))\n",
+    "val (argc, outc, _) = c.inferType(Map(\"data\" -> DType.Int32))\n",
+    "print(argc, outc)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Variable Sharing\n",
+    "Sometimes we want to share the contents between several symbols. This can be done simply by binding these symbols with the same array."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36ma\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@74892999\n",
+       "\u001b[36mb\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@33cf54a9\n",
+       "\u001b[36mc\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@52dcc5fe\n",
+       "\u001b[36md\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@3f5139ac\n",
+       "\u001b[36mdata\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@8b9a99a8\n",
+       "\u001b[36mex\u001b[0m: \u001b[32mExecutor\u001b[0m = ml.dmlc.mxnet.Executor@1b2182b8\n",
+       "\u001b[36mres19_7\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m6.0F\u001b[0m, \u001b[32m6.0F\u001b[0m, \u001b[32m6.0F\u001b[0m, \u001b[32m6.0F\u001b[0m, \u001b[32m6.0F\u001b[0m, \u001b[32m6.0F\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val a = Symbol.Variable(\"a\")\n",
+    "val b = Symbol.Variable(\"b\")\n",
+    "val c = Symbol.Variable(\"c\")\n",
+    "val d = a + b * c\n",
+    "\n",
+    "val data = NDArray.ones(2,3)*2\n",
+    "val ex = d.bind(ctx=Context.cpu(), args=Map(\"a\" -> data, \"b\" -> data, \"c\" -> data))\n",
+    "ex.forward()\n",
+    "ex.outputs(0).toArray"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Further Readings\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "- [NDArray API](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.NDArray)\n",
+    "- [Symbol API](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.Symbol)\n",
+    "- [Visualization API](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.Visualization$)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/tutorials/linear_regression_scala.ipynb b/scala/tutorials/linear_regression_scala.ipynb
new file mode 100644
index 000000000..4cd816a16
--- /dev/null
+++ b/scala/tutorials/linear_regression_scala.ipynb
@@ -0,0 +1,489 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# MXNet Basics - Linear Regression using MXNet"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel's classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We have used [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) for creating this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "// For example:\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import necessary packages as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.io.{NDArrayIter}\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.module.{FitParams, Module}\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.optimizer.SGD\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.Callback.Speedometer\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import ml.dmlc.mxnet.io.{NDArrayIter}\n",
+    "import ml.dmlc.mxnet.module.{FitParams, Module}\n",
+    "import ml.dmlc.mxnet.optimizer.SGD\n",
+    "import ml.dmlc.mxnet.Callback.Speedometer"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prepare Data\n",
+    "\n",
+    "MXNet uses data in the form of **Data Iterators**. The code below illustrates how to encode a dataset into an iterator that MXNet can use. The data used in the example is made up of 2d data points with corresponding integer labels. The function we are trying to learn is:\n",
+    "\n",
+    " y = x<sub>1</sub>  +  2x<sub>2</sub> ,\n",
+    " \n",
+    " where (x<sub>1</sub>,x<sub>2</sub>) is one training data point and y is the corresponding label. \n",
+    "\n",
+    "For example, the first label, 5, is generated as follows:\n",
+    "\n",
+    "5 = 1 + 2*2 (where x1 = 1, x2=2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mtrainData\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@57e47ffa)\n",
+       "\u001b[36mtrainLabel\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@a040dbc2)\n",
+       "\u001b[36mbatchSize\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m1\u001b[0m\n",
+       "\u001b[36mevalData\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@ed4b006d)\n",
+       "\u001b[36mevalLabel\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@f8bc2cd5)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "//Training data\n",
+    "val trainData = IndexedSeq(NDArray.array(Array(1, 2, 3, 4, 5, 6, 3, 2, 7, 1, 6, 9), shape = Shape(6, 1, 2)))\n",
+    "val trainLabel = IndexedSeq(NDArray.array(Array(5, 11, 17, 7, 9, 24), shape = Shape(6)))\n",
+    "val batchSize = 1\n",
+    "\n",
+    "//Evaluation Data\n",
+    "val evalData = IndexedSeq(NDArray.array(Array(7, 2, 6, 10, 12, 2), shape = Shape(3, 1, 2)))\n",
+    "val evalLabel = IndexedSeq(NDArray.array(Array(11, 26, 16), shape = Shape(3)))"
+   ]
+  },
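+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, the training labels can be reproduced from the training points in plain Scala. This is a small sketch that does not use MXNet; `points` and `labels` are illustrative names introduced here, and the points are the six (x<sub>1</sub>, x<sub>2</sub>) pairs stored in trainData above.\n",
+    "\n",
+    "```scala\n",
+    "// Each training point is a pair (x1, x2); labels follow y = x1 + 2 * x2.\n",
+    "val points = Seq((1f, 2f), (3f, 4f), (5f, 6f), (3f, 2f), (7f, 1f), (6f, 9f))\n",
+    "val labels = points.map { case (x1, x2) => x1 + 2 * x2 }\n",
+    "println(labels.mkString(\", \"))\n",
+    "```\n",
+    "\n",
+    "The computed values are 5, 11, 17, 7, 9, and 24, which match the entries of trainLabel."
+   ]
+  },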
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once we have the data ready, we need to put it into an iterator and specify parameters such as 'batch_size' and 'shuffle', which determine the amount of data the iterator feeds in each batch and whether the data is shuffled, respectively."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mtrainIter\u001b[0m: \u001b[32mNDArrayIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mevalIter\u001b[0m: \u001b[32mNDArrayIter\u001b[0m = non-empty iterator"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val trainIter = new NDArrayIter(trainData, trainLabel, batchSize, false, \"pad\")\n",
+    "val evalIter = new NDArrayIter(evalData, evalLabel, batchSize, false, \"pad\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the above example, we have made use of NDArrayIter, which is used to iterate over NDArrays. In general, MXNet provides many different types of iterators, depending on the type of data you are using. Their complete documentation can be found at [Scala API](http://mxnet.io/api/scala/docs/index.html#package)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## MXNet Classes\n",
+    "\n",
+    "1. [Model Class](http://mxnet.io/api/scala/model.html): The model class in MXNet is used to define the overall entity of the model. It contains the variable we want to minimize, the training data and labels, and some additional parameters such as the learning rate and optimization algorithm, all of which are defined at the model level.\n",
+    "\n",
+    "2. [Module Class](http://mxnet.io/api/scala/module.html): The module class provides an intermediate and high-level interface for performing computation with neural networks in MXNet.\n",
+    "\n",
+    "3. [Symbols](http://mxnet.io/api/scala/symbol.html): The actual MXNet network is defined using symbols. MXNet has different types of symbols, including data placeholders, neural network layers, and loss function symbols based on our requirement.\n",
+    "\n",
+    "4. [IO](http://mxnet.io/api/scala/io.html): The IO class, as we already saw, works on the data and carries out operations such as breaking the data into batches and shuffling it."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Defining the Model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "MXNet uses **Symbols** for defining a model. [Symbols](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.Symbol) are the building blocks of the model and compose its various components. Symbols are used to define, among other things:\n",
+    "1. Variables: A variable is a placeholder for future data. This symbol is used to define a spot which will be filled with training data/labels in the future when we are trying to train the model.\n",
+    "2. Neural Network Layers: The layers of a network or any other type of model are also defined by Symbols. Such a *symbol* takes one of the previous symbols as its input, applies a transformation to it, and creates an output. One such example is the \"Fully Connected\" symbol, which specifies a fully connected layer of a network.\n",
+    "3. Output Symbols: Output symbols are MXNet's way of defining a loss. They are suffixed with the word \"Output\" (e.g. the SoftmaxOutput layer). You can also create your [own loss](https://github.com/dmlc/mxnet/blob/5b6a0eeee174f28ff0272d17748513ecd52a9ebe/docs/tutorials/r/CustomLossFunction.md#how-to-use-your-own-loss-function). Some examples of existing losses are LinearRegressionOutput, which computes the l2 loss between its input symbol and the actual labels provided to it, and SoftmaxOutput, which computes the categorical cross-entropy.\n",
+    "\n",
+    "The symbols described above, along with others, are chained one after another, each serving as input to the next, to create the network topology. More information about the different types of symbols can be found [here](http://mxnet.io/api/scala/symbol.html).\n",
+    "    \n",
+    "    \n",
+    "   "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdata\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@78525c12\n",
+       "\u001b[36mlabel\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@4e85112e\n",
+       "\u001b[36mfc1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@3c61948a\n",
+       "\u001b[36msoftmax\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@5ecd7ee4\n",
+       "\u001b[36mres4_4\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mString\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\u001b[32m\"data\"\u001b[0m, \u001b[32m\"fc1_weight\"\u001b[0m, \u001b[32m\"fc1_bias\"\u001b[0m, \u001b[32m\"label\"\u001b[0m)"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val data = Symbol.Variable(\"data\")\n",
+    "val label = Symbol.Variable(\"label\")\n",
+    "val fc1  = Symbol.FullyConnected(\"fc1\")()(Map(\"data\" -> data, \"num_hidden\" -> 1))\n",
+    "val softmax = Symbol.LinearRegressionOutput()()(Map(\"data\" -> fc1, \"label\" -> label))\n",
+    "softmax.listArguments()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The above network uses the following layers:\n",
+    "\n",
+    "1. FullyConnected: The fully connected symbol represents a fully connected layer of a neural network (without any activation being applied), which, in essence, is just a linear transformation of the input attributes. It takes the following parameters:\n",
+    "            a. data: Input to the layer (specify the symbol whose output should be fed here)\n",
+    "            b. num_hidden: Number of hidden units, which specifies the size of the output of the layer\n",
+    "    \n",
+    "    \n",
+    "2. Linear Regression Output: Output layers in MXNet implement a loss. In our example, we use the Linear Regression Output layer, which applies an l2 loss between its input and the actual labels provided to this layer. The parameters to this layer are:\n",
+    "            a. data: Input to this layer (specify the symbol whose output should be fed here)\n",
+    "            b. label: The training labels against which the input to the layer is compared to calculate the l2 loss"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Note - *Naming Convention*: the label variable's name should be the same as the label_name parameter passed to your training data iterator. The default value is \"softmax_label\", but we have changed it to \"label\" in this tutorial, as you can see in `val label = Symbol.Variable(\"label\")`.**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, the network is stored into a *Model*, where you define the symbol whose value is to be minimised (in our case, \"softmax\"), the learning rate to be used during optimization, and the number of epochs for which we want to train our model."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can plot the network we have created in order to visualize it, and save the plot by specifying `path` in `dot.render()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdot\u001b[0m: \u001b[32mVisualization\u001b[0m.\u001b[32mDot\u001b[0m = ml.dmlc.mxnet.Visualization$Dot@72a23b8"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val dot = Visualization.plotNetwork(symbol=softmax, nodeAttrs = Map(\"shape\" -> \"oval\", \"fixedsize\" -> \"false\") )\n",
+    "dot.render(engine = \"dot\", fileName = \"linearRegression\", path = \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Training the model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once we have defined the model structure, the next step is to train the parameters of the model to fit the training data. This is done by using the **fit()** function of the **Module** class."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mmod\u001b[0m: \u001b[32mModule\u001b[0m = ml.dmlc.mxnet.module.Module@76f4d37f"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val mod = new Module(softmax, labelNames = IndexedSeq(\"label\"))\n",
+    "\n",
+    "mod.fit(trainData = trainIter, evalData = scala.Option(evalIter), numEpoch = 1000, fitParams = new FitParams()\n",
+    "    .setOptimizer(new SGD(learningRate = 0.01f, momentum = 0.9f, wd = 0.0001f)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively, you can use the [FeedForward network](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.FeedForward) and the [Model API](http://mxnet.io/api/scala/docs/index.html#ml.dmlc.mxnet.Model) of MXNet to build the model instead of Module. This can be done as follows:\n",
+    "\n",
+    "```scala\n",
+    "    val model = new FeedForward(symbol = softmax, ctx = Context.cpu(0), numEpoch = 1000, optimizer = new SGD(learningRate = 0.01f, momentum = 0.9f, wd = 0.0001f))\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Using a trained model (Testing and Inference)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Once we have a trained model, we can do several things with it: we can use it for inference, and we can evaluate it on test data. Both are shown below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mprobArrays\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\n",
+       "  ml.dmlc.mxnet.NDArray@85887f13,\n",
+       "  ml.dmlc.mxnet.NDArray@c68f7cef,\n",
+       "  ml.dmlc.mxnet.NDArray@d30a2eee\n",
+       ")\n",
+       "\u001b[36mprob1\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m11.000008F\u001b[0m)\n",
+       "\u001b[36mprob2\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m25.999908F\u001b[0m)\n",
+       "\u001b[36mprob3\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mFloat\u001b[0m] = \u001b[33mArray\u001b[0m(\u001b[32m15.999969F\u001b[0m)\n",
+       "\u001b[36mname\u001b[0m: \u001b[32mString\u001b[0m = \u001b[32m\"mse\"\u001b[0m\n",
+       "\u001b[36mvalue\u001b[0m: \u001b[32mFloat\u001b[0m = \u001b[32m3.1435168E-9F\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val probArrays  = mod.predict(evalIter)\n",
+    "\n",
+    "val prob1 = probArrays(0).toArray\n",
+    "val prob2 = probArrays(1).toArray\n",
+    "val prob3 = probArrays(2).toArray\n",
+    "\n",
+    "val (name, value) = mod.score(evalIter, new MSE()).get\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "We can also evaluate our model on some metric. In this example, we are evaluating our model's mean squared error on the evaluation data."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let us add some noise to the evaluation data and see how the MSE changes.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mevalData\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@d5cb35ca)\n",
+       "\u001b[36mevalLabel\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mVector\u001b[0m(ml.dmlc.mxnet.NDArray@1319931b)\n",
+       "\u001b[36mevalIter\u001b[0m: \u001b[32mNDArrayIter\u001b[0m = non-empty iterator"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "//Evaluation Data\n",
+    "val evalData = IndexedSeq(NDArray.array(Array(7, 2, 6, 10, 12, 2), shape = Shape(3, 1, 2)))\n",
+    "val evalLabel = IndexedSeq(NDArray.array(Array(11.1f, 26.1f, 16.1f), shape = Shape(3))) // Adding 0.1 to each of the label values\n",
+    "\n",
+    "val evalIter = new NDArrayIter(evalData, evalLabel, batchSize, false, \"pad\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mname\u001b[0m: \u001b[32mString\u001b[0m = \u001b[32m\"mse\"\u001b[0m\n",
+       "\u001b[36mvalue\u001b[0m: \u001b[32mFloat\u001b[0m = \u001b[32m0.010007773F\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val (name, value) = mod.score(evalIter, new MSE()).get"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, you can create your own metrics and use them to evaluate your model. More information on metrics is available [here](http://mxnet-test.readthedocs.io/en/latest/api/metric.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/tutorials/matrix_factorization.ipynb b/scala/tutorials/matrix_factorization.ipynb
new file mode 100644
index 000000000..322e04f1f
--- /dev/null
+++ b/scala/tutorials/matrix_factorization.ipynb
@@ -0,0 +1,556 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Matrix Factorization\n",
+    "In a recommendation system, there is a group of users and a set of items. Given that each user has rated some items in the system, we would like to predict how the users would rate the items that they have not yet rated, so that we can make recommendations to the users.\n",
+    "\n",
+    "Matrix factorization is one of the most widely used algorithms in recommendation systems. It can be used to discover latent features underlying the interactions between two different kinds of entities.\n",
+    "\n",
+    "Assume we assign a k-dimensional vector to each user and a k-dimensional vector to each item such that the dot product of these two vectors gives the user's rating of that item. We can learn the user and item vectors directly, which is essentially performing SVD on the user-item matrix. We can also try to learn the latent features using multi-layer neural networks.\n",
+    "\n",
+    "In this tutorial, we will work through the steps to implement these ideas in MXNet.\n",
+    "\n",
+    "## Prepare Data\n",
+    "\n",
+    "We use the [MovieLens](https://grouplens.org/datasets/movielens/) data here, but the same approach applies to other datasets as well. Each row of this dataset contains a tuple of user id, movie id, rating, and timestamp; we will only use the first three items. We first define a batch, which contains n tuples. It also provides name and shape information to MXNet about the data and label."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel's classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g.\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.util.Random\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.io.Source\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.collection.immutable.ListMap\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.collection.mutable.ArrayBuffer\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.collection.mutable\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.optimizer.SGD\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.Callback.Speedometer\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.{DataBatch, DataIter, NDArray, Shape}\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import scala.util.Random\n",
+    "import scala.io.Source\n",
+    "import scala.collection.immutable.ListMap\n",
+    "import scala.collection.mutable.ArrayBuffer\n",
+    "import scala.collection.mutable\n",
+    "import ml.dmlc.mxnet.optimizer.SGD\n",
+    "import ml.dmlc.mxnet.Callback.Speedometer\n",
+    "import ml.dmlc.mxnet.{DataBatch, DataIter, NDArray, Shape}\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then we define a data iterator, which returns a batch of tuples each time."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mclass \u001b[36mBDataIter\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "class BDataIter(filename: String, batch_size: Int) extends DataIter {\n",
+    "\n",
+    "  val data = ArrayBuffer[(Float, Float, Float)]()\n",
+    "  for (line <- Source.fromFile(filename).getLines()) {\n",
+    "    val arr = line.split(\"\\t\").map(_.trim)\n",
+    "    if(arr.length == 4){\n",
+    "      data += ((arr(0).toFloat, arr(1).toFloat, arr(2).toFloat))\n",
+    "    }\n",
+    "  }\n",
+    "\n",
+    "  val _provideData = ListMap(\"user\" -> Shape(batch_size), \"item\" -> Shape(batch_size))\n",
+    "  val _provideLabel = ListMap(\"score\" -> Shape(batch_size))\n",
+    "\n",
+    "  private var k = 0\n",
+    "\n",
+    "  override def next(): DataBatch = {\n",
+    "    if (!hasNext) throw new NoSuchElementException\n",
+    "    val users = ArrayBuffer[Float]()\n",
+    "    val items = ArrayBuffer[Float]()\n",
+    "    val scores = ArrayBuffer[Float]()\n",
+    "    for (i <- 0 until batch_size) {\n",
+    "      val j = k * batch_size + i\n",
+    "      val (user, item, score) = data(j)\n",
+    "      users += user\n",
+    "      items += item\n",
+    "      scores += score\n",
+    "    }\n",
+    "    k += 1\n",
+    "    val data_all = Array(NDArray.array(users.toArray, shape = Shape(batch_size)),\n",
+    "      NDArray.array(items.toArray, shape = Shape(batch_size)))\n",
+    "    val label_all  = Array(NDArray.array(scores.toArray, shape = Shape(batch_size)))\n",
+    "\n",
+    "    val data_names = Array(\"user\", \"item\")\n",
+    "    val label_names = Array(\"score\")\n",
+    "\n",
+    "    new DataBatch(data=data_all,label=label_all, index=getIndex(), pad=getPad(), providedData=_provideData, providedLabel=_provideLabel)\n",
+    "  }\n",
+    "\n",
+    "  /**\n",
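+    "    * reset the iterator and reshuffle the data in place (Random.shuffle returns a new collection)\n",
+    "    */\n",
+    "  override def reset(): Unit = {\n",
+    "    scala.util.Random.shuffle(data.toList).zipWithIndex.foreach { case (row, i) => data(i) = row }\n",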
+    "    k = 0\n",
+    "  }\n",
+    "\n",
+    "  override def hasNext: Boolean = {\n",
+    "    k < (data.length / batch_size)\n",
+    "  }\n",
+    "\n",
+    "  override def batchSize: Int = batch_size\n",
+    "\n",
+    "  override def getData(): IndexedSeq[NDArray] = IndexedSeq()\n",
+    "\n",
+    "  override def getIndex(): IndexedSeq[Long] = IndexedSeq[Long]()\n",
+    "\n",
+    "  override def getLabel(): IndexedSeq[NDArray] = IndexedSeq()\n",
+    "\n",
+    "  override def getPad(): Int = 0\n",
+    "\n",
+    "  override def provideData: ListMap[String, Shape] = _provideData\n",
+    "\n",
+    "  override def provideLabel: ListMap[String, Shape] = _provideLabel\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we provide a function to obtain the data iterator:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mgetDataIter\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def getDataIter(batchSize: Int) ={\n",
+    "    val (dataTrain, dataTest) = (new BDataIter(\"/Users/roshanin/ml-100k/u1.base\", batchSize), new BDataIter(\"/Users/roshanin/ml-100k/u1.test\", batchSize))\n",
+    "    (dataTrain, dataTest)\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, we compute the number of users and items (the maximum id plus one) for later use."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mmaxId\u001b[0m\n",
+       "\u001b[36mmu\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m944\u001b[0m\n",
+       "\u001b[36mmi\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m1683\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def maxId(fname: String) ={\n",
+    "    var mu = 0\n",
+    "    var mi =0\n",
+    "    for (line <- Source.fromFile(fname).getLines()) {\n",
+    "        val arr = line.split(\"\\t\").map(_.trim)\n",
+    "        \n",
+    "        if(arr.length == 4){\n",
+    "            mu = mu max arr(0).toInt  \n",
+    "            mi = mi max arr(1).toInt\n",
+    "        }\n",
+    "    }\n",
+    "    (mu+1, mi+1)\n",
+    "}\n",
+    "\n",
+    "val (mu, mi) = maxId(\"/Users/roshanin/ml-100k/u.data\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Optimization\n",
+    "We first implement the RMSE (root-mean-square error) metric, which is commonly used to evaluate matrix factorization models."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mRMSE\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def RMSE(label: NDArray, pred: NDArray): Float={\n",
+    "    val labelArr = label.toArray\n",
+    "    val predArr = pred.toArray\n",
+    "\n",
+    "    var ret: Float = 0.0f\n",
+    "    var n: Float = 0.0f\n",
+    "    \n",
+    "    for(i <- 0 to labelArr.length-1){\n",
+    "        ret += (labelArr(i) - predArr(i)) * (labelArr(i) - predArr(i))\n",
+    "        n += 1.0f\n",
+    "    }\n",
+    "    Math.sqrt(ret/n).asInstanceOf[Float]    \n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then we define a general training function, which is borrowed from the image classification application."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mtrain\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def train(network: Symbol, batchSize: Int, numEpoch: Int, learningRate: Float) {\n",
+    "    // Use the batchSize argument (not a shadowing local) to build the train/test iterators.\n",
+    "    val (trainIter, testIter) = getDataIter(batchSize)\n",
+    "    val evalMetric = new CustomMetric(RMSE, name = \"rmse\")\n",
+    "    \n",
+    "//     val model = FeedForward.newBuilder(network)\n",
+    "//       .setContext(Context.cpu(0))\n",
+    "//       .setNumEpoch(numEpoch)\n",
+    "//       .setOptimizer(new SGD(learningRate = learningRate, momentum = 0.9f, wd = 0.0001f))\n",
+    "//       .setTrainData(trainIter)\n",
+    "//       .setEvalMetric(evalMetric)\n",
+    "//       .setEvalData(testIter)\n",
+    "//       .setBatchEndCallback(new Speedometer(batchSize, 20000/batchSize))\n",
+    "//       .build()\n",
+    "    \n",
+    "    val model = new FeedForward(ctx = Context.gpu(0),\n",
+    "                                symbol = network,\n",
+    "                                numEpoch = numEpoch,\n",
+    "                                optimizer = new SGD(learningRate = learningRate, momentum = 0.9f, wd = 0.0001f))\n",
+    "    \n",
+    "\n",
+    "    model.fit(trainData = trainIter,\n",
+    "              evalData = testIter,\n",
+    "              evalMetric = evalMetric,\n",
+    "              kvStore = null,\n",
+    "              batchEndCallback = new Speedometer(batchSize, 20000/batchSize),\n",
+    "              epochEndCallback = null)\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Networks\n",
+    "Now we try various networks. We first learn the latent vectors directly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mplainNet\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def plainNet(k: Int) ={\n",
+    "    // input\n",
+    "    val user = Symbol.Variable(\"user\")\n",
+    "    val item = Symbol.Variable(\"item\")\n",
+    "    val score = Symbol.Variable(\"score\")\n",
+    "    // user feature lookup\n",
+    "    val user1 = Symbol.Embedding()()(Map(\"data\" -> user, \"input_dim\" -> mu,\n",
+    "                                        \"output_dim\" -> k))\n",
+    "    // item feature lookup\n",
+    "    val item1 = Symbol.Embedding()()(Map(\"data\" -> item, \"input_dim\" -> mi,\n",
+    "                                        \"output_dim\" -> k))\n",
+    " \n",
+    "    // predict by the inner product, which is elementwise product and then sum\n",
+    "    \n",
+    "    val pred0 = user1 * item1\n",
+    "    \n",
+    "    val pred1 = Symbol.sum_axis()()(Map(\"data\" -> pred0, \"axis\" -> 1))\n",
+    "    val pred2 = Symbol.Flatten()()(Map(\"data\" -> pred1))\n",
+    "    // loss layer\n",
+    "    val pred = Symbol.LinearRegressionOutput()()(Map(\"data\" -> pred2, \"label\" -> score))\n",
+    "    \n",
+    "    pred\n",
+    "}\n",
+    "\n",
+    "train(plainNet(64), batchSize=64, numEpoch=10, learningRate=.05f)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we use a two-layer neural network to learn the latent variables, stacking a fully connected layer on top of each embedding layer:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mgetOneLayerMlp\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def getOneLayerMlp(hidden: Int, k: Int) ={\n",
+    "    // input\n",
+    "    val user = Symbol.Variable(\"user\")\n",
+    "    val item = Symbol.Variable(\"item\")\n",
+    "    val score = Symbol.Variable(\"score\")\n",
+    "    // user feature lookup\n",
+    "    val user1 = Symbol.Embedding()()(Map(\"data\" -> user, \"input_dim\" -> mu,\n",
+    "                                        \"output_dim\" -> k))\n",
+    "    val user2 = Symbol.Activation()()(Map(\"data\" -> user1, \"act_type\" -> \"relu\"))\n",
+    "    val user3 = Symbol.FullyConnected()()(Map(\"data\" -> user2, \"num_hidden\" -> hidden))\n",
+    "                          \n",
+    "    // item feature lookup\n",
+    "    val item1 = Symbol.Embedding()()(Map(\"data\" -> item, \"input_dim\" -> mi,\n",
+    "                                        \"output_dim\" -> k))\n",
+    "    val item2 = Symbol.Activation()()(Map(\"data\" -> item1, \"act_type\" -> \"relu\"))\n",
+    "    val item3 = Symbol.FullyConnected()()(Map(\"data\" -> item2, \"num_hidden\" -> hidden))\n",
+    " \n",
+    "    // predict by the inner product\n",
+    "    \n",
+    "    val pred0 = user3 * item3\n",
+    "    \n",
+    "    val pred1 = Symbol.sum_axis()()(Map(\"data\" -> pred0, \"axis\" -> 1))\n",
+    "    val pred2 = Symbol.Flatten()()(Map(\"data\" -> pred1))\n",
+    "                          \n",
+    "    // loss layer\n",
+    "    val pred = Symbol.LinearRegressionOutput()()(Map(\"data\" -> pred2, \"label\" -> score))\n",
+    "    pred\n",
+    "    \n",
+    "}\n",
+    "\n",
+    "train(getOneLayerMlp(64,64), batchSize=64, numEpoch=10, learningRate=.005f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we add dropout layers to reduce over-fitting."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mgetOneLayerDropoutMlp\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "def getOneLayerDropoutMlp(hidden: Int, k: Int) ={\n",
+    "    // input\n",
+    "    val user = Symbol.Variable(\"user\")\n",
+    "    val item = Symbol.Variable(\"item\")\n",
+    "    val score = Symbol.Variable(\"score\")\n",
+    "    // user feature lookup\n",
+    "    val user1 = Symbol.Embedding(\"user\")()(Map(\"data\" -> user, \"input_dim\" -> mu,\n",
+    "                                        \"output_dim\" -> k))\n",
+    "    val user2 = Symbol.Activation()()(Map(\"data\" -> user1, \"act_type\" -> \"relu\"))\n",
+    "    val user3 = Symbol.FullyConnected()()(Map(\"data\" -> user2, \"num_hidden\" -> hidden))\n",
+    "    val user4 = Symbol.Dropout()()(Map(\"data\" -> user3, \"p\" -> 0.5f))\n",
+    "                          \n",
+    "    // item feature lookup\n",
+    "    val item1 = Symbol.Embedding(\"item\")()(Map(\"data\" -> item, \"input_dim\" -> mi,\n",
+    "                                        \"output_dim\" -> k))\n",
+    "    val item2 = Symbol.Activation()()(Map(\"data\" -> item1, \"act_type\" -> \"relu\"))\n",
+    "    val item3 = Symbol.FullyConnected()()(Map(\"data\" -> item2, \"num_hidden\" -> hidden))\n",
+    "    val item4 = Symbol.Dropout()()(Map(\"data\" -> item3, \"p\" -> 0.5f))\n",
+    "\n",
+    "    // predict by the inner product\n",
+    "    \n",
+    "    val pred0 = user4 * item4\n",
+    "    \n",
+    "    val pred1 = Symbol.sum_axis()()(Map(\"data\" -> pred0, \"axis\" -> 1))\n",
+    "    val pred2 = Symbol.Flatten()()(Map(\"data\" -> pred1))\n",
+    "                          \n",
+    "    // loss layer\n",
+    "    val pred = Symbol.LinearRegressionOutput()()(Map(\"data\" -> pred2, \"label\" -> score))\n",
+    "    pred\n",
+    "    \n",
+    "}\n",
+    "                          \n",
+    "train(getOneLayerDropoutMlp(256, 512), batchSize=64, numEpoch=10, learningRate=.005f)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
diff --git a/scala/tutorials/mnist_scala.ipynb b/scala/tutorials/mnist_scala.ipynb
new file mode 100644
index 000000000..82f212215
--- /dev/null
+++ b/scala/tutorials/mnist_scala.ipynb
@@ -0,0 +1,437 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Tutorial on Handwritten Digit Recognition\n",
+    "\n",
+    "In this tutorial we will go through the basic use case of MXNet and also touch on some advanced usages. This example is based on the MNIST dataset, which contains 70,000 images of handwritten digits, each 28-by-28 pixels in size.\n",
+    "\n",
+    "This tutorial covers the following topics:\n",
+    "- Network definition.\n",
+    "- Variable naming.\n",
+    "- Basic data loading and training with feed-forward deep neural networks.\n",
+    "- Monitoring intermediate outputs for debugging.\n",
+    "- Custom training loop for advanced models.\n",
+    "\n",
+    "Let’s train a 3-layer multilayer perceptron on the MNIST dataset to classify handwritten digits. \n",
+    "\n",
+    "First, let's add the MXNet Scala jar to the classpath."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is created as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel's classpath can differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": false
+   },
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g.\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import the required modules:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.optimizer.SGD\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.collection.mutable.ListBuffer\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import ml.dmlc.mxnet.optimizer.SGD\n",
+    "import scala.collection.mutable.ListBuffer"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Network Definition\n",
+    "\n",
+    "Now, we can start constructing our network:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mdata\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@2aec6c99\n",
+       "\u001b[36mfc1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@60bcbe48\n",
+       "\u001b[36mact1\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@60abab94\n",
+       "\u001b[36mfc2\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@2665643d\n",
+       "\u001b[36mact2\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@20b63589\n",
+       "\u001b[36mfc3\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@25fd8129\n",
+       "\u001b[36mmlp\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@18b77909"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// Variables are placeholders for input arrays. We give each variable a unique name.\n",
+    "val data = Symbol.Variable(\"data\")\n",
+    "\n",
+    "// The input is fed to a fully connected layer that computes Y=WX+b.\n",
+    "// This is the main computation module in the network.\n",
+    "// Each layer also needs a unique name. We'll talk more about naming in the next section.\n",
+    "val fc1 = Symbol.FullyConnected(name = \"fc1\")()(Map(\"data\" -> data, \"num_hidden\" -> 128))\n",
+    "\n",
+    "// Activation layers apply a non-linear function on the previous layer's output.\n",
+    "// Here we use Rectified Linear Unit (ReLU) that computes Y = max(X, 0).\n",
+    "val act1 = Symbol.Activation(name = \"relu1\")()(Map(\"data\" -> fc1, \"act_type\" -> \"relu\"))\n",
+    "\n",
+    "val fc2 = Symbol.FullyConnected(name = \"fc2\")()(Map(\"data\" -> act1, \"num_hidden\" -> 64))\n",
+    "val act2 = Symbol.Activation(name = \"relu2\")()(Map(\"data\" -> fc2, \"act_type\" -> \"relu\"))\n",
+    "val fc3 = Symbol.FullyConnected(name = \"fc3\")()(Map(\"data\" -> act2, \"num_hidden\" -> 10))\n",
+    "\n",
+    "// Finally, we have a loss layer that compares the network's output with the label and generates gradient signals.\n",
+    "val mlp = Symbol.SoftmaxOutput(name = \"softmax\")()(Map(\"data\" -> fc3))"
+   ]
+  },
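+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To make the layer arithmetic concrete, here is a minimal plain-Scala sketch of what a fully connected layer followed by ReLU computes for a single input vector. It does not use MXNet; the names `dense` and `relu` and the toy numbers are hypothetical and chosen only for illustration.\n",
+    "\n",
+    "```scala\n",
+    "// Hypothetical helper: y = W x + b for one input vector x.\n",
+    "def dense(w: Array[Array[Float]], b: Array[Float], x: Array[Float]): Array[Float] =\n",
+    "  w.zip(b).map { case (row, bias) => row.zip(x).map { case (wi, xi) => wi * xi }.sum + bias }\n",
+    "\n",
+    "// Hypothetical helper: ReLU keeps positive values and replaces negatives with 0.\n",
+    "def relu(v: Array[Float]): Array[Float] = v.map(e => math.max(e, 0f))\n",
+    "\n",
+    "val w = Array(Array(1f, -1f), Array(0.5f, 0.5f)) // 2 hidden units, 2 inputs\n",
+    "val b = Array(0f, 1f)\n",
+    "val x = Array(1f, 2f)\n",
+    "val hidden = relu(dense(w, b, x)) // Array(0.0f, 2.5f)\n",
+    "```\n",
+    "\n",
+    "In the network above, MXNet performs the same kind of computation for a whole batch at once and learns the weights and biases during training."
+   ]
+  },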
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Variable Naming\n",
+    "MXNet requires variable names to follow certain conventions:\n",
+    "- All input arrays have a name. This includes inputs (data & label) and model parameters (weight, bias, etc.).\n",
+    "- Arrays can be renamed by creating a named variable. Otherwise, a default name is given as 'SymbolName_ArrayName'. For example, the FullyConnected symbol fc1's weight array is named 'fc1_weight'.\n",
+    "- Although you can also rename weight arrays with variables, a weight array's name should always end with '_weight' and a bias array's name with '_bias'. MXNet relies on the suffixes of array names to correctly initialize & update them.\n",
+    "\n",
+    "Call the listArguments method on a symbol to get the names of all its inputs:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres3\u001b[0m: \u001b[32mIndexedSeq\u001b[0m[\u001b[32mString\u001b[0m] = \u001b[33mArrayBuffer\u001b[0m(\n",
+       "  \u001b[32m\"data\"\u001b[0m,\n",
+       "  \u001b[32m\"fc1_weight\"\u001b[0m,\n",
+       "  \u001b[32m\"fc1_bias\"\u001b[0m,\n",
+       "  \u001b[32m\"fc2_weight\"\u001b[0m,\n",
+       "  \u001b[32m\"fc2_bias\"\u001b[0m,\n",
+       "  \u001b[32m\"fc3_weight\"\u001b[0m,\n",
+       "  \u001b[32m\"fc3_bias\"\u001b[0m,\n",
+       "  \u001b[32m\"softmax_label\"\u001b[0m\n",
+       ")"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "mlp.listArguments()"
+   ]
+  },
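+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a sketch of the renaming rule above, a weight can be given an explicit name by passing a named variable to the layer. The names `my_fc1_weight`, `input`, `customWeight` and `fcCustom` are hypothetical, and this assumes the layer accepts a `weight` entry in the same argument map as `data`.\n",
+    "\n",
+    "```scala\n",
+    "import ml.dmlc.mxnet._\n",
+    "\n",
+    "val input = Symbol.Variable(\"data\")\n",
+    "// The custom name still ends with '_weight' so that MXNet can initialize and update it.\n",
+    "val customWeight = Symbol.Variable(\"my_fc1_weight\")\n",
+    "val fcCustom = Symbol.FullyConnected(name = \"fc1\")()(\n",
+    "  Map(\"data\" -> input, \"weight\" -> customWeight, \"num_hidden\" -> 128))\n",
+    "\n",
+    "// List the argument names to check how the weight array is named.\n",
+    "fcCustom.listArguments()\n",
+    "```"
+   ]
+  },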
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Data Loading\n",
+    "\n",
+    "We download the MNIST data using the [get_mnist_data script](https://github.com/dmlc/mxnet/blob/master/scala-package/core/scripts/get_mnist_data.sh). Now we can create data iterators from our MNIST data. A data iterator returns a batch of data examples each time for the network to process. MXNet provides a suite of basic DataIters for parsing different data formats.\n",
+    "\n",
+    "Here we use MNISTIter, which reads the MNIST image and label files and returns one batch of examples at a time.\n",
+    "Change the paths of the input files to match your system.\n",
+    "Load the training and validation data using data iterators as follows:\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mtrainDataIter\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator\n",
+       "\u001b[36mvalDataIter\u001b[0m: \u001b[32mDataIter\u001b[0m = non-empty iterator"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// load MNIST dataset\n",
+    "val trainDataIter = IO.MNISTIter(Map(\n",
+    "  \"image\" -> \"data/train-images-idx3-ubyte\",\n",
+    "  \"label\" -> \"data/train-labels-idx1-ubyte\",\n",
+    "  \"data_shape\" -> \"(1, 28, 28)\",\n",
+    "  \"label_name\" -> \"softmax_label\",\n",
+    "  \"batch_size\" -> \"50\",\n",
+    "  \"shuffle\" -> \"1\",\n",
+    "  \"flat\" -> \"0\",\n",
+    "  \"silent\" -> \"0\",\n",
+    "  \"seed\" -> \"10\"))\n",
+    "\n",
+    "val valDataIter = IO.MNISTIter(Map(\n",
+    "  \"image\" -> \"data/t10k-images-idx3-ubyte\",\n",
+    "  \"label\" -> \"data/t10k-labels-idx1-ubyte\",\n",
+    "  \"data_shape\" -> \"(1, 28, 28)\",\n",
+    "  \"label_name\" -> \"softmax_label\",\n",
+    "  \"batch_size\" -> \"50\",\n",
+    "  \"shuffle\" -> \"1\",\n",
+    "  \"flat\" -> \"0\", \"silent\" -> \"0\"))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Training\n",
+    "With the network and data source defined, we can finally start to train our model. We do this with the builder that MXNet provides as a convenience wrapper for its FeedForward model.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mmodel\u001b[0m: \u001b[32mFeedForward\u001b[0m = ml.dmlc.mxnet.FeedForward@3d40f3b0"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// setup model and fit the training data\n",
+    "val model = FeedForward.newBuilder(mlp) // Use the network we just defined\n",
+    "      .setContext(Context.cpu()) // Run on CPU\n",
+    "      .setNumEpoch(10) // Train for 10 epochs\n",
+    "      // SGD with learning rate, momentum, and weight decay for regularization\n",
+    "      .setOptimizer(new SGD(learningRate = 0.1f, momentum = 0.9f, wd = 0.0001f))\n",
+    "      .setTrainData(trainDataIter) // Training data set\n",
+    "      .setEvalData(valDataIter) // Validation data set. MXNet computes scores on it every epoch\n",
+    "      .build()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Evaluation\n",
+    "\n",
+    "After the model is trained, we can evaluate it on a held-out validation dataset and compare the predicted labels with the real labels."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mprobArrays\u001b[0m: \u001b[32mArray\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mArray\u001b[0m(ml.dmlc.mxnet.NDArray@7d18083e)\n",
+       "\u001b[36mprob\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@426c4e40\n",
+       "\u001b[36mlabels\u001b[0m: \u001b[32mListBuffer\u001b[0m[\u001b[32mNDArray\u001b[0m] = \u001b[33mListBuffer\u001b[0m(\n",
+       "  ml.dmlc.mxnet.NDArray@2accfeb0,\n",
+       "  ml.dmlc.mxnet.NDArray@5f74e322,\n",
+       "  ml.dmlc.mxnet.NDArray@63792053,\n",
+       "  ml.dmlc.mxnet.NDArray@3ed497c6,\n",
+       "  ml.dmlc.mxnet.NDArray@51274f83,\n",
+       "  ml.dmlc.mxnet.NDArray@4a0bed53,\n",
+       "  ml.dmlc.mxnet.NDArray@476a7696,\n",
+       "  ml.dmlc.mxnet.NDArray@3370c62e,\n",
+       "  ml.dmlc.mxnet.NDArray@32e012b4,\n",
+       "  ml.dmlc.mxnet.NDArray@d1db10e,\n",
+       "  ml.dmlc.mxnet.NDArray@2b949b3e,\n",
+       "  ml.dmlc.mxnet.NDArray@43f2a5f5,\n",
+       "  ml.dmlc.mxnet.NDArray@27a485dd,\n",
+       "  ml.dmlc.mxnet.NDArray@21c6cebf,\n",
+       "  ml.dmlc.mxnet.NDArray@30d3eb9f,\n",
+       "  ml.dmlc.mxnet.NDArray@1932251f,\n",
+       "  ml.dmlc.mxnet.NDArray@47b7adba,\n",
+       "  ml.dmlc.mxnet.NDArray@22e85214,\n",
+       "  ml.dmlc.mxnet.NDArray@1f29cb21,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36my\u001b[0m: \u001b[32mNDArray\u001b[0m = ml.dmlc.mxnet.NDArray@76d55afb\n",
+       "\u001b[36mpredictedY\u001b[0m: \u001b[32mNDArrayFuncReturn\u001b[0m = ml.dmlc.mxnet.NDArrayFuncReturn@4786ef4d"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val probArrays = model.predict(valDataIter)\n",
+    "// in this case, we do not have multiple outputs\n",
+    "require(probArrays.length == 1)\n",
+    "val prob = probArrays(0)\n",
+    "\n",
+    "// get real labels\n",
+    "valDataIter.reset()\n",
+    "val labels = ListBuffer.empty[NDArray]\n",
+    "while (valDataIter.hasNext) {\n",
+    "  val evalData = valDataIter.next()\n",
+    "  labels += evalData.label(0).copy()\n",
+    "}\n",
+    "val y = NDArray.concatenate(labels)\n",
+    "\n",
+    "// get predicted labels\n",
+    "val predictedY = NDArray.argmax_channel(prob)\n",
+    "require(y.shape == predictedY.shape)\n"
+   ]
+  },
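+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For intuition, the sketch below shows in plain Scala what taking the argmax over each row of class probabilities means. It does not use MXNet; `probs` and `argmaxRows` are hypothetical names, and the values are made up.\n",
+    "\n",
+    "```scala\n",
+    "// Each row holds the predicted probabilities of one example over 3 classes.\n",
+    "val probs = Array(\n",
+    "  Array(0.1f, 0.7f, 0.2f),\n",
+    "  Array(0.8f, 0.1f, 0.1f))\n",
+    "\n",
+    "// Hypothetical helper: index of the largest probability in each row.\n",
+    "def argmaxRows(rows: Array[Array[Float]]): Array[Int] =\n",
+    "  rows.map(row => row.indexOf(row.max))\n",
+    "\n",
+    "val predicted = argmaxRows(probs) // Array(1, 0)\n",
+    "```"
+   ]
+  },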
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can also evaluate the model's accuracy on the entire test set:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Final accuracy = 0.9609\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mnumCorrect\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m9609\u001b[0m\n",
+       "\u001b[36mnumTotal\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m10000\u001b[0m\n",
+       "\u001b[36macc\u001b[0m: \u001b[32mFloat\u001b[0m = \u001b[32m0.9609F\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "// calculate accuracy\n",
+    "var numCorrect = 0\n",
+    "var numTotal = 0\n",
+    "for ((labelElem, predElem) <- y.toArray zip predictedY.toArray) {\n",
+    "  if (labelElem == predElem) {\n",
+    "    numCorrect += 1\n",
+    "  }\n",
+    "  numTotal += 1\n",
+    "}\n",
+    "val acc = numCorrect.toFloat / numTotal\n",
+    "println(s\"Final accuracy = $acc\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next Steps\n",
+    "Check out more MXNet Scala resources below.\n",
+    "\n",
+    "[Scala API](http://mxnet.io/api/scala/)\n",
+    "\n",
+    "[More Scala Examples](https://github.com/dmlc/mxnet/tree/master/scala-package/examples/src/main/scala/ml/dmlc/mxnet/examples)\n",
+    "\n",
+    "[MXNet tutorials index](http://mxnet.io/tutorials/index.html)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/scala/tutorials/predict_imagenet_scala.ipynb b/scala/tutorials/predict_imagenet_scala.ipynb
new file mode 100644
index 000000000..936b54a9f
--- /dev/null
+++ b/scala/tutorials/predict_imagenet_scala.ipynb
@@ -0,0 +1,409 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "# Predict with pre-trained models\n",
+    "\n",
+    "This is a demo of predicting with a model pre-trained on the full ImageNet dataset, which contains over 10 million images and 10 thousand classes. For a more detailed explanation, please refer to [predict.ipynb](https://github.com/dmlc/mxnet-notebooks/blob/master/python/how_to/predict.ipynb)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Jupyter Scala kernel\n",
+    "Add the MXNet Scala jar, which is built as part of the MXNet Scala package installation, to the classpath as follows:\n",
+    "\n",
+    "**Note**: The process for adding this jar to your Scala kernel's classpath may differ depending on the Scala kernel you are using.\n",
+    "\n",
+    "We used the [jupyter-scala kernel](https://github.com/alexarchambault/jupyter-scala) to create this notebook."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "```\n",
+    "classpath.addPath(<path_to_jar>)\n",
+    "\n",
+    "e.g.\n",
+    "classpath.addPath(\"mxnet-full_2.11-osx-x86_64-cpu-0.1.2-SNAPSHOT.jar\")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "classpath.addPath(\"/Users/roshanin/mxnet/scala-package/assembly/osx-x86_64-cpu/target/mxnet-full_2.11-osx-x86_64-cpu-0.10.1-SNAPSHOT.jar\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import necessary libraries:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet._\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.module.{FitParams, Module}\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mscala.collection.immutable.ListMap\u001b[0m\n",
+       "\u001b[32mimport \u001b[36msys.process._\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import ml.dmlc.mxnet._\n",
+    "import ml.dmlc.mxnet.module.{FitParams, Module}\n",
+    "import scala.collection.immutable.ListMap\n",
+    "import sys.process._"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Download the pretrained model as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "\n",
+      "     0K .......... .......... .......... .......... .......... 22%  121K 1s\n",
+      "    50K .......... .......... .......... .......... .......... 45%  157K 1s\n",
+      "   100K .......... .......... .......... .......... .......... 68%  205K 0s\n",
+      "   150K .......... .......... .......... .......... .......... 91%  297K 0s\n",
+      "   200K .......... ........                                   100%  121K=1.3s"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mres2_0\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m\n",
+       "\u001b[36mres2_1\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m\n",
+       "\u001b[36mres2_2\u001b[0m: \u001b[32mInt\u001b[0m = \u001b[32m0\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "\"wget http://data.mxnet.io/models/imagenet-11k/resnet-152/resnet-152-symbol.json -P model/ -q --show-progress\"!\n",
+    "\n",
+    "\"wget http://data.mxnet.io/models/imagenet-11k/resnet-152/resnet-152-0000.params -P model/ -q --show-progress\"!\n",
+    "\n",
+    "\"wget http://data.mxnet.io/models/imagenet-11k/synset.txt -P model/ -q --show-progress\"!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialization\n",
+    "We first load the model into memory with loadCheckpoint. It returns the symbol definition of the neural network (see [symbol_scala.ipynb](https://github.com/dmlc/mxnet-notebooks/blob/master/scala/basic/symbol_scala.ipynb)) along with its argument and auxiliary parameters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "log4j:WARN No appenders could be found for logger (MXNetJVM).\n",
+      "log4j:WARN Please initialize the log4j system properly.\n",
+      "log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mresnet\u001b[0m: \u001b[32mSymbol\u001b[0m = ml.dmlc.mxnet.Symbol@7ce32cb\n",
+       "\u001b[36margParamsResnet\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"stage3_unit14_bn2_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@c296fced,\n",
+       "  \u001b[32m\"stage3_unit17_conv1_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@75334891,\n",
+       "  \u001b[32m\"stage3_unit8_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@783b9ecd,\n",
+       "  \u001b[32m\"stage2_unit7_bn3_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@eeb61ac7,\n",
+       "  \u001b[32m\"stage1_unit3_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@b5c72854,\n",
+       "  \u001b[32m\"stage3_unit14_bn1_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@8c34f84f,\n",
+       "  \u001b[32m\"stage3_unit10_bn1_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@b2d0a2b8,\n",
+       "  \u001b[32m\"stage3_unit2_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@40533d09,\n",
+       "  \u001b[32m\"stage3_unit18_conv1_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@285d08cd,\n",
+       "  \u001b[32m\"stage3_unit20_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@53a87a2c,\n",
+       "  \u001b[32m\"stage3_unit29_bn3_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@957e96f2,\n",
+       "  \u001b[32m\"stage2_unit3_conv2_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@d08559ca,\n",
+       "  \u001b[32m\"stage3_unit25_bn2_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@c8bc6afe,\n",
+       "  \u001b[32m\"stage4_unit3_bn1_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@2d738385,\n",
+       "  \u001b[32m\"stage1_unit3_bn1_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@92d3313e,\n",
+       "  \u001b[32m\"stage3_unit1_conv3_weight\"\u001b[0m -> ml.dmlc.mxnet.NDArray@9e2ee99f,\n",
+       "  \u001b[32m\"stage2_unit1_bn3_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@b299bc79,\n",
+       "  \u001b[32m\"stage3_unit14_bn1_gamma\"\u001b[0m -> ml.dmlc.mxnet.NDArray@8666ccb1,\n",
+       "  \u001b[32m\"stage3_unit23_bn2_beta\"\u001b[0m -> ml.dmlc.mxnet.NDArray@bd6a94a2,\n",
+       "\u001b[33m...\u001b[0m\n",
+       "\u001b[36mauxParamsResnet\u001b[0m: \u001b[32mMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mNDArray\u001b[0m] = \u001b[33mMap\u001b[0m(\n",
+       "  \u001b[32m\"stage2_unit2_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@afe461e9,\n",
+       "  \u001b[32m\"stage3_unit21_bn2_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@7f1aa42b,\n",
+       "  \u001b[32m\"stage3_unit6_bn2_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@8daf19ce,\n",
+       "  \u001b[32m\"stage3_unit27_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@5596f71e,\n",
+       "  \u001b[32m\"stage3_unit36_bn1_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@5af9c364,\n",
+       "  \u001b[32m\"stage2_unit2_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@288a2825,\n",
+       "  \u001b[32m\"stage3_unit34_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@71f7a2f4,\n",
+       "  \u001b[32m\"stage3_unit32_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@77ef95b0,\n",
+       "  \u001b[32m\"stage3_unit19_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@be708ef0,\n",
+       "  \u001b[32m\"stage3_unit24_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@875dbdf3,\n",
+       "  \u001b[32m\"stage3_unit36_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@6a1d3ad4,\n",
+       "  \u001b[32m\"stage3_unit15_bn2_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@a18108c4,\n",
+       "  \u001b[32m\"stage3_unit1_bn2_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@74ea2c0c,\n",
+       "  \u001b[32m\"stage1_unit3_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@ce018bfd,\n",
+       "  \u001b[32m\"stage3_unit16_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@59b2b0ed,\n",
+       "  \u001b[32m\"stage3_unit14_bn1_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@77839f79,\n",
+       "  \u001b[32m\"stage3_unit15_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@83704fc2,\n",
+       "  \u001b[32m\"stage2_unit4_bn3_moving_mean\"\u001b[0m -> ml.dmlc.mxnet.NDArray@c9cd7bfe,\n",
+       "  \u001b[32m\"stage2_unit4_bn3_moving_var\"\u001b[0m -> ml.dmlc.mxnet.NDArray@d96fd59e,\n",
+       "\u001b[33m...\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val (resnet, argParamsResnet, auxParamsResnet) = Model.loadCheckpoint(\"model/resnet-152\", 0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create a module for this model on the CPU.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mmod\u001b[0m: \u001b[32mModule\u001b[0m = ml.dmlc.mxnet.module.Module@d76fa60\n",
+       "\u001b[36mdataShapesResnet\u001b[0m: \u001b[32mListMap\u001b[0m[\u001b[32mString\u001b[0m, \u001b[32mShape\u001b[0m] = \u001b[33mMap\u001b[0m(\u001b[32m\"data\"\u001b[0m -> (1,3,224,224))"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val mod = new Module(resnet, contexts = Context.cpu())\n",
+    "val dataShapesResnet = ListMap(\"data\" -> Shape(1, 3, 224, 224))\n",
+    "mod.bind(dataShapes=dataShapesResnet, forTraining = false)\n",
+    "mod.setParams(argParamsResnet, auxParamsResnet)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we add the OpenCV jar to the classpath and define a function that loads an image from a local file and preprocesses it for the network.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": []
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "classpath.addPath(\"/usr/local/opt/opencv3/share/OpenCV/java/opencv-320.jar\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[32mimport \u001b[36morg.opencv.core.Core\u001b[0m\n",
+       "\u001b[32mimport \u001b[36morg.opencv.imgcodecs.Imgcodecs\u001b[0m\n",
+       "\u001b[32mimport \u001b[36morg.opencv.imgproc.Imgproc\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mml.dmlc.mxnet.NDArray\u001b[0m\n",
+       "\u001b[32mimport \u001b[36morg.opencv.core.Mat\u001b[0m\n",
+       "\u001b[32mimport \u001b[36morg.opencv.core.CvType\u001b[0m\n",
+       "\u001b[32mimport \u001b[36mjava.util.ArrayList\u001b[0m\n",
+       "\u001b[32mimport \u001b[36morg.opencv.core.Size\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "import org.opencv.core.Core\n",
+    "//import org.opencv.highgui.Highgui\n",
+    "import org.opencv.imgcodecs.Imgcodecs\n",
+    "import org.opencv.imgproc.Imgproc\n",
+    "import ml.dmlc.mxnet.NDArray\n",
+    "import org.opencv.core.Mat\n",
+    "import org.opencv.core.CvType\n",
+    "import java.util.ArrayList\n",
+    "import org.opencv.core.Size"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "defined \u001b[32mfunction \u001b[36mgetImage\u001b[0m"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
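+    "def getImage(filename: String) = {\n",
+    "    // Load the OpenCV native library before calling any OpenCV function\n",
+    "    System.loadLibrary(Core.NATIVE_LIBRARY_NAME)\n",
+    "    println(String.format(\"Loaded OpenCV %s\", Core.VERSION))\n",
+    "    println(filename)\n",
+    "    val mat = Imgcodecs.imread(filename)\n",
+    "    // OpenCV reads images in BGR order; convert to RGB\n",
+    "    val rgbMat = new Mat()\n",
+    "    Imgproc.cvtColor(mat, rgbMat, Imgproc.COLOR_BGR2RGB)\n",
+    "    // Resize the converted image to the 224x224 input size the module is bound to\n",
+    "    val resizeMat = new Mat()\n",
+    "    Imgproc.resize(rgbMat, resizeMat, new Size(224, 224))\n",
+    "    \n",
+    "    // Convert pixel values to 32-bit floats\n",
+    "    val typeMat = new Mat()\n",
+    "    resizeMat.convertTo(typeMat, CvType.CV_32F)\n",
+    "    \n",
+    "    // Copy the pixel data into a flat float array\n",
+    "    val size = (typeMat.total * typeMat.channels).toInt\n",
+    "    val buff = new Array[Float](size)\n",
+    "    typeMat.get(0, 0, buff)\n",
+    "    \n",
+    "    // Note: this prints the array reference, not its contents\n",
+    "    println(buff)\n",
+    "    typeMat\n",
+    "}"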
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Loaded OpenCV 3.2.0\n",
+      "0.jpg\n",
+      "[F@4a8d4eba\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "\u001b[36mimg\u001b[0m: \u001b[32mMat\u001b[0m = Mat [ 224*224*CV_32FC3, isCont=true, isSubmat=false, nativeObj=0x7f98f89efb50, dataAddr=0x16c766020 ]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "val img = getImage(\"0.jpg\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "anaconda-cloud": {},
+  "kernelspec": {
+   "display_name": "Scala 2.11",
+   "language": "scala211",
+   "name": "scala211"
+  },
+  "language_info": {
+   "codemirror_mode": "text/x-scala",
+   "file_extension": ".scala",
+   "mimetype": "text/x-scala",
+   "name": "scala211",
+   "pygments_lexer": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}