
Commit

Typos fixed and great links added for learning more about Machine Learning
MarkyV authored and amueller committed Jan 2, 2013
1 parent 85c0b45 commit 3352bba
Showing 10 changed files with 104 additions and 102 deletions.
19 changes: 9 additions & 10 deletions CONTRIBUTING.md
@@ -4,9 +4,9 @@ Contributing code

**Note: This document is just to get started, visit [**Contributing
page**](http://scikit-learn.org/stable/developers/index.html#coding-guidelines)
for the full contributor's guide. Make sure to read it carefully to make
for the full contributor's guide. Please be sure to read it carefully to make
the code review process go as smoothly as possible and maximize the
likelihood of your contribution to get merged.**
likelihood of your contribution being merged.**

How to contribute
-----------------
@@ -29,7 +29,7 @@ GitHub:

and start making changes. Never work in the ``master`` branch!

4. Work on this copy, on your computer, using Git to do the version
4. Work on this copy on your computer using Git to do the version
control. When you're done editing, do:

$ git add modified_files
@@ -43,8 +43,8 @@ Finally, go to the web page of your fork of the scikit-learn repo,
and click 'Pull request' to send your changes to the maintainers for
review. This will send an email to the committers.

(If any of the above seems like magic to you, then look up the [Git documentation](http://git-scm.com/documentation)
on the web.)
(If any of the above seems like magic to you, then look up the
[Git documentation](http://git-scm.com/documentation) on the web.)

It is recommended to check that your contribution complies with the
following rules before submitting a pull request:
@@ -64,7 +64,7 @@ following rules:
to other methods available in scikit-learn.

- At least one paragraph of narrative documentation with links to
references in the literature (with PDF links when possible) and
references in the literature (with PDF links when possible) and
the example.
The documentation should also include expected time and space
@@ -76,7 +76,7 @@ scale in dimensionality: n_features is expected to be lower than
You can also check for common programming errors with the following
tools:
- Code with a good unittest coverage (at least 80%), check with:
- Code with good unittest coverage (at least 80%), check with:
$ pip install nose coverage
$ nosetests --with-coverage path/to/tests_for_package
@@ -119,7 +119,7 @@ reStructuredText documents (like this one), tutorials, etc.
reStructuredText documents live in the source code repository under the
doc/ directory.
You can edit the documentation using any text editor, and then generate
You can edit the documentation using any text editor and then generate
the HTML output by typing ``make html`` from the doc/ directory.
Alternatively, ``make`` can be used to quickly generate the
documentation without the example gallery. The resulting HTML files will
@@ -133,7 +133,7 @@ For building the documentation, you will need
When you are writing documentation, it is important to keep a good
compromise between mathematical and algorithmic details, and give
intuition to the reader on what the algorithm does. It is best to always
start with a small paragraph with a hand-waiving explanation of what the
start with a small paragraph with a hand-waving explanation of what the
method does to the data and a figure (coming from an example)
illustrating it.
@@ -143,4 +143,3 @@ Further Information
Visit the [Contributing Code](http://scikit-learn.org/stable/developers/index.html#coding-guidelines)
section of the website for more information including conforming to the
API spec and profiling contributed code.

24 changes: 13 additions & 11 deletions doc/tutorial/basic/tutorial.rst
@@ -16,11 +16,10 @@ Machine learning: the problem setting

In general, a learning problem considers a set of n
`samples <http://en.wikipedia.org/wiki/Sample_(statistics)>`_ of
data and try to predict properties of unknown data. If each sample is
more than a single number, and for instance a multi-dimensional entry
data and then tries to predict properties of unknown data. If each sample is
more than a single number and, for instance, a multi-dimensional entry
(aka `multivariate <http://en.wikipedia.org/wiki/Multivariate_random_variable>`_
data), is it said to have several attributes,
or **features**.
data), it is said to have several attributes or **features**.

We can separate learning problems into a few large categories:

@@ -35,9 +34,12 @@ We can separate learning problems into a few large categories:
samples belong to two or more classes and we
want to learn from already labeled data how to predict the class
of unlabeled data. An example of a classification problem would
be the digit recognition example, in which the aim is to assign
each input vector to one of a finite number of discrete
categories.
be the handwritten digit recognition example, in which the aim is
to assign each input vector to one of a finite number of discrete
categories. Another way to think of classification is as a discrete
(as opposed to continuous) form of supervised learning where one has a
limited number of categories and, for each of the n samples provided,
one tries to label each sample with the correct category or class.

* `regression <http://en.wikipedia.org/wiki/Regression_analysis>`_:
if the desired output consists of one or more
@@ -52,7 +54,7 @@ We can separate learning problems into a few large categories:
it is called `clustering <http://en.wikipedia.org/wiki/Cluster_analysis>`_,
or to determine the distribution of data within the input space, known as
`density estimation <http://en.wikipedia.org/wiki/Density_estimation>`_, or
to project the data from a high-dimensional space down to two or thee
to project the data from a high-dimensional space down to two or three
dimensions for the purpose of *visualization*
(:ref:`Click here <unsupervised-learning>`
to go to the Scikit-Learn unsupervised learning page).
@@ -62,8 +64,8 @@ We can separate learning problems into a few large categories:
Machine learning is about learning some properties of a data set
and applying them to new data. This is why a common practice in
machine learning to evaluate an algorithm is to split the data
at hand in two sets, one that we call a **training set** on which
we learn data properties, and one that we call a **testing set**,
at hand into two sets, one that we call the **training set** on which
we learn data properties and one that we call the **testing set**
on which we test these properties.
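The training/testing split described above can be sketched in plain Python (a minimal illustration with a made-up toy dataset, not scikit-learn's own splitting helper):

```python
import random

# Toy dataset: 10 samples, each a (features, label) pair.
dataset = [([i, i * 2], i % 2) for i in range(10)]

# Shuffle, then hold out 30% of the samples as the testing set.
random.seed(0)
shuffled = dataset[:]
random.shuffle(shuffled)
n_test = int(len(shuffled) * 0.3)
testing_set = shuffled[:n_test]
training_set = shuffled[n_test:]

print(len(training_set), len(testing_set))  # 7 3
```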

.. _loading_example_dataset:
@@ -142,7 +144,7 @@ the classes to which unseen samples belong.
In `scikit-learn`, an estimator for classification is a Python object that
implements the methods `fit(X, y)` and `predict(T)`.

An example of estimator is the class ``sklearn.svm.SVC`` that
An example of an estimator is the class ``sklearn.svm.SVC`` that
implements `support vector classification
<http://en.wikipedia.org/wiki/Support_vector_machine>`_. The
constructor of an estimator takes as arguments the parameters of the
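The `fit(X, y)` / `predict(T)` interface mentioned above can be sketched with a toy stand-in (a hypothetical nearest-centroid classifier written in plain Python, not ``sklearn.svm.SVC``):

```python
class NearestCentroidSketch:
    """Toy classifier implementing the fit(X, y) / predict(T) interface."""

    def fit(self, X, y):
        # Average the training samples of each class to get a centroid.
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, lbl in zip(X, y) if lbl == label]
            self.centroids_[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, T):
        def dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        # Assign each query point to the class of the nearest centroid.
        return [min(self.centroids_, key=lambda c: dist(t, self.centroids_[c]))
                for t in T]


clf = NearestCentroidSketch()
clf.fit([[0, 0], [1, 0], [10, 10], [11, 10]], [0, 0, 1, 1])
print(clf.predict([[0, 1], [10, 9]]))  # [0, 1]
```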
4 changes: 3 additions & 1 deletion doc/tutorial/common_includes/info.txt
@@ -1 +1,3 @@
Meant to share common RST file snippets that we want to reuse by inclusion in the real tutorial to lower the maintenance burden of redundant sections.
Meant to share common RST file snippets that we want to reuse by inclusion
in the real tutorial in order to lower the maintenance burden
of redundant sections.
6 changes: 5 additions & 1 deletion doc/tutorial/statistical_inference/finding_help.rst
@@ -10,7 +10,7 @@ clarification in the docstring or the online documentation, please feel free to
ask on the `Mailing List <http://scikit-learn.sourceforge.net/support.html>`_


Q&A communities with Machine Learning practictioners
Q&A communities with Machine Learning practitioners
----------------------------------------------------

:Metaoptimize/QA:
@@ -36,3 +36,7 @@ Q&A communities with Machine Learning practitioners
.. _`good freely available textbooks on machine learning`: http://metaoptimize.com/qa/questions/186/good-freely-available-textbooks-on-machine-learning

.. _`What are some good resources for learning about machine learning`: http://www.quora.com/What-are-some-good-resources-for-learning-about-machine-learning

.. _`An excellent free online course for Machine Learning taught by Professor Andrew Ng of Stanford`: https://www.coursera.org/course/ml

.. _`Another excellent free online course that takes a more general approach to Artificial Intelligence`: http://www.udacity.com/overview/Course/cs271/CourseRev/1
14 changes: 7 additions & 7 deletions doc/tutorial/statistical_inference/model_selection.rst
@@ -97,9 +97,9 @@ of the computer.

*

- Split it K folds, train on K-1, test on left-out
- Split it into K folds, train on K-1, and then test on the left-out fold

- Make sure that all classes are even accross the folds
- Make sure that all classes are even across the folds

- Leave one observation out
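The K-fold idea in the list above can be sketched in plain Python (a hand-rolled illustration, not scikit-learn's cross-validation helpers):

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for each of the k folds."""
    # Distribute samples as evenly as possible across the k test folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i not in test]
        yield train, test
        start += size


# 10 samples, 3 folds: test folds of size 4, 3, 3.
splits = list(k_fold_indices(10, 3))
print([len(test) for _, test in splits])  # [4, 3, 3]
```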

@@ -155,8 +155,8 @@ estimator during the construction and exposes an estimator API::
0.94228356336260977


By default the :class:`GridSearchCV` uses a 3-fold cross-validation. However, if
it detects that a classifier is passed, rather than a regressor, it uses
By default, the :class:`GridSearchCV` uses a 3-fold cross-validation. However,
if it detects that a classifier is passed, rather than a regressor, it uses
a stratified 3-fold.
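The parameter search that :class:`GridSearchCV` automates can be sketched by hand (a minimal plain-Python illustration with a made-up scoring function standing in for cross-validated accuracy, not the real estimator API):

```python
def validation_score(gamma):
    # Stand-in for cross-validated accuracy as a function of gamma;
    # in practice this would come from fitting and scoring an estimator.
    return 1.0 - (gamma - 0.01) ** 2


# Try each candidate and keep the value with the best validation score.
candidate_gammas = [0.001, 0.01, 0.1, 1.0]
best_gamma = max(candidate_gammas, key=validation_score)
print(best_gamma)  # 0.01
```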

.. topic:: Nested cross-validation
@@ -167,7 +167,7 @@ a stratified 3-fold.
array([ 0.97996661, 0.98163606, 0.98330551])

Two cross-validation loops are performed in parallel: one by the
:class:`GridSearchCV` estimator to set `gamma`, the other one by
:class:`GridSearchCV` estimator to set `gamma` and the other one by
`cross_val_score` to measure the prediction performance of the
estimator. The resulting scores are unbiased estimates of the
prediction score on new data.
@@ -183,8 +183,8 @@ Cross-validated estimators
----------------------------

Cross-validation to set a parameter can be done more efficiently on an
algorithm-by-algorithm basis. This is why, for certain estimators, the
sklearn exposes :ref:`cross_validation` estimators, that set their parameter
algorithm-by-algorithm basis. This is why, for certain estimators,
scikit-learn exposes :ref:`cross_validation` estimators that set their parameter
automatically by cross-validation::

>>> from sklearn import linear_model, datasets
12 changes: 4 additions & 8 deletions doc/tutorial/statistical_inference/putting_together.rst
@@ -8,8 +8,8 @@ Putting it all together
Pipelining
============

We have seen that some estimators can transform data, and some estimators
can predict variables. We can create combined estimators:
We have seen that some estimators can transform data and that some estimators
can predict variables. We can also create combined estimators:

.. image:: ../../auto_examples/images/plot_digits_pipe_1.png
:target: ../../auto_examples/plot_digits_pipe.html
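The combined-estimator idea can be sketched in plain Python (a toy stand-in for scikit-learn's pipeline, with hypothetical ``MeanCenter`` and ``SignClassifier`` steps):

```python
class MeanCenter:
    """Toy transformer: subtract the training mean from each sample."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class SignClassifier:
    """Toy predictor: label positive values 1, the rest 0."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1 if x > 0 else 0 for x in X]


class PipelineSketch:
    """Chain transformers and a final predictor behind one fit/predict."""

    def __init__(self, steps):
        self.steps = steps  # every step but the last must transform

    def fit(self, X, y):
        for step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1].fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1].predict(X)


pipe = PipelineSketch([MeanCenter(), SignClassifier()])
pipe.fit([1, 2, 3, 4], [0, 0, 1, 1])
print(pipe.predict([1, 4]))  # [0, 1]
```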
@@ -26,7 +26,7 @@ Face recognition with eigenfaces
=================================

The dataset used in this example is a preprocessed excerpt of the
"Labeled Faces in the Wild", aka LFW_:
"Labeled Faces in the Wild", also known as LFW_:

http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)

@@ -71,10 +71,6 @@ Expected results for the top 5 most represented people in the dataset::
Open problem: Stock Market Structure
=====================================

Can we predict the variation in stock prices for Google?
Can we predict the variation in stock prices for Google over a given time frame?

:ref:`stock_market`




15 changes: 7 additions & 8 deletions doc/tutorial/statistical_inference/settings.rst
@@ -26,10 +26,10 @@ these arrays is the **samples** axis, while the second is the
features: their sepal and petal length and width, as detailed in
`iris.DESCR`.

When the data is not intially in the `(n_samples, n_features)` shape, it
needs to be preprocessed to be used by the scikit.
When the data is not initially in the `(n_samples, n_features)` shape, it
needs to be preprocessed in order to be used by scikit-learn.

.. topic:: An example of reshaping data: the digits dataset
.. topic:: An example of reshaping data: the digits dataset

.. image:: ../../auto_examples/datasets/images/plot_digits_last_image_1.png
:target: ../../auto_examples/datasets/plot_digits_last_image.html
@@ -46,7 +46,7 @@ needs to be preprocessed to be used by the scikit.
>>> pl.imshow(digits.images[-1], cmap=pl.cm.gray_r) #doctest: +SKIP
<matplotlib.image.AxesImage object at ...>

To use this dataset with the scikit, we transform each 8x8 image in a
To use this dataset with the scikit, we transform each 8x8 image into a
feature vector of length 64 ::

>>> data = digits.images.reshape((digits.images.shape[0], -1))
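The same flattening can be sketched without NumPy (a plain-Python illustration of turning an 8x8 image into a length-64 feature vector, as ``reshape((n, -1))`` does for the whole array at once):

```python
# A fake 8x8 "image": nested lists standing in for one digits image.
image = [[row * 8 + col for col in range(8)] for row in range(8)]

# Flatten row-by-row into a single feature vector.
feature_vector = [pixel for row in image for pixel in row]
print(len(feature_vector))  # 64
```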
@@ -68,16 +68,16 @@ Estimators objects
**Fitting data**: the main API implemented by scikit-learn is that of the
`estimator`. An estimator is any object that learns from data;
it may a classification, regression or clustering algorithm or
it may be a classification, regression or clustering algorithm or
a `transformer` that extracts/filters useful features from raw data.

All estimator objects expose a `fit` method, that takes a dataset
All estimator objects expose a `fit` method that takes a dataset
(usually a 2-d array):

>>> estimator.fit(data)

**Estimator parameters**: All the parameters of an estimator can be set
when it is instantiated, or by modifying the corresponding attribute::
when it is instantiated or by modifying the corresponding attribute::

>>> estimator = Estimator(param1=1, param2=2)
>>> estimator.param1
@@ -90,4 +90,3 @@ underscore::

>>> estimator.estimated_param_ #doctest: +SKIP
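These two conventions (parameters set at instantiation or via attributes, and estimated parameters carrying a trailing underscore) can be sketched with a toy estimator (hypothetical names, not a real scikit-learn class):

```python
class EstimatorSketch:
    """Toy estimator following the parameter conventions described above."""

    def __init__(self, param1=1, param2=2):
        # Constructor arguments are stored unchanged as public attributes.
        self.param1 = param1
        self.param2 = param2

    def fit(self, data):
        # Estimated (learned) attributes get a trailing underscore.
        self.estimated_param_ = sum(data) / len(data) * self.param1
        return self


est = EstimatorSketch(param1=3)
est.fit([1, 2, 3])
print(est.param1, est.estimated_param_)  # 3 6.0
```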

