Commit 1c7f2d5
revise readme from scratch
revise readme from scratch
- add LICENSE and Changelog.md and include in the Sphinx documentation
- fix logo path in README
- add some information on installing the package
- add note on the CLA
AdrianSosic committed Jun 14, 2021
1 parent e077380 commit 1c7f2d5
Showing 11 changed files with 443 additions and 11 deletions.
85 changes: 85 additions & 0 deletions CONTRIBUTING.md
# EMD Group Individual Contributor License Agreement

Thank you for your interest in contributing to Merck KGaA, Darmstadt, Germany and other legal entities belonging to the group ("We" or "Us").

This contributor agreement ("Agreement") documents the rights granted by contributors to Us. To make this document effective, please sign it and send it to Us by electronic submission. This is a legally binding document, so please read it carefully before agreeing to it. The Agreement may cover more than one software project managed by Us.

## 1. Definitions

"You" means the individual who Submits a Contribution to Us.

"Contribution" means any work of authorship that is Submitted by You to Us in which You own or assert ownership of the Copyright. If You do not own the Copyright in the entire work of authorship, please follow the instructions in Section 3(d).

"Copyright" means all rights protecting works of authorship owned or controlled by You, including copyright, moral and neighboring rights, as appropriate, for the full term of their existence including any extensions by You.

"Material" means the work of authorship which is made available by Us to third parties. When this Agreement covers more than one software project, the Material means the work of authorship to which the Contribution was Submitted. After You Submit the Contribution, it may be included in the Material.

"Submit" means any form of electronic, verbal, or written communication sent to Us or our representatives, including but not limited to electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, Us for the purpose of discussing and improving the Material, but excluding communication that is conspicuously marked or otherwise designated in writing by You as "Not a Contribution."

"Submission Date" means the date on which You Submit a Contribution to Us.

"Effective Date" means the date You execute this Agreement or the date You first Submit a Contribution to Us, whichever is earlier.

"Media" means any portion of a Contribution which is not software.

## 2. Grant of Rights

### 2.1 Copyright License

a) You retain ownership of the Copyright in Your Contribution and have the same rights to use or license the Contribution which You would have had without entering into the Agreement.

b) To the maximum extent permitted by the relevant law, You grant to Us a perpetual, worldwide, non-exclusive, transferable, royalty-free, irrevocable license under the Copyright covering the Contribution, with the right to sublicense such rights through multiple tiers of sublicensees, to reproduce, modify, display, perform and distribute the Contribution as part of the Material; provided that this license is conditioned upon compliance with Section 2.3.

### 2.2 Patent License

For patent claims including, without limitation, method, process, and apparatus claims which You own, control or have the right to grant, now or in the future, You grant to Us a perpetual, worldwide, non-exclusive, transferable, royalty-free, irrevocable patent license, with the right to sublicense these rights to multiple tiers of sublicensees, to make, have made, use, sell, offer for sale, import and otherwise transfer the Contribution and the Contribution in combination with the Material (and portions of such combination). This license is granted only to the extent that the exercise of the licensed rights infringes such patent claims; and provided that this license is conditioned upon compliance with Section 2.3.

### 2.3 Outbound License

As a condition on the grant of rights in Sections 2.1 and 2.2, We agree to license the Contribution only under the terms of the license or licenses which We are using on the Submission Date for the Material or the following additional licenses: **Apache License 2.0** (including any right to adopt any future version of a license if permitted).

In addition, We may use the following licenses for Media in the Contribution: **Creative Commons Zero v1.0 Universal** (including any right to adopt any future version of a license if permitted).

### 2.4 Moral Rights

If moral rights apply to the Contribution, to the maximum extent permitted by law, You waive and agree not to assert such moral rights against Us or our successors in interest, or any of our licensees, either direct or indirect.

### 2.5 Our Rights

You acknowledge that We are not obligated to use Your Contribution as part of the Material and may decide to include any Contribution We consider appropriate.

### 2.6 Reservation of Rights

Any rights not expressly licensed under this section are expressly reserved by You.

## 3. Agreement

You confirm that:

a) You have the legal authority to enter into this Agreement.

b) You own the Copyright and patent claims covering the Contribution which are required to grant the rights under Section 2.

c) The grant of rights under Section 2 does not violate any grant of rights which You have made to third parties, including Your employer. If You are an employee, You have had Your employer approve this Agreement or sign the Entity version of this document. If You are less than eighteen years old, please have Your parents or guardian sign the Agreement.

d) You have followed the instructions of the third party owner of copyright, if You do not own the Copyright in the entire work of authorship Submitted.

## 4. Disclaimer

EXCEPT FOR THE EXPRESS WARRANTIES IN SECTION 3, THE CONTRIBUTION IS PROVIDED "AS IS". MORE PARTICULARLY, ALL EXPRESS OR IMPLIED WARRANTIES INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT ARE EXPRESSLY DISCLAIMED BY YOU TO US. TO THE EXTENT THAT ANY SUCH WARRANTIES CANNOT BE DISCLAIMED, SUCH WARRANTY IS LIMITED IN DURATION TO THE MINIMUM PERIOD PERMITTED BY LAW.

## 5. Consequential Damage Waiver

TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT WILL YOU BE LIABLE FOR ANY LOSS OF PROFITS, LOSS OF ANTICIPATED SAVINGS, LOSS OF DATA, INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL AND EXEMPLARY DAMAGES ARISING OUT OF THIS AGREEMENT REGARDLESS OF THE LEGAL OR EQUITABLE THEORY (CONTRACT, TORT OR OTHERWISE) UPON WHICH THE CLAIM IS BASED.

## 6. Miscellaneous

6.1 This Agreement will be governed by and construed in accordance with the laws of Germany excluding its conflicts of law provisions. Under certain circumstances, the governing law in this section might be superseded by the United Nations Convention on Contracts for the International Sale of Goods ("UN Convention") and the parties intend to avoid the application of the UN Convention to this Agreement and, thus, exclude the application of the UN Convention in its entirety to this Agreement.

6.2 This Agreement sets out the entire agreement between You and Us for Your Contributions to Us and overrides all other agreements or understandings.

6.3 If You or We assign the rights or obligations received through this Agreement to a third party, as a condition of the assignment, that third party must agree in writing to abide by all the rights and obligations in the Agreement.

6.4 The failure of either party to require performance by the other party of any provision of this Agreement in one situation shall not affect the right of a party to require such performance at any time in the future. A waiver of performance under a provision in one situation shall not be considered a waiver of the performance of the provision in the future or a waiver of the provision in its entirety.

6.5 If any provision of this Agreement is found void and unenforceable, such provision will be replaced to the extent possible with a provision that comes closest to the meaning of the original provision and which is enforceable. The terms and conditions set forth in this Agreement shall apply notwithstanding any failure of essential purpose of this Agreement or any limited remedy to the maximum extent possible under law.
9 changes: 9 additions & 0 deletions Changelog.md
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## Unreleased
### Added
- Initial version of the documentation
85 changes: 85 additions & 0 deletions GeneralIntroduction.md
# General Introduction

The purpose of NMF is to learn *parts-based representations* of data, which is achieved by separating the data into a set of **dictionary elements** and corresponding **activations** (see [1]_).
Both the dictionary elements and their activations are required to be *non-negative*, such that the induced superposition of dictionary elements (weighted with their corresponding activation terms) reconstructs the data in a purely *additive* way.
This has the effect that characteristic features emerging in the dictionary during the learning process must correspond to meaningful parts of the data, since each individual feature can only be added to the reconstruction of the data but never subtracted from it.

## Notation
TODO: add section

## Non-Negative Matrix Factorization

In its simplest variant, the NMF task can be formulated as a pure matrix factorization problem, where the data is represented by a non-negative matrix :math:`V \in \mathbb{R}_{\geq 0}^{S \times D}` that is to be approximated through a product of a non-negative dictionary matrix :math:`W \in \mathbb{R}_{\geq 0}^{K \times D}` and a non-negative activation matrix :math:`H \in \mathbb{R}_{\geq 0}^{S \times K}`,

.. math::
    :label: nmf

    V \approx H W.

.. note::
    In contrast to most of the NMF literature, we represent individual data points by row vectors instead of column vectors in order to be consistent with the row-major data representation used in the code.

By defining an appropriate divergence measure :math:`D`, the factorization task can be translated into a proper optimization problem of the following form (see [2]_),

.. math::
    \min_{W, H} D(V \mid H W) \quad \text{subject to} \quad W \geq 0, H \geq 0.

A common choice is the *Frobenius norm*, which measures the quadratic difference between the data and its reconstruction,

.. math::
    D(V \mid R) = \lVert V - R \rVert_F = \sqrt{ \sum_{s=1}^S \sum_{d=1}^D \lvert V_{sd} - R_{sd} \rvert^2 }.
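As a concrete illustration of this optimization problem (a generic textbook sketch, not the implementation provided by this package), the classical multiplicative updates of Lee and Seung [1]_ minimize the Frobenius objective while automatically preserving non-negativity. The sketch below uses the row-major convention :math:`V \approx H W` from above:

```python
import numpy as np

def nmf_frobenius(V, K, n_iter=500, eps=1e-9, seed=0):
    """Multiplicative-update NMF (Lee & Seung) for V ≈ H @ W.

    Row-major convention as in the text: V has shape (S, D), the
    dictionary W has shape (K, D), the activations H have shape (S, K).
    """
    rng = np.random.default_rng(seed)
    S, D = V.shape
    H = rng.random((S, K))
    W = rng.random((K, D))
    for _ in range(n_iter):
        # Activation update: H <- H * (V W^T) / (H W W^T)
        H *= (V @ W.T) / (H @ W @ W.T + eps)
        # Dictionary update: W <- W * (H^T V) / (H^T H W)
        W *= (H.T @ V) / (H.T @ H @ W + eps)
    return W, H
```

Because each update multiplies the current factors by a non-negative ratio, non-negativity of :math:`W` and :math:`H` is preserved by construction, and the Frobenius objective is non-increasing under these updates [1]_.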


## Sparse Coding
TODO: add section

## Transform Invariance

Abstractly speaking, the dictionary matrix :math:`W` in Equation :eq:`nmf` contains :math:`K` *characteristic features* represented through its row vectors :math:`\lbrace W_k \rbrace`, which are superimposed via the corresponding activation vector :math:`H_s` to form the input sample :math:`V_s`,

.. math::
    :label: nmf_synthesis

    V_{s} \approx \sum_k H_{sk} W_{k}.


As can be seen from the above equation, the individual dictionary elements :math:`\lbrace W_k \rbrace` have the same size as the samples :math:`\lbrace V_s \rbrace`.
In many applications, however, typical features contained in the data are smaller than the individual samples and exhibit certain kinds of *transform invariance*.
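To make the indexing concrete, the per-sample synthesis in Equation :eq:`nmf_synthesis` simply reproduces one row of the matrix product :math:`H W`. A minimal NumPy check (illustrative only, not part of the package):

```python
import numpy as np

rng = np.random.default_rng(0)
S, K, D = 4, 2, 5
W = rng.random((K, D))  # dictionary elements W_k as rows
H = rng.random((S, K))  # activations H_s as rows
V = H @ W               # reconstruction of all samples at once

# Synthesis of a single sample s as a weighted sum of dictionary rows:
s = 1
v_s = sum(H[s, k] * W[k] for k in range(K))
assert np.allclose(v_s, V[s])  # identical to row s of H @ W
```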

For example, image data is typically composed of smaller constituents, which represent different parts of objects and can contribute to the image at all possible locations on the pixel grid.
This particular degree of freedom stems from the simple fact that objects can usually move freely within a scene and can hence appear at different locations in the recorded image, rendering the characteristic image features *invariant under change of location* (shift invariance).
Other types of invariances related to image data arise from additional spatial transforms of the objects depicted in the scene (such as scaling, rotation, mirroring) or changes in the lighting conditions and the measurement process (e.g. change of color or contrast).
In general, invariances can also be observed in other types of data, such as audio recordings, where each individual characteristic feature (e.g. a tone) belongs to a larger part (a chord), which in turn may occur in different timbres, in different keys, for different durations, and so on.

Instead of attempting to capture all possible instantiations of the involved dictionary elements that could be generated through their applicable transforms (which would require an exponentially large dictionary), a more data-efficient approach is to decouple the transforms from their dictionary elements and learn a *transform-invariant dictionary*.
This can be achieved by encoding the transforms explicitly into the model,

.. math::
    :label: tnmf_synthesis

    V_{s} \approx \sum_k \sum_m H_{smk} T_m[\tilde{W}_{k}].

Herein, the set of possible transforms of a given dictionary element :math:`\tilde{W}_k` (which in the following is referred to as an **elementary atom**) is described through a **transform operator** :math:`T : \mathbb{R}_{\geq 0}^L \times \lbrace 1, \ldots, M \rbrace \rightarrow \mathbb{R}_{\geq 0}^D`, which can be indexed to refer to a particular instantiation of the transform.
In the image case, for instance, :math:`T` could describe all possible shifts of a smaller image patch within an image region, with :math:`T_m` corresponding to a particular shift of the patch to a specific location on the pixel grid.
The corresponding activations are stored in an **activation tensor** :math:`H \in \mathbb{R}_{\geq 0}^{S \times M \times K}`, whose element :math:`H_{smk}` quantifies the contribution of the :math:`m`-th transform of the :math:`k`-th dictionary element to the :math:`s`-th data sample.
Note, in particular, that the sizes of :math:`V_s` and :math:`W_k` are no longer coupled in this model since the transform operator :math:`T` maps each dictionary element from a separate **latent space** :math:`\mathbb{R}_{\geq 0}^L`, whose dimensionality :math:`L` can be defined independently, to the sample space :math:`\mathbb{R}_{\geq 0}^D`.

For the data reconstruction part, the synthesis procedure in Equation :eq:`tnmf_synthesis` is, in fact, equivalent to that of Equation :eq:`nmf_synthesis` when using an extended dictionary :math:`W` that contains all possible transforms of the original elements.
However, the important difference to note is that each dictionary element of that extended dictionary would be considered an independent parameter in the latter approach whereas all transformed versions of the elements are coupled through their elementary atoms :math:`\lbrace \tilde{W}_k \rbrace` and hence need to be identified through the same shared parameters.
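The equivalence between the two synthesis procedures can be checked numerically for the simplest transform type, a one-dimensional shift. In the sketch below (all names are illustrative, not the package API), :math:`T_m` embeds an atom of length :math:`L` at offset :math:`m` into a sample of length :math:`D`:

```python
import numpy as np

# Illustrative shift transform: T_m embeds an atom of length L
# at offset m into a zero vector of length D.
L, D = 3, 8
M = D - L + 1                       # number of admissible shifts

def T(m, atom):
    """Return `atom` placed at position m inside a zero vector of length D."""
    out = np.zeros(D)
    out[m:m + L] = atom
    return out

rng = np.random.default_rng(0)
K = 2
atoms = rng.random((K, L))          # elementary atoms, one row per ~W_k
H = rng.random((M, K))              # activations H_mk for a single sample

# Transform-invariant synthesis: v = sum_k sum_m H_mk * T_m[~W_k]
v_tnmf = sum(H[m, k] * T(m, atoms[k]) for m in range(M) for k in range(K))

# Equivalent synthesis with an extended dictionary that stores every
# shifted copy of every atom as an independent element.
W_ext = np.array([T(m, atoms[k]) for k in range(K) for m in range(M)])
h_ext = np.array([H[m, k] for k in range(K) for m in range(M)])
v_ext = h_ext @ W_ext

assert np.allclose(v_tnmf, v_ext)   # both syntheses coincide
```

The extended dictionary holds :math:`K \cdot M` independent rows, whereas the transform-invariant model couples all of them through only :math:`K \cdot L` shared atom parameters.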

TODO: add documentation on inhibition regularization term

## Multi-channel Data
TODO: add section

## References

.. [1] Lee, D.D., Seung, H.S., 2000. Algorithms for Non-negative Matrix Factorization,
in: Proceedings of the 13th International Conference on Neural Information
Processing Systems. pp. 535–541. https://doi.org/10.5555/3008751.3008829

.. [2] Févotte, C., Idier, J., 2011. Algorithms for Nonnegative Matrix Factorization
   with the β-Divergence. Neural Computation 23(9), pp. 2421–2456.
   https://doi.org/10.1162/NECO_a_00168

## Purpose of this Package
This package provides a toolset to learn invariant data representations of the form described in Equation :eq:`tnmf_synthesis` for arbitrary transform types, i.e., it can be used to find the latent dictionary "behind" the underlying transform.

In the specific case of image data and shift invariance (to mention only one of many possible combinations of data and transforms), the package makes it possible to extract a dictionary of image patches that reconstruct a given image through a specific arrangement of their shifted versions.
In this sense, it allows one to "undo" the data-generating transform operation, so that the learned dictionary encodes the input *modulo* shift.
