diff --git a/joss.05664/10.21105.joss.05664.crossref.xml b/joss.05664/10.21105.joss.05664.crossref.xml new file mode 100644 index 0000000000..9517dc05ee --- /dev/null +++ b/joss.05664/10.21105.joss.05664.crossref.xml @@ -0,0 +1,238 @@ + + + + 20230825T030514-25c7c9390bb5eb927c063a5392d35eb54e671061 + 20230825030514 + + JOSS Admin + admin@theoj.org + + The Open Journal + + + + + Journal of Open Source Software + JOSS + 2475-9066 + + 10.21105/joss + https://joss.theoj.org + + + + + 08 + 2023 + + + 8 + + 88 + + + + dcTensor: An R package for discrete matrix/tensor +decomposition + + + + Koki + Tsuyuzaki + https://orcid.org/0000-0003-3797-2148 + + + + 08 + 25 + 2023 + + + 5664 + + + 10.21105/joss.05664 + + + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + http://creativecommons.org/licenses/by/4.0/ + + + + Software archive + 10.5281/zenodo.8275544 + + + GitHub review issue + https://github.com/openjournals/joss-reviews/issues/5664 + + + + 10.21105/joss.05664 + https://joss.theoj.org/papers/10.21105/joss.05664 + + + https://joss.theoj.org/papers/10.21105/joss.05664.pdf + + + + + + Binary matrix factorization with +applications + Zhang + ICDM 2007 + 10.1109/icdm.2007.99 + 2007 + Zhang, Z., Li, T., Ding, C., & +Zhang, X. (2007). Binary matrix factorization with applications. ICDM +2007, 391–400. +https://doi.org/10.1109/icdm.2007.99 + + + Probabilistic non-negative matrix +factorization with binary components + Ma + MDPI mathematics + 10.3390/math9111189 + 2021 + Ma, X., Gao, J., Liu, X., Zhang, T., +& Tang, Y. (2021). Probabilistic non-negative matrix factorization +with binary components. MDPI Mathematics, 1189. +https://doi.org/10.3390/math9111189 + + + Nonnegative matrix and tensor +factorizations + Cichocki + 2009 + Cichocki, A., Zdunek, R., Phan, A. +H., & Amari, S. (2009). Nonnegative matrix and tensor +factorizations. Wiley. 
+ + + Non-negative tensor factorization using alpha +and beta divergence + Cichocki + ICASSP ’07 + 10.1109/icassp.2007.367106 + 2007 + Cichocki, A., Zdunek, R., Choi, S., +Plemmons, R., & Amari, S. (2007). Non-negative tensor factorization +using alpha and beta divergence. ICASSP ’07, III-1393-III-1396. +https://doi.org/10.1109/icassp.2007.367106 + + + Nonnegative tucker +decomposition + Kim + IEEE CVPR + 10.1109/cvpr.2007.383405 + 2007 + Kim, Y.-D., & Choi, S. (2007). +Nonnegative tucker decomposition. IEEE CVPR, 1–8. +https://doi.org/10.1109/cvpr.2007.383405 + + + Learning the parts of objects by non-negative +matrix factorization + Lee + Nature + 401 + 10.1038/44565 + 1999 + Lee, D., & Seung, H. (1999). +Learning the parts of objects by non-negative matrix factorization. +Nature, 401, 788–791. +https://doi.org/10.1038/44565 + + + Benchmarking principal component analysis for +large-scale single-cell RNA-sequencing + Tsuyuzaki + BMC Genome Biology + 21(1) + 10.1186/s13059-019-1900-3 + 2020 + Tsuyuzaki, K., Sato, H., Sato, K., +& Nikaido, I. (2020). Benchmarking principal component analysis for +large-scale single-cell RNA-sequencing. BMC Genome Biology, 21(1), 9. +https://doi.org/10.1186/s13059-019-1900-3 + + + Extracting gene expression profiles common to +colon and pancreatic adenocarcinoma using simultaneous nonnegative +matrix factorization + Badea + Pacific Symposium on +Biocomputing + 10.1142/9789812776136_0027 + 2008 + Badea, L. (2008). Extracting gene +expression profiles common to colon and pancreatic adenocarcinoma using +simultaneous nonnegative matrix factorization. Pacific Symposium on +Biocomputing, 279–290. +https://doi.org/10.1142/9789812776136_0027 + + + Discovery of multi-dimensional modules by +integrative analysis of cancer genomic data + Zhang + Nucleic Acids Research + 40(19) + 10.1093/nar/gks725 + 2012 + Zhang, S., Liu, C.-C., Li, W., Shen, +H., Laird, P. W., & Zhou, X. J. (2012). 
Discovery of +multi-dimensional modules by integrative analysis of cancer genomic +data. Nucleic Acids Research, 40(19), 9379–9391. +https://doi.org/10.1093/nar/gks725 + + + Probabilistic latent tensor +factorization + Yilmaz + IVA/ICA 2010 + 10.1007/978-3-642-15995-4_43 + 2010 + Yilmaz, Y. K. (2010). Probabilistic +latent tensor factorization. IVA/ICA 2010, 346–353. +https://doi.org/10.1007/978-3-642-15995-4_43 + + + A non-negative matrix factorization method +for detecting modules in heterogeneous omics multi-modal +data + Yang + Bioinformatics + 32(1) + 10.1093/bioinformatics/btv544 + 2016 + Yang, Z., & Michailidis, G. +(2016). A non-negative matrix factorization method for detecting modules +in heterogeneous omics multi-modal data. Bioinformatics, 32(1), 1–8. +https://doi.org/10.1093/bioinformatics/btv544 + + + Stochastic optimization for PCA and +PLS + Arora + 2012 50th Annual Allerton Conference on +Communication, Control, and Computing (Allerton) + 2012 + Arora, R. (2012). Stochastic +optimization for PCA and PLS. 2012 50th Annual Allerton Conference on +Communication, Control, and Computing (Allerton), +861–868. + + + + + + diff --git a/joss.05664/10.21105.joss.05664.jats b/joss.05664/10.21105.joss.05664.jats new file mode 100644 index 0000000000..4cddbc53c4 --- /dev/null +++ b/joss.05664/10.21105.joss.05664.jats @@ -0,0 +1,478 @@ + + +
+ + + + +Journal of Open Source Software +JOSS + +2475-9066 + +Open Journals + + + +5664 +10.21105/joss.05664 + +dcTensor: An R package for discrete matrix/tensor +decomposition + + + +https://orcid.org/0000-0003-3797-2148 + +Tsuyuzaki +Koki + + + + + + +Department of Artificial Intelligence Medicine, Graduate +School of Medicine, Chiba University, Japan + + + + +Laboratory for Bioinformatics Research, RIKEN Center for +Biosystems Dynamics Research, Japan + + + + +27 +6 +2023 + +8 +88 +5664 + +Authors of papers retain copyright and release the +work under a Creative Commons Attribution 4.0 International License (CC +BY 4.0) +2022 +The article authors + +Authors of papers retain copyright and release the work under +a Creative Commons Attribution 4.0 International License (CC BY +4.0) + + + +R +discrete matrix factorization +discrete tensor factorization +dimension reduction + + + + + + Summary +

Matrix factorization (MF) is a widely used approach for extracting significant patterns from a data matrix. MF is formalized as the approximation of a data matrix X by the matrix product of two factor matrices U and V. Because this formalization has a large number of degrees of freedom, some constraints are imposed on the solution. Non-negative matrix factorization (NMF), which constrains the factor matrices to be non-negative, is a widely used algorithm for decomposing a non-negative data matrix. Because non-negativity makes the factors interpretable and the decomposition results can conveniently be used for clustering, NMF has many applications in image processing, audio processing, and bioinformatics (Cichocki et al., 2009).
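The formalization above can be written compactly as follows. This is only a sketch: the dimension symbols N and M are assumptions introduced here, J is assumed to denote the number of components (as the `J` argument in the example code suggests), and the transpose on V follows the reconstruction `out$U %*% t(out$V)` used in the example. The squared-error objective is one common choice and is shown for illustration only.

```latex
X \approx U V^{\top}, \qquad
X \in \mathbb{R}_{\ge 0}^{N \times M}, \quad
U \in \mathbb{R}_{\ge 0}^{N \times J}, \quad
V \in \mathbb{R}_{\ge 0}^{M \times J}

% Illustrative objective (assumed squared-error loss):
\min_{U \ge 0,\; V \ge 0} \; \lVert X - U V^{\top} \rVert_F^2
```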

+

A discrete version of NMF can also be considered by imposing a binary solution (e.g., {0,1}) on the factor matrices extracted from the data matrix; this is called binary matrix factorization (BMF) (Z. Zhang et al., 2007). BMF has recently been applied in several data science domains, to data such as market basket data, document-term data, Web click-stream data, DNA microarray expression profiles, and protein-protein complex interaction networks.

+

Although BMF is becoming more widely used, current data analysis requires further extensions. For example, we may need a ternary solution (e.g., {0,1,2}) instead of a binary one. Here, I call this ternary matrix factorization (TMF). TMF would contribute to the extraction of ordered patterns, such as stages of disease severity. It is also possible to apply the discretization to only one of the two factor matrices (U or V); here I call this semi-binary matrix factorization (SBMF) (Ma et al., 2021) or semi-ternary matrix factorization (STMF). This extension contributes to the extraction of discrete patterns from continuous-valued matrix data. Finally, there is a growing demand to extend MF to the simultaneous factorization of multiple matrices or tensors (higher-order arrays) (Cichocki et al., 2009). Such heterogeneous data sets are obtained when multiple measurements with a common data structure are performed under different experimental conditions. It is therefore very convenient if discretization is available for such heterogeneous data structures. To meet these requirements, I developed dcTensor, an R/CRAN package that performs several discrete matrix/tensor decomposition algorithms (https://cran.r-project.org/web/packages/dcTensor/index.html).
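The matrix variants named above differ only in which factor matrices are restricted to a discrete set. As a sketch, with N, M, and J as assumed dimension symbols (they do not appear in the text) and the approximation written as X ≈ U V^T to match the example code, the constraints can be summarized as:

```latex
\begin{aligned}
\text{BMF:}  \quad & U \in \{0,1\}^{N \times J},   && V \in \{0,1\}^{M \times J} \\
\text{TMF:}  \quad & U \in \{0,1,2\}^{N \times J}, && V \in \{0,1,2\}^{M \times J} \\
\text{SBMF:} \quad & U \in \{0,1\}^{N \times J},   && V \in \mathbb{R}_{\ge 0}^{M \times J} \\
\text{STMF:} \quad & U \in \{0,1,2\}^{N \times J}, && V \in \mathbb{R}_{\ge 0}^{M \times J}
\end{aligned}
```

For SBMF and STMF the roles of U and V can be swapped, since the text allows the discretization to be applied to either factor matrix.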

+
+ + Statement of need +

There are some tools for performing BMF, such as Nimfa, libmf, recosystem, and Origami.jl, but there is no implementation that performs TMF, SBMF, STMF, or extensions of MF to multiple matrices or tensors. For this reason, I implemented such discrete matrix/tensor decomposition algorithms in R, one of the most popular open-source programming languages.

+

dcTensor provides the following matrix/tensor decomposition functions:

+ + +

MF for a single data matrix

+ + +

dNMF: Discretized Non-negative Matrix Factorization (Cichocki et al., 2009; Lee & Seung, 1999)

+
+ +

dSVD: Discretized Singular Value Decomposition (Tsuyuzaki et al., 2020)

+
+
+
+ +

MF for multiple data matrices

+ + +

dsiNMF: Discretized Simultaneous Non-negative Matrix Factorization (Badea, 2008; Cichocki et al., 2009; Yilmaz, 2010; S. Zhang et al., 2012)

+
+ +

djNMF: Discretized Joint Non-negative Matrix Factorization (Cichocki et al., 2009; Yang & Michailidis, 2016)

+
+ +

dPLS: Discretized Partial Least Squares (Arora, 2012)

+
+
+
+ +

Tensor Decomposition

+ + +

dNTF: Discretized Non-negative CP Decomposition (Cichocki et al., 2007, 2009)

+
+ +

dNTD: Discretized Non-negative Tucker Decomposition (Cichocki et al., 2009; Kim & Choi, 2007)

+
+
+
+
+
+ + Example +

As a demonstration, SBMF can be performed on any machine with R installed by running the following commands in R:

+# Install the required package (once per computer)
+install.packages("dcTensor")
+
+# Load the required packages (once per R session)
+library("dcTensor")
+library("nnTensor")
+library("fields")
+
+# Load toy data
+data <- toyModel("NMF")
+
+# Perform SBMF
+set.seed(1234)
+out <- dNMF(data, Bin_U=1E+6, J=5)
+
+# Reconstruct the data matrix
+rec.data <- out$U %*% t(out$V)
+
+# Visualization
+layout(rbind(1:2, 3:4))
+image.plot(data, main="Original Data", legend.mar=8, zlim=c(0, max(data)))
+image.plot(rec.data, main="Reconstructed Data", legend.mar=8, zlim=c(0,max(data)))
+hist(out$U, breaks=100)
+hist(out$V, breaks=100)
+

Figure [fig:sbmf]: Semi-binary Matrix Factorization (SBMF).

+ +
+

The top-left panel of [fig:sbmf] shows that the demo data contain five significant block patterns. The top-right panel shows that the reconstructed data, the product of the factor matrices U and V (computed as out$U %*% t(out$V)), contain the same patterns, which indicates that the SBMF optimization has converged properly. The bottom-left panel shows that U is binary ({0,1}), whereas V is not (bottom-right panel), so the solution is semi-binary. This solution is obtained by setting a large value for the Bin_U argument of the dNMF function, which is the binary regularization parameter for U. dNMF also has a Bin_V argument, which is the binary regularization parameter for V. Setting large values for both Bin_U and Bin_V yields BMF. Likewise, ternary solutions (TMF and STMF) can be obtained with the ternary regularization parameters Ter_U and Ter_V.
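The claim that U is binary while V is not can also be checked numerically rather than by reading the histograms. The following is a minimal base-R sketch: `discreteness` is a hypothetical helper that is not part of dcTensor, and the two small matrices are made up for illustration rather than taken from the dNMF output.

```r
# Hypothetical helper (not part of dcTensor): largest distance between any
# entry of M and its nearest value in `levels`; 0 means M is exactly discrete.
discreteness <- function(M, levels = c(0, 1)) {
  # n x k matrix of |entry - level| for every entry/level pair
  d <- abs(outer(as.vector(M), levels, "-"))
  # nearest-level distance per entry, then the worst case over all entries
  max(apply(d, 1, min))
}

# Made-up factor matrices: U is exactly binary, V is continuous
U <- matrix(c(1, 0, 1, 0, 0, 1), nrow = 3)
V <- matrix(c(0.2, 1.7, 0.9, 0.4), nrow = 2)

binary_gap_U  <- discreteness(U)                        # distance from {0,1}
binary_gap_V  <- discreteness(V)                        # distance from {0,1}
ternary_gap_V <- discreteness(V, levels = c(0, 1, 2))   # distance from {0,1,2}
```

Applied to the example above, `discreteness(out$U)` would be expected to be close to 0 and `discreteness(out$V)` noticeably larger when only Bin_U is large. Following the description in the text, a call such as `dNMF(data, Bin_U=1E+6, Bin_V=1E+6, J=5)` would target BMF and `dNMF(data, Ter_U=1E+6, J=5)` would target STMF; the value 1E+6 simply mirrors the example and is an assumption, not a recommended setting.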

+
+ Zhang, Z., Li, T., Ding, C., & Zhang, X. (2007). Binary matrix factorization with applications. ICDM 2007, 391–400. https://doi.org/10.1109/icdm.2007.99
+ Ma, X., Gao, J., Liu, X., Zhang, T., & Tang, Y. (2021). Probabilistic non-negative matrix factorization with binary components. MDPI Mathematics, 1189. https://doi.org/10.3390/math9111189
+ Cichocki, A., Zdunek, R., Phan, A. H., & Amari, S. (2009). Nonnegative matrix and tensor factorizations. Wiley.
+ Cichocki, A., Zdunek, R., Choi, S., Plemmons, R., & Amari, S. (2007). Non-negative tensor factorization using alpha and beta divergence. ICASSP ’07, III-1393-III-1396. https://doi.org/10.1109/icassp.2007.367106
+ Kim, Y.-D., & Choi, S. (2007). Nonnegative Tucker decomposition. IEEE CVPR, 1–8. https://doi.org/10.1109/cvpr.2007.383405
+ Lee, D., & Seung, H. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788–791. https://doi.org/10.1038/44565
+ Tsuyuzaki, K., Sato, H., Sato, K., & Nikaido, I. (2020). Benchmarking principal component analysis for large-scale single-cell RNA-sequencing. BMC Genome Biology, 21(1), 9. https://doi.org/10.1186/s13059-019-1900-3
+ Badea, L. (2008). Extracting gene expression profiles common to colon and pancreatic adenocarcinoma using simultaneous nonnegative matrix factorization. Pacific Symposium on Biocomputing, 279–290. https://doi.org/10.1142/9789812776136_0027
+ Zhang, S., Liu, C.-C., Li, W., Shen, H., Laird, P. W., & Zhou, X. J. (2012). Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Research, 40(19), 9379–9391. https://doi.org/10.1093/nar/gks725
+ Yilmaz, Y. K. (2010). Probabilistic latent tensor factorization. IVA/ICA 2010, 346–353. https://doi.org/10.1007/978-3-642-15995-4_43
+ Yang, Z., & Michailidis, G. (2016). A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data. Bioinformatics, 32(1), 1–8. https://doi.org/10.1093/bioinformatics/btv544
+ Arora, R. (2012). Stochastic optimization for PCA and PLS. 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 861–868.
diff --git a/joss.05664/10.21105.joss.05664.pdf b/joss.05664/10.21105.joss.05664.pdf new file mode 100644 index 0000000000..71266d48f6 Binary files /dev/null and b/joss.05664/10.21105.joss.05664.pdf differ diff --git a/joss.05664/media/figure.png b/joss.05664/media/figure.png new file mode 100644 index 0000000000..eeed097b6a Binary files /dev/null and b/joss.05664/media/figure.png differ