rowDuplicated() and rowAnyDuplicated() #152

karoliskoncevicius · 2019-06-24T23:55:54Z

This issue is a question / feature request.

Do you think it would make sense to add functions like duplicated() and anyDuplicated() optimized to work on every row/column to this package?
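For reference, the requested behaviour can be approximated in base R today. The sketch below is only an illustration: `rowDuplicated()` and `rowAnyDuplicated()` are hypothetical names taken from the issue title, not existing matrixStats functions, and the loop over rows is exactly the overhead an optimized version would avoid.

```r
# Hypothetical helpers (not part of matrixStats): plain base-R fallbacks.

# Logical matrix with the same shape as x; TRUE marks an element that
# repeats an earlier element in the same row.
rowDuplicated <- function(x) {
  res <- vapply(seq_len(nrow(x)),
                function(i) duplicated(x[i, ]),
                logical(ncol(x)))
  # vapply() stores each row's result as a column, so refill row by row.
  matrix(res, nrow = nrow(x), ncol = ncol(x), byrow = TRUE)
}

# Integer vector, one entry per row: index of the first duplicated
# element in that row, or 0 if the row has no duplicates.
rowAnyDuplicated <- function(x) {
  vapply(seq_len(nrow(x)),
         function(i) anyDuplicated(x[i, ]),
         integer(1))
}

x <- matrix(c(1L, 2L, 2L,
              3L, 4L, 5L), nrow = 2, byrow = TRUE)
rowDuplicated(x)
rowAnyDuplicated(x)
```

Column-wise versions would follow by applying the same helpers to `t(x)`.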


MLopez-Ibanez · 2022-10-11T13:46:55Z

I was looking for this today...

yaccos · 2022-10-11T13:58:43Z

matrixStats is primarily intended for numerical operations on matrices, not dataframe-like operations such as duplicated(). Besides, it would only work reliably for integer matrices because double matrices suffer from floating point imprecision.
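A small base-R illustration of the floating-point concern, using the standard `0.1 + 0.2` example (the variable names are only for illustration):

```r
a <- 0.1 + 0.2
b <- 0.3

a == b                      # exact comparison of the two doubles
duplicated(c(a, b))         # duplicated() also compares exactly
isTRUE(all.equal(a, b))     # comparison with a tolerance

duplicated(c(1L, 2L, 1L))   # integers compare exactly, so this is reliable
```

The two doubles print identically but are not exactly equal, so `duplicated()` does not flag the second one, while `all.equal()` treats them as equal.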

MLopez-Ibanez · 2022-11-20T22:38:01Z

> matrixStats is primarily intended for numerical operations on matrices, not dataframe-like operations such as duplicated(). Besides, it would only work reliably for integer matrices because double matrices suffer from floating point imprecision.

It could have a tolerance parameter that defaults to sqrt(.Machine$double.eps) like all.equal(). There are many numerical operations where being able to detect duplicated vectors (or close to duplicated vectors) would be useful.
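A minimal sketch of what such a tolerance could look like for a single vector. `duplicatedTol()` is a hypothetical name, and this version uses an absolute difference, whereas `all.equal()` compares a relative difference by default; it is quadratic in the vector length and does not treat `NA` specially.

```r
# Hypothetical helper: marks x[i] as duplicated if it lies within `tol`
# of any earlier element. Assumes an absolute tolerance and no NAs.
duplicatedTol <- function(x, tol = sqrt(.Machine$double.eps)) {
  n <- length(x)
  out <- logical(n)
  for (i in seq_len(n)[-1]) {
    out[i] <- any(abs(x[i] - x[seq_len(i - 1)]) <= tol)
  }
  out
}

v <- c(0.3, 0.1 + 0.2, 0.5)
duplicated(v)       # exact comparison misses the near-duplicate
duplicatedTol(v)    # tolerance-based comparison flags it
```

One design caveat: near-equality is not transitive (a can be close to b and b close to c without a being close to c), so the result can depend on the order of the elements.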

karoliskoncevicius · 2023-02-27T20:34:36Z

@yaccos I would not be quick to agree that duplicated() is a data.frame-like operation. Sure, it works on entries of a data.frame, but it is also used to test whether a vector contains repeated values - this use is what I have in mind here.

We can easily have matrices of counts or ranks. matrixStats itself has rowRanks() and rowCounts(). Checking for duplicates within rows/columns might then be necessary. Non-parametric tests such as the Mann-Whitney test are one example.

karoliskoncevicius mentioned this issue Jan 5, 2025

WISH: rowTabulates/colTabulates for floating-point values #274

Open
