rowDuplicated() and rowAnyDuplicated() #152

Open
karoliskoncevicius opened this issue Jun 24, 2019 · 4 comments
@karoliskoncevicius

This issue is a question / feature request.

Do you think it would make sense to add functions like duplicated() and anyDuplicated() optimized to work on every row/column to this package?
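For illustration, here is a minimal base R sketch of what such functions might return. The names `rowDuplicatedSketch()` and `rowAnyDuplicatedSketch()` are hypothetical and not part of matrixStats; an optimized version would presumably avoid the per-row R-level loop.

```r
# Hypothetical helpers, NOT part of matrixStats: plain base R, no optimization.

# Logical matrix with the same shape as x; TRUE marks an element that
# repeats an earlier element in the same row.
rowDuplicatedSketch <- function(x) {
  out <- matrix(FALSE, nrow = nrow(x), ncol = ncol(x))
  for (i in seq_len(nrow(x))) {
    out[i, ] <- duplicated(x[i, ])
  }
  out
}

# Integer vector with one entry per row: the column index of the first
# duplicate in that row, or 0 if the row has no duplicates
# (same convention as anyDuplicated()).
rowAnyDuplicatedSketch <- function(x) {
  vapply(seq_len(nrow(x)), function(i) anyDuplicated(x[i, ]), integer(1))
}

x <- matrix(c(1L, 2L, 1L,
              3L, 4L, 5L,
              6L, 6L, 6L), nrow = 3, byrow = TRUE)

rowAnyDuplicatedSketch(x)  # 3 0 2
rowDuplicatedSketch(x)[3, ]  # FALSE TRUE TRUE
```

Column-wise variants would follow the same pattern on `x[, j]`.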

@MLopez-Ibanez

I was looking for this today...

@yaccos
Contributor

yaccos commented Oct 11, 2022

matrixStats is primarily intended for numerical operations on matrices, not dataframe-like operations such as duplicated(). Besides, it would only work reliably for integer matrices because double matrices suffer from floating point imprecision.

@MLopez-Ibanez

> matrixStats is primarily intended for numerical operations on matrices, not dataframe-like operations such as duplicated(). Besides, it would only work reliably for integer matrices because double matrices suffer from floating point imprecision.

It could have a tolerance parameter that defaults to sqrt(.Machine$double.eps) like all.equal(). There are many numerical operations where being able to detect duplicated vectors (or close to duplicated vectors) would be useful.
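A rough sketch of the tolerance idea, assuming an absolute tolerance for simplicity (`all.equal()` itself compares a relative difference). The names `anyNearDuplicated()` and `rowAnyNearDuplicated()` are hypothetical and do not exist in matrixStats.

```r
# Hypothetical helpers, NOT part of matrixStats.

# TRUE if any two values in v differ by at most tol.
# After sorting, the closest pair of values is always adjacent,
# so checking consecutive differences is sufficient.
# Assumption: absolute tolerance; NAs are dropped by sort().
anyNearDuplicated <- function(v, tol = sqrt(.Machine$double.eps)) {
  if (length(v) < 2L) return(FALSE)
  any(diff(sort(v)) <= tol)
}

# Row-wise wrapper: one logical per row of x.
rowAnyNearDuplicated <- function(x, tol = sqrt(.Machine$double.eps)) {
  vapply(seq_len(nrow(x)),
         function(i) anyNearDuplicated(x[i, ], tol),
         logical(1))
}

v <- c(0.1 + 0.2, 0.3, 1)
anyDuplicated(v)      # 0: exact comparison treats 0.1 + 0.2 and 0.3 as distinct
anyNearDuplicated(v)  # TRUE: the two values differ by far less than tol

m <- rbind(v, c(0.3, 1, 2))
rowAnyNearDuplicated(m)  # TRUE FALSE
```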

@karoliskoncevicius
Author

@yaccos I would not be quick to agree that duplicated() is a data.frame-like operation. Sure, it works on the rows of a data.frame, but it is also used to test whether a vector contains repeated values, and that is the use I have in mind here.

We can easily have matrices of counts or ranks. matrixStats itself has rowRanks() and rowCounts(). Checking for duplicates within rows or columns might then be necessary. Non-parametric tests such as the Mann-Whitney test are one example.
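As a small illustration of the Mann-Whitney case, the sketch below flags rows of a made-up count matrix that contain ties and uses that flag to decide whether `wilcox.test()` is asked for an exact p-value. The data and the split of columns into two groups are assumptions chosen only for the example.

```r
# Made-up counts: columns 1-3 are group A, columns 4-6 are group B.
counts <- matrix(c(3, 7, 1, 9, 4, 8,
                   2, 2, 5, 6, 5, 1), nrow = 2, byrow = TRUE)

# TRUE for rows containing at least one repeated value (a tie).
hasTies <- apply(counts, 1, anyDuplicated) > 0
hasTies  # FALSE TRUE

# Request the exact test only for rows without ties; rows with ties
# fall back to the normal approximation.
pvals <- vapply(seq_len(nrow(counts)), function(i) {
  wilcox.test(counts[i, 1:3], counts[i, 4:6], exact = !hasTies[i])$p.value
}, numeric(1))
```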
