Duplicate row.names error message #6

Open
jackgisby opened this issue Jul 26, 2020 · 3 comments

@jackgisby

jackgisby commented Jul 26, 2020

Originally posted as a Stack Overflow question.

While attempting to run consensus clustering with M3C, I get the error below. This is my console output (the actual row names have been changed for the example):

running consensus cluster algorithm for real data...
done.
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘ABCDEF’, ‘ABCDGH’ 
> traceback()
6: stop("duplicate 'row.names' are not allowed")
5: `.rowNamesDF<-`(x, value = value)
4: `row.names<-.data.frame`(`*tmp*`, value = newerdes$ID)
3: `row.names<-`(`*tmp*`, value = newerdes$ID)
2: M3Creal(as.matrix(mydata), maxK = maxK, reps = repsreal, pItem = pItem, 
       pFeature = 1, clusterAlg = clusteralg, distance = distance, 
       title = "/home/christopher/Desktop/", des = des, lthick = lthick, 
       dotsize = dotsize, x1 = pacx1, x2 = pacx2, seed = seed, removeplots = removeplots, 
       silent = silent, fsize = fsize, method = method, objective = objective)
1: M3C::M3C(dissADJ, iters = 25, repsref = 1, repsreal = 100, clusteralg = "hc", 
       objective = "PAC", cores = 3)

I ran the equivalent of the following using M3C:

df_wide_matrix  # my expression matrix
any(duplicated(colnames(df_wide_matrix)))  # result = FALSE

M3C::M3C(df_wide_matrix, iters=2, repsref=2, repsreal=2, clusteralg="hc", objective="PAC")

I assumed the issue was caused by the first four characters of these two feature names being identical ("ABCD"). I therefore temporarily renamed them before running M3C:

# rename the two suspected names on the same matrix that is passed to M3C
dup_ids <- which(colnames(df_wide_matrix) %in% c("ABCDEF", "ABCDGH"))
colnames(df_wide_matrix)[dup_ids] <- c("A", "B")

M3C::M3C(df_wide_matrix, iters=2, repsref=2, repsreal=2, clusteralg="hc", objective="PAC")

M3C then runs correctly. This works as a workaround, but I was wondering whether I have missed something or whether this is a bug.
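For anyone wanting to apply the same temporary rename without losing the original names, here is a minimal sketch. It only encodes the hypothesis from this post (that names sharing their first four characters clash), which has not been confirmed as M3C's actual behaviour. The toy matrix, the `n001`-style placeholder names, and the `name_map` helper are all made up for illustration.

```r
# Hypothetical stand-in for the expression matrix in the post.
df_wide_matrix <- matrix(
  1:12, nrow = 3,
  dimnames = list(paste0("f", 1:3), c("ABCDEF", "ABCDGH", "WXYZ01", "QRST02"))
)

# Keep the original names so they can be restored after clustering.
original_names <- colnames(df_wide_matrix)

# Assumption from the post: names sharing their first four characters clash.
prefix <- substr(original_names, 1, 4)
clashing <- prefix %in% prefix[duplicated(prefix)]

# Swap the suspect names for placeholders that start with a letter.
temp_names <- original_names
temp_names[clashing] <- sprintf("n%03d", seq_len(sum(clashing)))
name_map <- setNames(original_names, temp_names)
colnames(df_wide_matrix) <- temp_names

# ... run M3C::M3C(df_wide_matrix, ...) here ...

# Map the placeholders back to the original names afterwards.
colnames(df_wide_matrix) <- unname(name_map[colnames(df_wide_matrix)])
```

The placeholders deliberately start with a letter, so that converting the matrix to a data frame does not alter them.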

@hamidghaedi

Working on TCGA data, I am getting the same error:

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed

After trying what @jackgisby suggested (changing the sample IDs so that each one begins with a unique character string), the error changed to:

Error in `[.data.frame`(df, neworder2) : undefined columns selected
> traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(df, neworder2)
3: df[neworder2]
2: M3Creal(as.matrix(mydata), maxK = maxK, reps = repsreal, pItem = pItem, 
       pFeature = 1, clusterAlg = clusteralg, distance = distance, 
       title = "/home/christopher/Desktop/", des = des, lthick = lthick, 
       dotsize = dotsize, x1 = pacx1, x2 = pacx2, seed = seed, removeplots = removeplots, 
       silent = silent, fsize = fsize, method = method, objective = objective)
1: M3C(pro.vst, des = clin, removeplots = FALSE, iters = 25, objective = "PAC", 
       fsize = 8, lthick = 1, dotsize = 1.25)

@CamGriffiths

CamGriffiths commented Dec 10, 2020

I got the same error as @hamidghaedi while running M3C. I managed to track it down to the following line of code (line 476 of the M3C.R file):

df <- data.frame(m_matrix)

Many of my sample names (column names) started with a digit. By default, data.frame() makes column names syntactically valid (check.names = TRUE), so it added an "X" to the beginning of each of those names ("1" becomes "X1"). This caused a mismatch with the names listed in neworder2.

To get around this problem, I changed all of my sample names to start with a letter and M3C is now running correctly.

Edit: This workaround can be easily applied by using the data.frame() function on your input dataset before running M3C.
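A small, self-contained sketch of the renaming described above, using base R only. The toy matrix `m` and the variable names are made up for illustration. The last step is one possible way to apply the workaround while keeping the input as a matrix, not something taken from the M3C source.

```r
# Toy matrix whose column (sample) names partly start with a digit.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("g1", "g2"), c("1", "2a", "s3")))

# data.frame() passes names through make.names() by default
# (check.names = TRUE), so names starting with a digit gain an "X" prefix.
df_checked <- data.frame(m)
colnames(df_checked)

# With check.names = FALSE the original names are kept unchanged.
df_unchecked <- data.frame(m, check.names = FALSE)
colnames(df_unchecked)

# Names that the default conversion would alter.
changed <- colnames(m)[colnames(m) != make.names(colnames(m))]

# Workaround: make the names syntactically valid before calling M3C,
# so a later data.frame() conversion leaves them as they are.
m_fixed <- m
colnames(m_fixed) <- make.names(colnames(m_fixed), unique = TRUE)
```

If a `des` annotation table is passed as well, its sample IDs would presumably need the same renaming so that they still match the column names.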

@hamidghaedi

Cool, I will try this solution.
If you don't mind, please also post your solution on the Stack Overflow entry:
https://stackoverflow.com/questions/65010759/clustering-by-m3c-package-error-in-data-framedf-neworder2-undefined-c
