This repository has been archived by the owner on Jan 13, 2022. It is now read-only.

MonetDBLite removed from CRAN #38

Open
KnutJaegersberg opened this issue Apr 22, 2019 · 15 comments

Comments

@KnutJaegersberg

MonetDBLite removed from CRAN - do some dependencies have check issues or something?

@vadimnazarov

I would like to know about this issue as well. My R package depends on MonetDBLite, and it's quite unfortunate to suddenly realise that people can't install it because MonetDBLite has disappeared without notice. Is there any way to help so that a solution arrives quickly?

@vadimnazarov

Any updates?

@cboettig

cboettig commented May 8, 2019

@vadimnazarov I understand that the continued changes needed to track the MonetDB codebase (while keeping step with CRAN's changing requirements for checks) have led @hannesmuehleisen and team to develop a new database package at https://github.com/cwida/duckdb which promises several improvements over MonetDBLite. Hannes can no doubt provide more details but meanwhile you might want to keep an eye on duckdb or take it for a spin! Hopefully it will be on CRAN soon.

@vadimnazarov

I see, thank you for notifying! So did I understand you correctly: there will be no MonetDB for R, but MonetDB itself will live and thrive?

@hannes
Contributor

hannes commented May 10, 2019

MonetDB itself will live on, yes. Thanks @cboettig for the explanation here.

@vadimnazarov

Got it, thank you! Can't wait for duckdb on CRAN. On a side note, will there be any workaround for using MonetDB from R? What should I do if I want to connect to an existing MonetDB database rather than use the embedded one?

@palmaresk8

I think we all need a word about this because several packages now depend on MonetDBLite and it was becoming a standard for data analysis in R (see all the examples at https://github.com/ajdamico/asdfree). In my case, I was using MonetDBLite from Python and R on Windows. I was also using MonetDBLite not only as an embedded database but also to connect to a MonetDB Server database. So you can imagine my surprise when I updated to R 3.6 and discovered that MonetDBLite was no longer on CRAN. Now I really don't know which features will remain in this new package and which will be dropped forever.

@nilescbn

nilescbn commented Jun 9, 2019

I too want to express some disappointment that MonetDBLite is going away. At the same time, I'm very appreciative of those who have the skills and dedication to work on open source projects like this. Not having those skills, I can only imagine the effort it takes to maintain MonetDBLite.

The duckdb project does look exciting. Will it be as fast as MonetDBLite?

MonetDBLite and dplyr have become my preferred method for working with a dataset that's ~1.7 GB in size (just over 3 million rows and 91 columns). Even when I just want to load the whole thing into memory, I've found nothing faster than MonetDBLite (this includes vroom, data.table's fread(), and the fst package).

And for query-like data manipulations, using dplyr with MonetDBLite on disk is faster for many things than using dplyr with the data in memory. I'm also a huge fan of data.table. It's just slightly faster than MonetDBLite for the things I do.

I installed duckdb this weekend and played around with it. It's great to see that the dplyr compatibility is already working. Yet it seems to be much slower: using the same dataset, it loads the data ~7x slower than MonetDBLite.

Again, very appreciative of the time and efforts!

@hannes
Contributor

hannes commented Jun 11, 2019

> I installed duckdb this weekend and played around with it. It's great to see that the dplyr compatibility is already working. Yet it seems to be much slower: using the same dataset, it loads the data ~7x slower than MonetDBLite.

We have not optimized the loader yet, it will happen though.

@hannes
Contributor

hannes commented Jun 13, 2019

@nilescbn @Mytherin has just pushed upgrades to the CSV loader, please try again and see if the performance issue is still present.

@nilescbn

nilescbn commented Jul 7, 2019

@hannesmuehleisen, my apologies, I didn't see the notification of your last message. I only noticed today as I was browsing for updates. I tried updating duckdb earlier, using remotes with build = FALSE, but the install failed this time (on both Windows 10 and Linux Mint). I will keep trying. To be clear, the performance issue I was having was related to loading the data into R from a MonetDBLite table (i.e. using dplyr's collect() function). I don't know whether that's connected to the CSV loader or not. Either way, I'm looking forward to trying it out.

@winston-p

I'm curious: why can't we install MonetDBLite from GitHub directly? Is that version still working/stable?

@cboettig

cboettig commented Jan 1, 2020

@winston-p you can, of course. But without it on CRAN, other users cannot publish packages to CRAN that depend on MonetDBLite. duckdb has been working well for me on Windows, Mac and Linux; looking forward to seeing it on CRAN.

@winston-p

@cboettig I see, thanks for explaining!

@nilescbn

nilescbn commented Feb 2, 2020

I was able to get duckdb installed again thanks to the CRAN-like repo: duckdb/duckdb#392.

Thank you for creating that @hannesmuehleisen.

Comparing speeds again, I'm seeing duckdb close the gap somewhat, yet MonetDBLite is still ~2x faster at completing queries. For others who have similar questions about speed differences, these two issues may be of interest:

duckdb/duckdb#407

duckdb/duckdb#11
