-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
80 recommend movies #81
Conversation
…ot entirely clear what its signature ought to be, but this seems like a decent guess
…o 80-recommend-movies
…hing". Also, DRY up the docstring so it doesn't repeat what the signature already told us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid work, glad we're getting movies now. Have a few thoughts and recommendations, but thanks for putting in the time while in Tahoe!
… against large_movie_id
…ess; replace with a constant
Implemented the requested changes. |
… the expected recommended titles came back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few additional comments, but it's basically all set.
Several days ago I responded to all requested code changes with either a SHA1 hash or "this is out-of-scope for the current PR". I don't recall hearing any pushback on "out-of-scope" arguments, but perhaps my memory is faulty. ATM on review number 2 of PR 80 I am seeing that "Sky requests changes". But honestly I have no idea what changes you require in this branch before you would approve it. Please clarify. It is my hope that we can get some edits merged down to I have made many fewer edits that I would have liked to see included. In particular I have been reluctant to mess with the 83-engine-hints branch and its series of bug fixes, based on my (incorrect) belief that 80 would soon hit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't totally know how github reviews work in the context of approvals/comments/requests, and I don't have the best reference yet of when something isn't worth making a comment for, so I'll clarify my intention. I wanted to request the DATA_DIR still be changed after the because the codebase and others on the project have expected the dataset to live in the /data directory. The rest are just observations that weren't crucial, like an unused dependency that I assumed was left over from testing.
We should definitely clarify after this week how large we expect prs to be in a week and what sorts of things merit a request for a current pr instead of pushing to another pr, but this still looks great. Thanks for all the help.
I don't exactly understand what that means, but that's cool. I invite you to submit a PR to that effect, which I will quickly approve. I agree that we do not often see project participants doing something like |
This PR puts Netflix prize data in a format where it can be conveniently queried and subsetted. Call me crazy, but I tend to think in JOINs, and I appreciate having an RDBMS sweat the memory management details for cases where not everything will conveniently fit in core. So sqlalchemy is managing some sqlite tables. They are safe to DROP, or alternatively just
rm out/movies.sqlite
and they're all gone and will be rebuilt on the next run.This PR also trains a LightFM model on (some of) the prize data, and shows how to make predictions. They are, frankly, unimpressive, but I figured we need something in the code base that demonstrates how to do it, so we can all build upon it. The tests run quickly, to support an interactive edit-run cycle.
The
etl(max_rows=101_000_000)
call ingests everything. And then it will be preserved -- we skip ETL if a table is already populated. Setting it to a smaller number makes interactive edit-debug cycles go much faster.pytest
(pipenv run test) regrettably suppresses the initial ETL progress bar, so consider instead using "pipenv run python -m unittest tests//_test.py" during that first twenty-minute run.The "give me two movies you like and I will recommend some more" is frankly still aspirational at this point, but we're not far from it. That's where
two_movies_rec_test.py
got its name.There are some design notes in
make_recommendation.py
, which may serve as the basis for future PRs. Once this merges down tomain
, it's possible we will view thelightfm_recommendation.py
module as obsolete and slated for deletion. It did offer me some inspiration for the current work.I'm happy to accept any and all comments, but please be sensitive to what is in-scope for the current PR as opposed to a subsequent PR. It would be useful to get this merged down, with light edits, this week.