Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download data #72

Merged
merged 9 commits into from
Feb 11, 2025
Merged

Download data #72

merged 9 commits into from
Feb 11, 2025

Conversation

JamesKohlsRepo
Copy link
Collaborator

Converted my fork into a branch called Download-Data

these are intended to be run as pipenv scripts:
pipenv run download_data
pipenv run delete_data

they can also be run from file
Updated Pipfile to specify versions.
Updated Scripts definitions.
Moved scripts to process_data repository.
Edited header comment in download_data.
Pipfile Show resolved Hide resolved
mediabridge/data_processing/download_data.py Show resolved Hide resolved
@jhanley634
Copy link
Collaborator

Hullo, James. Just checking in. This branch looks like it's nearly ready for a merge. Do my remarks mostly make sense? Is there any trouble with $ git mergetool for resolving merge conflicts against main? We are here to help -- pipe up and we can lend a hand.

Copy link
Collaborator

@skyfenton skyfenton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good, but make sure to merge changes from main and delete your other PR.

Corrected nits and did some clean-up. Simplified the download_data function with pathlib.
generate new pipfile
Manually combined pipfile and regenerated pipfile.lock, reverted requests version to try and avoid problems
tried to fix errors in pipfile
Copy link
Collaborator

@skyfenton skyfenton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heya, just a couple extra comments to help get this finished! Basically it seems like just five(ish) changes (take as many commits as you need):

  1. Modify the download_data operations to extract the tar.gz download to the same folder, delete the original file, then extract training_set.tar into training_set/.
  2. Simplify clean_up_data (honestly, probably could just remove it and use just rm).
  3. Reorder pipfile alphabetically
  4. Delete pipfile.lock and run pipenv install again to regenerate it.
  5. Pull from main and resolve any merge conflicts.

mediabridge/data_processing/download_data.py Show resolved Hide resolved
Pipfile Outdated Show resolved Hide resolved
@audiodude audiodude dismissed skyfenton’s stale review February 11, 2025 16:03

Good enough to merge as-is

@audiodude
Copy link
Collaborator

audiodude commented Feb 11, 2025

I've fixed the merge conflict with the following commands:

git checkout download-data
git merge main
code .
git add Pipfile
rm Pipfile.lock
pipenv install
git add Pipfile.lock
git commit
git push origin download-data

The code . command is where I opened Pipfile in VSCode and manually merged the changes.

Let's merge this now and work on follow ups in #85. I've copied any remain comments into that issue and resolved them.

@audiodude audiodude merged commit 6a5336c into main Feb 11, 2025
4 checks passed
@audiodude audiodude deleted the download-data branch February 11, 2025 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide script that downloads Netflix prize data and puts it in the right place
4 participants