Major refactorings and adaptations for new SOTorrent release
Showing 61 changed files with 346 additions and 133 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
#!/bin/bash | ||
|
||
if [ "$1" = "so-dump" ]; then | ||
7za a Badges.csv.7z Badges.csv && rm Badges.csv | ||
7za a Comments.csv.7z Comments.csv && rm Comments.csv | ||
7za a PostHistory.csv.7z PostHistory.csv && rm PostHistory.csv | ||
7za a PostLinks.csv.7z PostLinks.csv && rm PostLinks.csv | ||
7za a Posts.csv.7z Posts.csv && rm Posts.csv | ||
7za a Tags.csv.7z Tags.csv && rm Tags.csv | ||
7za a Users.csv.7z Users.csv && rm Users.csv | ||
7za a Votes.csv.7z Votes.csv && rm Votes.csv | ||
elif [ "$1" = "sotorrent" ]; then | ||
7za a PostBlockDiff.csv.7z PostBlockDiff.csv && rm PostBlockDiff.csv | ||
7za a PostVersion.csv.7z PostVersion.csv && rm PostVersion.csv | ||
7za a PostBlockVersion.csv.7z PostBlockVersion.csv && rm PostBlockVersion.csv | ||
7za a PostVersionUrl.csv.7z PostVersionUrl.csv && rm PostVersionUrl.csv | ||
7za a CommentUrl.csv.7z CommentUrl.csv && rm CommentUrl.csv | ||
7za a TitleVersion.csv.7z TitleVersion.csv && rm TitleVersion.csv | ||
7za a StackSnippetVersion.csv.7z StackSnippetVersion.csv && rm StackSnippetVersion.csv | ||
7za a PostViews.csv.7z PostViews.csv && rm PostViews.csv | ||
7za a PostTags.csv.7z PostTags.csv && rm PostTags.csv | ||
fi |
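
The two branches above repeat the same archive-then-delete step for each table. A minimal sketch of the same behavior as a loop follows; the function names `tables_for_mode` and `compress_tables` and the `ARCHIVER` variable are hypothetical and not part of this commit.

```shell
#!/bin/bash
# Hypothetical helper: prints the table names for a given mode, one per line.
tables_for_mode() {
  case "$1" in
    so-dump)
      printf '%s\n' Badges Comments PostHistory PostLinks Posts Tags Users Votes ;;
    sotorrent)
      printf '%s\n' PostBlockDiff PostVersion PostBlockVersion PostVersionUrl \
        CommentUrl TitleVersion StackSnippetVersion PostViews PostTags ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Archives each CSV and deletes it only if archiving succeeded.
# ARCHIVER is an assumption for illustration; it defaults to 7za as in the script above.
compress_tables() {
  local archiver="${ARCHIVER:-7za}"
  local tables table
  tables=$(tables_for_mode "$1") || return 1
  for table in $tables; do
    "$archiver" a "$table.csv.7z" "$table.csv" && rm "$table.csv"
  done
}
```

Under these assumptions, the script body would reduce to `compress_tables "$1"`, and an unknown mode would produce an error message instead of silently doing nothing.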
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
SELECT temp.PostId AS PostId, tags.Id AS TagId | ||
FROM `sotorrent-org.2020_03_15.Tags` tags | ||
JOIN `sotorrent-org.2020_03_15.PostTagsTemp` temp | ||
ON tags.TagName = temp.Tag; | ||
|
||
=> `sotorrent-org.2020_03_15.PostTags` |
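
The query above hard-codes the dataset version `2020_03_15` in two places. As a sketch only, the statement could be generated for a given release so that a new SOTorrent version needs a single argument change; `posttags_query` is a hypothetical name, and the project name is copied from the query above.

```shell
#!/bin/sh
# Prints the PostTags join for a given dataset version (for example 2020_03_15).
# The function name is hypothetical; the project name comes from the query above.
posttags_query() {
  dataset="sotorrent-org.$1"
  printf 'SELECT temp.PostId AS PostId, tags.Id AS TagId\n'
  printf 'FROM `%s.Tags` tags\n' "$dataset"
  printf 'JOIN `%s.PostTagsTemp` temp\n' "$dataset"
  printf 'ON tags.TagName = temp.Tag;\n'
}
```

Calling `posttags_query 2020_03_15` prints the four query lines shown above; the result would still be written to the `PostTags` table of the same dataset by whatever BigQuery client is in use.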
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
#!/bin/sh | ||
|
||
root_password="_AqUjvtv68E\$N!r]" | ||
sotorrent_password="4ar7JKS2mfgGHiDA" | ||
log_file="sotorrent.log" | ||
sotorrent_db="sotorrent20_03" | ||
|
||
# absolute path to XML and CSV files (consider MySQL's secure-file-priv option) | ||
# escape slashes in path because the string is used in a sed command | ||
data_path="F:\/Temp\/" # Cygwin | ||
#data_path="\/tmp\/" # Linux | ||
|
||
rm -f "$log_file" | ||
|
||
echo "Creating temporary PostTags table..." | tee -a "$log_file" | ||
mysql "$sotorrent_db" -u root --password="$root_password" < ./sql/create_posttags_temp.sql >> "$log_file" 2>&1 | ||
|
||
echo "Loading temporary PostTags table..." | tee -a "$log_file" | ||
sed -e"s/<PATH>/$data_path/g" ./sql/load_posttags_temp.sql > ./sql/load_posttags_temp_absolute_paths.sql | ||
echo "Reading PostTagsTemp.csv from $data_path..." | ||
mysql "$sotorrent_db" -u root --password="$root_password" < ./sql/load_posttags_temp_absolute_paths.sql >> "$log_file" 2>&1 | ||
rm ./sql/load_posttags_temp_absolute_paths.sql | ||
|
||
echo "Deleting temporary PostTags table..." | tee -a "$log_file" | ||
mysql "$sotorrent_db" -u root --password="$root_password" < ./sql/delete_posttags_temp.sql >> "$log_file" 2>&1 | ||
|
||
echo "Finished." | tee -a "$log_file" | ||
|
||
# Next step: Upload table to BigQuery and replace tags by tag ids |
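
The script escapes the slashes in `data_path` because the value is spliced into a `sed` expression that uses `/` as its delimiter. A minimal sketch of an alternative, assuming the path never contains `|`, `&`, or a backslash: choose a different delimiter so the path can stay unescaped. `render_sql` is a hypothetical helper name.

```shell
#!/bin/sh
# render_sql TEMPLATE DATA_PATH
# Replaces every <PATH> placeholder in TEMPLATE and writes the result to stdout.
# Uses '|' as the sed delimiter, so slashes in DATA_PATH need no escaping.
# Assumption: DATA_PATH contains no '|', '&', or backslash characters.
render_sql() {
  sed -e "s|<PATH>|$2|g" "$1"
}
```

For example, `render_sql ./sql/load_posttags_temp.sql "/tmp/" > ./sql/load_posttags_temp_absolute_paths.sql` would stand in for the `sed` line in the script, with `data_path` written as a plain `/tmp/`.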
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
#!/bin/sh | ||
|
||
root_password="_AqUjvtv68E\$N!r]" | ||
sotorrent_password="4ar7JKS2mfgGHiDA" | ||
log_file="sotorrent.log" | ||
sotorrent_db="sotorrent20_03" | ||
|
||
# absolute path to XML and CSV files (consider MySQL's secure-file-priv option) | ||
# escape slashes in path because the string is used in a sed command | ||
data_path="F:\/Temp\/" # Cygwin | ||
#data_path="\/tmp\/" # Linux | ||
|
||
rm -f "$log_file" | ||
|
||
echo "Loading PostTags table..." | tee -a "$log_file" | ||
dir=$(pwd) | ||
cd "$data_path" | ||
echo "Extracting PostTags.csv.7z in $data_path..." | ||
7za e "PostTags.csv.7z" | ||
cd "$dir" | ||
echo "Reading PostTags.csv from $data_path..." | ||
sed -e"s/<PATH>/$data_path/g" ./sql/load_posttags.sql | sed -e"s/<VERSION>/$version/g" > ./sql/load_posttags_absolute_paths.sql | ||
mysql "$sotorrent_db" -u root --password="$root_password" < ./sql/load_posttags_absolute_paths.sql >> "$log_file" 2>&1 | ||
rm ./sql/load_posttags_absolute_paths.sql | ||
cd "$data_path" | ||
rm "PostTags.csv" | ||
cd "$dir" | ||
|
||
echo "Finished." | tee -a "$log_file" |
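
The script above saves the working directory, changes into the data directory, and changes back, twice. A sketch of one way to avoid that bookkeeping is to run each step in a subshell; `in_dir` is a hypothetical helper, and it expects a directory path usable by `cd` (that is, without the `sed` escaping applied to `data_path`).

```shell
#!/bin/sh
# in_dir DIR COMMAND [ARGS...]
# Runs COMMAND inside DIR in a subshell; the caller's working directory is untouched.
in_dir() {
  (
    cd "$1" || exit 1
    shift
    "$@"
  )
}
```

With a hypothetical unescaped variable `data_dir`, the extraction and cleanup steps would read `in_dir "$data_dir" 7za e "PostTags.csv.7z"` and `in_dir "$data_dir" rm "PostTags.csv"`. If the directory does not exist, the command is not run at all.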
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
[ | ||
{ | ||
"mode": "REQUIRED", | ||
"name": "PostId", | ||
"type": "INTEGER" | ||
}, | ||
{ | ||
"mode": "REQUIRED", | ||
"name": "Tag", | ||
"type": "STRING" | ||
} | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
CREATE TABLE `PostTagsTemp` ( | ||
PostId INT NOT NULL, | ||
Tag VARCHAR(40) NOT NULL | ||
); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
DROP TABLE IF EXISTS `PostTagsTemp`; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
SET foreign_key_checks = 0; | ||
LOAD DATA INFILE '<PATH>PostTags.csv' INTO TABLE `PostTags` | ||
CHARACTER SET utf8mb4 | ||
FIELDS TERMINATED BY ',' | ||
OPTIONALLY ENCLOSED BY '\"' | ||
ESCAPED BY '\"' | ||
LINES TERMINATED BY '\n' | ||
IGNORE 1 LINES | ||
(PostId, TagId); | ||
SET foreign_key_checks = 1; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
SET foreign_key_checks = 0; | ||
LOAD DATA INFILE '<PATH>PostTagsTemp.csv' INTO TABLE `PostTagsTemp` | ||
CHARACTER SET utf8mb4 | ||
FIELDS TERMINATED BY ',' | ||
OPTIONALLY ENCLOSED BY '\"' | ||
ESCAPED BY '\"' | ||
LINES TERMINATED BY '\n' | ||
IGNORE 1 LINES | ||
(PostId, Tag); | ||
SET foreign_key_checks = 1; |
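
Both load statements use `IGNORE 1 LINES`, which skips the first line unconditionally, so a CSV exported without a header row would silently lose one data row. A minimal sketch of a pre-load check follows; `check_header` is a hypothetical helper, and the header strings `PostId,Tag` and `PostId,TagId` are assumptions based on the column lists above, not confirmed by the commit.

```shell
#!/bin/sh
# check_header FILE EXPECTED
# Succeeds only if the first line of FILE equals EXPECTED (carriage returns are ignored).
check_header() {
  header=$(head -n 1 "$1" | tr -d '\r')
  [ "$header" = "$2" ]
}
```

A load script could then guard the import with something like `check_header "PostTags.csv" "PostId,TagId" || exit 1` before calling `mysql`.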