forked from sameeragarwal/blinkdb
SampleClean Retreat Summer 2014 Demo
Sanjay Krishnan edited this page May 15, 2014 · 8 revisions
Create the table
sampleclean> CREATE TABLE wikipedia (title string, text string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '#' LINES TERMINATED BY '\n';
OK
Time taken: 0.035 seconds
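The table definition above implies a simple on-disk layout: one record per line, with the title and the text separated by '#'. The Python sketch below illustrates that layout only; it is not part of SampleClean. The `parse_row` helper and the sample lines are hypothetical, and the handling of extra or missing fields (extra fields dropped, missing fields treated as NULL) is an assumption about how a two-column delimited table is typically read, not something this page states.

```python
from typing import List, Optional, Tuple

FIELD_DELIMITER = "#"  # mirrors FIELDS TERMINATED BY '#'
NUM_COLUMNS = 2        # (title string, text string)


def parse_row(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Split one line of the data file into (title, text).

    Assumption: fields beyond the second are ignored, and a missing
    field is represented as None (standing in for SQL NULL).
    """
    # LINES TERMINATED BY '\n': strip the record terminator first.
    fields: List[Optional[str]] = list(line.rstrip("\n").split(FIELD_DELIMITER))
    # Pad short rows so that both columns always exist.
    fields += [None] * (NUM_COLUMNS - len(fields))
    return fields[0], fields[1]


# Hypothetical sample lines in the expected layout.
sample_lines = [
    "Apple Inc.#Apple Inc. designs consumer electronics.\n",
    "Orphan title\n",
]

for raw in sample_lines:
    print(parse_row(raw))
```

Note that, despite the .csv extension of the demo file, the delimiter declared for the table is '#', not a comma.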
Load the data into the table
sampleclean> LOAD DATA LOCAL INPATH 'data/files/wikipedia_abstracts.csv'
OVERWRITE INTO TABLE wikipedia;
Copying data from file:/home/sanjayk/sampleclean_dev/blinkdb/data/files/wikipedia_abstracts.csv
Copying file: file:/home/sanjayk/sampleclean_dev/blinkdb/data/files/wikipedia_abstracts.csv
Loading data to table default.wikipedia
Deleted file:/user/hive/warehouse/wikipedia
OK
Time taken: 14.109 seconds
Run some queries on the dataset
sampleclean> SELECT COUNT(1) FROM wikipedia;
OK
4004479
Time taken: 8.535 seconds
Number of articles referring to "Apple"
sampleclean> SELECT COUNT(1) FROM wikipedia where lower(text) like '%apple%';
OK
11261
Time taken: 36.853 seconds
Number of articles referring to "Google"
sampleclean> SELECT COUNT(1) FROM wikipedia where lower(text) like '%google%';
OK
7400
Time taken: 33.109 seconds
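To make explicit what these two filters count, here is a minimal Python sketch of the predicate lower(text) like '%keyword%'. It treats the pattern as a case-insensitive substring test, which holds as long as the keyword itself contains no LIKE wildcards (% or _). The `count_like` function and the sample rows are hypothetical and are not drawn from the Wikipedia dataset.

```python
from typing import Iterable, Optional, Tuple

Row = Tuple[str, Optional[str]]  # (title, text); None stands in for NULL


def count_like(rows: Iterable[Row], keyword: str) -> int:
    """Count rows whose lowercased text contains the keyword.

    Rows with a NULL text are skipped, since a NULL comparison
    does not satisfy a WHERE clause.
    """
    needle = keyword.lower()
    return sum(
        1
        for _title, text in rows
        if text is not None and needle in text.lower()
    )


# Hypothetical rows, for illustration only.
rows = [
    ("Apple Inc.", "Apple Inc. designs consumer electronics."),
    ("Pineapple", "The pineapple is a tropical plant."),
    ("Google", "Google is a search company."),
    ("Stub", None),
]

print(count_like(rows, "apple"))   # counts the Apple Inc. and Pineapple rows
print(count_like(rows, "google"))  # counts the Google row
```

Because the match is on substrings, the "apple" count also includes articles that mention words such as "pineapple", so these figures are an upper bound on articles about the company rather than an exact count.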