The word frequency utility.
freq counts the words in a document and displays their frequencies.
Here are the first few lines of output from freq when run over the lyrics to Daft Punk's Technologic:
$ freq technologic.txt | head
quick: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
technologic: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
erase: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
fix: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
trash: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
change: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
mail: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
upgrade: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
charge: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
point: ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
NOTE: The example above excludes the word "it".
Your mission, if you wish to accept it, is to write freq based on the following requirements:
- Read an input file (either specified by command-line arguments or from STDIN)
- Scrape out all words containing only letters in the English alphabet
- For simplicity's sake, convert all scraped words to lowercase
- Count the frequency of each word found
- Display the word followed by a bar representing how many times that word appears in the document
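
As a starting point, the requirements above could be sketched in Python roughly as follows. This is one possible shape, not a reference solution: the function names (`read_input`, `count_words`, `render`) are made up for illustration, the word extraction is a plain letters-only regular expression that does not treat any punctuation specially, and the bar simply repeats one block character per occurrence.

```python
import re
import sys
from collections import Counter


def read_input(paths):
    """Return the text of the first path given, or all of STDIN if none is given."""
    if paths:
        with open(paths[0], encoding="utf-8") as handle:
            return handle.read()
    return sys.stdin.read()


def count_words(text):
    """Count runs of English letters, lowercased."""
    words = re.findall(r"[A-Za-z]+", text)
    return Counter(word.lower() for word in words)


def render(counts, bar="▇"):
    """Return one 'word: bar' line per word, in first-seen order."""
    return [f"{word}: {bar * n}" for word, n in counts.items()]


def main(argv):
    # argv[0] is the program name; anything after it is treated as a file path.
    for line in render(count_words(read_input(argv[1:]))):
        print(line)
```

Calling `main(sys.argv)` from an `if __name__ == "__main__":` guard would turn the sketch into a script. Entries come out in the order the words first appear, since `Counter` keeps insertion order.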
Here are some edge cases you may come across:
Input | Expected Result |
---|---|
hello | hello |
well-dressed | well dressed |
Jim O'Heir | jim oheir |
Hello, world! | hello world |
<span style="float: right"> | span style float right |
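
One reading of the table above is that apostrophes are dropped (so "O'Heir" stays a single word) while every other non-letter character separates words. The sketch below follows that reading; `extract_words` is a hypothetical helper name, and treating the curly apostrophe the same as the straight one is an assumption beyond what the table shows.

```python
import re


def extract_words(text):
    """Return the lowercase, letters-only words found in text."""
    # Remove apostrophes first so "O'Heir" becomes "OHeir" rather than "O" and "Heir".
    without_apostrophes = re.sub(r"['’]", "", text)
    # Every remaining run of English letters is a word; anything else is a separator.
    return [word.lower() for word in re.findall(r"[A-Za-z]+", without_apostrophes)]


# The edge cases from the table, as input -> expected words.
cases = {
    "hello": "hello",
    "well-dressed": "well dressed",
    "Jim O'Heir": "jim oheir",
    "Hello, world!": "hello world",
    '<span style="float: right">': "span style float right",
}

for source, expected in cases.items():
    assert " ".join(extract_words(source)) == expected
```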
Got time to spare? Try implementing these features:
- Allow the user to specify a minimum word length
- Sort entries by frequency
- Allow the user to specify a maximum bar length
- If the input file is a web address, fetch the document from the web
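
The extra features above might be approached along these lines. Everything named here is an assumption made for illustration: the flags (`--min-length`, `--max-bar`, `--sort`), the helper names, and the choice to scale bars proportionally when the longest one exceeds the maximum (simply truncating each bar would be another valid reading).

```python
import argparse
from collections import Counter
from urllib.parse import urlparse
from urllib.request import urlopen


def build_parser():
    """Command-line options; the flag names are hypothetical."""
    parser = argparse.ArgumentParser(prog="freq")
    parser.add_argument("source", nargs="?", help="file path or web address (STDIN if omitted)")
    parser.add_argument("--min-length", type=int, default=1, help="skip shorter words")
    parser.add_argument("--max-bar", type=int, default=None, help="longest bar allowed")
    parser.add_argument("--sort", action="store_true", help="most frequent words first")
    return parser


def is_web_address(source):
    """Treat http:// and https:// sources as documents to fetch."""
    return urlparse(source).scheme in ("http", "https")


def fetch(url):
    """Download a document and decode it as UTF-8, replacing undecodable bytes."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def format_rows(counts, min_length=1, max_bar=None, sort=False, bar="▇"):
    """Filter, optionally sort, and scale the counts into 'word: bar' lines."""
    rows = [(word, n) for word, n in counts.items() if len(word) >= min_length]
    if sort:
        # sort() is stable, so words with equal counts keep their first-seen order.
        rows.sort(key=lambda row: row[1], reverse=True)
    longest = max((n for _, n in rows), default=0)
    lines = []
    for word, n in rows:
        if max_bar is not None and longest > max_bar:
            # Scale proportionally, but never shrink a bar to nothing.
            n = max(1, round(n * max_bar / longest))
        lines.append(f"{word}: {bar * n}")
    return lines
```

With counts of 80 for "it", 40 for "quick" and 20 for "fix", calling `format_rows(counts, min_length=3, max_bar=10, sort=True)` drops "it" and draws "quick" with 10 blocks and "fix" with 5.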
Good luck! We're all counting on you.
by Jordan Scales (http://jordanscales.com). Stevens Open Source Society, Fall 2013.