In this lab you will develop some general-purpose data structures that, with modular design, can be re-used for other labs - most notably, the Tiny Search Engine (Labs 4-5-6).
Grading will focus on CS50 coding style - including consistent formatting, selection of identifier names, and use of meaningful comments - in addition to correctness and testing.
Your C code must compile without producing any compiler warnings.
You will lose points if the compiler produces warnings when using our CS50-standard compiler flags (as shown in the bag/Makefile
).
Recall our policy about programs that crash.
If your submitted code fails to compile, or triggers a segmentation fault, you will fail all/some of our correctness tests, and lose points for correctness on those test cases. Write defensive code: each function should check its pointer parameters for NULL, and take some appropriate (safe) action. Write solid unit tests, test drivers, and use regression testing as your development proceeds.
If your submitted version has known bugs, that is, cases where it fails your own test cases, and you describe those cases in your README file, we will halve the number of points you lose for those cases. In short, it is far better for you to demonstrate you know about the bug than to submit and hope we won't find it.
👉 Accept the assignment, which will lead you to clone a starter kit that implements the bag, file, and mem modules.
👉 Add three new modules, each of which defines a different data structure.
- (25 points) set
- (25 points) counters
- (40 points) hashtable
- (10 points) overall, including git behavior
The specific data structures are defined in the sections below.
In the table below, we compare these data structures with the two we explored in class.
Most of these data structures allow you to store a collection of "items".
Both the set and hashtable are examples of an abstract data structure called a dictionary, which provide methods like insert(key, item)
and item = retrieve(key)
, where the key
allows the structure to distinguish among the items it stores.
Behavior | list | bag | set | counters | hashtable |
---|---|---|---|---|---|
stores an item | yes | yes | yes | no | yes |
uses a key | no | no | yes | yes | yes |
keeps items in order | yes | no | no | no | no |
retrieval | first item | any item | by key | by key | by key |
insertion of duplicates | allowed | allowed | error | increment count | error |
Notice that
- a list keeps items in order, but a
bag
or aset
does not. - a set and hashtable allow you to retrieve a specific item (indicated by its key) whereas a bag might return any item.
- because the bag and list don't distinguish among items they store, they can hold duplicates; the others cannot.
- the counters data structure maintains a set of counters, each identified by a key, but it stores no items. Instead, it keeps a counter for each key. Attempting to insert a duplicate key results in an increment of the counter.
A bag is an unordered collection of items. The bag starts empty, grows as the caller adds one item at a time, and shrinks as the caller extracts one item at a time. It could be empty, or could contain hundreds of items. Items are indistinguishable, so the extract function is free to return any item from the bag.
The starter kit includes our bag module, in which bag.c
implements a bag of void*
, and exports the following functions through bag.h
(see that file for more detailed documentation comments):
bag_t* bag_new(void);
void bag_insert(bag_t* bag, void* item);
void* bag_extract(bag_t* bag);
void bag_print(bag_t* bag, FILE* fp, void (*itemprint)(FILE* fp, void* item));
void bag_iterate(bag_t* bag, void* arg, void (*itemfunc)(void* arg, void* item) );
void bag_delete(bag_t* bag, void (*itemdelete)(void* item) );
A set maintains an unordered collection of (key,item) pairs; any given key can only occur in the set once. It starts out empty and grows as the caller inserts new (key,item) pairs. The caller can retrieve items by asking for their key, but cannot remove or update pairs. Items are distinguished by their key.
Your set.c
should implement a set of void*
items with char*
keys, and export exactly the following functions through set.h
(see that file for more detailed documentation comments):
set_t* set_new(void);
bool set_insert(set_t* set, const char* key, void* item);
void* set_find(set_t* set, const char* key);
void set_print(set_t* set, FILE* fp, void (*itemprint)(FILE* fp, const char* key, void* item) );
void set_iterate(set_t* set, void* arg, void (*itemfunc)(void* arg, const char* key, void* item) );
void set_delete(set_t* set, void (*itemdelete)(void* item) );
A counter set is a set of counters, each distinguished by an integer key.
It's a set - each key can only occur once in the set - and it tracks a counter for each key.
It starts empty.
Each time counters_add
is called on a given key, the corresponding counter is incremented.
The current counter value can be retrieved by asking for the relevant key.
Your counters.c
should implement a set of integer counters with int
keys (where keys must be non-negative) and export exactly the following functions through counters.h
(see that file for more detailed documentation comments):
counters_t* counters_new(void);
int counters_add(counters_t* ctrs, const int key);
int counters_get(counters_t* ctrs, const int key);
bool counters_set(counters_t* ctrs, const int key, const int count);
void counters_print(counters_t* ctrs, FILE* fp);
void counters_iterate(counters_t* ctrs, void* arg, void (*itemfunc)(void* arg, const int key, const int count));
void counters_delete(counters_t* ctrs);
A hashtable is a set of (key,item) pairs. It acts just like a set, but is far more efficient for large collections.
Your hashtable.c
should implement a set of void*
with char*
keys, and export exactly the following functions through hashtable.h
(see that file for more detailed documentation comments):
hashtable_t* hashtable_new(const int num_slots);
bool hashtable_insert(hashtable_t* ht, const char* key, void* item);
void* hashtable_find(hashtable_t* ht, const char* key);
void hashtable_print(hashtable_t* ht, FILE* fp, void (*itemprint)(FILE* fp, const char* key, void* item));
void hashtable_iterate(hashtable_t* ht, void* arg, void (*itemfunc)(void* arg, const char* key, void* item) );
void hashtable_delete(hashtable_t* ht, void (*itemdelete)(void* item) );
The starter kit provides code for the hash function.
- Your modules must implement exactly the above interface.
Do not modify those function prototypes.
Indeed, you should have no need to edit the header (
.h
) files we provide. - If you need support data types (likely
struct
types), those should be defined within your module's source file (set.c
, etc.) so they are not visible to users of the module. - If your module needs helper functions, those should be defined within that module's source file and marked
static
so they are not visible to users of the module. - Your modules must print nothing (except, of course, in the
xxx_print()
function). If you want to add debugging prints, they must be protected by something like#ifdef DEBUG
or#ifdef TEST
. (You can see some examples inbag.c
where we've protected some debugging code with#ifdef MEMTEST
, and a spot in thebag/Makefile
that controls that flag from the compiler command line.) - Your modules must have no global variables.
- Your modules must have no
main()
function; as modules, they are meant to be used by other programs (exception: unit-test code hidden by#ifdef
). - Your modules set and hashtable, like bag, store
void*
items; this type is C's way of describing a "pointer to anything".- The caller (user of your module) must pass a pointer (address of some item) to your code; your data structure holds that pointer, and later returns it to the caller in response to an 'extract' or 'find' operation.
- Your module doesn't know, or doesn't care, what kind of things the items are. Your module doesn't allocate memory for items, free memory for items, or copy items - it just tracks the pointer to the item.
- The caller is responsible for the item pointer, which must be allocated (somehow) by the caller.
The modules'
_delete
function (like a destructor) allows the caller to provide a customitemdelete
function that knows how to free any memory consumed by an item. - For this reason, the caller must avoid inserting the same item (same address) multiple times; later, the
itemdelete
method would be called multiple times on that item... which could lead to trouble.
- Both set and hashtable work with string-type keys.
When adding a new item with
set_insert()
orhashtable_insert()
, both modules make their own copy of the string - presumably in memory allocated bymalloc()
.- The module is then responsible for this memory - and later freeing it - just like any other memory it allocates. This 'copy' semantic is convenient for the caller, who need not worry about how to allocate and manage the key string after inserting it into the set or hashtable.
- You may assume that a non-NULL
key
is a proper C string; that is, it is null-terminated.
- Your code must have no memory leaks. We will check!
- You may find the mem module (provided) useful - or use the native
malloc
andfree
. - You may find valgrind useful.
- You may find the mem module (provided) useful - or use the native
- Your module must have a unit-test mechanism, either included within the module code (see, for example, the bottom of
file.c
in the file module) or as a test driver (see, for example,bagtest.c
in the bag module). - Your modules must each have a
Makefile
to compile and test the module code.- Your
Makefile
must have amake test
target that runs your unit test. - Your
Makefile
should have amake clean
target that cleans up the directory of any files created bymake
ormake test
.
- Your
- Your code should, as always, follow CS50 coding style.
Notice that the module interfaces, defined in the
.h
files we provide, follows that naming convention, preceding each module's function names with the name of the module, and an underscore (e.g.,counters_new()
).
You are encouraged to follow the style and layout of the bag
module when developing new modules.
You can also learn a lot from our binary trees example. You are welcome to copy snippets of code from this (or any other) CS50 example code as long as you add a comment indicating you've done so.
We suggest implementing the set and counters as simplified linked lists, much like we did for bag
.
Each should be an independent implementation because they differ in detail and semantics.
Your hashtable module, on the other hand, should make use of the set data structure. Indeed, your hashtable should likely be an array of pointers to sets. Allocating an array of pointers can be tricky; recall the unit about C arrays.
Linked lists were demonstrated in sorter4.c through sorter7.c, although you will need to generalize. They were also covered in CS10; see the notes.
Hashtables were also covered in CS10: notes, slides.
Each of the four subdirectories of your lab3
repo must include the following files:
-
README.md
, which describes how your program should be compiled and executed, along with an explanation of your approach to the implementation and any assumptions you made about the assignment. Seebag/README.md
for an example. -
Makefile
, with the default target building the relevant.o
file, and two additional "phony" targets:make clean
that cleans the directory of anything built by Make; andmake test
that will compile and run a test program with appropriate parameters and inputs. -
testing.out
, which is the output of runningmake test &> testing.out
inside that subdirectory. Your Makefile may implement the tests itself or run a program or script to handle all the testing. You should include all necessary test materials (C programs, bash scripts, input files) in your submission. Your Makefile and/or test program should include comments explaining your test. As an example, see thebag
module in the starter kit; that program rolls all the tests into one C program. -
.gitignore
, which should list any files you do not want git to commit, ever. Notably, this should include the name of any executable binary programs your Makefile produces, and scratch files produced by your testing programs.
Your lab3
directory must contain the following, plus any programs, scripts, or data files you need for running your tests:
.
|-- .gitignore # provided by starter kit
|-- Makefile # provided by starter kit
|-- README.md # be sure to include your name and username
|-- bag
| |-- .gitignore # provided by starter kit
| |-- Makefile # provided by starter kit
| |-- README.md # provided by starter kit
| |-- bag.c # provided by starter kit
| |-- bag.h # provided by starter kit
| |-- bagtest.c # provided by starter kit
| |-- test.names # provided by starter kit
| `-- testing.out # provided by starter kit
|-- counters
| |-- .gitignore
| |-- Makefile
| |-- README.md
| |-- counters.c
| `-- counters.h # provided by starter kit
|-- hashtable
| |-- .gitignore
| |-- Makefile
| |-- README.md
| |-- hash.c # provided by starter kit
| |-- hash.h # provided by starter kit
| |-- hashtable.c
| `-- hashtable.h # provided by starter kit
|-- lib
| |-- Makefile # provided by starter kit
| |-- README.md # provided by starter kit
| |-- file.c # provided by starter kit
| |-- file.h # provided by starter kit
| |-- mem.c # provided by starter kit
| `-- mem.h # provided by starter kit
`-- set
|-- .gitignore
|-- Makefile
|-- README.md
|-- set.c
`-- set.h # provided by starter kit
This listing is produced by the
tree
program. Neat, huh?
Please read those instructions carefully!