Steps:

Steps for SIFT extraction

build_vocabulary.m

get images
extract SIFT features from images
get descriptors from extracted features
cluster the descriptors

clustering groups similar features across the images; each cluster center becomes one visual word
the set of cluster centers is the dictionary (vocabulary) of visual words.
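
The clustering step can be sketched as plain k-means. This is a minimal pure-Python illustration rather than the MATLAB/VLFeat code these notes refer to. The function names, the toy 2-D "descriptors", the seeded random initialization, and the fixed iteration count are all assumptions made for the example; real SIFT descriptors are 128-dimensional.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centers):
    """Index of the center closest to `point`."""
    distances = [squared_distance(point, c) for c in centers]
    return distances.index(min(distances))


def build_vocabulary(descriptors, vocab_size, iterations=20, seed=0):
    """Cluster descriptors with k-means; the centers are the visual words.

    Hypothetical helper: initial centers are sampled from the data with a
    fixed seed so the result is repeatable.
    """
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, vocab_size)]
    for _ in range(iterations):
        # Assignment step: group each descriptor with its nearest center.
        groups = [[] for _ in range(vocab_size)]
        for d in descriptors:
            groups[nearest_index(d, centers)].append(d)
        # Update step: move each center to the mean of its group.
        for k, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[k] = [
                    sum(m[j] for m in members) / len(members) for j in range(dim)
                ]
    return centers


# Toy data: two well-separated groups of 2-D points.
descriptors = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]
vocab = build_vocabulary(descriptors, vocab_size=2)
print(sorted(vocab))
```

On this toy data the two centers settle on the means of the two groups, which is what a visual word amounts to: a representative descriptor for a cluster of similar features.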

get_bag_of_sifts.m

extract SIFT features of the image
get the descriptor for each point
match the feature descriptors with the vocabulary of visual words (vocab.mat)
build the histogram from the matched feature descriptors

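the histogram records how often each visual word occurs in the image; each feature is counted under the visual word in the dictionary that it matches
the histogram as a whole is the representation of the image: a classifier trained on such histograms predicts the class of the image (the most frequent visual word alone does not determine the class)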

visual word -> a vector of numbers (a cluster center in descriptor space) representing a typical feature
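
The matching and histogram steps above can be sketched as follows, again in pure Python rather than MATLAB. The function names and toy values are hypothetical, and dividing by the number of features is an assumption of this sketch (it makes images with different feature counts comparable); the notes do not say whether the histogram is normalized.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocab):
    """Index of the visual word closest to the descriptor."""
    distances = [squared_distance(descriptor, word) for word in vocab]
    return distances.index(min(distances))


def bag_of_words_histogram(descriptors, vocab):
    """Count how often each visual word is the nearest match, then normalize."""
    counts = [0] * len(vocab)
    for d in descriptors:
        counts[nearest_word(d, vocab)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocab)
    # Assumed normalization: the bins sum to 1.
    return [c / total for c in counts]


# Toy vocabulary of three 2-D "visual words" and four image descriptors.
vocab = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
image_descriptors = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0), (1.0, 8.0)]
histogram = bag_of_words_histogram(image_descriptors, vocab)
print(histogram)  # [0.5, 0.25, 0.25]
```

The resulting vector has one bin per visual word and is what gets passed to the classifier.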

spatial_pyramid.m

get images
extract SIFT features from images
get descriptors from extracted features
for each extracted feature, find the nearest visual word in the already computed vocabulary (minimum distance):
    D = vl_alldist2(vocab',features);
    [~,ind] = min(D);
construct a histogram with those values.

This is the histogram of SIFT features for level 0 of the pyramid (the whole image).
Create a matrix with the total levels of the pyramid.
Each level will have a number of quadrants.
Each quadrant will be represented with a histogram of its SIFT features.
Then each level will have those histograms concatenated into a row, for the pyramid.

This results in one larger histogram.
Apply the appropriate weight to each level
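
The pyramid construction can be sketched as below, in pure Python rather than MATLAB. Several things here are assumptions, since the notes do not specify them: level l splits the image into a 2^l x 2^l grid, each feature comes with an (x, y) position and an already assigned visual-word index, and the weights follow the commonly used spatial pyramid matching scheme (level 0 gets 1/2^L, level l >= 1 gets 1/2^(L-l+1), where L is the finest level). All function and parameter names are hypothetical.

```python
def level_weight(level, max_level):
    """Assumed weighting: coarser levels count less than finer ones."""
    if level == 0:
        return 1.0 / 2 ** max_level
    return 1.0 / 2 ** (max_level - level + 1)


def spatial_pyramid(points, words, width, height, vocab_size, max_level):
    """Concatenate weighted per-cell word histograms for levels 0..max_level."""
    pyramid = []
    for level in range(max_level + 1):
        cells = 2 ** level  # cells per side at this level (assumed grid layout)
        histograms = [[0.0] * vocab_size for _ in range(cells * cells)]
        for (x, y), word in zip(points, words):
            # Map the feature position to a grid cell; clamp the border case.
            col = min(int(x * cells / width), cells - 1)
            row = min(int(y * cells / height), cells - 1)
            histograms[row * cells + col][word] += 1.0
        weight = level_weight(level, max_level)
        for hist in histograms:
            pyramid.extend(weight * value for value in hist)
    return pyramid


# Toy example: a 4x4 image, two visual words, three features, levels 0 and 1.
points = [(0.5, 0.5), (3.5, 0.5), (3.5, 3.5)]
words = [0, 1, 1]
pyramid = spatial_pyramid(points, words, width=4, height=4,
                          vocab_size=2, max_level=1)
print(len(pyramid))  # 10 = 2 words * (1 cell + 4 cells)
print(pyramid)
```

The first `vocab_size` entries are the weighted level-0 histogram of the whole image; the remaining entries are the weighted histograms of the level-1 cells in row-major order.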