A demo of collaborative filtering based on:
1. Latent factors
2. Implicit feedback
3. Neighbourhood
4. SLIM algorithm
To handle data with large numbers of users and items, the source data is read and processed in chunks, and both the source and target data are stored in PyTables format.
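The chunked approach can be illustrated with a minimal sketch. It uses only the Python standard library and an in-memory CSV, so the PyTables storage itself is not shown. The column names (`userId`, `itemId`, `rating`), the sample data, and the helper names `read_in_chunks` and `rating_sums_per_item` are assumptions made for illustration and are not taken from the repository.

```python
import csv
import io
from itertools import islice


def read_in_chunks(rows, chunk_size):
    """Yield successive lists of at most `chunk_size` rows from an iterator."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical stand-in for a large ratings file (userId, itemId, rating).
SAMPLE = """userId,itemId,rating
1,10,4.0
1,20,3.0
2,10,5.0
2,30,2.0
3,20,1.0
"""


def rating_sums_per_item(source, chunk_size=2):
    """Accumulate per-item rating sums and counts one chunk at a time."""
    sums, counts = {}, {}
    reader = csv.DictReader(source)
    for chunk in read_in_chunks(reader, chunk_size):
        # At most `chunk_size` rows are held in memory at this point.
        for row in chunk:
            item = int(row["itemId"])
            sums[item] = sums.get(item, 0.0) + float(row["rating"])
            counts[item] = counts.get(item, 0) + 1
    return sums, counts


sums, counts = rating_sums_per_item(io.StringIO(SAMPLE))
item_means = {item: sums[item] / counts[item] for item in sums}
```

Because only running totals are kept between chunks, peak memory depends on the chunk size and the number of distinct items, not on the number of ratings.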
To create the final set of TFRecords for model training, three intermediate arrays are derived: "interaction", "correlation", and "neighbours".
For more explanation about the three arrays, refer to the Jupyter Notebook in ./demo.
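As a rough illustration of how three such arrays could relate to each other, here is a small NumPy sketch. It assumes that "interaction" is a user-by-item rating matrix, "correlation" is the Pearson correlation between item columns, and "neighbours" holds the indices of the most correlated other items. These definitions, the toy data, and the dense in-memory layout are assumptions; the notebook in ./demo is the authoritative description.

```python
import numpy as np

# Hypothetical (user, item, rating) triples with contiguous 0-based ids.
ratings = np.array([
    [0, 0, 5.0], [0, 1, 4.0], [0, 2, 1.0],
    [1, 0, 4.0], [1, 1, 5.0], [1, 2, 2.0],
    [2, 0, 1.0], [2, 1, 2.0], [2, 2, 5.0],
])
n_users, n_items = 3, 3

# "interaction": dense user-by-item rating matrix (NaN marks a missing rating).
interaction = np.full((n_users, n_items), np.nan)
users = ratings[:, 0].astype(int)
items = ratings[:, 1].astype(int)
interaction[users, items] = ratings[:, 2]

# "correlation": Pearson correlation between item columns.
# This toy matrix is fully observed; sparse real data would need the
# correlation restricted to users who rated both items.
correlation = np.corrcoef(interaction, rowvar=False)

# "neighbours": for each item, the k other items with the highest correlation.
k = 1
masked = correlation.copy()
np.fill_diagonal(masked, -np.inf)  # an item is not its own neighbour
neighbours = np.argsort(-masked, axis=1)[:, :k]
```

A dense matrix like this does not scale to large numbers of users and items, which is presumably why the project computes correlations and neighbours in chunks with Cython programs.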
A TensorFlow model was created with the three options, each of which can be switched on or off individually.
For more details, refer to the Jupyter Notebook in ./demo.
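To give a feel for what switchable model components can look like, the sketch below computes a single rating prediction in NumPy with three boolean switches. It follows a well-known formulation that adds a latent-factor term, an implicit-feedback term, and a neighbourhood term to a baseline estimate. Whether the repository's TensorFlow model has exactly this form is an assumption, and every name here (`predict`, `use_latent`, `use_implicit`, `use_neighbourhood`, and the parameter arrays) is hypothetical.

```python
import numpy as np


def predict(u, i, R, mu, b_user, b_item, P, Q, Y, W, neighbours,
            use_latent=True, use_implicit=True, use_neighbourhood=True):
    """Predict the rating of user u for item i; each term can be switched off.

    R is a user-by-item matrix with NaN for unrated entries.
    """
    pred = mu + b_user[u] + b_item[i]  # baseline estimate, always on
    rated = np.flatnonzero(~np.isnan(R[u]))  # items user u has rated

    user_vec = np.zeros(Q.shape[1])
    if use_latent:
        user_vec += P[u]  # explicit user factors
    if use_implicit and rated.size:
        # Implicit feedback: which items were rated, not how they were rated.
        user_vec += Y[rated].sum(axis=0) / np.sqrt(rated.size)
    pred += Q[i] @ user_vec

    if use_neighbourhood:
        # Neighbours of item i that user u has actually rated.
        nbrs = np.intersect1d(neighbours[i], rated)
        if nbrs.size:
            residual = R[u, nbrs] - (mu + b_user[u] + b_item[nbrs])
            pred += (residual @ W[i, nbrs]) / np.sqrt(nbrs.size)

    return float(pred)


# Toy parameters, chosen by hand rather than learned.
R = np.array([[5.0, np.nan, 3.0], [4.0, 2.0, np.nan]])
params = dict(
    R=R, mu=3.5, b_user=np.zeros(2), b_item=np.zeros(3),
    P=np.full((2, 2), 0.1), Q=np.full((3, 2), 0.2), Y=np.full((3, 2), 0.1),
    W=np.full((3, 3), 0.5), neighbours=np.array([[1, 2], [0, 2], [0, 1]]),
)

off = dict(use_latent=False, use_implicit=False, use_neighbourhood=False)
baseline_only = predict(0, 1, **params, **off)
latent_only = predict(0, 1, **params, **{**off, "use_latent": True})
full = predict(0, 1, **params)
```

With all switches off, the prediction falls back to the baseline estimate, which makes it easy to measure how much each component contributes when it is enabled.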
The small MovieLens dataset "ml-latest-small" with ~100,000 ratings is used for this demo.
For details about the searched parameters, refer to the Jupyter Notebook in ./demo.
- Clone this repository
git clone https://github.com/rmwkwok/colabfilter.git
cd colabfilter
- (Recommended) Set up and activate a virtual environment
virtualenv venv
source venv/bin/activate
- Install requirements
pip install -r requirements.txt
- (Recommended) Add a kernel so that the created virtual environment is usable in Jupyter
python -m ipykernel install --user --name=colabfilter-venv
- Build the Cython programs (correlations and neighbours are computed by Cython programs)
python setup.py build_ext --inplace
- Open the Jupyter Notebook in ./demo with Jupyter-lab or Jupyter-notebook
- Verify that the kernel (named "colabfilter-venv") added in step 4 is there
jupyter kernelspec list
- Uninstall the kernel
jupyter kernelspec uninstall colabfilter-venv
- Deactivate the virtual environment
deactivate
- Delete the folder containing the virtual environment, and delete the cloned repository.