GitHub - boostcampaitech6/level1-bookratingprediction-recsys-04: level1-bookratingprediction-recsys-04 created by GitHub Classroom

Summary

The goal of this project is to build an effective model for predicting consumers' book ratings from user-item interaction data and metadata. To make use of the different data types, we used several models: FFM, DCN, DeepCoNN, ROP-CNN, CNN-FM, CatBoost, and XGBoost. To combine their strengths, we ensembled them to produce the final result. An ensemble of CatBoost : DCN : others at a 7:2:1 ratio reached a Test RMSE of 2.14, the best performance, and was our final submission.

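Overview

Short-form content such as news articles or short videos can be consumed quickly, so consumers can pick it without much commitment. A book, by contrast, takes far more time to finish. Readers also have to infer the content and decide on a purchase from limited information such as the title, author, cover, and category, so they choose more carefully. To help consumers decide which books to buy, we build a model that predicts ratings from 1 to 10 using book metadata (books), user metadata (users), and the ratings users gave to books (ratings).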

Development environment

python == 3.10
pytorch == 1.12.1

conda create -n book_env python=3.10
conda activate book_env
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirement.txt

Members


김시윤_T6025	박승아_T6059	이재권_T6131	이현주_T6143	장재원_T6149	홍훈_T6188

Team composition and roles

Project procedure

Project structure

📦 level1-bookratingprediction-recsys-04
├─ 📂 code
│  ├─ 📂 data
│  │  ├─ 📂 images
│  │  ├─ books.csv
│  │  ├─ sample_submission.csv
│  │  ├─ test_ratings.csv
│  │  ├─ train_ratings.csv
│  │  └─ users.csv
│  ├─ 📂 src
│  │  ├─ 📂 data
│  │  │  ├─ __init__.py
│  │  │  ├─ context_data.py
│  │  │  ├─ dl_data.py
│  │  │  ├─ ffm_preprocessing.py
│  │  │  ├─ image_data.py
│  │  │  └─ text_data.py
│  │  ├─ 📂 ensembles
│  │  │  └─ ensembles.py
│  │  ├─ 📂 model
│  │  │  ├─ 📂 CNN_FM
│  │  │  │  ├─ CNN_FM_model.py
│  │  │  │  └─ __init__.py
│  │  │  ├─ 📂 DCN
│  │  │  │  ├─ DCN_model.py
│  │  │  │  └─ __init__.py
│  │  │  ├─ 📂 DeepCoNN
│  │  │  │  ├─ DeepCoNN_model.py
│  │  │  │  └─ __init__.py
│  │  │  ├─ 📂 ROP_CNN
│  │  │  │  ├─ ROP_CNN_model.py
│  │  │  │  └─ __init__.py
│  │  │  └─ __init__.py
│  │  ├─ 📂 train
│  │  │  ├─ __init__.py
│  │  │  └─ trainer.py
│  │  ├─ __init__.py
│  │  └─ utils.py
│  ├─ Catboost.py
│  ├─ automl.py
│  ├─ ensemble.py
│  ├─ ensembles.sh
│  ├─ evaluation.py
│  ├─ main.py
│  ├─ postpro.py
│  ├─ rmse2w.py
│  ├─ xgb_main.py
│  └─ requirement.txt
├─ 📂 scripts
│  └─ run_scripts.sh
├─ .gitignore
└─ 📝 README.md

©generated by Project Tree Generator

How to run

Individual models

FFM, DCN, CNN-FM, DeepCoNN, ROP-CNN

python main.py --model model_name

CatBoost

python Catboost.py

XGBoost

python xgb_main.py

AutoML

python automl.py

Ensemble

python ensemble.py --ensemble_files file_names

Experimental results

RMSE

Ensemble 1 : Catboost, DCN-par ensemble (weight: 0.7, 0.3)

We ensembled CatBoost and DCN-par, the two models with the strongest overall performance, and obtained better results than either model alone.
Test RMSE was 2.140 (public) and 2.133 (private), the best performance.
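
As a rough illustration of what a 0.7 / 0.3 weighted ensemble and its RMSE look like, here is a minimal sketch. It is not the code in ensemble.py; the function names rmse and weighted_blend and all numbers are made up for the example, and only the weights come from the description above.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def weighted_blend(predictions, weights):
    """Row-wise weighted average of several models' predictions."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    # zip(*predictions) yields, for each sample, one prediction per model.
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*predictions)]


# Made-up ratings and predictions, for illustration only.
y_true = [8.0, 5.0, 10.0]
catboost_pred = [7.0, 6.0, 9.0]
dcn_pred = [9.0, 4.0, 8.0]

blended = weighted_blend([catboost_pred, dcn_pred], [0.7, 0.3])
print(blended, rmse(y_true, blended))
```

A blend like this tends to help when the two models make partly different errors, which is one plausible reason a tree-based model and a deep model combine well.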

Ensemble 2 : Catboost, DCN-par, Others(XGBoost, H2OAutoML, CNN-FM, DeepCoNN, ROP-CNN) (weight: 0.7, 0.2, 0.1)

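We tried ensembling a variety of models so that they would compensate for one another's weaknesses.
We combined everything from machine-learning-based models to models that use images and text.
Test RMSE was 2.141 (public) and 2.134 (private).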
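
The three-way weighting can be sketched in the same spirit. How the "Others" group was combined internally is not stated here, so the sketch below assumes a plain equal-weight average of those models before the 0.1 weight is applied. The helper names mean_of_models and blend_three and the numbers are hypothetical.

```python
def mean_of_models(predictions):
    """Equal-weight average across models, sample by sample."""
    return [sum(row) / len(row) for row in zip(*predictions)]


def blend_three(catboost, dcn, others, weights=(0.7, 0.2, 0.1)):
    """Blend CatBoost, DCN, and the averaged 'others' group.

    Assumption: the 'others' models are averaged equally first; the
    original project may have combined them differently.
    """
    w_cat, w_dcn, w_oth = weights
    others_avg = mean_of_models(others)
    return [
        w_cat * c + w_dcn * d + w_oth * o
        for c, d, o in zip(catboost, dcn, others_avg)
    ]


# Made-up predictions for two samples, for illustration only.
catboost_pred = [7.0, 6.0]
dcn_pred = [9.0, 4.0]
other_preds = [[8.0, 5.0], [6.0, 7.0]]  # e.g. two of the "others" models

final = blend_three(catboost_pred, dcn_pred, other_preds)
print(final)
```

With only 0.1 of the total weight spread over five models, each "other" model moves the final prediction very little, which is consistent with the nearly identical scores of the two ensembles.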

Report

Wrap Up Report
