Click rate prediction algorithm

title: README
author: Dusan Grubjesic (email: [email protected])
date: August 11, 2015
output: html_document

This is a click rate prediction algorithm built on Apache Spark and written with PySpark, Spark's Python API.

Data

The data comes from Criteo Labs and is a sample of the Kaggle Display Advertising Challenge dataset. It can be downloaded after you accept the agreement at http://labs.criteo.com/downloads/2014-kaggle-display-advertising-challenge-dataset/.

The file is structured as lines of observations: the first field of each line is the label (1 for click, 0 for no click) and the remaining fields are the features.
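
As an illustration of that layout, here is a minimal sketch of splitting one line into a label and its features. The function name `parse_observation` and the tab separator are assumptions made for this example; check the downloaded sample for the actual delimiter, and see ClickRate.py for how the project itself parses lines.

```python
def parse_observation(line, sep="\t"):
    """Split one raw line into (label, features).

    Assumes the first field is the click label (1 or 0) and that fields
    are separated by `sep`. The tab default is an assumption of this sketch.
    """
    fields = line.rstrip("\n").split(sep)
    label = int(fields[0])
    # Pair each feature value with its column index so that equal values in
    # different columns stay distinct. An empty string marks a missing value.
    features = [(i, value) for i, value in enumerate(fields[1:])]
    return label, features


label, features = parse_observation("1\t5\t\t68fd1e64")
print(label)     # 1
print(features)  # [(0, '5'), (1, ''), (2, '68fd1e64')]
```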

Before start

You must have Apache Spark and Python installed. In ClickRate.py, change the location of the sample to wherever you downloaded it, and change the Spark context if you want to run on a cluster instead of locally. The .sh file (startClickRate.sh) only makes starting simpler; if you want to use it, adapt it to your own settings.
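
One possible way to keep these two settings in one place is sketched below. It is not how ClickRate.py is necessarily written: the environment variable names `CLICKRATE_MASTER` and `CLICKRATE_DATA`, the default file name `sample.txt`, and both function names are hypothetical. `pyspark` is imported only inside `make_context`, so the settings part runs even without Spark installed.

```python
import os


def resolve_settings(env=None):
    """Return run settings, falling back to local defaults.

    CLICKRATE_MASTER and CLICKRATE_DATA are hypothetical variable names
    used only in this sketch.
    """
    env = os.environ if env is None else env
    return {
        "master": env.get("CLICKRATE_MASTER", "local[*]"),
        "data_path": env.get("CLICKRATE_DATA", "sample.txt"),  # placeholder path
    }


def make_context(settings, app_name="ClickRate"):
    """Create a SparkContext for the chosen master (requires pyspark)."""
    from pyspark import SparkConf, SparkContext  # imported lazily on purpose

    conf = SparkConf().setMaster(settings["master"]).setAppName(app_name)
    return SparkContext(conf=conf)


# Example: point the run at a cluster (the host name is a placeholder).
settings = resolve_settings({"CLICKRATE_MASTER": "spark://host:7077"})
print(settings["master"])     # spark://host:7077
print(settings["data_path"])  # sample.txt
```

With Spark installed, `sc = make_context(settings)` followed by `sc.textFile(settings["data_path"])` would load the sample as an RDD of lines.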

_{I have Apache Spark pre-built with Hadoop 2.6, Python 3.4 and the numpy package installed}

Process

The sample is first parsed and loaded into the Spark context.
The features are transformed so they can be used in logistic regression.
A model is created from the training data.
The model is checked with a set of log loss validations.
Logistic regression is run repeatedly to find the best hyperparameters.
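
The transformation step can be pictured with one-hot encoding, a common way to turn categorical features into numeric input for logistic regression. Whether ClickRate.py uses exactly this encoding is an assumption; the function names and the toy data below are made up for illustration.

```python
def build_ohe_dict(observations):
    """Map every distinct (column, value) pair to a unique index."""
    distinct = sorted({pair for obs in observations for pair in obs})
    return {pair: index for index, pair in enumerate(distinct)}


def one_hot_encode(observation, ohe_dict):
    """Return the sorted indices of the features that are 'on'.

    Pairs that were not seen when the dictionary was built are skipped.
    """
    return sorted(ohe_dict[pair] for pair in observation if pair in ohe_dict)


# Toy training data: each observation is a list of (column, value) pairs.
train = [
    [(0, "mouse"), (1, "black")],
    [(0, "cat"), (1, "tabby")],
    [(0, "mouse"), (1, "tabby")],
]
ohe_dict = build_ohe_dict(train)
print(len(ohe_dict))  # 4
print(one_hot_encode([(0, "cat"), (1, "black"), (2, "new")], ohe_dict))  # [0, 2]
```

In Spark, such index lists would typically be wrapped in a sparse vector (for example `pyspark.mllib.linalg.SparseVector`) with the value 1.0 at each active index.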

_{Additional explanations are in the code}
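
The last two steps, log loss validation and the hyperparameter search, can be sketched without Spark. The project presumably relies on Spark's own logistic regression; the example below swaps in a small pure-Python batch gradient descent so that it runs anywhere. The searched hyperparameters (step size and L2 regularisation), their values, and the toy data are all assumptions.

```python
import math


def log_loss(p, y, eps=1e-11):
    """Log loss of predicted click probability p for label y (0 or 1)."""
    p = min(max(p, eps), 1 - eps)  # keep log() finite
    return -math.log(p) if y == 1 else -math.log(1 - p)


def predict(weights, intercept, indices):
    """Click probability for one observation given its active feature indices."""
    raw = intercept + sum(weights[i] for i in indices)
    raw = min(max(raw, -20.0), 20.0)  # avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(-raw))


def train(data, num_features, step, reg, iterations=50):
    """Batch gradient descent for L2-regularised logistic regression."""
    weights, intercept = [0.0] * num_features, 0.0
    n = len(data)
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * num_features, 0.0
        for indices, y in data:
            err = predict(weights, intercept, indices) - y
            grad_b += err
            for i in indices:
                grad_w[i] += err
        weights = [w - step * (g / n + reg * w) for w, g in zip(weights, grad_w)]
        intercept -= step * grad_b / n
    return weights, intercept


def mean_log_loss(model, data):
    """Average log loss of a (weights, intercept) model over a data set."""
    weights, intercept = model
    return sum(log_loss(predict(weights, intercept, x), y) for x, y in data) / len(data)


# Toy data: (active feature indices, label). Feature 0 goes with clicks.
train_data = [([0, 2], 1), ([1, 3], 0), ([0, 3], 1), ([1, 2], 0)]
valid_data = [([0, 2], 1), ([1, 3], 0)]

# Grid search: keep the setting with the lowest validation log loss.
best = None
for step in (0.1, 1.0):
    for reg in (1e-3, 1e-1):
        loss = mean_log_loss(train(train_data, 4, step, reg), valid_data)
        if best is None or loss < best[0]:
            best = (loss, step, reg)

# Baseline: always predict the fraction of clicks seen in training.
base_rate = sum(y for _, y in train_data) / len(train_data)
baseline = sum(log_loss(base_rate, y) for _, y in valid_data) / len(valid_data)
print("best (loss, step, reg):", best)
print("baseline log loss:", baseline)
```

Comparing against the base-rate baseline is a quick sanity check: a trained model should reach a lower validation log loss than always predicting the same probability.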
