Study_Datascience

1. Linear Regression

SST (total sum of squares) = SSR (explained, sum of squares regression) + SSE (unexplained, sum of squares error)
OLS (ordinary least squares) model - picks the line with the minimum SSE [lowest error]
R-squared = SSR/SST (measures how much of the total variability of the dataset the regression explains; ranges from 0 to 1)
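
A minimal sketch of the identities above, assuming a single predictor and the closed-form OLS solution. The data values and the helper names `ols_fit` and `sums_of_squares` are made up for illustration.

```python
def ols_fit(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (closed form)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def sums_of_squares(x, y, b0, b1):
    """Return (SST, SSR, SSE) for the fitted line."""
    y_mean = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_mean) ** 2 for yi in y)               # total
    ssr = sum((yh - y_mean) ** 2 for yh in y_hat)           # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
    return sst, ssr, sse


# Illustrative data only (not from a real dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)
sst, ssr, sse = sums_of_squares(x, y, b0, b1)
r_squared = ssr / sst
print(f"slope={b1:.3f} intercept={b0:.3f} R-squared={r_squared:.4f}")
```

For an OLS fit with an intercept, `sst` equals `ssr + sse` up to rounding, which is why R-squared stays between 0 and 1.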

2. Multiple Linear Regression

adjusted R-squared (< R-squared) measures how well your model fits the data, but it penalizes the use of variables that are meaningless for the regression
adding a variable increases R-squared but decreases adjusted R-squared ⇒ omit the variable
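
The rule above can be checked with the usual formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The sketch below uses hypothetical R-squared values to show a variable that nudges R-squared up while pulling adjusted R-squared down.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical fits on the same 30 observations.
n = 30
r2_small = 0.800   # model with 2 predictors
r2_big = 0.802     # same model plus one extra predictor

adj_small = adjusted_r_squared(r2_small, n, 2)
adj_big = adjusted_r_squared(r2_big, n, 3)

print(f"2 predictors: R2={r2_small:.3f} adjusted={adj_small:.4f}")
print(f"3 predictors: R2={r2_big:.3f} adjusted={adj_big:.4f}")
if adj_big < adj_small:
    print("adjusted R-squared dropped: omit the extra variable")
```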

3. Logistic Regression

-> Logistic regression predicts the probability of an event occurring

Maximum likelihood estimation (MLE) : picks the model parameters that maximize the likelihood, i.e. how likely it is that the model at hand describes the real underlying relationship of the variables
LL-null (log likelihood-null) : the log-likelihood of a model which has no independent variables
LLR (log likelihood ratio) : measures if our model is statistically different from the LL-null model
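
A rough sketch of these three quantities for a one-variable logistic regression. Everything here is an assumption for illustration: the toy data, the plain gradient-ascent fit (libraries typically use faster solvers), and the common form of the ratio statistic, 2 * (LL - LL-null).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(probs, y):
    """Bernoulli log-likelihood of the labels y under predicted probabilities."""
    return sum(yi * math.log(p) + (1 - yi) * math.log(1 - p)
               for p, yi in zip(probs, y))


def fit_logistic(x, y, learning_rate=0.1, n_steps=5000):
    """Approximate the MLE of (b0, b1) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_steps):
        probs = [sigmoid(b0 + b1 * xi) for xi in x]
        grad_b0 = sum(yi - p for p, yi in zip(probs, y)) / n
        grad_b1 = sum((yi - p) * xi for p, xi, yi in zip(probs, x, y)) / n
        b0 += learning_rate * grad_b0
        b1 += learning_rate * grad_b1
    return b0, b1


# Toy data (not perfectly separable, so the MLE is finite).
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)
ll_model = log_likelihood([sigmoid(b0 + b1 * xi) for xi in x], y)

# Null model: no independent variables, so every prediction is the base rate.
base_rate = sum(y) / len(y)
ll_null = log_likelihood([base_rate] * len(y), y)

llr = 2.0 * (ll_model - ll_null)
print(f"LL={ll_model:.4f} LL-null={ll_null:.4f} LLR statistic={llr:.4f}")
```

A larger statistic means the fitted model explains the outcomes better than the base rate alone.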

4. k-means clustering

multivariate statistical technique that groups observations on the basis of some of the features or variables they are described by

→ maximize the similarity of observations within a cluster and maximize the dissimilarity between clusters

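1. choose the number of clusters [The elbow method: pick the point where adding clusters stops reducing WCSS sharply]
  * minimize the distance between points in a cluster (low WCSS, the within-cluster sum of squares)
2. specify the cluster seeds
3. assign each point to the nearest centroid
4. adjust the centroids, then repeat steps 3 and 4 until the centroids stop moving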
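
The steps above can be sketched in plain Python. This is an illustrative version, not a tuned implementation: the seeds are passed in by hand, and the points and the helper names (`assign`, `k_means`) are made up for the example.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centroids):
    """Step 3: index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]


def k_means(points, seeds, max_iter=100):
    centroids = [tuple(s) for s in seeds]            # step 2: cluster seeds
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(len(centroids)):              # step 4: adjust centroids
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])   # keep an empty cluster's seed
        if new_centroids == centroids:               # converged
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    wcss = sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Two visually obvious groups (illustrative data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

_, _, wcss_k1 = k_means(points, [points[0]])
centroids, labels, wcss_k2 = k_means(points, [points[0], points[3]])
print("labels:", labels)
print(f"WCSS k=1: {wcss_k1:.2f}  WCSS k=2: {wcss_k2:.2f}")
```

Running this for several values of k and plotting WCSS against k gives the curve used by the elbow method in step 1.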

5. Hierarchical clustering

Pros and cons of the dendrogram

Pros

Hierarchical clustering shows all the possible linkages between clusters
No need to preset the number of clusters like K-means
Many methods to perform hierarchical clustering

Cons

Scalability
Computationally expensive
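
To make the trade-offs concrete, here is a naive agglomerative sketch that records the merge history a dendrogram would draw. It assumes single linkage (one of the many available methods) and made-up points. The nested loops rescan every pair of clusters on each merge, which is where the computational cost comes from.

```python
def euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5


def single_linkage(points):
    """Return the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[i] for i in range(len(points))]   # start: one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters (closest members define the distance).
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (a, b)] + [merged]
    return merges


# Illustrative data: two tight pairs and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]

merges = single_linkage(points)
for left, right, dist in merges:
    print(f"merge {left} + {right} at distance {dist:.3f}")
```

No cluster count is fixed in advance: cutting the merge history at a chosen distance yields however many clusters remain unmerged at that height.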

6. Basic Neural Network

'Data - Model - Objective function - Optimization algorithm'

The objective function

: a measure of how well our model’s outputs match the targets.

two types

loss (supervised learning)
- The lower the loss, the more closely the model's outputs match the targets
reward (reinforcement learning)
- The higher the reward, the better the model is performing
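
A small sketch of the four ingredients (data, model, objective function, optimization algorithm) for the supervised case. The choices here are assumptions for illustration: a one-weight linear model, mean squared error as the loss, and plain gradient descent as the optimizer.

```python
# Data: inputs and targets generated by the rule y = 2 * x (illustrative).
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]


# Model: a single weight, no bias.
def model(w, x):
    return w * x


# Objective function: mean squared error between outputs and targets.
def loss(w):
    return sum((model(w, x) - t) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)


def loss_gradient(w):
    """Derivative of the loss with respect to w."""
    return sum(2.0 * (model(w, x) - t) * x
               for x, t in zip(inputs, targets)) / len(inputs)


# Optimization algorithm: gradient descent.
w = 0.0
learning_rate = 0.05
loss_history = [loss(w)]
for _ in range(50):
    w -= learning_rate * loss_gradient(w)
    loss_history.append(loss(w))

print(f"learned w={w:.6f}")
print(f"loss went from {loss_history[0]:.4f} to {loss_history[-1]:.2e}")
```

As the loss falls, the learned weight approaches the value that generated the targets, which is the sense in which a lower loss means a better match.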
