Skip to content
This repository has been archived by the owner on Nov 13, 2021. It is now read-only.

this algorithm may be faster? #84

Open
ruananswer opened this issue Mar 1, 2017 · 2 comments
Open

this algorithm may be faster? #84

ruananswer opened this issue Mar 1, 2017 · 2 comments

Comments

@ruananswer
Copy link

Twitter anomaly detection contain two parts: stl decomposition and generalized esd anomaly detection.
first, stl decomposition, if we use Coarse granularity to decompose the series, the time will decrease,this just a trick, but it can help reduce the time.
second, generalized esd can easily be implemented in o(nlogn), not o(n^2) in this repository. just like this:

medianIndex <- n/2
left <- 1
right <- n
nowLength <- n
 for (i in 1L:max_outliers) {
 if(one_tail){
   p <- 1 - alpha/(n-i+1)
 } else {
   p <- 1 - alpha/(2*(n-i+1))
}

 t <- qt(p,(n-i-1L))
lambda_critical <- t*(n-i) / sqrt((n-i-1+t**2)*(n-i+1))if (left >= right) break
if (nowLength < 1) break
# remove largest
# remove the max diff   left or right
if (abs(data[[2L]][aresOrder[left]] - dataMedian) > abs(data[[2L]][aresOrder[right]] - dataMedian)) {
  temp_max_idx <- aresOrder[left]
  left <- left + 1
  medianIndex <- medianIndex + 1
}
else {
  temp_max_idx <- aresOrder[right]
  right <- right - 1
  medianIndex <- medianIndex - 1
}
# get the R
R <- abs((data[[2L]][temp_max_idx] - dataMedian) / dataStd)

# recalculate the dataMean and dataStd
# use math sd
#dataStd <- sqrt((nowLength * (dataStd**2 + dataMean**2) - data[[2L]][temp_max_idx]**2 - (nowLength * dataMean - data[[2L]][temp_max_idx])**2/(nowLength - 1)) / (nowLength - 1))
# use statics sd
dataStd <- sqrt(((nowLength - 1) * (dataStd**2 + dataMean**2) - data[[2L]][temp_max_idx]**2 - ((nowLength - 1) * dataMean - data[[2L]][temp_max_idx])**2/(nowLength - 2)) / (nowLength - 2))
dataMean <- (dataMean * nowLength - data[[2L]][temp_max_idx]) / (nowLength - 1)
dataMedian <- data[[2L]][aresOrder[medianIndex]]
nowLength <- nowLength - 1
#record the inx
R_idx[i] <- data[[1L]][temp_max_idx]
R_score[i] <- R
if (R < lambda_critical || is.nan(dataStd)) {
  break
}
#points(temp_max_idx, data[[2L]][temp_max_idx], col = "red")

num_anoms <- i

}

for more detail please read anomalydetection
also i have a java implememtion twitter-anomalyDetection-java

@ruananswer
Copy link
Author

i have some experiments, the size of 10000 series data as input, the anomaly detection part raised 60 times compared with older.

@fedemolina
Copy link

This was added?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants