this algorithm may be faster? #84

ruananswer · 2017-03-01T07:23:40Z

Twitter anomaly detection contain two parts: stl decomposition and generalized esd anomaly detection.
first, stl decomposition, if we use Coarse granularity to decompose the series, the time will decrease，this just a trick, but it can help reduce the time.
second, generalized esd can easily be implemented in o(nlogn), not o(n^2) in this repository. just like this:

medianIndex <- n/2
left <- 1
right <- n
nowLength <- n
 for (i in 1L:max_outliers) {
 if(one_tail){
   p <- 1 - alpha/(n-i+1)
 } else {
   p <- 1 - alpha/(2*(n-i+1))
}

 t <- qt(p,(n-i-1L))
lambda_critical <- t*(n-i) / sqrt((n-i-1+t**2)*(n-i+1))if (left >= right) break
if (nowLength < 1) break
# remove largest
# remove the max diff   left or right
if (abs(data[[2L]][aresOrder[left]] - dataMedian) > abs(data[[2L]][aresOrder[right]] - dataMedian)) {
  temp_max_idx <- aresOrder[left]
  left <- left + 1
  medianIndex <- medianIndex + 1
}
else {
  temp_max_idx <- aresOrder[right]
  right <- right - 1
  medianIndex <- medianIndex - 1
}
# get the R
R <- abs((data[[2L]][temp_max_idx] - dataMedian) / dataStd)

# recalculate the dataMean and dataStd
# use math sd
#dataStd <- sqrt((nowLength * (dataStd**2 + dataMean**2) - data[[2L]][temp_max_idx]**2 - (nowLength * dataMean - data[[2L]][temp_max_idx])**2/(nowLength - 1)) / (nowLength - 1))
# use statics sd
dataStd <- sqrt(((nowLength - 1) * (dataStd**2 + dataMean**2) - data[[2L]][temp_max_idx]**2 - ((nowLength - 1) * dataMean - data[[2L]][temp_max_idx])**2/(nowLength - 2)) / (nowLength - 2))
dataMean <- (dataMean * nowLength - data[[2L]][temp_max_idx]) / (nowLength - 1)
dataMedian <- data[[2L]][aresOrder[medianIndex]]
nowLength <- nowLength - 1
#record the inx
R_idx[i] <- data[[1L]][temp_max_idx]
R_score[i] <- R
if (R < lambda_critical || is.nan(dataStd)) {
  break
}
#points(temp_max_idx, data[[2L]][temp_max_idx], col = "red")

num_anoms <- i

}

for more detail please read anomalydetection
also i have a java implememtion twitter-anomalyDetection-java

The text was updated successfully, but these errors were encountered:

ruananswer · 2017-03-01T07:26:53Z

i have some experiments, the size of 10000 series data as input, the anomaly detection part raised 60 times compared with older.

fedemolina · 2020-05-05T13:19:02Z

This was added?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

this algorithm may be faster? #84

this algorithm may be faster? #84

ruananswer commented Mar 1, 2017

ruananswer commented Mar 1, 2017

fedemolina commented May 5, 2020

this algorithm may be faster? #84

this algorithm may be faster? #84

Comments

ruananswer commented Mar 1, 2017

ruananswer commented Mar 1, 2017

fedemolina commented May 5, 2020