When low_loss increases, why does annualized loss decrease? #11
> When `low_loss` increases, why does annualized loss decrease?

It appears this happens for the first ~0.5% of the range, after which it starts to behave as I would expect (losses increase as the `low_loss` increases). I used the following snippet to explore it:

```python
# Imports assumed for this snippet (not shown in the original comment):
import math

import ipywidgets as widgets
import matplotlib.pyplot as plt
from IPython.display import display
from scipy.stats import lognorm, norm

out_al = widgets.Output()

def distribution(frequency, low_loss, high_loss):
    # Fit a lognormal whose 5th/95th percentiles are low_loss/high_loss.
    factor = -0.5 / norm.ppf(0.05)
    mu = (math.log(low_loss) + math.log(high_loss)) / 2.
    shape = factor * (math.log(high_loss) - math.log(low_loss))
    return lognorm(shape, scale=math.exp(mu))

def annualized_loss(frequency, low_loss, high_loss):
    # Annualized loss = event frequency * mean single-event loss.
    return frequency * distribution(frequency, low_loss, high_loss).mean()

def gen_al_curve(freq, mx):
    # Sweep low_loss from 1 up to mx (the fixed high_loss) and plot the
    # resulting annualized loss.
    r = list(range(1, mx))
    line = []
    for i in r:
        l = annualized_loss(freq, i, mx)
        # Report whenever the curve turns upward relative to the previous
        # point, i.e. once we are past the inflection.
        if len(line) > 0 and l > line[len(line) - 1]:
            print(f'inflects at {i} ({l})')
        line.append(l)
    plt.plot(line)
    with out_al:
        plt.show()
    display(out_al)

gen_al_curve(0.5, 10000)
```

Running this produces a plot that shows where the inflection happens.

But this does lead to an interesting problem. Say I was trying to model the risk posed by MITRE ATT&CK Impact T1531 (Account Access Removal). Loss of access to a single account might have an average cost of $2 or so and be very likely (say a 99% chance it will happen to at least one customer a year). But a large breach that resulted in thousands of customers losing access to their accounts could be very costly (let's say $1,000,000). If I model this as one loss scenario, I would have a frequency of 0.99, a `low_loss` of $2, and a `high_loss` of $1,000,000.
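As a rough check, a minimal sketch reusing the `annualized_loss()` helper from the snippet above, with the illustrative parameters just described:

```python
# Single-scenario model of the T1531 example: frequency 0.99,
# low_loss $2, high_loss $1,000,000. Output is approximate.
print(annualized_loss(0.99, 2, 1_000_000))  # roughly 4,000,000 -- about 4x the high_loss
```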
This doesn't seem to accurately capture the risk (in 1 year I'm getting roughly 4x the `high_loss` as the annualized loss). I know I've put a bit of a wall of text here, so to summarize:

1. Is it expected that the annualized loss decreases as `low_loss` increases (with `high_loss` held fixed)?
2. Is the very large `high_loss`/`low_loss` ratio the cause of the unrealistic annualized loss?
3. How should a scenario like T1531 be modelled, where a small loss is very likely but a large breach is rare and very costly?
Re: summary 1, yes, it's expected. The lognormal shape is proportional to the difference `log(high) - log(low)`, which is equivalent to saying it depends on the ratio `high/low`. The lognormal is by definition non-negative, and high shape values make the left side of the distribution very steep near the origin, which causes the right side of the distribution to explode out to higher values (pulling the mean, and hence the annualized loss, up with it). The Wikipedia entry for the lognormal distribution has a diagram that shows this effect (where sigma is the shape value).

Re: summary 2, yes, if you're using the lognormal you usually don't want the high/low ratio to be so large (500,000 in your example), because the distribution becomes unusually skewed, leading to the unrealistic values for annual loss that you observed. There are a few things I can suggest: first, you're treating the frequency as a probability that can't exceed 1, but it can exceed 1 when you intend to say that the event is expected to occur more than once a year on average. (Note that I had a bug that enforced `frequency <= 1`, but that's been fixed since PR #10. Sorry if that caused confusion.)

Re: summary 3, it does seem like the single-account scenario is distinct from the "large breach" situation. In that case, you could have the small-scale loss use a smaller high-end loss, while the large breach starts at a higher low loss. For example (just for illustration, I am sure you can come up with better numbers):
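A minimal sketch of that kind of split, reusing the `annualized_loss()` helper from the snippet earlier in the thread; the parameters below are assumptions for illustration, not the numbers from the original example:

```python
# Hypothetical two-scenario split of the T1531 example; all parameters here
# are illustrative assumptions, not values from the original thread.

# Routine loss of access to a single account: happens often, costs little.
single_account = annualized_loss(frequency=50, low_loss=1, high_loss=10)

# Large breach locking thousands of customers out: rare, but very costly.
large_breach = annualized_loss(frequency=0.01, low_loss=100_000, high_loss=1_000_000)

print(f'single account: ${single_account:,.0f}')   # roughly $200 a year
print(f'large breach:   ${large_breach:,.0f}')     # roughly $4,000 a year
print(f'combined:       ${single_account + large_breach:,.0f}')
```

The point of the split is that each scenario keeps a modest high/low ratio, so neither lognormal fit becomes extreme.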
Splitting it that way seems to capture your desired behavior better.
First off, great effort - really good to see quantitative approaches to InfoSec risk, thanks. This may be a stats newbie question, but consider the following `test.csv`: the `low_loss` value is $1, and the `high_loss` is $100,000. The output of `riskquant --file test.csv` shows an annualized loss of $72,200 (72% of the `high_loss`).

When we increase the `low_loss` value to $1,000 without changing the `high_loss`, the output of `riskquant --file test.csv` shows an annualized loss of only 13% of the `high_loss`, which seems counter-intuitive - I would have thought that raising the `low_loss` whilst holding all other things equal would have increased the annualized loss.

If we continue raising the `low_loss` value, we get some interesting behaviour: a still higher `low_loss` gives an annualized loss of $50,000, so it appears to go back up, but still not quite as high as when the lowest loss was $1.
What am I missing here? Is this right? If so, why?
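For what it's worth, both figures quoted above can be reproduced with the `annualized_loss()` helper from the snippet earlier in the thread, assuming a frequency of 0.5 (an assumption consistent with the $72,200 figure). The mean of the fitted lognormal is exp(mu + shape**2 / 2); raising `low_loss` increases mu but shrinks the shape term much faster, so the mean, and hence the annualized loss, initially drops:

```python
# Approximate reproduction of the two quoted annualized losses.
# frequency=0.5 is an assumption consistent with the $72,200 figure.
print(annualized_loss(0.5, 1, 100_000))      # roughly 72,200  (low_loss = $1)
print(annualized_loss(0.5, 1_000, 100_000))  # roughly 13,300  (low_loss = $1,000), ~13% of high_loss
```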