
When low_loss increases, why does annualized loss decrease? #11

Open
Osipion opened this issue Mar 12, 2020 · 2 comments

Comments


Osipion commented Mar 12, 2020

First off, great effort - really good to see quantitative approaches to InfoSec risk, thanks. This may be a stats newbie question, but consider the following test.csv:

A,DoS,0.5,1,100000

The low_loss value is $1, and the high_loss is $100,000. The output of riskquant --file test.csv is:

A,DoS,"$72,200"

So, the annualized loss is $72,200 (72% of the high_loss).

When we increment the low_loss value to $1000 without changing the high_loss like so:

A,DoS,0.5,1000,100000

The output of riskquant --file test.csv is:

A,DoS,"$13,300"

So now the annualized loss is only 13% of the high_loss, which seems counterintuitive - I would have thought that raising the low_loss whilst holding all other things equal would have increased the annualized loss. If we continue raising the low_loss value, we get some interesting behaviour:

A,DoS,0.5,99999,100000

Gives an annualized loss of $50,000, so it appears to go back up, but still not quite as high as when the low_loss was 1.

What am I missing here? Is this right? If so, why?
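For reference, both figures can be reproduced outside riskquant with a short scipy sketch of the same lognormal fit (the helper name `annualized_loss` here is mine for illustration, but the formulas mirror the SimpleLoss code quoted later in this thread):

```python
import math
from scipy.stats import lognorm, norm

def annualized_loss(frequency, low_loss, high_loss):
    # Fit a lognormal whose 5th/95th percentiles land near low_loss/high_loss,
    # then multiply its mean by the event frequency.
    factor = -0.5 / norm.ppf(0.05)  # ~0.304
    mu = (math.log(low_loss) + math.log(high_loss)) / 2
    shape = factor * (math.log(high_loss) - math.log(low_loss))
    return frequency * lognorm(shape, scale=math.exp(mu)).mean()

print(round(annualized_loss(0.5, 1, 100_000)))      # ≈ 72,200
print(round(annualized_loss(0.5, 1_000, 100_000)))  # ≈ 13,320, shown as $13,300
```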

Author

Osipion commented Mar 12, 2020

It appears this happens for the first ~0.5% of the range, after which it starts to behave as I would expect (losses increase as the low_loss increases). I pulled out the SimpleLoss code into a notebook:

import math
import ipywidgets as widgets
import matplotlib.pyplot as plt
from IPython.display import display
from scipy.stats import lognorm, norm

out_al = widgets.Output()

def distribution(frequency, low_loss, high_loss):
    # Fit a lognormal so low_loss/high_loss land near the 5th/95th percentiles
    factor = -0.5 / norm.ppf(0.05)
    mu = (math.log(low_loss) + math.log(high_loss)) / 2.
    shape = factor * (math.log(high_loss) - math.log(low_loss))
    return lognorm(shape, scale=math.exp(mu))

def annualized_loss(frequency, low_loss, high_loss):
    return frequency * distribution(frequency, low_loss, high_loss).mean()

def gen_al_curve(freq, mx):
    line = []
    for i in range(1, mx):
        loss = annualized_loss(freq, i, mx)
        # Report each step where the curve turns upward
        if line and loss > line[-1]:
            print(f'inflects at {i} ({loss})')
        line.append(loss)
    plt.plot(line)
    with out_al:
        plt.show()

display(out_al)
gen_al_curve(0.5, 10000)

And see the following:

[Plot: annualized loss vs. low_loss with high_loss = 10,000 - the curve falls, bottoms out, then rises again]

Which shows the inflection happens when low_loss is 46/10,000 (0.46% of the high_loss). So I guess this is an expected behaviour for "too small" values?
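The turning point can also be located analytically (a sketch under the same fit as the notebook code above): the annualized loss is proportional to exp(mu + shape²/2), and writing x = log(low_loss), that exponent is x/2 + factor²·(log(high_loss) − x)²/2 plus a constant, whose derivative in x vanishes at log(high_loss) − x = 1/(2·factor²).

```python
import math
from scipy.stats import norm

factor = -0.5 / norm.ppf(0.05)  # same constant as in SimpleLoss, ~0.304
high_loss = 10_000
# The exponent x/2 + factor**2 * (log(high_loss) - x)**2 / 2  (x = log(low_loss))
# is minimized where log(high_loss) - x = 1 / (2 * factor**2):
low_star = high_loss * math.exp(-1 / (2 * factor**2))
print(low_star)  # ≈ 44.7, i.e. ~0.45% of high_loss
```

That matches the plot: the discrete curve bottoms out between 44 and 45, so the first upward step is reported at 46.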

But this does lead to an interesting problem. Say I was trying to model the risk posed by MITRE ATT&CK Impact T1531 (Account Access Removal). Loss of access to a single account might have an average cost of $2 or so and be very likely (say a 99% chance it will happen to at least one customer a year). But a large breach that resulted in thousands of customers losing access to accounts could be very costly (let's say $1,000,000). If I model this as one loss scenario I would have:

frequency: 0.99
low_impact: $2
high_impact: $1,000,000
annualized_loss: $3,992,790.4

Which doesn't seem to accurately capture the risk (in 1 year I'm getting roughly 4 * high_impact).

I know I've put a bit of a wall of text here, so to summarize:

  1. Is the initial decrease in annualized loss expected and correct?
  2. Do I need to be cautious of this when I structure my loss scenarios?
  3. Am I picking my scenarios wrong (e.g. should a single account losing access be a completely different scenario from multiple accounts losing access)?

Contributor

mdeshon commented Mar 15, 2020

Re: summary 1, yes, it's expected. The lognormal shape is proportional to log(high) - log(low), which is to say it depends on the ratio high/low. The lognormal is by definition non-negative; high shape values make the left side of the distribution very steep near the origin, which pushes the right tail out to much higher values. The Wikipedia entry for the lognormal distribution has a diagram showing this effect (where sigma is the shape value).
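A quick way to see that explosion numerically (my illustration, not from riskquant): a lognormal's mean is its median times exp(shape²/2), so the shapes implied by the two test.csv rows above (≈3.5 for low_loss = 1, ≈1.4 for low_loss = 1000) give very different multipliers:

```python
import math

# Lognormal mean = median * exp(shape**2 / 2); the multiplier alone:
for shape in (1.4, 3.5):
    print(f"shape {shape}: mean is {math.exp(shape**2 / 2):.1f}x the median")
# shape 1.4 → ≈2.7x the median; shape 3.5 → ≈457x the median
```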

Re: summary 2, yes - if you're using the lognormal, you usually don't want the high/low ratio to be that large (500,000 in your example), because the distribution becomes extremely skewed, leading to the unrealistic annualized-loss values you observed.

There are a few things I can suggest. First, you're treating the frequency as a probability that can't exceed 1, but it can exceed 1 when you mean the event is expected to occur more than once a year on average. (Note that I had a bug that enforced frequency <= 1, but that's been fixed since PR #10 - sorry if it caused confusion.)

Re: summary 3, It does seem like the single account scenario is distinct from the "large breach" situation. In that case, you could have the small-scale loss have a smaller high end loss, while the large breach starts at a higher low loss.

For example (just for illustration, I am sure you can come up with better numbers):

  • Small account loss: Frequency: 10, Low impact: $2, High impact: $1000
  • Large scale breach: Frequency: 0.2, Low impact: $10,000, High impact: $1,000,000
>>> s1 = simpleloss.SimpleLoss('SMALL', 'Small loss', 10, 2, 1000)
>>> s2 = simpleloss.SimpleLoss('LARGE', 'Large loss', 0.2, 10000, 1000000)
>>> s1.annualized_loss()
2663.505611088662
>>> s2.annualized_loss()
53279.60195471087

which seems to capture your desired behavior better.
