Sampling Bias

The amount of data in cybersecurity research is sometimes overwhelming. To cope with it, we sample: we choose a subset of the overwhelming ocean of data and deal with a much smaller pond.

We must be careful when we sample, because it’s possible to introduce sampling bias. This happens when some parts of the data ocean are more likely to be picked than others. Continuing with the ocean analogy, it’s when you take all your samples from the Bermuda Triangle and ignore the rest of the Atlantic Ocean. In that case your results aren’t generalizable to the entire Atlantic Ocean; they’re only relevant to the Bermuda Triangle.

That is exactly the problem with sampling bias.

Returning to the world of cybersecurity, suppose you’re studying network traffic. If you only study the traffic that your organization sees, your results are not generalizable to the world at large unless you have a very good reason to think they are. The statement ‘My organization is just like the Internet’ is one that requires a lot of explanation. It may be true from a certain point of view, but you must explain that point of view.
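To make the network traffic example concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the three networks (`my_org`, `isp_a`, `isp_b`), their malicious-flow counts, and the helper function names are assumptions, not real measurements or anyone's actual method. It compares a uniform random sample of all the traffic with a sample drawn only from one organization.

```python
import random

def build_population():
    """Build hypothetical flow records as (network, is_malicious) pairs."""
    # Assumed, invented counts of malicious flows per 10,000 flows.
    malicious_per_10k = {"my_org": 100, "isp_a": 1000, "isp_b": 2000}
    flows = []
    for network, bad in malicious_per_10k.items():
        flows += [(network, True)] * bad
        flows += [(network, False)] * (10_000 - bad)
    return flows

def malicious_fraction(flows):
    """Fraction of flows in the list that are marked malicious."""
    return sum(1 for _, is_bad in flows if is_bad) / len(flows)

def uniform_sample(flows, k, seed=0):
    """Every flow in the whole population is equally likely to be picked."""
    return random.Random(seed).sample(flows, k)

def biased_sample(flows, k, seed=0):
    """Only flows seen by one organization can be picked."""
    own = [flow for flow in flows if flow[0] == "my_org"]
    return random.Random(seed).sample(own, k)

population = build_population()
true_rate = malicious_fraction(population)
uniform_rate = malicious_fraction(uniform_sample(population, 2000))
biased_rate = malicious_fraction(biased_sample(population, 2000))

print(f"whole population:  {true_rate:.3f}")
print(f"uniform sample:    {uniform_rate:.3f}")
print(f"my_org-only sample: {biased_rate:.3f}")
```

Under these assumptions the uniform sample should land close to the population-wide rate, while the `my_org`-only sample reflects only that one network. The biased estimate is not wrong about `my_org`; it is just not a statement about the Internet.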

DTRAP wants your results to be generalizable, that is, to have external validity.  Avoid sampling bias in your research.
