If you’re just getting started with data analytics, you’ll be getting to grips with some relatively complex statistical concepts. One such concept is the probability distribution: a mathematical function that gives the probability of each possible outcome of an experiment. There are six main types of distribution, but today we’ll be focusing on just one: the Poisson distribution.
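By the end of this post, you’ll have a clear understanding of what the Poisson distribution is and what it’s used for in data analytics and data science. We’ve divided our guide into four parts: what the Poisson process is, what the Poisson distribution is, what the Poisson distribution is used for, and the key takeaways.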
1. What is the Poisson process?
Before we talk about the Poisson distribution itself and its applications, let’s first introduce the Poisson process. In short, the Poisson process is a model for a series of discrete events where the average time between events is known, but the exact timing of each event is random. Each event also occurs independently of the events that came before it.
So let’s bring this theory to life with a real-world example. We all get frustrated when our internet connection is unstable. If we assume that one failure doesn’t influence the probability of the next one, we might say that the failures follow a Poisson process, where the event in question is “internet failure”. All we need to know is the average time between these failures. For the model to apply, though, a set of criteria needs to be met:
- The events of such a process are independent of each other.
- The average rate of event occurrences per unit of time (e.g. per month) is constant.
- Two events cannot occur at exactly the same instant (e.g. two internet failures cannot start at the very same moment).
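To make these criteria more concrete, here is a minimal simulation sketch in Python. It relies on a standard property that isn't covered above: in a Poisson process, the waiting times between events follow an exponential distribution. The function name, the 30-day average gap, and the ten-year horizon are all made-up values for illustration.

```python
import random

def simulate_failure_times(mean_days_between, horizon_days, seed=0):
    """Simulate the times (in days) of internet failures as a Poisson process.

    Assumption: waiting times between events are exponentially distributed,
    which is the standard way to generate a Poisson process.
    """
    rng = random.Random(seed)
    rate = 1.0 / mean_days_between  # constant average rate (events per day)
    times = []
    t = rng.expovariate(rate)  # waiting time until the first failure
    while t < horizon_days:
        times.append(t)
        # Each new waiting time is drawn independently of the previous ones.
        t += rng.expovariate(rate)
    return times

# Hypothetical numbers: one failure every 30 days on average, over ~10 years.
failures = simulate_failure_times(mean_days_between=30, horizon_days=3650)
print("Number of failures:", len(failures))
print("Average failures per 30 days:", len(failures) / 3650 * 30)
```

The exact timing of each failure is random, but the average number of failures per 30 days should land close to 1, which is the constant rate we fed in.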
2. What is the Poisson distribution?
While the Poisson process is the model we use to describe events that occur independently of each other, the Poisson distribution allows us to turn these “descriptions” into meaningful insights. So, let’s now explain exactly what the Poisson distribution is.
The Poisson distribution is a discrete probability distribution
As you might have already guessed, the Poisson distribution is a discrete probability distribution which indicates how many times an event is likely to occur within a specific time period. But what is a discrete probability distribution?
Right, let’s first align on the concepts! A probability distribution is a mathematical function that gives the probabilities of the possible outcomes of an experiment. As you might already know, probability distributions describe how random variables behave, and these variables can be either discrete or continuous. When talking about the Poisson distribution, we’re looking at discrete variables, which may take on only a countable number of distinct values, such as the number of internet failures (to go back to our earlier example).
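As a rough sketch of what this looks like in practice, the snippet below uses the standard Poisson formula, P(X = k) = λ^k · e^(−λ) / k!, where λ is the average number of events per interval and k is the number of events we’re asking about. The helper name `poisson_pmf` and the rate of 2 failures per month are assumptions chosen purely for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical rate: 2 internet failures per month on average.
failures_per_month = 2.0

for k in range(6):
    print(f"P({k} failures in a month) = {poisson_pmf(k, failures_per_month):.4f}")
```

Notice that k can only be 0, 1, 2, and so on. That is what makes the distribution discrete: there is no such thing as 1.5 failures.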
3. What is the Poisson distribution used for?
Now that we know what the Poisson distribution is, it’s time to zoom out again and see where it fits into the bigger picture.
As you know, data analytics is all about drawing meaningful insights from raw data; insights which can be used to make smart decisions. Poisson distributions are commonly used to find the probability that an event will happen a specific number of times in a given interval, based on how often it usually occurs. Based on these insights and future predictions, organizations can plan accordingly.
For example, an insurance company might use the Poisson distribution to calculate the probability of a given number of car accidents happening in the next six months, which in turn will inform how they price car insurance.
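Here is a small sketch of how that insurance calculation might look. The figure of 3 expected accidents per six months is invented for the example, and the helper functions are our own rather than part of any particular library; a real insurer would estimate the rate from historical claims data.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """Probability of k or fewer events."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# Hypothetical: a group of policyholders averages 3 accidents per six months.
expected_accidents = 3.0

p_none = poisson_pmf(0, expected_accidents)
p_more_than_5 = 1 - poisson_cdf(5, expected_accidents)

print(f"P(no accidents in six months)    = {p_none:.4f}")
print(f"P(more than 5 accidents)         = {p_more_than_5:.4f}")
```

The second figure is the kind of number that matters for pricing: it estimates how likely an unusually expensive six months would be under these assumptions.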
4. Key takeaways
We have now covered a complete introduction to the Poisson distribution. There is certainly a lot more to be explored and plenty more exciting problems to solve, but hopefully this has given you a good starting point from which to continue your journey of discovery!
Before we finish, let’s summarize the main properties of the Poisson distribution and the key takeaways from what we’ve covered:
- Poisson distributions are used to find the probability that an event might happen a definite number of times based on how often it usually occurs.
- The average number of events per specific time interval is represented by λ and is called the event rate.
- The events are independent, meaning the number of events that occur in any interval of time is independent of the number of events that occur in any other interval.
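- Over a short interval, the probability of an event is proportional to the length of the interval; over any interval, the expected number of events grows in proportion to its length (e.g. a month has roughly four times the expected events of a week).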
- The probability of an event in a particular time duration is the same for all equivalent time durations.
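The proportionality point can be checked numerically. The sketch below assumes a made-up rate of 0.1 events per day and uses the fact that the probability of zero events in an interval of length t is e^(−λt), which is just the Poisson formula with k = 0. The function name is our own.

```python
import math

def prob_at_least_one(rate, duration):
    """Probability of one or more events in an interval of the given length."""
    return 1 - math.exp(-rate * duration)

rate_per_day = 0.1  # hypothetical event rate

for duration in (0.01, 0.1, 1.0, 7.0):
    exact = prob_at_least_one(rate_per_day, duration)
    proportional = rate_per_day * duration  # the "rate times length" approximation
    print(f"{duration:>5} days: exact = {exact:.6f}, rate x length = {proportional:.6f}")
```

For short intervals the two columns are nearly identical; for longer ones the simple rate-times-length figure overshoots, which is why the proportionality rule is best read as a statement about short intervals.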