Llyr Jones is a Developer and Mathematical Analyst for Grid-Tools. He is a Maths graduate from St Andrews University and is currently specializing in data trend analysis for Datamaker™.
The Pareto data trend can be thought of as the mathematical representation of the so-called "80-20 rule" of economics. The trend is named after the Italian economist Vilfredo Pareto, who first used it to describe the allocation of wealth among individuals after observing that 80% of the wealth was controlled by 20% of the population (hence the "80-20 rule").
This distribution is particularly interesting because it crops up in so many situations besides wealth distribution. It can be used to describe any phenomenon where there are a large number of small values but very few large ones. For example, Microsoft noted that fixing the top 20% of the most-reported bugs would eliminate 80% of errors and crashes. The purpose of a Pareto chart is to highlight the most important among a (typically large) set of factors. In quality control, it often represents the most common sources of defects, the most frequently occurring type of defect, or the most frequent reasons for customer complaints. Examples include:
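A quality-control case of this kind can be sketched in a few lines of Python. The defect categories and counts below are hypothetical, invented purely for illustration, and the function names pareto_table and vital_few are likewise only illustrative. The sketch ranks categories by frequency and reports the cumulative share each one reaches, which is the information a Pareto chart displays.

```python
def pareto_table(counts):
    """Rank categories by count and attach each one's cumulative share of the total."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    table = []
    running = 0
    for name, count in ranked:
        running += count
        table.append((name, count, running / total))
    return table


def vital_few(counts, threshold=0.8):
    """Return the top-ranked categories needed for the cumulative share to reach threshold."""
    selected = []
    for name, _count, cumulative in pareto_table(counts):
        selected.append(name)
        if cumulative >= threshold:
            break
    return selected


# Hypothetical defect counts, invented purely for illustration.
defects = {
    "scratches": 120,
    "misalignment": 45,
    "wrong label": 15,
    "loose screw": 10,
    "discolouration": 6,
    "other": 4,
}

for name, count, cumulative in pareto_table(defects):
    print(f"{name:15s} {count:4d} {cumulative:6.1%}")

print(vital_few(defects))
```

With these made-up counts, the first two categories already account for 82.5% of all defects.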
For the mathematically minded, the Pareto trend is characterised by the following equation:
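In its standard textbook form, and assuming the equation meant here is the probability density function of the Pareto distribution, it can be written as:

```latex
f(x) =
\begin{cases}
  \dfrac{\alpha \, x_m^{\alpha}}{x^{\alpha + 1}}, & x \ge x_m, \\[6pt]
  0, & x < x_m,
\end{cases}
\qquad x_m > 0,\ \alpha > 0 .
```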
Where x_m is the minimum possible value, and α is a shape parameter that determines how quickly the curve drops (in the context of wealth modeling, α is referred to as the Pareto index).
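To make the role of α concrete, here is a minimal Python sketch. It assumes the standard Pareto density and the standard closed-form result that, for α > 1, the top fraction p of the population holds a share p^(1 - 1/α) of the total. The function names pareto_pdf and top_share are illustrative only.

```python
import math


def pareto_pdf(x, x_m=1.0, alpha=1.0):
    """Standard Pareto density: alpha * x_m**alpha / x**(alpha + 1) for x >= x_m, else 0."""
    if x_m <= 0 or alpha <= 0:
        raise ValueError("x_m and alpha must be positive")
    if x < x_m:
        return 0.0
    return alpha * x_m ** alpha / x ** (alpha + 1)


def top_share(p, alpha):
    """Share of the total held by the top fraction p of the population (needs alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return p ** (1 - 1 / alpha)


# A larger alpha makes the curve fall away faster relative to its starting height.
for alpha in (0.5, 1.0, 2.0):
    ratio = pareto_pdf(2.0, alpha=alpha) / pareto_pdf(1.0, alpha=alpha)
    print(f"alpha={alpha}: f(2)/f(1) = {ratio:.3f}")

# The 80-20 rule corresponds to alpha = log(5)/log(4), roughly 1.16.
alpha_80_20 = math.log(5) / math.log(4)
print(f"alpha={alpha_80_20:.3f}: top 20% hold {top_share(0.2, alpha_80_20):.1%}")
```

Note that at α = 1 the distribution has no finite mean, so the share calculation does not apply there, although the density curve itself is still well defined.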
I have put together a data set of 400 points in two dimensions. The x dimension is what I would call the 'source' and, in this case, is uniformly distributed on the range [1, 5]. The y dimension is a simulated Pareto trend (with parameter α = 1.00) with medium variability. Click here to download this data set.
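The exact procedure used to build this data set is not given here, so the following Python sketch is only one plausible reading. It draws x uniformly from [1, 5], lets the trend in y follow the standard Pareto density with x_m = 1 and α = 1, and models "medium variability" as multiplicative Gaussian noise with a 15% standard deviation. The value of x_m, the noise model, the noise level, the seed and the function name simulate_pareto_trend are all assumptions.

```python
import random


def simulate_pareto_trend(n=400, x_min=1.0, x_max=5.0, x_m=1.0, alpha=1.0,
                          noise_sd=0.15, seed=0):
    """Return n (x, y) pairs: x uniform on [x_min, x_max], y a noisy Pareto-shaped trend.

    x_m, noise_sd and the multiplicative noise model are assumptions made for
    this sketch, not the settings used for the original data set.
    """
    if x_min < x_m:
        raise ValueError("x_min must be at least x_m for the density to apply")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    points = []
    for _ in range(n):
        x = rng.uniform(x_min, x_max)
        trend = alpha * x_m ** alpha / x ** (alpha + 1)  # Pareto density at x
        y = trend * (1.0 + rng.gauss(0.0, noise_sd))     # assumed noise model
        points.append((x, y))
    return points


data = simulate_pareto_trend()
print(len(data))
for x, y in data[:5]:
    print(f"{x:.3f}, {y:.4f}")
```

Setting noise_sd to 0 recovers the bare trend, which with these assumed parameters is simply y = 1/x².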