The Pareto data trend

Embed Pareto data trends into your test data

Key Developer Profile: Llyr Jones



Llyr Jones is a Developer and Mathematical Analyst for Grid-Tools. He is a Maths graduate from St Andrews University and is currently specializing in data trend analysis for Datamaker™.

Notes from Llyr on the Pareto data trend

The Pareto data trend can be thought of as the mathematical representation of the so-called "80-20 rule" of economics. The trend is named after the Italian economist Vilfredo Pareto, who first used it to describe the allocation of wealth among individuals, observing that 80% of the wealth was controlled by 20% of the population (hence the "80-20 rule").

This distribution is interesting because it crops up in so many situations besides wealth distribution. It can be used to describe any phenomenon where there is a large number of small values but very few large values. For example, Microsoft noted that fixing the top 20% of the most-reported bugs would eliminate 80% of the errors and crashes. The related Pareto chart serves to highlight the most important among a (typically large) set of factors. In quality control, it often represents the most common sources of defects, the most frequently occurring type of defect, or the most frequent reasons for customer complaints. Other examples of the distribution include:

  • Sizes of human settlements: it is easily seen that there are very few large cities relative to the number of small villages
  • Computer processor scheduling: there are a large number of short jobs as opposed to a very small number of long jobs
  • Hard disk drive error rates: a large number of small errors against a small number of large errors
  • Standardised price returns on individual stocks
  • ...and many more
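
As a concrete illustration of the Pareto chart idea mentioned above, here is a minimal sketch that ranks categories by frequency and computes the cumulative percentage, which is the table a Pareto chart is normally drawn from. The function name `pareto_table` and the defect counts are made up for illustration; they are not taken from Datamaker or from any real data.

```python
def pareto_table(counts):
    """Rank categories by count (largest first) and add cumulative percentages."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows = []
    running = 0
    for name, count in ranked:
        running += count
        rows.append((name, count, 100.0 * running / total))
    return rows


# Hypothetical defect counts, invented purely for this example.
defects = {
    "scratch": 45,
    "dent": 30,
    "misalignment": 15,
    "discolouration": 7,
    "other": 3,
}

table = pareto_table(defects)
for name, count, cumulative in table:
    print(f"{name:15s} {count:4d} {cumulative:6.1f}%")
```

With these invented counts, the first two categories already account for 75% of all defects, which is the kind of concentration a Pareto chart is meant to expose.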

For the mathematically-minded, the Pareto trend is characterised by the following equation:

$$f(x) = \frac{\alpha \, x_m^{\alpha}}{x^{\alpha + 1}}, \qquad x \ge x_m$$

Here x_m is the minimum possible value, and α is a shape parameter that determines how quickly the curve drops (in the context of wealth modelling, α is referred to as the Pareto index).
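
To see how the two parameters behave in practice, the following sketch draws Pareto-distributed values by inverse-transform sampling. It relies on the standard survival function of the Pareto distribution, P(X > x) = (x_m / x)^α for x ≥ x_m. It also evaluates the standard result that, for α > 1, the top fraction p of the population holds a share p^(1 − 1/α) of the total. The function names `sample_pareto` and `top_share` are illustrative choices, not part of any Datamaker API.

```python
import math
import random


def sample_pareto(x_m, alpha, rng):
    """Draw one Pareto(x_m, alpha) value by inverting the survival function."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so we never divide by zero
    return x_m / u ** (1.0 / alpha)


def top_share(p, alpha):
    """Share of the total held by the top fraction p (requires alpha > 1)."""
    return p ** (1.0 - 1.0 / alpha)


rng = random.Random(42)
samples = [sample_pareto(1.0, 2.0, rng) for _ in range(10000)]

# Fraction of samples above 2; theory gives (1 / 2) ** 2 = 0.25 for alpha = 2.
fraction_above_2 = sum(1 for s in samples if s > 2.0) / len(samples)

# The shape parameter that reproduces the 80-20 rule exactly.
alpha_8020 = math.log(5) / math.log(4)  # roughly 1.161
share_of_top_20_percent = top_share(0.2, alpha_8020)  # 0.8 up to rounding
```

Python's standard library also provides `random.paretovariate(alpha)`, which samples the x_m = 1 case directly; the explicit version is shown here only to make the role of x_m and α visible.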


Pareto data trend for data creation

[Figure: Randomised Pareto trend example]


I have put together a data set of 400 points in two dimensions. The x dimension, which I would call the 'source', is uniformly distributed on the range [1, 5]. The y dimension is a simulated Pareto trend (with parameter α = 1.00) with medium variability.
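
A data set of this shape could be approximated along the following lines. This is only a sketch under stated assumptions, not the method Datamaker uses: it assumes that the trend is the Pareto density α·x_m^α / x^(α+1) evaluated at x with x_m = 1, and that "medium variability" can be imitated with multiplicative log-normal noise. The noise level `noise_sigma=0.25` and the names `pareto_curve` and `make_trend_data` are arbitrary illustrative choices.

```python
import random


def pareto_curve(x, x_m=1.0, alpha=1.0):
    """Pareto density alpha * x_m**alpha / x**(alpha + 1), valid for x >= x_m."""
    return alpha * x_m ** alpha / x ** (alpha + 1)


def make_trend_data(n=400, lo=1.0, hi=5.0, alpha=1.0, noise_sigma=0.25, seed=0):
    """Return n (x, y) pairs: x uniform on [lo, hi], y a noisy Pareto curve.

    The noise model (log-normal, multiplicative) is an assumption made for
    this sketch; the original data set may have been generated differently.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        x = rng.uniform(lo, hi)                        # the 'source' dimension
        noise = rng.lognormvariate(0.0, noise_sigma)   # always positive
        y = pareto_curve(x, x_m=lo, alpha=alpha) * noise
        points.append((x, y))
    return points


data = make_trend_data()
```

Multiplicative noise keeps every y value positive and scales the scatter with the height of the curve; additive noise would be an equally plausible reading of "variability" and can be swapped in by changing one line.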
