WRITING AND SOFTWARE BY IAN SHARPE

Randomisation in Excel

Random numbers that don't repeat, randomly ordered lists and generating lottery numbers

This article looks at:

  • The RAND() and RANDBETWEEN() functions
  • Producing a list of random numbers without repeats (e.g. lottery numbers)
  • Randomising list order
  • Making random picks from a list
  • Making biased random picks

It also touches on hardware random number generation and Excel's Analysis ToolPak.

If you just want my lottery number generator then it's over on the right.

Before getting too deep into Excel for your randomness requirements, you may like to investigate the many online tools provided by random.org.

Genuine fake random

First, let's examine Excel's RAND function. This produces a random number between (and possibly including) 0 and very slightly less than 1. In other words, it can be up to 0.999… for as many digits as Excel will handle but never 1 itself.

The numbers are not truly random, but pseudo-random. They are the output of program code that churns out numbers that appear to be random. RAND's output cannot easily be analysed to produce a predictable pattern. And in the long run, values should come up in equal amounts without bias.

For simpler applications, RAND is a usable simulation of a truly random process. In some situations it is not robust enough. This is why there are third-party alternatives, including hardware solutions.

Digression: doing it the hard way

A site I mentioned at the top of the page, random.org, gets random numbers from hardware that samples atmospheric noise. You can buy plug-in gadgets that sample naturally random phenomena to provide a source of values direct to the machine they are attached to. Some CPUs also contain hardware random number generators.

If you think hardware is the way to go, don't just buy a cheap generator at random (ha!) or build your own and assume that you have random numbers nailed down to the n-th degree for any possible situation. While the idea is sound and seems simple, getting it right in practice takes a lot of care.

At least read this page and this one on Wikipedia, and this report by statistician Robert Davies.

Robustification of pseudo-randomness is also why Excel 2003 was kitted out with an upgraded generator. Unfortunately, the change introduced a bug that Excel 2003 users should read about here. Oops – Microsoft caught by the old 'not pressing [F9] more than nine times when testing' problem.

Using RAND's output

If you want a number outside the range provided by RAND, you could manipulate its output. For example, to get a whole number (integer) between 1 and 9 inclusive, we're looking for one of nine possible results:

  1. Multiply RAND() by nine to get a number in the range 0 to 8.999…
  2. INT() cuts off the fractional part leaving a whole number between 0 and 8.
  3. Add one to shift the range to 1 – 9:
= 1 + INT( 9 * RAND() )

Copy and paste that into a cell, then press [F9] a few times to recalculate. Observe that the results are always a whole number in the desired range.

It is easy to make a mistake scaling and shifting RAND, so always perform tests like this. And certainly more than nine, with your eyes open, if you're about to sign off on software before committing to mass distribution.

The RANDBETWEEN function

An easier way to get the same result is with RANDBETWEEN. It used to be an additional function obtained by installing the Analysis ToolPak. This is documented in the help of older versions of Excel. The function is there by default in later editions.

RANDBETWEEN produces a random whole number between (and including) the specified limits. For example, RANDBETWEEN(1,9) replaces the formula above.

Advanced random numbers

There is a further tool for producing random numbers but it is well hidden. For this, you do need to install the Analysis ToolPak in old and newer editions of Excel. Recent versions make it available under File, Options, Add-ins. Find Analysis ToolPak in the list:

Where to find the Analysis ToolPak Add-in in recent versions of Excel

If it is under 'Inactive Application Add-ins', click the Go button with 'Excel Add-ins' selected in the drop-down. A small dialog appears in which you can enable the Analysis ToolPak. You may as well include the associated VBA tools.

With that done, you have access to a toolkit that includes more refined random number generation:

Excel's Data Analysis dialog

The Anaylsis ToolPak performs advanced functions mainly of interest to technical types such as statisticians, scientists and engineers. Each tool could be an article in itself so I must leave you to investigate the random number options if you need these specialist refinements.

Excel's enhanced random number generator

Want to rule out the repeats?

RAND and RANDBETWEEN do not remember their previous output nor try to avoid repetition. Any such avoidance would be non-random behaviour.

Think about flipping coins: I flip a head, I flip a tail. A third flip does not halt the universe or push the coin through to another one. Or at least if something of a multiversal-quantum-crazy nature does happen, it's too slick to notice. I appear to carry on and get another head or tail.

Raw RAND values aren't so bad. They have a lot of decimal places so the number of possible values is huge. Exact repetition is possible but infrequent.

Duplicates arise more often when you produce values in a limited range.

An example is a lottery draw simulator. In the real thing, numbers might be on uniquely numbered balls pulled from a container without being returned. Basic random number generation does not model this.

Lottery numbers are commonly referred to as "random numbers" but they are not. Once you introduce the idea of non-repetition, they become a set of numbers each of which can be used once, so the set can be used up. Coins, dice, roulette wheels and Brownian particle movements are not like that. You can come back as many times as you like if duplication is not a consideration.

Numbered balls in a lottery machine are like a shuffled deck of cards: a known, finite set of values arranged in random order. A random permutation, in other words, and a different class of beast from what RAND gives us.

You could keep recalculating a sheet containing multiple RANDs until a set of numbers without duplicates appears but that is very inefficient. We need a process that reliably produces the list in one shot.

I can see two ways to put a set of values in random order.

Take 1 – a random sort key

The first is a quick method suitable for one-offs where tidiness is not important:

  1. Make a list of all the possible values.
  2. Make a parallel list of RAND functions.
  3. Sort the rows using the RAND function column as the key.

The fact that RAND occasionally repeats does not matter. It is only used as a sort key. You are interested in the parallel list of known values that is being re-ordered. The screenshots below illustrate the process.

Make a list of values:

Make a list of values

Make a parallel list of RAND formulas:

Make a parallel list of RAND formulas

Sort the rows on Column B:

Sort the rows on Column B

Column A is now randomly ordered:

Column A is now randomly ordered

After sorting, the sheet recalculates and produces new random values. These don't fall in the same order. So column B looks strange, given that I just performed a sort on it. It is correct and does not matter.

Take 2 – make a macro of it

The other method involves a macro to do it automatically. The code could be written in different ways, either replicating what I did above, or keeping the results of RAND functions inside itself to avoid the need for an extra column.

The macro below does all the work tidily off-sheet. Don't panic, you can download a workbook with another version of this ready to run:

Sub NoRepeatsTest()
	Const SIZE_OF_LIST As Integer = 10
	Dim iResults() As Integer
	Dim i As Integer
	Dim iPick As Integer
	Dim iLimit As Integer
	Dim iTmp As Integer

	' Generate an array of available values
	ReDim iResults(SIZE_OF_LIST)
	For i = 1 To SIZE_OF_LIST
		iResults(i) = i
	Next

	' iResults() now contains ordered list of values
	' Prove it by outputting to column A:

	For i = 1 To SIZE_OF_LIST
		Range("A1").Offset(i - 1, 0).Value = iResults(i)
	Next

	'''''''''' Now shuffle iResults() ''''''''''
	' Seed random number generator with system time
	Randomize

	' Set limit 1 below maximum array index
	iLimit = SIZE_OF_LIST - 1

	For i = 1 To SIZE_OF_LIST - 1
		' Pick a random element from index=1 to iLimit
		iPick = Int((iLimit) * Rnd) + 1

		' Swap this element with the one above iLimit
		iTmp = iResults(iLimit + 1)
		iResults(iLimit + 1) = iResults(iPick)
		iResults(iPick) = iTmp

		' Reduce limit by 1
		iLimit = iLimit - 1
	Next

	' Output iResults() to column B
	For i = 1 To SIZE_OF_LIST
		Range("B1").Offset(i - 1, 0).Value = iResults(i)
	Next
End Sub

The method is known in computer science as the Knuth algorithm, or the Fisher-Yates algorithm, or as a combination of those names.

The alogrithm has a potential flaw when used with a pseudo-random number generator on longer lists. You really should pay attention to this if you are doing something important. Read the bottom part of this page for the details.

I don't take account of this in the sample code here or in the downloadble files, so be aware of that if you adapt it to your own purpose.

Returning to the macro code above, I have used a list of values from 1 to 10. Picking, say, the first three, gives three random non-repeating numbers in the range 1 to 10.

A lottery simulator would work in a similar way by outputting, for example, the first six numbers from a list of 49 shuffled values.

In the Downloads panel at the head of this page you will find an Excel workbook that produces lottery numbers. It has better presentation than the above example and is customisable in the range and the number of values it outputs.

Make sure you agree to enable macros if asked when you open the file. Click the Help button to find out what to do.

My lottery number spreadsheet

How to make a single random pick

The techniques described so far enable you to generate random numbers in a range and use them to shuffle lists.

Suppose you want just one random pick from a set of numbers or a list of items.

If the things being picked from are a simple range of numbers such as 1-6, you can just do RANDBETWEEN(1,6) to simulate rolling a die and that's your result.

If the numbers have breaks in their sequence, or the items are not numbers, you could make a list and shuffle it. Your random pick would then be the first item. However, list sorting does a lot of work and most of it is unnecessary if you only want one item at random.

It is much more efficient to generate a single random number and use it as an index into the list:

Picking one random item from a list

This is how the phrase is constructed:

Picking one random item from a list - detail 2

Developing that idea into a random story generator is left as a little homework task if you care to take it on.

Biasing the output

Here's a scenario that doesn't quite fit anything seen so far: you want to generate a random height, as in the height of an adult person.

Suppose that you research the heights of the shortest and tallest adults ever recorded. Using these limits in RANDBETWEEN() has two problems.

First, RANDBETWEEN() only produces whole numbers. That's fixable – decimal places can be obtained using the RAND() scale-and-shift technique we started off with. But it still isn't right.

Adult heights are not equally distributed. There are very few dwarves and giants, relatively few of the less extreme but still short or tall people, and a huge number around average height give or take a bit.

If you fill a room with people selected because their heights match your massaged RAND() results, you will see instantly that an uncommonly large number of party-goers are particularly short or tall. And that's before you partake of the recreational chemicals on offer.

So when picking heights, and in other cases where you need to mimic features of the real world, plain random number generation does not give a representative selection. An extra step is needed.

A crude approach would be to take a huge list of actual measured heights, ensuring the sampling process did not bias them and that it represents the types of people you are interested in. You could then make random picks from the list.

The heights would approximate picking people at random from an actual crowd. Party balance is restored and it's safe to dance.

To be continued - watch this space...

Downloads

Current

Version 3.1, November 2014

Compatibility Most versions of Excel

Description / note Lottery number generator. Configurable for different lotteries. Works with one or two sets of balls. Ensure macros are enabled.

Superseded

Version 2.0, June 2005

Compatibility Most versions of Excel

Description / note Works for lotteries with one set of balls. Ensure macros are enabled.

Professional services

I offer professional services in software development, databases, data manipulation and writing. Find out more at IanSharpe.com.

site supporters

How much does dropping your keys cost?

Losing car or house keys can work out a lot more expensive than you think. Protect them free for 100 days with .

OFFER: Get $10 to spend on a DigitalOcean VPS

I am a happy user of DigitalOcean virtual private servers. I get commission if you open an account with DO through this link and stay a while. DigitalOcean will credit $10 to your account when you add a payment method. That gets you a decent chunk of free VPS hosting and helps support this site.

All material © 1986–2014 Ian Sharpe. Nothing may be reproduced or made available for download elsewhere without permission