WRITING AND SOFTWARE BY IAN SHARPE
This article looks at:
It also touches on hardware random number generation and Excel's Analysis ToolPak.
If you just want my lottery number generator then it's over on the right.
Before getting too deep into Excel for your randomness requirements, you may like to investigate the many online tools provided by random.org.
First, let's examine Excel's RAND function. This produces a random number between (and possibly including) 0 and very slightly less than 1. In other words, it can be up to 0.999… for as many digits as Excel will handle but never 1 itself.
The numbers are not truly random, but pseudo-random. They are the output of program code that churns out numbers that appear to be random. RAND's output cannot easily be analysed to produce a predictable pattern. And in the long run, values should come up in equal amounts without bias.
For simpler applications, RAND is a usable simulation of a truly random process. In some situations it is not robust enough. This is why there are third-party alternatives, including hardware solutions.
A site I mentioned at the top of the page, random.org, gets random numbers from hardware that samples atmospheric noise. You can buy plug-in gadgets that sample naturally random phenomena to provide a source of values direct to the machine they are attached to. Some CPUs also contain hardware random number generators.
If you think hardware is the way to go, don't just buy a cheap generator at random (ha!) or build your own and assume that you have random numbers nailed down to the n-th degree for any possible situation. While the idea is sound and seems simple, getting it right in practice takes a lot of care.
Robustification of pseudo-randomness is also why Excel 2003 was kitted out with an upgraded generator. Unfortunately, the change introduced a bug that Excel 2003 users should read about here. Oops – Microsoft caught by the old 'not pressing [F9] more than nine times when testing' problem.
If you want a number outside the range provided by RAND, you could manipulate its output. For example, to get a whole number (integer) between 1 and 9 inclusive, we're looking for one of nine possible results:
= 1 + INT( 9 * RAND() )
Copy and paste that into a cell, then press [F9] a few times to recalculate. Observe that the results are always a whole number in the desired range.
It is easy to make a mistake scaling and shifting RAND, so always perform tests like this. And certainly more than nine, with your eyes open, if you're about to sign off on software before committing to mass distribution.
An easier way to get the same result is with RANDBETWEEN. It used to be an additional function obtained by installing the Analysis ToolPak. This is documented in the help of older versions of Excel. The function is there by default in later editions.
RANDBETWEEN produces a random whole number between (and including) the specified limits. For example, RANDBETWEEN(1,9) replaces the formula above.
There is a further tool for producing random numbers but it is well hidden. For this, you do need to install the Analysis ToolPak in old and newer editions of Excel. Recent versions make it available under File, Options, Add-ins. Find Analysis ToolPak in the list:
If it is under 'Inactive Application Add-ins', click the Go button with 'Excel Add-ins' selected in the drop-down. A small dialog appears in which you can enable the Analysis ToolPak. You may as well include the associated VBA tools.
With that done, you have access to a toolkit that includes more refined random number generation:
The Anaylsis ToolPak performs advanced functions mainly of interest to technical types such as statisticians, scientists and engineers. Each tool could be an article in itself so I must leave you to investigate the random number options if you need these specialist refinements.
RAND and RANDBETWEEN do not remember their previous output nor try to avoid repetition. Any such avoidance would be non-random behaviour.
Think about flipping coins: I flip a head, I flip a tail. A third flip does not halt the universe or push the coin through to another one. Or at least if something of a multiversal-quantum-crazy nature does happen, it's too slick to notice. I appear to carry on and get another head or tail.
Raw RAND values aren't so bad. They have a lot of decimal places so the number of possible values is huge. Exact repetition is possible but infrequent.
Duplicates arise more often when you produce values in a limited range.
An example is a lottery draw simulator. In the real thing, numbers might be on uniquely numbered balls pulled from a container without being returned. Basic random number generation does not model this.
Lottery numbers are commonly referred to as "random numbers" but they are not. Once you introduce the idea of non-repetition, they become a set of numbers each of which can be used once, so the set can be used up. Coins, dice, roulette wheels and Brownian particle movements are not like that. You can come back as many times as you like if duplication is not a consideration.
Numbered balls in a lottery machine are like a shuffled deck of cards: a known, finite set of values arranged in random order. A random permutation, in other words, and a different class of beast from what RAND gives us.
You could keep recalculating a sheet containing multiple RANDs until a set of numbers without duplicates appears but that is very inefficient. We need a process that reliably produces the list in one shot.
I can see two ways to put a set of values in random order.
The first is a quick method suitable for one-offs where tidiness is not important:
The fact that RAND occasionally repeats does not matter. It is only used as a sort key. You are interested in the parallel list of known values that is being re-ordered. The screenshots below illustrate the process.
Make a list of values:
Make a parallel list of RAND formulas:
Sort the rows on Column B:
Column A is now randomly ordered:
After sorting, the sheet recalculates and produces new random values. These don't fall in the same order. So column B looks strange, given that I just performed a sort on it. It is correct and does not matter.
The other method involves a macro to do it automatically. The code could be written in different ways, either replicating what I did above, or keeping the results of RAND functions inside itself to avoid the need for an extra column.
The macro below does all the work tidily off-sheet. Don't panic, you can download a workbook with another version of this ready to run:
Sub NoRepeatsTest() Const SIZE_OF_LIST As Integer = 10 Dim iResults() As Integer Dim i As Integer Dim iPick As Integer Dim iLimit As Integer Dim iTmp As Integer ' Generate an array of available values ReDim iResults(SIZE_OF_LIST) For i = 1 To SIZE_OF_LIST iResults(i) = i Next ' iResults() now contains ordered list of values ' Prove it by outputting to column A: For i = 1 To SIZE_OF_LIST Range("A1").Offset(i - 1, 0).Value = iResults(i) Next '''''''''' Now shuffle iResults() '''''''''' ' Seed random number generator with system time Randomize ' Set limit 1 below maximum array index iLimit = SIZE_OF_LIST - 1 For i = 1 To SIZE_OF_LIST - 1 ' Pick a random element from index=1 to iLimit iPick = Int((iLimit) * Rnd) + 1 ' Swap this element with the one above iLimit iTmp = iResults(iLimit + 1) iResults(iLimit + 1) = iResults(iPick) iResults(iPick) = iTmp ' Reduce limit by 1 iLimit = iLimit - 1 Next ' Output iResults() to column B For i = 1 To SIZE_OF_LIST Range("B1").Offset(i - 1, 0).Value = iResults(i) Next End Sub
The method is known in computer science as the Knuth algorithm, or the Fisher-Yates algorithm, or as a combination of those names.
The alogrithm has a potential flaw when used with a pseudo-random number generator on longer lists. You really should pay attention to this if you are doing something important. Read the bottom part of this page for the details.
I don't take account of this in the sample code here or in the downloadble files, so be aware of that if you adapt it to your own purpose.
Returning to the macro code above, I have used a list of values from 1 to 10. Picking, say, the first three, gives three random non-repeating numbers in the range 1 to 10.
A lottery simulator would work in a similar way by outputting, for example, the first six numbers from a list of 49 shuffled values.
In the Downloads panel at the head of this page you will find an Excel workbook that produces lottery numbers. It has better presentation than the above example and is customisable in the range and the number of values it outputs.
Make sure you agree to enable macros if asked when you open the file. Click the Help button to find out what to do.
The techniques described so far enable you to generate random numbers in a range and use them to shuffle lists.
Suppose you want just one random pick from a set of numbers or a list of items.
If the things being picked from are a simple range of numbers such as 1-6, you can just do RANDBETWEEN(1,6) to simulate rolling a die and that's your result.
If the numbers have breaks in their sequence, or the items are not numbers, you could make a list and shuffle it. Your random pick would then be the first item. However, list sorting does a lot of work and most of it is unnecessary if you only want one item at random.
It is much more efficient to generate a single random number and use it as an index into the list:
This is how the phrase is constructed:
Developing that idea into a random story generator is left as a little homework task if you care to take it on.
Here's a scenario that doesn't quite fit anything seen so far: you want to generate a random height, as in the height of an adult person.
Suppose that you research the heights of the shortest and tallest adults ever recorded. Using these limits in RANDBETWEEN() has two problems.
First, RANDBETWEEN() only produces whole numbers. That's fixable – decimal places can be obtained using the RAND() scale-and-shift technique we started off with. But it still isn't right.
Adult heights are not equally distributed. There are very few dwarves and giants, relatively few of the less extreme but still short or tall people, and a huge number around average height give or take a bit.
If you fill a room with people selected because their heights match your massaged RAND() results, you will see instantly that an uncommonly large number of party-goers are particularly short or tall. And that's before you partake of the recreational chemicals on offer.
So when picking heights, and in other cases where you need to mimic features of the real world, plain random number generation does not give a representative selection. An extra step is needed.
A crude approach would be to take a huge list of actual measured heights, ensuring the sampling process does not bias them and that it represents the types of people you are interested in. Random picks from the list would simulate picking people at random from a real crowd. Party balance is restored and it's safe to dance.
A more sophisticated method avoids the list. You build a mathematical formula whose output is known to closely match the distribution found in real life. This is used as the basis of random sampling. I am not going into detail because it is a subject in itself, except to note that it is one of the things that the randomness tools in the Analysis Toolpak mentioned earlier can help you with.
Version 3.1, November 2014
Compatibility Most versions of Excel
Description / note Lottery number generator. Configurable for different lotteries. Works with one or two sets of balls. Ensure macros are enabled.
Version 2.0, June 2005
Compatibility Most versions of Excel
Description / note Works for lotteries with one set of balls. Ensure macros are enabled.
I offer professional services in software development, databases, data manipulation and writing. Find out more at IanSharpe.com.
IanSharpe.com and other sites I manage are hosted at DigitalOcean. I am happy to recommend this company for Linux VPS hosting.
I get a commission if you open an account through this link and stay a while. DigitalOcean will credit $10 when you add a payment method. That gets you a generous slab of Linux-based VPS hosting for nothing and helps support this site.
This is the personal web site of Ian Sharpe, a software developer and writer based in Bath, United Kingdom.
Before taking up software development full-time I had a career in computer magazine publishing. A hobbyist obsession with programming diverted me into magazines from an even earlier career in railway civil engineering. That was in 1986 and for the next 16 years I was a writer and editor on high-profile titles. Some visitors may remember me from publications of yore such as PC Plus, PC Answers, PC Today and CPC Computing.
Much of the material on this site originates from that period and was written to fill column inches to short dealines. I created the software as an adjunct to the writing so not a lot time was spent on it!
There used to be a lot more (I reckon I wrote over a million words over my magazine career) but I drop items that become too outdated. The remainder survives while it stays popular, which remarkably it does.
And so the articles do not represent current interests or recent experience although I occasionally refresh them with new material.
There is no binding theme, just whatever came into my head to publish. Hence '@random'. That said, several pages deal with aspects of randomness. I don't know how that came about and it certainly wasn't a plan.
Some of my publishing experiences were exceptionally satisfying and many were great fun. But the world changed and the tide turned against the big-circulation magazines of the eighties and nineties. The gravy train was running out of juice and it was time to disembark, re-invent and move on.
I had an ambition to write software professionally. I also had some ability, always having done it as a sideshow to my writing. I got my first magazine job partly on the strength of my ability to crank out publishable code in Z80 assembly language and Basic. Over the years I edited reams of programming tutorials in a variety of languages written by some very capable people.
So I switched career once more and spent five years as a developer at a busy digital printing outfit.
Now I work freelance, dividing my time between projects at local technical software/electronics company Dot Software and whatever else I am engaged in. My professional services site is www.iansharpe.com.
Because my personal site is mostly old material, writing emails about it is not my favourite activity. Even 'quick' questions can take longer to answer than you might expect. It feels rude not to reply and sometimes I will respond, but forgive me if I do not.
If there is a real problem (or opportunity!) my address is email@example.com.
The download is in progress and should be visible soon.