PI digit distribution - confusion

Skogsgurra · Aug 22, 2013

Can't find the proper forum for this question. Don't think there is one.

In short: we needed to explain to a customer that certain processes are stochastic rather than deterministic. He had thyristor fuses blowing at very irregular intervals, and it turned out that two things had to coincide for that to happen: (A) the drive was braking (stopping in 400 milliseconds) and (B) the PF compensation capacitors were switched in at that very moment.

This happened once or twice a year and you could never tell when it was going to happen. Then it happened just a few days apart, and the customer got somewhat concerned. I thought it would be a good idea to show him that any given group of digits turns up in the digits of PI at random intervals. So we wrote a "substring finder" that searches a one-million-digit PI string. Our first try was the number 4711. That gave us 103 hits, which we thought was just about right.
We then tried 47111, which produced ten hits. Very much what we expected. Everything fine, so far.
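A substring finder along these lines can be sketched in a few lines of Python. This is a minimal sketch, not the code used in the thread: it assumes the digits are held in one string with no "3." prefix and no line breaks, counts overlapping matches (so "11111" contains "111" three times), and reports 0-based indices. It also shows the back-of-envelope expectation behind "just about right": if the digits are independent and uniform, a k-digit string is expected about (N - k + 1)/10^k times in N digits.

```python
def find_all(digits: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every match, overlapping matches included."""
    hits = []
    start = digits.find(pattern)
    while start != -1:
        hits.append(start)
        # Advance by one character, not len(pattern), so overlapping matches are found.
        start = digits.find(pattern, start + 1)
    return hits


def expected_hits(n_digits: int, pattern: str) -> float:
    """Expected match count if every digit were independent and uniform on 0-9."""
    windows = n_digits - len(pattern) + 1  # number of places a match can start
    return windows / 10 ** len(pattern)


print(find_all("1471114711", "4711"))     # [1, 6]
print(expected_hits(1_000_000, "4711"))   # about 100
print(expected_hits(1_000_000, "47111"))  # about 10

# Hypothetical usage with a digits file (the file name is an assumption):
# digits = open("pi_1million.txt").read().strip()
# print(len(find_all(digits, "4711")))
```

Advancing by one position rather than by the pattern length matters only for self-overlapping patterns such as 1111, but it keeps the count consistent with the (N - k + 1) windows used in the expectation.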

Then, we wanted to show this in a diagram. Of course, we expected a uniform distribution with some variance. But what we saw was a distribution where the first hits were quite close and then spread out when we got a bit into the "one million file". Like this: ---:--:---:-----:--------:--------------:------------:--------:----- etcetera (the colons represent hits).
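A colon-and-dash picture like this is easy to produce from a list of hit positions. A rough sketch, using the ten 47111 positions posted in this thread; the function name `hit_strip` and the 60-character width are arbitrary choices, and several hits falling in the same cell would show up as a single colon.

```python
def hit_strip(positions: list[int], length: int, width: int = 60) -> str:
    """Draw a text strip: '-' for empty cells, ':' for cells containing at least one hit."""
    cells = ["-"] * width
    for p in positions:
        # Scale the digit position onto the strip; clamp in case p == length.
        cells[min(p * width // length, width - 1)] = ":"
    return "".join(cells)


# The ten 47111 positions posted in this thread, in a one-million-digit string.
hits_47111 = [25447, 79545, 93534, 330582, 346263, 439447,
              705730, 775750, 821499, 958303]
print(hit_strip(hits_47111, 1_000_000))
```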

That was not at all what we had expected and I am glad that the customer wasn't there when we found out.

We have got the same result consistently. A few of them are given below (the numbers show where the first digit in the "search string" is positioned):

47111: 25447, 79545, 93534, 330582, 346263, 439447, 705730, 775750, 821499, 958303
47112: 73831, 299381, 375984, 718997, 962497
47113: 277993, 417209, 634628, 701464, 823702
47114: 127769, 141364, 153066, 231981, 557948, 719124, 803558, 912741, 964298, 996952
47115: 49554, 77922, 128830, 448202, 460855, 483224, 489871, 619692, 634589, 640280, 644557, 807140, 843108, 882015, 911986, 961592
47116: 36469, 92147, 273385, 279367, 300343, 318650, 378851, 483139, 546824, 635462, 685623, 782354, 803772, 872191, 888271
47117: 39880, 58449, 67914, 116125, 415961, 491234, 699587, 720026, 809578, 811719, 958795
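Looking at the gaps between consecutive hits, rather than at the raw positions, is one way to see how uneven the spacing already is for a single search string. A short sketch using the 47111 positions above; the comparison value of 10**5 assumes independent, uniform digits.

```python
# Positions posted for 47111 (first digit of each match).
hits_47111 = [25447, 79545, 93534, 330582, 346263, 439447,
              705730, 775750, 821499, 958303]

gaps = [b - a for a, b in zip(hits_47111, hits_47111[1:])]
print(gaps)
print("shortest:", min(gaps), "longest:", max(gaps))
# If the digits behave like independent uniform draws, the mean gap for a
# 5-digit string is about 10**5, but individual gaps scatter widely around it.
print("mean gap:", sum(gaps) / len(gaps), "vs. expected about", 10 ** 5)
```

Here the shortest gap is 13989 digits and the longest is 266283, a factor of almost 20, from only ten hits.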

We think (know) that the digits of PI are truly random. We think (know for sure) that our "picker" works properly. But we cannot make out why the distribution is denser at the beginning. Should it be? Why?

Gunnar Englund

http://www.gke.org

--------------------------------------
Half full - Half empty? I don't mind. It's what's in it that counts.

MintJulep · Aug 22, 2013

the digits of PI are truly random

But you are not looking for digits. You are looking for a sequence of digits.

Skogsgurra · Aug 22, 2013

Yes, I am. Does that explain why I get more hits at the beginning? I do not understand that.

Gunnar Englund

MintJulep · Aug 22, 2013

I can't see why we should expect a uniform distribution for a random event. Random is random.

Lightning strikes are random events. But their distribution is certainly not uniform over the Earth's surface.

http://geology.com/articles/lightning-map.shtml

Skogsgurra · Aug 22, 2013

I don't think that you understand. "we expected a uniform distribution with some variance" - that is what everyone else seems to be getting. Google "distribution digits of pi" - there are several examples showing just that.

Gunnar Englund

MintJulep · Aug 22, 2013

One million is a pretty small sample of infinite.
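One way to put a number on "small sample": treat each hit as landing uniformly at random in the file. With only about ten hits per search string, a cluster near the start is not rare. The 47111 list, for example, has three of its ten hits below 100,000. A sketch, assuming uniform and independent hit positions (an approximation, since the seven search strings share the same digits):

```python
from math import comb


def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# If 10 hits were scattered uniformly over the file, each has a 10% chance of
# landing in the first 100,000 of 1,000,000 digits.
p_one = binom_tail(10, 0.1, 3)  # 3 or more of 10 hits in the first 10%
# Chance that at least one of seven seeds shows that much front-loading,
# treating the seeds as independent (an approximation).
p_any = 1 - (1 - p_one) ** 7
print(f"one seed: {p_one:.3f}, any of seven: {p_any:.3f}")
```

Under these assumptions a single seed shows that cluster about 7% of the time, but at least one of seven seeds does so roughly 40% of the time.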

IRstuff · Aug 22, 2013

"Truly random" does not mean uniformly spaced.

If you flip a coin you will get amazingly long runs of heads or tails.
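The coin-flip point can be checked with a short simulation. A sketch, assuming a fair coin modelled by Python's `random` module with an arbitrary fixed seed. The longest run in n fair flips grows roughly like log2(n), so a run of around ten in 1000 flips is nothing unusual.

```python
import random


def longest_run(seq) -> int:
    """Length of the longest block of identical consecutive items in seq."""
    best = run = 0
    prev = object()  # sentinel that never equals a real item
    for item in seq:
        run = run + 1 if item == prev else 1
        prev = item
        best = max(best, run)
    return best


rng = random.Random(2013)  # fixed seed so the output is repeatable
flips = [rng.choice("HT") for _ in range(1000)]
print("longest run in 1000 fair flips:", longest_run(flips))
```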

Did you try other sequences?

I think it makes sense, but I'm not sure I could even begin to explain why.

TTFN
faq731-376

Need help writing a question or understanding a reply? forum1529

waross · Aug 22, 2013

I am going to have to think about this for a while. Don't get me wrong, I won't obsess; I will think about it at random intervals. There may be some variance in the spacing. Grin.
Seriously, this will take some time to digest.
Is this a good time to ponder the spacing of the occurrence of prime numbers?
A random thought, echoing MintJulep's observation: ten hits out of a million. Would a statistician accept such a small sample to establish any kind of trend?
Is it reasonable to consider a larger sample? Say ten million or one hundred million. Will your computer complete a much larger search in a reasonable time?
Thanks for sharing Gunnar. Your problems sure shake up the grey matter.

Bill
--------------------
"Why not the best?"
Jimmy Carter

GregLocock · Aug 22, 2013

I think you need to study it for many arbitrarily generated search terms, and for many more than one million digits. Are you using a lookup table for pi, or are you using that really weird finding that it is possible to calculate the nth digit of pi without knowing the preceding n-1 digits?

Cheers

Greg Locock

New here? Try reading these, they might help FAQ731-376

http://eng-tips.com/market.cfm?

Skogsgurra · Aug 22, 2013

Thanks all.

I use a table. There are several tables available and they are all identical (of course); all that differs is the length. A one-million-digit string is more than adequate to show the randomness in digit distribution, and the problem is not that there are long sequences of equal digits. It is that the occurrences of a sequence are "denser" at the beginning of the string. There are, btw, six nines (999999) to be found at positions 763 and 193035 and, yes, I am very well aware of the improbably long chains of tosses with identical outcomes that you do get (the coin doesn't have a memory and happily repeats itself).

The digits of PI are random. That means that every digit should be equally probable across the whole string, with some random (sic) variation. But there should not be any systematic difference depending on how far into the string of numbers you are.
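That expectation can be tested directly on the positions posted at the top of the thread: pool the hits from all seven search strings, count them in ten equal slices of the file, and compare with a flat distribution using a chi-square goodness-of-fit statistic. A sketch, assuming the pooled hits behave like independent draws (only approximately true, since the seven strings share digits); the 16.92 figure is the standard 5% critical value for 9 degrees of freedom.

```python
# Hit positions posted at the top of the thread, one list per search string.
hits = {
    "47111": [25447, 79545, 93534, 330582, 346263, 439447, 705730, 775750, 821499, 958303],
    "47112": [73831, 299381, 375984, 718997, 962497],
    "47113": [277993, 417209, 634628, 701464, 823702],
    "47114": [127769, 141364, 153066, 231981, 557948, 719124, 803558, 912741, 964298, 996952],
    "47115": [49554, 77922, 128830, 448202, 460855, 483224, 489871, 619692, 634589, 640280,
              644557, 807140, 843108, 882015, 911986, 961592],
    "47116": [36469, 92147, 273385, 279367, 300343, 318650, 378851, 483139, 546824, 635462,
              685623, 782354, 803772, 872191, 888271],
    "47117": [39880, 58449, 67914, 116125, 415961, 491234, 699587, 720026, 809578, 811719,
              958795],
}

BIN = 100_000  # ten bins across the one-million-digit string
counts = [0] * 10
for positions in hits.values():
    for p in positions:
        counts[p // BIN] += 1

expected = sum(counts) / len(counts)  # same count in every bin if nothing drifts
chi2 = sum((c - expected) ** 2 / expected for c in counts)
print("hits per 100,000 digits:", counts)
print(f"chi-square = {chi2:.2f} (5% critical value for 9 degrees of freedom: about 16.92)")
```

On the posted data the counts come out as 11, 5, 5, 6, 9, 2, 8, 7, 11, 8, and the statistic is about 9.9. The first slice is high, but so is the ninth, and the total is well below the 5% critical value.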

At least, that is what I thought until we tried to use that randomness. My question still stands: Is that expected? And if so, why? I am afraid that there is a problem with my thinking. But I have a problem understanding where I went wrong. Do you see where?

Gunnar Englund

zdas04 · Aug 23, 2013

I really don't see a basis for any assertion that any random string doing any seemingly unrandom thing can be "expected". Pick the string 5116 and see if it also shows up more often early than late. Then try 1760. If 20 randomly selected 4-digit strings all show up more often in the early days than the late days, then pi is not random and my wife is justified in clinging to her tin-foil hat. A million digits is certainly a statistically significant population, but do you have a statistically significant sample size?
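This test can be made concrete as a sign test. A sketch, in which "front-loaded" (more hits in the first 500,000 digits than in the last) is an illustrative definition, applied here to the seven strings posted at the top of the thread; the same code would take 20 randomly chosen 4-digit strings instead. If the digits are random, each string is roughly a coin toss, so 20 front-loaded strings out of 20 would have a probability of 0.5^20, about one in a million.

```python
from math import comb

# Hit positions posted at the top of the thread, one list per search string.
hits = {
    "47111": [25447, 79545, 93534, 330582, 346263, 439447, 705730, 775750, 821499, 958303],
    "47112": [73831, 299381, 375984, 718997, 962497],
    "47113": [277993, 417209, 634628, 701464, 823702],
    "47114": [127769, 141364, 153066, 231981, 557948, 719124, 803558, 912741, 964298, 996952],
    "47115": [49554, 77922, 128830, 448202, 460855, 483224, 489871, 619692, 634589, 640280,
              644557, 807140, 843108, 882015, 911986, 961592],
    "47116": [36469, 92147, 273385, 279367, 300343, 318650, 378851, 483139, 546824, 635462,
              685623, 782354, 803772, 872191, 888271],
    "47117": [39880, 58449, 67914, 116125, 415961, 491234, 699587, 720026, 809578, 811719,
              958795],
}

HALF = 500_000
# A string counts as "front-loaded" if it has more hits in the first half than in the second.
front_loaded = sum(
    1 for positions in hits.values()
    if sum(p < HALF for p in positions) > sum(p >= HALF for p in positions)
)
n = len(hits)
# One-sided sign test: chance of at least this many front-loaded strings if each
# string were a fair coin toss (ties would count as not front-loaded).
p_value = sum(comb(n, k) for k in range(front_loaded, n + 1)) / 2**n
print(f"{front_loaded} of {n} strings front-loaded, p = {p_value:.3f}")
```

With the posted data, four of the seven strings are front-loaded by this definition. Coin tosses would produce that many or more half the time (p = 0.5).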

David Simpson, PE
MuleShoe Engineering

"Belief" is the acceptance of a hypothesis in the absence of data.
"Prejudice" is having an opinion not supported by the preponderance of the data.
"Knowledge" is only found through the accumulation and analysis of data.
The plural of anecdote is not "data"

Skogsgurra · Aug 23, 2013

Working on it. It takes some time.

Gunnar Englund

VEBill · Aug 23, 2013

There are quite a few Pi search tools on the 'net. Some search for digits, others search for 'hidden messages'.

Pi-Search Result: search string = "gunnar"
30-bit binary equivalent = 001111010101110011100000110010
search string found at binary index = 3791268904
binary pi : 0010100100111101010111001110000011001011111001011100100101100011
binary string: 001111010101110011100000110010
character pi : knvbfngvcsrgaigunnar:eyeqyixk,y;s_

j.
character string: gunnar

Skogsgurra · Aug 23, 2013

Yes! Too small a sample it was. I tried a few other input strings and now have the result I expected.

The distances between consecutive hits (in digits) now look like this:
395 100235 71493 154457 80313 28440 55660 60262 174694 96175
245893 44177 37996 92144 36774 24642 113441 23691 177806 69992
37353 115822 59487 24366 52188 45705 31401 374780 125276 79037
58376 21325 145555 112697 9880 255019 209674 91195 20386 75671
177448 6990 18504 83459 340655 65945 65324 11050 160827 27769
310489 53925 5253 187101 95351 45240 93874 59219 94424 42881
75623 167019 4375 78186 15746 134113 135252 83244 53255 249074
285900 27482 163873 71671 18578 26686 137384 67563 35237 56693
153276 303104 58830 200 69277 6018 199513 109423 10712 86929
86734 70737 116372 33692 23594 155927 52678 79056 72750 208911

The data are all over the place, from 200 to 374780. If you remove the extremes and those numbers represent time, it very much describes the failure frequency that customer has. Thanks for the tip!
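The spread in this table can be summarised in a few numbers. A sketch, assuming the gaps come from 5-digit search strings (the post does not say), in which case a mean near 10^5 is expected. If the hits behave like a random process, the gaps should follow a roughly geometric distribution, whose standard deviation is close to its mean.

```python
import statistics

# Distances between consecutive hits, as posted (100 values).
gaps = [
    395, 100235, 71493, 154457, 80313, 28440, 55660, 60262, 174694, 96175,
    245893, 44177, 37996, 92144, 36774, 24642, 113441, 23691, 177806, 69992,
    37353, 115822, 59487, 24366, 52188, 45705, 31401, 374780, 125276, 79037,
    58376, 21325, 145555, 112697, 9880, 255019, 209674, 91195, 20386, 75671,
    177448, 6990, 18504, 83459, 340655, 65945, 65324, 11050, 160827, 27769,
    310489, 53925, 5253, 187101, 95351, 45240, 93874, 59219, 94424, 42881,
    75623, 167019, 4375, 78186, 15746, 134113, 135252, 83244, 53255, 249074,
    285900, 27482, 163873, 71671, 18578, 26686, 137384, 67563, 35237, 56693,
    153276, 303104, 58830, 200, 69277, 6018, 199513, 109423, 10712, 86929,
    86734, 70737, 116372, 33692, 23594, 155927, 52678, 79056, 72750, 208911,
]

mean = sum(gaps) / len(gaps)
sd = statistics.pstdev(gaps)  # population standard deviation of the posted gaps
print(f"n={len(gaps)} min={min(gaps)} max={max(gaps)} mean={mean:.0f} sd={sd:.0f}")
# For memoryless (geometric/exponential) waiting times the standard deviation is
# close to the mean, so sd/mean near 1 is what "random" spacing looks like.
print(f"sd/mean = {sd / mean:.2f}")
```

On the posted values the mean gap is about 93,600 digits, and the range runs from 200 to 374,780.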

Gunnar Englund

zdas04 · Aug 23, 2013

That's what we're here for. MuleShoe Engineering--Problem solutions that are no more technically complex than they have to be.

David Simpson, PE
MuleShoe Engineering

Skogsgurra · Aug 23, 2013

Gunnar Englund

BigInch · Aug 24, 2013

You're comparing the simple probability of finding a certain 4 or 5 digit permutation in a group of 1MM numbers to that of finding them AND finding them in a uniform interval.

Let's take just the permutation 22, for which we should be able to find about 10 of those in a 1000 digit random sequence, probably 1 in each group of 100 digits, however there is no reason that they should be 100 digits apart. You might find the first 22 group near to the beginning of the first 100 digits, but the second group could just as easily be expected to occur towards the end of the second 100 digits.
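The "22" example is easy to simulate. A sketch that draws 1000 pseudo-random digits per trial (Python's `random` module with an arbitrary fixed seed stands in for a random digit source here; pi's digits are not used), finds every "22", and reports the hit count and the smallest and largest gap.

```python
import random


def find_overlapping(digits: str, pattern: str) -> list[int]:
    """Start index of every (possibly overlapping) occurrence of pattern."""
    return [i for i in range(len(digits) - len(pattern) + 1)
            if digits.startswith(pattern, i)]


rng = random.Random(22)  # fixed seed so the run is repeatable
for trial in range(5):
    digits = "".join(rng.choice("0123456789") for _ in range(1000))
    hits = find_overlapping(digits, "22")
    gaps = [b - a for a, b in zip(hits, hits[1:])]
    # About 999/100, i.e. roughly 10 hits, are expected per 1000 digits,
    # but the individual gaps range far above and below 100.
    print(f"trial {trial}: {len(hits)} hits, gaps {min(gaps, default=0)}..{max(gaps, default=0)}")
```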

I think you should expect to see cycles where the average interval between similar permutations can vary from 1 to 2 x 1/P(permutation), or in the case of a two digit permutation, 2 x (1/0.1 x 1/0.1) = a max average interval of 200 digits in our 1000 digit sequence, with extreme intervals from 100 - 3 sigma, to 3 sigma above the high average (500). I guess that that means I might expect to see "22" showing up anywhere from 2 to 10 times in 1000 digits, with cycle lengths from 0 to 0.5 x 1/P(permutation).

I wouldn't try to look too closely for the "pattern". I believe human beings are hard-wired to try to find patterns, and see them even if they don't exist, and many others just as easily don't see them where they do, despite all the evidence.

Independent events are seldom independent.

BigInch · Aug 24, 2013

Or...

Finding a 1 in the first position of a 10 digit string = the probability of getting a 1 (1 in 10) x probability of landing in the first position, also 1 in 10. That's 1 in 100.

If we calculate the probability of "2,2" being found in the first and second position in a 10 digit sequence, it's 1/10 for the first "2" digit, 1/10 for the second "2" digit, 1/10 for the 1st position and 1/9 for the second position, or 1 in 9000. To get that to happen again near the first two slots in the next 10 digit run, probably an additional 1 in 20, and 1 in 19, or so, so expecting to see any kind of uniform spacing would be like a 1 in 4,000,000 chance of a 10 digit interval in a 20 digit string. Does that prove there is considerable room for variance? I'm too tired to think about it any more.

Skogsgurra · Aug 24, 2013

Thanks for the extended view on probability. But I was not expecting a uniform distribution; I was expecting that all seed numbers would occur reasonably evenly distributed across the one million digits. My first seeds were not: they showed "lumping" in the first 100 000 digits. At least, five out of seven did. That was what I was concerned about. Choosing other seeds and increasing the number of them somewhat gave me the result I think one could expect.

Gunnar Englund

SomptingGuy · Aug 24, 2013

When I was a lot younger, a brilliant novel by Carl Sagan called "Contact" found its way to me. The main part of it was adapted into a Hollywood movie of the same name, but I was devastated that they left out the sub-plot about the PI digit sequence. Read the book; don't take sides in the plot, just love the PI story.

- Steve
