
Randomize Comparator



Trinithis
01-08-2008, 07:37 PM
Let's say I have the array


var a = [1, 2, 3, 4, 5, 6];


I want to randomize this through the sort function and not through a separate shuffle method.



a.sort(function(a, b) {
// do something
});


If I'm not mistaken, the following is not a fair comparator


a.sort(function() {
return Math.random();
});


Or at least I'm almost certain this is not truly random.


a.sort(function(a, b) {
  a = Math.random();
  b = Math.random();
  if(a > b)
    return 1;
  if(a < b)
    return -1;
  return 0;
});


In any case, the comparator needs to fit this form for my page, because I want to use something like


function sortList(f) {
  Array.prototype.filter.call(document.getElementsByTagName("li"), (function() {
    var regex = /^\d+$/;
    return function(el) {
      return el.value && regex.test(el.value);
    };
  })()).sort(function(a, b) {
    var x = a;
    a = f(a);
    b = f(b);
    while(true) {
      try {
        if(a > b)
          return 1;
        if(a < b)
          return -1;
        return 0;
      }
      catch(e) {
        a = f(x); // IE bug
      }
    }
  }).forEach((function() {
    var i = 1;
    return function(el) {
      el.parentNode.appendChild(el);
      el.value = (i++).toString();
    };
  })());
}

sortList(function(el) {
  return el.firstChild.nodeValue;
});

sortList(function(el) {
  return Math.random(); // weird results!
});


Any ideas?

Twey
01-08-2008, 09:13 PM
Yes, Math.random() returns a number between 0 (inclusive, though you'll very rarely see exactly 0) and 1 (exclusive). In this case that means it will almost always be above 0, which means "greater" in the context of the sorting function. Try this one:
function(a, b) { return Math.random() - 0.5; }

Trinithis
01-08-2008, 09:42 PM
Hmm, that doesn't work either. Mapping the randomized values seems to work fine though:



// tested

sortList((function() {
  var m = {};
  return function(el) {
    var k = el.firstChild.nodeValue;
    if(!m[k])
      m[k] = Math.random();
    return m[k];
  };
})());


Which should be equivalent to


// untested

(function() {
  var m = {};
  return function(a, b) {
    var ka = a.firstChild.nodeValue;
    var kb = b.firstChild.nodeValue;
    if(!m[ka])
      m[ka] = Math.random();
    if(!m[kb])
      m[kb] = Math.random();
    return m[ka] - m[kb];
  };
})();

Twey
01-08-2008, 09:57 PM
js> [1, 2, 3, 4, 5].sort(function() { return Math.random() - 0.5; });
2,3,1,4,5
js> [1, 2, 3, 4, 5].sort(function() { return Math.random() - 0.5; });
5,1,4,3,2
js> [1, 2, 3, 4, 5].sort(function() { return Math.random() - 0.5; });
1,4,3,2,5
js> [1, 2, 3, 4, 5].sort(function() { return Math.random() - 0.5; });
2,4,3,1,5

Define "doesn't work"? That looks pretty random to me.

Trinithis
01-08-2008, 10:16 PM
I made an array with values 1-200 inclusive. In Fx, 200 likes to stay at the end. In IE, 1 likes to stay at the end. (Before I reverse the array.)



<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Test</title>
</head>
<body><div>
<script type="text/javascript">

var a = [];
for(var i = 0; i < 200; ++i)
  a[i] = i + 1;

function getRandomizer1() {
  var m = {};
  return function(a, b) {
    var ka = a;
    var kb = b;
    if(!m[ka])
      m[ka] = Math.random();
    if(!m[kb])
      m[kb] = Math.random();
    return m[ka] - m[kb];
  };
}

function getRandomizer2() {
  return function() {
    return Math.random() - 0.5;
  };
}

// probably fair
for(var i = 0; i < 10; ++i)
  document.write(a.slice().sort(getRandomizer1()).reverse() + "<br \/>");

document.write("<br \/><br \/>");

// almost certainly not fair
for(var i = 0; i < 10; ++i)
  document.write(a.slice().sort(getRandomizer2()).reverse() + "<br \/>");


</script>
</div>
</body>
</html>

Twey
01-09-2008, 09:55 AM
Hm... well, with Math.random() - 0.5 you've got about-equal chances of a comparison coming out higher or lower, and almost no chance of it coming out equal. Over any sequence of two comparisons an element will probably be considered higher once and lower once, and since the last element is compared against the penultimate item first, it stops moving as soon as it rolls a "higher". That means the last element is unlikely to move more than one space backwards. You could, if you wanted, bias it a bit:
function(a, b) { return Math.random() - 0.75; }
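
(As a rough, untested sketch of how to see that sticking effect: tally where the element that started last ends up over many shuffles with the random comparator. The array size and trial count below are arbitrary choices.)

// Untested sketch: count the final position of the element that started last,
// shuffling with the Math.random() - 0.5 comparator each time.
var SIZE = 10, TRIALS = 10000, counts = [];
for(var i = 0; i < SIZE; ++i)
  counts[i] = 0;
for(var t = 0; t < TRIALS; ++t) {
  var a = [];
  for(var i = 0; i < SIZE; ++i)
    a[i] = i + 1;
  a.sort(function() { return Math.random() - 0.5; });
  for(var i = 0; i < SIZE; ++i)
    if(a[i] == SIZE) { ++counts[i]; break; }
}
// A fair shuffle would put roughly TRIALS / SIZE hits in every slot.
document.write(counts.join(", "));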

jscheuer1
01-09-2008, 04:34 PM
Random is a strange concept, especially to the ordered mind of your typical coder, like the three of us here. Not to say we are all three of us typical coders, just that we often tend to think in an ordered manner.

Once you move over into the realm of random, you have to realize that this truly means random. One of the characteristics of randomness is that it can require an infinite number of 'tries' to get all possible recombinations of your data set. Repeats are common. I believe that the method put forward by Twey here (originally in these forums by mwinter) is sufficient and that any addition to it either reduces the truly random nature of the exercise or has no real effect upon it. This would however be notwithstanding any browser bugs or glitches.

I have found in anecdotal testing (non-statistical, observing casually*), that the method works. You can get stuck in a repeating pattern, or pattern of sorts, though for quite some time and not see certain mixes at all, then suddenly that's all you will see. Randomness can look like that, as well as like the more orderly non-repetitive output that the ordered mind, at first blush, may expect from it.

*With a process that can take infinite tries to prove itself out, no statistical analysis is really possible.

Twey
01-09-2008, 05:26 PM
One of the characteristics of randomness is that it can require an infinite number of 'tries' to get all possible recombinations of your data set.
Not in a limited-size set...

you have to realize that this truly means random.
Well, actually it's pseudo-random. If you want true randomness you might want to try one of the entropy streams from random.org.

Trinithis
01-09-2008, 05:41 PM
The only reason I'm skeptical of

function() { return Math.random() - 0.5; }

is that the sorting algorithm is unknown. The same part of the list can be revisited after it has already been "shuffled".
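
(One untested sketch for making that visible: instrument the comparator and count how many times the native sort calls it and how often each value gets revisited; both tallies are implementation-defined and will differ between browsers.)

// Untested sketch: log how often sort() revisits elements when given an
// inconsistent (random) comparator. The array is an arbitrary example.
var calls = 0, seen = {};
[1, 2, 3, 4, 5, 6, 7, 8].sort(function(a, b) {
  ++calls;
  seen[a] = (seen[a] || 0) + 1;
  seen[b] = (seen[b] || 0) + 1;
  return Math.random() - 0.5;
});
document.write(calls + "<br \/>");
for(var k in seen)
  document.write(k + ": " + seen[k] + "<br \/>");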



Care needs to be taken in implementing the Knuth shuffle; even slight deviations from the correct algorithm will produce biased shuffles. For example, working your way through the pack swapping each card in turn with a random card from any part of the pack is an algorithm with n^n different possible execution paths, yet there are only n! permutations, and a counting argument based on using the pigeonhole principle on assignments of execution paths to outcomes shows that this algorithm cannot produce an unbiased shuffle, unlike the true Knuth shuffle, which has n! execution paths which match up one-to-one with the possible permutations.




In a computer, shuffling is equivalent to generating a random permutation of the cards. There are two basic algorithms for doing this, both popularized by Donald Knuth. The first is simply to assign a random number to each card, and then to sort the cards in order of their random numbers. This will generate a random permutation, unless two of the random numbers generated are the same. This can be eliminated either by retrying these cases, or reduced to an arbitrarily low probability by choosing a sufficiently wide range of random number choices.

The second, generally known as the Knuth shuffle or Fisher-Yates shuffle, is a linear-time algorithm (as opposed to the previous O(n log n) algorithm if using efficient sorting such as mergesort or heapsort) which involves moving through the pack from top to bottom, swapping each card in turn with another card from a random position in the part of the pack that has not yet been passed through (including itself). Providing that the random numbers are unbiased, this will always generate a random permutation.


The first algorithm looks like the one I presented above using a closure over a hash. (The getRandomizer1().)
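
(A standalone, untested sketch of that first algorithm, decoupled from the DOM: pair each element with a random key, sort on the keys, then throw the keys away. keyShuffle is just an illustrative name, and the map calls assume the JavaScript 1.6 Array extras or a shim like the one in a later post.)

// Untested sketch of "assign a random number to each card, then sort".
// Returns a new, shuffled copy; the original array is untouched.
Array.prototype.keyShuffle = function() {
  return this
    .map(function(v) { return { v: v, k: Math.random() }; }) // decorate
    .sort(function(a, b) { return a.k - b.k; })              // sort on the keys
    .map(function(p) { return p.v; });                       // undecorate
};

// e.g. [1, 2, 3, 4, 5, 6].keyShuffle();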

The second looks like:


Array.prototype.shuffle = function() {
  for(var i = this.length; i > 1; --i)
    this.swap(i - 1, Math.floor(Math.random() * i));
  return this;
};

Array.prototype.swap = function(i, j) {
  var t = this[i];
  this[i] = this[j];
  this[j] = t;
};
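
(A usage note, as an untested sketch: shuffle() above mutates the array in place and returns it, so shuffle a copy if the original order matters.)

// Shuffle a copy so the original array is left untouched.
var deck = [1, 2, 3, 4, 5, 6];
var shuffled = deck.slice().shuffle();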

jscheuer1
01-09-2008, 05:43 PM
Not in a limited-size set

Practically speaking, yes, you are right. In actual fact, every time you flip the coin (a set of only 2, possibly 3 if you count the possibility of the coin landing and remaining on its edge; let's say 2 here), the chances are 50/50 (1 out of 2 for each outcome) all over again. So it is possible that you will get heads forever or tails forever, though it's not too likely.

What Trinithis is saying, though, is that it's as if the browsers are using a coin with two heads. That may be, but I tend to doubt it.

As for random vs pseudo random, I never looked into it deeply enough to be able to tell if there is any practical difference for the purposes of this discussion. Is there? If so, what might that be?

Trinithis
01-09-2008, 05:50 PM
John, the Law of Large Numbers should probably be sufficient for our purposes, where "large" isn't too large :D. Also, we should assume for the sake of the algorithm that the pseudo-random number generator is actually truly random.

Twey
01-09-2008, 06:01 PM
Oh, I see what you mean. Yes, that's true.

jscheuer1
01-10-2008, 03:25 AM
OK, nice Law. That still doesn't rule out what I said about statistical analysis of the random product on a small number of sample trials. The apparent lack of randomness could be just anecdotal.

I'm not worried too much about that though. The real issue is - are the browsers mentioned really screwing up or not? And (at least to me), if so - why?

It has already been demonstrated that, under Windows at least, browsers can screw up floating-point math; perhaps (if there really is a problem with .sort(function() { return Math.random() - 0.5; })) this has something to do with it.

Trinithis
01-10-2008, 05:48 AM
If floating-point arithmetic really is the issue, you could perhaps mitigate it by converting the values to strings and using < or > for string comparisons.

Now as for . . .

.sort(function() { return Math.random() - 0.5; })
This is the issue of having more possible execution paths than permutations (n^n paths versus n! permutations, where n is the number of items to be shuffled), which leads to biased results. The explanation:

Let's say we have a deck of 5 cards. This means there are 5! or 120 different ways to organize the deck. If you were to "shuffle" the deck by going from the bottom to the top, swapping the current card with any card in the deck (including itself and including the part of the deck already shuffled), this would produce 5^5 or 3125 possible execution paths, depending on all the ways the cards can interact with one another.

3125 is bigger than 120, which means at least one of the 120 orderings must repeat. In the worst-case scenario, 119 orderings occur once each and 1 ordering occurs 3006 times. Clearly this is biased. So, to make things less dramatic, let's take the best-case scenario, where every ordering occurs the same number of times across the 3125 possibilities; that would arguably be unbiased. There's only one problem: 3125 is not evenly divisible by 120, so such an even split is impossible. This means some outcome X must occur at least one more time than some other outcome Y.
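
(An untested sketch that makes the counting argument concrete: it enumerates every execution path of the "swap each item with any item" shuffle for a given n and tallies which permutation each path produces. For n = 3 there are 27 paths spread over only 6 permutations, so the tallies cannot all be equal. naiveShuffleCounts is just an illustrative name.)

// Untested sketch: enumerate all n^n execution paths of the naive
// swap-with-anything shuffle and tally which permutation each one produces.
function naiveShuffleCounts(n) {
  var counts = {};
  var paths = Math.pow(n, n);
  for(var p = 0; p < paths; ++p) {
    var a = [], x = p;
    for(var i = 0; i < n; ++i)
      a[i] = i + 1;
    for(var i = 0; i < n; ++i) {
      var j = x % n; // decode path p into its i-th swap choice
      x = Math.floor(x / n);
      var t = a[i]; a[i] = a[j]; a[j] = t;
    }
    var key = a.join("");
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts; // for n = 3: six permutations, 27 paths, so the counts can't all match
}

// e.g.:
// var c = naiveShuffleCounts(3);
// for(var k in c) document.write(k + ": " + c[k] + "<br \/>");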


The real issue is - are the browsers mentioned really screwing up or not?

Nope, the computer is not messing up; the algorithm is just flawed.

jscheuer1
01-10-2008, 12:38 PM
Here's an interesting function:


function testRand(){
  var a=[1,2,3,4,5], c=0;
  while (a.join(',')!=a.sort(function(){return Math.random()-0.5}))
    c++;
  return c;
}

The typical value returned is usually low in Opera (under 100) and FF (under 500), with Opera typically the lowest; IE (over 1000) is typically the highest.

I'm not sure what, if anything, this really tells us though.
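
(For scale: if every shuffle produced a uniformly random ordering of the five elements, the loop would stop with probability 1/120 per pass, so c should average about 119 over many runs. An untested sketch of that check, reusing testRand from above with an arbitrary run count:)

// Untested sketch: average testRand() over many runs and compare the result
// with the ~119 expected if every shuffle were uniform over the 120 orderings.
var total = 0, runs = 1000;
for(var i = 0; i < runs; ++i)
  total += testRand();
document.write(total / runs);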

Trinithis
01-10-2008, 06:41 PM
John, that probably just tells us which sorting algorithm the browser uses, or something along those lines. I would put more stock in it if you custom-wrote the sort algorithm.

Here's some code that performs both shuffling algorithms and finds the distribution of their outcomes. Then it subtracts the expected count (as if everything came out perfectly evenly) from the results and displays them.

The Math.random() - 0.5 is left in the dust. Both IE and Fx give similar results.

Note: Each row adds up to zero.

Example outcome


-72 77 -19 38 -24
-17 15 -1 -37 40
-133 9 69 96 -41
174 10 -2 -152 -30
48 -111 -47 55 55



668 -2654 3357 1338 -2709
617 -2584 3482 1104 -2619
435 -324 -1308 3120 -1923
114 4413 -2097 -2123 -307
-1834 1149 -3434 -3439 7558




<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Test</title>
</head>
<body><div>
<script type="text/javascript" src="http://trinithis.awardspace.com/TextFormatter/TextFormatter.js"></script>
<script type="text/javascript">

var tf = new TextFormatter(document.body.firstChild).setFontFamily("monospace");


Array.prototype.clone = function(deep) {
  var a = this.slice();
  if(deep)
    for(var i = a.length - 1; i >= 0; --i)
      if(a[i].clone instanceof Function)
        a[i] = a[i].clone(deep);
  return a;
};

if(!Array.prototype.forEach)
  Array.prototype.forEach = function(f /*, context */) {
    for(var i = 0, n = this.length, t = arguments[1]; i < n; ++i)
      f.call(t, this[i], i, this);
  };

if(!Array.prototype.map)
  Array.prototype.map = function(f /*, context */) {
    for(var r = [], i = 0, n = this.length, t = arguments[1]; i < n; ++i)
      r[i] = f.call(t, this[i], i, this);
    return r;
  };

var a = [1, 2, 3, 4, 5];

function getRandomizer1() {
  var m = {};
  return function(a, b) {
    var ka = a;
    var kb = b;
    if(!m[ka])
      m[ka] = Math.random();
    if(!m[kb])
      m[kb] = Math.random();
    return m[ka] - m[kb];
  };
}

function getRandomizer2() {
  return function() {
    return Math.random() - 0.5;
  };
}

var r1 = [
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,0,0,0,0]
];
var r2 = r1.clone(true);

var n = 5000;

for(var i = n * a.length; i > 0; --i) {
  a.slice().sort(getRandomizer1()).forEach(function(v, i) {
    ++r1[i][v - 1];
  });
  a.slice().sort(getRandomizer2()).forEach(function(v, i) {
    ++r2[i][v - 1];
  });
}

for(var i = 0; i < r1.length; ++i) {
  tf.println(r1[i].map(function(v) {
    return v - n;
  }).join("\t"));
}

tf.println("\n\n");

for(var i = 0; i < r2.length; ++i) {
  tf.println(r2[i].map(function(v) {
    return v - n;
  }).join("\t"));
}


</script>
</div>
</body>
</html>

Trinithis
01-11-2008, 03:36 AM
Yeah, I'm almost positive that your results differ between browsers because of the sort algorithm. I made my own sort, and the results between IE and Fx are practically the same. Swapping the qsort call below for the native sort gives differing results, if you want to try it.



<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Test</title>
</head>
<body><div>
<script type="text/javascript" src="http://trinithis.awardspace.com/jslib/Object.js"></script>
<script type="text/javascript" src="http://trinithis.awardspace.com/jslib/Array.js"></script>
<script type="text/javascript" src="http://trinithis.awardspace.com/jslib/Function.js"></script>
<script type="text/javascript" src="http://trinithis.awardspace.com/jslib/TextFormatter.js"></script>
<script type="text/javascript">
var tf = new TextFormatter(document.body.firstChild).setFontFamily("monospace");

Array.prototype.qsort = function(f) {
  var g = function(a, b) {
    return f(a, b) > 0;
  };
  var innerSort = function() {
    if(this.length == 0)
      return this;
    var pivot = this.pop();
    var part = this.partition(g.curry()(pivot));
    return innerSort.call(part[0]).append(pivot).concat(innerSort.call(part[1]));
  };
  for(var q = innerSort.call(this), i = q.length - 1; i >= 0; --i)
    this[i] = q[i];
  return this;
};

function testRand() {
  var a = [1, 2, 3], c = 0;
  while(a.join(',') != a.qsort(
    function() {
      return Math.random() - 0.5;
    })
  ) c++;
  return c;
}

var tot = 0;
var n = 500;

for(var i = n; i > 0; --i)
  tot += testRand();

tf.println(tot / n);

</script>
</div></body>
</html>

jscheuer1
01-24-2008, 06:48 AM
Can those scripts be used without the external ones from your site? I'm about to play with them, but saw you were on, so thought I would ask.

Trinithis
01-24-2008, 07:05 AM
Go ahead and download those js files. I was too lazy to pick out the exact functions from them. They were mainly just for convenience.

jscheuer1
01-24-2008, 07:53 AM
I don't know what your test is supposed to prove because I can't read its logic just looking at the code, and the output doesn't speak to me either. Perhaps some explanation is in order. I may just be being dense. I made up my own test though:


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<script type="text/javascript">
var a=[1,2,3],
    trial=[123,132,213,231,321,312],
    totals=[0,0,0,0,0,0];
function as(){
  a.sort(function(){return 0.5 - Math.random();});
  for (var i = trial.length-1; i > -1; --i)
    if(a.join('')==trial[i])
      totals[i]++;
  document.getElementById('srt').firstChild.nodeValue=a.join('');
  document.getElementById('tot').firstChild.nodeValue=totals.join('\xa0 ');
}
function assess(){for (var i = 1000; i > -1; --i)as();};
window.onload=assess;
</script>
</head>
<body>
<div>Current Order:<span id="srt">&nbsp;</span><span>&nbsp;Distribution:&nbsp;</span><span id="tot">&nbsp;</span></div>
<input type="button" onclick="assess();" value="Randomize 1000 Times">
</body>
</html>

I think it 'randomizes' an array with the numbers 1 2 3 in it 1000 times at a pop, checking which of the six possible results it gets each time and keeping track of them. The distribution seems even enough to me to demonstrate randomness or something very close to randomness. I used:


a.sort(function(){return 0.5 - Math.random();});

as the only randomizing agent.

jscheuer1
01-24-2008, 09:46 AM
Oops, I've found that if I return the array to its natural order of 123 before each randomization, a pattern emerges. I've also found that randomizing four times in a row seems to just about take care of it. But I'm wondering whether that's proportional to the number of items in the array or not.

Trinithis
01-24-2008, 05:11 PM
I think it is proportional to the number of items in the list. From casual inspection, it seems that the larger the list, the greater the chance for one of the items (such as the first item) to hug an end.



I don't know what your test is supposed to prove because I can't read its logic just looking at the code, and the output doesn't speak to me either. Perhaps some explanation is in order.

I basically did what you did with your

function testRand(){
  var a=[1,2,3,4,5], c=0;
  while (a.join(',')!=a.sort(function(){return Math.random()-0.5}))
    c++;
  return c;
}
Except I replaced sort with a known sorting implementation. I just happened to make a cheesy qsort for the heck of it. I could have used bubble sort for all that mattered, but boredom forced me to experiment.

In any case, once the sorting implementation was held constant, I in effect removed a "lurking variable" (in the statistical sense). From this, I saw that with your test the browsers output similar results.

jscheuer1
01-24-2008, 11:44 PM
I'm still not following you, but your math skills in this area are obviously better, or at least less rusty, than mine. I did come up with an asort array prototype function that takes an array and alters it randomly*:


Array.prototype.asort=function (){
  if(typeof this.splice=='undefined')
    this.sort(function(){return 0.5-Math.random();});
  else{
    var a=[], b=0, c;
    for (var i = this.length-1; i > -1; --i)
      a[i]=this[i];
    while(a.length){
      this[b++]=a[c=Math.floor(Math.random()*a.length)];
      a.splice(c,1);
    }
  }
};

After 1002001 iterations on a 5-member array (I was doing other things), the top result (out of 120 possible outcomes) occurred 8593 times and the bottom 8129, a spread of 464 around the roughly 8350 each that a perfectly even split would give. Pretty darn random. And doing this just once, as would be the case in a script, isn't all that intensive.



*Usage:


arry_name.asort();