April 21, 2012

Comments

boris

"Once the pool of 30 has been determined, the fact of non-replacement means ..."

(assume 10%B) Seems to me the odds of 28W +2B is greater than the odds of 26W+4B.

That weights the odds in favor of Ws.

Danube of Thought

"I'd rather rely on the logic that selecting 30 from a large population, then six from that 30, has exactly the same probability distribution as selecting six from a large population."

If you concede that that's the logic, then of course that's the end of the discussion. The reason I don't concede it is that we have two discrete steps: picking the pool, then picking the jury. In step one, the likelihood of ending up with a jury that is more--or less--likely to mirror the general population is identical, and random. Once that likelihood is fixed, the fact of non-replacement only operates in one direction: each successive white juror seated makes it less likely, without replacement, that the next one will be white.

jimmyk

But the pool hasn't been determined. As of now, the population is large. My last explanation. Use two large (100,000+) jars of marbles, each with 10% black, 90% white.

Jar 1: Pull six marbles out at random. We agree that non-replacement doesn't matter.

Jar 2: Pull 30 out at random, and then six from that 30.

I will maintain to my dying day that the odds of getting six white marbles is the same in either case. I could prove it mathematically, but I hear math is frowned upon.
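The two-jar comparison can also be checked by simulation. What follows is only a rough sketch: the jar size of 100,000, the trial count, the seed, and the function name `simulate` are all assumptions chosen for illustration, and a Monte Carlo estimate only approximates the exact value.

```python
import random

def simulate(trials=20_000, jar_size=100_000, pool_size=30, jury_size=6, seed=0):
    """Estimate P(all six white) for a direct draw and for a pool-then-six draw."""
    rng = random.Random(seed)
    n_black = jar_size // 10                 # marbles 0..n_black-1 count as black (10%)
    direct_hits = pooled_hits = 0
    for _ in range(trials):
        # Jar 1: six marbles straight from the jar, without replacement.
        six = rng.sample(range(jar_size), jury_size)
        direct_hits += all(m >= n_black for m in six)
        # Jar 2: thirty marbles first, then six of those thirty.
        pool = rng.sample(range(jar_size), pool_size)
        six = rng.sample(pool, jury_size)
        pooled_hits += all(m >= n_black for m in six)
    return direct_hits / trials, pooled_hits / trials

direct, pooled = simulate()
print(f"jar 1: {direct:.3f}   jar 2: {pooled:.3f}   0.9**6: {0.9**6:.3f}")
```

Both estimates should land close to 0.9^6, within sampling noise.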

Clarice

I really hate when you guys start that statistics baby talk.

MJW

"I will maintain to my dying day that the odds of getting six white marbles is the same in either case. I could prove it mathematically, but I hear math is frowned upon."

I think anyone who disputes that should address my earlier point. If each marble is uniquely marked, then the total number of possible unique sets is the same in either case, and the probability of choosing any particular set is the same, since nothing in either selection process makes one marble more likely to be picked than another. The probability of picking a set of all white marbles is just the sum of the probabilities over the sets of white marbles.

Danube of Thought

"Jar 2: Pull 30 out at random, and then six from that 30."

And at that point, non-replacement does matter. Pick any mix of black and white marbles you choose for the thirty from jar 2. Once you select one from that mix, if it is black the experiment is over. If it is white, the likelihood that the next one is white is slightly lower than it would be if the first marble had been replaced by one randomly selected from the original pool.

boris

While walking the dog just now I calculated the basic odds using 10 instead of 30.

Assuming 10%B, I got .35 as the odds of 10W.
Odds of at least one B is therefore .65.

Odds of exactly 1B is going to be about .39 ...

10W = .35
9W = .39
8W or less = .26

Getting 6W out of 10W = 100% = .35
Getting 6W out of 9W 1B = .4 x .39 = .16

Already you can see we are up to .51 for 6W instead of .4

The last category only needs to add up to .02 to make jimmyk correct.

Danube of Thought

"then the total number of possible unique sets is the same in either case"

But the processes of arriving at the final sets are not the same, and the probabilities of a given outcome are not the same, because with replacement some of the unique marbles get to be used twice.

Suppose instead of a pool of 30 and a jury of six, it was a pool of three and a jury of two. Probability of two blacks from jar 1 is one in 20,000 (or whatever). The probabilities of each of the possible permutations of three in the initial cut from jar 2 are whatever they are. The trials resulting in three blacks or three whites are a wash, and no test.

But in a trial where we have one white and two blacks after the first cut, on one-third of the next cuts we have one white juror, but without replacement we have zero chance of an all-white jury. With replacement--where we bring forward one of the unique marbles that was previously eliminated--our chances are still alive.

boris

Lets do pool of 10 ... jury of 3 (non replace) pop=10%B.

Odds of 10W = .35
Odds of 9/1 = .39
Odds of 8/2 = .19
Odds of 7/3 = .06
Odds of 6/4 = .01

Odds of 3W out of 10W = 100% (*.35) = .35
Odds of 3W out of 9/1 = 0.7 (*.39) = .27
Odds of 3W out of 8/2 = 0.47 (*.19) = .09
Odds of 3W out of 7/3 = 0.29 (*.06) = .02
Odds of 3W out of 6/4 = 0.17 (*.01) = .00

.35 +.27 +.09 +.02 = .73

Odds of pulling 3W out of the pop = .73

jimmyk is correct
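The table above (pool of 10, jury of 3, 10% black) can be reproduced without rounding. This is a sketch under the same assumption the thread uses, namely a population large enough that the pool's makeup is binomial; the function name `pool_table` is made up for the example.

```python
from math import comb

def pool_table(pool, jury, p_black):
    """Rows of (blacks in pool, P(pool), P(all-white jury | pool), product)."""
    rows = []
    for blacks in range(pool + 1):
        whites = pool - blacks
        # Binomial chance that the pool has exactly this many black members.
        p_pool = comb(pool, blacks) * p_black**blacks * (1 - p_black)**whites
        # Hypergeometric chance that every jury pick from that pool is white.
        p_jury = comb(whites, jury) / comb(pool, jury)
        rows.append((blacks, p_pool, p_jury, p_pool * p_jury))
    return rows

rows = pool_table(pool=10, jury=3, p_black=0.1)
for blacks, p_pool, p_jury, joint in rows[:5]:
    print(f"{10 - blacks}/{blacks}: {p_jury:.2f} (*{p_pool:.2f}) = {joint:.2f}")
total = sum(row[3] for row in rows)
print(f"sum = {total:.4f}, 0.9**3 = {0.9**3:.4f}")
```

Summing over every possible pool, not just the first five rows, should match 0.9^3 up to floating-point error.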

cboldt

Hypergeometric probability calculator.

MJW

"But the processes of arriving at the final sets are not the same, and the probabilities of a given outcome are not the same, because with replacement some of the unique marbles get to be used twice."

It doesn't matter how the final sets were chosen: if the probability of each unique jury being selected is the same as the probability of any other jury being selected, then the probability of selecting an element of any subset of the juries (such as all-white) is simply the number of members of the subset divided by the total number of unique juries.

Consider your example, and assume for simplicity there's an infinite population to choose from, and the probability of selecting a white person is 3/4, a black person, 1/4.

The probability all whites would be selected for the 3 person pool is (3/4)^3.

The probability 1 black person is selected is the probability the first juror is black and the next 2 white; plus the probability the first is white, the second black, and the third white; plus the probability the first 2 are white, the last black. So it is 3 * (1/4) * (3/4)^2 = (3/4)^3.

Likewise, the probability of 2 blacks and a white in the pool is 3 * (1/4)^2 * (3/4) = 3^2/4^3.

The probability of 3 blacks is (1/4)^3.

For a sanity check, sum the probabilities. The total is 1, as expected.

If the pool is all white, the probability of selecting an all-white jury from the pool is 1. If the pool has 1 black, the probability is 1/3. If there are 2 or 3 blacks in the pool, the probability is 0.

The probability of selecting an all-white jury using the pool method is:
1 * (3/4)^3 + (1/3) * (3/4)^3 = 9/16

The probability of selecting an all-white jury from the population is likewise 9/16.


MJW

A brief follow-up. First, when I said to assume an infinite population, I meant assume a population so large that non-replacement can be ignored. Obviously you can't choose elements from an infinite population with equal probability.

Second, the argument doesn't work if replacement is allowed in the population selection and the pool selection because it alters the probability that specific juries will be selected. Even with replacement, the probability a jury with a duplicate member will be selected from a very large population is negligible. On the other hand, if a pool is selected from the population (with or without replacement), then the jury is selected from the pool with replacement, the probability the jury will contain a duplicate member is high.
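To put rough numbers on "negligible" versus "high," here is a small sketch. The sizes (a pool of 30, a jury of 6, a population of 100,000) are borrowed from elsewhere in the thread as assumptions, and `p_duplicate` is a hypothetical helper name.

```python
def p_duplicate(size, draws):
    """Chance that `draws` picks WITH replacement from `size` items repeat an item."""
    p_all_distinct = 1.0
    for k in range(draws):
        # The (k+1)-th pick must avoid the k distinct items already chosen.
        p_all_distinct *= (size - k) / size
    return 1.0 - p_all_distinct

from_pool = p_duplicate(30, 6)
from_population = p_duplicate(100_000, 6)
print(f"pool of 30: {from_pool:.3f}   population of 100,000: {from_population:.6f}")
```

With these sizes the pool figure comes out at roughly four in ten, while the population figure is a small fraction of one percent.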

MJW

Following up my follow up, it isn't true that you can't choose an element from an infinite set with equal probability, though of course that probability has to be zero. What is true -- but not obviously so -- is that you can't choose the elements of a countably infinite set with equal probability.

Danube of Thought

"if the probability of each unique jury being selected is the same as the probability of any other jury being selected"

But it isn't--that is the proposition to be tested. Under one test, we pause on the way to selecting that unique jury and, if we choose to do so, put a previously-excluded candidate back into the pool from which we complete the selection. Thus, from the point of the pause forward, the outcome is not unique.

MJW

But it isn't--that is the proposition to be tested. Under one test, we pause on the way to selecting that unique jury and, if we choose to do so, put a previously-excluded candidate back into the pool from which we complete the selection.

Consider any two possible juries. Nothing in the pool selection process makes the selection of one of the juries more likely than the other. Each possible jury has exactly the same probability of being selected as any other possible jury, because nothing in the process takes into account any property of the jury composition.
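The "every jury is equally likely" claim can be brute-forced on a toy case. This sketch assumes six uniquely numbered marbles, a pool of four and a jury of two (sizes picked only to keep the enumeration small). Every (pool, jury) path is equally likely, so it is enough to count how many paths end in each jury.

```python
from collections import Counter
from itertools import combinations

population = range(6)          # six uniquely numbered marbles (toy assumption)
pool_size, jury_size = 4, 2

# Count every equally likely (pool, jury) path that ends in each jury.
paths = Counter()
for pool in combinations(population, pool_size):
    for jury in combinations(pool, jury_size):
        paths[jury] += 1

print(len(paths), "possible juries; path counts seen:", set(paths.values()))
```

If the set of path counts has a single value, no jury is favored over any other, whatever the colors of its members.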

MJW

The first paragraph in my previous comment was quoting DoT, and should have been in italics.

MJW

"But it isn't--that is the proposition to be tested. Under one test, we pause on the way to selecting that unique jury and, if we choose to do so, put a previously-excluded candidate back into the pool from which we complete the selection."

I'm confused by what you mean by that. Choosing with replacement means putting a previously accepted element back in the pool, allowing for the possibility the same element can be selected more than once. As I mentioned before, this increases the probability of selections with duplicate elements relative to selection, even with replacement, from the original population.

jimmyk

"And at that point, non-replacement does matter."

True, but that's beside the point, because the composition of the pool was randomly chosen to begin with.

Here's perhaps an easier case: Suppose a jar with a million marbles half black and half white. You take two at random. What's the probability they're both white? Obviously 1/4. (At least, so close that we don't worry about non-replacement.)

Now suppose you accidentally took four, but without looking at them, randomly threw two back into the jar. Would you not agree that the probability the two you kept are both white is still 1/4?

Again, you can do the math: The probabilities of the four are as follows:

4B: 1/16 (0)
4W: 1/16 (1)
3B,1W: 1/4 (0)
3W,1B: 1/4 (1/2)
2W,2B: 3/8 (1/6)

The number in parentheses is the probability of getting both white in each case from sampling without replacement. Now multiply and add: 1/16+1/8+3/48 and you get 1/4. QED
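The same sum can be reached by listing every case. A minimal sketch, assuming (as above) that the jar is big enough for each of the four draws to be an independent 50/50, so the 16 color sequences are equally likely, as are the 6 ways of choosing which two to keep:

```python
from fractions import Fraction
from itertools import combinations, product

draws = list(product("BW", repeat=4))      # 16 equally likely colour sequences
keeps = list(combinations(range(4), 2))    # 6 equally likely pairs to keep

# Count the (draw, keep) combinations in which both kept marbles are white.
both_white = sum(
    1
    for colours in draws
    for keep in keeps
    if all(colours[i] == "W" for i in keep)
)
probability = Fraction(both_white, len(draws) * len(keeps))
print(probability)
```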


Danube of Thought

"Choosing with replacement means putting a previously accepted element back in the pool"

No. Remember, we are down to three remaining marbles now, and we are testing the likelihood of no black marbles remaining at the conclusion of the process.

If there are three black marbles, we are bound to have at least one black among the final two; if there are three whites, it is certain we will have none. So let's suppose we have two blacks and a white, and we are to select our jury of two from those remaining three, without replacing (from the universe into the pool) those that we select. It is a certainty that we will have at least one black at the conclusion.

But if replacement is allowed, if the first marble chosen is white, there is a possibility that the second--and the third--will be white as well, because with each selection we replenish the pool from the universe. Thus the ability to replace increases the likelihood that we can get all white marbles on the jury.

boris

Using replacement you can have a jury of any size from a pool of 1.

That's just the same as picking the jury from the base population.

Danube of Thought

I'll submit on the briefs.

Nytol.

Ignatz

See what threads degenerate into without a periodic injection of shapely bottoms?
Sad.

MJW

"No. Remember, we are down to three remaining marbles now, and we are testing the likelihood of no black marbles remaining at the conclusion of the process."

It doesn't matter if you think of it in terms of rejecting some marbles, as you seem to be, or accepting the non-rejected marbles, as I prefer. The result is the same.

Worrying about choosing from the pool with replacement seems completely pointless to me. Yes it changes the odds compared to choosing without replacement from the pool, but that has nothing to do with the problem of choosing a pool from a population and then choosing a group from the pool, both without replacement.

I think it's very helpful to not just consider black and white marbles, but suppose each marble is individually, randomly numbered. There are C(n, 6) unique sets of 6 marbles. If a pool of 20 is selected at random, then from the pool 6 are taken, each of the original unique sets of 6 has the same chance of being the final selection, because nothing in the selection process depended on the jury composition.

With apologize to those who thought no math would be required:

If the population is "n," the pool size, "p," and the jury size "j,"

The probability any specific jury is selected from the population is 1/C(n, j), which is j! * (n - j)! / n!.

The probability a pool will contain all j members of the specific jury can be found by removing those members, then figuring out the number of p - j sized groups that can be formed from the remaining n - j. That value divided by the total number of pools is the probability a pool contains all the members, thus it is:

C(n - j, p - j) / C(n, p)

= { (n - j)! / [(p - j)! * (n - p)!] } / { n! / [p! * (n - p)!] }

= (n - j)! * p! / [n! * (p - j)!]

If a pool contains all the members of the specific jury, the probability they'll be chosen is 1/C(20, 6) = j! * (p - j)! / p!.

Multiplying by the probability the pool contains all the members gives the probability the specific jury will be selected. It is,
j! * (n - j)! / n!.

This is the same as the probability the jury will be selected from the population.
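The identity can be spot-checked with exact arithmetic. This is just a sketch: the (n, p, j) triples are arbitrary test values, and `via_pool` and `direct` are illustrative names, not anything from the thread.

```python
from fractions import Fraction
from math import comb

def via_pool(n, p, j):
    """P(specific jury) = P(pool contains it) * P(it is then drawn from the pool)."""
    in_pool = Fraction(comb(n - j, p - j), comb(n, p))
    from_pool = Fraction(1, comb(p, j))
    return in_pool * from_pool

def direct(n, j):
    """P(specific jury) when drawing j straight from a population of n."""
    return Fraction(1, comb(n, j))

cases = [(100, 30, 6), (50, 20, 6), (12, 5, 3)]
for n, p, j in cases:
    print(n, p, j, via_pool(n, p, j) == direct(n, j))
```

A handful of matching cases is not a proof, of course; the algebra above is what covers every n, p and j.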


MJW

If a pool contains all the members of the specific jury, the probability they'll be chosen is 1/C(p, j) ...

MJW

With apologies ...

(I'm not sure why I feel the need to correct some errors but not others. If I corrected every typing and spelling error I made, threads would consist of nothing but corrections and corrections of corrections.)

The comments to this entry are closed.
