Unit 2 Probability

3.2 Conditional Probability and the Multiplication Rule

Open the Global Health data set, and use it to complete the table below with the correct number of countries for each cell:

| | Average Income ≤ $28000 | Average Income > $28000 | Total Number |
|---|---|---|---|
| Life Expectancy ≤ 79 | 19 | 2 | 21 |
| Life Expectancy > 79 | 5 | 20 | 25 |
| Total | 24 | 22 | 46 |

The table you created above is called a contingency table. It has that name because it shows how the individuals are distributed along each variable, contingent on the value of the other variable. We will use this table to calculate some probabilities.
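As a rough illustration, the margins of the table and a couple of simple probabilities can be recomputed from the four inner cells. The sketch below is in Python; the dictionary layout and the names `counts`, `row_totals`, and `col_totals` are assumptions made for this example and are not part of the Global Health data set.

```python
from fractions import Fraction

# Inner cells of the contingency table above, keyed by
# (life expectancy group, average income group). The labels are
# hypothetical shorthand chosen for this sketch.
counts = {
    ("LE <= 79", "income <= 28000"): 19,
    ("LE <= 79", "income > 28000"): 2,
    ("LE > 79", "income <= 28000"): 5,
    ("LE > 79", "income > 28000"): 20,
}

# Rebuild the "Total" row and column by summing the inner cells.
row_totals = {}
col_totals = {}
for (life, income), n in counts.items():
    row_totals[life] = row_totals.get(life, 0) + n
    col_totals[income] = col_totals.get(income, 0) + n

grand_total = sum(counts.values())

# A probability that involves one variable only uses a margin total.
p_long_life = Fraction(row_totals["LE > 79"], grand_total)

# A probability that involves both variables uses a single inner cell.
p_short_life_and_rich = Fraction(
    counts[("LE <= 79", "income > 28000")], grand_total
)

print(row_totals, col_totals, grand_total)
print(round(float(p_long_life), 3), round(float(p_short_life_and_rich), 3))
```

Using `Fraction` keeps the values exact until the final rounding, which makes it easier to compare them with hand calculations.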

Does there appear to be a relationship between the wealth and life expectancy of a country? Yes

Does this seem to be a representative sample from which to make conclusions? Yes

The Multiplication Rule

\(P(A \text { and } B)=P(A) P(B | A)\)

\(P(B | A)\) is called a conditional probability. It represents the probability that event B occurs given that event A has already occurred, and is read “the probability of B given A.”
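One way to see the rule in action is to check it against counts from the Global Health table above. In this sketch, A is taken to be "Average Income > $28000" and B is "Life Expectancy ≤ 79"; that choice of events and the variable names are assumptions made for illustration.

```python
from fractions import Fraction

total = 46         # all countries in the table
income_high = 22   # A: Average Income > $28000
both = 2           # A and B: Average Income > $28000 and Life Expectancy <= 79

p_a = Fraction(income_high, total)
# For P(B | A), look only at the 22 countries where A is true.
p_b_given_a = Fraction(both, income_high)
# P(A and B) read directly from the table cell.
p_a_and_b = Fraction(both, total)

# Multiplication rule: P(A and B) = P(A) * P(B | A)
assert p_a * p_b_given_a == p_a_and_b
print(p_a, p_b_given_a, p_a_and_b)  # fractions are shown in lowest terms
```

The 22 in the numerator of P(A) cancels the 22 in the denominator of P(B | A), which is why the product reduces to the single cell count divided by the grand total.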

The Multiplication Rule with Independent Events

When A and B are independent events, the occurrence of Event A does not affect the probability that Event B will occur, and the occurrence of Event B does not affect the probability that Event A will occur. You may see the words “with replacement” when finding the probability of selecting multiple items.

When A and B are independent events, \(P(B | A)=P(B)\). This means you can simplify the multiplication rule to:

\(P(A \text { and } B)=P(A) P(B)\)

The Multiplication Rule with Dependent Events

When A and B are dependent events, the occurrence of Event A affects the probability that Event B will occur and the occurrence of Event B affects the probability that Event A will occur. You may see the words “without replacement” when finding the probability of selecting multiple items.

When A and B are dependent events, \(P(B | A)\) must be used as the probability Event B will occur. We have assumed Event A has occurred and Event A affects the probability that Event B will occur. You must use the original multiplication rule:

\(P(A \text { and } B)=P(A) P(B | A)\)
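The difference between the two versions of the rule can be sketched in a few lines of Python. The helper `prob_all_selected` is a hypothetical function written for this example. It assumes every selection must be a "success" and that there are at least as many items as draws. The counts (22 high-income countries out of 46) come from the Global Health table.

```python
from fractions import Fraction

def prob_all_selected(successes, population, draws, replacement):
    """Probability that every one of `draws` random selections is a success."""
    prob = Fraction(1)
    for i in range(draws):
        if replacement:
            # Independent: the pool is restored, so each factor is the same.
            prob *= Fraction(successes, population)
        else:
            # Dependent: one success and one item are gone after each draw.
            prob *= Fraction(successes - i, population - i)
    return prob

with_repl = prob_all_selected(22, 46, 2, replacement=True)
without_repl = prob_all_selected(22, 46, 2, replacement=False)

print(round(float(with_repl), 3))     # (22/46)(22/46)
print(round(float(without_repl), 3))  # (22/46)(21/45)
```

Without replacement, the second factor is the conditional probability P(B | A), so it uses the reduced counts 21 and 45.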

The Multiplication Rule with Contingency Tables (Dependent and Independent Events)

| | Average Income ≤ $28000 | Average Income > $28000 | Total Number |
|---|---|---|---|
| Life Expectancy ≤ 79 | 19 | 2 | 21 |
| Life Expectancy > 79 | 5 | 20 | 25 |
| Total | 24 | 22 | 46 |

  1. If you randomly select one country, what is the probability of ...?
    1. P(Life Expectancy > 79) =

      25/46 = 0.543

    2. P(Life Expectancy ≤ 79 and Average Income > $28000) =

      2/46 = 0.043

  2. Independent Compound Events: If you randomly select two countries, with replacement, what is the probability of ...?
    1. P(Both Average Income > $28000) =

      (22/46)(22/46) = 0.229

    2. P(Average Income > $28000, and then Life Expectancy > 79) =

      (22/46)(25/46) = 0.260

  3. Dependent Compound Events: If you randomly select two countries, without replacement, what is the probability of ...?

    P(Both Average Income > $28000) =

    (22/46)(21/45) = 0.223

  4. Conditional Probability

    What is the percentage breakdown by Life Expectancy if we separate the wealthier and poorer countries?

    1. P(Life Expectancy ≤ 79, given that Average Income > $28000) =

      2/22=0.091

    2. Another notation for this: P(Life Expectancy ≤ 79 | Average Income > $28000) =

      2/22 = 0.091

    3. P(Life Expectancy ≤ 79, given that Average Income ≤ $28000) =

      19/24 = 0.792

    4. Another notation for this: P(Life Expectancy ≤ 79 | Average Income ≤ $28000) =

      19/24 = 0.792

    5. When we limit ourselves to a specific column or row, so that we can explore the distribution of one variable among only the individuals that meet a criterion on the other variable, we have created a conditional distribution. Why do you think it got the name “conditional”?

  5. Consider an illness that has a prevalence of about 1 in 1000, and suppose the test for this illness is 95% accurate (meaning that patients who have the illness test positive 95% of the time, and patients who do not have it test negative 95% of the time). The results for a large number of patients tested would look something like this:
    | | Tests negative | Tests positive | Total Number |
    |---|---|---|---|
    | Has illness | 1 | 19 | 20 |
    | Does not have illness | 18981 | 999 | 19980 |
    | Total | 18982 | 1018 | 20000 |
    1. If someone tests positive, what is the probability that they actually have the illness?

      P(have illness, given tested positive) = 19/1018 = 0.0187 = 1.87%

    2. If someone tests negative, what is the probability that they actually have the illness?

      P(have illness, given tested negative) = 1/18982 = 0.000053 = 0.0053%

    3. If someone tests positive, what is the probability they DO NOT have the illness?

      P(does not have illness, given tested positive) = 999/1018 = 0.9813 = 98.13%

    4. Why would it be important for doctors to understand these probabilities?

      So they can give patients accurate information about test results. For example, it would be very wrong to tell a patient who tested positive that there is a 95% chance they have this illness just because the test is 95% accurate.

  6. The following contingency table represents a bowl full of M&Ms and Skittles. Find the probabilities:
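    | | Red Candy | Yellow Candy | Green Candy | Blue Candy | Orange Candy | Brown or Purple | Total Number |
    |---|---|---|---|---|---|---|---|
    | M&Ms | 109 | 102 | 142 | 183 | 187 | 105 | 828 |
    | Skittles | 202 | 188 | 145 | 0 | 154 | 156 | 845 |
    | Total Number | 311 | 290 | 287 | 183 | 341 | 261 | 1673 |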
    1. One randomly selected piece of candy:
      1. \(P(\text{M\&M} | \text{Red})\)

        \(\frac{109}{311}=0.3505\)

      2. \(P(\text{Red} | \text{M\&M})\)

        \(\frac{109}{828}=0.1316\)

      3. \(P(\text { Skittle } | \text { Yellow })\)

        \(\frac{188}{290}=0.6483\)

      4. \(P(\mathrm{Yellow} | \mathrm{Skittle})\)

        \(\frac{188}{845}=0.2225\)

    2. Two randomly selected pieces of candy:
      1. \(P(\text { Both Blue })\) with replacement

        \(\left(\frac{183}{1673}\right)\left(\frac{183}{1673}\right)=0.011965\) independent events

      2. \(P(\text { Both Blue })\) without replacement

        \(\left(\frac{183}{1673}\right)\left(\frac{182}{1672}\right)=0.011907\) dependent events

      3. \(P(\text { Both Orange Skittles })\) with replacement

        \(\left(\frac{154}{1673}\right)\left(\frac{154}{1673}\right)=0.008473\) independent events

      4. \(P(\text { Both Orange Skittles} )\) without replacement

        \(\left(\frac{154}{1673}\right)\left(\frac{153}{1672}\right)=0.008423\) dependent events

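  7. Use the following contingency table to find the probabilities:
    | | Republicans | Democrats | Independents | Total Number |
    |---|---|---|---|---|
    | Males | 46 | 39 | 1 | 86 |
    | Females | 5 | 9 | 0 | 14 |
    | Total Number | 51 | 48 | 1 | 100 |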
    1. Randomly select 1 person:
      1. P(Female | Democrat)

        \(\frac{9}{48}=0.1875\)

      2. P(Republican | Female)

        \(\frac{5}{14}=0.3571\)

    2. Randomly select 2 people:
      1. P(Both Female) with replacement

        \(\frac{14}{100} \times \frac{14}{100}=0.0196\)

      2. P(Both Female) without replacement

        \(\frac{14}{100} \times \frac{13}{99}=0.0184\)

      3. P(Both Democrat) with replacement

        \(\frac{48}{100} \times \frac{48}{100}=0.2304\)

      4. P(Both Democrat) without replacement

        \(\frac{48}{100} \times \frac{47}{99}=0.2279\)

      5. P(Democrat then Republican) without replacement

        \(\frac{48}{100} \times \frac{51}{99}=0.2473\)

      6. P(Democrat then Republican) with replacement

        \(\frac{48}{100} \times \frac{51}{100}=0.2448\)

  8. Example with Scientific Notation:
    1. You have 20,000 CDs in a warehouse, and 2% are defective. You choose 3 CDs from the 20,000.

      0.02(20000) = 400, so 400 of the 20,000 CDs are defective.

    2. P(3 CDs are defective) with replacement =

      \(\frac{400}{20000} \times \frac{400}{20000} \times \frac{400}{20000}=8 E-6=8 \times 10^{-6}=0.000008\)

      Because these selections are independent, the same probability can be written as a power: \((0.02)^3 = 0.000008\)

    3. P(3 CDs are defective) without replacement =

      \(\frac{400}{20000} \times \frac{399}{19999} \times \frac{398}{19998}=7.94 \times 10^{-6}=0.00000794\)

Some calculations are cumbersome, but they can be made manageable by using the common practice of treating events as independent when small samples are drawn from large populations. In such cases, it is rare to select the same item twice.

The 5% Guideline for Cumbersome Calculations: If a sample size is no more than 5% of the size of the population, treat the selections as being independent (even if the selections are made without replacement, so they are technically dependent).

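So, in the example above, a sample of 3 CDs is far less than 5% of 20,000 (which is 1,000), so we could treat the selections as independent even though the CDs are being selected “without replacement”. The two answers, 0.000008 and 0.00000794, are nearly equal.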
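To get a feel for how little is lost under the 5% guideline, the exact and approximate answers for the CD example can be compared directly. This is a minimal Python sketch; the variable names are assumptions made for this example, and the numbers come from the CD problem above.

```python
from fractions import Fraction

population = 20_000
defective = 400      # 2% of 20,000
sample_size = 3

# Exact answer: selections without replacement (dependent events).
exact = Fraction(1)
for i in range(sample_size):
    exact *= Fraction(defective - i, population - i)

# Approximation: treat the selections as independent.
approx = Fraction(defective, population) ** sample_size

# The guideline applies when the sample is at most 5% of the population.
meets_guideline = sample_size <= 0.05 * population

# How far off the approximation is, relative to the exact answer.
relative_error = float((approx - exact) / exact)

print(f"exact  = {float(exact):.3e}")
print(f"approx = {float(approx):.3e}")
print(f"guideline met: {meets_guideline}, relative error: {relative_error:.2%}")
```

In this case the relative error is under 1%, which is why the simpler independent calculation is usually considered acceptable for such a small sample.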

Practice with Scientific Notation

Write the following in Exponential Notation and Standard Notation:

  1. 3.56 E -4

    \(3.56 \times 10^{-4}=.000356\)

  2. 7.83 E 5

    \(7.83 \times 10^{5}=783000\)
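Calculator-style E notation is also how many programming languages write very large or very small numbers, so the two practice items can be checked with a short script. The helper `describe` is a hypothetical function written for this sketch. It assumes three significant figures and values that need no more than six decimal places.

```python
def describe(e_notation):
    """Return (exponential notation, standard notation) for an E-notation string."""
    number = float(e_notation)            # Python parses "3.56E-4" directly
    sci = f"{number:.2e}"                 # e.g. "3.56e-04"
    mantissa, exponent = sci.split("e")
    exponential = f"{mantissa} x 10^{int(exponent)}"
    # Fixed-point form with trailing zeros (and a bare decimal point) removed.
    standard = f"{number:f}".rstrip("0").rstrip(".")
    return exponential, standard

print(describe("3.56E-4"))
print(describe("7.83E5"))
```

A negative exponent moves the decimal point to the left and a positive exponent moves it to the right, matching the hand conversions above.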