Ch. 3 Discrete Distributions

Sections covered: 3.1 - 3.6

Skip: 3.6 “The Poisson Distribution as a Limit” (pp. 132-33) and “The Poisson Process” (pp. 134-135)

There is no direct way in R to calculate the mean and variance of a distribution given its pmf. However we can use R to make the calculations simpler.

3.3 Expected value

x <- 1:5
x
## [1] 1 2 3 4 5
px <- c(.1, .15, .2, .25, .3)
px
## [1] 0.10 0.15 0.20 0.25 0.30
x*px
## [1] 0.1 0.3 0.6 1.0 1.5
sum(x*px)    # E(X)
## [1] 3.5

3.3 Variance

x - 3.5
## [1] -2.5 -1.5 -0.5  0.5  1.5
(x - 3.5)^2
## [1] 6.25 2.25 0.25 0.25 2.25
((x - 3.5)^2)*px
## [1] 0.6250 0.3375 0.0500 0.0625 0.6750
sum(((x - 3.5)^2)*px)   # V(X)
## [1] 1.75

3.3 Variance (alternative method)

x
## [1] 1 2 3 4 5
px
## [1] 0.10 0.15 0.20 0.25 0.30
x^2
## [1]  1  4  9 16 25
(x^2)*px
## [1] 0.1 0.6 1.8 4.0 7.5
sum((x^2)*px)  # E(X^2)
## [1] 14
14-3.5^2     # E(X^2) - [E(X)]^2
## [1] 1.75

3.4 Binominal Theorem

p. 121 Using binomial tables for the cumulative distribution function (cdf) is optional; you may use R (see below) or www.stattrek.com instead.

For tests you will not need to calculate these values. You can leave your answers as summations, for example:

\(\Sigma_{x=0}^6 \left(\begin{array}{c}12\\ x\end{array}\right)(.3^x)(.7^{12-x})\)

R

Probability mass function (pmf)

\(P(X = x)\)

choose(8, 3)    # "8 choose 3"
## [1] 56
56*.6^3*.4^5     # P(X = 3) given n = 8, p = .6
## [1] 0.123863

Direct method

dbinom(3, 8, .6)     # P(X = 3) given n = 8, p = .6
## [1] 0.123863

Cumulative distribution function (cdf)

\(P(X \leq x)\)

Find \(P(2 \leq X \leq 4)\) given \(p = .7, n = 15\)

pbinom(4, 15, .7) - pbinom(1, 15, .7)
## [1] 0.0006717175

(Using the pmf instead:)

dbinom(2, 15, .7) + dbinom(3, 15, .7) + dbinom(4, 15, .7)
## [1] 0.0006717175

Graphing a binomial pmf:

ex. \(p = .7, n = 10\)

px <- dbinom(0:10, 10, .7)
# adding las = 1 turns the y-axis tick mark labels horizontal, which are easier to read
barplot(px, names.arg = 0:10, las = 1, col = "lightblue")

3.5 Hypergeometric

Note that the notation that R uses is different from the Devore textbook:

parameter Devore R
total successes M m
total failures N-M n
sample size n k
successes in sample x x

Example (p. 127)

Devore: h(x; n, M, N)

P(X = 2) = h(2; 10, 5, 25) –>

dhyper(x = 2, m = 5, n = 20, k = 10)
## [1] 0.3853755

3.6 Poisson

Example (p. 132)

p(3;2) =

dpois(3,2)
## [1] 0.180447

F(3;2) =

ppois(3, 2)
## [1] 0.8571235

Practice Exercises

  1. (Binomial) Suppose the probability of a car accident involving a single vehicle is .7. If 15 accidents are randomly selected, what is the probability that between 2 and 4, inclusive, involve a single vehicle? (from slides)
dbinom(2, 15, .7) + dbinom(3, 15, .7) + dbinom(4, 15, .7)
## [1] 0.0006717175

or

pbinom(4, 15, .7) - pbinom(1, 15, .7)
## [1] 0.0006717175

  1. (Binomial) A particular telephone number is used to receive both voice calls and fax messages. Suppose that 25% of the incoming calls involve fax messages, and consider a sample of 25 incoming calls. What is the probability that

    1. At most 6 of the calls involve a fax message?

    2. Exactly 6 of the calls involve a fax message?

    3. At least 6 of the calls involve a fax message?

    4. What is the expected number of calls among the 25 that involve a fax message?

    5. What is the standard deviation of the number among the 25 calls that involve a fax message?

    6. What is the probability that the number of calls among the 25 that involve a fax transmission exceeds the expected number by more than 2 standard deviations? (Textbook 50-51)

pbinom(6, 25, .25)
## [1] 0.5610981
dbinom(6, 25, .25)
## [1] 0.1828195
1 - pbinom(5, 25, .25)
## [1] 0.6217215
E <- 25*.25
Var <- 25*.25*.75
pbinom(floor(E+2*sqrt(Var)), 25, .25)
## [1] 0.9703301

  1. (Hypergeometric) A geologist has collected 8 specimens of basaltic rock and 12 specimens of granite. The geologist instructs a laboratory assistant to randomly select 15 of the specimens for analysis.

    1. What is the probability that all specimens of one of the two types of rock are selected for analysis?

    2. What is the probability that the number of granite specimens selected for analysis is within 1 standard deviation of its mean value? (Textbook 71)

dhyper(8, 8, 12, 15) + dhyper(3, 8, 12, 15)
## [1] 0.05469556
mean_g <- 15*12/20
sd_g <- sqrt(20*(12/20)*(8/20))
pbinom(mean_g + sd_g, 15, 12/20)-pbinom(mean_g - sd_g, 15, 12/20)
## [1] 0.8144507

  1. (Negative Binomial) The probability that a randomly selected box of a certain type of cereal has a particular prize is .2. Suppose you purchase box after box until you have obtained two of these prizes.

    1. What is the probability that you purchase \(x\) boxes that do not have the desired prize?

    2. What is the probability that you purchase four boxes?

    3. What is the probability that you purchase at most four boxes?

    4. How many boxes without the desired prize do you expect to purchase? How many boxes do you expect to purchase? (Textbook 75, homework)

Let x be the number of boxes without prizes to be purchased, \(p = {x+2-1\choose1}(.2^2)(.8 ^x)\)

dnbinom(2, 2, .2)
## [1] 0.0768
dnbinom(2, 2, .2) + dnbinom(1, 2, .2) + dnbinom(0, 2, .2)
## [1] 0.1808
2*.8/.2
## [1] 8
# 8 boxes without desired prize, 10 boxes in total

  1. (Poisson) Suppose that the number of drivers who travel between a particular origin and destination during a designated time period has a Poisson distribution with parameter \(\mu = 20\). What is the probability that the number of drivers will

    1. Be at most 10?

    2. Be 10?

    3. Be within 2 standard deviations of the mean value?

    (Textbook 81)

ppois(10, 20) # a
## [1] 0.01081172
dpois(10, 20) # b1
## [1] 0.005816307
((exp(1)^-20)*(20^10))/factorial(10) # b2
## [1] 0.005816307
ppois(20 + 2*sqrt(20),20) - ppois(20 - 2*sqrt(20), 20) # c
## [1] 0.9442797