Hypothesis testing

One sample Z hypothesis test for Proportion

Null hypothesis H₀: already known, established, default, status quo, old, pre-existing, current practice, well-known, working assumption, nothing new, boring. The (generic) parameter φ equals some number a; there is no difference.
Alternative hypothesis H_A: new, exciting, hoped/wished, changed, different, research, challenger, the conjecture. Either the parameter φ<a, or φ>a, or φ≠a; there is a difference, there is an effect.
Test if the sample (i.e. its statistic and its size, n) provides enough evidence to overthrow ("warrant rejection of") the null hypothesis. Is the sample statistic extreme enough?
Either "reject" or "fail to reject" the null hypothesis; never "accept" it. Rejecting it ≡ "supporting" the alternative.
The alternative hypothesis is neither rejected nor accepted.
Nothing is ever "proven". (would need entire population to prove anything)

1-PropZTest for proportion p. Uses #yeses or p̂, p, and n. Test statistic is z.
Binary nominal data. Normal distribution is approximating a Binomial distribution.

The test statistic is a measure of discrepancy between a sample statistic and the H₀ claimed value of the population parameter.

Given null hypothesis H₀: parameter p = a
Choose one:
H_A: parameter p < a	"H_A < H₀"	Left-tailed
H_A: parameter p > a	"H_A > H₀"	Right-tailed
H_A: parameter p ≠ a	"H_A ≠ H₀"	Two-tailed

Uses #yeses or p̂, p, n
p:
p̂: OR #Yeses:
n:

np: nq: (q=1−p) Both should be ≥5
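
The np and nq check, and the statement that the Normal distribution is approximating a Binomial one, can be sketched in Python. This is only an illustration under stated assumptions: the function names (normal_approx_ok, exact_right_tail, normal_right_tail) are made up for this sketch, q is taken to be 1 − p, and the figures p = .5 with 482 yeses out of n = 926 are borrowed from the first example in these notes.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def normal_approx_ok(p, n, minimum=5):
    """Rule of thumb from the notes: both np and nq should be at least 5."""
    q = 1 - p
    return n * p >= minimum and n * q >= minimum

def exact_right_tail(yeses, n, p):
    """Exact Binomial P(X >= yeses) for 0 < p < 1, summed on the log scale."""
    q = 1 - p

    def log_pmf(k):
        # log of C(n, k) * p^k * q^(n-k); lgamma avoids huge factorials
        return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(q))

    return sum(exp(log_pmf(k)) for k in range(yeses, n + 1))

def normal_right_tail(yeses, n, p):
    """Right-tail area from the Normal approximation (no continuity correction)."""
    z = (yeses / n - p) / sqrt(p * (1 - p) / n)
    return 1 - NormalDist().cdf(z)

p, yeses, n = 0.5, 482, 926
print(normal_approx_ok(p, n))            # np = nq = 463, so True
print(normal_right_tail(yeses, n, p))    # Normal approximation
print(exact_right_tail(yeses, n, p))     # exact Binomial, for comparison
```

With np and nq this large the two tail areas should land close together; when either product drops below 5 the approximation is not to be trusted.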

Skip: Power: specific value of p: α: = p̂:

z=(p̂−p)/Standard error:		Standard error=√(pq)/√n:
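
As a worked illustration of the z and Standard error fields, here is a minimal Python sketch. It assumes the usual one-proportion formula z = (p̂ − p)/SE with q = 1 − p; the function name one_prop_z is hypothetical, and the inputs (482 yeses out of 926, claimed p = .5) come from the first example in these notes.

```python
from math import sqrt

def one_prop_z(yeses, n, p0):
    """Return (p_hat, standard_error, z) for a one-sample proportion z test."""
    p_hat = yeses / n
    q0 = 1 - p0
    # The standard error uses the H0 value p0, not the sample value p_hat.
    standard_error = sqrt(p0 * q0) / sqrt(n)
    z = (p_hat - p0) / standard_error
    return p_hat, standard_error, z

p_hat, se, z = one_prop_z(yeses=482, n=926, p0=0.5)
print(f"p_hat={p_hat:.4f}  SE={se:.4f}  z={z:.3f}")
# p_hat=0.5205  SE=0.0164  z=1.249
```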

Critical value:
One-tailed: α=0.10:±1.282 α=0.05:±1.645 α=0.01:±2.326
Two-tailed: α=0.10:±1.645 α=0.05:±1.960 α=0.01:±2.576
If Left-tailed and z≤-CritValue then Reject H₀ at that α level.
If Right-tailed and z≥CritValue then Reject H₀ at that α level.
If Two-tailed and |z|≥CritValue then Reject H₀ at that α level.
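
The critical values and the rejection rules can be reproduced with the standard library. This is a sketch, not the calculator's own routine: critical_value and reject_h0 are hypothetical names, the tail labels "left", "right" and "two" are a convention chosen here, and the Two-tailed rule used is the standard one, |z| ≥ critical value with α split between both tails.

```python
from statistics import NormalDist

def critical_value(alpha, tail):
    """Positive critical z for tail in {'left', 'right', 'two'}."""
    if tail not in ("left", "right", "two"):
        raise ValueError("tail must be 'left', 'right' or 'two'")
    tail_area = alpha / 2 if tail == "two" else alpha
    return NormalDist().inv_cdf(1 - tail_area)

def reject_h0(z, alpha, tail):
    """Apply the critical-value decision rule for the chosen tail."""
    crit = critical_value(alpha, tail)
    if tail == "left":
        return z <= -crit
    if tail == "right":
        return z >= crit
    return abs(z) >= crit   # two-tailed

for tail in ("right", "two"):
    row = "  ".join(f"alpha={a}: {critical_value(a, tail):.3f}"
                    for a in (0.10, 0.05, 0.01))
    print(f"{tail:>5}-tailed  {row}")
```

Note that a z of 1.7 would be rejected Right-tailed at α=0.05 but not Two-tailed, since the Two-tailed cutoff is further out.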

p-value (from CDF(z), by tail): if p-value < α, reject H₀

Chance that the test statistic would be as extreme or more extreme if H₀ were true.
"If the p is low, the null must go."
Typically the level of significance α (the area of the critical/rejection region) is chosen to be .05 or .01, so if the p-value is less than α reject H₀; if the p-value is not less than α don't reject H₀ ("fail to reject").
Probability (area) in a tail (or two) of the test statistic's PDF curve.
If the p-value is high (bigger than α), can't reject H₀.
Selecting the Two-tailed case doubles the smaller of the two One-tailed p-values.
Proportion One-tailed tests are "symmetric".
Tip: if the p-value is like .9, check that you selected the appropriate "tail" above before failing to reject.
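
How the p-value depends on the chosen tail can be sketched as follows. The function name p_value and the tail labels are assumptions of this sketch; the z value is recomputed from the first example in these notes (482 yeses out of 926, claimed p = .5).

```python
from math import sqrt
from statistics import NormalDist

def p_value(z, tail):
    """Tail area of the standard Normal curve at or beyond z."""
    cdf = NormalDist().cdf(z)
    if tail == "left":
        return cdf
    if tail == "right":
        return 1 - cdf
    if tail == "two":
        return 2 * min(cdf, 1 - cdf)   # double the smaller one-tailed area
    raise ValueError("tail must be 'left', 'right' or 'two'")

z = (482 / 926 - 0.5) / sqrt(0.5 * 0.5 / 926)
for tail in ("right", "left", "two"):
    print(f"{tail:>5}: {p_value(z, tail):.3f}")
# right: 0.106
#  left: 0.894
#   two: 0.212
```

The Left-tailed value near .9 for a sample with p̂ above the claimed p is the situation the tip describes: the wrong tail was selected for the question being asked.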

Exs.

p=.5   482 / 926 = p̂ ≈52%  right-tailed

Mendel peas(?)
p=.25     152 / 580 = p̂ ≈.262  

sleepwalking
p=.3   p̂=.292   n=19136    
With very large sample a very small difference between p̂ and claimed p can be "significant".
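
A rough Python sketch of that sample-size effect, keeping p = .3 and p̂ = .292 fixed and varying n. The notes do not say which tail the sleepwalking example uses, so a Two-tailed test is assumed here; the smaller sample sizes 500 and 5000 are hypothetical, added only for contrast, and z_and_two_tailed_p is a made-up name.

```python
from math import sqrt
from statistics import NormalDist

def z_and_two_tailed_p(p_hat, n, p0):
    """z statistic and two-tailed p-value for a one-sample proportion test."""
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_val = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_val

# 19136 is the sample size from the notes; 500 and 5000 are hypothetical.
for n in (500, 5000, 19136):
    z, p_val = z_and_two_tailed_p(p_hat=0.292, n=n, p0=0.3)
    print(f"n={n:>5}  z={z:+.2f}  p-value={p_val:.3f}")
```

The same difference of 0.008 gives a p-value well above .05 at the hypothetical smaller sizes and below .05 at n = 19136, because the standard error shrinks with √n.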

biometric security
p=.5     270 / 510 = p̂ ≈.5294118

malpractice
p=.5     856 / 1228 = p̂ ≈.697068

NB. p-hacking: great pressures (professional, monetary, publication bias, ideological) to have a positive result.
So cheating and lying by:
stopping data collection when p-value≤.05
discarding data that prevents p-value≤.05
repeating the experiment until getting p-value≤.05
testing for different effects until finding one with p-value≤.05
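
Why the first of these tricks is cheating can be shown with a small simulation. This is a sketch under assumptions not found in the notes: H₀ (p = .5) is made true by construction, the Two-tailed test is used, data arrive one observation at a time up to a planned n of 200, and the p-hacker peeks after every 10 observations. All names, the seed and the trial count are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_tailed_p(yeses, n, p0=0.5):
    """Two-tailed p-value of the one-proportion z test."""
    z = (yeses / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def false_positive_rates(trials=2000, max_n=200, batch=10, alpha=0.05, seed=1):
    """Simulate data for which H0 (p = 0.5) is TRUE and count how often each
    strategy still ends up with p-value <= alpha."""
    rng = random.Random(seed)
    honest_hits = 0    # test once, at the planned sample size
    peeking_hits = 0   # test after every batch, stop at the first "success"
    for _ in range(trials):
        yeses = 0
        peeked_significant = False
        for n in range(1, max_n + 1):
            yeses += rng.random() < 0.5          # one fair yes/no observation
            if n % batch == 0 and two_tailed_p(yeses, n) <= alpha:
                peeked_significant = True        # the p-hacker would stop here
        honest_hits += two_tailed_p(yeses, max_n) <= alpha
        peeking_hits += peeked_significant
    return honest_hits / trials, peeking_hits / trials

honest, peeking = false_positive_rates()
print(f"single planned test: {honest:.3f}   stop-when-significant: {peeking:.3f}")
```

The single planned test should reject a true H₀ at roughly the advertised α, while stopping at the first p-value≤.05 rejects it far more often, even though there is no effect at all.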

NB. Also possible to have:
H₀: φ≤a and H_A: φ>a
H₀: φ≥a and H_A: φ<a
