Wed Mar 06 2024

False Discovery Rate (FDR) Control

FDR Introduction

The relationship between the true state of a hypothesis (null vs. non-null) and the test outcome (reject vs. not reject) can be summarized in the following table:

|                 | Null is True  | Null is False  |
|-----------------|---------------|----------------|
| Reject Null     | Type I Error  | True Positive  |
| Not Reject Null | True Negative | Type II Error  |

From this perspective, a false discovery looks like a Type I error; the difference lies only in the denominator.

$$P_{I} = \frac{\#FP}{N} = \alpha$$

$$FDR = E\left\{\frac{\#FP}{\#FP + \#TP}\right\} = \frac{\alpha N}{\alpha N + \#TP} = \alpha \cdot \frac{1}{\alpha + \frac{\#TP}{N}}$$

However, consider the following situation: in multiple testing with $\alpha = 0.05$, suppose that among 100 features exactly one is truly significant, while each of the remaining 99 non-significant features has a 5% chance of producing a false positive.

$$FDR = E\left\{\frac{\#FP}{\#FP + \#TP}\right\} = \frac{99 \times 0.05}{1 + 99 \times 0.05} = 83.19\%$$

This example shows that the per-test significance level alone cannot suppress the FDR, especially when the number of true positives is very low. Such situations are not uncommon in biostatistics and other high-dimensional (sparse) data.
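The plug-in calculation above can be checked with a quick simulation. This is a minimal sketch; the trial count and seed are arbitrary choices:

```python
import numpy as np

# Reproduce the 100-feature example: 99 true nulls tested at alpha = 0.05,
# plus 1 truly significant feature assumed to always be detected (#TP = 1).
rng = np.random.default_rng(0)
alpha, n_null, n_trials = 0.05, 99, 50_000

# Each trial draws 99 null p-values ~ U(0, 1) and counts false positives.
fp = (rng.uniform(size=(n_trials, n_null)) <= alpha).sum(axis=1)

mean_fp = fp.mean()            # close to 99 * 0.05 = 4.95
fdr = mean_fp / (mean_fp + 1)  # the plug-in ratio from the text
print(f"E[#FP] = {mean_fp:.2f}, FDR = {fdr:.2%}")
```

The printed FDR lands near the 83.19% computed above, even though every individual test kept its 5% Type I error rate.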

Benjamini-Hochberg Procedure (BH(q))

The most classic method for controlling the FDR is the Benjamini-Hochberg procedure. Its basic idea is: instead of a fixed significance level $\alpha$ (e.g., 0.05), use a threshold that is a function of the rank $i$ of each sorted p-value and a fixed constant $q$, so as to control the overall false discovery rate. The specific operations are as follows: sort the p-values in ascending order as $p_{(1)} \leq \dots \leq p_{(N)}$, find the largest $i$ such that $p_{(i)} \leq \frac{i}{N}q$, and reject the hypotheses corresponding to $p_{(1)}, \dots, p_{(i)}$.

It requires the $p_i$ to be independent. In this case, the expected FDR has an upper bound of $q$:

$$E\{FDR_{BH(q)}\} = \pi_0 q \leq q, \qquad \pi_0 = \frac{N_0}{N}$$
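The step-up rule can be sketched in a few lines of Python. The function name and interface here are my own; only numpy is assumed:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of hypotheses rejected by BH(q)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                     # ranks of the sorted p-values
    thresholds = q * np.arange(1, n + 1) / n  # (i / N) * q for rank i
    below = p[order] <= thresholds
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()        # largest rank meeting its bound
        reject[order[: k + 1]] = True         # reject ALL p-values up to rank k
    return reject
```

The step-up nature matters: in `benjamini_hochberg([0.01, 0.012, 0.035, 0.04, 0.6], q=0.05)` the value 0.035 exceeds its own threshold 0.03 yet is still rejected, because rank 4 satisfies $0.04 \leq 0.04$.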

Due to the similarity in definitions, FDR is often compared with FWER. The difference is that FWER is defined over all hypothesis tests, while FDR is defined over the rejected ones. FWER tends to be more conservative, as it attempts to control all errors at once, which may reduce power ($\frac{\#TP}{N_1}$, the ability to discover true significance); the FDR approach is more popular for high-dimensional data because it offers a better balance between controlling error rates and maintaining high statistical power.

The requirement that the $p_i$ be independent can be relaxed. When the $p_i$ are dependent (for example, negatively correlated), the Benjamini-Hochberg procedure remains valid after introducing a correction term:

$$l_N = \sum_{j=1}^{N} \frac{1}{j}, \qquad p_{(i)} \leq \frac{i}{N}q \times \frac{1}{l_N}$$

Proof of the Upper Bound of FDR Expectation under BH(q)

Algebra

$$R(t) = \#\{p_{(i)} \leq t\}, \quad t \in (0, 1]$$

Define $a(t)$ as the number of tests for which $p_i \leq t$ but $H_0$ is actually true. Then:

$$Fdp(t) = \frac{a(t)}{R(t)} \quad \left[=0 \text{ if } R(t)=0\right]$$

$$Q(t) = \frac{Nt}{R(t)} \quad \left[=0 \text{ if } R(t)=0\right]$$

Let $t_q = \sup_t\{Q(t) \leq q\}$; then $Q(t_q) = q$.

$$\because R(p_{(i)}) = i, \quad \therefore Q(p_{(i)}) = \frac{N p_{(i)}}{i}$$

$$\therefore p_{(i)} \leq \frac{i}{N}q \iff Q(p_{(i)}) \leq q \iff p_{(i)} \leq t_q$$

Therefore, the rejection region for $H_{0(i)}$ is $p_{(i)} \leq t_q$, giving:

$$a(t_q) = \#\{p_{(i)} \leq t_q \mid H_{0(i)} \text{ is true}\}$$

Let $A(t) = \frac{a(t)}{t}$.

Since each null p-value is uniform on $(0, 1)$, conditional on $p_i \leq t$ it is uniform on $(0, t)$, so $A(t)$ is a martingale as $t$ decreases from 1 toward 0:

$$\forall s \leq t, \quad E\{A(s) \mid A(t)\} = A(t)$$

And $a(1) = N_0$ is known; thus, by optional stopping at $t_q$:

$$E\{A(t_q)\} = A(1) = \frac{a(1)}{1} = N_0$$

Therefore:

$$Fdp(t_q) = \frac{a(t_q)}{R(t_q)} = \frac{Q(t_q)}{N t_q} \cdot a(t_q) = \frac{q}{N} \cdot A(t_q)$$

$$E\{Fdp(t_q)\} = \frac{N_0}{N} q \leq q$$
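This bound (in fact, the equality $E\{Fdp\} = \pi_0 q$ under independence) can be verified numerically. The sketch below applies the BH(q) cutoffs directly; the choices of $N$, $N_0$, $q$, and the non-null p-value distribution are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n0, q, n_trials = 100, 80, 0.1, 5_000   # 80 true nulls, 20 non-nulls

fdp = np.empty(n_trials)
for t in range(n_trials):
    # Null p-values are U(0, 1); non-null p-values are pushed toward 0.
    p = np.concatenate([rng.uniform(size=n0),
                        rng.beta(0.1, 10.0, size=n - n0)])
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, n + 1) / n
    if below.any():
        k = np.nonzero(below)[0].max()
        rejected = order[: k + 1]
        fdp[t] = np.mean(rejected < n0)    # fraction of rejections that are null
    else:
        fdp[t] = 0.0

print(f"mean Fdp = {fdp.mean():.4f}  (pi0 * q = {n0 / n * q:.4f})")
```

The average realized Fdp comes out near $\pi_0 q = 0.8 \times 0.1 = 0.08$, comfortably below $q = 0.1$.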

Visual and Algebraic Integration

The figure in the book suggests an intuitive proof: not strictly rigorous, but hard to fault.

BHq-proof.png

Since the threshold line has constant slope, we have:

$$slope = \frac{q}{N} = \frac{h}{i_{max}}$$

For independently distributed $p_i \sim U(0, 1)$ among the $N_0$ true nulls, the probability that each such $p_i$ falls below $h$, and is therefore declared positive, is $h$, so:

$$E(\#FP) = N_0 h, \qquad \#TP + \#FP = i_{max}$$

$$\therefore E(Fdp) = \frac{E(\#FP)}{\#TP + \#FP} = \frac{N_0 h}{i_{max}} = \frac{N_0}{N} q$$

Questions

The proof gives a rough feel for why the BH method can control the FDR, but some questions about the details of its criteria remain.

