I've seen many probability concentration bounds in random places but I always struggle to remember any of them. This is a summary of the second chapter of High-Dimensional Statistics by Martin J. Wainwright.
So the purpose of this post is twofold:
Spoiler: many sentences are blatant copies from the aforementioned book, for which I claim zero credit. All the proofs are merely sketches of the key ideas. This is just a sincere attempt to appreciate other people's work.
Note this makes perfect sense: when the variance is small, the variable is more likely to concentrate around its mean.
Proof: define $Y=(X-\mu)^2$ and then apply Markov's inequality.
Extension: more generally, \begin{equation*} \mathbb P (|X-\mu| \geq t) \leq \frac{\mathbb E[|X-\mu|^k]}{t^k} \hspace{0.5cm} \text{for all $t >0$} \label{eq:mb} \tag{3} \end{equation*}
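Here is a quick Monte Carlo sanity check I threw together (not from the book; the Exponential(1) choice and the sample size are mine). It compares the empirical tail with the empirical $k$-th moment bound:

```python
# Sanity check of the moment bound (3) for X ~ Exponential(1). Since both sides
# are computed on the same sample, Markov's argument holds exactly on the
# empirical distribution, so the inequality should hold for every (k, t).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
mu = 1.0  # mean of Exponential(1)

for k in (2, 3, 4):
    for t in (1.0, 2.0, 3.0):
        tail = np.mean(np.abs(x - mu) >= t)            # P(|X - mu| >= t), empirical
        bound = np.mean(np.abs(x - mu) ** k) / t ** k  # E|X - mu|^k / t^k, empirical
        print(f"k={k}, t={t}: tail {tail:.4f} <= bound {bound:.4f}")
```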
The Gaussian tail bound motivates the definition of sub-Gaussian:
Definition: A random variable $X$ with mean $\mu=\mathbb E[X]$ is sub-Gaussian if there is a positive number $\sigma$ such that $$\mathbb E[e^{\lambda(X-\mu)}] \leq e^{\frac{\sigma^2 \lambda^2}{2}} \hspace{0.5cm} \text{for all $\lambda \in \mathbb R$}$$ $\sigma$ is referred to as the sub-Gaussian parameter.
Observation 1: $-X$ is sub-Gaussian iff $X$ is sub-Gaussian (simply by the symmetry of the definition: replace $\lambda$ with $-\lambda$).
Now, by observations 1 and 3, we can derive the concentration inequality for a sub-Gaussian variable: \begin{equation*} \mathbb P [|X -\mu| \geq t] \leq 2e^{\frac{-t^2}{2 \sigma ^2}} \hspace{0.5cm} \text{for all $t \geq 0$} \label{eq:sgci} \tag{6} \end{equation*}
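As a quick sanity check (mine, not the book's): a $N(\mu, \sigma^2)$ variable is sub-Gaussian with parameter $\sigma$, so its empirical two-sided tail should sit below the bound in (6):

```python
# Compare the empirical two-sided tail of X ~ N(mu, sigma^2) with the
# sub-Gaussian concentration bound 2*exp(-t^2 / (2*sigma^2)) from (6).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

for t in (1.0, 2.0, 4.0, 6.0):
    tail = np.mean(np.abs(x - mu) >= t)
    bound = 2 * np.exp(-t**2 / (2 * sigma**2))
    print(f"t={t}: empirical {tail:.5f} <= bound {bound:.5f}")
```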
Let $X$ be zero mean and supported on $[a, b]$. Recall Jensen's inequality: for a convex function $\Phi$ and a probability space $(\Omega, \mu)$, we have $\Phi (\int_{\Omega} f d \mu) \leq \int_{\Omega} (\Phi \circ f) d \mu$.
Let $X'$ be an independent copy of $X$. For any $\lambda \in \mathbb R$,
\begin{align} \mathbb E_X[e^{\lambda X}] &= \mathbb E_{X} [e^{\lambda (X- \mathbb E_{X'}[X'])}] \\ & \leq \mathbb E_X \mathbb E_{X'} [e^{\lambda (X-X')}]\\ & = \mathbb E_{X, X'} [e^{\lambda (X-X')}] \\ & = \mathbb E_{X, X'} [\mathbb E_{\epsilon} [e^{\lambda \epsilon (X-X')}]]\\ & \leq \mathbb E_{X, X'} [e^{\frac{\lambda ^2 (X-X')^2}{2}}] \\ & \leq e^{\frac{\lambda^2 (b-a)^2}{2}} \end{align}Here the first inequality is Jensen's inequality applied to the inner expectation over $X'$, the second equality holds because $X-X'$ is symmetric and hence equal in distribution to $\epsilon(X-X')$ for a Rademacher variable $\epsilon$, and the last two steps use the Rademacher MGF bound and $|X-X'| \leq b-a$. This shows $X$ must be sub-Gaussian with parameter at most $b-a$ (a sharper argument improves this to $\frac{b-a}{2}$).
The idea of introducing $X'$ and the Rademacher variable $\epsilon$ is called the "symmetrization argument".
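To convince myself, here is a small numerical check (my own choice of distribution, not from the book): a centered Bernoulli($p$) variable lives on $[a,b] = [-p, 1-p]$, so $b-a = 1$ and the bound above says its MGF is at most $e^{\lambda^2/2}$:

```python
# MGF of a centered Bernoulli(p) versus the symmetrization bound
# exp(lam^2 * (b - a)^2 / 2) with b - a = 1.
import numpy as np

p = 0.3
b_minus_a = 1.0
for lam in (-3.0, -1.0, -0.5, 0.5, 1.0, 3.0):
    mgf = np.exp(-lam * p) * (1 - p + p * np.exp(lam))  # E[exp(lam * (B - p))]
    bound = np.exp(lam**2 * b_minus_a**2 / 2)
    print(f"lam={lam:+.1f}: MGF {mgf:.4f} <= bound {bound:.4f}")
```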
For a zero-mean random variable $X$, the following conditions are equivalent to the sub-Gaussian definition from Section 5.1:
$\exists c \geq 0$ and $Z \sim N(0, \tau^2)$ such that $$\mathbb P[|X|>s] \leq c \mathbb P [|Z| \geq s] \hspace{0.5cm} \text{for all $s \geq 0$}$$
$\exists \theta \geq 0$ such that $$\mathbb E[|X|^{2k}] \leq \frac{(2k)!}{2^{k}k!} \theta^{2k} \hspace{0.5cm} \text{for all $k = 1,2 , \cdots$}$$
$\exists \sigma \geq 0$ such that $$\mathbb E [e^{\frac{\lambda X^2}{2\sigma^2}}] \leq \frac{1}{\sqrt{1-\lambda}} \hspace{0.5cm} \text{for all $\lambda \in [0,1)$}$$
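As a small aside (my own check, not from the book): the third condition actually holds with equality for a centered Gaussian, which is easy to verify by Monte Carlo (the $\lambda$ values are kept below $1/2$ so the estimator has finite variance):

```python
# For Z ~ N(0, sigma^2), E[exp(lam * Z^2 / (2*sigma^2))] = 1 / sqrt(1 - lam).
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.5
z = rng.normal(0.0, sigma, size=2_000_000)

for lam in (0.1, 0.3, 0.45):
    lhs = np.mean(np.exp(lam * z**2 / (2 * sigma**2)))
    rhs = 1.0 / np.sqrt(1.0 - lam)
    print(f"lam={lam}: Monte Carlo {lhs:.4f} vs 1/sqrt(1-lam) {rhs:.4f}")
```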
Recall the definition of sub-Gaussian requires $\mathbb E[e^{\lambda(X-\mu)}] \leq e^{\frac{\sigma^2 \lambda^2}{2}}$ to hold for all $\lambda$. We can of course relax this requirement, which leads to the more general sub-exponential family.
A random variable $X$ with mean $\mu=\mathbb E[X]$ is sub-exponential if there are non-negative parameters $(v, \alpha)$ such that $$\mathbb E[e^{\lambda(X-\mu)}] \leq e^{\frac{v^2 \lambda^2}{2}} \hspace{0.5cm} \text{for all $|\lambda| < \frac{1}{\alpha}$}$$
One way to verify whether a random variable is sub-exponential is to compute and bound its MGF $\mathbb E [e^{\lambda (X-\mu)}]$ analytically. This may be infeasible in many cases. Another strategy is to bound its polynomial moments. Here comes the Bernstein condition: we say that the Bernstein condition with parameter $b$ holds if $$|\mathbb E [(X-\mu)^k]| \leq \frac{1}{2} k!\sigma^2 b^{k-2} \hspace{0.5cm} \text{for $k= 2, 3, 4, \cdots$}$$
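To see where the geometric series in the next display comes from, expand the MGF in a power series and apply the Bernstein condition term by term (the linear term vanishes since $\mathbb E[X-\mu]=0$, and the $k=2$ term is exactly $\frac{\lambda^2\sigma^2}{2}$): $$\mathbb E [e^{\lambda(X-\mu)}] = 1+\frac{\lambda^2 \sigma^2}{2} + \sum_{k=3}^{\infty} \frac{\lambda^{k}\, \mathbb E[(X-\mu)^{k}]}{k!} \leq 1+\frac{\lambda^2 \sigma^2}{2} \sum_{k=2}^{\infty} (b |\lambda|)^{k-2}$$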
Now for $|\lambda| < \frac{1}{b}$, we can sum the geometric series and obtain $$\mathbb E [e^{\lambda(X-\mu)}] \leq 1+\frac{\lambda^2 \sigma^2 /2}{1-b|\lambda|} \leq e^{\frac{\lambda^2 \sigma^2 /2}{1-b|\lambda|}} \hspace{0.5cm} \text{since $1+x \leq e^{x}$}$$ which implies $$\mathbb E [e^{\lambda(X-\mu)}] \leq e^{\frac{\lambda^2 (\sqrt{2}\sigma)^2}{2}} \hspace{0.5cm} \text{for $|\lambda|<\frac{1}{2b}$}$$ i.e., $X$ is sub-exponential with parameters $(\sqrt{2}\sigma, 2b)$.
Based on Section 5.3, we can state the Bernstein-type bound formally:
Statement: for any random variable satisfying the Bernstein condition, we have \begin{equation*} \mathbb E [e^{\lambda (X -\mu)}] \leq e^{\frac{\lambda^2 \sigma^2/2}{1- b |\lambda|}} \hspace{0.5cm} \text{for $|\lambda| < \frac{1}{b}$} \label{eq:btb} \tag{10} \end{equation*}
and moreover the concentration inequality \begin{equation*} \mathbb P [|X -\mu| \geq t] \leq 2e^{\frac{-t^2}{2 (\sigma^2+bt)}} \hspace{0.5cm} \text{for $t \geq 0$} \label{eq:btc} \tag{11} \end{equation*}
Proof: the first inequality comes directly from Section 5.3. The second inequality follows from the Chernoff bound, in which we set $\lambda= \frac{t}{bt+\sigma^2} \in [0, \frac{1}{b})$.
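For completeness, the algebra behind that substitution (my own working, nothing deep): with $\lambda = \frac{t}{bt+\sigma^2}$ we get $1-b\lambda = \frac{\sigma^2}{bt+\sigma^2}$, hence $$-\lambda t + \frac{\lambda^2 \sigma^2/2}{1-b\lambda} = -\frac{t^2}{bt+\sigma^2} + \frac{t^2}{2(bt+\sigma^2)} = -\frac{t^2}{2(\sigma^2+bt)}$$ which is exactly the exponent in the concentration inequality (11).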
Remark: the Bernstein-type bound is often superior to the sub-Gaussian (Hoeffding-type) bound when controlling a bounded random variable ($|X-\mu| \leq b$). The most straightforward way to control such a variable is to argue that $(X-\mu)$ is sub-Gaussian with parameter $b$, giving the tail bound $2e^{-\frac{t^2}{2b^2}}$. Now observe that when $t$ is small in the inequality above, the variable $X$ behaves like a sub-Gaussian variable with parameter $\sigma$. Since $\sigma^2= \mathbb E (X-\mu)^2 \leq b^2$ (as $X$ is bounded), the Bernstein-type bound is at least as good in this regime, and much better when $\sigma^2 \ll b^2$!
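A tiny numerical illustration of this remark (the numbers are mine): compare the exponent $\frac{t^2}{2(\sigma^2+bt)}$ from (11) with the exponent $\frac{t^2}{2b^2}$ from the sub-Gaussian bound; a larger exponent means a tighter tail bound.

```python
# Compare the exponents of the Bernstein-type bound (11) and the sub-Gaussian
# bound with parameter b, for a variable bounded by b = 1 with small variance.
import numpy as np

b, sigma2 = 1.0, 0.01
for t in (0.05, 0.1, 0.3, 0.9):
    bern = t**2 / (2 * (sigma2 + b * t))
    hoef = t**2 / (2 * b**2)
    print(f"t={t}: Bernstein exponent {bern:.4f} vs sub-Gaussian {hoef:.4f} "
          f"({bern / hoef:.1f}x larger)")
```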
Remark 2: by observation 4 from section 5.1, we have $$\mathbb P [\frac{1}{n} \sum_{k=1}^{n} (X_k -\mu_k) \geq t] \leq e^{-\frac{n^2 t^2}{2v_{*}^2}} \hspace{0.5cm} \text{for $0 \leq t \leq \frac{v_{*}^2}{n\alpha_{*}}$}$$ $$\mathbb P [\frac{1}{n} \sum_{k=1}^{n} (X_k -\mu_k) \geq t] \leq e^{-\frac{nt}{2 \alpha_{*}}} \hspace{0.5cm} \text{for $t > \frac{v_{*}^2}{n\alpha_{*}}$}$$
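Here is a small helper (mine, not from the book) that evaluates this two-regime bound; the example plugs in the chi-squared parameters $(v, \alpha) = (2, 4)$ from observation 3, for which $v_* = 2\sqrt{n}$ and $\alpha_* = 4$.

```python
# Two-regime tail bound for the average of independent sub-exponential
# variables with aggregate parameters (v_star, alpha_star).
import numpy as np

def subexp_average_tail(t: float, n: int, v_star: float, alpha_star: float) -> float:
    """Upper bound on P[(1/n) * sum_k (X_k - mu_k) >= t]."""
    threshold = v_star**2 / (n * alpha_star)
    if t <= threshold:
        return np.exp(-n**2 * t**2 / (2 * v_star**2))  # Gaussian-like regime
    return np.exp(-n * t / (2 * alpha_star))           # exponential-tail regime

n = 100
print(subexp_average_tail(0.1, n, v_star=2 * np.sqrt(n), alpha_star=4.0))
print(subexp_average_tail(2.0, n, v_star=2 * np.sqrt(n), alpha_star=4.0))
```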
Statement: if $X \leq b$ almost surely, then \begin{equation*} \mathbb E [e^{\lambda (X -\mathbb E[X])}] \leq \exp(\frac{\frac{\lambda^2}{2} \mathbb E [X^2]}{1-\frac{b \lambda}{3}}) \hspace{0.5cm} \text{for $\lambda \in [0 , \frac{3}{b})$} \label{eq:obi1} \tag{12} \end{equation*} Consequently, given $n$ independent variables such that $X_i \leq b$ almost surely, we have \begin{equation*} \mathbb P [\sum_{i=1}^{n} (X_i -\mathbb E[X_i]) \geq n \delta] \leq \exp(-\frac{n\delta^2}{2 (\frac{1}{n} \sum_{i=1}^{n} \mathbb E[X_i^2] + \frac{b \delta}{3})}) \label{eq:obil2} \tag{13} \end{equation*}
Remark 1: obviously, if a random variable is bounded below, we can use the bound above to derive a bound on its lower tail (just apply the bound to $-X$).
Remark 2: Prof. Wainwright mentioned that the proof of this is very straightforward, which made me wonder what straightforward really meant.
Proof: Define $h(u) = 2 \frac{e^{u} - u -1}{u^2} = 2\sum_{k=2}^{\infty} \frac{u^{k-2}}{k!}$ (seriously, any proof starting by defining some random function shouldn't be called straightforward). We have $$\mathbb E [e^{\lambda X}] = 1+\lambda \mathbb E[X] + \frac{1}{2} \lambda^2 \mathbb E [X^{2} h(\lambda X)]$$ (ohh, now I kind of get where the function $h$ comes from). Note that for all scalars $x<0$, $x' \in [0, b]$ and $\lambda >0$, we have $$h(\lambda x) \leq h(0) \leq h(\lambda x') \leq h(\lambda b)$$ Since $X \leq b$ almost surely, we also have $\mathbb E[X^{2} h(\lambda X)] \leq \mathbb E[X^2] h(\lambda b)$. It follows that $$\mathbb E [e^{\lambda(X- \mathbb E[X])}] \leq e^{-\lambda \mathbb E [X]}(1+\lambda \mathbb E[X] + \frac{1}{2} \lambda^2 \mathbb E[X^2] h(\lambda b)) \leq \exp(\frac{\lambda^2 \mathbb E[X^2]}{2} h(\lambda b))$$ The first inequality follows from the fact that $X$ is bounded by $b$ and $e^{-\lambda \mathbb E [X]}$ is simply a constant; the second inequality follows from $x +1 \leq e^{x}$, so that the expression in the bracket is upper bounded by $\exp(\lambda \mathbb E[X] + \frac{1}{2} \lambda^2 \mathbb E[X^2] h(\lambda b))$. To finish the proof of the first bound (12), it remains to show that $h(\lambda b) \leq (1-\frac{\lambda b}{3})^{-1}$ for $\lambda b <3$. Note the inequality $k! \geq 2(3^{k-2})$ holds for $k \geq 2$, so we have $$h(\lambda b)= 2\sum_{k=2}^{\infty} \frac{(\lambda b)^{k-2}}{k!} \leq \sum_{k=2}^{\infty} (\frac{\lambda b}{3})^{k-2}= \frac{1}{1-\lambda b/3}$$ where the condition $\frac{\lambda b}{3} \in [0,1)$ guarantees the convergence of the geometric series. To prove the second bound (13), we apply the Chernoff bound together with the inequality above and obtain $$\mathbb P [\sum_{i=1}^{n} (X_i - \mathbb E[X_i]) \geq n\delta] \leq \exp(-\lambda n \delta + \frac{\frac{\lambda^2}{2}\sum_{i=1}^{n} \mathbb E [X_i^2]}{1-\frac{b \lambda}{3}}) \hspace{0.5cm} \text{valid for $b \lambda \in [0,3)$}$$ Substituting $$\lambda = \frac{n \delta}{ \sum_{i=1}^{n} \mathbb E [X_i^2] +\frac{n\delta b}{3}} \in [0, \frac{3}{b})$$ yields the desired bound.
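A quick Monte Carlo sanity check of the one-sided bound (13) (the Uniform$[0,1]$ variables, $n$, and $\delta$ values are my own choices, not the book's):

```python
# Check P[sum_i (X_i - E[X_i]) >= n*delta] against the one-sided Bernstein
# bound (13) for X_i ~ Uniform[0, 1], which satisfies X_i <= b = 1 and has
# E[X_i^2] = 1/3.
import numpy as np

rng = np.random.default_rng(0)
n, b, trials = 50, 1.0, 200_000
ex2 = 1.0 / 3.0

samples = rng.uniform(0.0, 1.0, size=(trials, n))
sums = samples.sum(axis=1) - n * 0.5  # sum of (X_i - E[X_i])

for delta in (0.05, 0.1, 0.2):
    empirical = np.mean(sums >= n * delta)
    bound = np.exp(-n * delta**2 / (2 * (ex2 + b * delta / 3)))
    print(f"delta={delta}: empirical {empirical:.5f} <= bound {bound:.5f}")
```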
Motivation: suppose we want to do dimension reduction from dimension $d$ to dimension $m$. Meanwhile, we hope the mapping can preserve the pairwise distances. What mapping options do we have?
Statement: let $X \in \mathbb R^{m \times d}$ be a random matrix filled with independent $N(0,1)$ entries, and consider the linear mapping $F: \mathbb R^{d} \longrightarrow \mathbb R^{m}$ given by $u \longmapsto Xu/\sqrt{m}$. Given $N$ data points $u^{1}, \cdots, u^{N} \in \mathbb R^{d}$ and some tolerance $\delta \in (0,1)$, the bound $$(1-\delta) \leq \frac{\|F(u^{i})- F(u^{j})\|^2}{\|u^{i}-u^{j}\|^2} \leq (1+\delta) \hspace{0.5cm} \text{for all pairs $u^{i} \neq u^{j}$}$$ holds with high probability (provided $m$ is large enough, as quantified below).
Proof: let $x_i \in \mathbb R^{d}$ denote the $i$-th row of the matrix, and consider some fixed vector $u \neq 0$. Since $x_i$ is a standard normal vector, the variable $\langle x_i, \frac{u}{\|u\|}\rangle$ follows the $N(0,1)$ distribution, and hence the quantity $$Y:=\frac{\|Xu\|^2}{\|u\|^2}=\sum_{i=1}^{m} \langle x_i, u/\|u\|\rangle^2$$ is a chi-squared variable with $m$ degrees of freedom. Now by observation 3 from section 5.1, $Z^2$ is sub-exponential with parameters $(2,4)$ for $Z \sim N(0,1)$. Then by observation 4 from the same section, we know $Y$ is sub-exponential with parameters $(2\sqrt{m}, 4)$. Then by remark 2 from the last section, for any chi-squared variable with $n$ degrees of freedom, we have the tail bound $$\mathbb P[|\frac{1}{n} \sum_{k=1}^{n} Z_k^2 -1| \geq t] \leq 2e^{-nt^2/8} \hspace{0.5cm} \text{for all $t \in (0,1)$}$$ Applying this bound to $Y$ (with $n=m$) yields $$\mathbb P[|\frac{\|Xu\|^2}{m\|u\|^2}-1| \geq \delta] \leq 2e^{-m\delta^2/8} \hspace{0.5cm} \text{for all $\delta \in (0,1)$}$$ Recalling the definition of $F$, the inequality above is equivalent to $$\mathbb P[\frac{\|F(u)\|^2}{\|u\|^2} \not \in [(1-\delta),(1+\delta)]] \leq 2e^{-m \delta^2 /8} \hspace{0.5cm} \text{for any fixed $0 \neq u \in \mathbb R^{d}$}$$ Since we have $N \choose 2$ distinct pairs of data points, the union bound gives $$\mathbb P[\frac{\|F(u^{i})- F(u^{j})\|^2}{\|u^{i}-u^{j}\|^2} \not \in [(1-\delta),(1+\delta)] \text{ for some $u^{i} \neq u^{j}$}] \leq 2 {N\choose 2} e^{-m \delta^2 /8}$$ Observe that for any $\epsilon \in (0,1)$, if we take $m$ sufficiently large, the probability above can be bounded above by $\epsilon$. Specifically, some algebra shows $m >\frac{16}{\delta^2} \log (N/\epsilon)$ suffices.
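To make this concrete, here is a compact sketch (mine) of the random projection in the statement: it projects $N$ random points from dimension $d$ down to $m = \lceil \frac{16}{\delta^2}\log(N/\epsilon) \rceil$ and reports the worst pairwise distortion.

```python
# Random projection F(u) = X u / sqrt(m) applied to N points in R^d, with m
# chosen according to the bound m > (16 / delta^2) * log(N / eps).
import numpy as np

rng = np.random.default_rng(0)
N, d, delta, eps = 50, 1000, 0.5, 0.01
m = int(np.ceil(16 * np.log(N / eps) / delta**2))

U = rng.normal(size=(N, d))      # N arbitrary data points (rows)
X = rng.normal(size=(m, d))      # projection matrix with N(0,1) entries
F = U @ X.T / np.sqrt(m)         # row i is F(u^i)

worst = 0.0
for i in range(N):
    for j in range(i + 1, N):
        ratio = np.sum((F[i] - F[j])**2) / np.sum((U[i] - U[j])**2)
        worst = max(worst, abs(ratio - 1.0))
print(f"m = {m}, worst distortion |ratio - 1| = {worst:.3f} (tolerance delta = {delta})")
```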