3.2 Random Variables
[TOC]
Random variable
Definition of a random variable
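A random variable is a function of $\omega$ (i.e., an outcome in the sample space $\Omega$) that returns a number $x$.
A random variable can be viewed either as a variable or as a function. In probability theory, a random variable is usually defined as a mapping from the outcomes in the sample space to values on the real line, so under this definition it is a function: $X$ maps each outcome $\omega \in \Omega$ to a number $x$. In applications, however, it is often convenient to treat a random variable as a variable that randomly takes different values.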
- E.g., let $X$ be the random variable defined by the roll of a fair die, and let $x$ denote the result of a single roll. The probability that the random variable equals five can be written as:
  $P(X=x)$ when $x=5$, or $P(X=5)$
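The die example can be sketched in Python to make the "random variable as a function" view concrete. This is only an illustration: the names `sample_space`, `X`, and `prob` are assumptions introduced here, and the outcomes are labelled with words to keep them distinct from the numbers that $X$ returns.

```python
from fractions import Fraction

# Sample space of a fair six-sided die: each outcome is a face label.
sample_space = ["one", "two", "three", "four", "five", "six"]

# The random variable X is a function (here a lookup table) from outcomes to numbers.
X = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

def prob(x):
    """P(X = x) when all outcomes in the sample space are equally likely."""
    favourable = [w for w in sample_space if X[w] == x]
    return Fraction(len(favourable), len(sample_space))

p_five = prob(5)  # Fraction(1, 6)
```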
Two classes of random variables
● Discrete random variable
- Assigns probabilities to a set of distinct values; the set can be finite or countably infinite.
- When the set of values is infinite, it must be countable.
● Continuous random variable
- Produces values from an uncountable set.
- A sample space that can be subdivided without limit (e.g., an interval of real numbers) is uncountable.
Discrete random variables
Probability mass function (PMF)
- The function returns the probability that a random variable takes a certain value.
$$
f_X(x)=P(X=x)
$$
- The value returned by the PMF must be non-negative.
- The sum of the PMF across all values in the support of the random variable must be one.
Cumulative distribution function (CDF)
- Measures the total probability of observing a value less than or equal to the input $x$.
- $F_X(x)=P(X \leq x)$
- $F_X(x)$ is a non-decreasing function: if $x_2>x_1$, then $F_X(x_2) \geq F_X(x_1)$.
- $P(X>k)=1-F_X(k)$
- $P\left(x_1<X \leq x_2\right)=F_X\left(x_2\right)-F_X\left(x_1\right)$
The relationship between PMF and CDF
- The CDF can always be expressed as the sum of the PMF over all values in the support that are less than or equal to $x$: $F_X(x)=\sum_{t \leq x} f_X(t)$
Suppose $X$ is a random variable defined by the roll of a fair die and $x$ is the result of a single roll. Express the probability mass function and the cumulative distribution function of $X$.
Correct Answer:
- The PMF of $X$ can be expressed as a list of values: $f_X(x)=\frac{1}{6}$ for $x \in\{1,2,3,4,5,6\}$, and $f_X(x)=0$ otherwise.
- The counterpart of the PMF is the cumulative distribution function: $F_X(x)=\frac{\lfloor x\rfloor}{6}$ for $1 \leq x \leq 6$, with $F_X(x)=0$ for $x<1$ and $F_X(x)=1$ for $x>6$.
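The fair-die answer can be checked with a short Python sketch. The helper names `pmf` and `cdf` are illustrative assumptions; the CDF is built directly from the relationship $F_X(x)=\sum_{t \leq x} f_X(t)$.

```python
from fractions import Fraction

support = range(1, 7)  # possible results of one roll

def pmf(x):
    """f_X(x) = P(X = x) for a fair six-sided die."""
    return Fraction(1, 6) if x in support else Fraction(0)

def cdf(x):
    """F_X(x) = sum of the PMF over all support values t <= x."""
    return sum((pmf(t) for t in support if t <= x), Fraction(0))

pmf_table = {x: pmf(x) for x in support}
cdf_table = {x: cdf(x) for x in support}
```

With these helpers, `1 - cdf(4)` gives $P(X>4)$ and `cdf(5) - cdf(2)` gives $P(2<X \leq 5)$, matching the CDF properties listed earlier.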
Expectations and moments
Mathematical expectation of random variable
- The expectation (mean) of a random variable, denoted $E[X]$, is the probability-weighted average of its possible values:
$$
E[X]=\sum_i P\left(X=x_i\right) x_i
$$
Expectation operator
- The expectation operator $E[\cdot]$ computes a probability-weighted average of the possible values of its argument.
- Linear properties of expected value
  - If $b$ is a constant, $E[b]=b$; in particular, $E[E[X]]=E[X]$, because $E[X]$ is itself a constant.
  - If $a$ is a constant, $E[a X]=a E[X]$.
  - If $a, b$ and $c$ are constants, then $E[a X+b Y+c]=a E[X]+b E[Y]+c$.
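As a small numerical check of the definition and of linearity, the sketch below computes $E[X]$ for a fair die and compares $E[aX+c]$ with $aE[X]+c$. The function name `expectation` and the constants `a = 2`, `c = 3` are assumptions chosen for illustration; only the single-variable special case of the linearity rule is checked.

```python
from fractions import Fraction

# PMF of a fair die, stored as {value: probability}.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum_i P(X = x_i) * g(x_i)."""
    return sum(p * g(x) for x, p in pmf.items())

mean = expectation(die)                        # 7/2
a, c = 2, 3
lhs = expectation(die, lambda x: a * x + c)    # E[aX + c], computed from the PMF
rhs = a * mean + c                             # a E[X] + c, using linearity
```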
Moment: As stated previously, moments are a set of commonly used descriptive measures that characterize important features of random variables.
- Two types of moments
  - Central moment: $\mu_k=E\left[(X-E[X])^k\right],(k \geq 2)$
    - The second central moment is defined as the variance of $X$, or $\operatorname{Var}[X]$
  - Non-central moment: $\mu_k^{N C}=E\left[X^k\right],(k \geq 1)$
    - The first non-central moment is defined as the expected value of $X$, or $E[X]$
- Relationship between central and non-central moments
$$
E\left[(X-E[X])^2\right]=E\left[X^2\right]-(E[X])^2=\mu_2^{N C}-\left(\mu_1^{N C}\right)^2
$$
This identity expresses the second central moment in terms of the first two non-central moments.
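The identity between the second central moment and the non-central moments can be verified on the fair die. The sketch below is illustrative only; `noncentral_moment` and `central_moment` are hypothetical helper names that implement the two definitions directly.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

def noncentral_moment(pmf, k):
    """mu_k^NC = E[X^k]."""
    return sum(p * x**k for x, p in pmf.items())

def central_moment(pmf, k):
    """mu_k = E[(X - E[X])^k]."""
    mu = noncentral_moment(pmf, 1)
    return sum(p * (x - mu)**k for x, p in pmf.items())

var_direct = central_moment(die, 2)
var_shortcut = noncentral_moment(die, 2) - noncentral_moment(die, 1)**2
```

Both routes give the same variance, $35/12$, for the fair die.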
The four named moments
- The first moment is the mean: $\mu(X)=E[X]$
  - It measures the central tendency of the data.
- The second central moment is the variance: $\sigma^2(X)=E\left[(X-\mu)^2\right]$
  - $\sigma^2(a X)=E\left[(a X-a \mu)^2\right]=a^2 \sigma^2(X)$
  - The standard deviation is denoted by $\sigma$ and is defined as the square root of the variance (i.e., $\sqrt{\sigma^2}$).
    - It is a more natural measure of dispersion.
    - It is directly comparable to the mean (same unit).
  - The variance measures the dispersion of the data: the more tightly the data cluster around the mean, the smaller the variance.
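- The third (standardized) moment is the skewness:
$$
\operatorname{skew}(X)=\frac{E\left[(X-\mu)^3\right]}{\sigma^3}=E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right]
$$
  - It is the third central moment divided by $\sigma^3$ and measures whether the distribution is symmetric. The distribution is skewed toward the side with the longer tail.
- The fourth (standardized) moment is the kurtosis:
$$
\operatorname{kurtosis}(X)=\frac{E\left[(X-\mu)^4\right]}{\sigma^4}=E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right]
$$
  - It is the fourth central moment divided by $\sigma^4$ and measures the thickness of the tails, which reflects how likely extreme values are.
  - The larger the kurtosis, the fatter the tails. Because kurtosis divides by $\sigma^4$, it compares tail thickness after the distributions have been put on the same variance scale.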
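The skewness and kurtosis formulas can be evaluated directly for a discrete distribution. The sketch below is a rough illustration: the `right_skewed` distribution is a made-up example (not from the notes) with most of its mass at small values and a long right tail, and the helper names are assumptions.

```python
from fractions import Fraction
import math

def central_moment(pmf, k):
    """E[(X - mu)^k] for a PMF stored as {value: probability}."""
    mu = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mu)**k for x, p in pmf.items())

def skewness(pmf):
    """Third central moment divided by sigma^3."""
    sigma = math.sqrt(central_moment(pmf, 2))
    return float(central_moment(pmf, 3)) / sigma**3

def kurtosis(pmf):
    """Fourth central moment divided by sigma^4 (the squared variance)."""
    return central_moment(pmf, 4) / central_moment(pmf, 2)**2

die = {x: Fraction(1, 6) for x in range(1, 7)}
# Hypothetical right-skewed distribution with a long right tail.
right_skewed = {0: Fraction(6, 10), 1: Fraction(3, 10), 10: Fraction(1, 10)}
```

The symmetric die has zero skewness, while the hypothetical long-tailed distribution has positive skewness and a larger kurtosis than the die.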
The effect of changes in four named moments
Standardization of a random variable: When $X$ has mean $\mu$ and variance $\sigma^2$, a standardized version of $X$ can be constructed as
$$
\frac{X-\mu}{\sigma}
$$
- This variable has mean 0 and unit variance (and hence unit standard deviation)
$$
\begin{aligned}
& E\left[\frac{X-\mu}{\sigma}\right]=0 \\
& V\left[\frac{X-\mu}{\sigma}\right]=1
\end{aligned}
$$
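Both results follow from the linearity of expectation and the scaling rule $\sigma^2(aX)=a^2\sigma^2(X)$, together with the fact that subtracting a constant does not change the variance. A short derivation, written out as a sketch:
$$
\begin{aligned}
E\left[\frac{X-\mu}{\sigma}\right] &=\frac{E[X]-\mu}{\sigma}=\frac{\mu-\mu}{\sigma}=0 \\
V\left[\frac{X-\mu}{\sigma}\right] &=\frac{1}{\sigma^2} V[X-\mu]=\frac{1}{\sigma^2} V[X]=\frac{\sigma^2}{\sigma^2}=1
\end{aligned}
$$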
Continuous random variable
Probability density function (PDF) is used instead of PMF
- A PMF cannot be used for a continuous random variable because $P(X=x)=0$ for every single value $x$, even though $x$ can occur.
- The PDF $f_X(x)$ returns a non-negative value for any input in the support of $X$; it is used to find interval probabilities.
  - $P\left(x_1<X<x_2\right)=\int_{x_1}^{x_2} f_X(x) d x$.
  - The total area under the curve $f_X(x)$ is 1.
  - $P\left(x_1<X<x_2\right)$ is the area under the curve between $x_1$ and $x_2$.
  - $P\left(x_1 \leq X \leq x_2\right)=P\left(x_1<X \leq x_2\right)=P\left(x_1<X<x_2\right)$
- The CDF $F_X(x)$ of a continuous random variable is defined in the same way as for a discrete random variable, $F_X(x)=P(X \leq x)$, with the sum over the PMF replaced by an integral over the PDF:
$$
F_X(x)=\int_{-\infty}^x f_X(z) d z
$$
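To illustrate how interval probabilities and the CDF come from integrating a density, the sketch below uses an assumed density $f_X(x)=x/50$ on $[0,10]$ (chosen because its CDF on the support is $x^2/100$) and a simple midpoint rule. The density, the helper `integrate`, and the step count are all assumptions made for this example, not part of the notes.

```python
def pdf(x):
    """Hypothetical density f_X(x) = x/50 on [0, 10], zero elsewhere."""
    return x / 50 if 0 <= x <= 10 else 0.0

def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

def cdf(x):
    """F_X(x): integral of the density from the lower end of the support up to x."""
    upper = min(max(x, 0.0), 10.0)  # the density is zero outside [0, 10]
    return integrate(pdf, 0.0, upper)

total_area = integrate(pdf, 0.0, 10.0)   # area under the whole curve
p_interval = integrate(pdf, 2.0, 6.0)    # P(2 < X < 6)
```

The interval probability agrees with `cdf(6) - cdf(2)` up to numerical error, which mirrors $P(x_1<X \leq x_2)=F_X(x_2)-F_X(x_1)$.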
Quantiles and modes
Inverse cumulative distribution function (inverse CDF)
- If we want to know the probability that a random variable is less than or equal to 3, we can simply calculate $F(3)$;
- If we want to know the value of the random variable that corresponds to a given cumulative probability $p=F(x)$, we can calculate $F^{-1}(p)$, where $F^{-1}$ is called the inverse cumulative distribution function.
Example:
- The CDF is given as follows; find the value of $x$ such that 25% of the distribution is less than or equal to $x$.
$$
F(x)=\frac{x^2}{100} \text { s.t. } 0 \leq x \leq 10
$$
- Correct Answer: solving $p=\frac{x^2}{100}$ for $x$ gives $F^{-1}(p)=10 \sqrt{p}$; if $p=0.25$, then $F^{-1}(0.25)=5$
Quantiles can be used to construct an alternative set of descriptive measures of a random variable.
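- For a continuous or discrete random variable $X$, the $\alpha$-quantile of $X$ is the smallest number $q$ such that $\operatorname{Pr}(X \leq q) \geq \alpha$; for a continuous random variable with a strictly increasing CDF, this is the $q$ that solves $\operatorname{Pr}(X \leq q)=\alpha$.
- When the CDF is invertible, the $\alpha$-quantile is obtained by evaluating the inverse cumulative distribution function at $\alpha$.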
$$
Q_X(\alpha)=F_X^{-1}(\alpha)
$$
Example:
- The CDF is given as follows; find the 25%-quantile of the random variable $X$.
$$
F(x)=\frac{x^2}{100} \text { s.t. } 0 \leq x \leq 10
$$
- Correct Answer: $Q(0.25)=F^{-1}(0.25)=5$
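The quantile calculation for $F(x)=x^2/100$ can be sketched in two ways: with the closed-form inverse $10\sqrt{\alpha}$, and with a bisection search that only assumes the CDF is non-decreasing. The function names and the tolerance are illustrative assumptions; bisection is shown as a fallback for CDFs that have no closed-form inverse.

```python
import math

def cdf(x):
    """F(x) = x^2 / 100 on [0, 10]; 0 below the support and 1 above it."""
    if x < 0:
        return 0.0
    if x > 10:
        return 1.0
    return x * x / 100

def quantile_closed_form(alpha):
    """Q(alpha) = F^{-1}(alpha) = 10 * sqrt(alpha), from solving alpha = x^2 / 100."""
    return 10 * math.sqrt(alpha)

def quantile_bisection(alpha, lo=0.0, hi=10.0, tol=1e-10):
    """Approximate the smallest q in [lo, hi] with F(q) >= alpha by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) >= alpha:
            hi = mid   # mid already reaches alpha, so the quantile is at or below mid
        else:
            lo = mid   # mid falls short of alpha, so the quantile is above mid
    return hi

q25 = quantile_closed_form(0.25)  # 5.0
```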
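Mean, median and mode
- Mean: the average value of the random variable $X$; it is also referred to as the location of the distribution.
  - Very sensitive to large outliers.
- Median: the middle value (for discrete data) or the 50%-quantile (for a continuous random variable) of a random variable.
  - The median may or may not be affected by extreme values.
- Mode: the value of the random variable that occurs most frequently.
  - A random variable may have one or more modes.
  - Extreme values do not affect which value occurs most frequently.
- For a right-skewed distribution, typically mean > median > mode; for a left-skewed distribution the order is reversed. All three are measures of location, and comparing them gives an indication of the sign of the skewness (the third moment).
- For a symmetric, unimodal distribution, mean = median = mode.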
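The ordering of the three location measures under right skew can be illustrated with Python's standard `statistics` module. The data in `sample` are invented for this example (a small sample with one large value in the right tail), so the numbers only show the pattern and are not taken from the notes.

```python
import statistics

# Hypothetical right-skewed sample: one large value in the right tail.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 20]

mean = statistics.mean(sample)
median = statistics.median(sample)
modes = statistics.multimode(sample)   # list of all most-frequent values

# Make the largest value even more extreme and recompute the three measures.
extreme = sample[:-1] + [200]
mean_extreme = statistics.mean(extreme)
median_extreme = statistics.median(extreme)
modes_extreme = statistics.multimode(extreme)
```

In this sample the mean exceeds the median, which exceeds the mode. Replacing the largest value with a far larger one moves the mean substantially while leaving the median and the mode unchanged.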