The Pennsylvania State University, Spring 2021 Stat 415-001, Hyebin Song

Introduction


Introduction to Statistical Inference

Learning objectives

Understand the relationship between probability and statistics
Understand basic terminology in statistics

Statistics

Statistics is a data-driven science concerned with extracting useful information from observed data in a principled way, while accounting for the uncertainty in those data.

Example 1

Goal: understand the average height of all PSU students (about 40K)

Procedure:

Ask 100 PSU students their heights and record the responses
Compute the average of the 100 height values
Claim that this computed average should be close to the average height of all PSU students

[Figure: population and sample]

Some definitions

Term	Definition
Population	the target of our inferential interest
Sample	a randomly selected part of the population (usually the outcomes of $n$ independent trials of a random experiment)
Parameter	a numerical characteristic of the population, e.g., $\mu, \sigma^2, \lambda,\dots$
Statistic	a function $f(X_1,\dots,X_n)$ of the sample $(X_1,\dots,X_n)$ which does not depend on any unknown parameters
Estimator	a statistic used to estimate a parameter $\theta$, denoted $\widehat{\theta} = \widehat{\theta}(X_1,\dots,X_n)$
Estimate	a realization of an estimator, usually denoted using lower-case letters

Example 1 (Continued)

Population: collection of all heights from PSU students
Sample: $(X_1,\dots,X_{100})$, a random vector of size 100
Parameter: $\mu = \frac{1}{40{,}000}(5'8''+6'+\dots)$, the average height of all PSU students
Estimator: $\bar{X} = \frac{1}{100}(X_1+\dots+X_{100})$
Estimate: $\bar{x}= \frac{1}{100}(5'8''+4'11''+\dots)$
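The estimator/estimate distinction can be sketched in Python: the estimator $\bar{X}$ is a function of the sample, and the estimate $\bar{x}$ is the value that function returns on one observed sample. The heights below are hypothetical values in inches made up for illustration (not real PSU data), and only five are used instead of 100 to keep the sketch short.

```python
from statistics import fmean


def sample_mean(heights):
    """Estimator X-bar: depends only on the sample, not on unknown parameters."""
    return fmean(heights)


# Hypothetical observed heights in inches (5'8" = 68, 4'11" = 59, ...);
# illustrative values only, not real survey data.
observed = [68, 59, 72, 65, 70]

# Estimate x-bar: the realized value of the estimator on this particular sample.
x_bar = sample_mean(observed)
print(x_bar)
```

A different random sample of students would give a different $\bar{x}$, which is exactly the uncertainty that statistical inference has to account for.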

Example 2

Goal: understand the probability of getting the face "1" with a biased die

Procedure:

Toss the die 1000 times and record the outcomes.
Compute the proportion of "1"s among the 1000 tosses
Argue that the computed proportion is close to the true probability of outcome "1".

In this example,

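Population: the probabilities of the outcomes 1, ..., 6 of the die
Parameter: $\theta$ = probability of outcome 1
Sample: $(X_1, \dots, X_{1000})$, where $X_i$ is the outcome of the $i$th toss
Estimator of $\theta$: $\widehat{\theta}(X_1,\dots,X_{1000}) = \frac{1}{1000}\sum_{i=1}^{1000}1[X_i=1]$
Estimate of $\theta$: $\widehat{\theta}(x_1,\dots,x_{1000})=\frac{1}{1000}(1+0+1+1 + \dots+1 )$, where each term is the indicator $1[x_i=1]$ for the observed outcome $x_i$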
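The estimator $\widehat{\theta}$ above is just the proportion of tosses equal to 1. A minimal Python sketch, assuming a simulated die: the face probabilities (0.3 for face 1, 0.14 for each other face) are hypothetical values chosen only so the simulation has something to estimate; in practice $\theta$ is unknown.

```python
import random


def theta_hat(outcomes):
    """Estimator of theta: (1/n) * sum of indicators 1[X_i = 1]."""
    return sum(1 for x in outcomes if x == 1) / len(outcomes)


# Hypothetical biased die; these probabilities are assumptions for the
# simulation only and would be unknown in a real experiment.
faces = [1, 2, 3, 4, 5, 6]
probs = [0.3, 0.14, 0.14, 0.14, 0.14, 0.14]

rng = random.Random(415)  # fixed seed so the run is reproducible
tosses = rng.choices(faces, weights=probs, k=1000)  # 1000 i.i.d. tosses

estimate = theta_hat(tosses)
print(estimate)  # typically close to the assumed theta = 0.3
```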

Relationship between probability and statistics

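Probability: the model is known, and we compute the probability of an event.
- Suppose a coin is fair, $P(H) = P(T) = 1/2$. What is the probability of getting $\leq 10$ heads among 20 tosses?
- Model: $X_i \sim \textrm{Ber}(0.5)$, with $(X_1,\dots,X_{20})$ i.i.d.
- Compute $P(\sum_{i=1}^{20}X_i\leq 10)$.
Statistics: the data are observed, and we infer the model.
- Given 20 observed tosses ($x_1=1, x_2= 0, \dots, x_{20}=1$), what is the true probability of head?
- Data: $(x_1,\dots,x_{20})$ with $x_i \in\{0,1\}$.
Infer a model:
- Since $x_i \in\{0,1\}$, assume $X_i \sim \textrm{Ber}(\theta)$, i.i.d.
- Use $(x_1,\dots,x_{20})$ to compute an estimate $\widehat{\theta}$, and conclude that $X_i \sim \textrm{Ber}(\theta)$ with $\theta \approx \widehat{\theta}$.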
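The two directions can be contrasted in a short Python sketch. In the probability direction the model is fully known, so $\sum_{i=1}^{20} X_i \sim \textrm{Binomial}(20, 0.5)$ and the probability is computed exactly. In the statistics direction only data are available; the 20 tosses below are hypothetical values made up for illustration, coded as 1 = head and 0 = tail (an assumed encoding), and $\widehat{\theta}$ is taken to be the sample proportion of heads.

```python
from fractions import Fraction
from math import comb

# Probability: known model X_i ~ Ber(0.5) i.i.d., so sum X_i ~ Binomial(20, 0.5).
# P(sum X_i <= 10) = sum_{k=0}^{10} C(20, k) / 2^20, computed exactly.
n = 20
p_at_most_10 = sum(Fraction(comb(n, k), 2**n) for k in range(11))
print(p_at_most_10, float(p_at_most_10))

# Statistics: theta is unknown; we only observe x_1, ..., x_20.
# Hypothetical observed tosses (1 = head, 0 = tail), illustrative only.
x = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1]
theta_hat = sum(x) / len(x)  # sample proportion of heads
print(theta_hat)
```

The first computation goes from model to data; the second goes from data back to a (parametric) model.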

Parametric and distribution-free models

Parametric models: assume that the distribution of each observation is known up to a parameter.
Examples: $X_i \sim \textrm{Ber}(\theta)$, $X_i \sim N(\mu,\sigma^2)$, ...
Distribution-free models: make no assumption about the distribution of the sample.

Remark 1. For a parametric model, the problem of inferring a model reduces to inferring a parameter value.

Remark 2. In this class, we will mostly focus on parametric models.

Overview of this course

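Given data $(x_1,\dots,x_n)$, where each $x_i$ is a realization from a distribution $P_\theta$ with an unknown parameter $\theta$, we want to infer $\theta$.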

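Topic	Goal
Point Estimation	estimate $\theta$ by a single value based on $(x_1,\dots,x_n)$
Interval Estimation	find a range of plausible values for $\theta$ based on $(x_1,\dots,x_n)$
Hypothesis Testing	decide whether a claim about $\theta$ is supported by $(x_1,\dots,x_n)$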