The Pennsylvania State University, Spring 2021 Stat 415-001, Hyebin Song

Introduction



Introduction to Statistical Inference

Learning objectives

  1. Understand the relationship between probability and statistics
  2. Understand basic terminology in statistics

Statistics

Statistics is a data-driven science that concerns the extraction of useful information from the observed data in a principled way, accounting for uncertainty in the observed data.

Example 1

Goal: understand the average height of all PSU students (40K)

Procedure:

  1. Ask 100 PSU students their heights and record the responses
  2. Compute the average of the 100 height values
  3. Claim that this computed average should be close to the average height of all PSU Students

[Figure: population and sample]

Some definitions

Term | Definition
--- | ---
Population | the target of our inferential interest
Sample | a random subset of the population (usually obtained from $n$ independent trials of a random experiment)
Parameter | a numerical summary of the population. Denoted by Greek letters (e.g., $\theta$, $\mu$).
Statistic | a numerical summary of the sample $X_1, \dots, X_n$. A function of the sample which does not depend on any unknown parameters.
Estimator | a statistic designed to infer a specific parameter $\theta$. Denoted as $\hat{\theta}$ or $\hat{\theta}(X_1, \dots, X_n)$.
Estimate | a realization of an estimator. Usually denoted using lower-case letters.

 

Example 1 (Continued)

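  1. Population: the collection of heights of all PSU students
  2. Sample: $X_1, \dots, X_{100}$, a random vector of size 100
  3. Parameter of interest: $\mu$, the average height of all PSU students.
  4. Estimator: $\bar{X} = \frac{1}{100} \sum_{i=1}^{100} X_i$.
  5. Estimate: $\bar{x} = \frac{1}{100} \sum_{i=1}^{100} x_i$, the average of the 100 recorded heights.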
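
The distinction between parameter, estimator, and estimate in Example 1 can be sketched in a few lines of Python. This is only an illustration: the population below is simulated (a hypothetical normal distribution with mean 170 cm and standard deviation 10 cm, not real PSU data), and the names `population`, `sample_mean`, `mu`, and `x_bar` are made up for this sketch.

```python
import random
import statistics

rng = random.Random(415)  # fixed seed so the run is reproducible

# Hypothetical population: 40,000 simulated heights in cm (an assumption, not real data).
population = [rng.gauss(170.0, 10.0) for _ in range(40_000)]

# Parameter: the population average. In practice this is unknown;
# we can compute it here only because the population is simulated.
mu = statistics.fmean(population)


def sample_mean(xs):
    """Estimator: a function of the sample only, with no unknown parameters."""
    return sum(xs) / len(xs)


# Sample: 100 students drawn at random (without replacement).
sample = rng.sample(population, 100)

# Estimate: the realized value of the estimator on the observed sample.
x_bar = sample_mean(sample)

print(f"parameter mu   = {mu:.2f}")
print(f"estimate x_bar = {x_bar:.2f}")
```

Drawing 100 out of 40,000 without replacement is not exactly a sequence of independent trials, but with so small a sampling fraction the difference is negligible for this illustration.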

 

Example 2

Goal: understand the probability of face "1" in a biased die

Procedure:

  1. Toss the die 1000 times and record the outcomes.
  2. Compute the proportion of "1" among the 1000 trials
  3. Argue that the computed proportion is close to the true probability of outcome "1".

 

In this example,

  1. population: the probabilities of outcomes 1,...,6 of the die.
  2. parameter of interest: $p$ = probability of outcome 1
  3. sample: a random vector $(X_1, \dots, X_{1000})$ of size 1000, where $X_i$ = ith toss outcome
  4. estimator of $p$: the sample proportion $\hat{p} = \frac{1}{1000} \sum_{i=1}^{1000} \mathbf{1}\{X_i = 1\}$
  5. estimate of $p$: the realized sample proportion
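
As a rough sketch of Example 2, the die can be simulated and the sample proportion computed directly. The face probabilities below are hypothetical (face "1" is given probability 0.25 purely for illustration), and `sample_proportion` is a helper name introduced here.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical biased die: these probabilities are an assumption for illustration.
faces = [1, 2, 3, 4, 5, 6]
probs = [0.25, 0.15, 0.15, 0.15, 0.15, 0.15]


def sample_proportion(tosses, face=1):
    """Estimator: fraction of tosses that landed on `face`."""
    return sum(1 for t in tosses if t == face) / len(tosses)


# Sample: 1000 independent tosses of the die.
tosses = rng.choices(faces, weights=probs, k=1000)

# Estimate: the realized sample proportion of face "1".
p_hat = sample_proportion(tosses)

print(f"estimate of P(face 1): {p_hat:.3f}")
```

With 1000 tosses the standard error of the sample proportion is about 0.014 under these assumed probabilities, so the estimate typically lands within a few hundredths of the true value.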

 

Relationship between probability and statistics

Probability | Statistics
--- | ---
Given model, predict data | Given data, infer model

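  1. Given the probability of heads $p$, what is the probability that the number of heads among 20 tosses equals $k$?

    Model: $X_1, \dots, X_{20} \sim \text{Bernoulli}(p)$, i.i.d.

    Predict data: $P\left(\sum_{i=1}^{20} X_i = k\right) = \binom{20}{k} p^k (1-p)^{20-k}$ for $k = 0, 1, \dots, 20$.

  2. Given the observed tosses $(x_1, \dots, x_n)$, what is the true probability of heads?

    Data: $x_1, \dots, x_n$, where each $x_i \in \{0, 1\}$.

    Infer a model:

    • Since each $x_i \in \{0, 1\}$, assume $X_i \sim \text{Bernoulli}(p)$, i.i.d.
    • Using $x_1, \dots, x_n$, we make a guess $\hat{p}$ about $p$, and argue that $\hat{p}$ is close to $p$.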
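
The two directions can be contrasted in code. The sketch below assumes a fair coin (probability of heads 0.5) and asks about exactly 10 heads for the probability direction, and a hypothetical hidden value of 0.3 for the statistics direction; all of these numbers, and the names `binom_pmf`, `true_p`, and `p_hat`, are illustrative choices rather than values from the notes.

```python
import math
import random


def binom_pmf(k, n, p):
    """P(exactly k heads in n i.i.d. Bernoulli(p) tosses)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability: the model is given (p = 0.5 is an assumption), and we predict data.
prob_10_heads = binom_pmf(10, 20, 0.5)
print(f"P(10 heads in 20 tosses | p = 0.5) = {prob_10_heads:.4f}")

# Statistics: data are given, and we infer the model.
rng = random.Random(1)
true_p = 0.3  # hypothetical value, treated as unknown by the analyst
tosses = [1 if rng.random() < true_p else 0 for _ in range(20)]

# Assume Bernoulli(p) tosses and guess p by the observed proportion of heads.
p_hat = sum(tosses) / len(tosses)
print(f"observed tosses: {tosses}")
print(f"guess for p: {p_hat:.2f}")
```

The first half uses only the model; the second half uses only the observed tosses, which is the situation statistics is concerned with.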

 

Parametric and distribution-free models

Remark 1. In a parametric model, the distribution is known up to a finite number of parameters, so the problem of inferring a model reduces to inferring a parameter value.

Remark 2. In this class, we will mostly focus on parametric models.

 

Overview of this course

Suppose we have an observed sample $x_1, \dots, x_n$, where each $x_i$ is assumed to be from a parametric distribution with an unknown parameter $\theta$.

Topic | Goal
--- | ---
Point Estimation | Provide the best guess of $\theta$ based on the observed sample
Interval Estimation | Provide an interval which is likely to include $\theta$ based on the observed sample
Hypothesis Testing | Make a decision about a statement about the parameter $\theta$ based on the observed sample
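
To give a feel for the three topics, here is a minimal sketch on a simulated Bernoulli sample. It is not the course's prescribed method: the large-sample normal (Wald) interval and the two-sided z-test are standard choices picked here for illustration, and the sample size, the hidden value `true_p = 0.6`, and the null value `p0 = 0.5` are all assumptions.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2021)
n = 200
true_p = 0.6  # hypothetical parameter, unknown to the analyst
x = [1 if rng.random() < true_p else 0 for _ in range(n)]

# Point estimation: a single best guess for p.
p_hat = sum(x) / n

# Interval estimation: approximate 95% interval from the normal approximation.
z = NormalDist().inv_cdf(0.975)
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z * se, p_hat + z * se)

# Hypothesis testing: H0: p = p0 against a two-sided alternative.
p0 = 0.5
z_stat = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject = p_value < 0.05

print(f"point estimate: {p_hat:.3f}")
print(f"approx. 95% interval: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}, reject H0 at 5%: {reject}")
```

All three outputs are computed from the same observed sample; they differ only in the kind of statement made about the unknown parameter.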