# Topic A1/A2 — Introducing Statistics + Tables & Graphs

## Topic A1 — Introducing Statistics

### Difference between descriptive and inferential statistics

**Descriptive** statistics, well… describe a population or a sample. They tell you about the features of the data. Examples of descriptive statistics?

- Climate change
- Road deaths
- [Econ example] Tax data

**Inferential** stats start from a sample and try to generalize to the population. They also try to establish relationships between variables, test hypotheses and even make predictions.

Bottom line: we need representative samples.

### Data types

A good way to think about data types is to think in terms of two dimensions: time and individuals/subjects.

It’s a very simple model: time and subject can each take either value ‘1’ or ‘many’. So at the start, you have 1 time point and 1 individual. But you can’t say much with that, can you? That’s just anecdotal evidence. So let’s see what we get when we expand these dimensions.

### Cross-sectional data

A bunch of individuals, at one period in time

- [Class ex] Mortality rate
- Guinness consumption in this class in September

Like a snapshot: you observe 100 people at time T.

The key is that it only looks at one specific time point (or period, summarized as a point).

### Time series data

One individual over time. It’s like a tracker measuring a specific variable over time. Like the step counter on your phone or smartwatch, or your personal daily consumption of Guinness over the month.

### Panel data

It’s a tracker on a bunch of people. So it would be getting all of you guys’ step counters and appending them together to form one dataset: a panel data set. In terms of Guinness consumption, it’s your and your classmates’ daily consumption over the month, for example.

[Show examples of how it appears in a table, stat software]
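As a rough sketch of how the three data types look in a table, here is a tiny dataset in plain Python. Everything in it is an assumption for illustration: the names, the numbers and the variable names (`panel`, `cross_section`, `time_series`) are invented, and stat software would simply show the same thing as rows and columns.

```python
# Made-up data: pints per person per day (all values are invented).
# Each row is one (person, day) observation, i.e. "long" format.
records = [
    {"person": "Aoife", "day": 1, "pints": 2},
    {"person": "Aoife", "day": 2, "pints": 0},
    {"person": "Aoife", "day": 3, "pints": 1},
    {"person": "Conor", "day": 1, "pints": 1},
    {"person": "Conor", "day": 2, "pints": 3},
    {"person": "Conor", "day": 3, "pints": 0},
]

# Panel data: many people, many time points (the whole table).
panel = records

# Cross-section: many people, one time point (day 1 only).
cross_section = [r for r in panel if r["day"] == 1]

# Time series: one person, many time points (Aoife only).
time_series = [r for r in panel if r["person"] == "Aoife"]

# Print the panel the way it would appear in a data table.
print("person   day  pints")
for r in panel:
    print(f"{r['person']:<8} {r['day']:>3}  {r['pints']:>5}")
```

Filtering on one day gives the snapshot, filtering on one person gives the tracker, and keeping both dimensions gives the panel.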

### Quantitative vs qualitative variables

**Quantitative** is basically stuff you can measure with numbers. Any examples?

- CO2 concentration in the atmosphere (continuous)
- Liters of Guinness (continuous)
- Pints of Guinness (0 to 2; more than 2 to 4; etc.)
- Number of brothers and sisters (discrete, can’t slice it into smaller pieces)
- Elevators in a building (discrete)

**Qualitative** describes a certain state or feature. Any examples?

- Computer is working / not working (state, nominal)
- Member of political party (nominal)
- Able to speak Irish (nominal)
- “Agree with this statement… (1) mostly not (2) not really (3) indifferent (4) a bit (5) mostly yes” (ordinal)

In a dataset, a two-state feature (working / not working, yes / no) is usually coded as 0 or 1 in what is called a dummy variable.
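A minimal sketch of dummy coding, assuming a yes/no survey question. The answers and the names `speaks_irish` and `irish_dummy` are invented for illustration.

```python
# Hypothetical survey answers to "Are you able to speak Irish?" (invented).
speaks_irish = ["yes", "no", "no", "yes", "yes"]

# Dummy variable: 1 if the feature is present, 0 otherwise.
irish_dummy = [1 if answer == "yes" else 0 for answer in speaks_irish]

# Because the coding is 0/1, the sum counts the "yes" answers
# and the mean gives their share in the sample.
n_yes = sum(irish_dummy)
share_yes = n_yes / len(irish_dummy)

print(irish_dummy)
print(n_yes, share_yes)
```

The numbers 0 and 1 are just labels here: the variable is still qualitative, it has only been written in a form that software can count.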

### Discrete vs continuous variables

[Done above]

### Nominal vs ordinal variables

[Done above]

### Interval vs Ratio

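**Interval** when the interval (difference) between two values is meaningful, but zero is just an arbitrary point on the scale.

**Ratio** when, on top of that, there is a true zero: a value of 0 means there is none of “it” (no quantity at all), so ratios between values are meaningful too.

Temperature: °C or °F are *interval*, kelvins are *ratio*.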
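To see why the distinction matters, here is a small sketch with two temperatures. The two values and the helper name `celsius_to_kelvin` are chosen for illustration; the conversion itself (add 273.15) is the standard one.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins."""
    return c + 273.15

cold_c, warm_c = 10.0, 20.0  # illustrative values

# Differences (intervals) are meaningful on both scales, and they agree.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio scale (kelvin).
ratio_c = warm_c / cold_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The difference is 10 degrees on either scale, but the "twice as hot" ratio disappears once the zero actually means "no heat".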

## Topic A2 — Tables & Graphs

In this topic we divide quantitative data into classes. The class boundaries can be arbitrary, or based on some existing classification.

### Absolute frequency

It’s the number of times a category (or class) appears in your data.

### Relative frequency

It’s the absolute frequency divided by the total number of observations.

### Cumulative frequency

When categories are ordered, you can “stack” the absolute frequencies: each class adds its own count to the counts of all the classes before it (the last one reaches the total number of observations).

### Cumulative relative frequency

Same as above but with relative frequencies (the last one reaches 100%).
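The four frequencies can be sketched in one small table. The classes and observations below are invented for illustration, and `table` is just a hypothetical name for the result.

```python
from collections import Counter

# Invented data: pints per student on one evening, already put into classes.
classes = ["0", "1-2", "3-4", "5+"]  # ordered from lowest to highest
observations = ["1-2", "0", "1-2", "3-4", "0", "1-2", "5+", "1-2", "3-4", "0"]

counts = Counter(observations)  # absolute frequency of each class
n = len(observations)           # total number of observations

table = []
running = 0  # running total used for the cumulative columns
for c in classes:
    absolute = counts[c]
    running += absolute
    table.append({
        "class": c,
        "abs": absolute,          # absolute frequency
        "rel": absolute / n,      # relative frequency
        "cum": running,           # cumulative frequency
        "cum_rel": running / n,   # cumulative relative frequency
    })

print("class  abs   rel  cum  cum_rel")
for row in table:
    print(f"{row['class']:<5} {row['abs']:>4} {row['rel']:>5.0%} "
          f"{row['cum']:>4} {row['cum_rel']:>8.0%}")
```

The last row of the cumulative columns is the check: it should equal the number of observations and 100%.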

### Histogram

Visual representation of a frequency distribution. Two main characteristics:

- Height: frequency
- Width: class width
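A rough text-only sketch of a histogram, assuming equal class widths. The data, the class width and the starting point are all invented for illustration, and the bars are drawn sideways, so the length of each bar plays the role of the height.

```python
# Invented data: liters drunk per student over a month.
liters = [0.5, 1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 3.7, 4.6, 5.9]

class_width = 2.0  # every class (bar) has the same width
lower = 0.0        # lower limit of the first class
n_classes = 3      # enough classes to cover the data: [0,2), [2,4), [4,6)

# Count how many observations fall in each class [lo, hi).
freqs = [0] * n_classes
for x in liters:
    index = int((x - lower) // class_width)
    freqs[index] += 1

# Draw one bar per class; the number of '#' is the frequency.
for i, f in enumerate(freqs):
    lo = lower + i * class_width
    hi = lo + class_width
    print(f"[{lo:.0f}, {hi:.0f})  {'#' * f}  ({f})")
```

Changing `class_width` changes the picture, which is why the choice of classes deserves a moment of thought.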

### Polygon

Same, but linking the midpoints of the tops of the bars together with a line.

### Ogive

Plots the cumulative relative frequency. The x-axis represents the upper limit of each class.
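The points of an ogive can be sketched as pairs of (upper class limit, cumulative relative frequency). The class limits and frequencies below are invented for illustration.

```python
# Invented class frequencies for classes 0-2, 2-4 and 4-6.
upper_limits = [2, 4, 6]
freqs = [3, 5, 2]

n = sum(freqs)  # total number of observations

ogive = []
running = 0  # cumulative frequency so far
for upper, f in zip(upper_limits, freqs):
    running += f
    ogive.append((upper, running / n))  # (x, y) point on the ogive

for upper, cum_rel in ogive:
    print(f"up to {upper}: {cum_rel:.0%}")
```

Joining these points with a line gives the ogive; it can only go up, and it ends at 100%.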

### Scatter plots

It’s a plot of two variables against each other, one on each axis, that shows how they relate.

When the hour of the day varies, does Guinness consumption vary? If you plot time on the x-axis and pints on the y-axis, can we draw the scatter plot?
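A crude text sketch of that scatter plot, assuming one observation per hour. The hours and pint counts are invented for illustration; a real plot would come from your stat software.

```python
# Invented observations: hour of the day and pints drunk in that hour.
hours = [12, 14, 16, 18, 20, 22]  # x-axis
pints = [0, 0, 1, 2, 3, 3]        # y-axis

# One (x, y) pair per observation: these are the dots of the scatter plot.
points = list(zip(hours, pints))

# Text version: one row per y value (highest first), one column per hour.
for y in range(max(pints), -1, -1):
    row = "".join("  *" if (x, y) in points else "   " for x in hours)
    print(f"{y} |{row}")
print("   " + "".join(f"{x:>3}" for x in hours))
```

Each star is one observation; reading left to right shows whether pints tend to rise as the hour gets later.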