Topic A1/A2 — Introducing Statistics + Tables & Graphs
Table of contents
- Topic A1/A2 — Introducing Statistics + Tables & Graphs
- Topic A1 — Introducing Statistics
- Topic A2 — Tables & Graphs
Topic A1 — Introducing Statistics
Difference between descriptive and inferential statistics
Descriptive statistics, well… describe a population or a sample. They tell you about the features of the data. Examples of descriptive statistics?
- Climate change
- Road deaths
- [Econ example] Tax data
Inferential statistics start from a sample and try to generalize to the population. They also try to draw relationships between variables, test hypotheses and even make predictions.
Bottom line: we need representativeness in our samples
A good way to think about data types is to think in terms of two dimensions: time and individuals/subjects.
It’s a very simple model: time and subject can each take the value ‘1’ or ‘many’. So at the start, you have 1 time point and 1 individual. But you can’t say much with that, can you? That’s just anecdotal evidence. So let’s see what we get when we expand these dimensions.
Cross-sectional data
A bunch of individuals, at one period in time.
- [Class ex] Mortality rate
- Guinness consumption in this class in September
Like a snapshot: you observe 100 people at time T.
The key is that it only looks at one specific time point (or a period, summarized as a point).
Time series data
One individual over time. It’s like a tracker measuring a specific variable over time. Like the step counter on your phone or smartwatch, or your personal daily consumption of Guinness over the month.
Panel data
It’s a tracker on a bunch of people. So it would be getting all of you guys’ step counters and appending them together to form one dataset: a panel data set. In terms of Guinness consumption, it’s your and your classmates’ daily consumption over the month, for example.
[Show examples of how it appears in a table, stat software]
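As a rough sketch of how the three layouts could look as tables, here is one way to write them down in plain Python. The student labels, days and pint counts are all made up for illustration, and `describe` is just a hypothetical helper that counts individuals and time points.

```python
# Hypothetical pints-of-Guinness records; every name and number is made up.

# Cross-section: many individuals, one time point.
cross_section = [
    {"student": "A", "pints": 2},
    {"student": "B", "pints": 0},
    {"student": "C", "pints": 5},
]

# Time series: one individual, many time points.
time_series = [
    {"day": 1, "pints": 1},
    {"day": 2, "pints": 0},
    {"day": 3, "pints": 2},
]

# Panel: many individuals, each followed over many time points.
panel = [
    {"student": s, "day": d, "pints": p}
    for s, daily in {"A": [1, 0, 2], "B": [0, 0, 1]}.items()
    for d, p in enumerate(daily, start=1)
]

def describe(rows):
    """Count distinct individuals and distinct time points in a table."""
    subjects = {r.get("student", "single subject") for r in rows}
    times = {r.get("day", "single period") for r in rows}
    return len(subjects), len(times)

for name, rows in [("cross-section", cross_section),
                   ("time series", time_series),
                   ("panel", panel)]:
    n_subjects, n_times = describe(rows)
    print(f"{name}: {n_subjects} individual(s), {n_times} time point(s)")
```

Notice that the panel needs both an individual column and a time column to identify a row, while the other two need only one of them.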
Quantitative vs qualitative variables
Quantitative is basically stuff you can measure with numbers. Any examples?
- CO2 concentration in the atmosphere (continuous)
- Liters of Guinness (continuous)
- Pints of Guinness (0 to 2; more than 2 to 4; etc.)
- Number of brothers and sisters (discrete, can’t slice it in smaller pieces)
- Elevators in a building (discrete)
Qualitative is describing a certain state or feature. Any examples?
- Computer is working / not working (state, nominal)
- Member of political party (nominal)
- Able to speak Irish (nominal)
- “Agree with this statement… (1) mostly not (2) not really (3) indifferent (4) a bit (5) mostly yes” (ordinal)
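A two-state variable (working / not working, member / not a member) is usually coded as what is called a dummy variable: 1 if the state holds, 0 otherwise.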
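To make the dummy coding concrete, here is a minimal sketch using the computer example. The observations are made up, and the 1 = working convention is an assumption (it could just as well be the other way round).

```python
# Hypothetical observations of a two-state qualitative variable.
computer_status = ["working", "not working", "working", "working"]

# Dummy coding: 1 if the computer is working, 0 otherwise.
working_dummy = [1 if status == "working" else 0 for status in computer_status]

# The mean of a dummy is the share of observations in the "1" state.
share_working = sum(working_dummy) / len(working_dummy)

print(working_dummy)   # [1, 0, 1, 1]
print(share_working)   # 0.75
```

The numbers 0 and 1 are only labels here; the one calculation that makes sense on them is the share.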
Discrete vs continuous variables
Nominal vs ordinal variables
Interval vs Ratio
Interval: the difference between two values is meaningful, but the zero is arbitrary, so ratios are not.
Ratio: there is a clear definition of zero “of it”. When the variable equals 0, there is none (no quantity), so ratios like “twice as much” make sense.
Temperature: °C and °F are interval, Kelvin is ratio.
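A quick way to see the difference is to check whether "twice as hot" survives a change of unit. The sketch below uses two made-up temperatures; the only fact it relies on is the standard conversion K = °C + 273.15.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to Kelvin."""
    return celsius + 273.15

cold_c, warm_c = 10.0, 20.0  # made-up temperatures

# Differences are meaningful on both scales (interval property).
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios only mean something when zero means "none" (ratio property).
naive_ratio = warm_c / cold_c                                          # 2.0
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)   # about 1.035

print(diff_c, round(diff_k, 2))
print(naive_ratio, round(kelvin_ratio, 3))
```

The difference is 10 degrees either way, but 20 °C is not "twice as hot" as 10 °C: on the Kelvin scale it is only about 3.5% higher.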
Topic A2 — Tables & Graphs
In this topic we divide quantitative data into classes. The classes can be arbitrary, or based on some existing classification.
Absolute frequency
It’s the absolute number of times a category appears in your data.
Relative frequency
It’s the absolute frequency divided by the total number of observations.
Cumulative frequency
When categories are ordered, you can “stack” the absolute frequencies (the running total ends at the total number of observations).
Cumulative relative frequency
Same as above but with relative frequencies (total 100%).
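The four kinds of frequency can be computed in a few lines. This is a minimal sketch on made-up data: the class labels follow the pints example from earlier, and the eight observations are invented.

```python
from collections import Counter
from itertools import accumulate

# Ordered classes and hypothetical observations (one class per student).
classes = ["0 to 2", "more than 2 to 4", "more than 4 to 6"]
observations = [
    "0 to 2", "more than 2 to 4", "0 to 2", "more than 4 to 6",
    "more than 2 to 4", "0 to 2", "0 to 2", "more than 2 to 4",
]

n = len(observations)
counts = Counter(observations)

absolute = [counts[c] for c in classes]                # [4, 3, 1]
relative = [a / n for a in absolute]                   # [0.5, 0.375, 0.125]
cumulative = list(accumulate(absolute))                # [4, 7, 8]
cumulative_relative = [c / n for c in cumulative]      # [0.5, 0.875, 1.0]

for row in zip(classes, absolute, relative, cumulative, cumulative_relative):
    print(row)
```

The last cumulative value equals the number of observations, and the last cumulative relative value equals 1 (that is, 100%), which is a handy check on your table.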
Histogram
Visual representation of a frequency distribution. Two main characteristics:
- Heights: frequency
- Width: class width
Frequency polygon
Same as the histogram, but linking the tops of the bars together (at the class midpoints).
Ogive
Plots the cumulative relative frequency. The x-axis represents the upper limit of each class.
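Here is a small sketch of what sits behind these graphs: sorting raw values into classes gives the bar heights, and pairing each class's upper limit with the cumulative relative frequency gives the points of the cumulative curve. The litres, the class edges and the two helper functions are all assumptions made for illustration.

```python
# Hypothetical raw data (litres per student); values and class edges are made up.
litres = [0.5, 1.2, 1.9, 2.4, 2.8, 3.1, 3.5, 4.6, 5.0, 5.7]
edges = [0, 2, 4, 6]  # classes: 0 to 2, more than 2 to 4, more than 4 to 6

def class_counts(values, edges):
    """Count the values falling in each class (upper limit included)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            in_class = edges[i] < v <= edges[i + 1]
            at_lowest_edge = i == 0 and v == edges[0]  # first class also includes 0
            if in_class or at_lowest_edge:
                counts[i] += 1
                break
    return counts

def ogive_points(counts, edges):
    """Pair each class's upper limit with the cumulative relative frequency."""
    n = sum(counts)
    running, points = 0, []
    for upper, c in zip(edges[1:], counts):
        running += c
        points.append((upper, running / n))
    return points

counts = class_counts(litres, edges)   # [3, 4, 3]
points = ogive_points(counts, edges)   # [(2, 0.3), (4, 0.7), (6, 1.0)]

# Text histogram: bar length stands in for bar height (all classes have width 2).
for lower, upper, c in zip(edges, edges[1:], counts):
    print(f"{lower}-{upper}: {'#' * c}")
print(points)
```

Values outside the range of `edges` are silently skipped in this sketch, so check that your classes cover the whole data set.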
Scatter plot
It’s a spatial representation of two variables that shows the relationship between them.
When the hour of the day varies, does Guinness consumption vary? If you put time on the x-axis and pints on the y-axis, can we draw the scatter plot?
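To try the question above without any plotting software, here is a rough text-only sketch. The (hour, pints) pairs are invented, and `text_scatter` is a hypothetical helper that only works because the hours are evenly spaced and the pints are whole numbers.

```python
# Hypothetical (hour of day, pints so far) pairs, made up for illustration.
points = [(12, 0), (14, 0), (16, 1), (18, 2), (20, 3), (22, 4)]

def text_scatter(points, x_values):
    """Return the rows of a character grid with one '*' per (x, y) pair."""
    observed = set(points)
    max_y = max(y for _, y in points)
    rows = []
    for y in range(max_y, -1, -1):  # top row holds the largest y
        marks = "".join(" * " if (x, y) in observed else " . " for x in x_values)
        rows.append(f"{y:>2} |{marks}")
    rows.append("    " + "".join(f"{x:^3}" for x in x_values))  # x-axis labels
    return rows

hours = list(range(12, 24, 2))  # evenly spaced, so the columns are to scale
for row in text_scatter(points, hours):
    print(row)
```

With these made-up numbers the stars climb from bottom left to top right, which is what a positive relationship between the two variables looks like.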