From Gelman's Monkey Post thread
Just took a look at the data. Kind of weird.... Check out this crosstab.
. insheet using science-data.txt, comma clear
. reshape wide therm_level ssm_level, i(panelist_id) j(wave)
. tab ssm_level1 ssm_level2
           |                      ssm_level2
ssm_level1 |         1          2          3          4          5 |     Total
-----------+-------------------------------------------------------+----------
         1 |     2,754        623         25          2          3 |     3,407
         2 |       182        779        208         11          1 |     1,181
         3 |         4        148        657        185         11 |     1,005
         4 |         0         13        226      1,044        307 |     1,590
         5 |         0          0         24        526      2,864 |     3,414
-----------+-------------------------------------------------------+----------
     Total |     2,940      1,563      1,140      1,768      3,186 |    10,597
. corr ssm_level1 ssm_level2
(obs=10597)
| ssm_le~1 ssm_le~2
-------------+------------------
ssm_level1 | 1.0000
ssm_level2 | 0.9519 1.0000
That amount of intertemporal stability is not believable to me. Not one of the 2,940 people who answered 1 at time 2 answered 4 or 5 at time 1? Is that consistent with other online panels?
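A quick way to check that corner of the table directly (a sketch using the reshaped variable names from the commands above; per the crosstab, the first count should return 0 and the second only 6 of 10,597):
. count if ssm_level2 == 1 & inlist(ssm_level1, 4, 5)
. count if abs(ssm_level2 - ssm_level1) >= 3 & !missing(ssm_level1, ssm_level2)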
Same thing in the feeling thermometer...
. destring therm_level*, replace force
. corr therm_level1 therm_level2
(obs=10597)
| therm_~1 therm_~2
-------------+------------------
therm_level1 | 1.0000
therm_level2 | 0.9882 1.0000
.988? That's nuts, even if the measurements were only a week apart.
Don't know how to reproduce this on PSR, but check out this plot too on your own computers...
. scatter therm_level1 therm_level2
The noise is weirdly normally distributed, with *NOT ONE* respondent out of 10,597 deviating by more than 25 in a negative direction or 38 in a positive direction. That complete absence of large response errors in a sample so large is bizarre. All it takes to produce one outlier is a single person misinterpreting the feeling thermometer, thinking it's for a different question, etc.
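One way to put numbers on that (a rough sketch on the same reshaped data; per the claim above, the count should return 0):
. gen therm_diff = therm_level2 - therm_level1
. summarize therm_diff, detail
. count if (therm_diff < -25 | therm_diff > 38) & !missing(therm_diff)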
Here's ANES 2012 vs. 2013 as a rough benchmark, recognizing that the time difference is much larger, the mode is different, etc.:
. corr gayft2012 gayft2013
(obs=1492)
| gay~2012 gay~2013
-------------+------------------
gayft2012 | 1.0000
gayft2013 | 0.7106 1.0000
The amount of instability is *24x* lower in this study than in the ANES. Not saying that couldn't happen, but it is weird. Anecdotally, I've seen party ID measured within the same survey have a much lower correlation.
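The 24x presumably comes from comparing 1 - r across the two datasets; a quick back-of-the-envelope check:
. display (1 - 0.7106) / (1 - 0.9882)
That ratio comes out to about 24.5.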
It's consistent with ML having a fixed sample size and a fixed believable effect size and working backward to what would be statistically significant (if there were less stability, the study would be less well-powered).
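To see why stability matters for power: the variance of a pre-post change is var1 + var2 - 2*r*sd1*sd2, so a higher wave-to-wave correlation r shrinks standard errors for the same N and effect size. A purely illustrative calculation (made-up unit variances, not numbers from the data):
. display sqrt(2*(1 - 0.95)) / sqrt(2*(1 - 0.71))
With r = .95 instead of .71, the SE of a change score would be less than half as large.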
DEFINITELY not saying this nails anything, just that digging deeper into this data probably should be done...