Code

This page is still a draft, but you can already read it as a survival guide to the course, alongside the Stata Guide.

COURSE SESSIONS

All course sessions but the first are organised around a single case study that makes use of one of the teaching datasets:

Session(s)	Data	Substantive topic
1	---	Numbers and Development (introductory session)
2, 4, 5	NHIS	Adult Obesity in the USA
3	WVS	Support for Sharia Law in Muslim Countries
6	ESS	Support for Torture in Israel
7, 8, 9	QOG	Fertility and Education
10	ESS	Attitudes Towards Migration in Europe
11	ESS	Satisfaction with Health Services in Britain and France
12	GSS	Numbers of Sexual Partners in the USA

Each session also introduces its own Stata learning steps.

Part 1. Exploratory analysis

Covered: dataset exploration and basic manipulation; descriptive statistics.
Readings: Urdan ch. 1 (basics), ch. 2-3 (central tendency and dispersion), ch. 4-5 (distributions).
See also: F & T, ch. 2.1–2.4 (variables), ch. 2.5–2.6 (distributions).

Sessions	Methods covered	Essential Stata commands
1	Course Setup	Folder navigation: `cd`, `ls`, `mkdir`
		File navigation: `browse`, `doedit`
		Command execution: `do`, `run`, `log`
		Command installation: `adopath`, `ssc install`
2	Data Exploration	Data loading: `use`, `clear`, `svyset`
		Data description: `lookfor`, `d`, `codebook`
		Missing values: `count`, `li`, `in`, `misstable`
		Subsetting: `keep`, `drop`, `if`, `mi()`
3	Variable Recoding	Summarizing: `su`, `fre`, `tab`, `tabstat`, `bys`
		Tabulating: `tab, gen()`, `tab, sum()`, `table`
		Recoding: `gen`, `clonevar`, `ren`, `replace`
		Encoding: `la`, `recode`, `irecode`, `decode`, `encode`
4	Distributions	Density: `hist`, `kdensity`, `gr hbox`
		Normality: `symplot`, `pnorm`, `qnorm`
		Transformations: `gladder`
		Confidence intervals: `ci`, `prop`, `serrbar`
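
As a rough illustration of how some of the Part 1 commands fit together, here is a minimal sketch. It assumes Stata's built-in `auto` dataset (loaded with `sysuse`) as a stand-in, because the course datasets and their variable names are not shown on this page. The variable `logprice` is made up for the example.

```stata
* Stand-in data: Stata's built-in auto dataset, NOT a course dataset.
sysuse auto, clear

* Data description
lookfor price                 // search variable names and labels
describe price rep78 foreign  // d is the short form of describe
codebook rep78, compact

* Missing values
count if mi(rep78)            // how many cars have no repair record
misstable summarize           // overview of missing values

* Summarizing
summarize price, detail       // su is the short form of summarize
tabstat price, by(foreign) statistics(n mean sd)

* Recoding: hypothetical log-transformed copy of price
generate logprice = ln(price)
label variable logprice "Log of price"

* Density and normality, assessed visually
histogram logprice, normal
qnorm logprice
```

The same sequence should carry over to a course dataset once the variable names are swapped for the ones found with `lookfor`.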

Assignment No.1: explore the course datasets, looking for samples and variables of interest. Write a do-file to describe your variables of interest, after recoding them if necessary. Export a summary statistics table, and write a short research design paper to present your dataset, sample and variables of interest, with attention to the distribution of the dependent variable.

Constraints: your sample must be cross-sectional, and unless you are willing to explore log-odds and logistic regression for binary dependent variables towards the end of the course, your dependent variable must be normally distributed to a reasonable extent.

Part 2. Association tests

Covered: inference, significance tests, ordinary least squares.
Readings: Urdan, ch. 6-7 (confidence intervals), ch. 9 and 14 (association tests), ch. 13 (least squares).
See also: F & T, ch. 5 (inference), ch. 6-7 (bivariate statistics), ch. 4 (least squares).

Sessions	Methods covered	Essential Stata commands
5	Confidence Intervals	Commands from previous teaching bloc
		Summary statistics tables: `ttest`
6	Association Tests	Comparing means: `ttest`, `prtest`, `gr dot`, `gr bar`
		Proportions: `chi2`, `tabchi`, `V`, `spineplot`
7	Correlation	Correlation coefficients: `corr`, `pwcorr`, `mkcorr`
		Correlation matrixes: `sc`, `gr mat`
8	Least Squares	Least squares: `reg`, `i.`, `predict`, `rvfplot`
		Visualization: `lfit`, `lfitci`, `lowess`
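
The sketch below strings a few of the Part 2 commands together. It is only an illustration on Stata's built-in `auto` dataset, not on the course data, and the variable name `yhat` is an arbitrary choice. Note that `chi2` and `V` are written here as options of `tabulate`, which is one common way to request them.

```stata
* Stand-in data: Stata's built-in auto dataset, NOT a course dataset.
sysuse auto, clear

* Comparing means across two groups
ttest price, by(foreign)

* Crosstabulation with a chi-squared test and Cramer's V
tabulate rep78 foreign, chi2 V

* Pairwise correlations with significance levels and observation counts
pwcorr price mpg weight, obs sig

* Linear regression with a factor-variable dummy
regress price mpg i.foreign
predict yhat                  // hypothetical name for the fitted values
rvfplot, yline(0)             // residuals versus fitted values

* Scatterplot with a fitted line and its confidence interval
twoway (lfitci price mpg) (scatter price mpg)
```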

Assignment No.2: using the existing research literature on your topic, hypothesize about how your variables of interest might non-trivially relate to one another. Explore these relationships by running significance tests of substantively significant associations. Document both the hypotheses and the results of these association tests in your research design paper.

Constraints: choose your association tests in relation to the structure of your variables and to theoretical insights from existing research on your topic. If your dependent variable is binary, pay special attention to the computation of odds ratios.

Part 3. Regression models

Covered: linear and logistic additive models.
Readings: F & T, ch. 9.1-9.2, 11 (linear regression), ch. 12.1-12.3, 13.1-13.3 (logistic regression).

Sessions	Methods covered	Essential Stata commands
9	Multiple Linear Regression	Commands from previous teaching bloc
		Regression tables: `leanout`, `estout`
10	Logistic Regression	Odds ratios: `tabodds`
		Specification: `logit`, `ologit`, `linktest`
11	Regression diagnostics	Residuals: `rsta`, `rvpplot`, `avplot`
		Specification: `b`, `vif`, `c.`, `#`
12	Marginal effects	Marginal effects: `margins`, `marginsplot`
		Extensions: `vce(cluster)`, `bootstrap`, and some demos
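
A minimal sketch of a Part 3 workflow follows, again assuming Stata's built-in `auto` dataset rather than a course dataset. The models are chosen only because they run on that data; they are not the models used in the sessions. `estat vif` is used here as the postestimation form of `vif`.

```stata
* Stand-in data: Stata's built-in auto dataset, NOT a course dataset.
sysuse auto, clear

* Multiple linear regression and diagnostics
regress price mpg weight i.foreign
estat vif                     // variance inflation factors
rvpplot mpg                   // residuals versus one predictor
avplot mpg                    // added-variable plot

* Interaction between two continuous predictors
regress price c.mpg##c.weight

* Logistic regression: coefficients first, then odds ratios
logit foreign mpg weight
logit, or                     // replay the same model as odds ratios

* Average marginal effects of each predictor
margins, dydx(*)
```

Running `margins` directly after the model keeps the logistic estimates in memory, so further postestimation commands can still be run afterwards.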

Assignment No.3: run a series of linear (and/or logistic) regressions suggested by your previous explorations of the data. Report on the results in your research design paper by describing the marginal effects and residuals of your regression models, and by including a regression results table. Finalize your text as an empirical research paper.

Constraints: your paper should come as close as possible to publication level. When you are done, and only then, will you be done with your quest. Remove your helmet, reflect on the components of the Prophecy, and marvel as you ascend into your planar form.

DO-FILES

The typical structure for the course do-files contains the following elements:

PROLOGUE
- header
- setup
DATA DESCRIPTION
- `use`
- Dependent variable
- Breakdowns (if any)
- INDEPENDENT VARIABLES
- FINALIZING (counting missing values with `misstable` and subsetting)
- Normality
- Export summary statistics
ASSOCIATION TESTS
- (#) DV-IV relationships
- (#) IV-IV relationships
- Covariates
REGRESSION MODELS
- (#) Models with DV
- Models with covariates
- Export regression results
END
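
The structure above can be sketched as a skeleton do-file. This is an assumption-laden outline, not one of the course do-files: it uses Stata's built-in `auto` dataset as a stand-in, the log file name `example.log` is hypothetical, and the summary statistics are only printed rather than exported.

```stata
* ===== PROLOGUE =====
* Header: title, author and date go here (placeholders).

* Setup
clear all
set more off
capture log close
log using example.log, replace text     // hypothetical log name

* ===== DATA DESCRIPTION =====
sysuse auto, clear                      // stand-in for a course dataset

* Dependent variable
summarize price, detail

* Independent variables
tabulate foreign
summarize mpg weight

* Finalizing: count missing values, then subset to complete cases
misstable summarize price mpg weight rep78 foreign
drop if mi(price, mpg, weight, rep78, foreign)

* Normality
histogram price, normal

* Summary statistics (printed only; exporting needs an extra command)
tabstat price mpg weight, statistics(n mean sd min max) columns(statistics)

* ===== ASSOCIATION TESTS =====
ttest price, by(foreign)                // DV-IV relationship
pwcorr mpg weight, sig                  // IV-IV relationship

* ===== REGRESSION MODELS =====
regress price mpg                       // model with the DV and one IV
regress price mpg weight i.foreign      // model with covariates

* ===== END =====
log close
```

Subsetting to complete cases before the association tests means every later command runs on the same observations.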

The do-files are usually 300-500 lines long, with a slightly longer introduction in Week 1. The do-files of the last teaching bloc show what students are expected to submit as their last assignment.

`week1.do`

Comments
Practice
Interface
WARM-UP EXERCISE
COMMANDS
- Tip (1): Get to learn some syntax
- Tip (2): Run all lines in sequential order
- Tip (3): Keyboard shortcuts for Mac / Win
- Tip (4): Command navigation
- Tip (5): Run multiple lines together
SETUP
- (1) Memory
- (2) Screen breaks
- (3) Additional commands
- (4) Working directory
- (5) Log
DATASETS
- (1) List datasets
- (2) European Social Survey Round 4, 2008
- (3) Quality of Government, 2011
- (4) World Values Survey, 2000
- (5) General Social Survey, 2010
- (6) Search across datasets
HELP
END

`week2.do`

DATA DESCRIPTION
- Finding variables
- Subsetting to cross-sectional format
- Survey weights
VARIABLE MANIPULATION
- Dependent variable: Body Mass Index
- Labelling a variable
- Summary statistics
- Visualization
- Logical expressions
INDEPENDENT VARIABLES
- Summarizing over categories
- Visualization over categories
FINALIZING A DATASET
- Patterns of missing values
- Subsetting
END

`week3.do`

DATA DESCRIPTION
- Dependent variable: Support for sharia law
- Recoding to dummies
- Stacked plots with dummies
INDEPENDENT VARIABLES
- IV: Gender
- IV: Age
- IV: Education
- IV: Employment status
- IV: Household composition
- IV: City size
FINALIZING THE DATASET
END

`week4.do`

DATA DESCRIPTION
- Dependent variable: Body Mass Index
- Independent variables
DISTRIBUTION
- Standard deviation
- Outliers
NORMALITY
- Visual assessment
- Formal assessment
- Variable transformation
- Comparison plot
SAMPLING ERROR
- Confidence intervals with means
- Confidence intervals with proportions
END

`week5.do`

DATA DESCRIPTION
- Dependent variable: Body Mass Index
- Breakdowns
- Independent variables
- Subsetting
- Normality
CONFIDENCE INTERVALS
- IV: Age
- IV: Gender
- IV: Race
- IV: Education
- IV: Income
- IV: Health insurance
- IV: Health affordability
EXPORT SUMMARY STATISTICS
END

`week6.do`

Note: the length is correct, but the large header should be replaced by in-code explanations.

DATA DESCRIPTION
- Dependent variable: Justifiability of torture in event of preventing terrorism
- Subsetting
SIGNIFICANCE TESTS
- IV: Age
- IV: Gender
- IV: Income deciles
- IV: Education
- IV: Religious faith
- IV: Political positioning
- IV: Media exposure
END

`week7.do`

DATA DESCRIPTION
- Finalized sample
CORRELATION
- (1) Fertility rates and schooling years
- (2) Schooling years and (log) Gross Domestic Product
- (3) Corruption and human development
- (4) Female government ministers and corruption
SCATTERPLOTS
- Scatterplot matrixes
- Scatterplots with marker labels
- Scatterplots with histograms
- Scatterplots with smoothed lines
END

`week8.do`

Note: continues week7.do.

REGRESSION MODELS
- (1) Fertility Rates and Schooling Years
- Plotting regression results
- Small multiples
- Fitting a transformed IV
- (2) Fertility Rates and (Log) Gross Domestic Product
- Fitting 'lin-log' equations
- (3) Corruption and Human Development
- Fitting a quadratic term
- (4) Fertility and Democracy
- Fitting a dummy predictor
- (5) Fertility and Women's Rights
- Fitting a categorical predictor
END
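
The model specifications listed above can be illustrated with a short sketch. It does not reproduce `week8.do`: it assumes Stata's built-in `auto` dataset instead of the QOG data, and `logweight` is a hypothetical variable name.

```stata
* Stand-in data: Stata's built-in auto dataset, NOT the QOG data.
sysuse auto, clear

* Transformed IV ('lin-log' equation): DV regressed on a logged predictor
generate logweight = ln(weight)         // hypothetical variable name
regress price logweight

* Quadratic term, written with factor-variable notation
regress price c.mpg##c.mpg

* Dummy predictor
regress price i.foreign

* Categorical predictor (one dummy per category, base category omitted)
regress price i.rep78
```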

`week9.do`

DATA DESCRIPTION
- Subsetting
- Export summary statistics
ASSOCIATION TESTS
REGRESSION MODELS
- Simple linear regressions
- Multiple linear regression
- Standardized ('beta') coefficients
- Dummies (categorical variables)
REGRESSION DIAGNOSTICS
- (1) Standardized residuals
- (2) Heteroskedasticity
- (3) Variance inflation and interaction terms
EXPORT MODEL RESULTS
END

`week10.do`

Note: actually Week 11 in the course syllabus and synopsis.

DATA DESCRIPTION
- DV: Allow many/few immigrants of different race/ethnic group from majority
- IVs: age, gender, country of birth, education, income, left-right scale
- Export summary statistics
ASSOCIATION TESTS
REGRESSION MODELS
- Linear regression
- Logistic regression
- Marginal effects
- Sensitivity analysis
- Export model results
END

`week11.do`

Note: actually Week 10 in the course syllabus and synopsis.

TODO: fill in.

`week12.do`