François edited this page Jun 6, 2017 · 40 revisions

The page is in draft state, but you might want to read it as a survival guide to the course, in addition to the Stata Guide.

COURSE SESSIONS

All course sessions but the first are organised around a single case study that makes use of one of the teaching datasets:

| Session(s) | Data | Substantive topic |
|---|---|---|
| 1 | --- | Numbers and Development (introductory session) |
| 2, 4, 5 | NHIS | Adult Obesity in the USA |
| 3 | WVS | Support for Sharia Law in Muslim Countries |
| 6 | ESS | Support for Torture in Israel |
| 7, 8, 9 | QOG | Fertility and Education |
| 10 | ESS | Attitudes Towards Migration in Europe |
| 11 | ESS | Satisfaction with Health Services in Britain and France |
| 12 | GSS | Numbers of Sexual Partners in the USA |

Each session also introduces its own Stata learning steps.

Part 1. Exploratory analysis

  • Covered: dataset exploration and basic manipulation; descriptive statistics.
  • Readings: Urdan ch. 1 (basics), ch. 2-3 (central tendency and dispersion), ch. 4-5 (distributions).
  • See also: F & T, ch. 2.1–2.4 (variables), ch. 2.5–2.6 (distributions).

| Sessions | Methods covered | Essential Stata commands |
|---|---|---|
| 1 | Course Setup | Folder navigation: cd, ls, mkdir |
| | | File navigation: browse, doedit |
| | | Command execution: do, run, log |
| | | Command installation: adopath, ssc install |
| 2 | Data Exploration | Data loading: use, clear, svyset |
| | | Data description: lookfor, d, codebook |
| | | Missing values: count, li, in, misstable |
| | | Subsetting: keep, drop, if, mi() |
| 3 | Variable Recoding | Summarizing: su, fre, tab, tabstat, bys |
| | | Tabulating: tab, gen(); tab, sum(); table |
| | | Recoding: gen, clonevar, ren, replace |
| | | Encoding: la, recode, irecode, decode, encode |
| 4 | Distributions | Density: hist, kdensity, gr hbox |
| | | Normality: symplot, pnorm, qnorm |
| | | Transformations: gladder |
| | | Confidence intervals: ci, prop, serrbar |

Assignment No.1: explore the course datasets, looking for samples and variables of interest. Write a do-file to describe your variables of interest, after recoding them if necessary. Export a summary statistics table, and write a short research design paper to present your dataset, sample and variables of interest, with attention to the distribution of the dependent variable.

Constraints: your sample must be cross-sectional, and unless you are willing to explore log-odds and logistic regression for binary dependent variables towards the end of the course, your dependent variable must be normally distributed to a reasonable extent.
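
The exploration steps of Assignment No.1 might look roughly like the sketch below. It is only an illustration: it assumes Stata's bundled `nlsw88` example data instead of a course dataset, and the variables (`wage`, `grade`, `union`) and the new variable name `log_wage` are stand-ins chosen for the example.

```stata
* Sketch only: uses Stata's bundled nlsw88 data, not a course dataset.
sysuse nlsw88, clear

* Find and describe candidate variables.
lookfor wage
describe wage grade union
codebook wage, compact

* Inspect missing values, then keep complete cases only.
misstable summarize wage grade union
drop if mi(wage, grade, union)

* Summarize the dependent variable overall and by group.
summarize wage, detail
tabstat wage, by(union) statistics(n mean sd min max)

* Check normality, then try a log transformation (log_wage is a made-up name).
histogram wage, normal name(wage_raw, replace)
sktest wage
generate log_wage = ln(wage)
label variable log_wage "Hourly wage (log)"
sktest log_wage
```

If the dependent variable stays far from normal after transformation, that is a hint to reconsider the variable or to plan for the logistic route mentioned in the constraints.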

Part 2. Association tests

  • Covered: inference, significance tests, ordinary least squares.
  • Readings: Urdan, ch. 6-7 (confidence intervals), ch. 9 and 14 (association tests), ch. 13 (least squares).
  • See also: F & T, ch. 5 (inference), ch. 6-7 (bivariate statistics), ch. 4 (least squares).

| Sessions | Methods covered | Essential Stata commands |
|---|---|---|
| 5 | Confidence Intervals | Commands from previous teaching block |
| | | Summary statistics tables: ttest |
| 6 | Association Tests | Comparing means: ttest, prtest, gr dot, gr bar |
| | | Proportions: chi2, tabchi, V, spineplot |
| 7 | Correlation | Correlation coefficients: corr, pwcorr, mkcorr |
| | | Correlation matrixes: sc, gr mat |
| 8 | Least Squares | Least squares: reg, i., predict, rvfplot |
| | | Visualization: lfit, lfitci, lowess |

Assignment No.2: using the existing research literature on your topic, hypothesize about how your variables of interest might non-trivially relate to one another. Explore these relationships by running significance tests of substantively significant associations. Document both the hypotheses and the results of these association tests in your research design paper.

Constraints: choose your association tests in relation to the structure of your variables and to theoretical insights from existing research on your topic. If your dependent variable is binary, pay special attention to the computation of odds ratios.
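
As a rough guide to matching the test to the structure of the variables, here is a minimal sketch. It assumes Stata's bundled `nlsw88` data as a stand-in for a course dataset, and the variable pairings are picked only to show one test per situation.

```stata
* Sketch only: nlsw88 stands in for a course dataset.
sysuse nlsw88, clear

* Continuous DV, binary IV: compare means across two groups.
ttest wage, by(union)

* Categorical DV, categorical IV: chi-squared test with Cramer's V.
tabulate union race, chi2 V column

* Continuous DV, continuous IV: pairwise correlation with its p-value.
pwcorr wage grade, obs sig

* Binary DV: odds of union membership by college graduation, as an odds ratio.
tabodds union collgrad, or
```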

Part 3. Regression models

  • Covered: linear and logistic additive models.
  • Readings: F & T, ch. 9.1-9.2, 11 (linear regression), ch. 12.1-12.3, 13.1-13.3 (logistic regression).

| Sessions | Methods covered | Essential Stata commands |
|---|---|---|
| 9 | Multiple Linear Regression | Commands from previous teaching block |
| | | Regression tables: leanout, estout |
| 10 | Logistic Regression | Odds ratios: tabodds |
| | | Specification: logit, ologit, linktest |
| 11 | Regression diagnostics | Residuals: rsta, rvpplot, avplot |
| | | Specification: b, vif, c., # |
| 12 | Marginal effects | Marginal effects: margins, marginsplot |
| | | Extensions: vce(cluster), bootstrap, and some demos |

Assignment No.3: run a series of linear (and/or logistic) regressions suggested by your previous explorations of the data. Report on the results in your research design paper by describing the marginal effects and residuals of your regression models, and by including a regression results table. Finalize your text as an empirical research paper.

Constraints: your paper should come as close as possible to publication level. When you are done, and only then, will you be done with your quest. Remove your helmet, reflect on the components of the Prophecy, and marvel as you ascend into your planar form.
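
A minimal sketch of the modelling steps in Assignment No.3 follows. It assumes Stata's bundled `nlsw88` data in place of a course dataset. The names `r_std` and `m_logit` are made up for the example. The built-in `estimates table` is used here as a stand-in for the user-written `estout`, which needs a separate installation.

```stata
* Sketch only: nlsw88 stands in for a course dataset.
sysuse nlsw88, clear

* Linear model with a continuous and two categorical predictors.
regress wage c.grade i.union i.race

* Diagnostics: residuals-versus-fitted plot and variance inflation factors.
rvfplot, yline(0)
estat vif

* Standardized residuals, to flag unusually large deviations.
predict r_std, rstandard
count if abs(r_std) > 2 & !mi(r_std)

* Logistic model for a binary DV, reported as odds ratios.
logit union c.grade i.race, or
linktest

* Refit (linktest replaced the stored results), then store the model.
quietly logit union c.grade i.race
estimates store m_logit

* Average marginal effects on the predicted probability.
margins, dydx(*)

* Plain results table with coefficients and standard errors.
estimates table m_logit, b(%9.3f) se
```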

DO-FILES

The typical structure for the course do-files contains the following elements:

  • PROLOGUE
    • header
    • setup
  • DATA DESCRIPTION
    • use
    • Dependent variable
    • Breakdowns (if any)
    • INDEPENDENT VARIABLES
    • FINALIZING (counting missing values with misstable and subsetting)
    • Normality
    • Export summary statistics
  • ASSOCIATION TESTS
    • (#) DV-IV relationships
    • (#) IV-IV relationships
    • Covariates
  • REGRESSION MODELS
    • (#) Models with DV
    • Models with covariates
    • Export regression results
  • END

The do-files are usually 300 to 500 lines each, with a slightly longer introduction in Week 1. The do-files of the last teaching block show what students are expected to submit as their last assignment.
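
Stripped down to a few lines per section, the structure listed above could be laid out as in the hypothetical skeleton below. It assumes Stata's bundled `nlsw88` data where a course do-file would `use` a teaching dataset, and it leaves out logging and file exports.

```stata
* PROLOGUE
* Header: purpose of the file, data used, author, date.
* Setup (logging and package installation are omitted in this sketch).
clear all
set more off

* DATA DESCRIPTION
* Stand-in for loading a course dataset with -use-.
sysuse nlsw88, clear

* Dependent variable
summarize wage, detail

* Independent variables
tabulate union
summarize grade

* Finalizing: count missing values, then subset to complete cases.
misstable summarize wage union grade
keep if !mi(wage, union, grade)

* Normality
sktest wage

* Summary statistics (displayed only; a real do-file would export them).
tabstat wage grade union, statistics(n mean sd min max) columns(statistics)

* ASSOCIATION TESTS
* (1) DV-IV relationships
ttest wage, by(union)
pwcorr wage grade, sig
* (2) IV-IV relationships
ttest grade, by(union)

* REGRESSION MODELS
* (1) Model with DV and main predictor
regress wage grade
* Model with covariates
regress wage grade i.union

* END
```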

week1.do

  • Comments
  • Practice
  • Interface
  • WARM-UP EXERCISE
  • COMMANDS
    • Tip (1): Get to learn some syntax
    • Tip (2): Run all lines in sequential order
    • Tip (3): Keyboard shortcuts for Mac / Win
    • Tip (4): Command navigation
    • Tip (5): Run multiple lines together
  • SETUP
    • (1) Memory
    • (2) Screen breaks
    • (3) Additional commands
    • (4) Working directory
    • (5) Log
  • DATASETS
    • (1) List datasets
    • (2) European Social Survey Round 4, 2008
    • (3) Quality of Government, 2011
    • (4) World Values Survey, 2000
    • (5) General Social Survey, 2010
    • (6) Search across datasets
  • HELP
  • END
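
The five SETUP steps could be sketched as follows. The log file name `example_week1` and the log name `example` are hypothetical, and the installation line is commented out because it needs an internet connection. The memory step is shown only as a comment, since Stata 12 and later manage memory automatically.

```stata
* (1) Memory: only relevant to older versions of Stata (before Stata 12).
* set memory 500m

* (2) Screen breaks: stop Stata from pausing on long output.
set more off

* (3) Additional commands: install a user-written package from SSC.
* ssc install fre, replace

* (4) Working directory: show it (change it with -cd- and a folder path).
pwd

* (5) Log: open a named text log, write to it, then close it.
capture log close example
log using example_week1, name(example) text replace
display "Working directory: `c(pwd)'"
log close example
```

Closing the log with `capture` first avoids an error when the do-file is run twice in the same session.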

week2.do

  • DATA DESCRIPTION
    • Finding variables
    • Subsetting to cross-sectional format
    • Survey weights
  • VARIABLE MANIPULATION
    • Dependent variable: Body Mass Index
    • Labelling a variable
    • Summary statistics
    • Visualization
    • Logical expressions
  • INDEPENDENT VARIABLES
    • Summarizing over categories
    • Visualization over categories
  • FINALIZING A DATASET
    • Patterns of missing values
    • Subsetting
  • END

week3.do

  • DATA DESCRIPTION
    • Dependent variable: Support for sharia law
    • Recoding to dummies
    • Stacked plots with dummies
  • INDEPENDENT VARIABLES
    • IV: Gender
    • IV: Age
    • IV: Education
    • IV: Employment status
    • IV: Household composition
    • IV: City size
  • FINALIZING THE DATASET
  • END

week4.do

  • DATA DESCRIPTION
    • Dependent variable: Body Mass Index
    • Independent variables
  • DISTRIBUTION
    • Standard deviation
    • Outliers
  • NORMALITY
    • Visual assessment
    • Formal assessment
    • Variable transformation
    • Comparison plot
  • SAMPLING ERROR
    • Confidence intervals with means
    • Confidence intervals with proportions
  • END
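
The SAMPLING ERROR section could be approached as in the sketch below, which assumes Stata's bundled `nlsw88` data instead of the NHIS data used in the session. It uses `mean` and `proportion` and leaves out `ci`, whose syntax differs across Stata versions.

```stata
* Sketch only: nlsw88 stands in for the course data.
sysuse nlsw88, clear

* Mean with its 95% confidence interval, overall and by group.
mean wage
mean wage, over(union)

* Proportions with 99% confidence intervals.
proportion union, level(99)
```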

week5.do

  • DATA DESCRIPTION
    • Dependent variable: Body Mass Index
    • Breakdowns
    • Independent variables
    • Subsetting
    • Normality
  • CONFIDENCE INTERVALS
    • IV: Age
    • IV: Gender
    • IV: Race
    • IV: Education
    • IV: Income
    • IV: Health insurance
    • IV: Health affordability
  • EXPORT SUMMARY STATISTICS
  • END

week6.do

Note: the length is correct, but the large header should be replaced by in-code explanations.

  • DATA DESCRIPTION
    • Dependent variable: Justifiability of torture in event of preventing terrorism
    • Subsetting
  • SIGNIFICANCE TESTS
    • IV: Age
    • IV: Gender
    • IV: Income deciles
    • IV: Education
    • IV: Religious faith
    • IV: Political positioning
    • IV: Media exposure
  • END

week7.do

  • DATA DESCRIPTION
    • Finalized sample
  • CORRELATION
    • (1) Fertility rates and schooling years
    • (2) Schooling years and (log) Gross Domestic Product
    • (3) Corruption and human development
    • (4) Female government ministers and corruption
  • SCATTERPLOTS
    • Scatterplot matrixes
    • Scatterplots with marker labels
    • Scatterplots with histograms
    • Scatterplots with smoothed lines
  • END

week8.do

Note: continues week7.do.

  • REGRESSION MODELS
    • (1) Fertility Rates and Schooling Years
    • Plotting regression results
    • Small multiples
    • Fitting a transformed IV
    • (2) Fertility Rates and (Log) Gross Domestic Product
    • Fitting 'lin-log' equations
    • (3) Corruption and Human Development
    • Fitting a quadratic term
    • (4) Fertility and Democracy
    • Fitting a dummy predictor
    • (5) Fertility and Women's Rights
    • Fitting a categorical predictor
  • END
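
The kinds of model terms listed for week8.do can be illustrated with a short sketch. It assumes Stata's bundled `auto` data and not the QOG data used in the session, and the names `mpg_hat` and `log_weight` are made up for the example.

```stata
* Sketch only: auto stands in for the QOG country-level data.
sysuse auto, clear

* Simple linear fit, with fitted values and a plot of the regression line.
regress mpg weight
predict mpg_hat
twoway (scatter mpg weight) (lfit mpg weight)

* 'Lin-log' equation: the predictor enters as a natural logarithm.
generate log_weight = ln(weight)
regress mpg log_weight

* Quadratic term, written with factor-variable notation.
regress mpg c.weight##c.weight

* Dummy predictor (two categories).
regress mpg i.foreign

* Categorical predictor (one coefficient per non-baseline category).
regress mpg i.rep78
```

Writing the quadratic term as `c.weight##c.weight` and not as a hand-made squared variable lets later commands such as `margins` know that both terms come from the same variable.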

week9.do

  • DATA DESCRIPTION
    • Subsetting
    • Export summary statistics
  • ASSOCIATION TESTS
  • REGRESSION MODELS
    • Simple linear regressions
    • Multiple linear regression
    • Standardised ('beta') coefficients
    • Dummies (categorical variables)
  • REGRESSION DIAGNOSTICS
    • (1) Standardized residuals
    • (2) Heteroskedasticity
    • (3) Variance inflation and interaction terms
  • EXPORT MODEL RESULTS
  • END

week10.do

Note: actually Week 11 in the course syllabus and synopsis.

  • DATA DESCRIPTION
    • DV: Allow many/few immigrants of different race/ethnic group from majority
    • IVs: age, gender, country of birth, education, income, left-right scale
    • Export summary statistics
  • ASSOCIATION TESTS
  • REGRESSION MODELS
    • Linear regression
    • Logistic regression
    • Marginal effects
    • Sensitivity analysis
    • Export model results
  • END

week11.do

Note: actually Week 10 in the course syllabus and synopsis.

TODO: fill in.

week12.do

TODO: fill in.