Fragile Families Challenge: Getting Started Workshop

CCPR Seminar Room 4240 Public Affairs Building, Los Angeles, CA, United States

“Fragile Families Challenge: Getting Started Workshop.” Ian Lundberg, Ph.D. Student in Sociology and Social Policy, Princeton University. The Fragile Families Challenge is a scientific mass collaboration that combines predictive modeling, causal inference, and […]

James Robins, Harvard University

Room 33-105 CHS Building 650 Charles E Young Drive South, Los Angeles, CA, United States

The UCLA Departments of Epidemiology, Biostatistics, and Statistics and the Center for Social Statistics present: Causal Methods in Epidemiology: Where has it got us and what can we expect in the […]

Sander Greenland, UCLA Department of Epidemiology

The UCLA Department of Statistics and the Center for Social Statistics present: Statistical Significance and Discussion of the Challenges of Avoiding the Abuse of Statistical Methodology. Sander Greenland will offer […]

Hadley Wickham, RStudio

The UCLA Department of Statistics and the Center for Social Statistics present: Programming data science with R & the tidyverse. Tidy evaluation is a new framework for non-standard evaluation that […]

Rob Warren, University of Minnesota

CCPR Seminar Room 4240 Public Affairs Building, Los Angeles, CA, United States

"When Should Researchers Use Inferential Statistics When Analyzing Data on Full Populations?"

Abstract: Many researchers uncritically use inferential statistical procedures (e.g., hypothesis tests) when analyzing complete population data—a situation in which inference may seem unnecessary. We begin by reviewing and analyzing the most common rationales for employing inferential procedures when analyzing full population data. Two common rationales—having to do with handling missing data and generalizing results to other times and/or places—either lack merit or amount to analyzing sample (not population) data. Whether it is appropriate to use inferential procedures depends on whether researchers are analyzing sample or population data and on whether they seek to make causal or descriptive claims. When doing descriptive research, the distinction between sample and population data is paramount: Inferential statistics should only be used to analyze sample data (to account for sampling variability) and never to analyze population data. When doing causal research, the distinction between sample data and population data is unimportant: Inferential procedures can and should always be used to distinguish (for example) robust associations from those that may have come about by chance alone. Crucially, using inferential procedures to analyze population data to make descriptive claims can lead to incorrect substantive conclusions—especially when population sizes and/or effect sizes are small.
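The descriptive side of this argument can be made concrete with a tiny illustration. The six-school scenario and all numbers below are invented for the example; the point is only that, with complete population data, a descriptive difference is known exactly, and a "non-significant" test statistic does not undermine it.

```python
import statistics

# Hypothetical complete population: mean test scores for ALL six schools in a
# small district, split by program status. Nothing here is sampled.
program = [71.0, 74.0, 78.0]
no_program = [70.0, 72.0, 75.0]

# Descriptive claim: program schools score higher. With full population data,
# this difference is an exact fact, not an estimate.
diff = statistics.mean(program) - statistics.mean(no_program)
print(f"exact population difference: {diff:.2f}")  # 2.00

# Treating the same numbers as if they were samples yields a small,
# "non-significant" t statistic -- which says nothing against the exact
# descriptive fact above.
se = (statistics.variance(program) / 3 + statistics.variance(no_program) / 3) ** 0.5
t = diff / se
print(f"t statistic if misread as sample data: {t:.2f}")
```

With a population this small, the t statistic is well under conventional significance cutoffs even though the 2-point gap is known with certainty, which is exactly the kind of incorrect substantive conclusion the abstract warns about.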

Yu Xie, Princeton

CCPR Seminar Room 4240 Public Affairs Building, Los Angeles, CA, United States

"Heterogeneous Causal Effects: A Propensity Score Approach "

Abstract: Heterogeneity is ubiquitous in social science. Individuals differ not only in background characteristics, but also in how they respond to a particular treatment. In this presentation, Yu Xie argues that a useful approach to studying heterogeneous causal effects is through the use of the propensity score. He demonstrates the use of the propensity score approach in three scenarios: when ignorability is true, when treatment is randomly assigned, and when ignorability is not true but there are valid instrumental variables.
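A toy numerical sketch of the stratification idea, using invented data and covering only the simplest of the three scenarios (ignorability holds): estimate the propensity score within levels of a covariate, then compare treatment effects across propensity strata.

```python
from collections import defaultdict

# Toy data: (x, d, y) = covariate level, treatment indicator, outcome.
units = [
    (0, 0, 10.0), (0, 0, 11.0), (0, 1, 13.0), (0, 1, 12.0),
    (1, 0, 20.0), (1, 1, 26.0), (1, 1, 25.0), (1, 1, 27.0),
]

by_x = defaultdict(list)
for x, d, y in units:
    by_x[x].append((d, y))

results = {}
for x, rows in sorted(by_x.items()):
    treated = [y for d, y in rows if d == 1]
    control = [y for d, y in rows if d == 0]
    pscore = len(treated) / len(rows)  # estimated P(D=1 | X=x)
    effect = sum(treated) / len(treated) - sum(control) / len(control)
    results[x] = (pscore, effect)
    print(f"stratum x={x}: propensity={pscore:.2f}, effect={effect:+.2f}")
```

Here the effect rises with the propensity score (+2 in the low-propensity stratum, +6 in the high), the kind of systematic heterogeneity this approach is designed to surface.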

Jake Bowers, University of Illinois at Urbana-Champaign

Franz Hall 2258A

"Rules of Engagement in Evidence-Informed Policy: Practices and Norms of Statistical Science in Government"

Abstract: Collaboration between statistical scientists (data scientists, behavioral and social scientists, statisticians) and policy makers promises to improve government and the lives of the public. And the data and design challenges arising from governments offer academics new chances to improve our understanding of both extant methods and behavioral and social science theory. However, the practices that ensure the integrity of statistical work in the academy — such as transparent sharing of data and code — do not translate neatly or directly into work with governmental data and for policy ends. This paper proposes a set of practices and norms that academics and practitioners can agree on before launching a partnership so that science can advance and the public can be protected while policy can be improved. This work is at an early stage. The aim is a checklist or statement of principles or memo of understanding that can be a template for the wide variety of ways that statistical scientists collaborate with governmental actors.

Erin Hartman, University of California Los Angeles

CCPR Seminar Room 4240 Public Affairs Building, Los Angeles, CA, United States

Covariate Selection for Generalizing Experimental Results

Researchers are often interested in generalizing the average treatment effect (ATE) estimated in a randomized experiment to non-experimental target populations. Researchers can estimate the population ATE without bias if they adjust for a set of variables affecting both selection into the experiment and treatment heterogeneity. Although this separating set has a simple mathematical representation, it is often unclear how to select this set in applied contexts. In this paper, we propose a data-driven method to estimate a separating set. Our approach has two advantages. First, our algorithm relies only on the experimental data. As long as researchers can collect a rich set of covariates on experimental samples, the proposed method can inform which variables they should adjust for. Second, we can incorporate researcher-specific data constraints. When researchers know certain variables are unmeasurable in the target population, our method can select a separating set subject to such constraints, if one is feasible. We validate our proposed method using simulations, including naturalistic simulations based on real-world data.
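A stylized arithmetic example (all numbers invented, not from the paper) of why adjusting for a variable that drives both selection and effect heterogeneity matters when generalizing:

```python
# The covariate x drives both selection into the experiment and effect
# heterogeneity. Reweighting the stratum-specific effects to the TARGET
# population's x-distribution recovers the population ATE; the unweighted
# in-sample ATE does not.

# Stratum-specific average effects estimated in the experiment.
effect_by_x = {"low": 1.0, "high": 3.0}

# Share of each stratum among experimental units vs. in the target population.
experiment_share = {"low": 0.8, "high": 0.2}   # selection favors "low"
population_share = {"low": 0.4, "high": 0.6}

sample_ate = sum(effect_by_x[x] * experiment_share[x] for x in effect_by_x)
population_ate = sum(effect_by_x[x] * population_share[x] for x in effect_by_x)

print(f"in-sample ATE:  {sample_ate:.2f}")
print(f"reweighted ATE: {population_ate:.2f}")
```

The two estimates diverge (1.4 vs. 2.2) precisely because x belongs to the separating set; omitting it from the adjustment leaves the in-sample figure biased for the population.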

Co-sponsored with the Center for Social Statistics

Adrian Raftery, University of Washington

CCPR Seminar Room 4240 Public Affairs Building, Los Angeles, CA, United States

Bayesian Population Projections with Migration Uncertainty

The United Nations recently issued official probabilistic population projections for all countries for the first time, using a Bayesian hierarchical modeling framework developed by our group at the University of Washington. These take account of uncertainty about future fertility and mortality, but not international migration. We propose a Bayesian hierarchical autoregressive model for obtaining joint probabilistic projections of migration rates for all countries, broken down by age and sex. Joint trajectories for all countries are constrained to satisfy the requirement of zero global net migration. We evaluate our model using out-of-sample validation and compare point projections to the projected migration rates from a persistence model similar to the UN's current method for projecting migration, and also to a state-of-the-art gravity model. We also resolve an apparently paradoxical discrepancy between growth trends in the proportion of the world population migrating and the average absolute migration rate across countries. This is joint work with Jonathan Azose and Hana Ševčíková.
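As a rough illustration of the zero-global-net-migration constraint mentioned above (a made-up four-country toy, not the authors' model), one can draw unconstrained projections and then redistribute the global imbalance so the draws sum to zero:

```python
import random

random.seed(1)

# Toy sketch: draw one projected net-migration count per country from an
# AR(1)-style perturbation around its last observed value, then enforce the
# accounting identity that global net migration sums to zero.
last_net = {"A": 120.0, "B": -40.0, "C": -60.0, "D": 10.0}  # thousands

raw = {c: 0.8 * v + random.gauss(0, 15) for c, v in last_net.items()}

# Subtract the average imbalance from every country (equal populations are
# assumed here for simplicity) so the constrained draws sum to zero.
imbalance = sum(raw.values()) / len(raw)
constrained = {c: v - imbalance for c, v in raw.items()}

print({c: round(v, 1) for c, v in constrained.items()})
print("global net migration:", round(sum(constrained.values()), 6))
```

In the actual model the redistribution must respect country population sizes and age-sex structure; the point here is only the accounting constraint itself.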

Co-sponsored with the Center for Social Statistics 

Rocio Titiunik, University of Michigan

CCPR Seminar Room 4240 Public Affairs Building, Los Angeles, CA, United States

Internal vs. external validity in studies with incomplete populations

Researchers working with administrative data rarely have access to the entire universe of units they need to estimate effects and make statistical inferences. Examples are varied and come from different disciplines. In social program evaluation, it is common to have data on all households who received the program, but only partial information on the universe of households who applied or could have applied for the program. In studies of voter turnout, information on the total number of citizens who voted is usually complete, but data on the total number of voting-eligible citizens is unavailable at low levels of aggregation. In criminology, information on arrests by race is available, but the overall population that could have potentially been arrested is typically unavailable. And in studies of drug overdose deaths, we lack complete information about the full population of drug users.

In all these cases, a reasonable strategy is to study treatment effects and descriptive statistics using the information that is available. This strategy may lack the generality of a full-population study, but may nonetheless yield valuable information for the included units if it has sufficient internal validity. However, the distinction between internal and external validity is complex when the subpopulation of units for which information is available is not defined according to a reproducible criterion and/or when this subpopulation itself is defined by the treatment of interest. When this happens, a useful approach is to consider the full range of conclusions that would be obtained under different possible scenarios regarding the missing information. I discuss a general strategy based on partial identification ideas that may be helpful to assess sensitivity of the partial-population study under weak (non-parametric) assumptions, when information about the outcome variable is known with certainty for a subset of the units. I discuss extensions such as the inclusion of covariates in the estimation model and different strategies for statistical inference.
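A minimal sketch of the worst-case (Manski-style) bounding logic that underlies this kind of sensitivity analysis, with invented numbers: when a binary outcome is observed for only part of the population, filling in all zeros versus all ones for the missing units gives the full range of conclusions consistent with the data.

```python
# Binary outcome observed for only 8 of 12 units in the population.
observed = [1, 0, 1, 1, 0, 1, 1, 0]   # outcomes for the units we see
n_missing = 4                          # units with no outcome information
n_total = len(observed) + n_missing

# Under NO assumptions about the missing units, the population mean lies
# between the "all missing are 0" and "all missing are 1" scenarios.
lower = sum(observed) / n_total                 # missing all zeros
upper = (sum(observed) + n_missing) / n_total   # missing all ones
print(f"population mean is in [{lower:.3f}, {upper:.3f}]")
```

The width of the interval (here 4/12) is exactly the missing share, which is why the approach is most informative when the incompletely observed subpopulation is small or when weak auxiliary assumptions can narrow the scenarios.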

Co-sponsored with the Political Science Department, Statistics Department and the Center for Social Statistics 

Kosuke Imai, Harvard University

CCPR Seminar Room 4240 Public Affairs Building, Los Angeles, CA, United States

Matching Methods for Causal Inference with Time-Series Cross-Section Data

Matching methods aim to improve the validity of causal inference in observational studies by reducing model dependence and offering intuitive diagnostics. While they have become part of the standard toolkit for empirical researchers across disciplines, matching methods are rarely used when analyzing time-series cross-section (TSCS) data, which consist of a relatively large number of repeated measurements on the same units.

We develop a methodological framework that enables the application of matching methods to TSCS data. In the proposed approach, we first match each treated observation with control observations from other units in the same time period that have an identical treatment history up to a pre-specified number of lags. We use standard matching and weighting methods to further refine this matched set so that the treated observation has outcome and covariate histories similar to those of its matched control observations. Assessing the quality of matches is done by examining covariate balance. After the refinement, we estimate both short-term and long-term average treatment effects using the difference-in-differences estimator, accounting for a time trend. We also show that the proposed matching estimator can be written as a weighted linear regression estimator with unit and time fixed effects, providing model-based standard errors. We illustrate the proposed methodology by estimating the causal effects of democracy on economic growth, as well as the impact of inter-state war on inheritance tax. Open-source software implementing the proposed matching methods is available.
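The history-matching step can be sketched as follows (a toy three-unit example with invented data, not the authors' released software):

```python
# For a treated observation (unit i, time t), find other units whose
# treatment history over the previous `lags` periods is identical and who
# are untreated at time t, then form a difference-in-differences estimate.

# treatment[unit][t] in {0, 1}; outcome[unit][t] is the outcome.
treatment = {
    "i": [0, 0, 1],   # treated at t=2
    "j": [0, 0, 0],   # same history, untreated at t=2 -> valid control
    "k": [1, 0, 0],   # different history -> excluded
}
outcome = {
    "i": [1.0, 1.2, 2.5],
    "j": [0.9, 1.1, 1.3],
    "k": [2.0, 2.1, 2.2],
}

def matched_controls(treated_unit, t, lags):
    history = treatment[treated_unit][t - lags:t]
    return [
        u for u in treatment
        if u != treated_unit
        and treatment[u][t] == 0
        and treatment[u][t - lags:t] == history
    ]

t, lags = 2, 2
controls = matched_controls("i", t, lags)
print("matched controls:", controls)

# Difference-in-differences against the matched set's average change.
did = (outcome["i"][t] - outcome["i"][t - 1]) - sum(
    outcome[u][t] - outcome[u][t - 1] for u in controls
) / len(controls)
print(f"DiD estimate: {did:.2f}")
```

In the full framework the matched set would be further refined by covariate balance before the DiD step; this sketch stops at the exact-history match that defines who is eligible to serve as a control.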

Co-sponsored with the Political Science Department, Statistics Department and the Center for Social Statistics

Workshop: Merging Entities – Deterministic, Approximate, & Probabilistic

4240 Public Affairs Bldg

Instructor: Michael Tzen. Title: Merging Entities: Deterministic, Approximate, & Probabilistic. Date: January 31, 2019, 2:00-3:00 PM. Location: 4240 Public Affairs Building, CCPR Seminar Room. Content: Combining information from different groups is […]
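To make the deterministic/approximate distinction concrete (the records and the 0.6 similarity cutoff below are invented for illustration, not workshop materials):

```python
import difflib

# Two hypothetical tables keyed by institution name.
left = {"UCLA": 44000, "USC": 47000, "Cal State LA": 26000}
right = {"UCLA": "Westwood", "U.S.C.": "Exposition Park", "CalState LA": "East LA"}

# Deterministic merge: join only on exactly equal keys.
exact = {k: (left[k], right[k]) for k in left if k in right}
print("deterministic:", exact)

# Approximate merge: for keys with no exact partner, accept the most similar
# right-hand key whose similarity ratio clears a tunable cutoff.
def similarity(a, b):
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

approx = {}
for k in left:
    if k in exact:
        continue
    best = max(right, key=lambda r: similarity(k, r))
    score = similarity(k, best)
    if score > 0.6:
        approx[k] = (best, round(score, 2))
print("approximate:", approx)
```

Probabilistic linkage, the third flavor in the title, goes further by modeling match probabilities from agreement patterns across several fields rather than a single string score.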

Adeline Lo, Princeton University

1434A Physics and Astronomy Building

Covariate screening in high dimensional data: applications to forecasting and text data

High-dimensional (HD) data, where the number of covariates and/or meaningful covariate interactions might exceed the number of observations, are increasingly used for prediction in the social sciences. An important question for the researcher is how to select the most predictive covariates among all the available covariates. Common covariate selection approaches use ad hoc rules to remove noise covariates, select covariates through the criterion of statistical significance, or use machine learning techniques. These can suffer from lack of objectivity, choosing some but not all predictive covariates, and failing reasonable standards of consistency that are expected to hold in most high-dimensional social science data. The literature offers few statistics that can be used to directly evaluate covariate predictivity. We address these issues by proposing a variable screening step prior to traditional statistical modeling, in which we screen covariates for their predictivity. We propose the influence (I) statistic to evaluate covariates in the screening stage, showing that the statistic is directly related to predictivity and can help screen out noisy covariates and discover meaningful covariate interactions. We illustrate how our screening approach can remove noisy phrases from U.S. Congressional speeches and rank important ones to measure partisanship. We also show improvements to out-of-sample forecasting in a state-failure application. Our approach is implemented in an open-source software package.
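The general shape of a screening step can be illustrated with a deliberately simple stand-in score (this is NOT the influence statistic from the talk; squared correlation and the 0.5 cutoff are placeholders chosen for the toy data):

```python
# Score each covariate by its squared correlation with the outcome and keep
# those above a cutoff, discarding pure-noise columns before modeling.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
covariates = {
    "x_signal": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    "x_noise":  [4.0, 1.0, 5.0, 2.0, 6.0, 3.0],
}

def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov * cov / (vx * vy)

scores = {name: r_squared(x, y) for name, x in covariates.items()}
kept = [name for name, s in sorted(scores.items(), key=lambda kv: -kv[1]) if s > 0.5]
print({k: round(v, 3) for k, v in scores.items()})
print("screened-in covariates:", kept)
```

A marginal r² misses interactions entirely; detecting predictive covariate interactions while remaining consistent in high dimensions is precisely what distinguishes the proposed I statistic from a naive score like this one.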

Workshop: Grad Student Panel Discussing the Causal Toolkit

4240 Public Affairs Bldg

Title: Grad Student Panel Discussing the Causal Toolkit. Date: February 27, 2019, 2:00-3:30 PM. Location: 4240 Public Affairs Building, CCPR Seminar Room. Content: Focusing on the uses of the causal toolkit, […]

Lan Liu, University of Minnesota at Twin Cities

"Parsimonious Regressions for Repeated Measure Analysis" Abstract: Longitudinal data with repeated measures frequently arise in various disciplines. The standard methods typically impose […]

Eloise Kaizar, Ohio State University

Randomized controlled trials are often thought to provide definitive evidence on the magnitude of treatment effects. But because treatment modifiers may have a different distribution […]