• Fragile Families Challenge: Getting Started Workshop

    CCPR Seminar Room, 4240 Public Affairs Building, Los Angeles, CA, United States

    Ian Lundberg, Ph.D. Student, Sociology and Social Policy, Princeton University

    The Fragile Families Challenge is a scientific mass collaboration that combines predictive modeling, causal inference, and in-depth interviews in order to learn more about the lives of disadvantaged children. The Fragile Families Challenge builds on the Fragile Families and Child Wellbeing Study […]

  • Jake Bowers, University of Illinois at Urbana-Champaign

    Franz Hall 2258A

    "Rules of Engagement in Evidence-Informed Policy: Practices and Norms of Statistical Science in Government"

    Abstract: Collaboration between statistical scientists (data scientists, behavioral and social scientists, statisticians) and policy makers promises to improve government and the lives of the public. The data and design challenges arising from government settings also offer academics new opportunities to improve our understanding of both extant methods and behavioral and social science theory. However, the practices that ensure the integrity of statistical work in the academy, such as transparent sharing of data and code, do not translate neatly or directly into work with governmental data and for policy ends. This paper proposes a set of practices and norms that academics and practitioners can agree on before launching a partnership, so that science can advance, the public can be protected, and policy can be improved. This work is at an early stage. The aim is a checklist, statement of principles, or memorandum of understanding that can serve as a template for the wide variety of ways in which statistical scientists collaborate with governmental actors.

  • Adeline Lo, Princeton University

    1434A Physics and Astronomy Building

    "Covariate screening in high dimensional data: applications to forecasting and text data"

    High dimensional (HD) data, where the number of covariates and/or meaningful covariate interactions might exceed the number of observations, are increasingly used for prediction in the social sciences. An important question for the researcher is how to select the most predictive covariates among all those available. Common covariate selection approaches use ad hoc rules to remove noise covariates, select covariates by the criterion of statistical significance, or rely on machine learning techniques. These approaches can suffer from a lack of objectivity, can choose some but not all predictive covariates, and can fail reasonable standards of consistency that are expected to hold in most high-dimensional social science data. The literature offers few statistics that can be used to evaluate covariate predictivity directly. We address these issues by proposing a variable screening step prior to traditional statistical modeling, in which we screen covariates for their predictivity. We propose the influence (I) statistic to evaluate covariates in the screening stage, showing that the statistic is directly related to predictivity and can help screen out noisy covariates and discover meaningful covariate interactions. We illustrate how our screening approach can remove noisy phrases from U.S. Congressional speeches and rank important ones to measure partisanship. We also show improvements to out-of-sample forecasting in a state failure application. Our approach is available via an open-source software package.

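    As a rough illustration of the screening idea in the abstract above, the sketch below scores small covariate subsets with a simplified, influence-style predictivity measure (squared deviations of within-cell outcome means from the overall mean, weighted by squared cell sizes, over the partition induced by the discrete covariates in the subset) and keeps covariates from the top-scoring subsets. The exact definition and normalization of the I statistic in the paper and its accompanying software may differ; all names and parameters here (influence_score, screen_covariates, subset_size, keep_top) are hypothetical and for illustration only.

    import numpy as np
    from itertools import combinations

    def influence_score(X_subset, y):
        # Simplified, influence-style predictivity score (illustrative only).
        # Observations are partitioned into cells by the joint values of the
        # discrete covariates in X_subset; squared deviations of cell means of y
        # from the overall mean are aggregated, weighted by squared cell sizes.
        y = np.asarray(y, dtype=float)
        n, y_bar = len(y), y.mean()
        _, cell_ids = np.unique(X_subset, axis=0, return_inverse=True)
        score = 0.0
        for cell in np.unique(cell_ids):
            y_cell = y[cell_ids == cell]
            score += len(y_cell) ** 2 * (y_cell.mean() - y_bar) ** 2
        return score / (n * y.var() + 1e-12)  # normalize; guard against zero variance

    def screen_covariates(X, y, subset_size=2, keep_top=20):
        # Screening step: score every covariate subset of the given size and
        # keep covariates appearing in the highest-scoring subsets.
        scored = sorted(
            ((influence_score(X[:, list(s)], y), s)
             for s in combinations(range(X.shape[1]), subset_size)),
            reverse=True,
        )
        kept = []
        for _, subset in scored:
            for j in subset:
                if j not in kept:
                    kept.append(j)
            if len(kept) >= keep_top:
                break
        return kept[:keep_top]

    # Toy example: 30 binary covariates; the outcome depends on an interaction
    # of covariates 0 and 1, which the screen should surface first.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(500, 30))
    y = X[:, 0] * X[:, 1] + rng.normal(scale=0.5, size=500)
    print(screen_covariates(X, y, keep_top=10))

    In a real analysis, the covariate indices returned by such a screening step would then feed a downstream model, for example an out-of-sample forecasting model or a partisanship measure built from the retained phrases, as in the applications mentioned in the abstract.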