Instructor:
Michael Tzen
Title:
Merging Entities: Deterministic, Approximate, & Probabilistic
Date and Location:
January 31, 2019, 2:00-3:00 PM
4240 Public Affairs Building
CCPR Seminar Room
Content:
Combining information from different sources is a fundamental step in the data analysis pipeline. Using NBA and NCAA data, we will walk through deterministic, approximate, and probabilistic methods for merging entities across the two data sources. Is the Luc Richard Mbah a Moute playing in the NBA the same person as the Luc Mbah a Moute who played for the University of California, Los Angeles? We’ll also discuss how the probabilistic methods loosely relate to matching in causal analysis. After this workshop, participants should be able to merge data sets in three different ways and reason about how merge quality may affect downstream analysis.
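As a rough illustration of the three approaches (not the workshop's own materials, whose tooling is not stated here), the sketch below compares the two spellings of the player's name using only the Python standard library. The record fields, the 0.75 similarity threshold, and the m/u agreement probabilities are all assumptions chosen for illustration; the probabilistic score follows the general Fellegi-Sunter idea of summing log-likelihood-ratio weights per field.

```python
from difflib import SequenceMatcher
from math import log2

# Hypothetical records; the field names are assumptions for illustration.
nba = {"name": "Luc Richard Mbah a Moute", "college": "UCLA"}
ncaa = {"name": "Luc Mbah a Moute", "college": "UCLA"}


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def deterministic_match(a, b):
    """Deterministic merge key: names must be identical after normalization."""
    return normalize(a["name"]) == normalize(b["name"])


def name_similarity(a, b):
    """Approximate comparison: string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()


def approximate_match(a, b, threshold=0.75):
    """Approximate merge: accept pairs whose similarity clears an assumed threshold."""
    return name_similarity(a, b) >= threshold


# Assumed (made-up) probabilities per field:
# m = P(field agrees | same person), u = P(field agrees | different people).
M_U = {"name": (0.95, 0.01), "college": (0.90, 0.05)}


def match_weight(a, b, name_threshold=0.75):
    """Probabilistic merge: sum log2 likelihood-ratio weights over the fields."""
    agrees = {
        "name": name_similarity(a, b) >= name_threshold,
        "college": normalize(a["college"]) == normalize(b["college"]),
    }
    total = 0.0
    for field, (m, u) in M_U.items():
        if agrees[field]:
            total += log2(m / u)  # agreement is evidence for a match
        else:
            total += log2((1 - m) / (1 - u))  # disagreement is evidence against
    return total


if __name__ == "__main__":
    print("deterministic:", deterministic_match(nba, ncaa))
    print("approximate:  ", approximate_match(nba, ncaa))
    print("match weight: ", match_weight(nba, ncaa))
```

With these inputs the exact comparison rejects the pair because of the middle name, while the approximate and probabilistic comparisons can still link the two records; how strict to make the threshold is the kind of merge-quality decision that carries into downstream analysis.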
Please RSVP below