MPC Job Opening: Data Workflow Developer

Requisition 195847
The Minnesota Population Center (MPC, www.pop.umn.edu) is a leader in the field of quantitative social
science research and the largest disseminator of census and demographic data to the world’s academic
research community. Or, to put it another way – we’re on a mission to gather, process, link and publish
billions of records spanning hundreds of years and more than 100 countries for demographers, historians,
economists, environmental scientists, journalists, policymakers, and others around the globe, who then use
the data to do amazing research and make the world a better place.
The staff of the MPC IT Core support this mission by using leading open source tools to solve complex
data and computation challenges and build reliable, scalable web-based data dissemination systems. The
MPC IT Core is a small group of talented and dedicated individuals. Your work will be highly visible and
will contribute directly to the overall success of our organization. Read more about MPC IT at
http://tech.popdata.org/.

Responsibilities
We are currently seeking an experienced developer to build tools and frameworks to support our mission
to integrate census and demographic data through time and across geography. As our data workflow
developer, you will apply scalable, high-performance computing approaches to enable and automate the
processing of massive, complex datasets and allow our researchers to quickly test new insights and
algorithms in our workflows. You will report to a lead software developer.
You will be responsible for 1) working directly with researchers to understand data transformations (e.g.
recoding, data correction, new variable construction, harmonization, and other logical edits) that need to
be performed; 2) developing domain expertise with our large-scale demographic datasets; 3) bringing
appropriate technologies to bear to execute these transformations to our datasets with aggressive
performance requirements; 4) creating modular frameworks which allow researchers to express and apply
edits to the data workflows; 5) creating and maintaining user interfaces to allow researchers to efficiently
manage processing tasks and make the underlying complexity as transparent as possible; 6) transitioning
existing processes to leverage these new technologies and approaches; and 7) providing end-user support
and developer mentorship for these systems.
In this role, you will make a significant contribution to the MPC’s ongoing transition to next-generation
storage and computation technologies, such as Hadoop-based platforms and column-store databases, with
an opportunity to work closely with peers from other product teams at the Center. You will work
throughout the project life cycle, from architecture and design through implementation to deployment and
support. You must be able to work largely independently on complex projects and take complete technical
ownership of the applications that are developed.

Work Breakdown

  • 30% Software Architecture and Design. Analyzing business needs, architecting designs, and
    implementing solutions, including interaction with researchers and end-users.
  • 50% Software Implementation. Coding, refactoring, and testing in a team environment.
  • 10% Deployment and Support. Working with operations staff to build out infrastructure to support
    new systems. Developing deployment processes to production environments. Providing system
    support to the team and our researchers.
  • 10% Other duties as assigned. Mentoring of junior developers, professional development activities,
    participation in IT working groups, and other tasks as assigned.

Qualifications
The minimum requirements for this position are four years of professional software development
experience with a related bachelor’s degree or six years of professional software development experience
with a non-related bachelor’s degree.

Experience must include:

  • Proficiency with C/C++
  • Proficiency with at least one scripting language, such as Python, Perl or Ruby
  • Proven track record of effectively planning and managing software development projects
  • Experience with a database management system (RDBMS or otherwise)
  • Excellent oral and written communication skills with technical and non-technical audiences

Additional selection criteria include experience with the following: IPython; automating data processing
workflows; creating or working with domain-specific languages (DSLs); machine learning and data
mining approaches; distributed PostgreSQL (e.g. plproxy, Postgres-XL, sharding); RDBMS-alternative
technologies such as Hadoop, HDFS or Spark; distributed computing in a high-performance
computing environment; and working in an academic research environment.

Application Procedures
Please apply using the University of Minnesota’s online employment system (http://z.umn.edu/roy). Attach a
cover letter, resume, and contact information for three professional references to your online application. Your
cover letter is a great opportunity for you to explain your interest in our position opening and to highlight your
relevant skills and abilities. The search committee will begin its review of applications immediately upon
receipt; the position will remain open until filled.
Any offer of employment is contingent upon the successful completion of a background check. Our
presumption is that prospective employees are eligible to work here. Criminal convictions do not automatically
disqualify finalists from employment.