Tools / Applied

Time: 10.45am – 12.15pm
Location: Oceania room
Chair: Andrew Sporle
This session will have talks from the following speakers:
  1. Will Scarrold: Calibrating Economic Agent-Based Models with Microdata
  2. Alistair Ramsden: Improved synthetic data method for Stats NZ datalab microdata output
  3. Nick Snellgrove and Petra Muellner: AIS Explorer --- Integrating technology to prevent aquatic invasive species in Minnesota
  4. Petra Muellner: Epidemix- visualising analytical complexity to improve decision-making for disease control
  5. Shrividya Ravi: RAPping in the public sector: binding your legacy code into a pipeline with Python
  6. Simon Anastasiadis: Accelerating Dataset Assembly
  7. Simon Anastasiadis: Representative timeline modelling
  8. Will Haringa: Data Wrangling At Warp Speed with Segna
  9. Tom Elliott: Analysing Surveys with iNZight
  10. Adrian Ortiz-Cervantes: Modelling the transfer of tacit knowledge on temporal bipartite networks

Will Scarrold

Te Pūnaha Matatini (University of Auckland) and the Institute for New Economic Thinking (University of Oxford)

Calibrating Economic Agent-Based Models with Microdata

Agent-Based Modelling, Integrated Data Infrastructure, Economics, Information Geometry

Despite the complexity of modern socio-economic systems, current benchmark models assume the economy is much simpler than it really is: households are often assumed to be identical, and firms are assumed to use the same equipment to produce the same “representative” product.

Recent developments in Agent-Based Modelling have provided an alternative to conventional models. Such models show comparable forecasting performance despite being at a relatively early stage of development.

This presentation will discuss agent-based modelling and the use of microdata from the New Zealand Integrated Data Infrastructure (IDI) to calibrate these models.

More information on recent macroeconomic agent-based models can be found at

Alistair Ramsden

Statistical Methods Census Methodology Team, Stats NZ

Improved synthetic data method for Stats NZ datalab microdata output

confidentiality, methodology, datalab, synthetic data

Stats NZ has approved a variation to Microdata Output Guide confidentiality rules, for the ‘He Ara Poutama mō te reo Māori SURF’ datalab project.

The synthetic data method is Classification And Regression Tree (CART) modelling, implemented using the synthpop R package. The difference from the existing rules is that synthetic count tables are generated, tested, and released with noised count values {0,3,6,9,12,…}. Previously such values were {‘Suppressed’,6,9,12,…}.
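
To make the difference between the two output rules concrete, here is a minimal Python sketch. It assumes the noised counts come from unbiased random rounding to base 3, which the abstract does not state. The function names and the suppression threshold of 6 (read off the value sets above) are illustrative only, and the CART synthesis step itself is not shown.

```python
import random


def random_round_base3(count, rng):
    """Unbiased random rounding of a non-negative count to a multiple of 3."""
    remainder = count % 3
    if remainder == 0:
        return count
    lower = count - remainder
    # Round up with probability remainder/3 so the expected value equals count.
    return lower + 3 if rng.random() < remainder / 3 else lower


def previous_rule(noised):
    """Earlier rule (as read from the abstract): values below 6 are suppressed."""
    return noised if noised >= 6 else "Suppressed"


def new_rule(noised):
    """Varied rule: every noised value, including 0 and 3, is released."""
    return noised


rng = random.Random(2021)          # fixed seed so the sketch is repeatable
raw_counts = [0, 1, 4, 7, 12]      # made-up cell counts, not real data
noised = [random_round_base3(c, rng) for c in raw_counts]

# One row per cell: (raw count, output under the old rule, output under the new rule)
table = [(c, previous_rule(n), new_rule(n)) for c, n in zip(raw_counts, noised)]
```

Under the earlier rule, small cells are lost entirely. Under the varied rule they survive as 0 or 3, which is where the utility gain described in the abstract would come from.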

The new method releases data with better utility (inferential validity) while retaining adequate safety (disclosure control).

Next steps are to calculate Differential Privacy (DP) parameters {epsilon, delta} for this data and method.

Nick Snellgrove and Petra Muellner


AIS Explorer — Integrating technology to prevent aquatic invasive species in Minnesota

R Shiny, Python, cloud computing, analytics, dashboard

Minnesota in the United States is well known for its beautiful “10,000 lakes”. However, this complex system of waterbodies is threatened by the spread of aquatic invasive species. We developed the AIS Explorer in collaboration with the Aquatic Invasive Species Research Center of the University of Minnesota to bridge research and decision making, for example to optimise watercraft inspections.

AIS Explorer:

In this talk we will showcase how a diverse set of technologies, such as R, R Shiny, Python and cloud computing, were integrated to create a tool that is easy to use and matches stakeholder needs.

Petra Muellner

Epi-interactive Ltd. & Massey University

Epidemix- visualising analytical complexity to improve decision-making for disease control

SIR models, risk management, R, R Shiny, infectious disease

Epidemix^[Muellner U, Fournie G, Muellner P, Ahlstrom C, Pfeiffer D. epidemix - an Interactive Multi-Model Application for Teaching and Visualizing Infectious Disease Transmission. Epidemics, doi: 10.1016/j.epidem.2017.12.003, 2017.] allows users to develop an understanding of how disease modelling assumptions affect the trajectory of an epidemic and the impact of control interventions, without having to deal directly with the complexity of equations and programming languages. The app provides a visual interface for nine generic models, plus two disease-specific models of international relevance (COVID-19 and ASF). Epidemix supports the teaching of mathematical modelling to non-specialists, including policy makers, by demonstrating key concepts of disease dynamics and control in a hands-on way. Funded by the Royal Veterinary College and the City University of Hong Kong (
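
For readers who have not met the kind of model that sits behind such an interface, the following is a minimal sketch of a basic SIR model in Python. It is not taken from Epidemix, which is built with R and R Shiny; the function names, parameter values and the "halved transmission" intervention are all hypothetical and chosen only to show how one assumption changes the epidemic curve.

```python
def simulate_sir(beta, gamma, s0=0.99, i0=0.01, days=160, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    beta is the transmission rate and gamma the recovery rate (per day).
    Returns one (S, I, R) tuple of population fractions per day.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


def peak_infected(trajectory):
    """Largest infected fraction reached during the simulation."""
    return max(i for _, i, _ in trajectory)


# Hypothetical scenarios: same recovery rate, transmission halved by a control measure.
baseline = simulate_sir(beta=0.4, gamma=0.1)
with_control = simulate_sir(beta=0.2, gamma=0.1)
```

Comparing `peak_infected(baseline)` with `peak_infected(with_control)` shows the lower peak under the control scenario. This is the sort of comparison a visual tool lets users explore without writing any of the code above.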

Shrividya Ravi

Ministry of Transport

RAPping in the public sector: binding your legacy code into a pipeline with Python

Python, data engineering, legacy code, analytics, Reproducible analytical pipelines (RAP)

Straggling legacy SAS scripts requiring manual steps are a common “feature” of data processing tasks in the public sector. However, refactoring such scripts into modern, reproducible analytical pipelines (RAP) can be challenging due to a lack of IT infrastructure or high complexity. In such situations, interim solutions can at least reduce manual effort and mental overhead. One such solution is to use Python as glue to create one-click execution pipelines. Manual tasks such as downloading data from email, updating data file names in scripts, and running scripts in sequence can be managed with Python and its rich ecosystem of packages. In this talk, I will showcase how three Python packages, exchangelib, jupyter and saspy, can create quick and easy automated versions of legacy SAS scripts that contain many types of manual steps.
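
The glue pattern described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the speaker's pipeline: the `{{input_file}}` placeholder, the file naming scheme and all function names are hypothetical. The `submit` callable stands in for whatever runs the SAS code. In a real setup it might wrap the `submit` method of a saspy `SASsession`, with the email download handled by exchangelib, but neither package is imported here.

```python
from pathlib import Path
import tempfile


def find_latest_file(folder, pattern):
    """Pick the newest extract, assuming file names carry sortable dates."""
    candidates = sorted(folder.glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"no files matching {pattern} in {folder}")
    return candidates[-1]


def render_script(template_text, input_path):
    """Fill the data file name into a script instead of editing it by hand."""
    return template_text.replace("{{input_file}}", str(input_path))


def run_pipeline(folder, script_templates, submit):
    """Run every script in order against the latest file; return each result."""
    latest = find_latest_file(folder, "*.csv")
    return [submit(render_script(text, latest)) for text in script_templates]


# Demonstration with a stand-in submit function that only records the code.
submitted = []


def fake_submit(code):
    submitted.append(code)
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("extract_2021-04.csv", "extract_2021-05.csv"):
        (folder / name).write_text("id,value\n1,10\n")
    scripts = [
        '%let infile = "{{input_file}}";',
        "proc means data=work.extract; run;",
    ]
    results = run_pipeline(folder, scripts, fake_submit)
```

Passing the runner in as a function keeps the sequencing logic testable on a machine with no SAS installation, which matters when the infrastructure is the constraint.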

Simon Anastasiadis

Social Wellbeing Agency

Accelerating Dataset Assembly

data preparation, data wrangling, tools, process

Data preparation is a key stage of analytic and research projects. However, as the number of data sources increases, so does the complexity of preparation. Without a consistent method for assembling analysis-ready datasets, this process can become time-consuming, expensive, and error-prone.

In response to this, we have developed the Dataset Assembly Tool. By standardising and automating the data preparation and dataset assembly stages of analytic projects, the tool helps staff deliver higher quality work faster. The assembly tool is now available for other researchers to use. This presentation will describe the tool and its advantages.

Simon Anastasiadis

Social Wellbeing Agency

Representative timeline modelling

methodologies, tools, timeline

Timelines are sequences of events and periods that reflect significant parts of a person’s experience. They can be an effective tool for researchers seeking to understand life events, and interactions between events, over time.

To share timeline information while respecting privacy and confidentiality, we have developed a representative timeline methodology. This method produces a timeline that captures experiences that are common across a group of people — similar to a group average.
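
The abstract does not spell out how the representative timeline is computed. As one possible reading of "similar to a group average", the Python sketch below takes the most common state in each period and keeps it only if enough of the group shares it. The function name, the `min_share` threshold and the example states are all hypothetical.

```python
from collections import Counter


def representative_timeline(timelines, min_share=0.5):
    """Return the modal state per period, or None where no state is common enough.

    timelines is a list of equally spaced state sequences, one per person.
    """
    n_people = len(timelines)
    n_periods = min(len(t) for t in timelines)
    result = []
    for period in range(n_periods):
        state, count = Counter(t[period] for t in timelines).most_common(1)[0]
        # Keep only states shared by a sufficient share of the group.
        result.append(state if count / n_people >= min_share else None)
    return result


# Made-up example: three people, four periods.
group = [
    ["employed", "employed", "parental leave", "parental leave"],
    ["employed", "study", "parental leave", "employed"],
    ["employed", "employed", "parental leave", "benefit"],
]
summary = representative_timeline(group)
# summary == ["employed", "employed", "parental leave", None]
```

Because only states shared across the group are reported, no individual's sequence is reproduced, which is in the spirit of the privacy goal stated above.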

This presentation will demonstrate the technique, drawing examples from our first application of it – a study of South Auckland families’ experiences around the birth of a child.

Will Haringa


Data Wrangling At Warp Speed with Segna

data wrangling, automated data science

As a data scientist, you want your data in an analysis-ready form faster, delivered through pipelines that don’t break as the shape of your data changes. Harnessing the power of machine learning, Segna’s smart wrangling tool aggregates and cleans data from multiple sources up to 1,600 times faster!

More information:

Tom Elliott

Te Rourou Tātaritanga, Victoria University of Wellington and The University of Auckland

Analysing Surveys with iNZight

surveys, tools, GUI

Survey data is hugely important to many research groups, but many software tools require users to understand the survey design and know how to specify it to the relevant program. iNZight is a graphical user interface for R that lets researchers quickly and efficiently visualise and explore their data. Recent changes to iNZight allow users to forget the design and focus on what matters: exploring their data.

For more information, visit

Adrian Ortiz-Cervantes

University of Auckland and Te Pūnaha Matatini

Modelling the transfer of tacit knowledge on temporal bipartite networks

Complex Networks, Knowledge Transfer, IDI

In this work we analyze employee mobility in New Zealand and its repercussions for the transfer of knowledge and skills between industries, in the context of temporal complex networks. Using tax records for all employees and firms in New Zealand from 2000 to 2017, we created a bipartite temporal network of employers and employees that allows us to implement a model for the transfer of tacit knowledge between employees, and to estimate the stock of knowledge inside firms and sectors across different skills.
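
As a rough illustration of the data structure involved, the Python sketch below builds a bipartite temporal network from (employee, firm, year) records and counts moves between firms in consecutive years. The record layout, identifiers and function names are assumptions made for the example. The knowledge-transfer model itself is not described in the abstract and is not implemented here.

```python
from collections import Counter, defaultdict


def build_temporal_bipartite(records):
    """Group employee-firm edges by year: {year: {employee: set of firms}}."""
    network = defaultdict(lambda: defaultdict(set))
    for employee, firm, year in records:
        network[year][employee].add(firm)
    return network


def worker_flows(network):
    """Count moves from one firm to another between consecutive years in the data."""
    flows = Counter()
    years = sorted(network)
    for prev_year, next_year in zip(years, years[1:]):
        for employee, old_firms in network[prev_year].items():
            new_firms = network[next_year].get(employee, set())
            for origin in old_firms:
                # Only firms the employee was not already attached to count as moves.
                for destination in new_firms - old_firms:
                    flows[(origin, destination)] += 1
    return flows


# Made-up records: (employee id, firm id, tax year).
records = [
    ("e1", "A", 2000), ("e1", "B", 2001),
    ("e2", "A", 2000), ("e2", "A", 2001),
    ("e3", "B", 2000), ("e3", "A", 2001),
]
network = build_temporal_bipartite(records)
flows = worker_flows(network)
# flows == {("A", "B"): 1, ("B", "A"): 1}
```

Directed flows of this kind are one natural channel through which a tacit-knowledge model could pass skills from an origin firm to a destination firm.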