Tools / Applied
- Will Scarrold – Calibrating Economic Agent-Based Models with Microdata
- Alistair Ramsden – Improved synthetic data method for Stats NZ datalab microdata output
- Nick Snellgrove and Petra Muellner – AIS Explorer --- Integrating technology to prevent aquatic invasive species in Minnesota
- Petra Muellner – Epidemix: visualising analytical complexity to improve decision-making for disease control
- Shrividya Ravi – RAPping in the public sector: binding your legacy code into a pipeline with Python
- Simon Anastasiadis – Accelerating Dataset Assembly
- Simon Anastasiadis – Representative timeline modelling
- Will Haringa – Data Wrangling At Warp Speed with Segna
- Tom Elliott – Analysing Surveys with iNZight
- Adrian Ortiz-Cervantes – Modelling the transfer of tacit knowledge on temporal bipartite networks
Will Scarrold
Te Pūnaha Matatini (University of Auckland) and the Institute for New Economic Thinking (University of Oxford)
Calibrating Economic Agent-Based Models with Microdata
Despite the complexity of modern socio-economic systems, current benchmark models assume the economy is much simpler than it really is: households are often assumed to be identical, and firms are assumed to use the same equipment to produce the same “representative” product.
Recent developments in agent-based modelling (ABM) have provided an alternative to conventional models. Such models show comparable forecasting performance despite being at a relatively early stage of development.
This presentation will discuss agent-based modelling and the use of microdata from New Zealand's Integrated Data Infrastructure (IDI) to calibrate these models.
More information on recent macroeconomic agent-based models can be found at tinyurl.com/econABM
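As a purely illustrative aside (not the model presented in this talk), the contrast between a single representative household and heterogeneous households can be sketched in a few lines of Python. Every distribution and parameter below is invented for the example; in practice the joint distributions would be calibrated against microdata such as the IDI.

```python
# A minimal sketch (not the presenter's model): why household heterogeneity
# matters for aggregates. All distributions and parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Heterogeneous households: income is skewed, and the marginal propensity to
# consume (MPC) declines with income -- a common stylised pattern.
income = rng.lognormal(mean=10.5, sigma=0.8, size=n)
mpc = np.clip(1.2 - 0.08 * np.log(income), 0.05, 0.95)

# Aggregate consumption keeping the heterogeneity...
agg_hetero = float(np.sum(mpc * income))

# ...versus a representative-agent shortcut: one average household scaled up.
agg_rep = n * float(np.mean(mpc)) * float(np.mean(income))

# The gap equals n * Cov(mpc, income); calibrating these joint distributions
# is where microdata comes in.
print(f"heterogeneous agents: {agg_hetero:,.0f}")
print(f"representative agent: {agg_rep:,.0f}")
```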
Improved synthetic data method for Stats NZ datalab microdata output
Stats NZ has approved a variation to the Microdata Output Guide confidentiality rules for the ‘He Ara Poutama mō te reo Māori SURF’ datalab project.
The synthetic data method is Classification And Regression Tree (CART) modelling, implemented using the synthpop R package. The difference from the existing rules is that synthetic count tables are generated, tested, and released with noised count values in {0, 3, 6, 9, 12, …}; previously such values were {‘Suppressed’, 6, 9, 12, …}.
The new method releases data with better utility (inferential validity) while retaining adequate safety (disclosure control).
Next steps are to calculate Differential Privacy (DP) parameters {epsilon, delta} for this data and method.
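For readers unfamiliar with count values restricted to {0, 3, 6, 9, 12, …}, the sketch below shows random rounding to base 3 (RR3), a standard Stats NZ-style perturbation, as one way such noised counts can arise. It is an illustration only: it does not show the project's actual noising rule or the CART/synthpop synthesis step, and the function name and example table are invented.

```python
# Illustrative only: random rounding to base 3 (RR3) produces counts in
# {0, 3, 6, 9, ...}. Not the project's actual rule or synthesis step.
import numpy as np

def rr3(counts, rng=None):
    """Randomly round non-negative integer counts to base 3.

    Counts already divisible by 3 are unchanged; other counts move to the
    nearest multiple of 3 with probability 2/3, otherwise to the second
    nearest multiple of 3.
    """
    if rng is None:
        rng = np.random.default_rng()
    counts = np.asarray(counts, dtype=int)
    remainder = counts % 3
    u = rng.random(counts.shape)
    # Round down when remainder is 1 (nearest, prob 2/3) or when remainder is
    # 2 and the 1/3 "second nearest" draw fires; otherwise round up.
    round_down = ((remainder == 1) & (u < 2 / 3)) | ((remainder == 2) & (u >= 2 / 3))
    return np.where(round_down, counts - remainder, counts + (3 - remainder) % 3)

# Example: a small, entirely made-up one-way count table.
table = np.array([0, 1, 2, 5, 7, 12, 40])
print(rr3(table, rng=np.random.default_rng(42)))
```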
AIS Explorer — Integrating technology to prevent aquatic invasive species in Minnesota
Minnesota in the United States is well known for its beautiful “10,000 lakes”; however, this complex system of waterbodies is threatened by the spread of aquatic invasive species. We developed the AIS Explorer in collaboration with the Aquatic Invasive Species Research Center of the University of Minnesota to bridge research and decision-making, for example to optimise watercraft inspections.
AIS Explorer: https://www.aisexplorer.umn.edu/
In this talk we will showcase how a diverse set of technologies, including R, R Shiny, Python, and cloud computing, was integrated to create a tool that is easy to use and matches stakeholder needs.
Epidemix: visualising analytical complexity to improve decision-making for disease control
Epidemix^[Muellner U, Fournie G, Muellner P, Ahlstrom C, Pfeiffer D. epidemix - an Interactive Multi-Model Application for Teaching and Visualizing Infectious Disease Transmission. Epidemics, doi: 10.1016/j.epidem.2017.12.003, 2017.] allows users to develop an understanding of the impact of disease modelling assumptions on the trajectory of an epidemic and the impact of control interventions, without having to deal directly with the complexity of equations and programming languages. The app provides a visual interface for nine generic models, plus two disease-specific models of international relevance (COVID-19 and African swine fever (ASF)). Epidemix supports the teaching of mathematical modelling to non-specialists, including policy makers, by demonstrating key concepts of disease dynamics and control in a hands-on way. Funded by the Royal Veterinary College and the City University of Hong Kong (www.epidemix.app).
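As a generic illustration of the kind of compartmental dynamics the app visualises (this is not Epidemix's own code, and the parameter values are invented), a minimal deterministic SIR model with and without a transmission-reducing intervention might look like this:

```python
# A generic deterministic SIR model of the kind visualised in tools like
# Epidemix. Illustration only; beta, gamma and the intervention are invented.
import numpy as np

def sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I
    (proportions of the population) with a simple Euler scheme."""
    steps = int(days / dt)
    s, i, r = s0, i0, r0
    out = [(0.0, s, i, r)]
    for k in range(1, steps + 1):
        new_inf = beta * s * i * dt
        new_rec = gamma * i * dt
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        out.append((k * dt, s, i, r))
    return np.array(out)

# Baseline epidemic vs. a control intervention that halves transmission --
# the kind of "what if" comparison the app lets non-specialists explore.
baseline = sir(beta=0.4, gamma=0.1, s0=0.999, i0=0.001, r0=0.0, days=200)
controlled = sir(beta=0.2, gamma=0.1, s0=0.999, i0=0.001, r0=0.0, days=200)

print(f"peak prevalence, baseline:   {baseline[:, 2].max():.3f}")
print(f"peak prevalence, controlled: {controlled[:, 2].max():.3f}")
```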
RAPping in the public sector: binding your legacy code into a pipeline with Python
Straggling legacy SAS scripts requiring manual steps are a common “feature” of data processing tasks in the public sector. However, refactoring such scripts into modern, reproducible analytical pipelines (RAP) can be challenging due to a lack of IT infrastructure or high complexity. In such situations, interim solutions can at least reduce manual effort and mental overhead. One such solution is using Python as an effective glue to create one-click execution pipelines. Manual tasks such as downloading data from email, updating data file names in scripts, running scripts in sequence, and more can be managed with Python and its rich ecosystem of packages. In this talk, I will showcase how three Python packages (exchangelib, jupyter, and saspy) can create quick and easy automated versions of legacy SAS scripts that contain many types of manual steps.
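A minimal sketch of this glue pattern follows. The mailbox address, subject line, file paths, and SAS macro variable are placeholders; only core documented exchangelib and saspy calls are used, and Jupyter orchestration, error handling, and agency-specific configuration are omitted.

```python
# Sketch of the "Python as glue" pattern: pull the latest data file from an
# Exchange mailbox, then run a legacy SAS script against it via saspy.
# Addresses, paths and script names below are placeholders.
from pathlib import Path

import saspy
from exchangelib import DELEGATE, Account, Credentials, FileAttachment

# 1. Download the newest attachment from the expected data email.
creds = Credentials("analyst@agency.govt.nz", "app-password")
account = Account("analyst@agency.govt.nz", credentials=creds,
                  autodiscover=True, access_type=DELEGATE)
items = account.inbox.filter(subject__contains="Monthly extract") \
                     .order_by("-datetime_received")
latest = next(iter(items))

data_dir = Path("data/raw")
data_dir.mkdir(parents=True, exist_ok=True)
new_file = None
for att in latest.attachments:
    if isinstance(att, FileAttachment):
        new_file = data_dir / att.name
        new_file.write_bytes(att.content)

# 2. Run the legacy SAS script unchanged via saspy, passing the new file path
#    through a macro variable (assumes the script reads &infile).
sas = saspy.SASsession()                 # configured via sascfg_personal.py
sas_code = Path("legacy/monthly_report.sas").read_text()
result = sas.submit(f"%let infile={new_file};\n{sas_code}")
print(result["LOG"][-2000:])             # tail of the SAS log as a quick check
sas.endsas()
```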
Accelerating Dataset Assembly
Data preparation is a key stage of analytic and research projects. However, as the number of data sources increases, so does the complexity of preparation. Without a consistent method for assembling analysis-ready datasets, this process can become time-consuming, expensive, and error-prone.
In response, we have developed the Dataset Assembly Tool. By standardising and automating the data preparation and dataset assembly stages of analytic projects, the tool helps staff deliver higher quality work faster. The assembly tool is now available for other researchers to use. This presentation will describe the tool and its advantages.
Representative timeline modelling
Timelines are sequences of events and periods which reflect significant parts of a person’s experience. They can be an effective tool for researchers seeking to understand life events, and interactions between events, over time.
To share timeline information while respecting privacy and confidentiality, we have developed a representative timeline methodology. This method produces a timeline that captures experiences that are common across a group of people — similar to a group average.
This presentation will demonstrate the technique, drawing examples from our first application of it – a study of South Auckland families’ experiences around the birth of a child.
Data Wrangling At Warp Speed with Segna
As a data scientist, you want your data in an analysis-ready form faster, delivered through pipelines that don’t break as the shape of your data changes. Harnessing the power of machine learning, Segna’s smart wrangling tool aggregates and cleans data from multiple sources up to 1,600 times faster!
More information: https://segna.io
Tom Elliott
Te Rourou Tātaritanga, Victoria University of Wellington and The University of Auckland
<tom.elliott at vuw.ac.nz>
Analysing Surveys with iNZight
Survey data is hugely important for many research groups, but many software tools require users to understand the survey design and know how to specify it to the relevant program. iNZight is a graphical user interface for R that lets researchers quickly and efficiently visualise and explore their data. Recent changes to iNZight allow users to forget the design and focus on what matters: exploring their data.
For more information, visit https://inzight.nz.
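As a toy illustration of what the survey design contributes behind the scenes (this is not iNZight code; the data and weights are invented), compare a naive mean with a design-weighted mean:

```python
# Toy illustration of why the survey design matters (not iNZight code):
# respondents from an over-sampled group must be down-weighted, or the naive
# estimate is biased. Data and weights below are invented.
import numpy as np

# Two strata: stratum A is 90% of the population but only 50% of the sample.
income_a = np.array([48, 52, 50, 47, 53], dtype=float)   # sampled from A
income_b = np.array([90, 95, 88, 92, 85], dtype=float)   # sampled from B (over-sampled)

values = np.concatenate([income_a, income_b])
# Design weights = population units represented by each respondent:
# 900/5 per respondent in A and 100/5 per respondent in B.
weights = np.concatenate([np.full(5, 180.0), np.full(5, 20.0)])

naive_mean = values.mean()
weighted_mean = np.average(values, weights=weights)

print(f"naive mean (ignores design): {naive_mean:.1f}")
print(f"design-weighted mean:        {weighted_mean:.1f}")
```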
Modelling the transfer of tacit knowledge on temporal bipartite networks
In this work we analyse employee mobility in New Zealand and its repercussions for knowledge and skills transfer amongst different industries, in the context of temporal complex networks. Using tax records from all employees and firms in New Zealand from 2000 to 2017, we created a bipartite temporal network of employers and employees that allows us to implement a model for the transfer of tacit knowledge between employees, and to estimate the stock of knowledge inside firms and sectors across different skills.
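A minimal sketch of the underlying data structure, not the authors' model: a temporal bipartite network of workers and firms built with networkx from hypothetical (worker, firm, year) records, where a worker's move between firms is the channel along which tacit knowledge can travel.

```python
# Sketch of the data structure (not the authors' model): a temporal bipartite
# network of workers and firms. The employment records below are invented.
import networkx as nx

# Toy (worker, firm, year) employment spells; in the talk these come from
# linked employer-employee tax records covering 2000-2017.
records = [
    ("w1", "firmA", 2000), ("w1", "firmA", 2001), ("w1", "firmB", 2002),
    ("w2", "firmA", 2001), ("w2", "firmA", 2002),
    ("w3", "firmB", 2000), ("w3", "firmB", 2001),
]

B = nx.Graph()
B.add_nodes_from({w for w, _, _ in records}, bipartite="worker")
B.add_nodes_from({f for _, f, _ in records}, bipartite="firm")
for worker, firm, year in records:
    # Store the years of each employment spell as an edge attribute,
    # giving the temporal layer of the bipartite network.
    if B.has_edge(worker, firm):
        B[worker][firm]["years"].add(year)
    else:
        B.add_edge(worker, firm, years={year})

# Worker moves are the channel for tacit knowledge transfer: w1 links firmA
# (2000-2001) to firmB (2002), so knowledge held at firmA can reach firmB.
firms_of_w1 = sorted((f, min(d["years"])) for f, d in B["w1"].items())
print(firms_of_w1)   # [('firmA', 2000), ('firmB', 2002)]
```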