Synthetic data in the social sciences using generative adversarial networks


Data across the social sciences are highly complex. Novel experimental and statistical designs are continually being developed to support robust inference, but the simulations and data used to test these methods often either fail to reflect the complexity of the social system or offer little way of knowing what 'correct' inference should look like. This project aims to address this problem by using generative adversarial networks, a cutting-edge form of deep learning, to build synthetic datasets that mirror realistic data-generating processes while also having discoverable population parameters. The project will define and develop a set of practical principles, along with open-source software, for generating synthetic data in a variety of social science contexts. This way of simulating social science data applies to a wide range of problems, including experimental power calculations and the testing of new statistical methods.
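
To illustrate why a known population parameter matters for power calculations, here is a minimal sketch of a simulation-based power estimate. It is not the project's software: the simple Gaussian simulator below stands in for a trained generator, and the names `simulate_experiment`, `rejects_null`, and `estimate_power`, along with the effect sizes and sample sizes, are assumptions chosen for illustration.

```python
import random
import statistics
from math import sqrt


def simulate_experiment(n_per_arm, true_effect, rng):
    """Hypothetical stand-in for a synthetic data generator.

    Draws control and treated outcomes from unit-variance normals whose
    means differ by a known amount (the population parameter).
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    return control, treated


def rejects_null(control, treated, critical_z=1.96):
    """Two-sided z-test (normal approximation) on the difference in means."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = sqrt(
        statistics.variance(control) / len(control)
        + statistics.variance(treated) / len(treated)
    )
    return abs(diff / se) > critical_z


def estimate_power(n_per_arm, true_effect, n_sims=2000, seed=0):
    """Share of simulated experiments in which the test rejects the null."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null(*simulate_experiment(n_per_arm, true_effect, rng))
        for _ in range(n_sims)
    )
    return rejections / n_sims


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(n, estimate_power(n, true_effect=0.5))
```

Because the true effect is set by the simulator, the rejection rate can be read directly as power when the effect is non-zero and as the false-positive rate when it is zero. A generator trained on real data would replace `simulate_experiment` while keeping the rest of the loop unchanged.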

Work in progress