Parliamentary dialogue is critical for political science research. In Australia, following the UK tradition, the written record of what is said in parliament is known as Hansard. The Australian Hansard has always been publicly available. However, it has been difficult to use for large-scale macro- and micro-level text analysis because it has not existed as a dataset of sufficient quality to be credibly analysed with statistical models. Following the lead of the LiPaD project, which achieved this for Canada, our project aims to provide a new, comprehensive, high-quality database that uses Hansard to capture all proceedings of the Australian parliamentary debates from 1901 to the present day. To create this database, we wrote scripts that parse and clean each element of the transcripts from their XML format, producing a tidy, ordered dataset with detailed information on every statement made in parliament. This includes variables such as who spoke, their political affiliation and exactly what they said, as well as flags that specify the nature of each statement. Our dataset will be publicly available and linked to other datasets, such as election results. The creation and accessibility of this dataset will make it possible to explore questions that cannot currently be addressed, making it a valuable resource for both researchers and policymakers. This work will describe in detail how the database was created and its computational underpinnings, followed by a discussion of some applications.
Bio: Lindsay Katz holds a Master's degree in Statistics from the University of Toronto and a Bachelor of Arts and Science from the University of Guelph, where she specialized in Mathematical Science and International Development. At Guelph, she worked with Professor Ryan Briggs to explore lived poverty in Africa using Afrobarometer data. At Toronto, she works with Professor Monica Alexander to research demographic variation in short-term migration patterns using Facebook data, and with Professor Rohan Alexander to digitize the Australian parliamentary debates from 1901 to the present. As an interdisciplinary researcher, she is interested in using statistics to better understand social processes.