Making sense with data visualisation
By James Nicholson
Statistics to me has always been about trying to make the best sense of incomplete information and having some feeling for how good that ‘best sense’ is. At a very crude level, if you have a firm employing 235 people and you randomly sample 200 of them on some topic, I would feel my information was pretty good (even though it is incomplete). If the information I have is based on a sample of five people, or I have asked all the people in one office, then I would know my information was nothing like as good as in the former case.
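To put rough numbers on that intuition, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a calculation from the article: it assumes a simple random sample drawn without replacement, a yes/no question, and a worst-case true proportion of 0.5. The function name `proportion_standard_error` is made up for this example.

```python
import math


def proportion_standard_error(population_size, sample_size, p=0.5):
    """Standard error of a sample proportion, with finite population correction.

    Assumes simple random sampling without replacement. p=0.5 is an
    assumed worst case, not a figure from the article.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    if population_size == 1:
        return 0.0
    # Standard error as if the population were infinitely large.
    base = math.sqrt(p * (1 - p) / sample_size)
    # The correction shrinks the error as the sample approaches the whole firm.
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return base * correction


# The firm of 235 people: a sample of 200 versus a sample of 5.
se_large = proportion_standard_error(235, 200)
se_small = proportion_standard_error(235, 5)
print(f"n=200: about {se_large:.1%}")
print(f"n=5:   about {se_small:.1%}")
```

Under these assumptions the uncertainty is between one and two percentage points for the sample of 200, and more than twenty for the sample of five. The one-office case is a different problem: that sample is not random, so no formula of this kind says how far off it might be.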
More than ever, in the current International Year of Statistics, there is an acceptance that understanding quantitative information is a necessary skill in almost any academic discipline and in almost all professional jobs (and very many jobs at lower levels). Statistics is used in a wide range of contexts, such as the physical, life and social sciences, sports, marketing, finance, geography, and psychology. In fact it’s used anywhere there is interesting data, and with supporting visual explanations of what is happening in various statistical techniques, it need not be an intimidating area to be involved in.
I am currently doing some work at Durham on data visualisation, including on education performance data, the 2011 UK riots, and health. For example, interactive data resources show the proportions of pupils gaining five good GCSEs (with and without a requirement to include English and Maths), disaggregated by sex, ethnicity and whether they are eligible for free school meals. The first screenshot shows boys’ performance rates for various ethnic groups and how the effect of eligibility for free school meals varies across ethnic groups. You can see that the effect is very dramatic in both white and mixed groups, much more modest for Asian, black and other groups, and almost non-existent for the Chinese group. The second screenshot shows how the display changes if the bottom slider is moved to change the performance measure to remove the requirement for English and Maths. The variables can be dragged and dropped into different positions to allow other comparisons to be made directly, and to develop a real sense of the stories in the data.
It would be much more logical if social scientists wanting to put forward theoretical explanations for inequalities in health, in education, in crime etc., were able to explore the data actively in an interface like this – to develop a rich picture of the relationships between factors, which are important and which less so, where particular combinations of factors give unexpected outcomes – and then to try to provide theory which is consistent with the observed patterns of behaviour.
Additionally, I have just started working on a new project on visualisations of 2011 UK Census data, and with Imperial College Reach Out Lab on supporting data sharing in science. Essentially there is a Practice Transfer Partnership of HE Reach Out Labs in which we are trying to develop experiments with more variables, on which different institutions will collaborate to build a large multivariate data set. Students and teachers would then have access to this data set embedded in our visualisation tools. The ambition is to tie more mathematics in with authentic scientific enquiry, so the collaboration between Science and Mathematics has real potential to make mathematics and statistics more directly and obviously relevant to students.
James Nicholson is the author of Statistics S1 and Statistics S2 in the A Level Mathematics for Edexcel course published by Oxford University Press. He is also Principal Research Fellow at the SMART Centre at Durham University.
Image credit: Graphs created by James Nicholson. Used with permission. All rights reserved.