In the morning session today Sara Sullam and I will be presenting our work on exploring nominal (in our case study, bibliographical) data. We do it by borrowing a method from educational research: the notion of phenomenographic variation. #CHR2023🧵
First, what is #phenomenography and how could it help us? In contrast to phenomenology, which studies what phenomena are, phenomenography is concerned "only" with how they are perceived. We build on the idea that scientific inquiry can also be seen as learning at the collective level. This is especially true for interdisciplinary research conducted by a team of researchers (hello digital humanities). A theory that emerged from phenomenographic research is variation theory, i.e. one needs to experience variation to comprehend a phenomenon. And there is a very specific way to achieve this: by using patterns of contrast, generalisation and fusion.
These patterns of variation consider aspects of a phenomenon, which at a simple level can be seen as dimensions of the data. The simplest of the three patterns is contrast: the idea that to start understanding a phenomenon, one needs to consider each of its dimensions in isolation (i.e. varying it while keeping the others fixed). Our example comes from translations of Italian novels of the post-war period for the UK market. We apply contrast to authors. Our way of fixing the other dimensions is by counting them.
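A minimal sketch of one possible reading of "contrast by counting": vary one dimension (here, authors) and summarise every other dimension by the number of distinct values it takes. The records, field names and the `contrast` helper are hypothetical placeholders, not our dataset or our actual pipeline.

```python
from collections import defaultdict

# Hypothetical records, NOT real data: (author, translator, publisher, year)
FIELDS = ("author", "translator", "publisher", "year")
records = [
    ("Author A", "Translator X", "Publisher P", 1950),
    ("Author A", "Translator Y", "Publisher P", 1955),
    ("Author B", "Translator X", "Publisher Q", 1950),
]


def contrast(rows, fields, focus):
    """For each value of `focus`, count the distinct values of every other dimension."""
    focus_idx = fields.index(focus)
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for i, name in enumerate(fields):
            if i != focus_idx:
                seen[row[focus_idx]][name].add(row[i])
    # Collapse each set of observed values into a count
    return {
        value: {name: len(vals) for name, vals in dims.items()}
        for value, dims in seen.items()
    }


summary = contrast(records, FIELDS, "author")
for author, counts in summary.items():
    print(author, counts)
```

Switching `focus` to another field name applies the same contrast to a different dimension.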
Finally, after a first exploration, one might feel ready to see the big picture, i.e. fusion. Of course, after that one might backtrack to drill down into particular values.
One way to show multidimensional data (nominal, except for years) that we've found useful is the following graph. But more generally we need visualisation techniques that can handle multidimensional nominal data. For two dimensions, heatmaps could be a good candidate. It gets more complicated with more dimensions. Alluvial diagrams could come in handy here.
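For the two-dimensional case, the input to a heatmap is simply a table of co-occurrence counts between two nominal dimensions. The sketch below builds such a table with the standard library only. The pairs and the `crosstab` helper are hypothetical, and the alphabetical ordering of categories is an arbitrary choice made for illustration.

```python
from collections import Counter

# Hypothetical (author, publisher) pairs, NOT real data
pairs = [
    ("Author A", "Publisher P"),
    ("Author A", "Publisher P"),
    ("Author A", "Publisher Q"),
    ("Author B", "Publisher Q"),
]


def crosstab(pairs):
    """Return row labels, column labels and a matrix of co-occurrence counts."""
    counts = Counter(pairs)
    # Nominal values have no natural order, so we must pick one explicitly
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


rows, cols, matrix = crosstab(pairs)
for label, line in zip(rows, matrix):
    print(label, line)
```

Each cell of `matrix` would become one coloured cell of the heatmap, with `rows` and `cols` as axis labels.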
Finally, here's the full text. It has more context and examples. We'd love a discussion beyond the one after the presentation https://ceur-ws.org/Vol-3558/paper774.pdf
A note on visualisation. For no good reason, a majority of visualisation tools cannot handle categorical dimensions. Probably they can't be bothered with having to specify explicitly an order for the visualised dimension. This is only one of the reasons we found tools like Excel fast, but too limited to be useful. Programmable tools like ggplot or Bokeh are more powerful, but each iteration takes too long to visualise, and that slows down the exploration process. We've found #RAWgraphs a great solution for two reasons. First, it allows categorical dimensions in the charts we used, and second, it allows SVG export for post-processing. So shout out to https://www.rawgraphs.io/