In a previous post, [1] I finished with a quote from a New Yorker article by Craig Mod, in which he suggested some different types of data that might be gathered and ‘pinned’ to the ‘back’ of a digital photograph. Amongst the data he suggested we might collect were “location, weather, even radiation levels […] social status and state of mind”, in other words data drawn from sensors and from social networking sites such as Facebook and Twitter. [2] I made the point then that there is a huge difference between data taken from a sensor and data taken from Facebook: the first is reliant on measuring some kind of physical quality; the second, on the expressions of people. I stand by that point, but now I want to examine the two in a little more detail.
Before I do, I want to clarify what I mean by the terms ‘data’ and ‘information’, because I have noticed that they are often incorrectly used interchangeably. To put it in the simplest possible terms, ‘data’ refers to raw measurements, for example, a number of temperature readings taken at intervals throughout the day. ‘Information’ on the other hand is what we get when ‘data’ is collated in a manner that tells us something, for example, if the temperature readings were plotted onto a graph to show how they change over the course of the day.
It might then be assumed that the term ‘data visualisation’ refers to the creation of information. Often it does but, especially in the case of artworks, it is not necessarily so straightforward. Lev Manovich defines ‘visualisation’ as a particular subset of ‘mapping’, in which non-visual data, such as temperature, is rendered visually – think of the heat maps that appear on the weather forecast. [3] The basic idea behind mapping is that one set of data (stored digitally as numbers) can be mapped onto another set of data, image into sound for instance, without making any modifications to the original data.
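The idea of mapping can be sketched in a few lines of Python. This is a minimal illustration only, assuming a hypothetical list of temperature readings and an arbitrary blue-to-red colour scale; the names `readings` and `to_colour` are my own and are not drawn from Manovich or from any of the works discussed here.

```python
# Hypothetical temperature readings in degrees Celsius (illustrative values only).
readings = [4.0, 6.5, 11.0, 15.5, 13.0, 8.0]


def to_colour(value, low, high):
    """Map a number onto an (R, G, B) colour between blue (low) and red (high)."""
    if high == low:
        t = 0.5  # avoid division by zero when all readings are equal
    else:
        t = (value - low) / (high - low)
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return (red, 0, blue)


# The mapping produces a second, visual data set; the original list is untouched.
colours = [to_colour(r, min(readings), max(readings)) for r in readings]

for reading, colour in zip(readings, colours):
    print(reading, "->", colour)
```

The readings themselves are never altered, only expressed as colours. Even so, the choice of scale and of end-point colours is already a design decision made on the viewer's behalf.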
The fact that the data is not modified in any way, merely expressed in a different way, might go some way to explaining why, in a significant number of data visualisation works, there seems to be a desire to avoid creating information and simply present the viewer with the data and, by extension, connect them to the physical world. [4] This is not to say that they do not create information: even in those works that expressly claim unmediated access, information has a habit of creeping in through the selection of data and the design of the interface. This is not unlike the situation in photography where, on the one hand, the photograph might appear to offer an objective view of the world, but is in actual fact a reflection on the photographer and their decisions – those of framing, lighting and so on.
Returning to Peirce’s original writings on the index, it is clear that something might be considered indexical either by virtue of being an imprint or a pointer. In a paper on the relationship between indexicality and data visualisation, it is the latter definition that Tom Schofield and his collaborators draw on when they write, “[…] we consider as our index first that which has contiguity to the physical world or is at least perceived to have.” [5] [my emphasis] In this they remain faithful to Peirce’s original definition. However, as I will discuss in the section on social media, such a broadened definition may be somewhat problematic.
Sensor Data
It is easier to see the indexical connection if the data used to create a visualisation is taken from some kind of sensor. For instance, the project Wind Map (2012), by Fernanda Viegas and Martin Wattenberg, [6] uses data from the US National Digital Forecast Database (NDFD) [7] to visualise patterns of wind across the continental United States. Data from the NDFD is polled every hour in order to build up a visualisation of the speed and direction of the wind currently blowing across the country. It is, as the artists say, intended to give a sense of the wind’s physical force and presence in an easy-to-read way. [8]
In this case, the data is drawn from a sensor that measures something physical. It is not entirely clear from the project’s own website how this data is gathered, but some investigation into the methods employed by the National Oceanic and Atmospheric Administration (NOAA) suggests the use of both anemometers [9] and satellite telemetry. [10] In the case of the anemometer we can say that the wind acts directly upon the sensor, not entirely unlike the wind acting on Peirce’s weathervane.
Viegas and Wattenberg are at pains to point out that Wind Map, although it creates information, is not intended as a scientific visualisation tool; and furthermore that they have no way of guaranteeing the accuracy of the data it uses. [11] In fact, their 2012 Eyeo festival talk makes clear that some of the datasets available through the NDFD were ‘bad’ datasets: one that showed the wind for the whole world was in fact missing the whole equator region. [12]
The data coming into Wind Map is just a string of numbers and it is difficult for a string of numbers, no matter where they come from or what their claim to indexicality might be, to elicit any sense of connection to the physical world. Wind Map is deliberately designed in a manner that mimics, to a degree, the actions of wind. The lines that traverse the map of the United States are in motion, reminiscent of the way the wind causes objects to move; their swirling recalls weather map and picture book illustrations of wind. The thickness of the lines evokes a feeling of intensity, which we can relate to the physical impact of strong wind. The artists have shown a number of different visualisation techniques they tried before settling on the one used in the final piece [13] and it is clear from those that mapping the same data in a different way has a very different effect on the viewer’s perception. All the decisions they have made support the data to strengthen the work’s sense of connection to the physical world: the iconical supports the indexical by giving it context.
Schofield et al. further suggest that temporality, in concert with the design discussed above, plays a key role in Wind Map’s ability to form a strong connection to the forces visualised. [14] In other words, the knowledge that the data is live emphasises its connection to the physical. We know from Peirce that the index is, generally, temporally bound, but we also know that this condition doesn’t operate in photography because of its iconical dimension. [15] Might the same be true of data visualisation?
Social Media Data
In the case of Wind Map, the data driving it has a strong correlation with events in the physical world because it was gathered directly from it. But what of social media data? Following Schofield and his colleagues’ definition, social media too could be said to have contiguity with the physical world, as its users respond to events in the physical world, but is this really enough to make a work indexical?
Mitchell Whitelaw has described artworks such as The Dumpster (2006, Golan Levin with Kamal Nigam and Jonathan Feinberg) and We Feel Fine (2006, Jonathan Harris and Sep Kamvar) as “indexical”, though it is not entirely clear if he had Peircean semiotics in mind at the time. [16] Nevertheless, both can be perceived as having a strong correlation with events in the physical world, and so can fit the above definition of indexicality.
Both these works operate along very similar lines: both are concerned with visualising human emotions in some way and both present themselves as interfaces to these emotions. The Dumpster uses a fixed, pre-analysed data set of around 20,000 blog posts focusing on romantic breakups. [17] We Feel Fine is far less specific, harvesting feelings not tied to any one event and cross-referencing them with other collected data such as the blogger’s age, gender and the weather at their location. [18]
The language used to describe these works suggests that they are concerned with drawing from reality, the authors downplaying their own roles in the work. [19] Writing about We Feel Fine, Jonathan Harris even goes as far as to claim that it is “authored by everyone.” [20] The Dumpster is a “visualisation of romantic breakups”; “a slice through the lives of American teenagers”; [21] [22] We Feel Fine “a database of human feelings.” [23] Although the interface creates information, by assigning colour to an emotion based on whether it is ‘happy’ or ‘sad’ (We Feel Fine), or to a ‘breakup’ depending how similar it is to another ‘breakup’ (The Dumpster), the intention is to open up the data and make it visible. The implication here is that the works are built from, or open a window onto, the events and feelings described by the data. Neither of these works, however, is a direct line to human emotion, only to the words used to express that emotion. The indexicality here is predicated on a long chain of highly mediated and symbolic signification – not on a direct measurement of a physical quality.
Both give the impression of looking beyond the data to the feeling and the people who lie beyond, much in the way that one might look through a photograph, and in this sense they present themselves as “index[es] of reality.” [24] However, much as the decisions of a photographer are reflected in their selection of a particular place and time to photograph, the data sets too are reflective not only of the selections made by the artists but also of the algorithms used to collect the data.
For example, We Feel Fine looks for blog posts that contain the phrases “I feel” or “I am feeling”, so it may well pick up a post that says “I feel happy”, but not one that says “I am happy.” Emotions that are not expressed in a very particular way are excluded from the data set. Artists, researchers, and social media companies all shape those data sets in choosing what to include and what to exclude. As such, the data set can never be completely objective, and it is somewhat naïve to view any data visualisation (including those made from sensor data) as unmediated and objective. [25] It is also important to bear in mind that no social data set can ever reflect any more than the users of the social network site sampled. Twitter, despite the way some journalists portray it, is not reflective of the public as a whole, only the users of Twitter – the same could be said of data from Facebook, Flickr, Instagram, or any other social network. [26] Likewise, we do not know how the social networks sampled the data available on the API in the first place (was it the first thousand posts every hour? the most popular posts every day?) nor do we know how much of the data the artists have had access to. [27]
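The effect of a phrase-based selection rule like the one described above can be sketched as follows. This is only an approximation under stated assumptions: the regular expression and the sample posts are invented for illustration and are not taken from We Feel Fine’s actual code or data.

```python
import re

# Assumed pattern: matches "I feel" or "I am feeling", ignoring case.
# It approximates the selection rule described in the text; it is not
# the project's own implementation.
FEELING_PATTERN = re.compile(r"\bI (?:feel|am feeling)\b", re.IGNORECASE)

# Hypothetical posts, invented for illustration.
posts = [
    "I feel happy",
    "I am happy",
    "I am feeling tired today",
    "Happy, for once",
]

# Only posts phrased in the expected way make it into the data set.
selected = [p for p in posts if FEELING_PATTERN.search(p)]
excluded = [p for p in posts if not FEELING_PATTERN.search(p)]

print("selected:", selected)
print("excluded:", excluded)
```

Two of the four posts express happiness, yet only one of them is collected. The shape of the data set follows from the wording of the rule, not from the feelings themselves.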
On top of that, computers have a great deal of trouble understanding the language of humans – natural language as it is referred to in computer science circles – because it is full of nuances that are often difficult for a computer to detect, and this can lead to false positives in the data set. Whitelaw points to the following example, which was identified by the We Feel Fine software as “better”: “I just start to have these looming feelings of inadequacy and fear that in a year, I will be no better off and have nothing else to offer the professional world.” [28] A human reader will easily understand that this statement, although it contains the word “better”, is in fact a negative one, but clearly the computer algorithm could not make the distinction. When these mistakes occur, they might be viewed as an index, not of the emotion of the person from whom the statement originates, but of the action of the algorithm on the data set. [29]
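A false positive of this kind is easy to reproduce with a naive keyword lookup. The sketch below is an assumption-laden illustration: the tiny lexicon and the function `label_by_keyword` are hypothetical, and simple word matching is used here in place of whatever method We Feel Fine actually employs.

```python
# Assumed, deliberately tiny lexicon of feeling words; hypothetical.
FEELING_WORDS = {"better": "positive", "happy": "positive", "sad": "negative"}

# The statement quoted by Whitelaw, as given in the text above.
statement = (
    "I just start to have these looming feelings of inadequacy and fear "
    "that in a year, I will be no better off and have nothing else to "
    "offer the professional world."
)


def label_by_keyword(text):
    """Return (word, label) for the first lexicon word found, or None."""
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'")  # drop surrounding punctuation
        if word in FEELING_WORDS:
            return word, FEELING_WORDS[word]
    return None


# The lookup sees "better" but not the "no" that negates it.
print(label_by_keyword(statement))
```

The label says more about how the lookup operates than about the writer’s state of mind, which is the sense in which the error indexes the algorithm.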
Locating Indexicality
The key difference to note between these two types of data is where they locate indexicality. In the case of Wind Map, the index is a measure of something physical. In The Dumpster and We Feel Fine, the index is located with the body of the blogger. Although the latter relies on largely symbolic expressions of emotions and events, the assumption is that the person who originally wrote it was physically present to experience what those words (symbols) refer to. Again, this has certain similarities with photography, if we think of the photograph not just as a document of an event – or an index by virtue of the imprint of light – but a testament to the physical presence of the photographer [30]: the witness to the event. [31] Crucially, they all point to something that exists outside of the work itself, much in the same way that the photograph is always a partial document that points to something outside of its frame. [32]
Both of these types of data, which in the works discussed above are also examples of ‘Big Data’, can create feelings of connection to the physical world, but these are highly mediated connections and not a “view from nowhere”. [33] In developing data visualisation artworks (or any other artwork that incorporates data in some way) it is important to consider exactly what it is the work is to refer to – the physical world or the humans who inhabit the physical world. Rather than trying to claim an unmediated connection, artists need to be attentive not only to how their data sets are shaped but also to how their design decisions lead the viewer to particular conclusions.
It goes without saying that a large part of all these visualisations is based on trust – we have no way of knowing if the data feeding Wind Map is accurate and we have no way of knowing if the breakups and emotions referred to by The Dumpster and We Feel Fine actually occurred. The only way to know for sure how data was gathered, and what was included or excluded, is for the artist (or researcher) to gather it themselves.
Notes
[1] Catherine M. Weir, “Digital Indexicality Beyond Photography” [Online], 19 November 2014, http://www.cmweir.com/digital-indexicality-beyond-photography (accessed 4 February 2015).
[2] Craig Mod, “Goodbye Cameras” [Online], The New Yorker, 29 December 2013, http://www.newyorker.com/tech/elements/goodbye-cameras (accessed 4 November 2014).
[3] Lev Manovich, “Data Visualization as New Abstraction and the Anti-Sublime” [Online], 2002, http://manovich.net/content/04-projects/038-data-visualisation-as-new-abstraction-and-anti-sublime/37_article_2002.pdf (accessed 17 December 2014).
[4] Mitchell Whitelaw, “Art Against Information: Case Studies in Data Practice,” The Fibreculture Journal 11, 2008, http://eleven.fibreculturejournal.org/fcj-067-art-against-information-case-studies-in-data-practice/ (accessed 17 December 2014).
[5] Tom Schofield, Marian Dörk, and Martyn Dade-Robertson, “Indexicality and Visualization: Exploring Analogies with Art, Cinema and Photography,” in Proceedings of the 9th ACM Conference on Creativity and Cognition, New York (2013): 179.
[6] Fernanda Viegas and Martin Wattenberg, “Wind Map” (online data visualisation, 2012) http://hint.fm/wind/index.html (accessed 4 February 2015).
[7] National Oceanic and Atmospheric Administration, “National Digital Forecast Database” [online], http://www.nws.noaa.gov/ndfd/ (accessed 4 February 2015).
[8] Fernanda Viegas and Martin Wattenberg, “Wind Map” [Online], http://hint.fm/projects/wind/ (accessed 4 February 2015).
[9] A device for measuring wind speed.
[10] See: National Oceanic and Atmospheric Administration, “What is Measured” [Online], http://www.ncdc.noaa.gov/crn/elements.html#ws (accessed 4 February 2015) and National Oceanic and Atmospheric Administration, “Satellite and Information Service” [Online], http://www.nesdis.noaa.gov/index.html (accessed 4 February 2015).
[11] Fernanda Viegas and Martin Wattenberg, “Wind Map.”
[12] Eyeo Festival Vimeo Channel, “Eyeo2012 – Viegas and Wattenberg” [Online], http://vimeo.com/48625144 (accessed 4 February 2015).
[13] Ibid.
[14] Tom Schofield, Marian Dörk, and Martyn Dade-Robertson, “Indexicality and Visualization: Exploring Analogies with Art, Cinema and Photography,” 180.
[15] Marie Shurkus, “Camera Lucida and Affect: Beyond Representation,” Photographies 7, 1 (2014): 67 – 83.
[16] Mitchell Whitelaw, “Art Against Information: Case Studies in Data Practice.”
[17] The Dumpster, “About the Project” [Online], http://artport.whitney.org/commissions/thedumpster/about.html (accessed 4 February 2015).
[18] Jonathan Harris, “We Feel Fine” [Online], http://www.number27.org/wefeelfine (accessed 4 February 2015).
[19] Mitchell Whitelaw, “Art Against Information: Case Studies in Data Practice.”
[20] Ibid.
[21] The Dumpster, “About the Project.”
[22] This latter claim is somewhat dubious given that the project’s own website says “at least half” of these postings are from American teenagers between the ages of 13 and 19, which implies that around half may very well not be. It also disproportionately reflects the experiences of female bloggers, with “approximately seventy per cent of the bloggers identified as female, while roughly fifteen per cent were identified as male.” See: The Dumpster, “About the Project.”
[23] Jonathan Harris, “We Feel Fine.”
[24] Mitchell Whitelaw, “Art Against Information: Case Studies in Data Practice.”
[25] danah boyd and Kate Crawford, “Six Provocations for Big Data,” (paper presented at A Decade in Internet Time: Symposium on the Dynamics of Internet and Societies, Oxford, UK, 2011), http://ssrn.com/abstract=1926431 (accessed 11 December 2014).
[26] Ibid.
[27] Ibid.
[28] Mitchell Whitelaw, “Art Against Information: Case Studies in Data Practice.”
[29] Braxton Soderman, “The Index and the Algorithm” Differences 18, 1 (2007): 153 – 186.
[30] Clearly this is complicated by what might be termed ‘remote photography’: photographs of a location where the photographer is not physically present, such as photographs taken via webcam or drone.
[31] Marie Shurkus, “Camera Lucida and Affect: Beyond Representation.”
[32] Ingrid Hölzl, “Blast Off Photography: Nancy Davenport and Expanded Photography,” History of Photography 35, 1 (2011): 33 – 43.
[33] Nathan Jurgenson, “View from Nowhere” [Online], The New Inquiry, 9 September 2014, http://thenewinquiry.com/essays/view-from-nowhere (accessed 11 December 2014).