Typicality as a measure for analysis of geo-social media data

(a) Spatial distribution of emojis from sunset- and sunrise-related Instagram posts representing the user’s physical surroundings (montane, urban and maritime environments), determined by typicality. (b) Respective geographical conditions (elevation, urban zones, large lakes and coastline) in corresponding colours.

Objective:

The aim of this master's thesis is to investigate the typicality measure. The measure was developed in a research project on the analysis of geo-social media data and has so far been applied only in that project and in two master's theses. Because it delivered conclusive and promising results but also raised questions, its usage, potential and limitations are now to be examined in more depth.
Emoji clouds of frequency- and typicality-based top 20 emojis occurring in sunset- and sunrise-related Instagram posts.

Description:

When analysing data from geo-social media, researchers must decide which measure to use to identify patterns, manifestations or peaks. Absolute or relative frequency usually highlights only the most frequently used hashtags, emojis or terms, which are rarely the most characteristic ones for the subject of study. The typicality measure addresses this shortcoming. It combines two relative frequencies and is used for normalisation and the removal of bias, so that it highlights relative importance rather than sheer volume. Applying it requires at least one sub-dataset of a total dataset. The sub-dataset can be formed in different ways (e.g. temporally, thematically or spatially), and the measure can be applied to any kind of occurrence, not just emojis or hashtags.
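The description above does not spell out how the two relative frequencies are combined. The following is a minimal sketch, assuming the normalised difference (f_sub - f_total) / f_total, where f_sub is the relative frequency of an item in the sub-dataset and f_total its relative frequency in the total dataset. This form is consistent with the value range of -1 to +infinity mentioned further down, but the function names and the toy data are hypothetical and not taken from the project.

```python
from collections import Counter


def relative_frequencies(items):
    """Return the share of each distinct item among all items."""
    counts = Counter(items)
    n = sum(counts.values())
    return {item: count / n for item, count in counts.items()}


def typicality(sub_items, total_items):
    """Assumed form: (f_sub - f_total) / f_total for every item in the total dataset.

    Assumes the sub-dataset is drawn from the total dataset, so every item
    in the sub-dataset also occurs in the total dataset.
    """
    f_sub = relative_frequencies(sub_items)
    f_total = relative_frequencies(total_items)
    return {
        item: (f_sub.get(item, 0.0) - f_t) / f_t
        for item, f_t in f_total.items()
    }


# Hypothetical toy data: 100 emoji occurrences in total, 20 of them in the sub-dataset.
total = ["sun"] * 50 + ["heart"] * 35 + ["wave"] * 10 + ["moon"] * 5
sub = ["sun"] * 15 + ["heart"] * 2 + ["wave"] * 3

scores = typicality(sub, total)
for item, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {value:+.2f}")
```

In this toy example "sun" and "wave" receive the same positive score although "sun" is five times more frequent in the sub-dataset, which illustrates how the measure differs from a plain frequency ranking. An item that never occurs in the sub-dataset ("moon") ends up at the lower bound of -1.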

There are other approaches to normalisation and to measuring relative differences, such as tf-idf or the chi-square test, but their calculation is comparatively complex and partly bound to preconditions. By contrast, typicality is a straightforward and easily comprehensible measure. Nevertheless, its meaningfulness depends on the properties of the dataset used and on how the application of the measure is tailored to it.

This tailoring is to be investigated using several thematically different datasets from geo-social media, with a particular focus on the exclusion of hashtags or emojis with a very low absolute frequency. A pre-selection of the most frequently used hashtags, emojis or similar items is necessary because rarely used ones can yield high typicality values even though they cannot be considered typical within a dataset, but rather non-significant. The aim is to develop a standardised procedure for determining the threshold value of this pre-selection. The threshold is influenced, among other things, by the size of the total dataset and the sub-dataset, but also by the diversity of the hashtags or emojis they contain.

Another question concerns the effect of the extent of the total dataset on the typicality results. In other words, must the entire available dataset always be used as the total dataset, or might smaller spatial or temporal units be more appropriate? Selecting total datasets of various granularities while keeping the sub-dataset constant could help identify local and global hotspots. Further objects of investigation are appropriate visualisation methods for typicality, particularly in a spatial context, and how to handle the range of possible typicality values from -1 to +infinity by scaling.
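The low-frequency problem and the scaling question can be illustrated with a short sketch. It assumes the same form of typicality, (f_sub - f_total) / f_total, a pre-selection by a minimum absolute count in the total dataset, and relative frequencies that are still computed on the unfiltered data. The rescaling t / (t + 2) is only one conceivable option for mapping the range to [-1, 1) and is not presented here as the project's method. All names, the threshold value and the data are hypothetical.

```python
from collections import Counter


def typicality_with_preselection(sub_items, total_items, min_count):
    """Typicality (assumed form) for items occurring at least min_count times in total."""
    sub_counts = Counter(sub_items)
    total_counts = Counter(total_items)
    n_sub = sum(sub_counts.values())
    n_total = sum(total_counts.values())

    result = {}
    for item, count in total_counts.items():
        if count < min_count:
            continue  # pre-selection: skip items that are too rare to be meaningful
        f_sub = sub_counts[item] / n_sub
        f_total = count / n_total
        result[item] = (f_sub - f_total) / f_total
    return result


def scale_to_bounded_range(t):
    """One possible rescaling: maps -1 to -1, 0 to 0, and large values towards +1."""
    return t / (t + 2.0)


# Hypothetical data: "rare" occurs once overall, and that single use is in the sub-dataset.
total = ["sun"] * 600 + ["heart"] * 399 + ["rare"] * 1
sub = ["sun"] * 80 + ["heart"] * 19 + ["rare"] * 1

unfiltered = typicality_with_preselection(sub, total, min_count=1)
filtered = typicality_with_preselection(sub, total, min_count=5)

print(unfiltered)  # "rare" dominates with a value of about 9
print(filtered)    # "rare" is excluded by the threshold
print({item: scale_to_bounded_range(t) for item, t in filtered.items()})
```

A single occurrence is enough to push "rare" far above every other item, which is why the choice of threshold matters. How the threshold should depend on dataset size and diversity is exactly the open question described above, so the value of 5 used here carries no recommendation.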

Staff involved: Eva Hauthal, Sagnik Mukherjee

References:

  • Hauthal E., Burghardt D. & Dunkel A. (2021): Emojis as Contextual Indicants in Location-Based Social Media Posts. International Journal of Geo-Information 10 (6), Special Issue: Social Computing for Geographic Information Science.

  • Hauthal E., Mukherjee S. & Burghardt D. (2022): The typicality measure as a novel tool for normalising geo-social media data. Abstracts of the International Cartographic Association 5: 72.

  • Mukherjee S., Hauthal E. & Burghardt D. (2022): Analyzing the EU Migration Crisis as Reflected on Twitter. KN - Journal of Cartography and Geographic Information 72: 213-228.

  • Levi S. (2022): Emojis as Indicators of Spatial-Temporal-Thematic Developments in Geo-Social Media. Master Thesis, TU Dresden.

Domain(s):

Study Program(s):

  • MSc. Cartography (EXCLUSIVELY externally advertised)