Supplementary materials for the publication "From sunrise to sunset: Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media"
Subtitle: Sunset and Sunrise grid aggregation & chi
Metadata
Additional titles | Subtitle: Sunset and Sunrise grid aggregation & chi | |
Other contributing persons, institutions or companies | DFG - Funder | |
Other contributing persons, institutions or companies | Burghardt, Dirk (ORCID: 0000-0003-2949-4887) - ProjectLeader | |
Person(s) responsible for the content of the research data | Dunkel, Alexander (ORCID: 0000-0003-1157-7967) | |
Person(s) responsible for the content of the research data | Burghardt, Dirk (ORCID: 0000-0003-2949-4887) | |
Person(s) responsible for the content of the research data | Hartmann, Maximilian | |
Person(s) responsible for the content of the research data | Purves, Ross | |
Person(s) responsible for the content of the research data | Hauthal, Eva (ORCID: 0000-0001-8917-600X) | |
Description of further data processing | Throughout the project, we used the HLL functions union, intersection, and cardinality estimation to generate the results reported in the publication. The initial data is reduced to a coarser 'data collection granularity' using the HLL union function, which is sufficient for worldwide analysis. For coordinates, this means that we 'snap' points to a grid using GeoHash precision 5 (see [2]), which corresponds to an average aggregation distance of about four kilometers. Similarly, to explore temporal distributions, dates are grouped into distinct months and years. Distinct terms are selected from the post body, the post title, and tags or hashtags, and are used to explore associated semantics (What). From this initial data collection, measures are aggregated stepwise to (1) a 100 × 100 km grid, (2) the country level, and (3) the worldwide level. We chose a 100 km resolution as a balance for the worldwide analysis after testing both 50 km and 200 km. Notebooks S1–S9 allow results to be explored at arbitrary resolutions and extents. The count of unique elements (i.e., the estimated number of users) is used to visualize relationships. Rather than visualizing absolute counts, we use the signed chi value to capture over- and underrepresentation of sunset and sunrise relative to the overall use of social media [47–48,49 p156]. We use a spatial formulation of signed chi values, as proposed in an exploratory analysis of social media by Clarke, Wood, Dykes & Slingsby [3]. Finally, we explore semantic patterns based on ranked terms for each country using term frequency–inverse document frequency (TF-IDF), and compare semantics between countries using binary cosine similarity.
[2]: Ruppel P, Küpper A. Geocookie: A space-efficient representation of geographic location sets. J Inf Process. 2014;22: 418–424. doi:10.2197/ipsjjip.22.418
[3]: Clarke K, Wood J, Dykes J, Slingsby A. Interactive Visual Exploration of a Large Spatio-Temporal Dataset: Reflections on a Geovisualization Mashup. IEEE Trans Vis Comput Graph. 2007;13: 1176–1183. | |
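A minimal sketch of a signed chi computation per grid bin, assuming the common formulation chi = (O - E) / sqrt(E). Here O is the estimated user count for the event (sunset or sunrise) in a bin, and E is the baseline count of all users in that bin (the "expected frequencies" files), rescaled so that both distributions share the same total. The function name `signed_chi`, the use of summed bin counts as totals, and setting empty baseline bins to 0 are assumptions for illustration; notebook S3 may normalize differently (for example, with HLL-estimated totals).

```python
import math
from typing import List, Sequence


def signed_chi(observed: Sequence[float], expected: Sequence[float]) -> List[float]:
    """Signed chi value per bin: (O - E) / sqrt(E).

    observed: estimated user counts for the event (e.g. sunset) per bin
    expected: estimated user counts for all posts per bin (the baseline)
    """
    # Rescale the baseline so that both distributions have the same total
    norm = sum(observed) / sum(expected)
    chi = []
    for o, e in zip(observed, expected):
        e_norm = e * norm
        # Positive: over-representation; negative: under-representation.
        # Bins without baseline data are set to 0 (an illustrative choice).
        chi.append((o - e_norm) / math.sqrt(e_norm) if e_norm > 0 else 0.0)
    return chi


print([round(v, 2) for v in signed_chi([30, 5, 15], [100, 100, 100])])
# [3.27, -2.86, -0.41]
```

Dividing by sqrt(E) rather than reporting O - E keeps bins with very large baselines from dominating the map, so sparsely used regions with a strong relative preference remain visible.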
Type of data collection | Other: Data query from public Application Programming Interfaces (APIs). | |
Research instruments used | Carto-Lab Docker v0.9.0 (https://gitlab.vgiscience.de/lbsn/tools/jupyterlab) | |
Underlying research objects | Other: Public online reactions to the sunset and sunrise. | |
Short description | Events profoundly influence human-environment interactions. Through repetition, some events manifest and amplify collective behavioral traits, which significantly affects landscapes and their use, meaning, and value. However, most research on reactions to events focuses on case studies based on spatial subsets of data. This makes it difficult to put observations into context and to isolate sources of noise or bias in the data. As a result, including perceived aesthetic values, for example in cultural ecosystem services, as a means to protect and develop landscapes remains problematic. In this work, we focus on human behavior worldwide by exploring global reactions to sunset and sunrise using two datasets collected from Instagram and Flickr. By focusing on the consistency and reproducibility of results across these datasets, our goal is to contribute to the development of more robust methods for identifying landscape preference using geo-social media data, while also exploring motivations for photographing these particular events. Based on a four-facet context model, reactions to sunset and sunrise are explored for Where, Who, What, and When. We further compare reactions across different groups, with the aim of quantifying differences in behavior and information spread. Our results suggest that a balanced assessment of landscape preference across different regions and datasets is possible, which strengthens representativity and supports exploring the How and Why in particular event contexts. The analysis process is fully documented, allowing transparent replication and adaptation to other events or datasets. The data package includes both code (Jupyter notebooks) and data (abstracted using HyperLogLog). For further information, see the Git repository: https://gitlab.vgiscience.de/ad/sunset-sunrise-paper | |
Methods or procedures applied | We use a workflow based on HyperLogLog (HLL) that was first demonstrated by Dunkel et al. [1] in a study of user frequency in worldwide Flickr posts, which also quantified the effects on privacy. HLL allowed us to reduce the data collection footprint to quantitative measurements early in the process. Consequently, the study presented here can be repeated without storing raw data, which provides both performance and privacy benefits [1]. All quantities available through this data repository and reported in the paper are estimates with a known error, a standard error of about ±2.30% [1].
[1]: Dunkel A, Löchner M, Burghardt D. Privacy-aware visualization of volunteered geographic information (VGI) to analyze spatial activity: a benchmark implementation. ISPRS Int J Geo-Information. 2020;9(10): 607. doi:10.3390/ijgi9100607 | |
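The ±2.30% figure matches the well-known HyperLogLog relative standard error of 1.04 / sqrt(m) for m = 2^11 = 2048 registers. The short sketch below assumes log2m = 11, which is the postgresql-hll default; the register count actually used in this study is not stated in this record. Because this is a standard error rather than a hard limit, individual estimates can occasionally deviate by more.

```python
import math


def hll_relative_standard_error(log2m: int) -> float:
    """Relative standard error 1.04 / sqrt(m) of a HyperLogLog estimate, m = 2**log2m."""
    return 1.04 / math.sqrt(2 ** log2m)


# Assumption: log2m = 11 (2048 registers), the postgresql-hll default
print(f"±{hll_relative_standard_error(11):.2%}")  # ±2.30%
```

Each additional bit of log2m doubles the number of registers (and the sketch size) but only reduces the error by a factor of sqrt(2), which is why moderate register counts are a common trade-off.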
Additional explanatory information on the data | The workflow for loading and processing the provided data is available as Jupyter notebooks and as HTML conversions of these notebooks. | |
Table of contents | S10 Dataset (S10.zip): Anonymized data (CSV files with HLL sets) to reproduce the results using the code in the Jupyter notebooks (S1–S9):
- flickr_sunrise_hll.csv: Observed frequencies, user count for sunrise, Flickr (HLL) (2.83 MB)
- flickr_sunset_hll.csv: Observed frequencies, user count for sunset, Flickr (HLL) (2.91 MB)
- flickr_all_hll.csv: Expected frequencies, user count, Flickr (HLL) (19.69 MB)
- instagram_sunrise_hll.csv: Observed frequencies, user count for sunrise, Instagram (HLL) (4.14 MB)
- instagram_sunset_hll.csv: Observed frequencies, user count for sunset, Instagram (HLL) (9.10 MB)
- instagram_random_hll.csv: Expected frequencies, user count, Instagram (HLL) (19.51 MB)
- 2020-04-07_Flickr_Sunrise_World_CCBy.csv: Flickr geotagged Creative Commons sample photos (metadata) for sunrise (7.31 MB)
- 2020-04-07_Flickr_Sunset_World_CCBy.csv: Flickr geotagged Creative Commons sample photos (metadata) for sunset (25.0 MB)
- 20210202_FLICKR_SUNSET_random_country_tf_idf.csv: TF-IDF scores for Flickr sunset (59 KB)
- 20211029_FLICKR_SUNSET_random_country_cosine_similarity_binary.csv: Binary cosine similarity for Flickr sunset (1.0 MB)
- flickr_sunset_terms_country.csv: Flickr sunset user terms grouped by distinct country (su_a3 code) (151 MB)
- flickr-sunrise-months.csv: Flickr sunrise post count (HLL) per month (0.36 MB)
- flickr-sunset-months.csv: Flickr sunset post count (HLL) per month (0.36 MB)
- flickr-terms.csv: Flickr post count (HLL) per search term (46.8 KB)
- instagram-terms.csv: Instagram post count (HLL) per search term (49.6 KB)
- flickr-all.csv: Flickr total post count (HLL) (2.54 KB)
This repository contains a series of nine notebooks (release_v1.0.0.zip):
- S1: The grid aggregation notebook (01_gridagg.ipynb) aggregates data from HLL sets at GeoHash precision 5 to a 100 × 100 km grid.
- S2: The visualization notebook (02_visualization.ipynb) creates interactive maps, with additional information shown on hover.
- S3: The chimaps notebook (03_chimaps.ipynb) shows how to compute the chi-square test per bin and event (sunset/sunrise).
- S4: The results notebook (04_combine.ipynb) shows how to combine the sunset and sunrise results into a single interactive map.
- S5–S9: Notebooks 5 to 9 create additional graphics and statistics.
HTML conversions of the notebooks:
- S1 Jupyter Notebook: 01_grid_agg.html
- S2 Jupyter Notebook: 02_visualization.html
- S3 Jupyter Notebook: 03_chimaps.html
- S4 Jupyter Notebook: 04_combine.html
- S5 Jupyter Notebook: 05_countries.html
- S6 Jupyter Notebook: 06_semantics.html
- S7 Jupyter Notebook: 07_time.html
- S8 Jupyter Notebook: 08_relationships.html
- S9 Jupyter Notebook: 09_statistics.html | |
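The grid aggregation step (S1) can be sketched as follows: each GeoHash-5 cell, represented by a projected centroid, is assigned to a 100 km bin, and the user sets of all cells in a bin are combined by union rather than by summing counts. This is a sketch under stated assumptions: centroids are assumed to be already projected to a metric coordinate system (the projection used in the notebooks is not given in this record), exact Python sets stand in for HLL sets (HLL union has the same semantics, computed approximately), and the names `grid_bin` and `aggregate_to_grid` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

GRID_SIZE_M = 100_000  # 100 km bin edge length in metres

Bin = Tuple[int, int]


def grid_bin(x: float, y: float, size: float = GRID_SIZE_M) -> Bin:
    """Return the (column, row) index of the bin containing projected point (x, y)."""
    return int(x // size), int(y // size)


def aggregate_to_grid(records: Iterable[Tuple[float, float, Set[str]]]) -> Dict[Bin, Set[str]]:
    """Union the user sets of all GeoHash cells that fall into the same grid bin.

    Python sets stand in for HLL sets here (hypothetical simplification).
    """
    bins: Dict[Bin, Set[str]] = defaultdict(set)
    for x, y, users in records:
        # Union (not sum): a user seen in several cells of one bin counts once
        bins[grid_bin(x, y)] |= users
    return dict(bins)


# Hypothetical GeoHash-5 cell centroids (already projected, metres) with user sets
cells = [
    (10_000.0, 20_000.0, {"a", "b"}),
    (90_000.0, 50_000.0, {"b", "c"}),
    (150_000.0, 20_000.0, {"a"}),
]
counts = {b: len(users) for b, users in aggregate_to_grid(cells).items()}
print(counts)  # {(0, 0): 3, (1, 0): 1}
```

Summing the per-cell counts for bin (0, 0) would give 4, because user "b" appears in two cells; the union yields the correct 3 distinct users. The same reasoning applies when aggregating further to country and worldwide levels.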
Additional keywords | data, hyperloglog, repository, code, jupyter | |
Language | eng | |
Year or period of creation | 2017-2022 | |
Year of publication | 2023 | |
Publisher | Technische Universität Dresden | |
References to supplementary materials | IsPartOf: 123456789/5791 (Handle) | |
References to supplementary materials | References: doi:10.3390/ijgi9100607 (DOI) | |
References to supplementary materials | References: doi:10.2197/ipsjjip.22.418 (DOI) | |
References to supplementary materials | References: https://gitlab.vgiscience.de/lbsn/tools/jupyterlab (URL) | |
References to supplementary materials | References: https://gitlab.vgiscience.de/ad/sunset-sunrise-paper (URL) | |
Content of the research data | Dataset, Software, Workflow: The data contains generalized information on people's public responses to sunset and sunrise on social media. At data query time, the data was statistically abstracted using the probabilistic data structure (PDS) HyperLogLog (HLL). HLL estimates the number of distinct items in a set by an irreversible approximation, which prevents the identification of individual users from the collected data and significantly improves data processing performance. | |
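To illustrate why HLL is both compact and irreversible, the following minimal HyperLogLog sketch stores only, per register, the maximum "rank" (position of the leftmost 1-bit) of hashed items; the items themselves are never kept. It also shows the union (register-wise maximum) and an intersection estimate via inclusion-exclusion, a common approach since HLL offers no exact intersection. This is an illustrative implementation under stated assumptions (BLAKE2b as a 64-bit hash, log2m = 11, classic bias and small-range corrections); it is not the postgresql-hll storage format or algorithm used to produce the dataset, and the class and function names are hypothetical.

```python
import hashlib
import math
from typing import List


class HyperLogLog:
    """Minimal HyperLogLog sketch for illustration (not the postgresql-hll format)."""

    def __init__(self, log2m: int = 11) -> None:
        self.p = log2m
        self.m = 1 << log2m  # number of registers
        self.registers: List[int] = [0] * self.m

    def add(self, item: str) -> None:
        # 64-bit hash of the item; only register maxima derived from it are stored
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        w = 64 - self.p
        idx = h >> w  # first p bits select the register
        rest = h & ((1 << w) - 1)  # remaining w bits
        rank = w - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        """Lossless union: register-wise maximum of two sketches."""
        if self.p != other.p:
            raise ValueError("sketches must use the same log2m")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def cardinality(self) -> float:
        """Estimate the number of distinct items added."""
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw


def intersection_estimate(a: HyperLogLog, b: HyperLogLog) -> float:
    """Inclusion-exclusion: |A and B| ~ |A| + |B| - |A union B|."""
    return a.cardinality() + b.cardinality() - a.union(b).cardinality()


users = HyperLogLog()
for i in range(10_000):
    users.add(f"user_{i}")
    users.add(f"user_{i}")  # duplicates do not change the registers
print(round(users.cardinality()))  # an estimate with about ±2.3% standard error
```

Because intersection estimates are differences of estimates, their absolute error grows with the sizes of the input sets, so small overlaps between large sets should be interpreted with care.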
Holder of usage rights | Technische Universität Dresden | |
Usage rights of the dataset | CC-BY-NC-4.0 | |
Software used | Resource Processing: Python 3.7 | |
Software used | Resource Processing: xarray 0.16.1 | |
Software used | Resource Processing: Shapely 1.7.1 | |
Software used | Resource Processing: pyproj 2.6.1.post1 | |
Software used | Resource Processing: pandas 1.1.2 | |
Software used | Resource Processing: numpy 1.19.1 | |
Software used | Resource Processing: matplotlib 3.3.2 | |
Software used | Resource Processing: mapclassify 2.3.0 | |
Software used | Resource Processing: ipython 7.18.1 | |
Software used | Resource Processing: holoviews 1.13.4 | |
Software used | Resource Processing: geoviews 1.8.1 | |
Software used | Resource Processing: geopandas 0.8.1 | |
Software used | Resource Processing: Fiona 1.8.17 | |
Software used | Resource Processing: Cartopy 0.18.0 | |
Software used | Resource Processing: bokeh 2.2.1 | |
Software used | Other: Postgres 13 | |
Software used | Resource Production: postgresql-hll v2.14 | |
Software used | Other: Jupytext 1.14.0 | |
Software used | Other: JupyterLab v3.5.0 | |
Detailed description of the subject area(s) | Social Cartography | |
Subject areas | Geological Science | de |
Subject areas | Computer Science | de |
Subject areas | Information Technology | de |
Subject areas | Psychology | de |
Subject areas | Software Technology | de |
Subject areas | Social Sciences | de |
Subject areas | Environmental Science and Ecology | de |
Subject areas | Behavioural Sciences | de |
Dataset title | Supplementary materials for the publication "From sunrise to sunset: Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media" |
This data package appears in the following collection:
- Supporting Information: From sunrise to sunset - Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media
This collection contains Supporting Information for the publication "From sunrise to sunset - Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media" (PLOS).