Supplementary materials for the publication "From sunrise to sunset: Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media" (PLOS)

datacite.FundingReference.funderName
datacite.FundingReference.funderName

Deutsche Forschungsgemeinschaft

Contributing person
datacite.contributor.ProjectLeader

Burghardt, Dirk (orcid: 0000-0003-2949-4887)

datacite.description.TableOfContents
datacite.description.TableOfContents

S10 Dataset (S10.zip): Anonymized data (CSV File with HLL sets) to reproduce results using the code in Jupyter Notebooks (S1-S9).
flickr_sunrise_hll.csv: Observed Frequencies Usercount/Sunrise for Flickr (HLL) (2.83 MB)
flickr_sunset_hll.csv: Observed Frequencies Usercount/Sunset for Flickr (HLL) (2.91 MB)
flickr_all_hll.csv: Expected Frequencies Usercount for Flickr (HLL) (19.69 MB)
instagram_sunrise_hll.csv: Observed Frequencies Usercount/Sunrise for Instagram (HLL) (4.14 MB)
instagram_sunset_hll.csv: Observed Frequencies Usercount/Sunset for Instagram (HLL) (9.10 MB)
instagram_random_hll.csv: Expected Frequencies Usercount for Instagram (HLL) (19.51 MB)
2020-04-07_Flickr_Sunrise_World_CCBy.csv: Flickr geotagged Creative Commons Sample Photos (Metadata) for Sunrise (7.31 MB)
2020-04-07_Flickr_Sunset_World_CCBy.csv: Flickr geotagged Creative Commons Sample Photos (Metadata) for Sunset (25.0 MB)
20210202_FLICKR_SUNSET_random_country_tf_idf.csv: TF-IDF Scores for Flickr Sunset (59 KB)
20211029_FLICKR_SUNSET_random_country_cosine_similarity_binary.csv: Binary Cosine Similarity for Flickr Sunset (1.0 MB)
flickr_sunset_terms_country.csv: Flickr Sunset User Terms grouped by distinct Country (su_a3 Code) (151 MB)
flickr-sunrise-months.csv: Flickr Sunrise (postcount) HLL data for each month (0.36 MB)
flickr-sunset-months.csv: Flickr Sunset (postcount) HLL data for each month (0.36 MB)
flickr-terms.csv: Flickr Postcount (HLL) per search term (46.8 KB)
instagram-terms.csv: Instagram Postcount (HLL) per search term (49.6 KB)
flickr-all.csv: Flickr total Postcount (HLL) (2.54 KB)
This repository contains a series of nine notebooks (release_v1.0.0.zip):
S1: the grid aggregation notebook (01_gridagg.ipynb) is used to aggregate data from HLL sets at GeoHash 5 to a 100x100 km grid
S2: the visualization notebook (02_visualization.ipynb) is used to create interactive maps, with additional information shown on hover
S3: the chimaps notebook (03_chimaps.ipynb) shows how to compute the chi square test per bin and event (sunset/sunrise)
S4: the results notebook (04_combine.ipynb) shows how to combine results from sunset/sunrise into a single interactive map
S5-S9: Notebooks 5 to 9 are used for creating additional graphics and statistics
HTML conversions of the notebooks:
S1 Jupyter Notebook: 01_grid_agg.html
S2 Jupyter Notebook: 02_visualization.html
S3 Jupyter Notebook: 03_chimaps.html
S4 Jupyter Notebook: 04_combine.html
S5 Jupyter Notebook: 05_countries.html
S6 Jupyter Notebook: 06_semantics.html
S7 Jupyter Notebook: 07_time.html
S8 Jupyter Notebook: 08_relationships.html
S9 Jupyter Notebook: 09_statistics.html

Documentation of the data
datacite.description.TechnicalInfo

Resource Type: The data contains generalized information on people's public responses to sunset and sunrise on social media. At query time, the data was statistically abstracted using the probabilistic data structure (PDS) HyperLogLog (HLL). HLL estimates the number of distinct items in a set through an irreversible approximation, which prevents identification of individual users from the collected data and significantly improves data processing performance.
Methods: We use a workflow based on HyperLogLog (HLL) that was first demonstrated by Dunkel et al. [1], who studied the user frequency of worldwide Flickr posts and quantified the effects on privacy. HLL allowed us to reduce the data collection footprint to quantitative measurements early in the process. Consequently, the study illustrated here can be repeated without the need to store raw data, providing both performance and privacy benefits [1]. All quantities available through this data repository and reported in the paper are estimates, with guaranteed error bounds of ±2.30% [1].
Data Processing: Throughout the project, we used the HLL functions union, intersection and cardinality estimation to generate the results shared in the publication. The initial data is reduced to a coarser ‘data collection granularity’ based on the HLL union function, which is sufficient for worldwide analysis. For coordinates, this means that we ‘snap’ points to a grid using a GeoHash precision of 5 (see [2]), which corresponds to an average aggregation distance of about four kilometers. Similarly, to explore temporal distributions, dates are grouped into distinct months and years. Distinct terms are selected from the post body, the post title, and tags or hashtags, and are used to explore associated semantics (what). From this initial data collection, measures are aggregated stepwise to (1) a 100x100 km grid, (2) country level, and (3) worldwide level.
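The HLL operations named above (union, cardinality estimation, and intersection derived from both) can be illustrated with a toy implementation. This is only a minimal sketch under stated assumptions: the class `ToyHLL`, the SHA-1 based hashing, the register count and the synthetic `user-…` identifiers are all hypothetical and chosen for illustration. It is not the postgresql-hll implementation used for this dataset and cannot read the serialized HLL sets in the CSV files.

```python
import hashlib
import math


class ToyHLL:
    """Illustrative HyperLogLog sketch (NOT compatible with postgresql-hll)."""

    def __init__(self, p: int = 10):
        self.p = p                      # number of bits used as register index
        self.m = 1 << p                 # number of registers (1024 for p=10)
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Derive a 64-bit hash; the original item is not stored anywhere.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                     # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.registers[index] = max(self.registers[index], rank)

    def union(self, other: "ToyHLL") -> "ToyHLL":
        # Union is lossless: keep the register-wise maximum.
        merged = ToyHLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def cardinality(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)          # bias correction, m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical example: two overlapping groups of synthetic user ids
sunset = ToyHLL()
for i in range(3000):
    sunset.add(f"user-{i}")

sunrise = ToyHLL()
for i in range(2000, 5000):
    sunrise.add(f"user-{i}")

both = sunset.union(sunrise)
union_estimate = both.cardinality()
# Intersection estimated via inclusion-exclusion
intersection_estimate = sunset.cardinality() + sunrise.cardinality() - union_estimate
print(round(union_estimate), round(intersection_estimate))
```

Because the union only keeps register-wise maxima, merging two sketches gives exactly the sketch that would result from adding all items to one set. This property is what makes stepwise aggregation from fine to coarse units possible without access to raw data. Intersection estimates obtained by inclusion-exclusion accumulate the errors of three estimates and are therefore less precise.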
We chose a 100 km resolution as a balance for the worldwide analysis, after testing with both 50 km and 200 km. Notebooks (S1–S9) allow for exploration of results at arbitrary resolutions and extents. The count of unique elements (i.e. the estimated number of users) is used for visualizing relationships. We chose the signed chi value to capture over- and underrepresentation of sunset and sunrise with respect to the overall use of social media, rather than visualizing absolute counts [47–48,49 p156]. We use a spatial formulation of signed chi values as proposed in an exploratory analysis of social media by Clarke, Wood, Dykes & Slingsby [3]. Finally, we explore semantic patterns based on ranked terms for each country, using term frequency-inverse document frequency (TF-IDF) and binary cosine similarity to compare semantics between countries.
[1]: Dunkel A, Löchner M, Burghardt D. Privacy-aware visualization of volunteered geographic information (VGI) to analyze spatial activity: a benchmark implementation. ISPRS Int J Geo-Information. 2020;9. doi:10.3390/ijgi9100607
[2]: Ruppel P, Küpper A. Geocookie: A space-efficient representation of geographic location sets. J Inf Process. 2014;22: 418–424. doi:10.2197/ipsjjip.22.418
[3]: Clarke K, Wood J, Dykes J, Slingsby A. Interactive Visual Exploration of a Large Spatio-Temporal Dataset: Reflections on a Geovisualization Mashup. IEEE Trans Vis Comput Graph. 2007;13: 1176–1183.
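The signed chi value mentioned above can be sketched as follows. This is a minimal illustration, assuming the commonly used formulation chi = (observed × norm − expected) / sqrt(expected), where norm scales the observed totals to the expected totals. The function names and the example counts are hypothetical and not taken from the notebooks; see S3 (03_chimaps) for the computation actually used.

```python
import math


def normalization_factor(observed, expected):
    """Scale factor that makes the observed total equal the expected total."""
    return sum(expected) / sum(observed)


def signed_chi(observed, expected, norm):
    """Signed chi for one bin; positive = over-, negative = underrepresented."""
    if expected <= 0:
        return None  # no expected activity in this bin, chi is undefined
    return (observed * norm - expected) / math.sqrt(expected)


# Hypothetical user counts for two grid bins
observed = [40, 10]      # e.g. estimated users reacting to sunset
expected = [100, 100]    # e.g. estimated users overall

norm = normalization_factor(observed, expected)   # 200 / 50 = 4.0
chi_values = [signed_chi(o, e, norm) for o, e in zip(observed, expected)]
print(chi_values)  # [6.0, -6.0]
```

In this example both bins have the same expected count, so the bin with more observed users is overrepresented (positive chi) and the other is underrepresented (negative chi), regardless of the absolute size of the two datasets.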

References to related material
datacite.relatedItem.References

https://gitlab.vgiscience.de/lbsn/tools/jupyterlab

References to related material
datacite.relatedItem.References

doi:10.3390/ijgi9100607

References to related material
datacite.relatedItem.References

https://gitlab.vgiscience.de/ad/sunset-sunrise-paper

References to related material
datacite.relatedItem.References

doi:10.2197/ipsjjip.22.418

Description of the data
datacite.resourceType

The workflow to load and process the data provided is available in Jupyter Notebooks or the respective HTML conversions of notebooks.

Type of the data
datacite.resourceTypeGeneral

Other

Type of the data
datacite.resourceTypeGeneral

Software

Type of the data
datacite.resourceTypeGeneral

Dataset

Total size of the dataset
datacite.size

287927171

Author
dc.contributor.author

Dunkel, Alexander

Author
dc.contributor.author

Burghardt, Dirk

Author
dc.contributor.author

Hartmann, Maximilian

Author
dc.contributor.author

Purves, Ross

Author
dc.contributor.author

Hauthal, Eva

Upload date
dc.date.accessioned

2023-01-18T16:18:44Z

Publication date
dc.date.available

2023-01-18T16:18:44Z

Publication date
dc.date.available

2026-06-10T15:43:48Z

Date of data creation
dc.date.created

2017-2022

Publication date
dc.date.issued

2023-01-18

Abstract of the dataset
dc.description.abstract

Events profoundly influence human-environment interactions. Through repetition, some events manifest and amplify collective behavioral traits, which significantly affects landscapes and their use, meaning, and value. However, the majority of research on reactions to events focuses on case studies, based on spatial subsets of data. This makes it difficult to put observations into context and to isolate sources of noise or bias found in data. As a result, inclusion of perceived aesthetic values, for example, in cultural ecosystem services, as a means to protect and develop landscapes, remains problematic. In this work, we focus on human behavior worldwide by exploring global reactions to sunset and sunrise using two datasets collected from Instagram and Flickr. By focusing on the consistency and reproducibility of results across these datasets, our goal is to contribute to the development of more robust methods for identifying landscape preference using geo-social media data, while also exploring motivations for photographing these particular events. Based on a four-facet context model, reactions to sunset and sunrise are explored for Where, Who, What, and When. We further compare reactions across different groups, with the aim of quantifying differences in behavior and information spread. Our results suggest that a balanced assessment of landscape preference across different regions and datasets is possible, which strengthens representativity and supports exploring the How and Why in particular event contexts. The process of analysis is fully documented, allowing transparent replication and adaptation to other events or datasets. This deposit encompasses both code (Jupyter notebooks) and data (abstracted using HyperLogLog). Please see the git repository for any further information: https://gitlab.vgiscience.de/ad/sunset-sunrise-paper

Public reference to this page
dc.identifier.uri

https://opara.zih.tu-dresden.de/handle/123456789/2625

Public reference to this page
dc.identifier.uri

https://doi.org/10.25532/OPARA-200

dc.language
dc.language

eng

Publisher
dc.publisher

Technische Universität Dresden

Licence
dc.rights

Attribution-NonCommercial 4.0 International

URI of the licence text
dc.rights.uri

http://creativecommons.org/licenses/by-nc/4.0/

Specification of the discipline(s)
dc.subject.classification

1::12::111

Specification of the discipline(s)
dc.subject.classification

4::44::409

Specification of the discipline(s)
dc.subject.classification

3::34

Specification of the discipline(s)
dc.subject.classification

4::44::408

Specification of the discipline(s)
dc.subject.classification

1::12

Specification of the discipline(s)
dc.subject.classification

4::44::409::409-02

Specification of the discipline(s)
dc.subject.classification

1::12::110

Title of the dataset
dc.title

Supplementary materials for the publication "From sunrise to sunset: Exploring landscape preference through global reactions to ephemeral events captured in georeferenced social media" (PLOS)

dc.title.alternative
dc.title.alternative

Sunset and Sunrise grid aggregation & chi

Research instruments
opara.descriptionInstrument

Carto-Lab Docker v0.9.0 (https://gitlab.vgiscience.de/lbsn/tools/jupyterlab)

Underlying research object
opara.descriptionObject.Other

Public online reactions to the sunset and sunrise.

Software
opara.descriptionSoftware.Other

JupyterLab (Version v3.5.0)

Software
opara.descriptionSoftware.Other

Postgres (Version 13)

Software
opara.descriptionSoftware.Other

Jupytext (Version 1.14.0)

Software
opara.descriptionSoftware.ResourceProcessing

mapclassify (Version 2.3.0)

Software
opara.descriptionSoftware.ResourceProcessing

ipython (Version 7.18.1)

Software
opara.descriptionSoftware.ResourceProcessing

Cartopy (Version 0.18.0)

Software
opara.descriptionSoftware.ResourceProcessing

holoviews (Version 1.13.4)

Software
opara.descriptionSoftware.ResourceProcessing

numpy (Version 1.19.1)

Software
opara.descriptionSoftware.ResourceProcessing

matplotlib (Version 3.3.2)

Software
opara.descriptionSoftware.ResourceProcessing

geoviews (Version 1.8.1)

Software
opara.descriptionSoftware.ResourceProcessing

bokeh (Version 2.2.1)

Software
opara.descriptionSoftware.ResourceProcessing

geopandas (Version 0.8.1)

Software
opara.descriptionSoftware.ResourceProcessing

Fiona (Version 1.8.17)

Software
opara.descriptionSoftware.ResourceProcessing

Shapely (Version 1.7.1)

Software
opara.descriptionSoftware.ResourceProcessing

Python (Version 3.7)

Software
opara.descriptionSoftware.ResourceProcessing

pandas (Version 1.1.2)

Software
opara.descriptionSoftware.ResourceProcessing

xarray (Version 0.16.1)

Software
opara.descriptionSoftware.ResourceProcessing

pyproj (Version 2.6.1.post1)

Software
opara.descriptionSoftware.ResourceProduction

postgresql-hll (Version v2.14)

Project abstract
opara.project.description

"Geovisual analysis of VGI for understanding people’s behaviour in relation to multi-faceted context" Volunteered Geographic Information (VGI) in the form of actively and passively generated spatial content offers extensive potential for a wide range of applications. Realising this potential however requires methods which take account of the specific properties of such data, for example its heterogeneity, quality, subjectivity, spatial resolution and temporal relevance. The creation and production of such content through social media platforms is an expression of human behaviour, and as such influenced strongly by events and external context. In this project we will develop geovisual analysis methods which show how actors interact in LBSM, and how their interactions influence, and are influenced by, their physical and social environment and relations.

Project title
opara.project.title

EvaVGI

Files

Original bundle

Name:
S10.zip
Size:
87.14 MB
Format:
Description:
S10 Dataset: Anonymized data (CSV File with HLL sets) to reproduce results using the code in Jupyter Notebooks (S1-S9).
Name:
release_v1.0.0.zip
Size:
187.45 MB
Format:
Description:
Release file of the git repository, including Notebooks (.ipynb), code (.py) and results (figures, html, pdf, svg).