2025 WS: Michael Wiegand

22.09.2025

Creating a compliments dataset from social media

Compliments serve as social glue and express politeness, which is especially valuable today, as interpersonal interactions, particularly online, are becoming harsher. However, compliments can also be misunderstood and may cause offense or irritation. They can be used strategically, for example to set a friendly tone before making a request, and the speaker is not always sincere. A virtual assistant that could recognize such situations and clarify what a compliment really means would therefore be of great advantage.

The primary goal of this research project is to provide raw data on compliments from social media platforms. Not only the compliments themselves but also their surrounding context is of interest. Ideally, the entire conversation or post thread in which a compliment occurs should be extracted. The focus of this project is on the automatic extraction of large quantities of data, ideally of high quality. This can be achieved by exploiting the fact that compliments are speech acts that often elicit formulaic responses, such as "I'm flattered" or "That's so kind of you": a reply containing such a response suggests that the message it answers is a compliment. This project does not involve manual annotation, except perhaps on a very small scale.
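The response-based extraction idea can be illustrated with a minimal sketch. It assumes that messages are already available as dictionaries with the hypothetical fields id, parent_id and text; the cue list, the function name and the sample thread are illustrative assumptions, not part of the project specification. Only the first two cue phrases are taken from the description above.

```python
import re

# Cue phrases that signal a response to a compliment. The first two come from
# the project description; the third is an assumed, illustrative addition.
RESPONSE_CUES = re.compile(
    r"\b(i['’]m flattered|that['’]s so kind of you|thanks for the compliment)\b",
    re.IGNORECASE,
)


def find_compliment_candidates(messages):
    """Return (candidate, response) pairs where the response matches a cue.

    Each message is a dict with the assumed keys 'id', 'parent_id', 'text'.
    The parent of a matching reply is treated as a compliment candidate.
    """
    by_id = {m["id"]: m for m in messages}
    pairs = []
    for msg in messages:
        parent = by_id.get(msg.get("parent_id"))
        if parent is not None and RESPONSE_CUES.search(msg["text"]):
            pairs.append((parent, msg))
    return pairs


# Invented sample thread, for illustration only.
thread = [
    {"id": 1, "parent_id": None, "text": "Your talk was brilliant!"},
    {"id": 2, "parent_id": 1, "text": "Aw, I'm flattered. Thank you!"},
    {"id": 3, "parent_id": 1, "text": "Where can I find the slides?"},
]
candidates = find_compliment_candidates(thread)
```

In this sketch, the reply with id 2 matches a cue, so the message it answers (id 1) is returned as a compliment candidate together with the reply. A real pipeline would likely need a much larger cue inventory and some filtering of false positives.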

Prerequisites for students:

Experience with Python
Familiarity with common social media platforms, such as X (formerly Twitter)
Competence in processing hierarchical data structures, for example extracting, representing, and persistently storing thread-based hierarchies from online forums using tree- or graph-based representations
Good knowledge of data formats, such as CSV, JSON or XML
Willingness to work with unfamiliar Python packages, such as web scraping libraries and API clients
Willingness to experiment with different tools for scraping content from various platforms
Openness to working with language data
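As an illustration of the kind of hierarchical processing mentioned in the prerequisites, the following minimal sketch turns a flat list of posts into a nested thread tree and serializes it as JSON. The field names (id, parent_id, text, replies), the function name and the sample posts are assumptions made for this example; actual platform data will look different.

```python
import json
from collections import defaultdict


def build_thread_tree(posts):
    """Nest a flat list of posts into a list of root posts with replies.

    Each post is a dict with the assumed keys 'id', 'parent_id', 'text';
    root posts have parent_id None.
    """
    children = defaultdict(list)
    for post in posts:
        children[post["parent_id"]].append(post)

    def attach(post):
        # Recursively collect the replies to this post.
        return {
            "id": post["id"],
            "text": post["text"],
            "replies": [attach(child) for child in children[post["id"]]],
        }

    return [attach(root) for root in children[None]]


# Invented sample posts, for illustration only.
posts = [
    {"id": 1, "parent_id": None, "text": "Love the new design!"},
    {"id": 2, "parent_id": 1, "text": "That's so kind of you."},
    {"id": 3, "parent_id": 2, "text": "Well deserved."},
]
tree = build_thread_tree(posts)

# JSON keeps the nesting intact; json.dump could write it to a file instead.
serialized = json.dumps(tree, indent=2)
```

JSON is used here because it represents nesting directly. A graph-based representation or a database may be preferable for very large threads.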

Project open to: Business Analytics, Data Science, Digital Humanities (but students should be comfortable with coding in Python)

Number of students: 1-4