A balanced dataset of 5,000 Reddit sarcasm instances annotated with five complementary explanation types (cognitive, intent-based, contrastive, textual, and rule-based), generated via a systematic LLM pipeline and validated through human evaluation.
Detecting sarcasm is one of the most challenging tasks in natural language understanding. Unlike existing sarcasm datasets that provide only binary labels, SarcasmExplain-5K provides five types of natural language explanations for each sarcastic instance, enabling research in explainable AI, pragmatic language understanding, and human-AI communication.
The dataset is built on Reddit conversations (r/sarcasm), balanced across sarcastic and non-sarcastic instances, and enriched through a systematic GPT-4 pipeline designed to generate diverse, high-quality explanations from multiple linguistic perspectives.
Each sarcastic instance includes five complementary explanations. Human evaluation focuses on Cognitive and Intent-Based types, with Contrastive used for Cognitive XAI research.
Cognitive: describes the mental reasoning process, that is, what knowledge or belief the speaker invokes to express sarcasm.
Intent-Based: identifies the communicative intent, that is, what the speaker is trying to achieve socially or emotionally.
Contrastive: contrasts the sarcastic utterance with a sincere alternative to highlight the gap between literal and intended meaning.
Textual: analyses the rhetorical and linguistic features (word choice, tone, exaggeration) that signal sarcasm.
Rule-Based: lists concrete linguistic markers (punctuation, register shift, hyperbole) that can be formalised as rules.
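The five explanation types described above could be represented as one record per instance. The following is only a minimal sketch: the class name `SarcasmInstance` and every field name are hypothetical, and the released files may use different column names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The five explanation types named in the dataset description.
# The snake_case spellings are an assumption made for this sketch.
EXPLANATION_TYPES = ("cognitive", "intent_based", "contrastive", "textual", "rule_based")


@dataclass
class SarcasmInstance:
    """Hypothetical record layout; real column names may differ."""
    text: str
    is_sarcastic: bool
    cognitive: Optional[str] = None
    intent_based: Optional[str] = None
    contrastive: Optional[str] = None
    textual: Optional[str] = None
    rule_based: Optional[str] = None

    def explanations(self) -> Dict[str, str]:
        """Return the explanations that are present, keyed by type."""
        present = {}
        for name in EXPLANATION_TYPES:
            value = getattr(self, name)
            if value is not None:
                present[name] = value
        return present

    def has_all_explanations(self) -> bool:
        """True when all five explanation types are filled in."""
        return len(self.explanations()) == len(EXPLANATION_TYPES)
```

A check such as `has_all_explanations()` is one way to confirm, after loading, that each sarcastic instance carries all five explanations.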
This dataset is free and open for research. To maintain quality and gather human evaluation data, access requires completing 3 annotation forms (about 10 minutes total).
Three forms will be randomly assigned to you from the Cognitive and Intent-Based evaluation pools.
Each form takes ~3 minutes. You'll rate 10 sarcasm explanations and optionally write corrections.
At the end of each form, you'll receive a unique 6-character code. Enter all 3 codes below.
Your HuggingFace download link will appear immediately after code verification.
Open each form in a new tab, complete it, and note the 6-character code shown at the end.
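Before submitting, the three codes can be sanity-checked locally. This is a rough sketch under stated assumptions: the page only says that each code has 6 characters, so the alphanumeric alphabet, the case-insensitive distinctness check, and the function name `check_codes` are all illustrative guesses. The actual verification is done by the site and is not described here.

```python
import re

# Assumption: codes are 6 alphanumeric characters. The source only states the length.
CODE_PATTERN = re.compile(r"[A-Za-z0-9]{6}")
REQUIRED_CODES = 3  # one code per completed form


def check_codes(codes):
    """Return a list of format problems; an empty list means the codes look plausible."""
    problems = []
    cleaned = [code.strip() for code in codes]
    if len(cleaned) != REQUIRED_CODES:
        problems.append(f"expected {REQUIRED_CODES} codes, got {len(cleaned)}")
    for position, code in enumerate(cleaned, start=1):
        if not CODE_PATTERN.fullmatch(code):
            problems.append(f"code {position} is not 6 alphanumeric characters")
    # Assumption: each form yields a different code, compared case-insensitively.
    if len({code.upper() for code in cleaned}) != len(cleaned):
        problems.append("codes must be distinct")
    return problems
```

A format check like this only catches copy-paste slips such as stray spaces or a truncated code; it says nothing about whether a code is valid.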
Thank you for contributing to the evaluation. Your annotation helps improve the dataset quality.
🤗 Download on Hugging Face
Please cite this dataset if you use it in your research (see citation below).
Each explanation type has 255 evaluation forms, with each form covering 10 unique instances. Evaluators rate explanation clarity (1 to 5), agree or disagree with the LLM-generated explanation, and optionally provide corrected explanations when they disagree.
This contribute-to-access model ensures ongoing community-driven quality validation, a key methodological contribution of this work.
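Responses of this kind (a clarity score, an agree/disagree judgement, and an optional correction) could be summarised per explanation type roughly as follows. This is a sketch, not the authors' analysis: the `Rating` record, its field names, and the choice of summary statistics are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Rating:
    """One evaluator judgement; field names are hypothetical."""
    instance_id: str
    clarity: int                      # clarity score on the 1 to 5 scale
    agrees: bool                      # agrees with the LLM-generated explanation
    correction: Optional[str] = None  # optional corrected explanation


def summarize(ratings: List[Rating]) -> dict:
    """Compute mean clarity, agreement rate, and collect corrections."""
    if not ratings:
        raise ValueError("no ratings to summarize")
    for rating in ratings:
        if not 1 <= rating.clarity <= 5:
            raise ValueError(f"clarity out of range for {rating.instance_id}")
    return {
        "n": len(ratings),
        "mean_clarity": mean(r.clarity for r in ratings),
        "agreement_rate": sum(r.agrees for r in ratings) / len(ratings),
        # Corrections are only expected when the evaluator disagrees.
        "corrections": [r.correction for r in ratings if not r.agrees and r.correction],
    }
```

Keeping the corrections alongside the scores makes it easy to review the explanations that evaluators rejected.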
Human evaluation focuses on Cognitive and Intent-Based explanations, with Contrastive forms used for Cognitive XAI research.
@misc{mamun2025sarcasmexplain,
author = {Mamun, Maliha Binte},
title = {SarcasmExplain-5K: A Multi-Perspective Sarcasm Explanation Dataset},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/datasets/maliha/sarcasm-explain-5k},
note = {Independent research. Contact: [email protected]}
}