📄 NLP Dataset · 2025

SarcasmExplain-5K:
Multi-Perspective Sarcasm
Explanation Dataset

A balanced dataset of 5,000 Reddit sarcasm instances annotated with five complementary explanation types (cognitive, intent-based, contrastive, textual, and rule-based), generated via a systematic LLM pipeline and validated through human evaluation.

👤 Author: Maliha Binte Mamun
📅 Released: 2025
🏛 Affiliation: Independent Researcher
5,000 Total Instances
2,500 Sarcastic
2,500 Non-Sarcastic
5 Explanation Types
255 Evaluation Forms
Reddit Data Source

About the Dataset

Detecting sarcasm is one of the most challenging tasks in natural language understanding. Unlike existing sarcasm datasets that provide only binary labels, SarcasmExplain-5K provides five types of natural language explanations for each sarcastic instance, enabling research in explainable AI, pragmatic language understanding, and human-AI communication.

The dataset is built on Reddit conversations (r/sarcasm), balanced across sarcastic and non-sarcastic instances, and enriched through a systematic GPT-4 pipeline designed to generate diverse, high-quality explanations from multiple linguistic perspectives.

Explanation Types

Each sarcastic instance includes five complementary explanations. Human evaluation focuses on Cognitive and Intent-Based types, with Contrastive used for Cognitive XAI research.

Cognitive ★ Evaluated

Why does the mind recognise it?

Describes the mental reasoning process: what knowledge or belief the speaker is invoking to express sarcasm.

Intent-Based ★ Evaluated

What is the speaker's goal?

Identifies the communicative intent: what the speaker is trying to achieve socially or emotionally.

Contrastive ★ XAI Research

What would sincerity look like?

Contrasts the sarcastic utterance with a sincere alternative to highlight the gap between literal and intended meaning.

Textual

How do the words signal it?

Analyses the rhetorical and linguistic features (word choice, tone, exaggeration) that signal sarcasm.

Rule-Based

What patterns identify it?

Lists concrete linguistic markers (punctuation, register shift, hyperbole) that can be formalised as rules.

Sample Entry

Comment
"Yeah, like the president is a big deal!"
Parent Comment
And even a prominent democrat defended him.
Cognitive Explanation
The speaker invokes the common knowledge that the presidency is a position of immense power and responsibility, using that as a foil to express dismissiveness or to mock someone who downplays the president's importance.
Intent-Based Explanation
The speaker is mocking whoever minimizes the president's significance, asserting their own belief that presidential authority is undeniable. Social goal: highlight absurdity of the counterpart's position.
Contrastive Explanation
A sincere version: "The president is indeed a significant figure." The sarcastic comment inverts this through dismissive phrasing and vocal emphasis on "big deal."

Access the Dataset

This dataset is free and open for research. To maintain quality and gather human evaluation data, access requires completing 3 annotation forms (≈10 minutes total).

01

Click "Get My Forms"

Three forms will be randomly assigned to you from the Cognitive and Intent-Based evaluation pools.

02

Complete all 3 forms

Each form takes ~3 minutes. You'll rate 10 sarcasm explanations and optionally write corrections.

03

Enter your completion codes

At the end of each form, you'll receive a unique 6-character code. Enter all 3 codes below.

04

Instant dataset access

Your Hugging Face download link will appear immediately after code verification.

Your Assigned Forms

Open each form in a new tab, complete it, and note the 6-character code shown at the end.

✅

Access Granted

Thank you for contributing to the evaluation. Your annotations help improve the quality of the dataset.

🤗 Download on Hugging Face →

Please cite this dataset if you use it in your research (see citation below).

Dataset Structure

CSV Columns
label → 0 (non-sarcastic) | 1 (sarcastic)
comment → The original Reddit comment
parent_comment → Conversational context
label_name → "sarcastic" | "non_sarcastic"
rephrased_comment → Non-sarcastic paraphrase
cognitive_explanation → Mental reasoning perspective
intent_based_explanation → Speaker's communicative goal
contrastive_explanation → Sarcastic vs. sincere comparison
textual_explanation → Linguistic analysis perspective
rule_based_explanation → Linguistic markers identified
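
As a rough illustration of working with these columns, the sketch below reads a CSV with Python's standard library, checks that all ten documented columns are present, and confirms that label and label_name agree on every row. The helper names (load_rows, label_counts, demo_rows) are hypothetical. The column order in the released file is not stated here, so only presence is checked. The second demo row is a made-up placeholder, and leaving the explanation columns empty for non-sarcastic rows is an assumption.

```python
import csv
import io
from collections import Counter

# Column names as listed under "CSV Columns".
EXPECTED_COLUMNS = [
    "label", "comment", "parent_comment", "label_name", "rephrased_comment",
    "cognitive_explanation", "intent_based_explanation",
    "contrastive_explanation", "textual_explanation", "rule_based_explanation",
]

# Documented mapping between the numeric label and its name.
LABEL_NAMES = {"0": "non_sarcastic", "1": "sarcastic"}


def load_rows(handle):
    """Read rows from an open CSV file object and run basic schema checks."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    for index, row in enumerate(rows):
        if LABEL_NAMES.get(row["label"]) != row["label_name"]:
            raise ValueError(f"row {index}: label and label_name disagree")
    return rows


def label_counts(rows):
    """Count rows per class, e.g. to confirm the sarcastic/non-sarcastic balance."""
    return Counter(row["label_name"] for row in rows)


def demo_rows():
    """Build a two-row in-memory CSV (second row is invented) and load it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=EXPECTED_COLUMNS)
    writer.writeheader()
    writer.writerow({
        "label": "1",
        "label_name": "sarcastic",
        "comment": "Yeah, like the president is a big deal!",
        "parent_comment": "And even a prominent democrat defended him.",
    })
    writer.writerow({
        "label": "0",
        "label_name": "non_sarcastic",
        "comment": "Placeholder non-sarcastic comment.",  # not from the dataset
        "parent_comment": "Placeholder parent comment.",   # not from the dataset
    })
    buffer.seek(0)
    return load_rows(buffer)
```

With the real file, the same function can be called as load_rows(open(path, newline="", encoding="utf-8")), where path is wherever the download was saved. On the full dataset, label_counts should report 2,500 rows for each class.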

Human Evaluation Framework

Each explanation type has 255 evaluation forms, with each form covering 10 unique instances. Evaluators rate explanation clarity (1–5), agree or disagree with the LLM-generated explanation, and optionally provide corrected explanations when they disagree.
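
To make the three evaluation signals concrete, here is a minimal sketch of how collected responses could be summarised per explanation type. The Rating record and the summarise function are hypothetical; the actual export format of the evaluation forms is not described on this page, so the field names are assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Rating:
    """One evaluator judgement of one explanation (assumed record layout)."""
    explanation_type: str             # e.g. "cognitive" or "intent_based"
    clarity: int                      # clarity score on the 1-5 scale
    agrees: bool                      # evaluator agrees with the LLM explanation
    correction: Optional[str] = None  # optional corrected explanation


def summarise(ratings):
    """Return count, mean clarity, agreement rate and corrections per type."""
    grouped = {}
    for rating in ratings:
        if not 1 <= rating.clarity <= 5:
            raise ValueError(f"clarity out of range: {rating.clarity}")
        grouped.setdefault(rating.explanation_type, []).append(rating)
    return {
        kind: {
            "n": len(items),
            "mean_clarity": mean(item.clarity for item in items),
            "agreement_rate": sum(item.agrees for item in items) / len(items),
            "corrections": sum(1 for item in items if item.correction),
        }
        for kind, items in grouped.items()
    }
```

A low agreement rate combined with many corrections for one explanation type would point to where the generated explanations most need revision.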

This contribute-to-access model ensures ongoing community-driven quality validation, a key methodological contribution of this work.

Human evaluation focuses on Cognitive and Intent-Based explanations, with Contrastive forms used for Cognitive XAI research.

Citation

@misc{mamun2025sarcasmexplain,
  author    = {Mamun, Maliha Binte},
  title     = {SarcasmExplain-5K: A Multi-Perspective Sarcasm Explanation Dataset},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/datasets/maliha/sarcasm-explain-5k},
  note      = {Independent research. Contact: [email protected]}
}