youtube-travel-buzz-sentiment-classifier

🔹 Model Name

youtube-travel-buzz-sentiment-classifier


🔹 Model Description

A Korean multi-class sentiment classifier that decomposes travel-related YouTube comments into positive, neutral, and negative signals for travel demand analysis.


🔹 Model Summary

This model performs three-class sentiment classification on Korean YouTube comments that have already been identified as travel-related.

Sentiment labels:

  • 0: Negative
  • 1: Neutral
  • 2: Positive

Unlike binary (positive/negative) sentiment models, this classifier keeps neutral as a separate class. In this domain, neutral comments mainly capture information-seeking and intent-driven remarks, such as questions and factual statements.

This design enables downstream analysis linking online discourse patterns to real-world travel demand signals.

The model is trained on LLM-generated synthetic comments designed to mimic the linguistic characteristics of real YouTube travel discussions.
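The sketch below shows how one comment's raw output could be turned into a label under this scheme. It assumes the checkpoint returns three logits ordered by label id (0 = Negative, 1 = Neutral, 2 = Positive). The `ID2LABEL`, `softmax`, and `decode` names are illustrative and are not part of the released model. Confirm the actual mapping in the checkpoint's `config.json`.

```python
import math

# Hypothetical label map mirroring the card's scheme; verify against config.json.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, probability) for one comment's three logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]

label, p = decode([-1.2, 2.3, 0.4])
print(label, round(p, 3))  # neutral 0.848
```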


🔹 Intended Use

Primary Use Case

  • Decomposing travel-related YouTube buzz into structured sentiment signals
  • Supporting:
    • Exploratory demand analysis
    • Early-stage travel interest detection
    • Trend-level behavioral research

Out-of-Scope Use

  • Emotion detection beyond sentiment polarity
  • Individual-level behavior prediction
  • Standalone decision-making systems
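The model targets trend-level use rather than individual-level use, so predictions are usually pooled before anyone interprets them. The following is one hypothetical way to do that in plain Python. It assumes predicted label ids are already paired with a grouping key such as a destination, week, or channel. The function name and the sample destinations are invented for illustration.

```python
from collections import Counter, defaultdict

# Label ids follow the card: 0 = negative, 1 = neutral, 2 = positive.
LABELS = {0: "negative", 1: "neutral", 2: "positive"}

def sentiment_shares(records):
    """Aggregate (group, label_id) pairs into per-group label shares.

    `records` is an iterable of (group, label_id) tuples; `group` can be
    any hashable key, e.g. a destination name or a week identifier.
    """
    counts = defaultdict(Counter)
    for group, label_id in records:
        counts[group][label_id] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for labels that never occurred in this group.
        shares[group] = {name: counter[i] / total for i, name in LABELS.items()}
    return shares

# Hypothetical predictions, not real model output.
preds = [("Jeju", 2), ("Jeju", 1), ("Jeju", 1), ("Jeju", 0),
         ("Danang", 2), ("Danang", 2)]
for dest, s in sentiment_shares(preds).items():
    print(dest, s)
```

Interpreting shares per group instead of single predictions keeps the output at the aggregate signal level the card recommends.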

🔹 Training Data

  • Type: Synthetic Korean YouTube travel comments generated using multiple LLMs
  • Labels:
    • 0: Negative
    • 1: Neutral
    • 2: Positive
  • Key Characteristics:
    • Informal language, slang, typos, emojis
    • Mixed sentence length and ambiguity
    • Designed to approximate real-world YouTube comment noise
    • Neutral comments intentionally modeled to represent questions, factual statements, and information-seeking behavior

Downstream analysis revealed that neutral sentiment often functions as a proxy for latent travel intent, particularly for emerging destinations.

Prompt design details and data generation strategy are documented in the associated GitHub repository.


🔹 Model Architecture

  • Base model: monologg/koelectra-base-discriminator
  • Task: Multi-class sentiment classification
  • Tokenizer: KoELECTRA tokenizer
  • Fine-tuning: Hugging Face Trainer API
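The architecture above maps onto the standard `transformers` sequence-classification classes. This is a sketch that rests on two assumptions. First, the fine-tuned weights are published under the repository ID shown in the page's model tree. Second, the label order matches this card. `max_length=128` is an arbitrary choice, not a documented training setting. The imports are deferred into the functions so that the label maps can be inspected without `transformers` or `torch` installed.

```python
# Hypothetical label maps matching the card's scheme; confirm them against
# the published checkpoint's config.json before relying on them.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}
LABEL2ID = {name: i for i, name in ID2LABEL.items()}

BASE_MODEL = "monologg/koelectra-base-discriminator"
FINETUNED_MODEL = "DalDream/youtube-travel-buzz-sentiment-classifier"

def load_for_finetuning():
    """Load the KoELECTRA base model with a fresh 3-way classification head."""
    # Imported lazily so this module loads without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model

def classify(texts, max_length=128):
    """Predict label ids for a batch of comments with the fine-tuned model."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(FINETUNED_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(FINETUNED_MODEL)
    model.eval()  # disable dropout for inference
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()
```

The model returned by `load_for_finetuning` is what would be passed to `transformers.Trainer`. The card does not report training hyperparameters, so none are shown here.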

🔹 Performance (Indicative)

  • Overall Accuracy: ~96%
  • Macro F1-score: ~96% (balanced synthetic validation set)

These metrics were obtained on a held-out synthetic validation set and reflect controlled experimental conditions.
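Macro F1 averages the per-class F1 scores without weighting by class frequency. A weak neutral class would therefore pull it down even if overall accuracy stayed high. On a balanced validation set the two metrics tend to be close, which is consistent with the similar figures above. Below is a dependency-free sketch of both metrics. scikit-learn's `f1_score(..., average="macro")` computes the same macro average. The toy labels are invented and do not come from the card's validation set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 (0 = negative, 1 = neutral, 2 = positive)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2]
print(round(accuracy(y_true, y_pred), 3))  # 0.833
print(round(macro_f1(y_true, y_pred), 3))  # 0.822
```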

Given the semantic ambiguity of short-form YouTube comments, the model is intended for trend-level and aggregate analysis rather than individual comment-level judgment.

Performance on real-world YouTube comments may differ due to distribution shift and unmodeled linguistic nuance.


🔹 Limitations

  • Fine-grained emotional nuance is not modeled
  • Synthetic data bias may persist in edge cases
  • Not optimized for sarcasm-heavy or long-form comments
  • Performance may degrade on real-world comments without additional fine-tuning on authentic data


🔹 Ethical Considerations

  • No personal data used
  • Outputs should be interpreted at aggregate signal level, not individual judgment

🔹 Related Resources


🔹 Citation / Attribution

This model was developed as part of a YouTube Travel Buzz Signal Extraction NLP pipeline for research and portfolio demonstration purposes.

Author / Contributions

  • [DalDream] – Project lead for model strategy, pipeline design, model validation, and final documentation.
  • [GY Yu] – LLM-based synthetic data generation, dataset construction, model training, and fine-tuning.

Note: This model is the result of a collaborative team project. Responsibilities are listed to clarify individual contributions.

Model files: Safetensors, 0.1B params, tensor type F32

Model tree: DalDream/youtube-travel-buzz-sentiment-classifier is fine-tuned from monologg/koelectra-base-discriminator.