youtube-travel-buzz-sentiment-classifier
🔹 Model Name
youtube-travel-buzz-sentiment-classifier
🔹 Model Description
A Korean multi-class sentiment classifier that labels travel-related YouTube comments as positive, neutral, or negative, decomposing travel buzz into sentiment signals for travel demand analysis.
🔹 Model Summary
This model performs three-class sentiment classification on Korean YouTube comments that have already been identified as travel-related.
Sentiment labels:
- 0: Negative
- 1: Neutral
- 2: Positive
Unlike conventional sentiment models, this classifier explicitly preserves neutral sentiment, which primarily captures information-seeking and intent-driven comments.
This design enables downstream analysis linking online discourse patterns to real-world travel demand signals.
The model is trained on LLM-generated synthetic comments designed to mimic the linguistic characteristics of real YouTube travel discussions.
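A minimal inference sketch, assuming the model is hosted at DalDream/youtube-travel-buzz-sentiment-classifier and that label ids follow the 0/1/2 mapping above (the example comments are hypothetical):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "DalDream/youtube-travel-buzz-sentiment-classifier"
LABELS = {0: "negative", 1: "neutral", 2: "positive"}  # assumed id-to-label mapping

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# Hypothetical travel-related comments (information-seeking vs. positive).
comments = [
    "오사카 3박 4일 경비 얼마나 들었나요?",
    "여기 진짜 또 가고 싶다 ㅠㅠ 최고였어요",
]

with torch.no_grad():
    inputs = tokenizer(comments, padding=True, truncation=True, return_tensors="pt")
    logits = model(**inputs).logits
    preds = logits.argmax(dim=-1).tolist()

for text, pred in zip(comments, preds):
    print(f"{LABELS[pred]:>8} | {text}")
```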
🔹 Intended Use
Primary Use Case
- Decomposing travel-related YouTube buzz into structured sentiment signals
- Supporting:
- Exploratory demand analysis
- Early-stage travel interest detection
- Trend-level behavioral research
Out-of-Scope Use
- Emotion detection beyond sentiment polarity
- Individual-level behavior prediction
- Standalone decision-making systems
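For the in-scope, aggregate use cases above, the sketch below shows one way per-comment predictions could be rolled up into trend-level sentiment shares. It assumes pandas; the column names and rows are hypothetical and only illustrate the shape of the analysis:

```python
import pandas as pd

# Hypothetical per-comment predictions: one row per classified comment.
df = pd.DataFrame({
    "published_week": ["2024-06-03", "2024-06-03", "2024-06-10", "2024-06-10"],
    "destination":    ["Osaka",      "Osaka",      "Osaka",      "Osaka"],
    "sentiment":      ["positive",   "neutral",    "neutral",    "negative"],
})

# Share of each sentiment class per destination and week (trend-level signal).
shares = (
    df.groupby(["destination", "published_week"])["sentiment"]
      .value_counts(normalize=True)
      .unstack(fill_value=0.0)
)
print(shares)
```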
🔹 Training Data
- Type: Synthetic Korean YouTube travel comments generated using multiple LLMs
- Labels:
- 0: Negative
- 1: Neutral
- 2: Positive
- Key Characteristics:
- Informal language, slang, typos, emojis
- Mixed sentence length and ambiguity
- Designed to approximate real-world YouTube comment noise
- Neutral comments intentionally modeled to represent questions, factual statements, and information-seeking behavior
Downstream analysis revealed that neutral sentiment often functions as a proxy for latent travel intent, particularly for emerging destinations.
Prompt design details and data generation strategy are documented in the associated GitHub repository.
🔹 Model Architecture
- Base model: monologg/koelectra-base-discriminator
- Task: Multi-class sentiment classification
- Tokenizer: KoELECTRA tokenizer
- Fine-tuning: Hugging Face Trainer API
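A minimal fine-tuning sketch with the Trainer API, assuming a CSV dataset with `text` and `label` (0/1/2) columns; the file names and hyperparameters are illustrative, not the exact values used:

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "monologg/koelectra-base-discriminator"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=3)

# Hypothetical CSV files with "text" and "label" columns.
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="koelectra-travel-sentiment",
    num_train_epochs=3,               # illustrative hyperparameters
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,              # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())
```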
🔹 Performance (Indicative)
- Overall Accuracy: ~96%
- Macro F1-score: ~96% (balanced synthetic validation set)
These metrics were obtained on a held-out synthetic validation set and reflect controlled experimental conditions.
Given the semantic ambiguity of short-form YouTube comments, the model is intended for trend-level and aggregate analysis rather than individual comment-level judgment.
Performance on real-world YouTube comments may differ due to distribution shift and unmodeled linguistic nuance.
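A minimal sketch of how accuracy and macro F1 could be computed on a held-out set, assuming lists of gold and predicted label ids (scikit-learn assumed; the example arrays are hypothetical):

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Hypothetical gold labels and predictions (0: negative, 1: neutral, 2: positive).
y_true = [2, 1, 0, 2, 1, 1, 0, 2]
y_pred = [2, 1, 0, 2, 2, 1, 0, 2]

print("accuracy :", accuracy_score(y_true, y_pred))
print("macro F1 :", f1_score(y_true, y_pred, average="macro"))
print(classification_report(y_true, y_pred, target_names=["negative", "neutral", "positive"]))
```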
🔹 Limitations
- Fine-grained emotional nuance is not modeled
- Synthetic data bias may persist in edge cases
- Not optimized for sarcasm-heavy or long-form comments
- Performance may degrade on real-world comments without additional fine-tuning on authentic data
🔹 Ethical Considerations
- No personal data used
- Outputs should be interpreted at the aggregate signal level, not as judgments about individual users or comments
🔹 Related Resources
- Full pipeline code and documentation: https://github.com/DalDream/youtube-travel-buzz-nlp-pipeline
- Upstream travel relevance classifier: https://huggingface.co/DalDream/youtube-travel-buzz-relevance-classifier
🔹 Citation / Attribution
This model was developed as part of a YouTube Travel Buzz Signal Extraction NLP pipeline for research and portfolio demonstration purposes.
Author / Contributions
- [DalDream] – Project lead for model strategy, pipeline design, model validation, and final documentation.
- [GY Yu] – LLM-based synthetic data generation, dataset construction, model training, and fine-tuning.
Note: This model is the result of a collaborative team project. Responsibilities are listed to clarify individual contributions.