# 🧠 RQA: Reasoning Quality Analyzer (R1)
RQA is a judge model designed to evaluate the quality of reasoning in text.
It does not generate, rewrite, or explain content; instead, it assesses whether a text contains logical problems, and if so, what kind.
RQA is a judge, not a teacher and not a generator.
## What Problem Does RQA Solve?
Texts written by humans or LLMs can:
- sound coherent,
- use correct vocabulary,
- appear persuasive,
…but still contain logical problems that are:
- implicit,
- structural,
- hidden in argumentation.
RQA focuses strictly on reasoning quality, not on style, sentiment, or factual correctness.
## 🧩 Model Overview
| Property | Value |
|---|---|
| Model Type | Judge / Evaluator |
| Base Encoder | XLM-RoBERTa Large |
| Pooling | Mean pooling |
| Heads | 2 (binary + multi-label) |
| Language | Russian 🇷🇺 |
| License | MIT |
## 🧠 What the Model Predicts
RQA produces two independent signals that are combined at inference time:
### 1️⃣ Logical Issue Detection (Binary)

- `has_issue ∈ {false, true}`
- Calibrated probability available
- Designed to answer: "Does this text contain a reasoning problem?"
### 2️⃣ Error Type Signals (Multi-label)

The model estimates probabilities for specific error types:

- `false_causality`
- `unsupported_claim`
- `overgeneralization`
- `missing_premise`
- `contradiction`
- `circular_reasoning`
### ⚠️ Important

Error type probabilities are diagnostic signals, not mandatory labels.
They are surfaced only if `has_issue == true` during inference.
## 💡 Hidden Logical Problems (Key Concept)
RQA explicitly distinguishes between:
### 🔴 Explicit Logical Errors
Clearly identifiable fallacies:
- invalid causal inference
- circular reasoning
- contradictions
- unsupported claims
### 🟡 Hidden Logical Problems
Texts that are:

- argumentative or persuasive,
- structurally incomplete,
- reliant on implicit assumptions,

…but that do not contain a cleanly classifiable fallacy.
Examples:
- missing or unstated premises
- rhetorical generalizations
- context-dependent claims
Hidden problems are not misclassifications;
they are an intended diagnostic category.
## ⚙️ Inference Logic (Important)
The model uses decision logic on top of raw logits:
- Binary head decides whether a problem exists
- Error heads provide type-level evidence
- If `has_issue == false` but the error-type probabilities are non-zero, the text may be flagged as **borderline** or as a **hidden problem** (see the sketch after this section)
This prevents:
- false positive error labels,
- incoherent outputs,
- over-triggering on clean factual texts.
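
To make this concrete, here is a minimal Python sketch of that decision logic. The thresholds, the verdict names, and the disagreement definition are illustrative assumptions rather than the shipped configuration; the authoritative logic lives in the repository's `inference.py`.

```python
# Illustrative sketch of the decision logic described above.
# Thresholds and the "disagreement" definition are assumptions.
ERROR_TYPES = [
    "false_causality", "unsupported_claim", "overgeneralization",
    "missing_premise", "contradiction", "circular_reasoning",
]

def judge(p_issue: float, p_errors: list[float],
          issue_thr: float = 0.5, error_thr: float = 0.5) -> dict:
    has_issue = p_issue >= issue_thr
    active = {t: p for t, p in zip(ERROR_TYPES, p_errors) if p >= error_thr}
    # Assumed diagnostic: gap between the binary head and the strongest error head.
    disagreement = abs(p_issue - max(p_errors))
    if has_issue and active:
        verdict = "explicit_error"   # binary head fires and a type is nameable
    elif has_issue:
        verdict = "hidden_problem"   # a problem, but no cleanly classifiable fallacy
    elif max(p_errors) >= error_thr:
        verdict = "borderline"       # heads disagree; no type label is emitted
        active = {}
    else:
        verdict = "clean"
    return {"has_issue": has_issue, "verdict": verdict,
            "error_types": active, "disagreement": round(disagreement, 3)}

print(judge(0.97, [0.96, 0.10, 0.08, 0.12, 0.03, 0.05]))
```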
## 🏗️ Architecture Details

- Encoder: XLM-RoBERTa Large (pretrained weights preserved)
- Pooling: mean pooling (robust for long texts)
- Two independent projections:
  - binary reasoning head
  - multi-label error head
- Separate dropout and projections to reduce negative transfer (sketched below)
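
The description above maps naturally onto a small PyTorch module. The sketch below is a reconstruction under stated assumptions (class name, dropout rate, and head shapes are illustrative); the actual implementation ships in `modeling_rqa.py`.

```python
import torch.nn as nn
from transformers import AutoModel

NUM_ERROR_TYPES = 6  # the six error types listed earlier

def mean_pool(hidden_states, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

class DualHeadJudge(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-large", dropout=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Separate dropout + projection per head to reduce negative transfer.
        self.binary_dropout = nn.Dropout(dropout)
        self.binary_head = nn.Linear(hidden, 1)
        self.error_dropout = nn.Dropout(dropout)
        self.error_head = nn.Linear(hidden, NUM_ERROR_TYPES)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = mean_pool(out.last_hidden_state, attention_mask)
        return {
            "issue_logit": self.binary_head(self.binary_dropout(pooled)).squeeze(-1),
            "error_logits": self.error_head(self.error_dropout(pooled)),
        }
```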
## Training Philosophy

### Strict Data Contract

- Logical texts contain no errors
- Hidden-problem texts contain no explicit fallacies
- Invalid samples are removed, not auto-corrected (illustrated below)
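
As a rough illustration, the contract can be read as a validation filter over training samples. The field names and category values below are hypothetical, since the actual data pipeline is not published.

```python
# Explicit fallacies per the "Hidden Logical Problems" section above.
EXPLICIT_FALLACIES = {"false_causality", "unsupported_claim",
                      "contradiction", "circular_reasoning"}

def is_valid_sample(sample: dict) -> bool:
    labels = set(sample["error_types"])
    if sample["category"] == "logical":
        # Logical texts must carry no issue flag and no error labels.
        return not sample["has_issue"] and not labels
    if sample["category"] == "hidden_problem":
        # Hidden-problem texts are flagged but name no explicit fallacy.
        return sample["has_issue"] and not (labels & EXPLICIT_FALLACIES)
    # Explicit-error texts must be flagged and name at least one fallacy.
    return sample["has_issue"] and bool(labels & EXPLICIT_FALLACIES)

samples = [
    {"category": "logical", "has_issue": False, "error_types": []},
    {"category": "hidden_problem", "has_issue": True, "error_types": ["missing_premise"]},
    {"category": "logical", "has_issue": True, "error_types": ["contradiction"]},  # invalid
]
clean = [s for s in samples if is_valid_sample(s)]  # invalid samples removed, not fixed
```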
### ⚖️ Balanced Difficulty

- Hidden problems ≤ 30% of problematic texts
- Prevents collapse into vague uncertainty detection
### 🎯 Loss Design

- Binary BCE for issue detection
- Masked multi-label loss for error types (see the sketch below)
- Stability-oriented multi-task optimization
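
One plausible reading of this loss design is sketched below in PyTorch; the masking rule and the `alpha` task weight are assumptions, not published training details.

```python
import torch.nn.functional as F

def rqa_loss(issue_logit, error_logits, has_issue, error_labels, alpha=1.0):
    # issue_logit: (B,), error_logits: (B, K), labels are 0/1 tensors.
    # Binary BCE for the issue-detection head.
    issue_loss = F.binary_cross_entropy_with_logits(issue_logit, has_issue.float())
    # Multi-label BCE for the error heads, masked so that only samples
    # that actually contain an issue contribute gradient.
    per_elem = F.binary_cross_entropy_with_logits(
        error_logits, error_labels.float(), reduction="none")
    mask = has_issue.float().unsqueeze(-1)
    denom = mask.sum().clamp(min=1.0) * error_logits.size(-1)
    error_loss = (per_elem * mask).sum() / denom
    return issue_loss + alpha * error_loss
```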
## 🛡️ Confidence Calibration

RQA applies post-hoc temperature scaling (sketched below):

- Separate calibration for:
  - `has_issue`
  - each error type
- Enables:
  - meaningful probabilities
  - safe threshold tuning
  - production use without retraining
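
Temperature scaling itself is a one-liner: divide each logit by a temperature fitted on held-out data before applying the sigmoid. The temperature values below are placeholders, not the fitted values shipped with the model.

```python
import torch

def calibrate(logits, temperature):
    # Post-hoc temperature scaling: soften (T > 1) or sharpen (T < 1)
    # raw logits, then convert to probabilities.
    return torch.sigmoid(logits / temperature)

# Separate temperatures per signal (placeholder values).
T_ISSUE = 1.3
T_ERRORS = torch.tensor([1.1, 1.4, 1.2, 1.5, 1.0, 1.3])

p_issue = calibrate(torch.tensor(2.1), T_ISSUE)
p_errors = calibrate(torch.tensor([3.1, -1.0, 0.2, -2.5, -3.0, -0.4]), T_ERRORS)
```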
## Intended Use

### ✅ Recommended for:
- Reasoning quality evaluation
- LLM output auditing
- AI safety pipelines
- Argumentation analysis
- Pre-filtering / routing systems
### ❌ Not intended for:
- Text generation
- Error correction
- Explanation or tutoring
- Grammar or style analysis
- Fact checking
## 🧪 Model Behavior
- Conservative by design
- Optimized for low false positives
- Explicitly robust to:
  - topic changes
  - writing style
  - emotional tone
RQA judges logical structure, not persuasion quality.
## Training Data (High-level)
- Custom-built dataset
- Thousands of long-form argumentative texts
- Multiple domains and reasoning styles
- Carefully controlled balance of:
  - logical texts
  - explicit errors
  - hidden problems
The dataset was designed specifically for judge behavior, not for text generation.
## ⚠️ Limitations

- Logical validity ≠ factual correctness
- Purely descriptive texts may still trigger diagnostic signals
- Highly rhetorical or persuasive texts can be flagged as hidden problems
- Philosophical disagreement is not always a logical error
## 🧩 Philosophy

Good reasoning is not about sounding convincing;
it is about what actually follows from what.
RQA is built around this principle.
## 🔧 Implementation Details

- Custom Hugging Face architecture (`modeling_rqa.py`)
- Requires `trust_remote_code=True`
- Uses `safetensors`
- No `.bin` weights (this is expected behavior)
## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# The custom architecture requires trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    "skatzR/RQA-R1",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "skatzR/RQA-R1",
    trust_remote_code=True,
).to(device)
model.eval()
```
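
A minimal forward pass might then look like the following. The structure and field names of the model output are defined by the custom `modeling_rqa.py`, so inspect `outputs` (or follow `inference.py`) rather than relying on any names assumed here.

```python
# Example input (Russian, since the model is Russian-only).
text = "После того как открыли торговый центр, выросло число разводов."

inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Raw logits before calibration and thresholding; post-process them
# with the reference logic described in the next section.
print(outputs)
```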
## 🧠 Reference Inference Logic
RQA is designed to be used with explicit post-processing logic, including:
- temperature scaling
- thresholding
- disagreement diagnostics
- hidden-problem detection
A fully working reference implementation is provided in the repository:

👉 `inference.py`: Reference Inference Implementation
## ✅ Example

Input text (Russian):

> После того как в городе открыли новый торговый центр, увеличилось количество разводов.
> Следовательно, открытие торгового центра разрушает семьи.

English: "After a new shopping mall opened in the city, the number of divorces increased. Therefore, the opening of the shopping mall destroys families."

Output (translated from Russian):

- Problem detected: YES (100.00%)
- Explicit logical errors:
  - False causal relationship: 95.95%
- Disagreement: 0.034
## License
MIT