# 🧠 RQA: Reasoning Quality Analyzer (R1)
RQA is a judge model designed to evaluate the quality of reasoning in text.
It does not generate, rewrite, or explain content; instead, it assesses whether a text contains logical problems, and if so, what kind.
RQA is a judge, not a teacher and not a generator.
## What Problem Does RQA Solve?
Texts written by humans or LLMs can:
- sound coherent,
- use correct vocabulary,
- appear persuasive,
…but still contain logical problems that are:
- implicit,
- structural,
- hidden in argumentation.
RQA focuses strictly on reasoning quality, not on style, sentiment, or factual correctness.
## 🧩 Model Overview
| Property | Value |
|---|---|
| Model Type | Judge / Evaluator |
| Base Encoder | XLM-RoBERTa Large |
| Pooling | Mean pooling |
| Heads | 2 (binary + multi-label) |
| Language | Russian 🇷🇺 |
| License | MIT |
## 🧠 What the Model Predicts
RQA produces two independent signals that are combined at inference time:
### 1️⃣ Logical Issue Detection (Binary)

- `has_issue ∈ {false, true}`
- Calibrated probability available
- Designed to answer: "Does this text contain a reasoning problem?"
### 2️⃣ Error Type Signals (Multi-label)

The model estimates probabilities for specific error types:

- `false_causality`
- `unsupported_claim`
- `overgeneralization`
- `missing_premise`
- `contradiction`
- `circular_reasoning`
### ⚠️ Important

Error type probabilities are diagnostic signals, not mandatory labels.
They are surfaced only if `has_issue == true` during inference.
## 💡 Hidden Logical Problems (Key Concept)
RQA explicitly distinguishes between:
### 🔴 Explicit Logical Errors
Clearly identifiable fallacies:
- invalid causal inference
- circular reasoning
- contradictions
- unsupported claims
### 🟡 Hidden Logical Problems
Texts that are:

- argumentative or persuasive,
- structurally incomplete,
- reliant on implicit assumptions,

…but that do not contain a cleanly classifiable fallacy.
Examples:
- missing or unstated premises
- rhetorical generalizations
- context-dependent claims
Hidden problems are not misclassifications;
they are an intended diagnostic category.
## ⚙️ Inference Logic (Important)
The model uses decision logic on top of raw logits:
- Binary head decides whether a problem exists
- Error heads provide type-level evidence
- If `has_issue == false` but the error-type probabilities are non-zero, the text may be flagged as **borderline** or as a **hidden problem** (see the sketch after this section)
This prevents:
- false positive error labels,
- incoherent outputs,
- over-triggering on clean factual texts.
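
To make this concrete, here is a minimal Python sketch of that decision logic. The thresholds, the verdict names, and the disagreement definition are illustrative assumptions rather than the shipped configuration; the authoritative logic lives in the repository's `inference.py`.

```python
# Illustrative sketch of the decision logic described above.
# Thresholds and the "disagreement" definition are assumptions.
ERROR_TYPES = [
    "false_causality", "unsupported_claim", "overgeneralization",
    "missing_premise", "contradiction", "circular_reasoning",
]

def judge(p_issue: float, p_errors: list[float],
          issue_thr: float = 0.5, error_thr: float = 0.5) -> dict:
    has_issue = p_issue >= issue_thr
    active = {t: p for t, p in zip(ERROR_TYPES, p_errors) if p >= error_thr}
    # Assumed diagnostic: gap between the binary head and the strongest error head.
    disagreement = abs(p_issue - max(p_errors))
    if has_issue and active:
        verdict = "explicit_error"   # binary head fires and a type is nameable
    elif has_issue:
        verdict = "hidden_problem"   # a problem, but no cleanly classifiable fallacy
    elif max(p_errors) >= error_thr:
        verdict = "borderline"       # heads disagree; no type label is emitted
        active = {}
    else:
        verdict = "clean"
    return {"has_issue": has_issue, "verdict": verdict,
            "error_types": active, "disagreement": round(disagreement, 3)}

print(judge(0.97, [0.96, 0.10, 0.08, 0.12, 0.03, 0.05]))
```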
## 🏗️ Architecture Details

- Encoder: XLM-RoBERTa Large (pretrained weights preserved)
- Pooling: mean pooling (robust for long texts)
- Two independent projections:
  - binary reasoning head
  - multi-label error head
- Separate dropout and projections to reduce negative transfer (sketched below)
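
The description above maps naturally onto a small PyTorch module. The sketch below is a reconstruction under stated assumptions (class name, dropout rate, and head shapes are illustrative); the actual implementation ships in `modeling_rqa.py`.

```python
import torch.nn as nn
from transformers import AutoModel

NUM_ERROR_TYPES = 6  # the six error types listed earlier

def mean_pool(hidden_states, attention_mask):
    # Average token embeddings, ignoring padding positions.
    mask = attention_mask.unsqueeze(-1).float()
    return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

class DualHeadJudge(nn.Module):
    def __init__(self, encoder_name="xlm-roberta-large", dropout=0.1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Separate dropout + projection per head to reduce negative transfer.
        self.binary_dropout = nn.Dropout(dropout)
        self.binary_head = nn.Linear(hidden, 1)
        self.error_dropout = nn.Dropout(dropout)
        self.error_head = nn.Linear(hidden, NUM_ERROR_TYPES)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = mean_pool(out.last_hidden_state, attention_mask)
        return {
            "issue_logit": self.binary_head(self.binary_dropout(pooled)).squeeze(-1),
            "error_logits": self.error_head(self.error_dropout(pooled)),
        }
```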
## Training Philosophy

### Strict Data Contract

- Logical texts contain no errors
- Hidden-problem texts contain no explicit fallacies
- Invalid samples are removed, not auto-corrected (illustrated below)
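
As a rough illustration, the contract can be read as a validation filter over training samples. The field names and category values below are hypothetical, since the actual data pipeline is not published.

```python
# Explicit fallacies per the "Hidden Logical Problems" section above.
EXPLICIT_FALLACIES = {"false_causality", "unsupported_claim",
                      "contradiction", "circular_reasoning"}

def is_valid_sample(sample: dict) -> bool:
    labels = set(sample["error_types"])
    if sample["category"] == "logical":
        # Logical texts must carry no issue flag and no error labels.
        return not sample["has_issue"] and not labels
    if sample["category"] == "hidden_problem":
        # Hidden-problem texts are flagged but name no explicit fallacy.
        return sample["has_issue"] and not (labels & EXPLICIT_FALLACIES)
    # Explicit-error texts must be flagged and name at least one fallacy.
    return sample["has_issue"] and bool(labels & EXPLICIT_FALLACIES)

samples = [
    {"category": "logical", "has_issue": False, "error_types": []},
    {"category": "hidden_problem", "has_issue": True, "error_types": ["missing_premise"]},
    {"category": "logical", "has_issue": True, "error_types": ["contradiction"]},  # invalid
]
clean = [s for s in samples if is_valid_sample(s)]  # invalid samples removed, not fixed
```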
### ⚖️ Balanced Difficulty

- Hidden problems ≤ 30% of problematic texts
- Prevents collapse into vague uncertainty detection
### 🎯 Loss Design

- Binary BCE for issue detection
- Masked multi-label loss for error types (see the sketch below)
- Stability-oriented multi-task optimization
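
One plausible reading of this loss design is sketched below in PyTorch; the masking rule and the `alpha` task weight are assumptions, not published training details.

```python
import torch.nn.functional as F

def rqa_loss(issue_logit, error_logits, has_issue, error_labels, alpha=1.0):
    # issue_logit: (B,), error_logits: (B, K), labels are 0/1 tensors.
    # Binary BCE for the issue-detection head.
    issue_loss = F.binary_cross_entropy_with_logits(issue_logit, has_issue.float())
    # Multi-label BCE for the error heads, masked so that only samples
    # that actually contain an issue contribute gradient.
    per_elem = F.binary_cross_entropy_with_logits(
        error_logits, error_labels.float(), reduction="none")
    mask = has_issue.float().unsqueeze(-1)
    denom = mask.sum().clamp(min=1.0) * error_logits.size(-1)
    error_loss = (per_elem * mask).sum() / denom
    return issue_loss + alpha * error_loss
```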
## 🛡️ Confidence Calibration

RQA applies post-hoc temperature scaling (sketched below):

- Separate calibration for:
  - `has_issue`
  - each error type
- Enables:
  - meaningful probabilities
  - safe threshold tuning
  - production use without retraining
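
Temperature scaling itself is a one-liner: divide each logit by a temperature fitted on held-out data before applying the sigmoid. The temperature values below are placeholders, not the fitted values shipped with the model.

```python
import torch

def calibrate(logits, temperature):
    # Post-hoc temperature scaling: soften (T > 1) or sharpen (T < 1)
    # raw logits, then convert to probabilities.
    return torch.sigmoid(logits / temperature)

# Separate temperatures per signal (placeholder values).
T_ISSUE = 1.3
T_ERRORS = torch.tensor([1.1, 1.4, 1.2, 1.5, 1.0, 1.3])

p_issue = calibrate(torch.tensor(2.1), T_ISSUE)
p_errors = calibrate(torch.tensor([3.1, -1.0, 0.2, -2.5, -3.0, -0.4]), T_ERRORS)
```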
## Intended Use

### ✅ Recommended for:
- Reasoning quality evaluation
- LLM output auditing
- AI safety pipelines
- Argumentation analysis
- Pre-filtering / routing systems
### ❌ Not intended for:
- Text generation
- Error correction
- Explanation or tutoring
- Grammar or style analysis
- Fact checking
## 🧪 Model Behavior
- Conservative by design
- Optimized for low false positives
- Explicitly robust to:
  - topic changes
  - writing style
  - emotional tone
RQA judges logical structure, not persuasion quality.
## Training Data (High-level)
- Custom-built dataset
- Thousands of long-form argumentative texts
- Multiple domains and reasoning styles
- Carefully controlled balance of:
  - logical texts
  - explicit errors
  - hidden problems
The dataset was designed specifically for judge behavior, not for text generation.
## ⚠️ Limitations

- Logical validity ≠ factual correctness
- Purely descriptive texts may still trigger diagnostic signals
- Highly rhetorical or persuasive texts can be flagged as hidden problems
- Philosophical disagreement is not always a logical error
## 🧩 Philosophy

Good reasoning is not about sounding convincing;
it is about what actually follows from what.
RQA is built around this principle.
## 🔧 Implementation Details

- Custom Hugging Face architecture (`modeling_rqa.py`)
- Requires `trust_remote_code=True`
- Uses `safetensors`
- No `.bin` weights (this is expected behavior)
## Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# The custom architecture requires trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    "skatzR/RQA-R1",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "skatzR/RQA-R1",
    trust_remote_code=True,
).to(device)
model.eval()
```
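
A minimal forward pass might then look like the following. The structure and field names of the model output are defined by the custom `modeling_rqa.py`, so inspect `outputs` (or follow `inference.py`) rather than relying on any names assumed here.

```python
# Example input (Russian, since the model is Russian-only).
text = "После того как открыли торговый центр, выросло число разводов."

inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Raw logits before calibration and thresholding; post-process them
# with the reference logic described in the next section.
print(outputs)
```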
## 🧠 Reference Inference Logic
RQA is designed to be used with explicit post-processing logic, including:
- temperature scaling
- thresholding
- disagreement diagnostics
- hidden-problem detection
A fully working reference implementation is provided in the repository:

👉 `inference.py`: Reference Inference Implementation
## ✅ Example

Input text (Russian):

> После того как в городе открыли новый торговый центр, увеличилось количество разводов.
> Следовательно, открытие торгового центра разрушает семьи.

English: "After a new shopping mall opened in the city, the number of divorces increased. Therefore, the opening of the shopping mall destroys families."

Output (translated from Russian):

- Problem detected: YES (100.00%)
- Explicit logical errors:
  - False causal relationship: 95.95%
- Disagreement: 0.034
## License
MIT