
🧠 RQA: Reasoning Quality Analyzer (R1)

RQA is a judge model designed to evaluate the quality of reasoning in text.
It does not generate, rewrite, or explain content; instead, it assesses whether a text contains logical problems and, if so, what kind.

RQA is a judge, not a teacher and not a generator.


๐Ÿ” What Problem Does RQA Solve?

Texts written by humans or LLMs can:

  • sound coherent,
  • use correct vocabulary,
  • appear persuasive,

โ€ฆbut still contain logical problems that are:

  • implicit,
  • structural,
  • hidden in argumentation.

RQA focuses strictly on reasoning quality, not on style, sentiment, or factual correctness.


🧩 Model Overview

Property      Value
Model Type    Judge / Evaluator
Base Encoder  XLM-RoBERTa Large
Pooling       Mean pooling
Heads         2 (binary + multi-label)
Language      Russian 🇷🇺
License       MIT

🧠 What the Model Predicts

RQA produces two independent signals that are combined at inference time:

1️⃣ Logical Issue Detection (Binary)

  • has_issue ∈ {false, true}
  • Calibrated probability available
  • Designed to answer:
    "Does this text contain a reasoning problem?"

2๏ธโƒฃ Error Type Signals (Multi-label)

The model estimates probabilities for specific error types:

  • false_causality
  • unsupported_claim
  • overgeneralization
  • missing_premise
  • contradiction
  • circular_reasoning

โš ๏ธ Important
Error type probabilities are diagnostic signals, not mandatory labels.
They are surfaced only if has_issue == true during inference.
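The relationship between the two heads can be sketched in plain Python. The logit values below are invented for illustration, and the 0.5 cutoff is a placeholder rather than the model's tuned threshold:

```python
import math

ERROR_TYPES = [
    "false_causality", "unsupported_claim", "overgeneralization",
    "missing_premise", "contradiction", "circular_reasoning",
]

def sigmoid(x: float) -> float:
    """Logistic function: maps a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw logits from the two heads for one input text.
has_issue_logit = 2.1
error_logits = [3.2, -1.0, -2.4, 0.1, -3.0, -2.8]

has_issue = sigmoid(has_issue_logit) >= 0.5

# Error-type probabilities are independent sigmoids (multi-label)
# and are surfaced only when the binary head fires.
report = {
    "has_issue": has_issue,
    "error_probs": dict(zip(ERROR_TYPES, map(sigmoid, error_logits)))
    if has_issue else {},
}
print(report["has_issue"])  # True for these logits
```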


🟡 Hidden Logical Problems (Key Concept)

RQA explicitly distinguishes between:

🔴 Explicit Logical Errors

Clearly identifiable fallacies:

  • invalid causal inference
  • circular reasoning
  • contradictions
  • unsupported claims

🟡 Hidden Logical Problems

Texts that are:

  • argumentative or persuasive,
  • structurally incomplete,
  • reliant on implicit assumptions,

but do not contain a cleanly classifiable fallacy.

Examples:

  • missing or unstated premises
  • rhetorical generalizations
  • context-dependent claims

Hidden problems are not misclassifications;
they are an intended diagnostic category.


โš–๏ธ Inference Logic (Important)

The model uses decision logic on top of raw logits:

  • Binary head decides whether a problem exists
  • Error heads provide type-level evidence
  • If:
    • has_issue == false
    • but error probabilities are non-zero
      โ†’ the text may be flagged as borderline or hidden problem

This prevents:

  • false positive error labels,
  • incoherent outputs,
  • over-triggering on clean factual texts.

๐Ÿ—๏ธ Architecture Details

  • Encoder: XLM-RoBERTa Large (pretrained weights preserved)
  • Pooling: Mean pooling (robust for long texts)
  • Two independent projections:
    • binary reasoning head
    • multi-label error head
  • Separate dropout and projections to reduce negative transfer

🎓 Training Philosophy

🔒 Strict Data Contract

  • Logical texts contain no errors
  • Hidden-problem texts contain no explicit fallacies
  • Invalid samples are removed, not auto-corrected

⚖️ Balanced Difficulty

  • Hidden problems ≤ 30% of problematic texts
  • Prevents collapse into vague uncertainty detection

🎯 Loss Design

  • Binary BCE for issue detection
  • Masked multi-label loss for error types
  • Stability-oriented multi-task optimization

๐ŸŒก๏ธ Confidence Calibration

RQA applies post-hoc temperature scaling:

  • Separate calibration for:
    • has_issue
    • each error type
  • Enables:
    • meaningful probabilities
    • safe threshold tuning
    • production use without retraining

🚀 Intended Use

✅ Recommended for:

  • Reasoning quality evaluation
  • LLM output auditing
  • AI safety pipelines
  • Argumentation analysis
  • Pre-filtering / routing systems

❌ Not intended for:

  • Text generation
  • Error correction
  • Explanation or tutoring
  • Grammar or style analysis
  • Fact checking

🧪 Model Behavior

  • Conservative by design
  • Optimized for low false positives
  • Explicitly robust to:
    • topic changes
    • writing style
    • emotional tone

RQA judges logical structure, not persuasion quality.


📚 Training Data (High-level)

  • Custom-built dataset
  • Thousands of long-form argumentative texts
  • Multiple domains and reasoning styles
  • Carefully controlled balance of:
    • logical texts
    • explicit errors
    • hidden problems

The dataset was designed specifically for judge behavior, not for text generation.


โš ๏ธ Limitations

  • Logical validity โ‰  factual correctness
  • Purely descriptive texts may still trigger diagnostic signals
  • Highly rhetorical or persuasive texts can be flagged as hidden problems
  • Philosophical disagreement is not always a logical error

🧩 Philosophy

Good reasoning is not about sounding convincing;
it is about what actually follows from what.

RQA is built around this principle.


🔧 Implementation Details

  • Custom Hugging Face architecture (modeling_rqa.py)
  • Requires trust_remote_code=True
  • Uses safetensors
  • No .bin weights (this is expected behavior)

🚀 Quick Start

import torch
from transformers import AutoTokenizer, AutoModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is required because RQA ships a custom architecture
tokenizer = AutoTokenizer.from_pretrained(
    "skatzR/RQA-R1",
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    "skatzR/RQA-R1",
    trust_remote_code=True
).to(device)

model.eval()

🧠 Reference Inference Logic

RQA is designed to be used with explicit post-processing logic, including:

  • temperature scaling
  • thresholding
  • disagreement diagnostics
  • hidden-problem detection

A fully working reference implementation is provided here:

👉 📄 inference.py: Reference Inference Implementation


✅ Example

📄 Text (Russian, translated):
After a new shopping mall opened in the city, the number of divorces increased.
Therefore, the opening of the shopping mall destroys families.

🔎 Problem detected: YES (100.00%)

❌ Explicit logical errors:
  • False causal relationship: 95.95%

📊 Disagreement: 0.034

📜 License

MIT

