Heting Mao (IkanRiddle)
AI & ML interests: None yet
Recent Activity
Reacted to kanaria007's post with ❤️ (about 6 hours ago)
✅ New Article: *Observations, Under-Observation, and Repair Loops* (v0.1)
Title:
👁️ Observations, Under-Observation, and Repair Loops: The OBS Cookbook for SI-Core
🔗 https://huggingface.co/blog/kanaria007/observations-under-observation
---
Summary:
SI-Core’s rule is simple: *No effectful Jump without PARSED observations.*
This article turns that slogan into an operational design: define *observation units* (sem_type/scope/status/confidence/backing_refs), detect *under-observation* (missing / degraded / biased), and run *repair loops* instead of “jumping in the dark.”
Key clarification: under-observed conditions may still run *read / eval_pre / jump-sandbox*, but must not commit or publish (sandbox: `publish_result=false`, `memory_writes=disabled`).
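For concreteness, here is a minimal Python sketch of an observation unit and the sandbox gate described above; the field names, the `ObsStatus` values, the `jump_mode` helper, and the 0.8 confidence floor are illustrative assumptions, not the formal SI-Core schema.

```python
from dataclasses import dataclass, field
from enum import Enum

class ObsStatus(Enum):
    PARSED = "PARSED"
    DEGRADED = "DEGRADED"
    STUB = "STUB"
    ESTIMATED = "ESTIMATED"
    MISSING = "MISSING"
    REDACTED = "REDACTED"
    INVALID = "INVALID"

@dataclass
class ObservationUnit:
    sem_type: str                 # e.g. "inventory.stock_level"
    scope: str                    # which slice of reality this unit covers
    status: ObsStatus
    confidence: float             # 0.0 .. 1.0
    backing_refs: list[str] = field(default_factory=list)  # raw logs / sources behind it

def jump_mode(observations: list[ObservationUnit], min_confidence: float = 0.8) -> dict:
    """Pick an execution mode for a Jump from observation health.

    Under-observed inputs may still run read / eval_pre / jump-sandbox,
    but must not commit or publish.
    """
    under_observed = not observations or any(
        o.status is not ObsStatus.PARSED or o.confidence < min_confidence
        for o in observations
    )
    if under_observed:
        return {"mode": "jump-sandbox", "publish_result": False, "memory_writes": "disabled"}
    return {"mode": "effectful", "publish_result": True, "memory_writes": "enabled"}
```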
---
Why It Matters:
• Prevents “we had logs, so we had context” failures: *logs ≠ observations* unless typed + contract-checked
• Makes safety real: even PARSED observations should be gated by *coverage/confidence minima* (declared thresholds)
• Turns OBS into something measurable: *SCover_obs + SInt* become “OBS health” and safe-mode triggers (sketched after this list)
• Links semantic compression to reality: distinguish *missing raw* vs *compression loss*, and fix the right thing
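A rough illustration of the “OBS health” idea as a coverage ratio over required sem_types; the `obs_coverage` and `safe_mode_triggered` names and the thresholds are placeholders, not the formal SCover_obs / SInt definitions.

```python
def obs_coverage(required_sem_types: set[str], observations: list[dict],
                 min_confidence: float = 0.8) -> float:
    """Hypothetical OBS-health ratio: the share of required sem_types backed by a
    PARSED observation at or above the declared confidence minimum."""
    covered = {
        o["sem_type"]
        for o in observations
        if o["status"] == "PARSED" and o["confidence"] >= min_confidence
    }
    return len(covered & required_sem_types) / max(len(required_sem_types), 1)

def safe_mode_triggered(coverage: float, threshold: float = 0.9) -> bool:
    # Below the declared threshold, fall back to safe-mode / sandbox-only.
    return coverage < threshold
```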
---
What’s Inside:
• A practical observation-status taxonomy: `PARSED / DEGRADED / STUB / ESTIMATED / MISSING / REDACTED / INVALID` (+ mapping to core status)
• Per-jump *observation contracts* (required sem_types, allowed statuses, age/confidence limits) + explicit fallback actions (sketched after this list)
• Fallback patterns: *safe-mode / conservative default / sandbox-only / human-in-loop*
• Repair loops as first-class: ledgered `obs.repair_request`, PLB proposals, governance review for contract changes
• Testing OBS itself: property tests, chaos drills, golden-diff for observation streams
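To show how a per-jump observation contract and its fallback might fit together, here is a small Python sketch; the field names, the `PRICING_JUMP` example, and the dict-shaped observations are hypothetical stand-ins for whatever the spec documents define.

```python
from dataclasses import dataclass

@dataclass
class ObservationContract:
    """Hypothetical per-jump contract: what a Jump must observe before it may commit."""
    required_sem_types: list[str]
    allowed_statuses: list[str]   # e.g. ["PARSED"] or ["PARSED", "DEGRADED"]
    max_age_s: float              # observations older than this do not count
    min_confidence: float
    fallback: str                 # "safe-mode" | "conservative-default" | "sandbox-only" | "human-in-loop"

PRICING_JUMP = ObservationContract(
    required_sem_types=["market.price_feed", "inventory.stock_level"],
    allowed_statuses=["PARSED"],
    max_age_s=300.0,
    min_confidence=0.9,
    fallback="sandbox-only",
)

def check_contract(contract: ObservationContract, observations: list[dict], now_s: float):
    """Return (ok, violations). Any violation should trigger the contract's fallback
    action and, in repair-loop terms, a ledgered obs.repair_request."""
    violations = []
    by_type = {o["sem_type"]: o for o in observations}
    for sem_type in contract.required_sem_types:
        o = by_type.get(sem_type)
        if o is None:
            violations.append(f"{sem_type}: MISSING")
        elif o["status"] not in contract.allowed_statuses:
            violations.append(f"{sem_type}: status {o['status']} not allowed")
        elif now_s - o["observed_at_s"] > contract.max_age_s:
            violations.append(f"{sem_type}: stale (older than {contract.max_age_s}s)")
        elif o["confidence"] < contract.min_confidence:
            violations.append(f"{sem_type}: confidence below {contract.min_confidence}")
    return (len(violations) == 0, violations)
```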
---
📖 Structured Intelligence Engineering Series
This is the *“how to operate OBS”* layer, so the system can *know when it doesn’t know* and repair over time.
Upvoted an article (about 9 hours ago): Observations, Under-Observation, and Repair Loops
Reacted to kanaria007's post with ❤️ (6 days ago)
✅ New Article: *Measuring What Matters in Learning* (v0.1)
Title:
📏 Measuring What Matters in Learning: GCS and Metrics for Support Systems
🔗 https://huggingface.co/blog/kanaria007/measuring-what-matters-in-learning
---
Summary:
Most “AI for education” metrics measure *grades, time-on-task, and engagement*.
That’s not enough for *support systems* (tutors, developmental assistants, social-skills coaches), where the real failure mode is: *the score goes up while the learner breaks*.
This guide reframes learning evaluation as *multi-goal contribution*, tracked as a *GCS vector* (mastery, retention, wellbeing/load, self-efficacy, autonomy, fairness, safety) — and shows how to operationalize it without falling into classic metric traps.
> If you can’t measure wellbeing, fairness, and safety,
> you’re not measuring learning — you’re measuring extraction.
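To make “tracked as a GCS vector” concrete, a minimal Python sketch follows; the component names mirror the list above, while the score ranges and the `regressions` guard are illustrative assumptions rather than the article's formal definitions.

```python
from dataclasses import dataclass, asdict

@dataclass
class GCSVector:
    """Illustrative multi-goal contribution vector for a learning support system.

    Each component is a contribution score in [-1.0, 1.0]; the point is to keep
    them separate instead of collapsing everything into one grade.
    """
    mastery: float
    retention: float
    wellbeing_load: float   # negative when the learner is being overloaded
    self_efficacy: float
    autonomy: float
    fairness: float
    safety: float

def regressions(gcs: GCSVector, floor: float = 0.0) -> list[str]:
    """Surface the 'score goes up while the learner breaks' failure mode:
    mastery gains never excuse a wellbeing, fairness, or safety regression."""
    guarded = ("wellbeing_load", "fairness", "safety")
    return [name for name, value in asdict(gcs).items() if name in guarded and value < floor]
```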
---
Why It Matters:
• Moves beyond “grading” into *support metrics* designed for real learners
• Makes *wellbeing, autonomy, fairness, and safety* first-class (not afterthoughts)
• Separates *daily ops metrics* vs *research evaluation* vs *governance/safety*
• Turns “explainability” into *answerable questions* (“why this intervention, now?”)
---
What’s Inside:
• A practical *GCS vector* for learning & developmental support
• How core metrics translate into education contexts (plan consistency, trace coverage, rollback health)
• A tiered metric taxonomy: *Ops / Research / Safety* (illustrated after this list)
• Parent-facing views that avoid shaming, leaderboards, and over-monitoring
• Pitfalls and failure patterns: “optimize test scores”, “maximize engagement”, “ignore fairness”, etc.
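A tiny sketch of how the Ops / Research / Safety split could be encoded; the metric names in `METRIC_TIERS` are placeholders, and the real assignment belongs in the evaluation/spec documents.

```python
from enum import Enum

class MetricTier(Enum):
    OPS = "ops"            # daily operations: dashboards, on-call checks
    RESEARCH = "research"  # slower evaluation: cohort studies, ablations
    SAFETY = "safety"      # governance: hard thresholds, escalation paths

# Illustrative assignment only; metric names are placeholders.
METRIC_TIERS = {
    "plan_consistency": MetricTier.OPS,
    "trace_coverage": MetricTier.OPS,
    "rollback_health": MetricTier.OPS,
    "retention_4_weeks": MetricTier.RESEARCH,
    "self_efficacy_survey": MetricTier.RESEARCH,
    "wellbeing_load": MetricTier.SAFETY,
    "fairness_gap": MetricTier.SAFETY,
}

def metrics_for(tier: MetricTier) -> list[str]:
    return [name for name, t in METRIC_TIERS.items() if t is tier]
```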
---
📖 Structured Intelligence Engineering Series
Formal contracts live in the evaluation/spec documents; this is the *how-to-think / how-to-use* layer.
Organizations: None yet