Heting Mao (IkanRiddle)
AI & ML interests: None yet
Recent Activity
Reacted to kanaria007's post with ❤️ (about 6 hours ago)
✅ New Article: *Observations, Under-Observation, and Repair Loops* (v0.1)
Title:
👁️ Observations, Under-Observation, and Repair Loops: The OBS Cookbook for SI-Core
🔗 https://huggingface.co/blog/kanaria007/observations-under-observation
---
Summary:
SI-Core’s rule is simple: *No effectful Jump without PARSED observations.*
This article turns that slogan into an operational design: define *observation units* (sem_type/scope/status/confidence/backing_refs), detect *under-observation* (missing / degraded / biased), and run *repair loops* instead of “jumping in the dark.”
Key clarification: under-observed conditions may still run *read / eval_pre / jump-sandbox*, but must not commit or publish (sandbox: `publish_result=false`, `memory_writes=disabled`).
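For concreteness, here is a minimal Python sketch of an observation unit and the sandbox gate described above; the field names, the `ObsStatus` values, the `jump_mode` helper, and the 0.8 confidence floor are illustrative assumptions, not the formal SI-Core schema.

```python
from dataclasses import dataclass, field
from enum import Enum

class ObsStatus(Enum):
    PARSED = "PARSED"
    DEGRADED = "DEGRADED"
    STUB = "STUB"
    ESTIMATED = "ESTIMATED"
    MISSING = "MISSING"
    REDACTED = "REDACTED"
    INVALID = "INVALID"

@dataclass
class ObservationUnit:
    sem_type: str                 # e.g. "inventory.stock_level"
    scope: str                    # which slice of reality this unit covers
    status: ObsStatus
    confidence: float             # 0.0 .. 1.0
    backing_refs: list[str] = field(default_factory=list)  # raw logs / sources behind it

def jump_mode(observations: list[ObservationUnit], min_confidence: float = 0.8) -> dict:
    """Pick an execution mode for a Jump from observation health.

    Under-observed inputs may still run read / eval_pre / jump-sandbox,
    but must not commit or publish.
    """
    under_observed = not observations or any(
        o.status is not ObsStatus.PARSED or o.confidence < min_confidence
        for o in observations
    )
    if under_observed:
        return {"mode": "jump-sandbox", "publish_result": False, "memory_writes": "disabled"}
    return {"mode": "effectful", "publish_result": True, "memory_writes": "enabled"}
```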
---
Why It Matters:
• Prevents “we had logs, so we had context” failures: *logs ≠ observations* unless typed + contract-checked
• Makes safety real: even PARSED observations should be gated by *coverage/confidence minima* (declared thresholds)
• Turns OBS into something measurable: *SCover_obs + SInt* become “OBS health” and safe-mode triggers (sketched after this list)
• Links semantic compression to reality: distinguish *missing raw* vs *compression loss*, and fix the right thing
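A rough illustration of the “OBS health” idea as a coverage ratio over required sem_types; the `obs_coverage` and `safe_mode_triggered` names and the thresholds are placeholders, not the formal SCover_obs / SInt definitions.

```python
def obs_coverage(required_sem_types: set[str], observations: list[dict],
                 min_confidence: float = 0.8) -> float:
    """Hypothetical OBS-health ratio: the share of required sem_types backed by a
    PARSED observation at or above the declared confidence minimum."""
    covered = {
        o["sem_type"]
        for o in observations
        if o["status"] == "PARSED" and o["confidence"] >= min_confidence
    }
    return len(covered & required_sem_types) / max(len(required_sem_types), 1)

def safe_mode_triggered(coverage: float, threshold: float = 0.9) -> bool:
    # Below the declared threshold, fall back to safe-mode / sandbox-only.
    return coverage < threshold
```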
---
What’s Inside:
• A practical observation-status taxonomy: `PARSED / DEGRADED / STUB / ESTIMATED / MISSING / REDACTED / INVALID` (+ mapping to core status)
• Per-jump *observation contracts* (required sem_types, allowed statuses, age/confidence limits) + explicit fallback actions (sketched after this list)
• Fallback patterns: *safe-mode / conservative default / sandbox-only / human-in-loop*
• Repair loops as first-class: ledgered `obs.repair_request`, PLB proposals, governance review for contract changes
• Testing OBS itself: property tests, chaos drills, golden-diff for observation streams
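To show how a per-jump observation contract and its fallback might fit together, here is a small Python sketch; the field names, the `PRICING_JUMP` example, and the dict-shaped observations are hypothetical stand-ins for whatever the spec documents define.

```python
from dataclasses import dataclass

@dataclass
class ObservationContract:
    """Hypothetical per-jump contract: what a Jump must observe before it may commit."""
    required_sem_types: list[str]
    allowed_statuses: list[str]   # e.g. ["PARSED"] or ["PARSED", "DEGRADED"]
    max_age_s: float              # observations older than this do not count
    min_confidence: float
    fallback: str                 # "safe-mode" | "conservative-default" | "sandbox-only" | "human-in-loop"

PRICING_JUMP = ObservationContract(
    required_sem_types=["market.price_feed", "inventory.stock_level"],
    allowed_statuses=["PARSED"],
    max_age_s=300.0,
    min_confidence=0.9,
    fallback="sandbox-only",
)

def check_contract(contract: ObservationContract, observations: list[dict], now_s: float):
    """Return (ok, violations). Any violation should trigger the contract's fallback
    action and, in repair-loop terms, a ledgered obs.repair_request."""
    violations = []
    by_type = {o["sem_type"]: o for o in observations}
    for sem_type in contract.required_sem_types:
        o = by_type.get(sem_type)
        if o is None:
            violations.append(f"{sem_type}: MISSING")
        elif o["status"] not in contract.allowed_statuses:
            violations.append(f"{sem_type}: status {o['status']} not allowed")
        elif now_s - o["observed_at_s"] > contract.max_age_s:
            violations.append(f"{sem_type}: stale (older than {contract.max_age_s}s)")
        elif o["confidence"] < contract.min_confidence:
            violations.append(f"{sem_type}: confidence below {contract.min_confidence}")
    return (len(violations) == 0, violations)
```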
---
📖 Structured Intelligence Engineering Series
This is the *“how to operate OBS”* layer, so the system can *know when it doesn’t know* and repair over time.
Upvoted an article (about 9 hours ago): Observations, Under-Observation, and Repair Loops
Reacted to kanaria007's post with ❤️ (6 days ago)
✅ New Article: *Measuring What Matters in Learning* (v0.1)
Title:
📏 Measuring What Matters in Learning: GCS and Metrics for Support Systems
🔗 https://huggingface.co/blog/kanaria007/measuring-what-matters-in-learning
---
Summary:
Most “AI for education” metrics measure *grades, time-on-task, and engagement*.
That’s not enough for *support systems* (tutors, developmental assistants, social-skills coaches), where the real failure mode is: *the score goes up while the learner breaks*.
This guide reframes learning evaluation as *multi-goal contribution*, tracked as a *GCS vector* (mastery, retention, wellbeing/load, self-efficacy, autonomy, fairness, safety) — and shows how to operationalize it without falling into classic metric traps.
> If you can’t measure wellbeing, fairness, and safety,
> you’re not measuring learning — you’re measuring extraction.
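To make “tracked as a GCS vector” concrete, a minimal Python sketch follows; the component names mirror the list above, while the score ranges and the `regressions` guard are illustrative assumptions rather than the article's formal definitions.

```python
from dataclasses import dataclass, asdict

@dataclass
class GCSVector:
    """Illustrative multi-goal contribution vector for a learning support system.

    Each component is a contribution score in [-1.0, 1.0]; the point is to keep
    them separate instead of collapsing everything into one grade.
    """
    mastery: float
    retention: float
    wellbeing_load: float   # negative when the learner is being overloaded
    self_efficacy: float
    autonomy: float
    fairness: float
    safety: float

def regressions(gcs: GCSVector, floor: float = 0.0) -> list[str]:
    """Surface the 'score goes up while the learner breaks' failure mode:
    mastery gains never excuse a wellbeing, fairness, or safety regression."""
    guarded = ("wellbeing_load", "fairness", "safety")
    return [name for name, value in asdict(gcs).items() if name in guarded and value < floor]
```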
---
Why It Matters:
• Moves beyond “grading” into *support metrics* designed for real learners
• Makes *wellbeing, autonomy, fairness, and safety* first-class (not afterthoughts)
• Separates *daily ops metrics* vs *research evaluation* vs *governance/safety*
• Turns “explainability” into *answerable questions* (“why this intervention, now?”)
---
What’s Inside:
• A practical *GCS vector* for learning & developmental support
• How core metrics translate into education contexts (plan consistency, trace coverage, rollback health)
• A tiered metric taxonomy: *Ops / Research / Safety* (illustrated after this list)
• Parent-facing views that avoid shaming, leaderboards, and over-monitoring
• Pitfalls and failure patterns: “optimize test scores”, “maximize engagement”, “ignore fairness”, etc.
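A tiny sketch of how the Ops / Research / Safety split could be encoded; the metric names in `METRIC_TIERS` are placeholders, and the real assignment belongs in the evaluation/spec documents.

```python
from enum import Enum

class MetricTier(Enum):
    OPS = "ops"            # daily operations: dashboards, on-call checks
    RESEARCH = "research"  # slower evaluation: cohort studies, ablations
    SAFETY = "safety"      # governance: hard thresholds, escalation paths

# Illustrative assignment only; metric names are placeholders.
METRIC_TIERS = {
    "plan_consistency": MetricTier.OPS,
    "trace_coverage": MetricTier.OPS,
    "rollback_health": MetricTier.OPS,
    "retention_4_weeks": MetricTier.RESEARCH,
    "self_efficacy_survey": MetricTier.RESEARCH,
    "wellbeing_load": MetricTier.SAFETY,
    "fairness_gap": MetricTier.SAFETY,
}

def metrics_for(tier: MetricTier) -> list[str]:
    return [name for name, t in METRIC_TIERS.items() if t is tier]
```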
---
📖 Structured Intelligence Engineering Series
Formal contracts live in the evaluation/spec documents; this is the *how-to-think / how-to-use* layer.
Organizations: None yet