
Anthropic AI Safety Fellow

Why are you interested in participating in the Fellows program?

Read: coscientist.app

I build cognitive infrastructure for science: epistemic systems that function as a co-scientist for human researchers. It is not AGI, but it is capable enough to measurably expand human scientific throughput while preserving human control over claims, evidence, and verification.

The objective is "unrotting researchers' brains," or epistemic sovereignty under AI: keeping knowledge networks reliable when synthetic content is cheap and ubiquitous. I have written about knowledge rot and verification collapse, in which unverified AI output degrades shared truth. This aligns with Anthropic's focus on AI that remains under human direction and in accordance with human values. I want to build epistemic infrastructure for AI that increases truthfulness, auditability, and trust instead of eroding them.

My fit is practical. I operate across research, infrastructure engineering, and system design. I build systems that translate theory into durable tools, evaluation, and operations. I aim to be a Science Medici, accelerating scientific progress by enabling other researchers with high-leverage platforms.

I bring experience in AIOps and federated learning, plus a safety-first approach to deployment. I want to contribute to co-scientist systems with rigorous verification, provenance, uncertainty tracking, and human-in-the-loop oversight at scale.

Please tell us briefly about an area of technical AI safety work you're currently excited about, and why.

I am excited about building AI epistemic infrastructure that makes large language models more reliable and more truthful: systems that monitor, audit, and augment a model's knowledge so that deployed models resist knowledge decay and failure modes such as model collapse.

The risk is known: training or fine-tuning on low-quality, synthetic, or poorly filtered data can degrade model performance and distort learned distributions, weakening factual reliability and value alignment. The response is engineering, not slogans.

My approach combines retrieval and operations. Use retrieval-augmented generation so that outputs are grounded in trusted corpora with provenance, citations, and adversarially robust retrieval. Add continuous AIOps monitoring that treats models as production-critical systems: drift detection, calibration checks, red-team regressions, automated eval suites, anomaly detection on outputs, and human-in-the-loop feedback loops tied to measurable quality gates.
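As a rough illustration of the retrieval half, the sketch below grounds an answer in retrieved passages, keeps per-passage provenance for citation, and abstains when evidence is weak. Every name in it (SourceDocument, answer_with_provenance, the retrieve and generate callables, and the score threshold) is a hypothetical placeholder, not an existing API.

```python
# Minimal sketch of grounding with provenance; interfaces are assumed, not real.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SourceDocument:
    doc_id: str             # stable identifier in the trusted corpus
    text: str               # verbatim passage used as evidence
    origin: str             # provenance: where the passage came from
    retrieval_score: float  # similarity score assigned by the retriever


@dataclass
class GroundedAnswer:
    text: str
    citations: list[SourceDocument] = field(default_factory=list)
    abstained: bool = False  # True when evidence was too weak to answer


def answer_with_provenance(
    question: str,
    retrieve: Callable[[str, int], list[SourceDocument]],
    generate: Callable[[str], str],
    min_score: float = 0.35,
    top_k: int = 4,
) -> GroundedAnswer:
    """Ground the model's output in retrieved evidence and keep its citations."""
    evidence = [d for d in retrieve(question, top_k) if d.retrieval_score >= min_score]
    if not evidence:
        # Abstain rather than let the model answer from parametric memory alone.
        return GroundedAnswer(text="Insufficient trusted evidence to answer.", abstained=True)

    context = "\n\n".join(f"[{d.doc_id}] {d.text}" for d in evidence)
    prompt = (
        "Answer using ONLY the passages below and cite their ids.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return GroundedAnswer(text=generate(prompt), citations=evidence)
```

The design choice that matters is the abstention path: if nothing in the trusted corpus clears the threshold, the system declines instead of falling back to unverified model memory.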

This is safety work with concrete artifacts. It creates verifiability, transparency, and long-horizon resilience. It prevents small errors from compounding into systemic misinformation by detecting degradation early and enforcing corrective interventions.
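On the operations half, a quality gate might look like the minimal sketch below: compare a candidate model's eval run against a baseline and block promotion when accuracy regresses or calibration drifts. EvalRun, quality_gate, and the thresholds are illustrative assumptions, not a description of any deployed system.

```python
# Minimal sketch of an automated quality gate over eval-suite results.
from dataclasses import dataclass


@dataclass
class EvalRun:
    correct: list[bool]      # per-example pass/fail from the eval suite
    confidence: list[float]  # model-reported confidence per example, in [0, 1]


def accuracy(run: EvalRun) -> float:
    return sum(run.correct) / len(run.correct)


def expected_calibration_error(run: EvalRun, bins: int = 10) -> float:
    """Bucket predictions by confidence and compare average confidence to accuracy."""
    total = len(run.correct)
    ece = 0.0
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        idx = [i for i, c in enumerate(run.confidence)
               if lo <= c < hi or (b == bins - 1 and c == 1.0)]
        if not idx:
            continue
        bin_acc = sum(run.correct[i] for i in idx) / len(idx)
        bin_conf = sum(run.confidence[i] for i in idx) / len(idx)
        ece += (len(idx) / total) * abs(bin_acc - bin_conf)
    return ece


def quality_gate(baseline: EvalRun, candidate: EvalRun,
                 max_accuracy_drop: float = 0.02, max_ece: float = 0.10) -> list[str]:
    """Return a list of violations; an empty list means the candidate may ship."""
    violations = []
    if accuracy(baseline) - accuracy(candidate) > max_accuracy_drop:
        violations.append("accuracy regressed beyond the allowed drop")
    if expected_calibration_error(candidate) > max_ece:
        violations.append("calibration error exceeds the gate threshold")
    return violations
```

Tying corrective action (rollback, retraining, data audit) to gate violations is what keeps small regressions from compounding silently across releases.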

Relevant AI safety background

My background spans AI research and production engineering, with a consistent focus on robustness, trustworthiness, and alignment with human constraints.

At Lunit, a medical AI company, I led engineering work on an AIOps platform (INCL) to support the reliable deployment of models in high-stakes clinical settings. That work imposed rigor on monitoring, rollback, evaluation, and cost controls, making safety and operational discipline non-negotiable.

At Grammarly, I worked on the Experimentation Platform and saw how large language models are managed and evaluated for quality in a product environment.

On the research side, I collaborated with Prof. Jose-Luis Ambite at USC ISI on vertical federated learning for privacy-preserving data integration. The goal was to enable cross-institution learning without centralizing sensitive data, aligning model development with privacy requirements by design.

I also maintain a personal knowledge base, Extracranial, where I publish analyses on knowledge rot, epistemic scaffolding, and self-amplifying failure modes in AI-mediated knowledge systems. That includes an "encyclopedia meltdown" scenario in which AI-generated content recursively degrades a knowledge base, reinforcing my focus on safeguards, verification, and provenance.