Anthropic AI Safety Fellow
Why are you interested in participating in the Fellows program?
Read: coscientist.app
I build cognitive infrastructure for science: epistemic systems that function as a co-scientist for human researchers. Not AGI, but capable enough to measurably expand human scientific throughput while preserving human control over claims, evidence, and verification.
The objective is "unrotting the researcher's brain", or epistemic sovereignty under AI: keeping knowledge networks reliable when synthetic content is cheap and ubiquitous. I have written about knowledge rot and verification collapse, where unverified AI output degrades shared truth. This aligns with Anthropic's focus on AI that remains under human direction and aligned with human values. I want to build AI epistemic infrastructure that increases truthfulness, auditability, and trust instead of eroding them.
My fit is practical. I operate across research, infrastructure engineering, and system design. I build systems that translate theory into durable tooling, evaluation, and operations. I aim to be a Science Medici: accelerating scientific progress by enabling other researchers with high-leverage platforms.
I bring experience in AI-Ops and federated learning, plus a safety-first approach to deployment. I want to contribute to co-scientist systems with rigorous verification, provenance, uncertainty tracking, and human-in-the-loop oversight at scale.
Please tell us briefly about an area of technical AI safety work you're currently excited about, and why.
Building AI epistemic infrastructure that makes large language models more reliable and more truthful: systems that monitor, audit, and augment a model's knowledge so deployed models resist knowledge decay and failure modes such as model collapse.
The risk is known: training or fine-tuning on low-quality, synthetic, or poorly filtered data can degrade model performance and distort learned distributions, weakening factual reliability and value alignment. The response is engineering, not slogans.
My approach combines retrieval and operations. Retrieval-augmented generation grounds outputs in trusted corpora with provenance, citations, and adversarially robust retrieval. Continuous AI-Ops monitoring treats models as production-critical systems: drift detection, calibration checks, red-team regressions, automated eval suites, anomaly detection on outputs, and human-in-the-loop feedback loops tied to measurable quality gates.
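Below is a minimal sketch of what one such quality gate could look like. Everything in it is illustrative: the Evidence, GroundedAnswer, and QualityGate names and the thresholds are assumptions made for the sketch, not an existing system or API.

```python
"""Illustrative sketch: provenance-carrying answers plus an AI-Ops quality gate.

All names and thresholds are assumptions for the sketch, not an existing API.
"""
from dataclasses import dataclass
from statistics import mean


@dataclass
class Evidence:
    source_id: str          # stable identifier into a trusted corpus
    passage: str            # retrieved text the claim is grounded in
    retrieval_score: float  # retriever's relevance score


@dataclass
class GroundedAnswer:
    claim: str
    evidence: list[Evidence]   # every claim carries its citations
    model_confidence: float    # model's stated probability that the claim holds

    def is_grounded(self, min_sources: int = 1) -> bool:
        return len(self.evidence) >= min_sources


class QualityGate:
    """Blocks promotion of a model or corpus update when monitored metrics degrade."""

    def __init__(self, max_ungrounded_rate: float = 0.02,
                 max_calibration_error: float = 0.10) -> None:
        self.max_ungrounded_rate = max_ungrounded_rate
        self.max_calibration_error = max_calibration_error

    def check(self, answers: list[GroundedAnswer], correctness: list[bool]) -> dict:
        # Share of answers emitted without any supporting evidence.
        ungrounded_rate = mean(0.0 if a.is_grounded() else 1.0 for a in answers)
        # Crude calibration proxy: mean |confidence - observed correctness|.
        calibration_error = mean(
            abs(a.model_confidence - (1.0 if ok else 0.0))
            for a, ok in zip(answers, correctness)
        )
        passed = (ungrounded_rate <= self.max_ungrounded_rate
                  and calibration_error <= self.max_calibration_error)
        return {
            "ungrounded_rate": ungrounded_rate,
            "calibration_error": calibration_error,
            "pass": passed,
        }


if __name__ == "__main__":
    answers = [
        GroundedAnswer(
            claim="Compound X inhibits kinase Y in vitro.",
            evidence=[Evidence("corpus/doc-42", "...inhibition assay showed...", 0.91)],
            model_confidence=0.80,
        ),
        GroundedAnswer(claim="Unsupported speculation.", evidence=[],
                       model_confidence=0.90),
    ]
    # The second answer is ungrounded and overconfident, so the gate should fail.
    print(QualityGate().check(answers, correctness=[True, False]))
```

The point is the shape of the contract rather than the specific metrics: every claim carries its evidence, and a model or corpus update is only promoted when groundedness and calibration stay inside explicit thresholds.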
This is safety work with concrete artifacts. It creates verifiability, transparency, and long-horizon resilience. It prevents small errors from compounding into systemic misinformation by detecting degradation early and enforcing corrective interventions.
Relevant AI safety background
My background spans AI research and production engineering, with a consistent focus on robustness, trustworthiness, and alignment with human constraints.
At Lunit, a medical AI company, I led engineering work on an AI-Ops platform (INCL) to support reliable deployment of models in high-stakes clinical settings. That work forced rigor on monitoring, rollback, evaluation, and cost controls, and made safety and operational discipline non-negotiable.
At Grammarly, I worked on the Experimentation Platform and saw how large language models are managed and evaluated for quality in a product environment.
On the research side, I collaborated with Prof. Jose-Luis Ambite at USC ISI on vertical federated learning for privacy-preserving data integration. The goal was to enable cross-institution learning without centralizing sensitive data, aligning model development with privacy requirements by design.
I also maintain a personal knowledge base, Extracranial, where I publish analyses on knowledge rot, epistemic scaffolding, and self-amplifying failure modes in AI-mediated knowledge systems. That includes an "encyclopedia meltdown" scenario where AI-generated content recursively degrades a knowledge base, reinforcing my focus on safeguards, verification, and provenance.