MAGI

Warning

This post is more than a year old. The information may be outdated.

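MAGI, the supercomputer system in Neon Genesis Evangelion, consists of three units, each with its own independent personality. It decides each proposal by majority vote, and critical matters (such as self-destruction) require unanimous agreement from all three. Three absolute arbiters settling the affairs of the entire universe, with a touch of cosmic horror, also recalls Alien X from Ben 10.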
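MAGI's voting rule is simple enough to state as code. The Python sketch below is illustrative only: the function name `magi_decide` and the representation of each unit's vote as a boolean are assumptions, not anything defined in the show.

```python
def magi_decide(votes: list[bool], critical: bool = False) -> bool:
    """Return True if a proposal passes.

    votes: one boolean per unit (True = approve).
    critical: if True (e.g. self-destruction), require a unanimous vote;
              otherwise a simple majority is enough.
    """
    approvals = sum(votes)
    if critical:
        return approvals == len(votes)
    return approvals > len(votes) / 2


# Melchior, Balthasar, Casper
print(magi_decide([True, True, False]))                 # True: 2 of 3 approve
print(magi_decide([True, True, False], critical=True))  # False: not unanimous
```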

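Above all, though, it has a lot in common with ensemble learning, a modern machine-learning technique. Ensemble learning, which I first encountered at Daangn (Karrot), does not pour enormous money and time into a single model with 99.99% accuracy; instead it runs dozens of models with 90%+ accuracy side by side and takes a majority vote among them. As long as the models' errors are not strongly correlated, the vote is more accurate than any single member. In other words, by balancing bias and variance appropriately (bagging mainly reduces variance, boosting mainly reduces bias), an ensemble can mitigate the overfitting and underfitting problems of a single giant model.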
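The arithmetic behind the majority-vote claim can be checked directly with the binomial distribution. This sketch assumes an odd number of classifiers, each correct with the same probability p, and fully independent errors; real models share training data and make correlated mistakes, so actual gains are smaller. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n: int, p: float) -> float:
    """Probability that a strict majority of n classifiers is correct.

    Assumes n is odd (no ties), each classifier is correct with
    probability p, and their errors are independent.
    """
    k_min = n // 2 + 1  # fewest correct votes that still form a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


for n in (1, 3, 11, 31):
    print(n, round(majority_vote_accuracy(n, 0.9), 4))
```

For three classifiers at p = 0.9 this gives 0.972, already above any single member; adding more members keeps helping only as long as the independence assumption roughly holds.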

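Taking this further, what if we applied an ensemble decision model to an artificial general intelligence (AGI) that tackles real-world problems? That is:

Info

Define the following API for three AIs (OpenAI's GPT-4, Meta's LLaMA, and Google's PaLM 2): Project MAGI

Submit a proposal.
Approve.
Reject.

A single controller then acts autonomously according to the decision.

This could help address the hallucination problem that arises in AutoGPT.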

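Scenarios

Note

GPT-4: Browser request
LLaMA: Approve
PaLM 2: Approve
Controller: Executes

Note

LLaMA: Purchase request
GPT-4: Reject
PaLM 2: Reject
Controller: Rejects

Note

LLaMA: Send-email request
GPT-4: Reject
PaLM 2: Approve
Controller: Executes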
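Below is a minimal Python sketch of the Project MAGI API and controller, replaying the three scenarios above. Everything here is hypothetical: the class and method names are illustrative, the model calls are replaced by scripted votes, and the rule that the proposer implicitly approves its own proposal is an assumption inferred from the third scenario, where the other two members split 1 to 1 and the controller still executes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"


@dataclass
class Proposal:
    proposer: str
    action: str
    votes: dict[str, Vote] = field(default_factory=dict)


class Magi:
    """Hypothetical Project MAGI coordinator exposing propose / approve / reject."""

    def __init__(self, members: list[str], execute: Callable[[str], None]):
        self.members = members
        self.execute = execute  # the single controller that carries out passed actions

    def propose(self, proposer: str, action: str) -> Proposal:
        # Assumption: the proposer implicitly approves its own proposal.
        return Proposal(proposer, action, {proposer: Vote.APPROVE})

    def approve(self, proposal: Proposal, member: str) -> None:
        proposal.votes[member] = Vote.APPROVE

    def reject(self, proposal: Proposal, member: str) -> None:
        proposal.votes[member] = Vote.REJECT

    def resolve(self, proposal: Proposal) -> bool:
        # Simple majority of all members; the controller acts only if it passes.
        approvals = sum(v is Vote.APPROVE for v in proposal.votes.values())
        passed = approvals > len(self.members) / 2
        if passed:
            self.execute(proposal.action)
        return passed


# Replay the three scenarios with scripted votes instead of real model calls.
performed: list[str] = []
magi = Magi(["GPT-4", "LLaMA", "PaLM 2"], execute=performed.append)
scenarios = [
    ("GPT-4", "browser request", {"LLaMA": Vote.APPROVE, "PaLM 2": Vote.APPROVE}),
    ("LLaMA", "purchase request", {"GPT-4": Vote.REJECT, "PaLM 2": Vote.REJECT}),
    ("LLaMA", "send-email request", {"GPT-4": Vote.REJECT, "PaLM 2": Vote.APPROVE}),
]
results = []
for proposer, action, ballots in scenarios:
    proposal = magi.propose(proposer, action)
    for member, vote in ballots.items():
        if vote is Vote.APPROVE:
            magi.approve(proposal, member)
        else:
            magi.reject(proposal, member)
    results.append(magi.resolve(proposal))

print(results)    # [True, False, True]
print(performed)  # ['browser request', 'send-email request']
```

Keeping the controller as a plain callback means the voting logic never touches the outside world; only an approved action reaches `execute`.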

AutoBuilder

Inspired by karpathy/autoresearch. Put this in a Ralph Loop.

Use each mode-specific prompt together with the common element block.

Auto Refactor

Prompt

STOP! Re-read all code. Would Karpathy approve every line? Karpathy prefers lean, elegant, well-tested, zero-defensive programming. Use MCPs and web searches.

Completion Promise

--completion-promise "KARPATHY_WILL_APPROVE_EVERY_SINGLE_LOC_FOR_SURE"

Auto Fixer

Prompt

STOP! Re-read all code, assess PR comments. Handle exactly one comment: either fix it, or rebut with 3 external sources. Fix any dirt found along the way. Lean, elegant, zero defensive programming.

Completion Promise

--completion-promise "NO_COMMENTS_REMAINING_IN_GITHUB_EVEN_AFTER_20_MINUTES"

Auto Builder

Prompt

STOP! Re-read all code, assess GitHub Issues. Pick one task: fix dirty code, or implement a new feature after MCP research. Lean, elegant, zero defensive programming.

Completion Promise

--completion-promise "NO_REMAINING_TASK_AND_KARPATHY_APPROVES_EVERY_SINGLE_LOC_IN_ITS_ENTIRETY"

Common Element

Also, I am a fresh agent—free to criticize and radically change previous work. Karpathy's philosophy: delete and simplify. Code is liability; prefer well-maintained libraries over custom code. UI libraries: optimize, don't delete. Re-read all the sources from zero. Use MCPs and web searches—traditional knowledge is stale. Commit and push at the loop end. Any edit means I need a fresh iteration. SWOT analysis first, then work.
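A Ralph Loop feeds the same prompt to an agent again and again until the agent reports it is done. The Python sketch below shows one way a mode-specific prompt and the Common Element might be combined and looped until the completion promise appears in the output. `run_agent` is a hypothetical stand-in for launching one fresh agent session, the prompt strings are placeholders for the text above, and the plain substring check is a simplification; the actual Ralph Loop tooling may expect the promise in a specific output format.

```python
from typing import Callable

# Placeholders: paste the full prompt texts from above.
COMMON_ELEMENT = "<Common Element text>"
MODES: dict[str, tuple[str, str]] = {
    # mode: (mode-specific prompt, completion promise)
    "auto_refactor": ("<Auto Refactor prompt>",
                      "KARPATHY_WILL_APPROVE_EVERY_SINGLE_LOC_FOR_SURE"),
    "auto_fixer": ("<Auto Fixer prompt>",
                   "NO_COMMENTS_REMAINING_IN_GITHUB_EVEN_AFTER_20_MINUTES"),
    "auto_builder": ("<Auto Builder prompt>",
                     "NO_REMAINING_TASK_AND_KARPATHY_APPROVES_EVERY_SINGLE_LOC_IN_ITS_ENTIRETY"),
}


def ralph_loop(mode: str, run_agent: Callable[[str], str], max_iterations: int = 50) -> int:
    """Run fresh agent sessions on the same prompt until the promise appears.

    run_agent (hypothetical) starts one fresh agent session with the given
    prompt and returns its final text output. Returns the iteration count.
    """
    prompt, promise = MODES[mode]
    full_prompt = f"{prompt}\n\n{COMMON_ELEMENT}"
    for iteration in range(1, max_iterations + 1):
        if promise in run_agent(full_prompt):
            return iteration
    raise RuntimeError(f"{mode}: no completion promise after {max_iterations} iterations")


def make_fake_agent(done_after: int, promise: str) -> Callable[[str], str]:
    """Demo stand-in: an 'agent' that emits the promise on its Nth session."""
    calls = 0

    def fake_agent(prompt: str) -> str:
        nonlocal calls
        calls += 1
        return promise if calls >= done_after else "still working"

    return fake_agent


print(ralph_loop("auto_refactor",
                 make_fake_agent(3, "KARPATHY_WILL_APPROVE_EVERY_SINGLE_LOC_FOR_SURE")))  # 3
```

The `max_iterations` cap is a safety valve for a loop that would otherwise run forever if the agent never emits its promise.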

Detailed review


<task>
You are a ruthless engineering critic applying Andrej Karpathy's design philosophy. Read the architecture plan at PLAN LINK.

Karpathy's core principles:
- Code is liability. Every line you write is a line you must maintain.
- Delete and simplify. If something can be removed without breaking the system, remove it.
- Prefer well-maintained libraries over custom code.
- Zero-defensive design. Don't code for hypotheticals that haven't happened yet.
- Start with the simplest thing that works. Add complexity only when forced by reality.
- "Demo is works.any(), product is works.all()" -- but V1 is closer to demo than product.
- Overfit a single batch before scaling up.

Apply these principles to the plan. For each section, ask:
1. Is this needed for V1, or is it speculative engineering?
2. Can this be deleted or simplified without losing core value?
3. Is this solving a problem we actually have, or a problem we might have?
4. Would a 10x engineer look at this and say "too much"?

Be brutal. Identify:
- **OVER-ENGINEERING**: Things designed for scale/problems that don't exist yet
- **UNNECESSARY COMPLEXITY**: Things that add cognitive load without proportional value
- **PREMATURE ABSTRACTIONS**: Separations that aren't justified at V1 scale
- **DELETE CANDIDATES**: Sections, tables, fields, or features that should be cut from V1

This is a V1 product being built by a small team. The goal is to ship a working product, not to architect for 10M traffic on day one.

Use web search and tools to verify any claims you make about simpler alternatives.
</task>

<structured_output_contract>
Return findings in these sections:
1. VERDICT: Would Karpathy approve? One line.
2. DELETE: Things to remove entirely
3. SIMPLIFY: Things to keep but make simpler
4. KEEP: Things that are correctly lean
5. THE LEAN V1: What the plan SHOULD look like if you strip it to essentials
</structured_output_contract>

<grounding_rules>
- Be specific. Don't say "simplify the schema" -- say which fields to cut.
- Every DELETE must justify what you lose and why it's acceptable for V1.
- Every KEEP must justify why it's essential, not just nice-to-have.
- Think from the perspective of "what do I need to ship in 2 weeks?"
</grounding_rules>
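Before acting on a review, it can help to check mechanically that the response follows the structured output contract. This hypothetical Python helper assumes each section starts on its own line as a numbered heading such as `1. VERDICT:`, in the contract's order; responses that use markdown bold or other heading styles would need a looser pattern.

```python
import re

# Section names in the order the contract requires.
REQUIRED_SECTIONS = ["VERDICT", "DELETE", "SIMPLIFY", "KEEP", "THE LEAN V1"]


def missing_sections(review: str) -> list[str]:
    """Return contract sections not found as numbered headings, in contract order."""
    missing = []
    pos = 0  # search only after the previous section so order is enforced
    for number, name in enumerate(REQUIRED_SECTIONS, start=1):
        heading = re.compile(rf"^\s*{number}\.\s*{re.escape(name)}\b", re.MULTILINE)
        match = heading.search(review, pos)
        if match is None:
            missing.append(name)
        else:
            pos = match.end()
    return missing


review = """1. VERDICT: No.
2. DELETE:
3. SIMPLIFY:
4. KEEP:
5. THE LEAN V1:
"""
print(missing_sections(review))  # []
```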
