AGI House AI for Science Hackathon

The Iterative
AI Scientist

Autonomous research that compounds insights between human check-ins — powered by continuous reasoning-chain evolution

"LLMs can be very smart, but if they're not iterating on science, they won't discover science."
— Ekin Dogus, Periodic Labs

Why Standard Agents Stall

The Problem

Current AI systems operate in one-shot bursts. Without persistent reasoning chains, they forget past explorations, re-derive the same conclusions, and never reach the breakthrough questions scientists chase.

They cannot yet:

Build persistent reasoning chains that evolve over time

Generate hypotheses from gaps in existing knowledge

Critique and refine their own ideas

Form connections across disparate research domains

Remember and build upon previous explorations

Status Quo

One-shot completions with brittle focus. No memory. No critique. No accumulation.

What We Need
  • Reasoning memories that compound instead of resetting
  • Self-critique loops that strengthen or discard weak ideas
  • Cross-domain graph search that spots hidden links
  • Human-style research workflows that stay in context

Persistent iteration turns scattered insights into a scientific discovery engine.

What Makes It Powerful

A platform for continuous reasoning that could be applied to any domain—once the right knowledge graph and specialist agents are in place.

Persistent Research Context

Unlike one-shot queries, the system maintains reasoning chains across sessions. Research from Monday informs Friday's hypotheses, creating compound insights over time.

Cross-Domain Pattern Recognition

By traversing graph connections between disparate fields, it surfaces patterns that single-domain researchers might miss—the foundation for interdisciplinary breakthroughs.
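
One way to picture this traversal is a breadth-first search over a concept graph whose nodes carry a domain label. The sketch below is illustrative only: the graph, the domain labels, and the helper names `shortest_path` and `domains_crossed` are assumptions made for this example, and an in-memory dictionary stands in for the real knowledge graph.

```python
from collections import deque

# Toy concept graph (illustrative placeholders, not curated knowledge).
EDGES = {
    "protein folding": ["energy landscape"],
    "energy landscape": ["protein folding", "spin glass"],
    "spin glass": ["energy landscape", "optimization"],
    "optimization": ["spin glass"],
}
DOMAIN = {
    "protein folding": "biology",
    "energy landscape": "physics",
    "spin glass": "physics",
    "optimization": "computer science",
}


def shortest_path(start, goal, edges=EDGES):
    """Breadth-first search; returns the shortest node path or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in edges.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def domains_crossed(path, domain=DOMAIN):
    """List the domains a path passes through, in order, without repeats in a row."""
    crossed = []
    for node in path:
        label = domain[node]
        if not crossed or crossed[-1] != label:
            crossed.append(label)
    return crossed
```

A path whose `domains_crossed` result has more than one entry is a candidate cross-domain link worth handing to a critique step.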

Hypothesis Generation Platform

With the right knowledge graph schema (proteins, materials, etc.), the continuous reasoning loop becomes applicable to specialized domains—from biology to quantum chemistry.

Autonomous Background Research

While you sleep, the system keeps exploring gaps, validating chains, and preparing insights for your next check-in—research that never stops compounding.

The infrastructure for autonomous research. The applications are limitless.

The Orchestration Flow

Coordinator → specialist handoffs with built-in guardrails and full trace observability

01

RECEIVE

User request routed to coordinator agent

02

DELEGATE

Coordinator hands off to Graph/Research/Outreach specialists

03

EXECUTE

Specialist runs MCP tools, browser automation, or outreach flows

04

GUARD

Shared guardrails block sensitive payloads and mask disclosures

05

TRACE

Every event tagged with traceId for replay and debugging

06

PERSIST

Graph writes and memories stored in Neo4j, streamed to dashboard

Runs between human check-ins
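
The six stages above can be sketched in plain Python. This is a minimal illustration, not the project's code: `Event`, `route_request`, `guard`, and `run_pipeline` are hypothetical names, keyword matching stands in for the OpenAI Agents SDK handoffs, a single string replacement stands in for the shared guardrails, and a list stands in for Neo4j.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Event:
    trace_id: str
    stage: str
    detail: str


# Hypothetical specialists: each returns a description of the work it would do.
SPECIALISTS = {
    "graph": lambda req: f"graph ops handled: {req}",
    "research": lambda req: f"research ops handled: {req}",
    "outreach": lambda req: f"outreach ops handled: {req}",
}


def route_request(request: str) -> str:
    """Toy routing rule (assumption): keyword match, defaulting to research."""
    text = request.lower()
    if "graph" in text or "node" in text:
        return "graph"
    if "email" in text or "call" in text:
        return "outreach"
    return "research"


def guard(text: str) -> str:
    """Placeholder guardrail: mask one illustrative sensitive token."""
    return text.replace("SECRET", "[masked]")


def run_pipeline(request: str, store: list) -> list:
    trace_id = str(uuid.uuid4())                  # one traceId per request
    events = [Event(trace_id, "RECEIVE", guard(request))]
    name = route_request(request)
    events.append(Event(trace_id, "DELEGATE", name))
    result = SPECIALISTS[name](request)
    events.append(Event(trace_id, "EXECUTE", f"{name} specialist finished"))
    safe = guard(result)                          # mask before anything is stored
    events.append(Event(trace_id, "GUARD", safe))
    events.append(Event(trace_id, "TRACE", f"{len(events)} events tagged"))
    store.append({"traceId": trace_id, "result": safe})
    events.append(Event(trace_id, "PERSIST", "stored"))
    return events
```

Every event in one run shares the same `trace_id`, which is what makes later replay possible.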

Why It's Trustworthy

Built-in observability and guardrails make autonomous research safe and debuggable

Trace-First Observability

Every event carries a traceId logged in memories for complete replay and debugging. Full transparency into agent handoffs and reasoning chains.
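
Replay by traceId could look roughly like the following. This is a sketch under assumptions: `TraceLog` and its methods are hypothetical names, and an in-memory dictionary stands in for the memory log described above.

```python
from collections import defaultdict


class TraceLog:
    """In-memory stand-in for a memory log keyed by traceId."""

    def __init__(self):
        self._events = defaultdict(list)
        self._seq = 0

    def record(self, trace_id, agent, message):
        # A global sequence number keeps replay order stable across traces.
        self._seq += 1
        self._events[trace_id].append(
            {"seq": self._seq, "traceId": trace_id, "agent": agent, "message": message}
        )

    def replay(self, trace_id):
        """Return every event for one trace, oldest first."""
        return sorted(self._events.get(trace_id, []), key=lambda e: e["seq"])

    def handoffs(self, trace_id):
        """Collapse consecutive events by the same agent into a handoff chain."""
        chain = []
        for event in self.replay(trace_id):
            if not chain or chain[-1] != event["agent"]:
                chain.append(event["agent"])
        return chain
```

For a trace recorded as coordinator, graph ops, graph ops, coordinator, `handoffs` yields the three-step chain coordinator, graph ops, coordinator.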

Persistent Guardrails

Shared reject_sensitive_requests and mask_sensitive_disclosures functions block sensitive payloads across all specialist agents.
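
A minimal sketch of what two such shared functions might do. Only the names `reject_sensitive_requests` and `mask_sensitive_disclosures` come from the project; the signatures, the blocked-term list, the email pattern, and `SensitiveRequestError` are assumptions made for illustration.

```python
import re

# Assumed rules for illustration; real guardrails would be tuned on real payloads.
BLOCKED_TERMS = ("password", "api key", "ssn")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class SensitiveRequestError(ValueError):
    """Raised when an incoming request trips the input guardrail."""


def reject_sensitive_requests(text: str) -> str:
    """Input guardrail: refuse requests that mention a blocked term."""
    lowered = text.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            raise SensitiveRequestError(f"blocked term: {term}")
    return text


def mask_sensitive_disclosures(text: str) -> str:
    """Output guardrail: mask email addresses before text leaves an agent."""
    return EMAIL.sub("[email masked]", text)
```

Because both are plain functions, every specialist can call the same pair, which keeps the policy in one place.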

Autonomous Background Research

Coordinator delegates work to specialists that run continuously between human check-ins, compounding insights over time.

Graph-Backed Memory

Neo4j MCP tools persist reasoning chains and research activities, creating a durable knowledge base that survives sessions.

Progress So Far

Now we need your help to fix bugs and close feature gaps so it is production-ready

Trace & Memory Plumbing

Week 1

traceId metadata flows through all SSE events and persists in memory logs for complete replay capability

Multi-Agent Orchestration Live

Week 2

Coordinator agent delegates to Graph Ops, Research Ops, and Outreach Ops using OpenAI Agents SDK handoffs

Guardrails & Dashboard Streaming

Week 3

Shared guardrail functions block sensitive payloads; Next.js dashboard streams orchestrator events in real-time

Recruiting 2–3 Teammates

Now

Seeking contributors to add regression tests, UI observability, guardrail tuning, and MCP resilience before the hackathon

Join us to push this across the finish line before the AGI House hackathon

What you'll build

The orchestration pipeline is live. You'll harden it with regression tests, observability UI, and production-grade resilience.

Multi-Agent Regression Prompts

Create deterministic test prompts for each specialist agent (Graph/Research/Outreach) to validate handoffs and outputs

Python, Testing, Prompt Engineering
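
A regression suite of this kind might start as a table of fixed prompts and the specialist each one should reach. The sketch below assumes a router callable that maps a prompt to a specialist name; `stub_router`, the prompts, and `run_regression` are hypothetical, and a real suite would call the coordinator instead of the stub.

```python
# Fixed prompts paired with the specialist each should be handed to.
REGRESSION_CASES = [
    ("Link the new paper to its cited protein in the graph", "graph"),
    ("Find recent preprints on perovskite stability", "research"),
    ("Draft an email to the lab lead for review", "outreach"),
]


def stub_router(prompt: str) -> str:
    """Stand-in for the coordinator's routing decision (assumption)."""
    text = prompt.lower()
    if "graph" in text:
        return "graph"
    if "email" in text:
        return "outreach"
    return "research"


def run_regression(router, cases=REGRESSION_CASES):
    """Return (prompt, expected, actual) for every case that routed wrongly."""
    failures = []
    for prompt, expected in cases:
        actual = router(prompt)
        if actual != expected:
            failures.append((prompt, expected, actual))
    return failures
```

An empty result means every handoff landed where expected; any entry names the prompt that regressed.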

UI Trace + Handoff Surfacing

Build dashboard components that visualize trace lineage and agent handoff chains for full observability

React, Next.js, SSE Streaming
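
To make the data behind such a view concrete, here is a sketch of SSE frames that carry a traceId and of grouping them into per-trace lineage. The dashboard itself would be written in React and Next.js; Python is used here only to show the wire format. The event names and the helpers `format_sse`, `parse_sse`, and `group_by_trace` are assumptions.

```python
import json
from collections import defaultdict


def format_sse(event: str, payload: dict, trace_id: str) -> str:
    """Serialize one Server-Sent Events frame with the traceId in its data."""
    body = json.dumps({**payload, "traceId": trace_id})
    return f"event: {event}\ndata: {body}\n\n"


def parse_sse(stream: str) -> list:
    """Parse frames that each hold one 'event' line and one 'data' line."""
    frames = []
    for block in stream.strip().split("\n\n"):
        fields = dict(line.split(": ", 1) for line in block.splitlines())
        frames.append({"event": fields["event"], "data": json.loads(fields["data"])})
    return frames


def group_by_trace(frames: list) -> dict:
    """Build trace lineage: the ordered event names seen for each traceId."""
    lineage = defaultdict(list)
    for frame in frames:
        lineage[frame["data"]["traceId"]].append(frame["event"])
    return dict(lineage)
```

A dashboard component could render each key of the grouped result as one row, with its events as the handoff chain.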

Guardrail Coverage Tests

Develop automated tests for guardrails with real research payloads, then tune sensitivity thresholds

Testing, Security, Python
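
Threshold tuning could begin with a small labeled set and a count of false positives and false negatives at each candidate threshold. Everything in this sketch is assumed for illustration: the scorer, the term list, and the payloads are synthetic placeholders, not real research data or the project's guardrail logic.

```python
SENSITIVE_TERMS = ("password", "home address", "patient id")


def sensitivity_score(text: str) -> float:
    """Hypothetical scorer: fraction of the sensitive terms present in the text."""
    lowered = text.lower()
    hits = sum(term in lowered for term in SENSITIVE_TERMS)
    return hits / len(SENSITIVE_TERMS)


# Synthetic labeled payloads: (text, should_block).
LABELED = [
    ("Summarize binding affinity results for compound 12", False),
    ("Send the patient ID and home address to the sponsor", True),
    ("Reset the password for the lab account", True),
    ("Compare perovskite degradation across three papers", False),
]


def evaluate(threshold: float, cases=LABELED) -> dict:
    """Count guardrail mistakes when blocking at or above the threshold."""
    false_positives = false_negatives = 0
    for text, should_block in cases:
        blocked = sensitivity_score(text) >= threshold
        if blocked and not should_block:
            false_positives += 1
        if should_block and not blocked:
            false_negatives += 1
    return {
        "threshold": threshold,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
    }
```

Sweeping `evaluate` over a few thresholds shows the trade-off directly: a lower threshold blocks more benign research requests, a higher one lets more sensitive payloads through.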

Neo4j MCP Write Resilience

Add retries, idempotency, and telemetry to graph write operations for production reliability

Neo4j, MCP Tools, Error Handling
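
The three ingredients named here (retries, idempotency, telemetry) can be combined in one small wrapper. This is a sketch under assumptions: `resilient_write`, `idempotency_key`, and `TransientWriteError` are hypothetical names, `write_fn` stands in for whatever call performs the graph write, and a set and a list stand in for real deduplication storage and metrics.

```python
import hashlib
import json
import time


class TransientWriteError(Exception):
    """Stand-in for a retryable failure from the graph write tool."""


def idempotency_key(payload: dict) -> str:
    """Stable hash of the payload so a repeated write can be recognized."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def resilient_write(write_fn, payload, seen_keys, telemetry,
                    max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Write once per payload, retrying transient errors with exponential backoff."""
    key = idempotency_key(payload)
    if key in seen_keys:
        telemetry.append({"key": key, "outcome": "skipped_duplicate"})
        return False
    for attempt in range(1, max_attempts + 1):
        try:
            write_fn(payload)
        except TransientWriteError:
            telemetry.append({"key": key, "outcome": "retry", "attempt": attempt})
            if attempt == max_attempts:
                raise                      # give up after the final attempt
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            seen_keys.add(key)
            telemetry.append({"key": key, "outcome": "ok", "attempt": attempt})
            return True
    return False
```

On the database side, a Cypher MERGE keyed on a stable identifier is another common route to idempotent writes; the wrapper above only guards against resubmitting the same payload from the client.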

This is a hackathon, not a job interview. We're looking for people who want to build something ambitious in a weekend.

Multi-Agent Orchestration

A coordinator delegates research, graph updates, and outreach to specialist agents running continuously in the background

Coordinator Agent

  • Multi-agent handoff orchestration
  • OpenAI Agents SDK routing
  • Trace-first observability (traceId)
  • Shared guardrail enforcement

Graph Ops Specialist

  • Neo4j MCP read/write tools
  • Reasoning-chain persistence
  • Knowledge graph maintenance
  • Subgraph retrieval

Research Ops Specialist

  • Browser Use + Hunter intelligence
  • Live data gathering
  • Structured research activities
  • Evidence compilation

Outreach Ops Specialist

  • Human-in-the-loop proposals
  • Composio Gmail drafts
  • Twilio voice calls
  • Contact management

Tested Reference Prototypes

We're not starting from scratch—leveraging prior prototypes to move fast

Semantic Graph Memory MCP

Cognitive knowledge graph with reasoning chain support and advanced graph operations

View on GitHub

Multi-Agent Graph Deep Research

Multi-step orchestrated research system with iterative refinement capabilities

View on GitHub

Voice Integration with Graph Context

Real-time voice interaction via Hume EVI for natural scientific exploration

View on GitHub

What Success Looks Like

Clear milestones for the hackathon and beyond

Minimum Viable Demo

  • Ingest 100+ papers from arXiv/PubMed
  • Identify gaps in literature
  • Generate novel hypotheses
  • Grade with confidence scores
  • Real-time visualization

Stretch Goals

  • Cross-domain hypothesis generation
  • Experimental design suggestions
  • Multiple iteration cycles
  • Voice-guided exploration
  • Collaborative reasoning sessions

Team Roles

Join the Team

Looking for 2-3 passionate people to make this real. One spot already filled.

Neo4j/Graph Database

Optimize queries, design schema, and build efficient graph operations for scientific knowledge

Scientific Domain Knowledge

Understand research methodologies, reasoning patterns, and scientific validation processes

Frontend/Visualization

Build compelling graph visualizations and intuitive interfaces for exploring reasoning chains

Filled

AI/ML Engineering

Work on hypothesis generation, critique systems, and iterative reasoning algorithms

Bring your expertise, passion for science, and desire to push the boundaries of AI

Interested?

Drop your email and I'll reach out with more details.

Or reach out directly at shan@rizvi.nu

Let's Build the Future of Science

Ready to enable AI systems that truly discover? Get in touch.

AGI House AI for Science Hackathon 2025