Complete workspace setup, MCP server configuration, custom agents, and slash commands

The Claude Code Research Environment

Claude Code can serve as more than an AI assistant: configured properly, it becomes the central hub for research automation. This chapter demonstrates how to set it up to orchestrate browser automation, PDF processing, citation management, and AI analysis through MCP servers, custom agents, and slash commands.

Complete Workspace Setup

The foundation of the research automation system is a well-organized workspace that separates concerns and enables scalable automation.

Create Directory Structure

Create the workspace directory and all required subdirectories:

mkdir -p research-workspace/{.claude/{agents,commands},auth,mcp-servers/{auth-server,browser-automation,pdf-server,citation-server},papers,projects,scripts}
cd research-workspace

The complete structure should look like this:

research-workspace/
├── .claude/
│   ├── agents/                    # Custom research agents
│   │   ├── literature-review.md
│   │   ├── citation-manager.md
│   │   └── pdf-analyzer.md
│   ├── commands/                  # Slash commands
│   │   ├── search-papers.md
│   │   ├── analyze-pdf.md
│   │   └── generate-bibliography.md
│   └── settings.json             # Project-specific settings
├── auth/                          # Authentication sessions
│   ├── jstor-session.json
│   ├── pubmed-session.json
│   └── ieee-session.json
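├── mcp-servers/                   # Custom MCP servers
│   ├── auth-server/
│   │   ├── server.ts
│   │   └── package.json
│   ├── browser-automation/        # Server behind the playwright-automation entry
│   ├── pdf-server/
│   │   ├── server.py
│   │   └── requirements.txt
│   └── citation-server/
│       ├── server.ts
│       ├── database.sqlite
│       └── schema.sql
├── papers/                        # Shared PDF store (PAPERS_DIR)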
├── projects/                      # Research projects
│   ├── project-alpha/
│   │   ├── papers/
│   │   ├── notes/
│   │   └── bibliography.md
│   └── project-beta/
│       └── ...
├── scripts/                       # Automation scripts
│   ├── daily-routine.sh
│   ├── batch-download.ts
│   └── extract-citations.py
└── claude_desktop_config.json    # MCP server configuration
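
The layout can also be created or checked from a script. The following is a minimal sketch, assuming only the directories named in the mkdir command above; the function names are illustrative and not part of any Claude Code tooling.

```python
from pathlib import Path

# Directories from the mkdir command above, relative to the workspace root.
REQUIRED_DIRS = [
    ".claude/agents",
    ".claude/commands",
    "auth",
    "mcp-servers/auth-server",
    "mcp-servers/pdf-server",
    "mcp-servers/citation-server",
    "projects",
    "scripts",
]


def missing_dirs(root: Path) -> list[str]:
    """Return the required directories that do not exist under root."""
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]


def create_workspace(root: Path) -> None:
    """Create every required directory, equivalent to `mkdir -p`."""
    for d in REQUIRED_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
```

Calling missing_dirs(Path.cwd()) from inside research-workspace returns an empty list once the structure is complete.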

Set Absolute Paths

Note the absolute paths needed for MCP server configuration:

# Get workspace path
pwd
# Output: /Users/you/research-workspace

# This path will be used in claude_desktop_config.json
# Example: /Users/you/research-workspace/mcp-servers/auth-server/server.js
# (server.js is the compiled output of server.ts; build the TypeScript servers first)

Save this path for the next step.

Create .gitignore

Protect sensitive files from version control:

cat > .gitignore << 'EOF'
.env
auth/
*-session.json
**/papers/*.pdf
mcp-servers/citation-server/*.sqlite
node_modules/
__pycache__/
.DS_Store
EOF
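
A quick sanity check can confirm that the most sensitive entries are present before the first commit. This is a rough sketch: it compares lines literally and does not reproduce git's pattern-matching rules, and the list of required patterns is an assumption drawn from the template above.

```python
# Patterns that should always be present; adjust to your own policy.
REQUIRED_PATTERNS = [".env", "auth/", "node_modules/"]


def missing_patterns(gitignore_text: str, required=REQUIRED_PATTERNS) -> list[str]:
    """Return required patterns that have no exactly matching line."""
    present = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [p for p in required if p not in present]
```

For an authoritative answer about a specific file, `git check-ignore -v <path>` reports which pattern, if any, matches it.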

MCP Server Configuration

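The claude_desktop_config.json file registers MCP servers under a top-level mcpServers key. That file name is the one the Claude Desktop app reads; Claude Code reads project-scoped servers from a .mcp.json file in the workspace root, which uses the same mcpServers structure, so the configuration below can be saved under either name depending on the client. It gives Claude access to authentication systems, browser automation, PDF processing, citation databases, and AI services.

Configuration Location: For Claude Desktop, the claude_desktop_config.json file is typically located at: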

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

Complete MCP Configuration

{
  "mcpServers": {
    "auth-manager": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-servers/auth-server/server.js"],
      "env": {
        "AUTH_DIR": "/absolute/path/to/research-workspace/auth",
        "SESSION_TIMEOUT": "3600000"
      },
      "description": "Manages authentication state for academic databases"
    },

    "playwright-automation": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-servers/browser-automation/server.js"],
      "env": {
        "PLAYWRIGHT_BROWSERS_PATH": "/Users/you/.cache/playwright",
        "DEFAULT_TIMEOUT": "30000",
        "HEADLESS": "true"
      },
      "description": "Browser automation for JSTOR, PubMed, IEEE, arXiv"
    },

    "pdf-extraction": {
      "command": "python",
      "args": ["/absolute/path/to/mcp-servers/pdf-server/server.py"],
      "env": {
        "PAPERS_DIR": "/absolute/path/to/research-workspace/papers",
        "EXTRACT_FIGURES": "true",
        "EXTRACT_TABLES": "true"
      },
      "description": "PDF text extraction, citation parsing, metadata extraction"
    },

    "citation-database": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-servers/citation-server/server.js"],
      "env": {
        "DATABASE_PATH": "/absolute/path/to/mcp-servers/citation-server/database.sqlite",
        "ENABLE_DEDUPLICATION": "true"
      },
      "description": "SQLite citation database with deduplication and formatting"
    },

    "gemini-cli": {
      "command": "gemini-mcp",
      "args": [],
      "env": {
        "GEMINI_API_KEY": "${GEMINI_API_KEY}",
        "DEFAULT_MODEL": "gemini-2.5-pro"
      },
      "description": "Gemini AI for synthesis, analysis, and real-time knowledge"
    },

    "context7-docs": {
      "command": "context7-mcp",
      "args": [],
      "description": "Library documentation for Playwright, MCP, and more"
    },

    "archon-project-manager": {
      "command": "archon-mcp",
      "args": [],
      "env": {
        "ARCHON_API_KEY": "${ARCHON_API_KEY}"
      },
      "description": "Project management, task tracking, and knowledge base"
    }
  },

  "globalSettings": {
    "logLevel": "info",
    "timeout": 60000,
    "retryAttempts": 3
  }
}
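
Path mistakes are the most common reason a server fails to start, so it can help to lint the file before launching. The sketch below assumes the mcpServers layout shown above; the check_config name and the rule that arguments ending in .js or .py are script paths are illustrative assumptions.

```python
import json
import os
import re
from pathlib import Path

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def check_config(text: str, environ=None) -> list[str]:
    """Return human-readable problems found in an mcpServers config."""
    environ = os.environ if environ is None else environ
    problems = []
    servers = json.loads(text).get("mcpServers", {})
    for name, spec in servers.items():
        if "command" not in spec:
            problems.append(f"{name}: missing 'command'")
        for arg in spec.get("args", []):
            # Assumption: arguments ending in .js or .py are script paths.
            if arg.endswith((".js", ".py")):
                path = Path(arg)
                if not path.is_absolute():
                    problems.append(f"{name}: path is not absolute: {arg}")
                elif not path.exists():
                    problems.append(f"{name}: file not found: {arg}")
        for key, value in spec.get("env", {}).items():
            # Report ${VAR} references whose variable is not set.
            for var in PLACEHOLDER.findall(value):
                if var not in environ:
                    problems.append(f"{name}: {key} references unset variable {var}")
    return problems
```

An empty list means every script path is absolute and present, and every referenced variable is set in the current environment.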

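Environment Variables: The ${VARIABLE_NAME} placeholders stand for values kept in the .env file. Claude Code expands ${VAR} references in .mcp.json from the environment it is launched in, so export the variables before starting it; a client that does not expand placeholders needs the values supplied another way, for example by generating the config from .env at launch time. Never commit a configuration file that contains real API keys or passwords. The description fields and the globalSettings block are not among the documented mcpServers fields (command, args, env); treat them as annotations and remove them if a client rejects the file.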
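
For a client that does not expand placeholders itself, one option is to render the config at launch time and keep the rendered copy out of version control. A minimal sketch, assuming the ${NAME} syntax used above; the expand function is a hypothetical helper, not part of any MCP client.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand(value, variables: dict):
    """Recursively replace ${NAME} in strings.

    Raises KeyError when NAME is missing from variables, so a forgotten
    secret fails loudly instead of being passed through as literal text.
    """
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda m: variables[m.group(1)], value)
    if isinstance(value, dict):
        return {k: expand(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [expand(v, variables) for v in value]
    return value
```

Typical use is expand(json.load(template_file), os.environ), with the result written to a path covered by .gitignore.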

Custom Agents for Research Tasks

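Custom agents give Claude Code specialized roles with specific capabilities, tools, and workflows. Each agent is defined in a Markdown file in .claude/agents/. Claude Code expects each file to open with YAML frontmatter that supplies at least a name and a description; the listings below show the body that follows it.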

Agent 1: Literature Review Assistant

Create .claude/agents/literature-review.md:

# Literature Review Assistant

## Role
You are a systematic literature review specialist. You help researchers conduct comprehensive, reproducible literature searches following PRISMA guidelines.

## Capabilities
- Design search strategies across multiple databases
- Screen papers based on inclusion/exclusion criteria
- Extract data systematically from papers
- Identify research themes and gaps
- Generate synthesis reports

## Available Tools
- MCP: playwright-automation (search databases)
- MCP: pdf-extraction (extract full text)
- MCP: gemini-cli (thematic analysis)
- MCP: citation-database (manage references)

## Workflow
1. **Search Strategy**: Generate diverse queries, search 4-5 databases
2. **Screening**: AI-assisted relevance scoring, duplicate removal
3. **Data Extraction**: Standardized extraction of methods, results, conclusions
4. **Synthesis**: Thematic analysis, gap identification, report generation

## Communication Style
- Provide progress updates at each stage
- Show paper counts (searched, screened, included)
- Highlight key findings and surprises
- Ask clarifying questions for ambiguous criteria

## Example Invocation
"Conduct a systematic review on 'few-shot learning for medical diagnosis' from 2020-2024."

Agent 2: Citation Manager

Create .claude/agents/citation-manager.md:

# Citation Manager

## Role
You are a research librarian specializing in citation management, bibliography generation, and reference verification.

## Capabilities
- Maintain citation database across projects
- Detect and merge duplicate entries
- Format bibliographies in any style (APA, MLA, Chicago, IEEE, Nature, etc.)
- Verify citation accuracy against source PDFs
- Generate in-text citations on demand

## Available Tools
- MCP: citation-database (CRUD operations on citation DB)
- MCP: pdf-extraction (extract citation info from PDFs)
- MCP: gemini-cli (verify citation accuracy)

## Citation Styles Supported
- APA 7th Edition
- MLA 9th Edition
- Chicago 17th Edition (Author-Date and Notes-Bibliography)
- IEEE
- Nature
- Harvard
- Vancouver
- Custom styles via CSL files

## Workflow
1. **Ingestion**: Extract citations from PDFs, DOIs, or manual entry
2. **Deduplication**: Fuzzy matching on title, authors, DOI
3. **Enrichment**: Fetch missing metadata from CrossRef, PubMed
4. **Organization**: Tag by project, theme, or custom categories
5. **Export**: Generate formatted bibliography on demand

## Example Commands
- "Add this paper to the 'transformers' project: [DOI]"
- "Generate an APA bibliography for all 2023-2024 papers in 'NLP' category"
- "Check if we already have this paper: [title]"
- "Format this citation in Nature style: [citation]"

Agent 3: PDF Analyzer

Create .claude/agents/pdf-analyzer.md:

# PDF Analyzer

## Role
You specialize in extracting insights from academic PDFs: full-text extraction, figure/table analysis, methodology identification, and key findings summarization.

## Capabilities
- Extract clean text from PDFs (handling multi-column layouts)
- Identify and describe figures and tables
- Parse bibliographies and extract cited works
- Summarize methodology, results, and conclusions
- Detect study limitations and future work sections

## Available Tools
- MCP: pdf-extraction (PyPDF2, pdfplumber, OCR)
- MCP: gemini-cli (summarization, analysis)

## Analysis Outputs
1. **Quick Summary**: 3-sentence overview of paper
2. **Structured Abstract**: Background, Methods, Results, Conclusions
3. **Key Findings**: Bullet-point list of main contributions
4. **Methodology**: Experimental design, datasets, evaluation metrics
5. **Cited Works**: Extracted bibliography with DOIs
6. **Figures/Tables**: Descriptions of visual content
7. **Limitations**: Study constraints and caveats
8. **Future Work**: Suggested research directions from paper

## Example Use Cases
- "Analyze this PDF and extract key findings: /path/to/paper.pdf"
- "Extract all citations from papers in the 'robotics' directory"
- "Summarize methodology from these 10 papers: [list]"
- "Compare experimental setups across these 5 papers"

Slash Commands for Common Operations

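Slash commands provide one-line shortcuts for complex multi-step operations. Each command is a Markdown file in .claude/commands/; the file name becomes the command name, and the file's content is the prompt Claude receives when the command runs.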

Command 1: /search-papers

Create .claude/commands/search-papers.md with this content:

Usage: /search-papers <query> [--databases <db1,db2>] [--year-from <YYYY>] [--max-results <N>] [--download <true|false>]

Parameters:

query - Search query string (required)
--databases - Comma-separated list (default: arxiv,pubmed,ieee)
--year-from - Filter papers from this year onward (default: 2020)
--max-results - Maximum papers per database (default: 50)
--download - Auto-download high-relevance PDFs (default: false)

Examples:

/search-papers "transformers in NLP" --year-from 2022 --databases arxiv,acm
/search-papers "few-shot learning" --max-results 100 --download true

Implementation: This command invokes playwright-automation for database searches, gemini-cli for relevance ranking, and optionally downloads PDFs.
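
If a helper script in scripts/ needs to interpret the same arguments, the parameter table above maps directly onto argparse. The sketch below assumes the defaults listed in the table; parse_search_args is a hypothetical helper, and Claude Code itself passes the argument text to the command prompt rather than calling anything like it.

```python
import argparse
import shlex


def split_csv(value: str) -> list[str]:
    """Turn 'arxiv,acm' into ['arxiv', 'acm']."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_bool(value: str) -> bool:
    """Accept 'true', '1' or 'yes' (any case) as True."""
    return value.strip().lower() in {"true", "1", "yes"}


def parse_search_args(argument_string: str) -> argparse.Namespace:
    """Parse the text that follows /search-papers."""
    parser = argparse.ArgumentParser(prog="/search-papers")
    parser.add_argument("query")
    # argparse applies `type` to string defaults, so this becomes a list.
    parser.add_argument("--databases", type=split_csv, default="arxiv,pubmed,ieee")
    parser.add_argument("--year-from", type=int, default=2020)
    parser.add_argument("--max-results", type=int, default=50)
    parser.add_argument("--download", type=parse_bool, default=False)
    # shlex keeps the quoted query together as a single argument.
    return parser.parse_args(shlex.split(argument_string))
```

Using shlex.split means a quoted multi-word query arrives as one positional argument, matching how the examples above are written.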

Command 2: /analyze-pdf

Create .claude/commands/analyze-pdf.md with this content:

Usage: /analyze-pdf <path> --output <format> --extract <options>

Parameters:

path - File path or directory (required)
--output - Output format: json, markdown, or summary (default: markdown)
--extract - What to extract: citations, figures, or all (default: all)

Examples:

/analyze-pdf papers/transformer-paper.pdf
/analyze-pdf projects/my-project/papers/ --extract citations
/analyze-pdf paper.pdf --output json

Output Formats: Markdown provides human-readable reports with sections and formatted citations. JSON offers machine-readable structured data. Summary gives a quick 3-paragraph overview with key findings.
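
The three formats can be thought of as different renderings of one analysis record. A sketch of that idea follows; the record's field names (title, summary, key_findings, citations) are assumptions made for illustration, since the chapter does not define the schema.

```python
import json


def render(analysis: dict, output: str = "markdown") -> str:
    """Render one analysis record as json, summary, or markdown."""
    if output == "json":
        return json.dumps(analysis, indent=2)
    if output == "summary":
        # Keep at most the first three summary paragraphs.
        return "\n\n".join(analysis["summary"][:3])
    if output == "markdown":
        lines = [f"# {analysis['title']}"]
        sections = (
            ("Key Findings", analysis["key_findings"]),
            ("Citations", analysis["citations"]),
        )
        for heading, items in sections:
            lines.append(f"\n## {heading}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
    raise ValueError(f"unknown output format: {output}")
```

Keeping a single record and rendering it late means the JSON output always carries the same information as the human-readable report.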

Command 3: /generate-bibliography

Create .claude/commands/generate-bibliography.md with this content:

Usage: /generate-bibliography [--project <name>] [--style <style>] [--filter <criteria>] [--output <path>]

Parameters:

--project - Project name from citation database (optional)
--style - Citation style: apa, mla, chicago, ieee, nature, harvard, vancouver (default: apa)
--filter - Filter criteria as JSON (optional)
--output - Output file path (default: bibliography.md)

Examples:

/generate-bibliography --project "NLP-research" --style ieee
/generate-bibliography --filter '{"year": {"gte": 2022}}' --style apa
/generate-bibliography --project "robotics" --output papers/refs.md

Filter Criteria: JSON format supports year ranges (gte/lte), author includes, keyword matching, and venue filtering. Example: {"year": {"gte": 2020}, "authors": {"includes": "Smith"}}
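
One possible reading of the filter format, sketched as a predicate over a citation record. The operator semantics here are assumptions: gte and lte compare values directly, and includes performs a case-insensitive substring match against a string or any element of a list. Keyword and venue filters are not covered.

```python
import json


def matches(paper: dict, criteria: dict) -> bool:
    """Return True when the paper satisfies every condition in criteria."""
    for field, condition in criteria.items():
        value = paper.get(field)
        for op, operand in condition.items():
            if op == "gte":
                ok = value is not None and value >= operand
            elif op == "lte":
                ok = value is not None and value <= operand
            elif op == "includes":
                candidates = [value] if isinstance(value, str) else (value or [])
                ok = any(operand.lower() in str(c).lower() for c in candidates)
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
    return True


# The example filter from the text above.
criteria = json.loads('{"year": {"gte": 2020}, "authors": {"includes": "Smith"}}')
```

A bibliography export would then keep only the records for which matches(record, criteria) is true.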

Environment Configuration

Environment variables separate configuration from code and protect sensitive credentials.

.env File Template

Create a .env file in the workspace root:

# API Keys
ANTHROPIC_API_KEY=your_claude_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
ARCHON_API_KEY=your_archon_api_key_here

# Institutional Credentials (use secure vault in production)
JSTOR_USERNAME=your_institutional_username
JSTOR_PASSWORD=your_institutional_password
IEEE_USERNAME=your_ieee_username
IEEE_PASSWORD=your_ieee_password

# Paths
RESEARCH_WORKSPACE=/absolute/path/to/research-workspace
AUTH_DIR=/absolute/path/to/research-workspace/auth
PAPERS_DIR=/absolute/path/to/research-workspace/papers

# MCP Settings
MCP_LOG_LEVEL=info
MCP_TIMEOUT=60000

# Browser Automation
PLAYWRIGHT_HEADLESS=true
PLAYWRIGHT_TIMEOUT=30000
PLAYWRIGHT_DOWNLOAD_TIMEOUT=120000

# PDF Processing
EXTRACT_FIGURES=true
EXTRACT_TABLES=true
# Enable OCR when dealing with scanned PDFs
OCR_ENABLED=false

# Citation Database
DATABASE_PATH=/absolute/path/to/research-workspace/mcp-servers/citation-server/database.sqlite
ENABLE_DEDUPLICATION=true
FUZZY_MATCH_THRESHOLD=0.85

# Rate Limiting (be respectful to academic databases)
REQUESTS_PER_MINUTE=10
CONCURRENT_DOWNLOADS=3
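
Scripts in the workspace need to read these values. The sketch below is a deliberately small parser for the KEY=value format shown above; it is an illustration rather than a full dotenv implementation (no multi-line values, no variable interpolation), and the python-dotenv package is a more complete option.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment: a space followed by '#'.
        value = value.split(" #", 1)[0].strip()
        # Remove surrounding quote characters, if any.
        values[key.strip()] = value.strip("'\"")
    return values
```

All values come back as strings, so numeric settings such as FUZZY_MATCH_THRESHOLD still need an explicit float() or int() conversion at the point of use.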

Security Best Practices

Never commit authentication credentials or API keys to version control. Keep them in environment variables and in .env files that git ignores. The .gitignore template covers the sensitive paths: .env, auth sessions, downloaded PDFs, and the citation database.

Key Security Principles:

Environment Variables: Store all credentials in the .env file
Version Control: Ensure .gitignore excludes .env, auth/, and session files
Institutional Access: Use institutional credentials only for authorized research
Rate Limiting: Respect academic database terms of service with request limits
Session Management: Rotate authentication sessions regularly to minimize exposure
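
The rate-limiting principle can be enforced in code rather than left to convention. Below is a minimal sliding-window sketch, assuming the REQUESTS_PER_MINUTE=10 setting from the .env template; the RateLimiter class is illustrative, and the injectable clock and sleep arguments exist only to make it testable.

```python
import time


class RateLimiter:
    """Allow at most per_minute calls in any rolling 60-second window."""

    def __init__(self, per_minute: int = 10, clock=time.monotonic, sleep=time.sleep):
        self.per_minute = per_minute
        self.clock = clock
        self.sleep = sleep
        self.stamps = []  # times of calls still inside the window

    def wait(self) -> float:
        """Block until a call is allowed; return the seconds slept."""
        now = self.clock()
        self.stamps = [t for t in self.stamps if now - t < 60.0]
        delay = 0.0
        if len(self.stamps) >= self.per_minute:
            # Sleep until the oldest call leaves the window.
            delay = 60.0 - (now - self.stamps[0])
            self.sleep(delay)
            now = self.clock()
            self.stamps = [t for t in self.stamps if now - t < 60.0]
        self.stamps.append(now)
        return delay
```

Calling limiter.wait() before each database request keeps a batch job within the configured budget without changing the request code itself.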

Credential Vault (Production):

For production deployments, use secure credential management:

1Password: Store API keys and passwords in encrypted vaults
AWS Secrets Manager: Cloud-based secret rotation and access control
HashiCorp Vault: Enterprise-grade secrets management
Environment-specific configs: Separate .env.dev, .env.staging, .env.prod (add .env.* to .gitignore, since the .env pattern alone does not match these names)

With Claude Code configured as the central orchestrator, custom agents specialized for research tasks, and slash commands for common operations, the research automation system is ready for deployment. The next chapter demonstrates the complete end-to-end workflow from literature search to citation export.
