Real-World Workflows
Practical usage patterns and complete workflow examples
Real-World Usage Patterns
The PDF intelligence system enables powerful automated workflows. This chapter demonstrates three production-ready patterns: comprehensive literature reviews, citation network analysis, and automated monitoring for new publications.
Example 1: Literature Review on Specific Topic
A complete workflow from database search to an organized knowledge base, finished with an auto-generated summary report.
Set Up the Literature Review Pipeline
Create the main review script that orchestrates database search, PDF download, text extraction, and organization.
// example-1-literature-review.js
import fs from 'fs/promises';
import PDFResearchServer from './server.js';
async function conductLiteratureReview() {
const server = new PDFResearchServer();
console.log('Starting literature review on reinforcement learning in robotics...\n');
// Execute full pipeline
const result = await server.handleFullPipeline({
query: 'reinforcement learning robotics manipulation',
databases: ['pubmed', 'arxiv', 'ieee'],
maxResults: 100,
yearFrom: 2020,
downloadDir: './papers/rl-robotics',
organizeStrategy: 'topic-date',
extractText: true
});
console.log('\n=== Literature Review Complete ===');
console.log(`Papers found: ${result.summary.papersFound}`);
console.log(`Papers downloaded: ${result.summary.papersDownloaded}`);
console.log(`Content extracted: ${result.summary.papersExtracted}`);
console.log(`Organized into ${result.summary.categories} categories`);
// Generate summary report
const report = generateSummaryReport(result);
await fs.writeFile('./papers/rl-robotics/REVIEW_SUMMARY.md', report);
console.log('\nSummary report saved to REVIEW_SUMMARY.md');
}

Implement Report Generation
Create the summary report generator that analyzes search results, citations, and paper organization.
function generateSummaryReport(result) {
const { search, extraction, organization } = result.details;
let report = `# Literature Review: Reinforcement Learning in Robotics\n\n`;
report += `Date: ${new Date().toISOString().split('T')[0]}\n\n`;
report += `## Overview\n\n`;
report += `- Total papers found: ${search.papers.length}\n`;
report += `- Papers downloaded: ${result.summary.papersDownloaded}\n`;
report += `- Databases searched: ${Object.keys(search.byDatabase).join(', ')}\n\n`;
report += `## Key Papers\n\n`;
// Top 10 most recent papers
const topPapers = search.papers
.sort((a, b) => (b.year || 0) - (a.year || 0))
.slice(0, 10);
for (const paper of topPapers) {
report += `### ${paper.title}\n`;
report += `**Authors**: ${formatAuthors(paper.authors)}\n`;
report += `**Year**: ${paper.year}\n`;
report += `**Source**: ${paper.source}\n`;
if (paper.doi) report += `**DOI**: ${paper.doi}\n`;
report += `\n`;
}
report += `## Citation Analysis\n\n`;
// Aggregate citation counts
const allCitations = extraction
?.flatMap(e => e.citations || [])
|| [];
report += `Total citations extracted: ${allCitations.length}\n\n`;
// Most cited years
const yearCounts = {};
allCitations.forEach(c => {
if (c.year) {
yearCounts[c.year] = (yearCounts[c.year] || 0) + 1;
}
});
report += `### Citations by Year\n\n`;
Object.entries(yearCounts)
.sort(([,a], [,b]) => b - a)
.slice(0, 10)
.forEach(([year, count]) => {
report += `- ${year}: ${count} citations\n`;
});
report += `\n## Topics Covered\n\n`;
Object.keys(organization).forEach(topic => {
const papers = organization[topic];
report += `- **${topic}**: ${papers.length} papers\n`;
});
return report;
}
function formatAuthors(authors) {
if (!authors || authors.length === 0) return 'Unknown';
if (!Array.isArray(authors)) return authors;
if (authors.length <= 3) {
return authors.join(', ');
} else {
return `${authors.slice(0, 3).join(', ')} et al.`;
}
}

Run the Review
Execute the literature review workflow and verify results.
// Run the review
conductLiteratureReview().catch(console.error);

Expected Output:
Starting literature review on reinforcement learning in robotics...
Step 1: Searching databases...
- PubMed: 45 papers found
- arXiv: 78 papers found
- IEEE: 62 papers found
- After deduplication: 127 unique papers
Step 2: Downloading 127 papers...
[Progress] 10/127 complete...
[Progress] 50/127 complete...
[Progress] 100/127 complete...
[Progress] 127/127 complete
- Downloaded: 119 papers
- Skipped (already exists): 3 papers
- Failed: 5 papers
Step 3: Extracting content...
[Progress] Extracting from 119 PDFs...
- Text extracted: 119 papers
- Citations found: 3,847 total
Step 4: Organizing papers...
- Created structure: papers/rl-robotics/
- Topics identified: 8 categories
- Index files generated: 8 files
=== Literature Review Complete ===
Papers found: 127
Papers downloaded: 119
Content extracted: 119
Organized into 8 categories
Summary report saved to REVIEW_SUMMARY.md

Optimization Tips: Use the yearFrom parameter to limit search scope and reduce processing time. Set maxResults conservatively (50-100) for initial reviews. Setting organizeStrategy: 'topic-date' categorizes papers by theme and publication year automatically.
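For an initial pass, it can help to tighten these knobs before committing to the full pipeline. The sketch below reuses the option names from the handleFullPipeline call above; the specific values are illustrative, not recommendations baked into the server.

// Hypothetical "quick pass" options for handleFullPipeline; the option
// names mirror the call in example-1-literature-review.js, the values
// are illustrative.
const quickPassOptions = {
  query: 'reinforcement learning robotics manipulation',
  databases: ['arxiv'],            // one database keeps the first run fast
  maxResults: 50,                  // conservative cap for an initial review
  yearFrom: 2022,                  // narrow the window to recent work
  downloadDir: './papers/rl-robotics-pilot',
  organizeStrategy: 'topic-date',  // auto-categorize by theme and year
  extractText: false               // defer extraction until the set is vetted
};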
Example 2: Following Citation Chains
Discover papers by following citations recursively to build comprehensive citation networks.
Create Citation Chain Explorer
Build the recursive citation tracking function that discovers papers through reference networks.
// example-2-citation-chain.js
import PDFHunter from './pdf-hunter.js';
import ExtractionPipeline from './extraction-pipeline.js';
import DownloadManager from './download-manager.js';
async function followCitationChain(seedPaper, depth = 2) {
const visited = new Set();
const allPapers = [];
const hunter = new PDFHunter();
const downloader = new DownloadManager({ downloadDir: './citation-chain' });
const extractor = new ExtractionPipeline();
async function explore(paper, currentDepth) {
if (currentDepth > depth) return;
const key = paper.doi || paper.title;
if (visited.has(key)) return;
visited.add(key);
console.log(`\nDepth ${currentDepth}: ${paper.title}`);
allPapers.push({ ...paper, depth: currentDepth });
// Download paper
const downloadResult = await downloader.downloadPaper(paper);
if (downloadResult.status === 'completed') {
// Extract citations
const extracted = await extractor.extractFromPDF(downloadResult.filepath);
console.log(` Found ${extracted.citations.length} citations`);
// Search for cited papers
for (const citation of extracted.citations.slice(0, 10)) {
if (!citation.title && !citation.doi) continue;
try {
const searchQuery = citation.doi || citation.title ||
`${citation.authors} ${citation.year}`;
const results = await hunter.searchMultiple(
searchQuery,
['pubmed', 'arxiv'],
1
);
if (results.papers.length > 0) {
await explore(results.papers[0], currentDepth + 1);
}
} catch (error) {
console.error(` Failed to find: ${citation.title || citation.doi}`);
}
// Rate limiting
await delay(2000);
}
}
}
await explore(seedPaper, 0);
return {
totalPapers: allPapers.length,
byDepth: groupByDepth(allPapers),
papers: allPapers
};
}
function groupByDepth(papers) {
const grouped = {};
papers.forEach(p => {
if (!grouped[p.depth]) grouped[p.depth] = [];
grouped[p.depth].push(p);
});
return grouped;
}
function delay(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}

Execute Citation Chain Discovery
Run the citation chain explorer with a seed paper and analyze the citation network.
// Example usage
const seedPaper = {
title: 'Deep Reinforcement Learning for Robotic Manipulation',
doi: '10.1109/ICRA.2018.8460692',
authors: ['Levine, S.', 'Pastor, P.', 'Krizhevsky, A.', 'Ibarz, J.', 'Quillen, D.'],
year: 2018
};
followCitationChain(seedPaper, 2)
.then(result => {
console.log('\n=== Citation Chain Complete ===');
console.log(`Total papers discovered: ${result.totalPapers}`);
console.log('By depth:');
Object.entries(result.byDepth).forEach(([depth, papers]) => {
console.log(` Depth ${depth}: ${papers.length} papers`);
});
})
.catch(console.error);

Expected Output:
Depth 0: Deep Reinforcement Learning for Robotic Manipulation
Found 42 citations
Depth 1: Soft Actor-Critic: Off-Policy Maximum Entropy Deep RL
Found 38 citations
Depth 1: Hindsight Experience Replay
Found 31 citations
...
=== Citation Chain Complete ===
Total papers discovered: 87
By depth:
Depth 0: 1 papers
Depth 1: 10 papers
Depth 2: 76 papers

Performance Tips: Limit citation chain depth to 2-3 levels to avoid exponential growth. Use the .slice(0, 10) pattern to cap the citations followed per paper at the first ten. Insert 2-second delays between searches to respect API rate limits and avoid being blocked.
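The fixed delay(2000) above is the simplest possible throttle. If you later want to share one throttle across several call sites, a small promise-chain limiter is one option; this is a generic sketch, not part of the project's modules.

// Generic sketch: serialize async tasks and enforce a minimum gap between
// them. A reusable stand-in for the inline delay(2000) above.
function createRateLimiter(minGapMs) {
  let chain = Promise.resolve();
  return function schedule(task) {
    const run = chain.then(async () => {
      const result = await task();
      await new Promise(resolve => setTimeout(resolve, minGapMs));
      return result;
    });
    chain = run.catch(() => {}); // keep the chain alive if a task fails
    return run;
  };
}

// Usage inside the citation loop:
// const limited = createRateLimiter(2000);
// const results = await limited(() =>
//   hunter.searchMultiple(searchQuery, ['pubmed', 'arxiv'], 1));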
Example 3: Staying Current with New Publications
Monitor databases for new papers matching research interests with automated notifications.
Create Research Monitor Class
Build the monitoring system that tracks publication dates and filters new papers.
// example-3-stay-current.js
import fs from 'fs/promises';
import PDFHunter from './pdf-hunter.js';
import DownloadManager from './download-manager.js';
class ResearchMonitor {
constructor(configPath = './monitor-config.json') {
this.configPath = configPath;
this.hunter = new PDFHunter();
this.downloader = new DownloadManager({
downloadDir: './new-papers'
});
}
async loadConfig() {
const data = await fs.readFile(this.configPath, 'utf-8');
return JSON.parse(data);
}
async saveLastCheck(timestamp) {
const config = await this.loadConfig();
config.lastCheck = timestamp;
await fs.writeFile(this.configPath, JSON.stringify(config, null, 2));
}
async checkForNewPapers() {
const config = await this.loadConfig();
const currentDate = new Date();
const lastCheck = config.lastCheck ? new Date(config.lastCheck) : null;
console.log(`Checking for new papers since ${lastCheck?.toISOString() || 'never'}...\n`);
const allNewPapers = [];
for (const query of config.queries) {
console.log(`Searching: "${query.topic}"`);
const results = await this.hunter.searchMultiple(
query.topic,
query.databases,
50
);
// Filter papers published since last check
const newPapers = lastCheck
? results.papers.filter(p => {
const pubDate = this.parsePublicationDate(p);
return pubDate && pubDate > lastCheck;
})
: results.papers;
console.log(` Found ${newPapers.length} new papers`);
allNewPapers.push(...newPapers.map(p => ({ ...p, query: query.topic })));
}
if (allNewPapers.length > 0) {
// Download new papers
console.log(`\nDownloading ${allNewPapers.length} new papers...`);
const downloadResults = await this.downloader.downloadBatch(allNewPapers);
// Generate notification
await this.generateNotification(allNewPapers, downloadResults);
}
await this.saveLastCheck(currentDate.toISOString());
return {
newPapersFound: allNewPapers.length,
papers: allNewPapers
};
}
parsePublicationDate(paper) {
if (paper.published) {
return new Date(paper.published);
}
if (paper.year) {
return new Date(paper.year, 0, 1);
}
return null;
}
async generateNotification(papers, downloadResults) {
let notification = `# New Research Papers - ${new Date().toLocaleDateString()}\n\n`;
notification += `Found ${papers.length} new papers matching your interests.\n\n`;
// Group by query topic
const byTopic = {};
papers.forEach(p => {
if (!byTopic[p.query]) byTopic[p.query] = [];
byTopic[p.query].push(p);
});
for (const [topic, topicPapers] of Object.entries(byTopic)) {
notification += `## ${topic}\n\n`;
notification += `${topicPapers.length} new papers\n\n`;
for (const paper of topicPapers) {
notification += `### ${paper.title}\n`;
notification += `**Authors**: ${formatAuthors(paper.authors)}\n`;
notification += `**Year**: ${paper.year}\n`;
notification += `**Source**: ${paper.source}\n`;
if (paper.doi) notification += `**DOI**: ${paper.doi}\n`;
notification += `\n`;
}
}
notification += `\n---\n`;
notification += `**Download Summary**:\n`;
notification += `- Completed: ${downloadResults.stats.completed}\n`;
notification += `- Failed: ${downloadResults.stats.failed}\n`;
notification += `- Skipped: ${downloadResults.stats.skipped}\n`;
await fs.writeFile('./new-papers/NOTIFICATION.md', notification);
console.log('\nNotification saved to new-papers/NOTIFICATION.md');
}
}
function formatAuthors(authors) {
if (!authors || authors.length === 0) return 'Unknown';
if (!Array.isArray(authors)) return authors;
if (authors.length <= 3) {
return authors.join(', ');
} else {
return `${authors.slice(0, 3).join(', ')} et al.`;
}
}

Configure Monitoring Queries
Create the configuration file defining research topics to monitor.
{
"queries": [
{
"topic": "reinforcement learning robotics",
"databases": ["arxiv", "ieee"]
},
{
"topic": "neural architecture search",
"databases": ["arxiv", "pubmed"]
}
],
"lastCheck": null
}

Save this as monitor-config.json in the project directory.
Run Automated Monitoring
Execute the monitor to check for new publications and generate notifications.
// Run monitor
const monitor = new ResearchMonitor();
monitor.checkForNewPapers()
.then(result => {
console.log(`\n=== Monitoring Complete ===`);
console.log(`New papers found: ${result.newPapersFound}`);
})
.catch(console.error);

Expected Output:
Checking for new papers since 2025-01-01T00:00:00.000Z...
Searching: "reinforcement learning robotics"
Found 12 new papers
Searching: "neural architecture search"
Found 8 new papers
Downloading 20 new papers...
[Progress] 20/20 complete
Notification saved to new-papers/NOTIFICATION.md
=== Monitoring Complete ===
New papers found: 20

Automation Tips: Schedule the monitor to run daily or weekly; a cron entry such as 0 9 * * 1 node example-3-stay-current.js runs it every Monday at 9 AM. The persisted lastCheck timestamp prevents re-downloading papers you have already seen. For automatic alerts, pipe NOTIFICATION.md into your email service.
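If you prefer to keep scheduling inside Node rather than the system crontab, an in-process scheduler is an alternative. This sketch assumes the node-cron package (npm install node-cron) and assumes ResearchMonitor is exported from the monitor module; both are assumptions, not part of the examples above.

// Sketch: in-process scheduling with node-cron, matching the Monday
// 9 AM cron expression suggested above. Assumes ResearchMonitor is
// exported from example-3-stay-current.js.
import cron from 'node-cron';
import { ResearchMonitor } from './example-3-stay-current.js';

cron.schedule('0 9 * * 1', async () => {
  const result = await new ResearchMonitor().checkForNewPapers();
  console.log(`Scheduled check: ${result.newPapersFound} new papers`);
});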
Integration with Note-Taking Tools
These workflows integrate seamlessly with knowledge management systems:
Obsidian Integration: Save extracted text and citations as markdown files in your vault. Use frontmatter metadata for automatic linking and graph visualization (a minimal note-writer sketch follows this list).
Zotero Sync: Export paper metadata and PDFs to Zotero collections for citation management. Use BibTeX export for LaTeX manuscript integration.
Notion Database: Push paper metadata to Notion databases using the API. Create linked databases for topics, authors, and citation networks.
Roam Research: Import papers as pages with citation links. Use block references to connect ideas across papers.
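As a concrete illustration of the Obsidian pattern, here is a minimal note-writer sketch. The frontmatter keys (title, authors, year, doi, tags) are assumptions chosen to work with Obsidian's properties view, not a fixed schema; adjust them to your vault's conventions.

// Sketch: write one Obsidian-ready markdown note per paper. Frontmatter
// keys are illustrative, not a fixed schema.
import fs from 'fs/promises';
import path from 'path';

async function writeObsidianNote(vaultDir, paper, extractedText = '') {
  const safeTitle = paper.title.replace(/[\\/:*?"<>|]/g, '-');
  const frontmatter = [
    '---',
    `title: "${paper.title.replace(/"/g, "'")}"`,
    `authors: [${(paper.authors || []).map(a => `"${a}"`).join(', ')}]`,
    `year: ${paper.year ?? 'unknown'}`,
    paper.doi ? `doi: "${paper.doi}"` : null,
    'tags: [paper]',
    '---'
  ].filter(Boolean).join('\n');
  const body = `\n# ${paper.title}\n\n${extractedText}\n`;
  await fs.writeFile(path.join(vaultDir, `${safeTitle}.md`), frontmatter + body);
}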
Systematic Review Automation
Combine all three workflows for comprehensive systematic reviews:
Phase 1: Run Example 1 (literature review) with broad search terms to discover initial paper set (100-200 papers).
Phase 2: Execute Example 2 (citation chains) on top 20 most-cited papers from Phase 1 to expand the dataset through citation networks.
Phase 3: Deploy Example 3 (monitoring) to track new publications matching review criteria during the writing period.
Phase 4: Use the MCP server's search functionality to answer specific research questions across the entire corpus with semantic search.
This four-phase approach produces systematic reviews with 300-500 papers in 4-6 hours instead of 4-6 weeks using manual methods.
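Sketched as code, the phases chain together roughly as follows. conductLiteratureReview, followCitationChain, and ResearchMonitor come from Examples 1-3 (Example 1's script is assumed here to return its pipeline result rather than only logging it), while selectTopCited is a hypothetical helper you would write over the extracted citation counts.

// Sketch of the four-phase chain; selectTopCited is hypothetical.
async function runSystematicReview() {
  // Phase 1: broad discovery (Example 1, assumed to return its result)
  const review = await conductLiteratureReview();

  // Phase 2: expand through citation networks from the top 20 papers
  for (const seed of selectTopCited(review.papers, 20)) {
    await followCitationChain(seed, 2);
  }

  // Phase 3: monitor for new publications during the writing period
  await new ResearchMonitor().checkForNewPapers();

  // Phase 4: answer specific questions via the MCP server's search tools
}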
Knowledge Base Building
Transform downloaded papers into a queryable knowledge base:
Text Extraction: Extract full text from all PDFs using the extraction pipeline. Store in structured JSON format with metadata.
Embedding Generation: Generate text embeddings for semantic search using OpenAI embeddings API. Store in vector database (Pinecone, Weaviate, or Qdrant).
Citation Network: Build graph database of citation relationships using Neo4j or NetworkX. Enable network analysis and recommendation algorithms.
Semantic Search: Query the knowledge base using natural language questions. Retrieve relevant paper sections with citation tracking.
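To ground the embedding and search steps, here is a minimal sketch using the OpenAI SDK, with a plain in-memory index standing in for Pinecone, Weaviate, or Qdrant. The model name, chunking, and flat-array index are assumptions for illustration, not a production design.

// Sketch: embed paper chunks with the OpenAI SDK and answer questions by
// cosine similarity. An array stands in for a real vector database; the
// model name is illustrative.
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const index = []; // entries: { paperId, chunk, embedding }

async function embed(text) {
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text
  });
  return res.data[0].embedding;
}

async function addPaper(paperId, chunks) {
  for (const chunk of chunks) {
    index.push({ paperId, chunk, embedding: await embed(chunk) });
  }
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function semanticSearch(question, topK = 5) {
  const q = await embed(question);
  return index
    .map(entry => ({ ...entry, score: cosine(q, entry.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}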
This workflow creates a personal research assistant that answers questions using only papers you've verified and downloaded, eliminating hallucination risks from general-purpose AI models.