End-to-End Example: Automated Literature Review
Complete 8-step literature review walkthrough from search to final document (2 hours vs 40 hours)
Execute a complete literature review from scratch. This is the integration showcase: every component working together.
Goal: Comprehensive literature review on "transformer models in NLP"
Timeline: 2 hours (vs. 40 hours manual)
Expected Output: 3,000-word literature review with 50+ citations
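The step snippets below share a few data shapes that are never spelled out. These interfaces are illustrative assumptions rather than a published schema; field names such as `relevance_score` and `pdf_url` simply mirror how the later steps use them:

```typescript
// Illustrative shapes assumed by the step snippets (not a published schema).
interface Paper {
  id: string;
  doi?: string;
  title: string;
  abstract: string;
  authors: string[];
  year: number;
  pdf_url: string;
  relevance_score?: number; // added by the Step 3 relevance filter
}

interface DownloadedPaper {
  paper: Paper;
  filename: string;
  save_path: string;
}
```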
Step 1: Generate Targeted Search Queries (10 minutes)
Use Gemini's brainstorming capability to create diverse search queries covering different aspects of the topic.
// step1-generate-queries.ts
async function generateSearchQueries(topic: string) {
const queries = await mcp.invoke('brainstorm', {
prompt: `Generate 10 diverse academic search queries for: ${topic}
Vary by:
- Specificity (broad to narrow)
- Sub-domains (e.g., efficiency, applications, theory)
- Methodological approaches
- Time periods (foundational vs. recent)
Return queries optimized for academic database search.`,
domain: 'research',
methodology: 'divergent',
ideaCount: 10,
includeAnalysis: true
});
console.log('Generated search queries:');
queries.ideas.forEach((q, i) => {
console.log(`${i + 1}. ${q.text}`);
console.log(` Rationale: ${q.rationale}\n`);
});
return queries.ideas.map(q => q.text);
}
const searchQueries = await generateSearchQueries('transformer models in NLP');
Expected Output:
1. "transformer models natural language processing"
Rationale: Broad foundational query
2. "attention mechanisms BERT GPT architecture"
Rationale: Specific model architectures
3. "efficient transformers linear attention"
Rationale: Focus on efficiency improvements
4. "transformer applications machine translation summarization"
Rationale: Application-specific research
5. "transformer training optimization techniques"
Rationale: Methodological focus
... (5 more)
Step 2: Search Multiple Databases (30 minutes)
Execute parallel searches across arXiv, PubMed, IEEE, ACM, and JSTOR with rate limiting.
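The rate-limited loop below calls a `delay` helper that is not shown anywhere in the walkthrough; a minimal sketch:

```typescript
// Resolve after the given number of milliseconds (used between query batches
// for rate limiting).
function delay(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}
```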
// step2-multi-database-search.ts
async function searchAllDatabases(queries: string[]) {
const databases = [
{ name: 'arxiv', category: 'cs.CL' },
{ name: 'pubmed', category: null },
{ name: 'ieee', category: 'Computer Science' },
{ name: 'acm', category: 'Computing methodologies' },
{ name: 'jstor', category: 'Computer Science' }
];
const allResults = [];
for (const query of queries) {
console.log(`\nSearching for: "${query}"`);
// Parallel search across databases
const dbResults = await Promise.all(
databases.map(async (db) => {
try {
const results = await mcp.invoke('search_database', {
database: db.name,
query: query,
category: db.category,
year_from: 2017, // Transformers paper published 2017
max_results: 50
});
console.log(` ${db.name}: ${results.length} papers found`);
return results;
} catch (error) {
console.error(` ${db.name}: Search failed - ${error.message}`);
return [];
}
})
);
allResults.push(...dbResults.flat());
// Rate limiting: wait between queries
await delay(2000);
}
console.log(`\nTotal papers found: ${allResults.length}`);
return allResults;
}
const rawResults = await searchAllDatabases(searchQueries);
Expected Output:
Searching for: "transformer models natural language processing"
arxiv: 127 papers found
pubmed: 34 papers found
ieee: 89 papers found
acm: 56 papers found
jstor: 23 papers found
Searching for: "attention mechanisms BERT GPT architecture"
arxiv: 98 papers found
pubmed: 12 papers found
ieee: 67 papers found
acm: 45 papers found
jstor: 8 papers found
... (8 more queries)
Total papers found: 1,247 papers
Step 3: Deduplication and Filtering (15 minutes)
Remove duplicates using DOI matching and fuzzy title matching, then filter by relevance using Gemini.
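The code below leans on a `detectFuzzyDuplicates` helper that is assumed rather than defined. One plausible sketch normalizes titles (lowercase, strip punctuation, collapse whitespace) and flags later copies of an already-seen title:

```typescript
// Hypothetical fuzzy-duplicate detector: two papers count as duplicates
// when their normalized titles are identical.
function normalizeTitle(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

async function detectFuzzyDuplicates(
  papers: { title: string }[]
): Promise<Set<string>> {
  const seen = new Set<string>();
  const duplicates = new Set<string>(); // original titles of later copies
  for (const p of papers) {
    const key = normalizeTitle(p.title);
    if (seen.has(key)) duplicates.add(p.title);
    else seen.add(key);
  }
  return duplicates;
}
```

A real implementation might use edit distance or token-set similarity instead of exact normalized matching, at the cost of an O(n²) comparison pass.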
// step3-deduplicate-filter.ts
async function deduplicateAndFilter(papers: Paper[]) {
// Step 3a: Remove exact duplicates by DOI
const uniqueByDOI = new Map();
papers.forEach(p => {
if (p.doi && !uniqueByDOI.has(p.doi)) {
uniqueByDOI.set(p.doi, p);
}
});
console.log(`After DOI deduplication: ${uniqueByDOI.size} papers`);
// Step 3b: Fuzzy matching for papers without DOI
const withoutDOI = papers.filter(p => !p.doi);
const fuzzyDuplicates = await detectFuzzyDuplicates(withoutDOI);
// Step 3c: Merge deduplicated sets
const uniquePapers = [
...Array.from(uniqueByDOI.values()),
...withoutDOI.filter(p => !fuzzyDuplicates.has(p.title))
];
console.log(`After fuzzy deduplication: ${uniquePapers.length} papers`);
// Step 3d: AI-powered relevance filtering
console.log('Filtering by relevance with Gemini...');
const filtered = await mcp.invoke('ask-gemini', {
prompt: `Filter these papers by relevance to "transformer models in NLP".
Criteria:
- Focus on transformer architecture, attention mechanisms, or major models (BERT, GPT, T5, etc.)
- Exclude: purely medical NLP without model innovation, non-transformer models
- Include: theoretical papers, efficiency improvements, novel applications
Return papers with relevance score >= 7/10.
Papers (titles and abstracts):
${JSON.stringify(uniquePapers.map(p => ({
id: p.id,
title: p.title,
abstract: p.abstract,
year: p.year
})))}`,
model: 'gemini-2.5-pro'
});
console.log(`After relevance filtering: ${filtered.papers.length} papers`);
return filtered.papers;
}
const filteredPapers = await deduplicateAndFilter(rawResults);
Expected Output:
After DOI deduplication: 892 papers
After fuzzy deduplication: 867 papers
Filtering by relevance with Gemini...
After relevance filtering: 287 papers (relevance >= 7/10)
Step 4: Download PDFs (40 minutes)
Download top 100 papers sorted by relevance and recency with concurrent downloads and rate limiting.
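The download code uses a `sanitizeFilename` helper that is not defined in the walkthrough; a minimal sketch that replaces path separators and other filesystem-unsafe characters with underscores:

```typescript
// Hypothetical helper: replace path separators, reserved characters, and
// whitespace runs with a single underscore, then trim stray underscores.
function sanitizeFilename(name: string): string {
  return name
    .replace(/[\/\\:*?"<>|\s]+/g, '_')
    .replace(/^_+|_+$/g, '');
}
```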
// step4-download-pdfs.ts
async function downloadRelevantPDFs(papers: Paper[]) {
const downloadDir = 'projects/transformer-nlp-review/papers';
await fs.mkdir(downloadDir, { recursive: true });
// Sort by relevance and year (prioritize recent, highly relevant)
const sorted = papers.sort((a, b) => {
if (b.relevance_score !== a.relevance_score) {
return b.relevance_score - a.relevance_score;
}
return b.year - a.year;
});
// Download top 100 papers (or fewer if unavailable)
const targetCount = Math.min(100, sorted.length);
console.log(`Attempting to download ${targetCount} PDFs...`);
const downloaded = [];
const failed = [];
// Concurrent downloads with rate limiting
const concurrency = 3;
for (let i = 0; i < sorted.length && downloaded.length < targetCount; i += concurrency) {
const batch = sorted.slice(i, i + concurrency);
const results = await Promise.allSettled(
batch.map(async (paper) => {
try {
const filename = sanitizeFilename(
`${paper.authors[0]?.split(' ').pop()}_${paper.year}_${paper.title}.pdf`
).slice(0, 150); // Limit filename length
await mcp.invoke('download_pdf', {
url: paper.pdf_url,
save_path: `${downloadDir}/${filename}`,
timeout: 120000
});
return { paper, filename, status: 'success' };
} catch (error) {
return { paper, error: error.message, status: 'failed' };
}
})
);
results.forEach(result => {
if (result.status === 'fulfilled' && result.value.status === 'success') {
downloaded.push(result.value);
console.log(`✓ Downloaded: ${result.value.filename}`);
} else {
failed.push(result.status === 'fulfilled' ? result.value : { error: 'Unknown error' });
console.log(`✗ Failed: ${result.value?.paper?.title || 'Unknown'}`);
}
});
// Rate limiting delay
if (i + concurrency < sorted.length) {
await delay(5000); // 5 seconds between batches
}
}
console.log(`\nDownload summary:`);
console.log(` Success: ${downloaded.length} PDFs`);
console.log(` Failed: ${failed.length} papers`);
// Save metadata
await fs.writeFile(
`${downloadDir}/download-metadata.json`,
JSON.stringify({ downloaded, failed }, null, 2)
);
return downloaded;
}
const downloadedPDFs = await downloadRelevantPDFs(filteredPapers);
Expected Output:
Attempting to download 100 PDFs...
✓ Downloaded: Vaswani_2017_Attention_Is_All_You_Need.pdf
✓ Downloaded: Devlin_2019_BERT_Pretraining_Deep_Bidirectional.pdf
✓ Downloaded: Brown_2020_Language_Models_Few_Shot_Learners.pdf
✗ Failed: Some_Paper_Title (403 Forbidden)
✓ Downloaded: Raffel_2020_Exploring_Limits_Transfer_Learning.pdf
... (96 more)
Download summary:
Success: 78 PDFs
Failed: 22 papers
Step 5: Extract Full-Text and Citations (20 minutes)
Extract full text, citations, sections, figures, and tables from all downloaded PDFs.
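The extraction code below fires all PDFs at once with `Promise.all`. If the extraction tool rate-limits or is memory-hungry, a small concurrency-bounded mapper (an optional variation, not part of the original workflow) keeps only a few extractions in flight while preserving result order:

```typescript
// Optional variation: run `fn` over `items` with at most `limit` promises
// in flight at any time, preserving result order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index synchronously
      results[i] = await fn(items[i], i);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker())
  );
  return results;
}
```

Swapping `Promise.all(downloads.map(...))` for `mapWithConcurrency(downloads, 5, ...)` would bound the load without changing the results.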
// step5-extract-content.ts
async function extractContentFromPDFs(downloads: DownloadedPaper[]) {
console.log(`Extracting content from ${downloads.length} PDFs...`);
const extracted = await Promise.all(
downloads.map(async (download, index) => {
try {
console.log(`[${index + 1}/${downloads.length}] Extracting: ${download.filename}`);
const content = await mcp.invoke('extract_pdf_content', {
pdf_path: download.save_path,
extract_citations: true,
extract_sections: true,
extract_figures: true,
extract_tables: true
});
return {
paper: download.paper,
filename: download.filename,
...content,
extraction_status: 'success'
};
} catch (error) {
console.error(` Failed to extract: ${error.message}`);
return {
paper: download.paper,
filename: download.filename,
extraction_status: 'failed',
error: error.message
};
}
})
);
const successful = extracted.filter(e => e.extraction_status === 'success');
console.log(`\nExtraction complete: ${successful.length}/${downloads.length} successful`);
// Save extracted content
await fs.writeFile(
'projects/transformer-nlp-review/extracted-content.json',
JSON.stringify(extracted, null, 2)
);
return successful;
}
const extractedContent = await extractContentFromPDFs(downloadedPDFs);
Expected Output:
Extracting content from 78 PDFs...
[1/78] Extracting: Vaswani_2017_Attention_Is_All_You_Need.pdf
✓ Full text: 41,237 characters
✓ Citations: 37 references
✓ Sections: Abstract, Introduction, Background, Model Architecture, ...
✓ Figures: 4 figures extracted
[2/78] Extracting: Devlin_2019_BERT_Pretraining_Deep_Bidirectional.pdf
✓ Full text: 38,914 characters
✓ Citations: 42 references
...
Extraction complete: 76/78 successful
Step 6: Thematic Analysis and Categorization (15 minutes)
Use Gemini to identify major research themes, methodological patterns, key innovations, and research gaps.
// step6-thematic-analysis.ts
async function performThematicAnalysis(papers: ExtractedPaper[]) {
console.log('Performing thematic analysis with Gemini...');
// Prepare paper summaries for analysis
const paperSummaries = papers.map(p => ({
title: p.paper.title,
authors: p.paper.authors,
year: p.paper.year,
abstract: p.paper.abstract,
key_sections: p.sections?.map(s => ({ title: s.title, preview: s.content.slice(0, 500) })),
methodology: p.methodology_summary,
findings: p.key_findings
}));
const thematicAnalysis = await mcp.invoke('ask-gemini', {
prompt: `Conduct a comprehensive thematic analysis of these ${papers.length} papers on transformer models in NLP.
Your analysis should identify:
1. **Major Research Themes** (5-7 themes)
- Group papers by research focus
- Identify theme evolution over time
2. **Methodological Approaches**
- Categorize by experimental design
- Identify common datasets and benchmarks
3. **Key Innovations**
- Groundbreaking papers that shifted the field
- Novel techniques and their impact
4. **Research Gaps**
- Under-explored areas
- Methodological limitations
- Future research directions
5. **Performance Trends**
- How metrics evolved (accuracy, efficiency, etc.)
- Current state-of-the-art
6. **Application Domains**
- Where transformers have been successfully applied
- Domain-specific innovations
Papers:
${JSON.stringify(paperSummaries, null, 2)}
Provide detailed analysis with specific paper citations for each point.`,
model: 'gemini-2.5-pro'
});
// Save analysis
await fs.writeFile(
'projects/transformer-nlp-review/thematic-analysis.json',
JSON.stringify(thematicAnalysis, null, 2)
);
console.log('Thematic analysis complete. Themes identified:');
thematicAnalysis.themes.forEach((theme, i) => {
console.log(`${i + 1}. ${theme.name} (${theme.paper_count} papers)`);
});
return thematicAnalysis;
}
const themes = await performThematicAnalysis(extractedContent);Expected Output:
Performing thematic analysis with Gemini...
Thematic analysis complete. Themes identified:
1. Foundational Transformer Architectures (12 papers)
2. Pre-training and Transfer Learning (18 papers)
3. Attention Mechanism Efficiency (15 papers)
4. Model Scaling and Large Language Models (14 papers)
5. Domain-Specific Applications (11 papers)
6. Multilingual and Cross-lingual Models (9 papers)
7. Interpretability and Analysis (7 papers)
Step 7: Synthesize Findings with Citations (20 minutes)
Generate comprehensive literature review synthesis with proper academic citations.
// step7-synthesize-findings.ts
async function synthesizeFindings(papers: ExtractedPaper[], themes: ThematicAnalysis) {
console.log('Synthesizing literature review with Gemini...');
const synthesis = await mcp.invoke('ask-gemini', {
prompt: `Write a comprehensive literature review synthesis on transformer models in NLP.
Use this structure:
1. **Introduction** (300 words)
- Brief history of transformers in NLP
- Significance of the architecture
- Scope of this review
2. **Thematic Analysis** (1,500 words)
For each theme from the analysis:
- Overview of the theme
- Key papers and their contributions (cite specifically)
- Evolution of approaches within the theme
- Current state and trends
3. **Cross-Cutting Findings** (500 words)
- Common methodological patterns
- Convergent findings across themes
- Contradictions and debates
4. **Research Gaps and Future Directions** (400 words)
- Under-explored areas
- Methodological limitations
- Promising future directions
5. **Conclusion** (300 words)
- Summary of key insights
- Impact on the field
- Outlook
CITATION REQUIREMENTS:
- Use in-text citations in format: (Author, Year)
- Cite specific papers from the provided set
- Ensure every major claim is supported by citation
- Include paper titles when first introducing key works
Available papers with full details:
${JSON.stringify(papers.map(p => ({
title: p.paper.title,
authors: p.paper.authors,
year: p.paper.year,
abstract: p.paper.abstract,
key_findings: p.key_findings,
methodology: p.methodology_summary
})), null, 2)}
Themes and groupings:
${JSON.stringify(themes, null, 2)}
Write in academic style, suitable for a journal literature review.`,
model: 'gemini-2.5-pro'
});
return synthesis.review_text;
}
const reviewText = await synthesizeFindings(extractedContent, themes);
Step 8: Generate Literature Review Draft (15 minutes)
Compile final document with bibliography, appendices, and metadata.
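The compilation code relies on `extractCitations` and `countWords`, neither of which is shown. Minimal sketches, assuming in-text citations follow the `(Surname, Year)` format requested in Step 7:

```typescript
// Hypothetical sketches of the two helpers used in Step 8.
function countWords(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Keep papers whose "(Surname, Year)" citation appears in the synthesis,
// using the first author's surname.
function extractCitations<P extends { paper: { authors: string[]; year: number } }>(
  synthesis: string,
  papers: P[]
): P[] {
  return papers.filter(p => {
    const surname = p.paper.authors[0]?.split(' ').pop();
    if (!surname) return false;
    return synthesis.includes(`(${surname}, ${p.paper.year})`);
  });
}
```

A production version would also need to handle "et al." forms and multi-citation parentheses, which this exact-match sketch ignores.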
// step8-generate-final-review.ts
async function generateFinalReview(synthesis: string, papers: ExtractedPaper[]) {
console.log('Generating final literature review document...');
// Extract all cited papers from the synthesis
const citedPapers = extractCitations(synthesis, papers);
// Generate bibliography
const bibliography = await mcp.invoke('generate_bibliography', {
papers: citedPapers,
style: 'apa',
sort_by: 'author'
});
// Compile final document
const finalReview = `
# Transformer Models in Natural Language Processing: A Comprehensive Literature Review
**Generated**: ${new Date().toLocaleDateString()}
**Papers Reviewed**: ${papers.length}
**Databases Searched**: arXiv, PubMed, IEEE Xplore, ACM Digital Library, JSTOR
**Time Period**: 2017-2025
---
${synthesis}
---
## References
${bibliography}
---
## Appendix A: Thematic Distribution
${generateThematicChart(papers)}
## Appendix B: Chronological Distribution
${generateYearChart(papers)}
## Appendix C: Database Sources
${generateSourceChart(papers)}
`;
// Save final review
const outputPath = 'projects/transformer-nlp-review/literature-review.md';
await fs.writeFile(outputPath, finalReview);
console.log(`\n✓ Literature review generated: ${outputPath}`);
console.log(` Word count: ${countWords(finalReview)} words`);
console.log(` Citations: ${citedPapers.length} papers`);
return outputPath;
}
const finalPath = await generateFinalReview(reviewText, extractedContent);
Expected Output:
Generating final literature review document...
✓ Literature review generated: projects/transformer-nlp-review/literature-review.md
Word count: 3,247 words
Citations: 52 papers
File saved to: /Users/you/research-workspace/projects/transformer-nlp-review/literature-review.md
Complete Workflow Summary
Total Time: 2 hours 5 minutes
Breakdown:
- Step 1: Generate search queries (10 min)
- Step 2: Search databases (30 min)
- Step 3: Deduplicate and filter (15 min)
- Step 4: Download PDFs (40 min)
- Step 5: Extract content (20 min)
- Step 6: Thematic analysis (15 min)
- Step 7: Synthesize findings (20 min)
- Step 8: Generate final review (15 min)
Results:
- Papers searched: 1,247 across 5 databases
- Papers filtered: 287 highly relevant
- PDFs downloaded: 78 full-text papers
- Content extracted: 76 papers successfully processed
- Final output: 3,247-word literature review with 52 citations
Productivity Comparison
The manual approach requires 40+ hours spread over 2-3 weeks. The automated approach completed in 2 hours 5 minutes with higher quality: more papers reviewed, better organization, and complete reproducibility. This represents a 19x productivity gain.
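The 19x figure follows directly from the two totals:

```typescript
// 40 hours manual vs. 2 hours 5 minutes automated.
const manualHours = 40;
const automatedHours = 2 + 5 / 60; // ≈ 2.083
const speedup = manualHours / automatedHours;
console.log(speedup.toFixed(1)); // prints "19.2"
```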