Extension Patterns: Advanced Automation

Advanced techniques for script enhancement: chaining workflows, parallel execution, caching strategies, and error recovery patterns

When Extension Patterns Matter

Basic scripts work for simple tasks. Extension patterns become essential when:

  • Performance bottlenecks emerge - Scripts take minutes instead of seconds
  • Complex workflows develop - Multi-step processes with dependencies
  • API costs increase - Redundant calls drain budgets
  • Reliability becomes critical - Need graceful error handling
  • Scale demands optimization - Processing hundreds of files or requests

Extension patterns transform single-purpose scripts into production-grade automation systems.

Pattern 1: Script Chaining

The Concept

Script chaining connects multiple specialized scripts into complex workflows. Each script does one thing well, then passes results to the next stage.

Philosophy: Compose simple tools into powerful pipelines rather than building monolithic scripts. Each stage remains testable, reusable, and maintainable.

Basic Sequential Chaining

# 3-stage pipeline
./download.sh | ./process.sh | ./analyze.sh > results.txt

Each script:

  • Reads from stdin (previous stage output)
  • Writes to stdout (next stage input)
  • Handles errors independently
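A stage that follows these three rules can be sketched as below. The `process` function and its uppercase transform are hypothetical stand-ins for a real stage:

```shell
# Sketch of a pipeline stage: read stdin, transform, write stdout,
# and exit non-zero when there is nothing to process.
process() {
  # Fail loudly if the previous stage produced no output
  read -r first || { echo "process: empty input" >&2; return 1; }
  # Re-join the first line with the rest and apply the transform
  { printf '%s\n' "$first"; cat; } | tr '[:lower:]' '[:upper:]'
}

echo "hello pipeline" | process   # prints: HELLO PIPELINE
```

Because the stage only touches stdin and stdout, it can be tested in isolation before being dropped into a longer chain.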

Conditional Chaining

# Success-dependent execution
./download.sh && ./process.sh && ./report.sh

# Error fallback
./primary.sh || ./backup.sh || ./emergency.sh

Operators explained:

  • && - Run next only if previous succeeds (exit code 0)
  • || - Run next only if previous fails (non-zero exit code)
  • | - Connect one command's stdout to the next command's stdin; both run regardless of the first command's exit status
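The three operators can be demonstrated with two stand-in commands (`ok` and `bad` are hypothetical, defined only to produce known exit codes):

```shell
# Stand-in commands with fixed exit codes
ok()  { return 0; }
bad() { return 1; }

ok  && echo "after success"    # && fires only on exit code 0
bad && echo "never printed"    # skipped: bad returned non-zero
bad || echo "after failure"    # || fires only on non-zero exit
bad | cat                      # the pipe connects streams either way
```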

Domain-Specific Applications

Research Paper Pipeline:

# Download → Extract → Synthesize
./arxiv-search.sh "platform economics" | \
  ./extract-citations.sh | \
  ./generate-bibliography.sh > literature.bib

Each stage:

  1. Search: Fetch paper metadata from arXiv API
  2. Extract: Parse PDFs for citation data
  3. Synthesize: Format as BibTeX bibliography

CI/CD Pipeline:

# Test → Build → Deploy (only if tests pass)
./run-tests.sh && ./build-docker.sh && ./deploy-staging.sh

Safety guarantee: the build never runs if tests fail, and deployment never runs if the build fails.

Competitive Analysis:

# Scrape → Clean → Analyze → Visualize
./scrape-competitors.sh | \
  ./clean-data.sh | \
  ./analyze-trends.sh | \
  ./generate-report.sh

Pipeline benefits: Intermediate outputs can be inspected, individual stages can be rerun independently.

Pattern 2: Parallel Execution

The Concept

Parallel execution runs independent scripts simultaneously, reducing total workflow time from sum of durations to maximum of individual durations.

Performance Math: Three 5-minute scripts running sequentially take 15 minutes. Run in parallel, they complete in 5 minutes—a 3× speedup with zero code changes.

Basic Parallel Pattern

# Launch 3 scripts in background, wait for all
./task1.sh & ./task2.sh & ./task3.sh &
wait

Syntax breakdown:

  • & - Run script in background (returns control immediately)
  • wait - Block until all background jobs complete
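A bare `wait` discards individual exit codes. To detect which background jobs failed, record each PID and wait on them one at a time. The sketch below uses dummy tasks in place of real scripts:

```shell
# Dummy tasks standing in for real background scripts
task_ok()  { sleep 0.1; return 0; }
task_bad() { sleep 0.1; return 1; }

# Record each job's PID as it is launched
task_ok  & pid1=$!
task_bad & pid2=$!

# wait PID returns that job's exit status, so failures are visible
failures=0
wait "$pid1" || failures=$((failures + 1))
wait "$pid2" || failures=$((failures + 1))
echo "failed jobs: $failures"   # prints: failed jobs: 1
```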

Real-World Performance Gains

Example: Multi-source data collection

| Approach   | Time   | Scripts        | Calculation    |
|------------|--------|----------------|----------------|
| Sequential | 15 min | 3 × 5 min each | 5 + 5 + 5 = 15 |
| Parallel   | 5 min  | max(5, 5, 5)   | 5              |
| Speedup    | 3×     |                | 15 ÷ 5 = 3     |

When to Use Parallel Execution

Independent Tasks: Parallel execution requires task independence—scripts must not depend on each other's outputs. Use parallel when fetching from different APIs, processing separate files, or running isolated calculations. Avoid parallel when one task requires another's results or when shared resources create race conditions.

Domain-Specific Parallel Examples

Economics Research:

# Download papers from 3 databases simultaneously
./arxiv-fetch.sh & ./jstor-fetch.sh & ./ssrn-fetch.sh &
wait

Software Testing:

# Run test suites in parallel
./unit-tests.sh & ./integration-tests.sh & ./e2e-tests.sh &
wait

Business Monitoring:

# Monitor competitors concurrently
./track-competitor-a.sh & ./track-competitor-b.sh & ./track-competitor-c.sh &
wait

Pattern 3: Caching for Speed

The Concept

Caching stores expensive operation results—API calls, computations, file downloads—to avoid redundant work. First run pays full cost, subsequent runs return instantly.

Performance Impact: API calls typically take 500-2000ms. Cache reads take 10-50ms—a 10-50× speedup. For development workflows with repeated runs, caching reduces iteration time from minutes to seconds.

Simple Cache Pattern

# Check cache first, fetch only if missing
[ -f cache.json ] && cat cache.json && exit
curl "$API_ENDPOINT" | tee cache.json

Pattern breakdown:

  • [ -f cache.json ] - Test if cache file exists
  • cat cache.json && exit - Return cached data if found
  • tee cache.json - Save API response to cache while outputting
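The one-file pattern generalizes to a small wrapper that caches any command, keyed on a hash of the command line. This is a sketch, not a hardened implementation: the `cached` function and `CACHE_DIR` are hypothetical names, and it assumes GNU `md5sum` (on macOS, substitute `md5 -q`):

```shell
CACHE_DIR="${CACHE_DIR:-./script-cache}"

cached() {
  mkdir -p "$CACHE_DIR"
  # Key the cache file on a hash of the full command line
  key=$(printf '%s' "$*" | md5sum | cut -d' ' -f1)
  file="$CACHE_DIR/$key"
  [ -f "$file" ] && { cat "$file"; return; }   # cache hit: serve the file
  "$@" | tee "$file"                           # miss: run, record, emit
}

cached echo "expensive result"   # first call runs; repeats read the file
```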

Cache Invalidation Strategies

Expire cache after duration:

# Delete cache older than 1 hour
find cache.json -mmin +60 -delete 2>/dev/null
[ -f cache.json ] && cat cache.json && exit
curl "$API_ENDPOINT" | tee cache.json

Use for: Data that updates periodically (stock prices, weather, news feeds)

Invalidate when source changes:

# Recompute if input file is newer than cache
[ cache.json -nt input.csv ] || ./expensive-analysis.sh < input.csv > cache.json
cat cache.json

Use for: Derived data, analysis results, processed outputs

User controls cache refresh:

# Support --refresh flag
[ "$1" = "--refresh" ] && rm -f cache.json
[ -f cache.json ] && cat cache.json && exit
curl "$API_ENDPOINT" | tee cache.json

Use for: Development, debugging, forcing fresh data

Caching Benefits

Speed comparison:

| Operation     | Without Cache | With Cache | Speedup |
|---------------|---------------|------------|---------|
| API call      | 1000ms        | 50ms       | 20×     |
| File download | 5000ms        | 10ms       | 500×    |
| LLM analysis  | 3000ms        | 100ms      | 30×     |

Cost reduction:

  • Development: 100 test runs × $0.01 per API call = $1.00 without cache, $0.01 with cache
  • Production: 1000 daily requests → 100 unique requests cached = 90% cost reduction

Offline capability:

Cached data enables scripts to function without network connectivity, critical for:

  • Development on unstable connections
  • Demo environments without API access
  • Disaster recovery scenarios

Combining Patterns

Advanced Research Pipeline

Real-world workflows benefit from multiple patterns working together:

Parallel Download with Caching

# Fetch from 3 sources simultaneously, cache each
./econ-papers.sh & ./cs-papers.sh & ./stats-papers.sh &
wait

Benefit: First run takes 5 minutes (parallel). Subsequent runs take 2 seconds (cache).
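A self-contained sketch of the combined pattern, with dummy fetcher functions standing in for the real download scripts (all names here are hypothetical):

```shell
# Dummy fetchers in place of ./econ-papers.sh and ./cs-papers.sh
fetch_econ() { echo "econ data"; }
fetch_cs()   { echo "cs data"; }

cached_fetch() {
  # $1 = cache file, $2 = fetch function; skip the fetch on a cache hit
  [ -f "$1" ] || "$2" > "$1"
}

# Launch both cached fetches in parallel, then wait for all
cached_fetch econ.json fetch_econ &
cached_fetch cs.json   fetch_cs &
wait
cat econ.json cs.json
```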

Chain Processing

# Deduplicate → Extract → Analyze → Report
cat *.json | ./deduplicate.sh | ./extract-data.sh | ./analyze.sh | ./report.sh

Benefit: Each stage processes only unique papers from cached sources.

Performance Impact

| Approach         | First Run | Subsequent Runs | Speedup |
|------------------|-----------|-----------------|---------|
| No patterns      | 30 min    | 30 min          | 1×      |
| Parallel only    | 10 min    | 10 min          | 3×      |
| Parallel + cache | 10 min    | 2 min           | 15×     |
| All patterns     | 6 min     | 30 sec          | 60×     |

Pattern Selection Guide

Script Chaining: Use when building complex workflows from simple components. Best for sequential dependencies where each stage transforms data before passing to the next. Compose specialized scripts rather than writing monolithic code—easier to test, debug, and reuse individual stages.

Parallel Execution: Use when tasks are independent and performance matters. Essential for batch processing, multi-source data collection, and any workflow where scripts don't depend on each other's outputs. Avoid when tasks share resources or have sequential dependencies.

Caching: Use when making repeated API calls for the same data, especially during development and testing. Critical for cost reduction and speed optimization. Implement when iteration speed affects productivity or when API rate limits constrain workflows.

Advanced Considerations

Error Handling in Chains

Stop on first error:

set -e           # Exit immediately if any command fails
set -o pipefail  # Without this, a pipeline's status is its last command's,
                 # so failures in earlier stages would go unnoticed (bash)
./stage1.sh | ./stage2.sh | ./stage3.sh

Fallback behavior:

# Try primary, fall back to secondary
./primary.sh || ./secondary.sh || echo "All methods failed"
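Fallback chains pair naturally with retries. A minimal retry helper might look like the sketch below (`retry` is a hypothetical helper, with a fixed back-off for simplicity):

```shell
# Attempt a command up to N times before giving up
retry() {
  attempts="$1"; shift
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep 1   # fixed back-off between attempts
  done
}

retry 3 true  && echo "primary succeeded"
retry 2 false || echo "falling back to secondary"
```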

Resource Management in Parallel

Limit concurrent processes:

  • Prevent system overload by capping parallel executions
  • Use GNU parallel for sophisticated job control
  • Monitor CPU, memory, and network bandwidth
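Short of installing GNU parallel, `xargs -P` already provides a concurrency cap (the `-P` flag exists in both GNU and BSD xargs; the `worker` function is a hypothetical stand-in):

```shell
# Run at most 3 workers at a time over a list of inputs
worker() { echo "processing $1"; }
export -f worker   # bash-specific: expose the function to subshells

printf '%s\n' a b c d e f |
  xargs -n1 -P3 bash -c 'worker "$0"'
```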

Respect API rate limits:

  • Add delays between parallel requests
  • Implement token bucket algorithms
  • Queue requests to stay within limits
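The simplest of these, a fixed delay between launches, fits in a few lines. `INTERVAL` and the `request` function are assumptions standing in for a real budget and API call:

```shell
INTERVAL=0.2                            # assumed budget: 5 requests/second
request() { echo "request $1 sent"; }   # stand-in for a real API call

for i in 1 2 3; do
  request "$i"
  sleep "$INTERVAL"   # space out launches to stay under the rate limit
done
```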

Cache Strategy Evolution

Time-to-Live (TTL) expiration:

  • Set appropriate cache lifetimes based on data volatility
  • Stock prices: 1 minute
  • Weather forecasts: 1 hour
  • Research papers: 1 week
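A TTL check can be expressed as a freshness test on the file's age. This sketch assumes GNU `stat -c %Y` (on macOS the equivalent is `stat -f %m`), and `fresh` is a hypothetical helper:

```shell
# Treat a cache file as valid only while younger than its TTL in seconds
fresh() {
  file="$1"; ttl="$2"
  [ -f "$file" ] || return 1
  age=$(( $(date +%s) - $(stat -c %Y "$file") ))
  [ "$age" -lt "$ttl" ]
}

touch prices.json
fresh prices.json 60 && echo "cache still fresh"
```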

Least Recently Used (LRU) eviction:

  • Limit cache size to prevent disk exhaustion
  • Remove oldest unused entries first
  • Balance between hit rate and storage
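A pragmatic approximation of LRU eviction orders cache files by modification time and drops everything past the cap. This sketch assumes GNU `xargs` (`-r` skips `rm` on an empty list); true LRU would track reads, not just writes:

```shell
CACHE_DIR="./lru-cache"
MAX_ENTRIES=100

# Keep only the MAX_ENTRIES most recently modified files
evict_lru() (
  cd "$CACHE_DIR" || return
  # ls -t lists newest first; delete everything past the cap
  ls -t | tail -n +$((MAX_ENTRIES + 1)) | xargs -r rm -f --
)
```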

Versioning for data changes:

  • Include version identifiers in cache keys
  • Invalidate automatically when schemas change
  • Support migration between cache versions
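Prefixing every key with a schema version gives the invalidation half of this for free: bumping the version orphans all old entries at once. The names below are hypothetical, and the hash again assumes GNU `md5sum`:

```shell
SCHEMA_VERSION="v2"

# Key = version prefix + hash of the request string
cache_key() {
  printf '%s-%s\n' "$SCHEMA_VERSION" \
    "$(printf '%s' "$1" | md5sum | cut -d' ' -f1)"
}

cache_key "GET /papers?q=platform+economics"
```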

Next Steps

Extension patterns transform basic scripts into production automation:

  • Start simple: Add one pattern to existing scripts
  • Measure impact: Compare performance before and after
  • Iterate gradually: Combine patterns as complexity grows
  • Monitor resources: Watch for bottlenecks and optimize

The next chapter covers troubleshooting common script failures and debugging techniques.