Extension Patterns: Advanced Automation
Advanced techniques for script enhancement: chaining workflows, parallel execution, caching strategies, and error recovery patterns
When Extension Patterns Matter
Basic scripts work for simple tasks. Extension patterns become essential when:
- Performance bottlenecks emerge - Scripts take minutes instead of seconds
- Complex workflows develop - Multi-step processes with dependencies
- API costs increase - Redundant calls drain budgets
- Reliability becomes critical - Need graceful error handling
- Scale demands optimization - Processing hundreds of files or requests
Extension patterns transform single-purpose scripts into production-grade automation systems.
Pattern 1: Script Chaining
The Concept
Script chaining connects multiple specialized scripts into complex workflows. Each script does one thing well, then passes results to the next stage.
Philosophy: Compose simple tools into powerful pipelines rather than building monolithic scripts. Each stage remains testable, reusable, and maintainable.
Basic Sequential Chaining
```bash
# 3-stage pipeline
./download.sh | ./process.sh | ./analyze.sh > results.txt
```

Each script:
- Reads from stdin (previous stage output)
- Writes to stdout (next stage input)
- Handles errors independently
Conditional Chaining
```bash
# Success-dependent execution
./download.sh && ./process.sh && ./report.sh

# Error fallback
./primary.sh || ./backup.sh || ./emergency.sh
```

Operators explained:
- `&&` - Run next only if previous succeeds (exit code 0)
- `||` - Run next only if previous fails (non-zero exit code)
- `|` - Always pass output regardless of success
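The operator behavior can be tried directly with the shell builtins `true` and `false` — a minimal, self-contained sketch:

```bash
result1=$(true && echo "yes")           # && fires: previous command succeeded
result2=$(false && echo "yes") || true  # && suppressed: previous command failed
result3=$(false || echo "fallback")     # || fires: previous command failed
echo "$result1 / $result2 / $result3"
```

The `|| true` on the second line keeps the failed assignment from aborting a script running under `set -e`.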
Domain-Specific Applications
Research Paper Pipeline:
```bash
# Download → Extract → Synthesize
./arxiv-search.sh "platform economics" | \
  ./extract-citations.sh | \
  ./generate-bibliography.sh > literature.bib
```

Each stage:
- Search: Fetch paper metadata from arXiv API
- Extract: Parse PDFs for citation data
- Synthesize: Format as BibTeX bibliography
CI/CD Pipeline:
```bash
# Test → Build → Deploy (only if tests pass)
./run-tests.sh && ./build-docker.sh && ./deploy-staging.sh
```

Safety guarantee: the build never runs with failing tests, and deployment never runs after a failed build.
Competitive Analysis:
```bash
# Scrape → Clean → Analyze → Visualize
./scrape-competitors.sh | \
  ./clean-data.sh | \
  ./analyze-trends.sh | \
  ./generate-report.sh
```

Pipeline benefits: Intermediate outputs can be inspected, and individual stages can be rerun independently.
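One way to capture inspectable intermediates without breaking the pipe is `tee`. A runnable sketch with standard tools standing in for the hypothetical scripts:

```bash
# Save each stage's output to a file while still feeding the next stage
printf 'beta\nalpha\nbeta\n' \
  | sort | tee sorted.txt \
  | uniq | tee deduped.txt \
  > /dev/null

# sorted.txt and deduped.txt can now be inspected,
# and later stages rerun from them without repeating earlier work
```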
Pattern 2: Parallel Execution
The Concept
Parallel execution runs independent scripts simultaneously, reducing total workflow time from sum of durations to maximum of individual durations.
Performance Math: Three 5-minute scripts running sequentially take 15 minutes. Run in parallel, they complete in 5 minutes—a 3× speedup with zero code changes.
Basic Parallel Pattern
```bash
# Launch 3 scripts in background, wait for all
./task1.sh & ./task2.sh & ./task3.sh &
wait
```

Syntax breakdown:
- `&` - Run script in background (returns control immediately)
- `wait` - Block until all background jobs complete
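A plain `wait` discards the jobs' exit codes; waiting on each PID individually recovers them. A runnable sketch with a hypothetical `job` function standing in for the scripts:

```bash
job() { sleep 0.1; return "$1"; }   # stand-in for ./task1.sh etc.

job 0 & pid1=$!
job 0 & pid2=$!
job 1 & pid3=$!   # this one fails

# Wait on each PID to collect its exit code
status=0
for pid in "$pid1" "$pid2" "$pid3"; do
  wait "$pid" || status=1
done
echo "overall status: $status"
```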
Real-World Performance Gains
Example: Multi-source data collection
| Approach | Time | Scripts | Calculation |
|---|---|---|---|
| Sequential | 15 min | 3 × 5 min each | 5 + 5 + 5 = 15 |
| Parallel | 5 min | max(5, 5, 5) | 5 |
| Speedup | 3× | — | 15 ÷ 5 = 3 |
When to Use Parallel Execution
Independent Tasks: Parallel execution requires task independence—scripts must not depend on each other's outputs. Use parallel when fetching from different APIs, processing separate files, or running isolated calculations. Avoid parallel when one task requires another's results or when shared resources create race conditions.
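One common way to keep parallel tasks independent is to give each its own output file and merge only after `wait`, so no two writers ever touch the same resource. A sketch with a hypothetical `collect` function:

```bash
collect() { echo "data from $1" > "out_$1.txt"; }   # hypothetical fetcher

collect alpha & collect beta & collect gamma &
wait   # all writers finished; merging is now race-free
cat out_alpha.txt out_beta.txt out_gamma.txt > combined.txt
```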
Domain-Specific Parallel Examples
Economics Research:
```bash
# Download papers from 3 databases simultaneously
./arxiv-fetch.sh & ./jstor-fetch.sh & ./ssrn-fetch.sh &
wait
```

Software Testing:

```bash
# Run test suites in parallel
./unit-tests.sh & ./integration-tests.sh & ./e2e-tests.sh &
wait
```

Business Monitoring:

```bash
# Monitor competitors concurrently
./track-competitor-a.sh & ./track-competitor-b.sh & ./track-competitor-c.sh &
wait
```

Pattern 3: Caching for Speed
The Concept
Caching stores expensive operation results—API calls, computations, file downloads—to avoid redundant work. First run pays full cost, subsequent runs return instantly.
Performance Impact: API calls typically take 500-2000ms. Cache reads take 10-50ms—a 10-50× speedup. For development workflows with repeated runs, caching reduces iteration time from minutes to seconds.
Simple Cache Pattern
```bash
# Check cache first, fetch only if missing
[ -f cache.json ] && cat cache.json && exit
curl "$API_ENDPOINT" | tee cache.json
```

Pattern breakdown:
- `[ -f cache.json ]` - Test if cache file exists
- `cat cache.json && exit` - Return cached data if found
- `tee cache.json` - Save API response to cache while outputting
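The same pattern can be exercised without a network by swapping `curl` for a local function; the log file below is only there to prove the expensive fetch runs once:

```bash
rm -f demo_cache.json fetch_log.txt

# Stand-in for curl: record each real fetch, then emit the data
fetch_api() { echo "fetched at source" >> fetch_log.txt; echo "expensive result"; }

get_data() {
  [ -f demo_cache.json ] && { cat demo_cache.json; return; }  # hit: serve cache
  fetch_api | tee demo_cache.json                             # miss: fetch, save
}

get_data > /dev/null   # first call fetches and populates the cache
get_data > /dev/null   # second call is served from the cache; no new fetch
```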
Cache Invalidation Strategies
Expire cache after duration:
```bash
# Delete cache older than 1 hour
find cache.json -mmin +60 -delete 2>/dev/null
curl "$API_ENDPOINT" | tee cache.json
```

Use for: Data that updates periodically (stock prices, weather, news feeds)
Invalidate when source changes:
```bash
# Recompute if input file is newer than cache
[ cache.json -nt input.csv ] || ./expensive-analysis.sh < input.csv > cache.json
cat cache.json
```

Use for: Derived data, analysis results, processed outputs
User controls cache refresh:
```bash
# Support --refresh flag
[ "$1" = "--refresh" ] && rm -f cache.json
[ -f cache.json ] && cat cache.json && exit
curl "$API_ENDPOINT" | tee cache.json
```

Use for: Development, debugging, forcing fresh data
Caching Benefits
Speed comparison:
| Operation | Without Cache | With Cache | Speedup |
|---|---|---|---|
| API call | 1000ms | 50ms | 20× |
| File download | 5000ms | 10ms | 500× |
| LLM analysis | 3000ms | 100ms | 30× |
Cost reduction:
- Development: 100 test runs × $0.01 per API call = $1.00 without cache, $0.01 with cache
- Production: 1000 daily requests → 100 unique requests cached = 90% cost reduction
Offline capability:
Cached data enables scripts to function without network connectivity, critical for:
- Development on unstable connections
- Demo environments without API access
- Disaster recovery scenarios
Combining Patterns
Advanced Research Pipeline
Real-world workflows benefit from multiple patterns working together:
Parallel Download with Caching
```bash
# Fetch from 3 sources simultaneously, cache each
./econ-papers.sh & ./cs-papers.sh & ./stats-papers.sh &
wait
```

Benefit: First run takes 5 minutes (parallel). Subsequent runs take 2 seconds (cache).
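A runnable sketch of this combination, with a short sleep standing in for the slow download and hypothetical source names:

```bash
fetch_source() {                 # $1 = source name
  local cache="cache_$1.json"
  [ -f "$cache" ] && return      # already cached: skip the slow fetch entirely
  sleep 0.1                      # stand-in for a slow download
  echo "papers from $1" > "$cache"
}

rm -f cache_econ.json cache_cs.json cache_stats.json
fetch_source econ & fetch_source cs & fetch_source stats &
wait   # first run pays max(download times); reruns return almost instantly
```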
Chain Processing
```bash
# Deduplicate → Extract → Analyze → Report
cat *.json | ./deduplicate.sh | ./extract-data.sh | ./analyze.sh | ./report.sh
```

Benefit: Each stage processes only unique papers from cached sources.
Performance Impact
| Approach | First Run | Subsequent Runs | Speedup |
|---|---|---|---|
| No patterns | 30 min | 30 min | 1× |
| Parallel only | 10 min | 10 min | 3× |
| Parallel + cache | 10 min | 2 min | 15× |
| All patterns | 6 min | 30 sec | 60× |
Pattern Selection Guide
Script Chaining: Use when building complex workflows from simple components. Best for sequential dependencies where each stage transforms data before passing to the next. Compose specialized scripts rather than writing monolithic code—easier to test, debug, and reuse individual stages.
Parallel Execution: Use when tasks are independent and performance matters. Essential for batch processing, multi-source data collection, and any workflow where scripts don't depend on each other's outputs. Avoid when tasks share resources or have sequential dependencies.
Caching: Use when making repeated API calls for the same data, especially during development and testing. Critical for cost reduction and speed optimization. Implement when iteration speed affects productivity or when API rate limits constrain workflows.
Advanced Considerations
Error Handling in Chains
Stop on first error:
```bash
set -eo pipefail  # Exit immediately if any command, including a pipeline stage, fails
./stage1.sh | ./stage2.sh | ./stage3.sh
```

Fallback behavior:

```bash
# Try primary, fall back to secondary
./primary.sh || ./secondary.sh || echo "All methods failed"
```

Resource Management in Parallel
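Worth noting: with plain `set -e`, a failure in the middle of a pipeline is masked, because only the last stage's exit status counts; bash's `set -o pipefail` closes that gap. A runnable sketch with hypothetical stage functions:

```bash
stage_fail() { return 1; }                 # a middle stage that fails
stage_ok()   { cat > /dev/null; echo "ok"; }

# Default behaviour: only stage_ok's status counts, so the failure is masked
if echo "input" | stage_fail | stage_ok; then masked="yes"; else masked="no"; fi

set -o pipefail
# Now stage_fail's non-zero status fails the whole pipeline
if echo "input" | stage_fail | stage_ok; then caught="no"; else caught="yes"; fi
set +o pipefail
```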
Limit concurrent processes:
- Prevent system overload by capping parallel executions
- Use GNU `parallel` for sophisticated job control
- Monitor CPU, memory, and network bandwidth
Respect API rate limits:
- Add delays between parallel requests
- Implement token bucket algorithms
- Queue requests to stay within limits
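A minimal way to cap concurrency in plain bash (GNU `parallel` offers this and much more) is to poll the job table before launching each task. A sketch with a hypothetical `work` function:

```bash
MAX_JOBS=2
work() { sleep 0.1; echo "processed $1" >> done.log; }   # hypothetical task

rm -f done.log
for item in a b c d e f; do
  # Block until a slot frees up: jobs -rp lists running background PIDs
  while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
    sleep 0.02
  done
  work "$item" &
done
wait
```

The same slot-polling loop doubles as a crude rate limiter if `work` includes the API call's minimum spacing.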
Cache Strategy Evolution
Time-to-Live (TTL) expiration:
- Set appropriate cache lifetimes based on data volatility
- Stock prices: 1 minute
- Weather forecasts: 1 hour
- Research papers: 1 week
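These per-dataset lifetimes can share one helper parameterised by TTL, combining the earlier `find -mmin` and `tee` patterns (names here are hypothetical):

```bash
cached() {                    # usage: cached FILE TTL_MINUTES COMMAND...
  local file=$1 ttl=$2; shift 2
  find "$file" -mmin +"$ttl" -delete 2>/dev/null || true  # drop stale entries
  [ -f "$file" ] && { cat "$file"; return; }              # fresh: serve cache
  "$@" | tee "$file"                                      # stale/missing: refetch
}

rm -f quote.txt
cached quote.txt 1 echo "price: 100" > /dev/null   # volatile data: 1-minute TTL
```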
Least Recently Used (LRU) eviction:
- Limit cache size to prevent disk exhaustion
- Remove oldest unused entries first
- Balance between hit rate and storage
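A rough file-based LRU sketch: sort cache entries by modification time and delete everything past the limit (assumes cache filenames contain no newlines):

```bash
evict_lru() {                 # $1 = cache dir, $2 = max entries to keep
  ls -1t "$1" | tail -n +"$(( $2 + 1 ))" | while read -r name; do
    rm -f "$1/$name"          # oldest-modified entries are evicted first
  done
}

mkdir -p lru_demo
for i in 1 2 3 4 5; do
  echo "entry" > "lru_demo/item_$i"
  sleep 0.05                  # space out mtimes so ordering is well defined
done
evict_lru lru_demo 3
```

A true LRU would bump each entry's timestamp on read (e.g. `touch` on cache hits) so recently *used*, not just recently written, entries survive.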
Versioning for data changes:
- Include version identifiers in cache keys
- Invalidate automatically when schemas change
- Support migration between cache versions
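The simplest form of versioned keys is baking the schema version into the cache filename (names hypothetical): bumping the version makes every lookup miss old-format entries, which then regenerate naturally:

```bash
SCHEMA_VERSION=2
cache_file="results_v${SCHEMA_VERSION}.json"   # v1 files are simply never read

[ -f "$cache_file" ] || echo '{"schema": 2, "data": []}' > "$cache_file"
```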
Next Steps
Extension patterns transform basic scripts into production automation:
- Start simple: Add one pattern to existing scripts
- Measure impact: Compare performance before and after
- Iterate gradually: Combine patterns as complexity grows
- Monitor resources: Watch for bottlenecks and optimize
The next chapter covers troubleshooting common script failures and debugging techniques.