Business Management: Enterprise Competitive Intelligence
Upgrade the T2.2 competitor monitoring with enterprise-grade reliability
Scenario: Production-Ready Competitor Monitoring
The weekly competitor scraping automation from T2.2 works, but it's fragile. Websites change their structure, servers time out, rate limiting kicks in, and sometimes the script fails silently. For business intelligence, silent failures mean making decisions on stale data.
This chapter upgrades the competitor monitoring system with enterprise-grade error handling and logging to ensure data reliability and immediate failure detection.
Error Handling Upgrades
Retry on Timeout and Rate Limiting
Web scraping hits rate limits and timeouts regularly. Instead of failing immediately, implement exponential backoff:
for i in {1..3}; do
  curl -s "$URL" && break
  sleep $((2**i))   # Wait 2, 4, then 8 seconds between attempts
done
This gives the server time to recover and avoids cascading failures.
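For a reusable building block, the same idea can be wrapped in a small function. A minimal sketch, assuming curl with --fail and --max-time and a default of three attempts (the function name, flags, and URL are illustrative, not part of the T2.2 script):
# Hypothetical retry wrapper with exponential backoff
fetch_with_retry() {
  local url="$1" max_retries="${2:-3}" attempt
  for attempt in $(seq 1 "$max_retries"); do
    # --fail makes curl return non-zero on HTTP errors such as 429 or 503
    if curl --silent --fail --max-time 30 "$url"; then
      return 0
    fi
    sleep $((2 ** attempt))   # back off 2, 4, 8... seconds
  done
  return 1
}
# Usage: capture the page, or fall through to error handling
response=$(fetch_with_retry "https://competitor-a.com/pricing") || echo "scrape failed" >&2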
Alert on Structure Changes
Competitor websites are redesigned without notice. Detect structural changes by checking for expected data fields:
if ! echo "$response" | grep -q "pricing"; then
  echo "Alert: Competitor site structure changed" | mail -s "Scraping Failed" team@company.com
fi
This catches silent failures where the script runs but extracts nothing useful.
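A single keyword check is itself brittle; checking several expected fields and alerting once reduces false alarms. A minimal sketch, assuming the page text is already in $response (the field list and alert address are illustrative):
# Hypothetical multi-field structure check
expected_fields=("pricing" "plans" "per month")
missing=""
for field in "${expected_fields[@]}"; do
  echo "$response" | grep -qi "$field" || missing="$missing $field"
done
if [ -n "$missing" ]; then
  printf 'Missing expected fields:%s\nURL: %s\n' "$missing" "$URL" \
    | mail -s "Scraping structure alert" team@company.com
fi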
Fallback to Previous Data
When scraping fails completely, use the last successful dataset instead of empty results:
if [ $? -ne 0 ]; then
  cp data/last_success.json data/current.json
  log "Using fallback data from $(date -r data/last_success.json)"
fi
Business teams get the most recent reliable data rather than nothing at all.
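One way to make this concrete is to only promote a fresh scrape after a basic sanity check, and otherwise keep serving the last good copy. A sketch under that assumption (the temp file name, the 1 KB size threshold, and the logs/scrape.log path are illustrative):
# Hypothetical promote-or-fallback step after a scrape attempt
new_file="data/scrape_tmp.json"
if [ -s "$new_file" ] && [ "$(wc -c < "$new_file")" -gt 1024 ]; then
  cp "$new_file" data/current.json
  cp "$new_file" data/last_success.json
else
  cp data/last_success.json data/current.json
  echo "$(date '+%F %T') Using fallback data" >> logs/scrape.log
fi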
Logging Enhancements
Comprehensive logging transforms debugging from guesswork into data analysis. Log every scraping attempt with timestamp, target URL, response size, and success status. Track the scraping success rate weekly to detect gradual degradation before it becomes critical. For competitive intelligence, knowing when data was last updated is as important as the data itself.
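A small helper keeps that format consistent across every attempt. A minimal sketch, assuming a logs/scrape.log path (the function name and arguments are illustrative):
LOG_FILE="logs/scrape.log"
# Hypothetical logger: timestamp, URL, response size, status, duration
log_scrape() {
  local url="$1" size_kb="$2" ok="$3" duration="$4"
  printf '[%s] Scraped %s: %sKB, success=%s, duration=%ss\n' \
    "$(date '+%Y-%m-%d %H:%M:%S')" "$url" "$size_kb" "$ok" "$duration" >> "$LOG_FILE"
}
# Usage after a scrape attempt
log_scrape "competitor-a.com" 45 true 2.3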
Example log entry structure:
[2025-01-15 08:00:15] Scraped competitor-a.com: 45KB, success=true, duration=2.3s
[2025-01-15 08:00:20] Scraped competitor-b.com: 0KB, success=false, error=timeout
Health Monitoring
Silent failures are the enemy of business intelligence. Implement data freshness checks that alert if the latest scrape is older than 10 days. Monitor the gap between scheduled runs and actual execution times to detect LaunchAgent issues. Track the percentage of successful scrapes over rolling 30-day windows. If success rate drops below 90%, investigate immediately before the team makes decisions on incomplete data.
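Given the log format above, the success rate can be computed straight from the log. A sketch assuming the logs/scrape.log path and the success=true/false fields from the earlier example; it measures the whole log, and a production version would first filter entries to the last 30 days (the 90% threshold comes from the text above):
# Hypothetical success-rate check over the scrape log
LOG_FILE="logs/scrape.log"
total=$(grep -c 'success=' "$LOG_FILE")
good=$(grep -c 'success=true' "$LOG_FILE")
if [ "$total" -gt 0 ]; then
  rate=$((100 * good / total))
  if [ "$rate" -lt 90 ]; then
    echo "Scrape success rate is ${rate}% (below 90%)" \
      | mail -s "Competitor monitoring degraded" team@company.com
  fi
fi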
Health check command:
find data/ -name "*.json" -mtime +10 | xargs -I {} echo "Stale: {}"
Business Impact
Before: Competitor monitoring runs weekly, sometimes fails silently, team unknowingly uses 3-week-old pricing data.
After: Automated scraping with 95%+ success rate, immediate alerts on failures, always-current intelligence or clearly labeled fallback data.
This reliability upgrade transforms competitive intelligence from "best effort" to "decision-grade" data quality.
Implementation Notes
The key to production-ready scraping is defensive programming. Assume websites will change, assume networks will time out, assume APIs will be rate-limited. Build retry logic, fallbacks, and alerts into every external integration. Log everything with timestamps for forensic analysis.
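Put together, a weekly run can be a short pipeline of those pieces. A sketch that composes the illustrative helpers from the earlier sections (fetch_with_retry, log_scrape, and the data/ file names are all hypothetical, not the actual T2.2 script):
# Hypothetical top-level weekly run
URL="https://competitor-a.com/pricing"
start=$(date +%s)
if response=$(fetch_with_retry "$URL"); then
  printf '%s' "$response" > data/scrape_tmp.json
  status=true
else
  status=false
fi
duration=$(( $(date +%s) - start ))
size_kb=$(( $(printf '%s' "$response" | wc -c) / 1024 ))
log_scrape "$URL" "$size_kb" "$status" "$duration"
# The structure check and promote-or-fallback steps sketched above would follow here.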
For business teams, the difference between toy automation and production systems is simple: production systems never fail silently. They either succeed, retry until success, or alert a human immediately. This chapter's patterns ensure your competitive intelligence automation meets that standard.