**License: Pro** - Requires a Pro or Enterprise license.
Load WebApp Probe#
The Load WebApp probe monitors HTTP/HTTPS web application performance by timing each phase of the request lifecycle: DNS resolution, TCP connection, TLS handshake, server response, and data transfer. The per-phase breakdown shows where a slow request actually spends its time.
Quick Start#
Basic Configuration#
probes:
  - name: load_webapp
    params:
      url: "https://www.example.com"
      timeout: 30  # Request timeout in seconds (default: 30)
Multiple URL Monitoring#
probes:
  - name: production_webapp
    type: load_webapp
    params:
      url: "https://app.example.com"
      timeout: 30
  - name: staging_webapp
    type: load_webapp
    params:
      url: "https://staging.example.com"
      timeout: 45
  - name: api_endpoint
    type: load_webapp
    params:
      url: "https://api.example.com/health"
      timeout: 15
Supported Protocols#
- HTTP: Unencrypted web traffic (http://)
- HTTPS: TLS/SSL encrypted web traffic (https://)
Key Metrics Summary#
| Metric | Description | Use Case |
|---|---|---|
| dnstime | DNS resolution time (ms) | DNS server performance, caching effectiveness |
| connecttime | TCP connection establishment (ms) | Network latency, firewall delays |
| tlstime | TLS handshake duration (ms) | Certificate validation, encryption overhead |
| ttfb | Time to First Byte (ms) | Server processing time, backend performance |
| total_time | Complete request time (ms) | End-to-end performance, user experience |
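To make the phase boundaries concrete, here is a minimal Python sketch of how the five timings could be captured with raw sockets. It is an illustration, not the agent's implementation: the function name `measure_phases` is hypothetical, `ttfb` is assumed to run from sending the request to receiving the first response byte, and the timeout is applied per socket operation rather than to the whole request.

```python
import socket
import ssl
import time
from urllib.parse import urlsplit


def measure_phases(url, timeout=30.0):
    """Time one GET request and return per-phase durations in milliseconds."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("url must be a full http:// or https:// URL")
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    host_header = f"{host}:{port}" if parts.port else host

    # Phase 1: DNS resolution.
    t_start = time.perf_counter()
    family, kind, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    t_dns = time.perf_counter()

    sock = socket.socket(family, kind, proto)
    sock.settimeout(timeout)
    try:
        # Phase 2: TCP connection.
        sock.connect(address)
        t_connect = time.perf_counter()

        # Phase 3: TLS handshake (zero for plain HTTP).
        t_tls = t_connect
        if parts.scheme == "https":
            context = ssl.create_default_context()  # verifies chain and hostname
            sock = context.wrap_socket(sock, server_hostname=host)
            t_tls = time.perf_counter()

        # Phase 4: send the request and wait for the first response byte.
        request = (
            f"GET {target} HTTP/1.1\r\n"
            f"Host: {host_header}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        if not sock.recv(1):
            raise ConnectionError("server closed the connection without responding")
        t_first = time.perf_counter()

        # Phase 5: read the rest of the response until the server closes.
        while sock.recv(65536):
            pass
        t_end = time.perf_counter()
    finally:
        sock.close()

    def ms(begin, end):
        return (end - begin) * 1000.0

    return {
        "dnstime": ms(t_start, t_dns),
        "connecttime": ms(t_dns, t_connect),
        "tlstime": ms(t_connect, t_tls),
        "ttfb": ms(t_tls, t_first),
        "total_time": ms(t_start, t_end),
    }
```

Calling `measure_phases("https://www.example.com")` returns a dictionary keyed by the metric names in the table above; the sketch does not inspect the HTTP status code.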
Configuration Parameters#
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| url | string | Yes | - | - | Full HTTP/HTTPS URL to monitor |
| timeout | integer | No | 30 | 1-300 | Request timeout in seconds |
URL Requirements#
- Must include protocol: http:// or https://
- Must be valid URL: Hostname and path properly formatted
- Examples:
  - https://www.example.com – valid
  - https://api.example.com/v1/status – valid
  - http://internal-app.local/health – valid
  - www.example.com (missing protocol) – invalid
  - ftp://files.example.com (unsupported protocol) – invalid
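The parameter rules above can be checked before a configuration is deployed. The following Python sketch is one way to do that; `validate_params` is a hypothetical helper written for this page, not part of the agent, and it only encodes the rules stated here (http/https scheme, a hostname, and a timeout between 1 and 300 that defaults to 30).

```python
from urllib.parse import urlsplit


def validate_params(params):
    """Return a list of problems; an empty list means the params look valid."""
    problems = []

    url = params.get("url")
    if not isinstance(url, str) or not url:
        problems.append("url is required")
    else:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            problems.append("url must start with http:// or https://")
        elif not parts.hostname:
            problems.append("url has no hostname")

    # timeout is optional and defaults to 30 seconds.
    timeout = params.get("timeout", 30)
    is_int = isinstance(timeout, int) and not isinstance(timeout, bool)
    if not is_int or not 1 <= timeout <= 300:
        problems.append("timeout must be an integer between 1 and 300")

    return problems
```

For example, `validate_params({"url": "www.example.com"})` reports the missing protocol, while `validate_params({"url": "https://www.example.com"})` returns an empty list.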
Example Configurations#
Fast API monitoring (short timeout):
probes:
  - name: api_health_check
    type: load_webapp
    params:
      url: "https://api.example.com/health"
      timeout: 5  # Quick timeout for health check
Slow backend monitoring (long timeout):
probes:
  - name: legacy_app
    type: load_webapp
    params:
      url: "https://legacy.example.com/dashboard"
      timeout: 60  # Extended timeout for slow application
CDN performance monitoring:
probes:
  - name: cdn_homepage
    type: load_webapp
    params:
      url: "https://www.example.com"
      timeout: 30
Monitoring Tool Integration#
PRTG Network Monitor#
Access Load WebApp metrics in PRTG JSON format:
# All Load WebApp metrics
curl http://localhost:8080/api/{agentkey}/prtg/metrics
# Configure PRTG HTTP Advanced Sensor:
# - URL: http://agent-host:8080/api/{agentkey}/prtg/metrics
# - Method: POST
# - Request body: {"probe": "load_webapp"}
PRTG Channels Available:
- DNS Resolution Time (ms)
- Connect Time (ms)
- TLS Handshake Time (ms)
- Time to First Byte (ms)
- Total Load Time (ms)
PRTG Configuration Example:
{
  "prtg": {
    "result": [
      {
        "channel": "DNS Resolution Time",
        "value": 12.5,
        "unit": "TimeMilliseconds",
        "LimitMaxWarning": 100,
        "LimitMaxError": 500
      },
      {
        "channel": "Total Load Time",
        "value": 245.8,
        "unit": "TimeMilliseconds",
        "LimitMaxWarning": 3000,
        "LimitMaxError": 5000
      }
    ]
  }
}
Nagios/Icinga#
Access Load WebApp metrics in Nagios format:
# All Load WebApp metrics with performance data
curl "http://localhost:8080/api/{agentkey}/nagios/metrics?probe=load_webapp"
# Example output:
# OK - WebApp load monitoring active | dnstime=12.5ms ttfb=145.2ms total_time=245.8ms
Nagios Performance Data:
- dnstime - DNS resolution time with 100ms warning, 500ms critical
- connecttime - TCP connection time with 200ms warning, 1000ms critical
- tlstime - TLS handshake time with 500ms warning, 2000ms critical
- ttfb - Time to First Byte with 1000ms warning, 5000ms critical
- total_time - Total request time with 3000ms warning, 10000ms critical
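If the Nagios output is consumed by a script rather than by Nagios itself, the status and performance data can be split apart with a few lines of Python. This is a sketch that assumes the line has the same shape as the example output above (`STATUS - text | label=value[unit] ...`); `parse_nagios_line` is a hypothetical helper, and any `;warn;crit` suffixes after a value are ignored.

```python
import re

# label=value with an optional unit suffix such as "ms"
_PERF_ITEM = re.compile(r"(\w+)=([0-9.]+)([a-zA-Z%]*)")


def parse_nagios_line(line):
    """Split a Nagios plugin output line into (status, {label: value})."""
    status_text, _, perfdata = line.partition("|")
    status = status_text.split("-", 1)[0].strip()
    metrics = {
        label: float(value)
        for label, value, _unit in _PERF_ITEM.findall(perfdata)
    }
    return status, metrics
```

Applied to the example output, the function returns the status `OK` together with the three timing values as floats.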
Grafana/Prometheus#
Access metrics in Prometheus-compatible format:
# Prometheus format
curl http://localhost:8080/api/{agentkey}/prometheus/metrics
# Example output:
# load_webapp_dnstime{url="https://www.example.com"} 12.5
# load_webapp_connecttime{url="https://www.example.com"} 45.3
# load_webapp_tlstime{url="https://www.example.com"} 87.5
# load_webapp_ttfb{url="https://www.example.com"} 145.2
# load_webapp_total_time{url="https://www.example.com"} 245.8
Grafana Dashboard Queries:
# Total load time over time
load_webapp_total_time{url="https://www.example.com"}
# DNS resolution performance
load_webapp_dnstime{url=~".*"}
# Backend processing time (TTFB)
load_webapp_ttfb{url="https://api.example.com"}
# TLS overhead percentage
(load_webapp_tlstime / load_webapp_total_time) * 100
Web Interface#
View Load WebApp metrics in the built-in dashboard:
http://localhost:8080/web/{agentkey}/dashboard
Features:
- Real-time load time visualization
- Timing phase breakdown (DNS, Connect, TLS, TTFB, Transfer)
- Historical performance trends
- Multi-URL comparison
Use Cases#
Performance Monitoring#
Monitor web application performance to identify:
- Slow DNS resolution (DNS server issues, missing caching)
- Network latency (high connect times)
- TLS overhead (certificate chain validation delays)
- Backend processing delays (high TTFB)
- Transfer bottlenecks (large payloads, slow bandwidth)
Bottleneck Detection#
Identify performance bottlenecks by phase:
Total Time: 2450ms breakdown:
|- DNS Time: 12ms (0.5%) <- Normal
|- Connect Time: 45ms (1.8%) <- Normal
|- TLS Time: 387ms (15.8%) <- HIGH (investigate certificate chain)
|- TTFB: 1856ms (75.8%) <- CRITICAL (backend bottleneck)
'- Transfer: 150ms (6.1%) <- Normal
Interpretation:
- DNS/Connect time normal – Network OK
- High TLS time – Certificate validation issue
- Very high TTFB – Backend processing bottleneck
- Transfer time acceptable – Content size reasonable
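The percentages in the breakdown above are each phase's share of the summed phase durations. A small Python sketch of that arithmetic follows; the helper names are hypothetical, and it assumes the five values are non-overlapping phase durations, as in the example.

```python
def phase_breakdown(phases_ms):
    """Return each phase's share of the total, in percent (one decimal)."""
    total = sum(phases_ms.values())
    if total <= 0:
        raise ValueError("phase durations must sum to a positive value")
    return {name: round(100 * ms / total, 1) for name, ms in phases_ms.items()}


def dominant_phase(phases_ms):
    """Return the name of the phase that takes the most time."""
    return max(phases_ms, key=phases_ms.get)


# Values from the example breakdown above (milliseconds).
example = {"dns": 12, "connect": 45, "tls": 387, "ttfb": 1856, "transfer": 150}
```

With the example values, `dominant_phase(example)` identifies TTFB as the phase to investigate first.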
CDN Performance Analysis#
Monitor Content Delivery Network effectiveness:
probes:
  # Origin server (no CDN)
  - name: origin_server
    type: load_webapp
    params:
      url: "https://origin.example.com/page.html"
  # CDN endpoint
  - name: cdn_endpoint
    type: load_webapp
    params:
      url: "https://cdn.example.com/page.html"
Compare metrics:
- DNS time should be similar (both resolve quickly)
- Connect time should be lower for CDN (closer to users)
- Total time should be significantly lower for CDN
- TTFB should be minimal for CDN (cached content)
API Health Monitoring#
Monitor REST API endpoint performance:
probes:
  - name: api_health
    type: load_webapp
    params:
      url: "https://api.example.com/v1/health"
      timeout: 10
  - name: api_users
    type: load_webapp
    params:
      url: "https://api.example.com/v1/users/profile"
      timeout: 15
Track API response times and detect degradation early.
SSL/TLS Certificate Monitoring#
Monitor certificate validation performance:
- Normal TLS time: 50-200ms
- Slow TLS time: 200-500ms (investigate certificate chain)
- Very slow TLS time: >500ms (OCSP stapling issues, revocation checks)
Geographic Performance Testing#
Deploy agents in different regions to compare performance:
# US East agent
probes:
  - name: webapp_from_us_east
    type: load_webapp
    params:
      url: "https://www.example.com"
# EU West agent (separate deployment)
probes:
  - name: webapp_from_eu_west
    type: load_webapp
    params:
      url: "https://www.example.com"
Compare connect times and total times to optimize CDN configuration.
Troubleshooting#
No Metrics Collected#
Check probe status:
# View agent logs with Load WebApp probe debugging
./agent run --authentication-key YOUR_KEY --verbose --debug-modules probe.loadwebapp
Verify probe is enabled:
# Check configuration
cat agent-config.yaml | grep -A5 "name: load_webapp"
DNS Resolution Failures#
Symptom: Error: “DNS resolution failed” or high DNS times
Causes:
- Invalid hostname
- DNS server unreachable
- DNS timeout
Solutions:
Verify hostname resolves:
nslookup www.example.com
dig www.example.com
Check DNS server configuration:
# Linux/macOS
cat /etc/resolv.conf
# Windows
ipconfig /all
Test with alternative DNS:
# Temporarily use Google DNS
nslookup www.example.com 8.8.8.8
Connection Timeouts#
Symptom: Error: “request timed out” or connect time equals timeout
Causes:
- Firewall blocking connection
- Server unreachable
- Network routing issues
- Timeout too short for slow connections
Solutions:
Verify connectivity:
# Test TCP connection
telnet www.example.com 443
nc -zv www.example.com 443
# Test with curl
curl -v -m 30 https://www.example.com
Check firewall rules:
# Linux (iptables)
sudo iptables -L -n
# Windows
netsh advfirewall show currentprofile
Increase timeout if needed:
params:
  url: "https://slow-server.example.com"
  timeout: 60  # Increase from default 30s
SSL/TLS Certificate Errors#
Symptom: Error: “certificate error” or “x509: certificate” errors
Causes:
- Expired certificate
- Self-signed certificate
- Untrusted CA
- Certificate hostname mismatch
- Certificate chain incomplete
Solutions:
Check certificate validity:
# View certificate details
openssl s_client -connect www.example.com:443 -showcerts
# Check expiration
echo | openssl s_client -connect www.example.com:443 2>/dev/null | openssl x509 -noout -dates
Verify certificate chain:
# Test full certificate chain
curl -v https://www.example.com
For internal/self-signed certificates:
- Note: The current implementation always enforces certificate verification (InsecureSkipVerify=false), so self-signed or untrusted certificates cause the probe to fail
- For production use, ensure endpoints present valid certificates from a trusted CA
- For development/testing, use valid certificates as well (Let’s Encrypt issues them for free)
High TTFB (Time to First Byte)#
Symptom: TTFB > 1000ms consistently
Causes:
- Backend server overloaded
- Database query bottlenecks
- Slow application code
- Server-side caching disabled
Solutions:
Monitor backend server resources:
- CPU usage (system probe)
- Memory usage (system probe)
- Disk I/O (system probe)
Analyze application logs for slow queries:
# Check application logs
tail -f /var/log/application.log | grep "slow"
Enable server-side caching:
- Redis/Memcached for data caching
- Varnish/Nginx for HTTP caching
- CloudFlare/CDN for static content
Database optimization:
- Add indexes for slow queries
- Optimize query patterns
- Enable query caching
HTTP Status Code Errors#
Symptom: Error: “unexpected status code: 404/500/503”
Monitoring behavior:
- Probe only succeeds on HTTP 2xx and 3xx status codes
- HTTP 4xx and 5xx trigger errors
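Expressed as code, the success rule above is a single range check. The Python sketch below is illustrative only: `check_status` is a hypothetical helper, and the error text simply mirrors the symptom message quoted above.

```python
def check_status(status_code):
    """Return None for a 2xx/3xx response, otherwise an error message."""
    if 200 <= status_code < 400:
        return None
    return f"unexpected status code: {status_code}"
```

A redirect such as 301 therefore counts as success, while 404 or 503 produces an error.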
Solutions:
Verify URL is correct:
curl -I https://www.example.com/correct/path
Check server logs for errors:
# Web server logs
tail -f /var/log/nginx/error.log
tail -f /var/log/apache2/error.log
Test endpoint manually:
# Full request
curl -v https://www.example.com
Performance Degradation#
Symptom: Metrics show increasing load times over days/weeks
Analysis approach:
Compare timing phases:
Week 1 vs Week 4:
  DNS Time:     12ms -> 15ms    (+25%)   <- Minor
  Connect Time: 45ms -> 52ms    (+16%)   <- Minor
  TLS Time:     87ms -> 95ms    (+9%)    <- Minor
  TTFB:         145ms -> 458ms  (+216%)  <- MAJOR (investigate backend)
  Transfer:     56ms -> 65ms    (+16%)   <- Minor
Identify the bottleneck phase:
- DNS degradation – DNS server issues
- Connect degradation – Network issues
- TLS degradation – Certificate/OCSP issues
- TTFB degradation – Backend performance (most common)
- Transfer degradation – Bandwidth or content size increase
Correlate with other metrics:
- Check CPU probe for server load
- Check memory probe for memory leaks
- Check disk probe for I/O bottlenecks
- Check network probe for bandwidth saturation
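The phase comparison described above boils down to a percentage change per phase, followed by picking the phase that grew the most. A short Python sketch, with hypothetical helper names and the Week 1 and Week 4 figures from the example as input:

```python
def percent_change(before_ms, after_ms):
    """Return the rounded percentage change of each phase between two periods."""
    return {
        phase: round(100 * (after_ms[phase] - before_ms[phase]) / before_ms[phase])
        for phase in before_ms
    }


def worst_phase(before_ms, after_ms):
    """Return the phase with the largest relative increase."""
    changes = percent_change(before_ms, after_ms)
    return max(changes, key=changes.get)


# Figures from the Week 1 vs Week 4 example (milliseconds).
week1 = {"dns": 12, "connect": 45, "tls": 87, "ttfb": 145, "transfer": 56}
week4 = {"dns": 15, "connect": 52, "tls": 95, "ttfb": 458, "transfer": 65}
```

For these figures `worst_phase(week1, week4)` points at TTFB, which matches the guidance to investigate the backend first.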
Performance Considerations#
Collection Overhead#
The Load WebApp probe overhead:
- Network: Full HTTP request per collection (~KB to MB depending on response size)
- CPU: Minimal (HTTP client + timing tracking ~5-10ms)
- Memory: ~2-5 MB per active request
Recommended Intervals#
| Use Case | Interval | Reason |
|---|---|---|
| Critical API monitoring | 30s | Detect issues quickly |
| Standard web monitoring | 60s | Balance accuracy and load |
| Long-term trending | 300s | Reduce network traffic |
Important: Frequent polling can impact target server:
- Generates real traffic to monitored URLs
- Consumes server resources
- May trigger rate limiting
- Consider using /health or lightweight endpoints
Response Body Handling#
The probe downloads the complete response body to accurately measure transfer time:
- Small responses (< 100KB): Negligible impact
- Large responses (> 1MB): Consider impact on agent bandwidth
- Very large responses (> 10MB): May want to use dedicated endpoints
Best practice: Monitor lightweight endpoints or specific health check URLs rather than full pages with large assets.
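To judge how much traffic a given interval and response size generate, multiply the number of collections per day by the response size. The Python sketch below does that arithmetic; `daily_load` is a hypothetical helper, and it counts response body bytes only, ignoring headers, TLS overhead, and retries.

```python
SECONDS_PER_DAY = 86_400


def daily_load(interval_s, response_bytes):
    """Return (requests per day, body bytes transferred per day)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    requests_per_day = SECONDS_PER_DAY // interval_s
    return requests_per_day, requests_per_day * response_bytes
```

For example, a 1 MB page polled every 60 seconds results in 1440 requests and about 1.44 GB of body traffic per day, whereas a small health endpoint at the same interval stays in the low megabytes.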
Advanced Configuration#
Multi-Environment Monitoring#
Monitor multiple environments with consistent configuration:
probes:
  - name: production_app
    type: load_webapp
    params:
      url: "https://app.example.com"
      timeout: 30
  - name: staging_app
    type: load_webapp
    params:
      url: "https://staging.example.com"
      timeout: 30
  - name: development_app
    type: load_webapp
    params:
      url: "https://dev.example.com"
      timeout: 30
Compare performance across environments to detect configuration issues.
API Endpoint Testing#
Monitor critical API endpoints:
probes:
  - name: auth_api
    type: load_webapp
    params:
      url: "https://api.example.com/v1/auth/health"
      timeout: 10
  - name: users_api
    type: load_webapp
    params:
      url: "https://api.example.com/v1/users/health"
      timeout: 10
  - name: payments_api
    type: load_webapp
    params:
      url: "https://api.example.com/v1/payments/health"
      timeout: 15
Track individual microservice performance independently.
Integration with Other Probes#
Combine Load WebApp probe with system probes for comprehensive monitoring:
probes:
  # Application performance
  - name: webapp_frontend
    type: load_webapp
    params:
      url: "https://www.example.com"
  # Server health
  - name: cpu
    params:
      interval: 30
  - name: memory
    params:
      interval: 30
  - name: network
    params:
      interval: 60
Correlate application response times with server resource usage.
Security Considerations#
TLS Configuration#
Current implementation:
- TLS verification: Enabled (InsecureSkipVerify=false)
- Minimum TLS version: TLS 1.2
- Certificate validation: Full chain validation required
- Connection reuse: Disabled (DisableKeepAlives=true) for consistent measurements
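When reproducing a probe result by hand, it helps to use a client configured the same way. The Python sketch below builds a roughly equivalent TLS setup with the standard `ssl` module; it is an approximation for manual testing, not the agent's code, and `build_tls_context` and `REQUEST_HEADERS` are names invented for this example.

```python
import ssl


def build_tls_context():
    """Create a client context with full verification and TLS 1.2 as minimum."""
    # create_default_context() enables chain validation and hostname checks.
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


# Asking the server to close the connection after each response approximates
# disabled keep-alives, so every measurement pays the full handshake cost.
REQUEST_HEADERS = {"Connection": "close"}
```

The context can be passed to `http.client.HTTPSConnection(host, context=build_tls_context())` together with the headers above.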
Best Practices#
- Use HTTPS: Always prefer HTTPS over HTTP for production monitoring
- Valid certificates: Ensure monitored endpoints have valid, trusted certificates
- Secure URLs: Avoid embedding sensitive data in monitored URLs
- Authentication: Use dedicated health check endpoints that don’t require authentication
- Rate limiting: Be mindful of target server rate limits
Authentication#
The Load WebApp probe:
- Requires no authentication for the probe configuration itself
- Does not support HTTP Basic Auth, Bearer tokens, or custom headers (current implementation)
- Monitors public endpoints or endpoints accessible without authentication
- For authenticated endpoints: Consider using dedicated health check endpoints
Future enhancement consideration: HTTP header support for authenticated API monitoring.
Requirements#
Network#
- Outbound HTTP/HTTPS access to monitored URLs
- DNS resolution capability
- Firewall rules allowing connections to target hosts
Agent#
- HTTP strategy enabled for metric access
- Sufficient network bandwidth for full response downloads
- Proper timeout configuration for slow endpoints
Alert Threshold Recommendations#
DNS Time Thresholds#
| Level | Threshold | Description |
|---|---|---|
| Normal | < 50ms | Healthy DNS performance |
| Warning | 50-200ms | Slow DNS, check DNS server |
| Critical | > 200ms | DNS issues, investigate immediately |
Connect Time Thresholds#
| Level | Threshold | Description |
|---|---|---|
| Normal | < 100ms | Good network latency |
| Warning | 100-500ms | High latency, check network |
| Critical | > 500ms | Network issues or distant server |
TLS Time Thresholds#
| Level | Threshold | Description |
|---|---|---|
| Normal | < 200ms | Normal TLS handshake |
| Warning | 200-500ms | Slow handshake, check cert chain |
| Critical | > 500ms | Certificate issues, OCSP problems |
TTFB Thresholds#
| Level | Threshold | Description |
|---|---|---|
| Normal | < 500ms | Fast backend processing |
| Warning | 500-2000ms | Slow backend, investigate |
| Critical | > 2000ms | Backend bottleneck, urgent action |
Total Time Thresholds#
| Level | Threshold | Description |
|---|---|---|
| Normal | < 1000ms | Excellent user experience |
| Warning | 1000-3000ms | Acceptable but monitor |
| Critical | > 3000ms | Poor user experience |
Note: Thresholds should be adjusted based on application requirements and user expectations.
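The five threshold tables can be collapsed into one lookup for use in alerting scripts. The Python sketch below is one possible encoding; `THRESHOLDS_MS` and `classify` are hypothetical names, and a value exactly on the critical boundary (for example a DNS time of 200ms) is treated as a warning, which is an assumption the tables leave open.

```python
# (warning threshold, critical threshold) in milliseconds, from the tables above.
THRESHOLDS_MS = {
    "dnstime": (50, 200),
    "connecttime": (100, 500),
    "tlstime": (200, 500),
    "ttfb": (500, 2000),
    "total_time": (1000, 3000),
}


def classify(metric, value_ms, thresholds=THRESHOLDS_MS):
    """Return 'normal', 'warning', or 'critical' for a metric value."""
    warning, critical = thresholds[metric]
    if value_ms < warning:
        return "normal"
    if value_ms <= critical:
        return "warning"
    return "critical"
```

Passing a custom `thresholds` dictionary is the simplest way to apply application-specific limits without changing the function.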