**License: Pro** - Requires a Pro or Enterprise license.

Overview

The Redfish probe monitors server and storage hardware through the DMTF Redfish API, providing comprehensive health, capacity, and performance metrics. It supports a wide range of hardware platforms including Dell iDRAC, Dell PowerVault ME series, HPE iLO, Lenovo XClarity, and Cisco UCS.

The probe automatically detects the hardware vendor via the Redfish API and adapts its metric collection accordingly. No vendor-specific configuration is required.
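This page does not document how the agent performs the detection. As a hypothetical illustration only, a client could read the Vendor property of the Redfish service root (GET /redfish/v1/; this property exists in newer ServiceRoot schema versions) and otherwise fall back to the vendor key under Oem. The function name and the Oem key mapping are assumptions, not the agent's code.

```python
from typing import Optional

# Hypothetical mapping from vendor keys found under "Oem" to vendor names.
# HPE iLO is assumed to use "Hpe" (or "Hp" on older firmware).
OEM_KEYS = {
    "Dell": "Dell",
    "Hpe": "HPE",
    "Hp": "HPE",
    "Lenovo": "Lenovo",
    "Cisco": "Cisco",
}


def detect_vendor(service_root: dict) -> Optional[str]:
    """Guess the hardware vendor from a parsed /redfish/v1/ service root."""
    # Newer Redfish services report the vendor directly.
    vendor = service_root.get("Vendor")
    if vendor:
        return vendor
    # Otherwise look for a known vendor-specific key under "Oem".
    for key in service_root.get("Oem") or {}:
        if key in OEM_KEYS:
            return OEM_KEYS[key]
    # Generic Redfish implementation: no vendor-specific handling needed.
    return None
```

A None result corresponds to the "any hardware implementing the DMTF Redfish standard" case, where only standard schema properties are read.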

Collected Data:

  • Controller, drive, pool, and volume health status
  • Storage capacity (total, allocated, used, free) for pools and volumes
  • I/O performance metrics (reads, writes, latency, throughput)
  • Hardware event logs (critical, warning, informational entries)
  • Drive failure predictions and hotspare status
  • Encryption status for volumes and drives

Supported Hardware:

  • Dell iDRAC (PowerEdge servers)
  • Dell PowerVault ME series (ME5024, etc.)
  • HPE iLO (ProLiant servers)
  • Lenovo XClarity (ThinkSystem servers)
  • Cisco UCS (Unified Computing System)
  • Any hardware implementing the DMTF Redfish standard

Quick Start

Basic Configuration

```yaml
probes:
  - name: "Hardware Server01"
    type: redfish
    params:
      base_url: "https://idrac-server01.company.com"
      username: "monitoring"
      password: "SecurePassword123"
      interval: 300
      tls:
        verify_ssl: false
```

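Important notes:

  • base_url: The Redfish API endpoint (the iDRAC, iLO, or BMC management address)
  • interval: Polling interval in seconds; 300 seconds (5 minutes) is recommended for hardware monitoring
  • verify_ssl: Set to false for the self-signed certificates commonly used on BMC interfaces. This disables certificate verification, so keep it true when the BMC presents a certificate signed by a trusted CA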
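To show what base_url, username, password, and verify_ssl mean at the HTTP level, the following standard-library sketch builds the request a client would send to the Redfish service root. It is an illustration, not the agent's implementation. The helper name is hypothetical, and it assumes HTTP Basic authentication, which is one of the schemes Redfish services accept.

```python
import base64
import ssl
import urllib.request
from typing import Tuple
from urllib.parse import urljoin


def build_service_root_request(
    base_url: str, username: str, password: str, verify_ssl: bool = True
) -> Tuple[urllib.request.Request, ssl.SSLContext]:
    """Return a request for <base_url>/redfish/v1/ plus a matching TLS context."""
    # The Redfish service root lives at the fixed path /redfish/v1/.
    url = urljoin(base_url, "/redfish/v1/")
    # HTTP Basic credentials (hypothetical choice for this sketch).
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    context = ssl.create_default_context()
    if not verify_ssl:
        # Equivalent of verify_ssl: false, which accepts self-signed BMC certificates.
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return request, context
```

Passing the pair to urllib.request.urlopen(request, context=context, timeout=10) performs the same GET as the curl -k reachability check in the troubleshooting section.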

Multiple Servers

Monitor multiple hardware targets with separate probe instances:

```yaml
probes:
  - name: "Dell Storage ME5024"
    type: redfish
    params:
      base_url: "https://dell-me5024.company.com"
      username: "admin"
      password: "StoragePassword"
      interval: 300
      tls:
        verify_ssl: false

  - name: "HPE ProLiant DL380"
    type: redfish
    params:
      base_url: "https://ilo-dl380.company.com"
      username: "monitoring"
      password: "ServerPassword"
      interval: 300
      tls:
        verify_ssl: false
```

Metrics Collected

Health Metrics

Monitor the health status of hardware components. Health values use a standard scale: 0=OK, 1=Warning, 2=Critical, 3=Unknown.
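Redfish resources report health as a Status.Health string whose defined values are OK, Warning, and Critical. This sketch assumes the probe derives the numeric scale from that property and reports Unknown (3) when the value is missing or null. The names are hypothetical.

```python
# Numeric scale used by the health metrics: 0=OK, 1=Warning, 2=Critical, 3=Unknown.
HEALTH_SCALE = {"OK": 0, "Warning": 1, "Critical": 2}
UNKNOWN = 3


def health_value(resource: dict) -> int:
    """Map a Redfish resource's Status.Health string to the 0-3 scale."""
    # Status or Status.Health may be absent or null on some implementations.
    health = (resource.get("Status") or {}).get("Health")
    return HEALTH_SCALE.get(health, UNKNOWN)
```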

Storage Controllers

| Metric Name | Description | Type |
|---|---|---|
| hardware.storage.controller.health | Controller health status | Gauge |
| hardware.storage.redundancy.health | Controller redundancy health | Gauge |
| hardware.storage.redundancy.controllers_active | Number of active controllers | Gauge |
| hardware.storage.redundancy.controllers_min | Minimum controllers required | Gauge |
| hardware.storage.redundancy.controllers_max | Maximum controllers supported | Gauge |

Drives

| Metric Name | Description | Type |
|---|---|---|
| hardware.storage.drive.health | Drive health status | Gauge |
| hardware.storage.drive.failure_predicted | Failure prediction (1=failure predicted) | Gauge |
| hardware.storage.drive.hotspare | Hotspare status (1=active hotspare) | Gauge |
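The Redfish Drive schema exposes FailurePredicted (a boolean) and HotspareType (None, Global, Dedicated, and so on). Assuming the two gauges are derived from those properties, and reading "active hotspare" as "configured as a hotspare", a hypothetical mapping looks like this:

```python
def drive_flags(drive: dict) -> dict:
    """Derive failure_predicted and hotspare gauges from a Redfish Drive resource."""
    # FailurePredicted is a boolean; anything other than True counts as 0.
    failure_predicted = 1 if drive.get("FailurePredicted") is True else 0
    # HotspareType is the string "None" for ordinary drives; any other value
    # (for example "Global" or "Dedicated") marks the drive as a hotspare.
    hotspare_type = drive.get("HotspareType") or "None"
    hotspare = 0 if hotspare_type == "None" else 1
    return {
        "hardware.storage.drive.failure_predicted": failure_predicted,
        "hardware.storage.drive.hotspare": hotspare,
    }
```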

Storage Pools

| Metric Name | Description | Type |
|---|---|---|
| hardware.storage.pool.health | Pool health status | Gauge |

Volumes

| Metric Name | Description | Type |
|---|---|---|
| hardware.storage.volume.health | Volume health status | Gauge |
| hardware.storage.volume.encrypted | Encryption status (1=encrypted) | Gauge |

Events and Logs

| Metric Name | Description | Type |
|---|---|---|
| hardware.logs.entries.total | Total log entries | Gauge |
| hardware.logs.entries.critical | Critical log entries | Gauge |
| hardware.logs.entries.warning | Warning log entries | Gauge |
| hardware.logs.entries.info | Informational log entries | Gauge |
| hardware.logs.entries.last_24h | Events in the last 24 hours | Gauge |
| hardware.logs.entries.last_7d | Events in the last 7 days | Gauge |
| hardware.eventservice.health | Event service health status | Gauge |
| hardware.eventservice.subscriptions | Number of event subscriptions | Gauge |
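The log-entry gauges can be understood as counts over the LogEntry records of a Redfish log service, where each record has a Severity (OK, Warning, Critical) and a Created timestamp. The sketch below is a hypothetical reconstruction. It counts OK entries as informational and assumes Created always includes a UTC offset or a trailing Z.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def summarize_log(entries: list, now: Optional[datetime] = None) -> dict:
    """Count Redfish LogEntry records by severity and by age."""
    now = now or datetime.now(timezone.utc)
    counts = {"total": 0, "critical": 0, "warning": 0, "info": 0,
              "last_24h": 0, "last_7d": 0}
    for entry in entries:
        counts["total"] += 1
        # Anything that is not Warning or Critical is treated as informational.
        severity = entry.get("Severity")
        if severity == "Critical":
            counts["critical"] += 1
        elif severity == "Warning":
            counts["warning"] += 1
        else:
            counts["info"] += 1
        created = entry.get("Created")
        if not created:
            continue
        # Created is ISO 8601, e.g. "2024-05-08T06:00:00Z"; normalise the "Z".
        age = now - datetime.fromisoformat(created.replace("Z", "+00:00"))
        if age <= timedelta(days=7):
            counts["last_7d"] += 1  # the 7-day window includes the last 24 hours
            if age <= timedelta(hours=24):
                counts["last_24h"] += 1
    return counts
```

Each key corresponds to the hardware.logs.entries.* metric of the same suffix.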

Capacity Metrics

Storage Pools

| Metric Name | Description | Unit |
|---|---|---|
| hardware.storage.pool.capacity.total | Total pool capacity | bytes |
| hardware.storage.pool.capacity.allocated | Allocated space in pool | bytes |
| hardware.storage.pool.capacity.allocated_percent | Allocated space percentage | % |
| hardware.storage.pool.capacity.used | Actually consumed space | bytes |
| hardware.storage.pool.capacity.used_percent | Consumed space percentage | % |
| hardware.storage.pool.capacity.free | Free space | bytes |
| hardware.storage.pool.capacity.free_percent | Free space percentage | % |
| hardware.storage.pool.capacity.volumes | Space allocated to volumes | bytes |
| hardware.storage.pool.capacity.snapshots | Space allocated to snapshots | bytes |
| hardware.storage.pool.capacity.committed | Total committed space | bytes |
| hardware.storage.pool.capacity.overcommit | Over-allocated space (thin provisioning) | bytes |
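The table does not spell out how the derived figures relate to the raw byte counts. The sketch below assumes three things: percentages are relative to total capacity, free is total minus allocated, and overcommit is committed space in excess of total. The function name and those formulas are hypothetical.

```python
def pool_capacity(total: int, allocated: int, used: int, committed: int) -> dict:
    """Derive percentage, free and overcommit figures from raw byte counts."""

    def pct(part: int) -> float:
        # Guard against division by zero for pools that report no capacity.
        return round(100.0 * part / total, 2) if total else 0.0

    # Assumption: "free" means space not yet allocated to volumes or snapshots.
    free = max(total - allocated, 0)
    return {
        "allocated_percent": pct(allocated),
        "used_percent": pct(used),
        "free": free,
        "free_percent": pct(free),
        # Thin provisioning: committed space beyond the physical capacity.
        "overcommit": max(committed - total, 0),
    }
```

For example, a 1000-byte pool with 600 bytes allocated, 250 used and 1200 committed yields 60% allocated, 25% used, 400 bytes (40%) free and 200 bytes of overcommit.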

Volumes

| Metric Name | Description | Unit |
|---|---|---|
| hardware.storage.volume.capacity.total | Total volume capacity | bytes |
| hardware.storage.volume.capacity.allocated | Allocated space | bytes |
| hardware.storage.volume.capacity.allocated_percent | Allocated space percentage | % |
| hardware.storage.volume.capacity.used | Actually used space | bytes |
| hardware.storage.volume.capacity.used_percent | Used space percentage | % |
| hardware.storage.volume.capacity.free | Free space | bytes |
| hardware.storage.volume.capacity.free_percent | Free space percentage | % |

Drives

| Metric Name | Description | Unit |
|---|---|---|
| hardware.storage.drive.capacity.total | Total drive capacity | bytes |

Performance Metrics

Volume I/O

| Metric Name | Description | Unit |
|---|---|---|
| hardware.storage.volume.io.total_ops | Total I/O operations | count |
| hardware.storage.volume.io.reads | Read operations | count |
| hardware.storage.volume.io.writes | Write operations | count |
| hardware.storage.volume.io.total_bytes | Total data transferred | bytes |
| hardware.storage.volume.io.read.bytes | Data read | bytes |
| hardware.storage.volume.io.write.bytes | Data written | bytes |
| hardware.storage.volume.io.read.latency | Read latency | ms |
| hardware.storage.volume.io.write.latency | Write latency | ms |
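The table gives count and bytes as units but does not say whether the values are cumulative. If they are cumulative counters (an assumption), two consecutive samples taken one polling interval apart (300 seconds in the Quick Start) give a per-second rate. This hypothetical helper also skips samples where the counter went backwards, which usually indicates a reset.

```python
from typing import Optional


def per_second_rate(previous: int, current: int, interval_s: float) -> Optional[float]:
    """Turn two samples of a cumulative counter into a per-second rate."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = current - previous
    # A smaller current value usually means the counter was reset
    # (controller restart, firmware update); ignore that sample.
    if delta < 0:
        return None
    return delta / interval_s
```

For example, io.reads going from 1000 to 4000 over a 300-second interval is 10 reads per second.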

Pool I/O

| Metric Name | Description | Unit |
|---|---|---|
| hardware.storage.pool.io.reads | Read operations | count |
| hardware.storage.pool.io.writes | Write operations | count |
| hardware.storage.pool.io.read.bytes | Data read | bytes |
| hardware.storage.pool.io.write.bytes | Data written | bytes |

Operations Metrics

| Metric Name | Description | Type / Unit |
|---|---|---|
| hardware.storage.drive.has_operations | Operations in progress (1=yes) | Gauge |
| hardware.storage.drive.operation.progress | Operation progress | % |

Tags

All metrics include contextual tags for filtering and grouping.
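As an illustration of filtering and grouping, the hypothetical sketch below keeps one metric and reports its worst value per tag value. The (name, value, tags) sample tuples stand in for whatever your monitoring backend returns; they are not an agent API.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical sample shape: (metric name, value, tags).
Sample = Tuple[str, float, Dict[str, str]]


def worst_by_tag(samples: Iterable[Sample], metric: str, tag: str) -> Dict[str, float]:
    """Return the highest value of one metric for each value of one tag."""
    worst: Dict[str, float] = {}
    for name, value, tags in samples:
        # Filter: keep only the requested metric, and only samples carrying the tag.
        if name != metric or tag not in tags:
            continue
        key = tags[tag]
        # Group: on the 0-3 health scale a higher value is worse.
        worst[key] = max(value, worst.get(key, value))
    return worst
```

On the health scale, Unknown (3) sorts above Critical (2). If you want unknown states to be reported separately instead of as "worst", handle them before grouping.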

Controller Tags

| Tag | Description | Example |
|---|---|---|
| controller_id | Controller identifier | A |
| controller_name | Controller name | Controller A |
| controller | Controller letter | A, B |
| controller_type | Controller type | storage |
| host | Host system name | me5024-prod |
| manufacturer | Controller manufacturer | Dell |
| model | Controller model | PERC H740P |
| serial_number | Controller serial number | ABC123 |

Drive Tags

| Tag | Description | Example |
|---|---|---|
| drive_id | Drive identifier | Disk.Bay.0:Enclosure.Internal.0-1 |
| drive_name | Drive name | Disk 0 |
| model | Drive model | ST1200MM0009 |
| drive_manufacturer | Drive manufacturer | Seagate |
| serial_number | Drive serial number | WFK12345 |
| media_type | Media type | SSD, HDD |
| protocol | Communication protocol | SAS, SATA |
| hotspare_type | Hotspare type | Global, Dedicated |
| encryption_ability | Encryption capability | SelfEncryptingDrive |
| encryption_status | Encryption status | Unlocked |
| service_label | Service label | Bay 0 |
| location_type | Location type | Slot |
| location_ordinal | Location ordinal value | 0 |
| operation_name | Current operation name | Rebuild |

Pool Tags

| Tag | Description | Example |
|---|---|---|
| pool_id | Pool identifier | A |
| pool_name | Pool name | Pool A |
| description | Pool description | Virtual storage pool |
| supported_raid_types | Supported RAID types | RAID1, RAID5, RAID6 |
| max_block_size_bytes | Maximum block size (bytes) | 512 |
| thin_provisioned | Thin provisioning indicator | true |

Volume Tags

| Tag | Description | Example |
|---|---|---|
| volume_id | Volume identifier | VD1 |
| volume_name | Volume name | Production-Vol1 |
| pool_id | Associated pool identifier | A |
| raid_type | RAID type | RAID5 |
| write_cache_policy | Write cache policy | WriteBack |
| block_size_bytes | Block size (bytes) | 512 |
| access_capabilities | Access capabilities | Read, Write |
| encryption_type | Encryption type | NativeDriveEncryption |

Event and Log Tags

| Tag | Description | Example |
|---|---|---|
| host | Host system name | me5024-prod |
| manager_id | Manager identifier | BMC |
| manager_name | Manager name | iDRAC |
| model | Manager model | iDRAC9 |
| log_service_id | Log service identifier | Sel |
| log_service_name | Log service name | System Event Log |

Recommended Alerting

Essential Health Alerts

  • Monitor hardware.storage.controller.health for controller failures
  • Monitor hardware.storage.redundancy.health for redundancy issues
  • Monitor hardware.storage.drive.failure_predicted for drives with predicted failures
  • Monitor hardware.storage.drive.has_operations for ongoing maintenance operations
  • Monitor hardware.logs.entries.critical for critical system events
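The rules above can be expressed against the documented 0-3 health scale. In this hypothetical sketch, the severity assigned to each rule is a judgment call, and the input is simplified to one value per metric. Because hardware.logs.entries.critical counts every critical entry still in the log, the sketch alerts on an increase since the previous poll rather than on any non-zero value.

```python
# Alert levels for the documented health scale: 0=OK, 1=Warning, 2=Critical, 3=Unknown.
LEVELS = {0: "ok", 1: "warning", 2: "critical", 3: "unknown"}


def evaluate(metrics: dict, previous_critical: int = 0) -> dict:
    """Map a snapshot of metric values to alert levels (hypothetical rules)."""
    alerts = {}
    for name in ("hardware.storage.controller.health",
                 "hardware.storage.redundancy.health"):
        if name in metrics:
            alerts[name] = LEVELS.get(int(metrics[name]), "unknown")
    if metrics.get("hardware.storage.drive.failure_predicted") == 1:
        # A predicted failure means "replace soon", so it is treated as a warning here.
        alerts["hardware.storage.drive.failure_predicted"] = "warning"
    if metrics.get("hardware.storage.drive.has_operations") == 1:
        # A rebuild or similar operation is running: informational, not a fault.
        alerts["hardware.storage.drive.has_operations"] = "info"
    # Alert only when new critical entries appeared since the last poll.
    if metrics.get("hardware.logs.entries.critical", 0) > previous_critical:
        alerts["hardware.logs.entries.critical"] = "critical"
    return alerts
```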

Capacity Alerts

  • Monitor hardware.storage.pool.capacity.free_percent for available space
  • Monitor hardware.storage.volume.capacity.used_percent for volume utilization
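The page does not recommend specific thresholds. The sketch below uses hypothetical defaults (warning below 20% free, critical below 10% free) and adds a simple linear projection of time until a pool fills. The projection is based on two samples of hardware.storage.pool.capacity.free taken a known number of seconds apart.

```python
from typing import Optional


def capacity_level(free_percent: float, warn_below: float = 20.0,
                   crit_below: float = 10.0) -> str:
    """Classify free space; the default thresholds are hypothetical."""
    if crit_below >= warn_below:
        raise ValueError("crit_below must be lower than warn_below")
    if free_percent < crit_below:
        return "critical"
    if free_percent < warn_below:
        return "warning"
    return "ok"


def hours_until_full(free_then: int, free_now: int, elapsed_s: float) -> Optional[float]:
    """Linear projection of when free space runs out, from two free-byte samples."""
    consumed = free_then - free_now
    # No growth (or space was freed): nothing to project.
    if consumed <= 0 or elapsed_s <= 0:
        return None
    bytes_per_second = consumed / elapsed_s
    return free_now / bytes_per_second / 3600
```

A linear projection is deliberately simple. Bursty workloads or snapshot schedules can make it noisy, so base it on samples spread over hours rather than a single polling interval.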

Performance Alerts

  • Monitor hardware.storage.volume.io.total_ops for general I/O activity
  • Monitor hardware.storage.volume.io.read.latency and hardware.storage.volume.io.write.latency for performance issues

Event Monitoring

  • Monitor hardware.logs.entries.critical and hardware.logs.entries.warning for system issues
  • Use hardware.logs.entries.last_24h to track recent system activity
  • Compare trends between hardware.logs.entries.last_24h and hardware.logs.entries.last_7d to identify event spikes
  • Use hardware.eventservice.health to verify the event service is operating correctly
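A simple way to compare hardware.logs.entries.last_24h with hardware.logs.entries.last_7d is to check the last day against the daily average of the six days before it. This assumes the 7-day window includes the last 24 hours. The function name and the default factor of 3 are hypothetical.

```python
def event_spike(last_24h: int, last_7d: int, factor: float = 3.0) -> bool:
    """Flag a spike when the last day is well above the preceding days' average."""
    # last_7d includes the last 24 hours, so the baseline uses the other 6 days.
    baseline = max(last_7d - last_24h, 0) / 6
    if baseline == 0:
        # No earlier events to compare against: any recent event is notable.
        return last_24h > 0
    return last_24h > factor * baseline
```

For example, 30 events in the last 24 hours against 42 in the last 7 days gives a baseline of 2 per day, which is a spike. Three events against 18 in 7 days (a baseline of 2.5 per day) is not.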

Troubleshooting

Connection Issues

Symptom: Cannot connect to the Redfish endpoint

Diagnosis:

  1. Verify the BMC/iDRAC/iLO management interface is reachable from the agent host:
    curl -k https://idrac-server01.company.com/redfish/v1/
  2. Verify the management interface is powered on and network-connected
  3. Check for firewall rules blocking HTTPS (port 443) between the agent and the BMC

TLS/SSL Errors

Symptom: TLS handshake failures or certificate errors

Resolution: BMC management interfaces typically use self-signed certificates. Set verify_ssl: false in the probe configuration:

```yaml
params:
  tls:
    verify_ssl: false
```

If your environment uses properly signed certificates, ensure the CA chain is trusted by the system running the agent.

Authentication Failures

Symptom: 401 Unauthorized or login failures

Diagnosis:

  1. Verify credentials by logging in to the BMC web interface manually
  2. Check that the account is not locked out due to failed login attempts
  3. Verify the account has sufficient privileges for Redfish API access (read-only access is sufficient)
  4. Some BMCs limit the number of concurrent sessions; make sure this limit has not already been reached by other clients
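Session limits matter for clients that use Redfish session authentication. In that scheme, a client POSTs UserName and Password to /redfish/v1/SessionService/Sessions, sends the returned X-Auth-Token on later requests, and should DELETE the session URI from the Location header when done. Clients that never log out can exhaust the BMC's session slots. This page does not say whether the agent uses sessions or Basic authentication. The helpers below are hypothetical and only build the requests.

```python
import json
import urllib.request
from urllib.parse import urljoin


def build_session_login(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a Redfish session login request (POST to the Sessions collection)."""
    url = urljoin(base_url, "/redfish/v1/SessionService/Sessions")
    body = json.dumps({"UserName": username, "Password": password}).encode()
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def build_session_logout(base_url: str, session_location: str, token: str) -> urllib.request.Request:
    """Build the DELETE that ends a session and frees its slot on the BMC."""
    # session_location is the Location header returned by the login response.
    url = urljoin(base_url, session_location)
    return urllib.request.Request(url, method="DELETE", headers={"X-Auth-Token": token})
```

When testing credentials manually with these requests, always send the logout request, even after an error, so the test itself does not consume a session slot.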

Dell ME Capacity Shows Zero

Symptom: Pool or volume capacity metrics return 0

Explanation: Dell PowerVault ME series systems may return CapacityBytes=0 in the standard Redfish response. The agent automatically detects this and uses Capacity.Data.AllocatedBytes as the effective capacity. Ensure you are running a recent version of the agent for this workaround to be active.
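The fallback described in the explanation can be sketched as follows. The field layout (CapacityBytes at the top level, AllocatedBytes under Capacity.Data) follows the property names given above. The function name is hypothetical, and this is not the agent's code.

```python
def effective_capacity(resource: dict) -> int:
    """Capacity in bytes, falling back to Capacity.Data.AllocatedBytes when zero."""
    capacity = resource.get("CapacityBytes") or 0
    if capacity > 0:
        return capacity
    # Dell PowerVault ME may report CapacityBytes=0; use the allocated bytes instead.
    data = (resource.get("Capacity") or {}).get("Data") or {}
    return data.get("AllocatedBytes") or 0
```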

Debug Logging

Enable debug logging for the Redfish probe:

```shell
# Runtime log level change
curl -X POST http://localhost:8080/api/{key}/debug/logs \
  -H "Content-Type: application/json" \
  -d '{"module_levels": [{"module": "probe.redfish", "level": "debug"}]}'

# Or start agent with verbose logging
./senhub-agent run --authentication-key KEY --verbose --debug-modules probe.redfish
```
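If you script the runtime log-level change instead of using curl, the same request can be built with the Python standard library. The URL pattern and JSON body are copied from the curl command. The helper name and its defaults are hypothetical, and {key} remains whatever value your agent expects in that position.

```python
import json
import urllib.request


def build_debug_request(key: str, module: str = "probe.redfish", level: str = "debug",
                        base: str = "http://localhost:8080") -> urllib.request.Request:
    """Build the same POST as the curl runtime log-level command."""
    payload = {"module_levels": [{"module": module, "level": level}]}
    return urllib.request.Request(
        f"{base}/api/{key}/debug/logs",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(build_debug_request(key), timeout=10) has the same effect as the curl call.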

License Requirements

The Redfish probe requires a Pro or Enterprise license.

| Tier | Redfish Probe |
|---|---|
| Free | Not available |
| Pro | Included |
| Enterprise | Included |

SenHub Agent 0.1.80-beta