How to Set Up TCP Connectivity Checks in Datadog
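Overview
Introduction
- This article explains how to configure Datadog's TCP connectivity check (TCP Check).
- Datadog is a system monitoring tool delivered as SaaS. I use it on my current project to monitor servers and applications. For an overview of Datadog, see Hattori-san's article published on logmi Tech.
TCP connectivity checks with Datadog
- Datadog's TCP Check monitors whether a TCP connection to a given host and port succeeds, and how long the connection takes (response time).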
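To make concrete what a TCP check measures, here is a minimal sketch in Python. It is not the Agent's implementation; the function name `tcp_probe` and its return shape are my own assumptions for illustration. It attempts a TCP connection and reports a 1/0 connectivity flag together with the elapsed time.

```python
import socket
import time


def tcp_probe(host, port, timeout=10.0):
    """Try to open a TCP connection.

    Returns (can_connect, response_time_seconds). The response time is
    None when the connection could not be established.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return 1, time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return 0, None
```

For example, `tcp_probe("AAA.example.com", 443, timeout=3.0)` would probe the first sample target used later in this article. The TCP Check runs this kind of probe on a schedule and sends the results to Datadog.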
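Create the TCP connectivity monitor from the Network monitor type
- To create a monitor for the TCP connectivity check (TCP Check), choose "New Monitor" and then "monitor type: Network".
- However, if no Agent has been configured with an HTTP or TCP check yet, the following message is displayed.
- "Please ensure that HTTP and/or TCP checks have been configured in the agent."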
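Decide where the TCP connectivity check is sent from
- First, choose a probe server that will periodically send requests to the monitored server.
- The probe server must have the Datadog Agent installed, and it should be a different server from the one that provides the monitored TCP service.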
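Configuring the Datadog Agent
- On Linux, place the Datadog Agent TCP Check configuration at the following path.
- /etc/datadog-agent/conf.d/tcp_check.d/conf.yaml
- On Windows, place the Datadog Agent TCP Check configuration at the following path.
- C:\ProgramData\Datadog\conf.d\tcp_check.d\conf.yaml
- A sample conf.yaml follows.
- The check is not limited to web services (ports 80 and 443); you can also check connectivity to essential network services such as sshd (port 22) and RDP (port 3389).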
init_config:
instances:
  - name: TCP-443_check_AAA.example.com
    host: AAA.example.com
    port: 443
    collect_response_time: true
  - name: TCP-80_check_BBB.example.com
    host: BBB.example.com
    port: 80
    collect_response_time: true
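When the number of targets grows, writing each instance by hand becomes error-prone. The following is a small sketch that renders the same conf.yaml layout from a list of (host, port) pairs. The function name `render_tcp_check_config` is hypothetical, and the `TCP-<port>_check_<host>` naming simply mirrors the sample above.

```python
def render_tcp_check_config(targets):
    """Render a tcp_check conf.yaml body for a list of (host, port) pairs."""
    lines = ["init_config:", "instances:"]
    for host, port in targets:
        lines += [
            # Instance name follows the convention used in the sample above.
            f"  - name: TCP-{port}_check_{host}",
            f"    host: {host}",
            f"    port: {port}",
            "    collect_response_time: true",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_tcp_check_config([("AAA.example.com", 443), ("BBB.example.com", 80)]))
```

The output can be saved as conf.yaml in the tcp_check.d directory described above. Adding sshd or RDP targets is then just another pair such as `("CCC.example.com", 22)`, where the host name is a placeholder.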
Restarting the Datadog Agent
- On Linux, place conf.yaml and restart the Datadog Agent with the following commands.
$ cd /etc/datadog-agent/conf.d/tcp_check.d
$ sudo vi conf.yaml
$ sudo systemctl restart datadog-agent.service
$ sudo datadog-agent status
- On Windows, place conf.yaml and restart the Datadog Agent with the following steps and commands.
← Place conf.yaml in "C:\ProgramData\Datadog\conf.d\tcp_check.d"
← Start PowerShell as an administrator
PS C:\Users\Administrator> cd "C:\Program Files\Datadog\Datadog Agent\embedded"
PS C:\Program Files\Datadog\Datadog Agent\embedded> ./agent.exe restart-service
PS C:\Program Files\Datadog\Datadog Agent\embedded> ./agent.exe status
- The following is sample output from agent.exe status. (The output of sudo datadog-agent status on Linux is similar.)
- The TCP Check results are reported under the "tcp_check" section.
===============
Agent (v6.14.1)
===============
Status date: 2019-10-28 00:36:34.429243 JST
Agent start: 2019-10-28 00:27:48.177974 JST
Pid: 2200
Go Version: go1.12.9
Python Version: 2.7.16
Check Runners: 4
Log Level: info
Paths
=====
Config File: C:\ProgramData\Datadog\datadog.yaml
conf.d: C:\ProgramData\Datadog\conf.d
checks.d: C:\ProgramData\Datadog\checks.d
Clocks
======
NTP offset: -3.887ms
System UTC time: 2019-10-28 00:36:34.429243 JST
Host Info
=========
bootTime: 2019-10-07 16:58:43.000000 JST
os: windows
platform: Windows Server 2019 Datacenter
platformFamily: Windows Server 2019 Datacenter
platformVersion: 10.0 Build 17763
procs: 109
uptime: 487h29m4s
Hostnames
=========
ec2-hostname: ip-XX-XX-XX-XX.ap-northeast-1.compute.internal
hostname: hostname1
instance-id: i-0123456789abcdefg
socket-fqdn: hostname1
socket-hostname: hostname1
hostname provider: os
unused hostname providers:
aws: not retrieving hostname from AWS: the host is not an ECS instance, and other providers already retrieve non-default hostnames
configuration/environment: hostname is empty
gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname
=========
Collector
=========
Running Checks
==============
cpu
---
Instance ID: cpu [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\cpu.d\conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 7, Total: 239
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
disk (2.5.0)
------------
Instance ID: disk:e5dffb8bef24336f [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\disk.d\conf.yaml.default
Total Runs: 34
Metric Samples: Last Run: 6, Total: 204
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 2ms
file_handle
-----------
Instance ID: file_handle [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\file_handle.d\conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 1, Total: 35
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 1ms
http_check (4.2.0)
------------------
Instance ID: http_check:AAA.example.com:18c391eda9aa3f3d [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\http_check.d\conf.yaml
Total Runs: 2
Metric Samples: Last Run: 5, Total: 10
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 2, Total: 4
Average Execution Time : 419ms
Instance ID: http_check:BBB.example.com:86495e385cafb9dd [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\http_check.d\conf.yaml
Total Runs: 2
Metric Samples: Last Run: 5, Total: 10
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 2, Total: 4
Average Execution Time : 52ms
io
--
Instance ID: io [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\io.d\conf.yaml.default
Total Runs: 34
Metric Samples: Last Run: 5, Total: 170
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
memory
------
Instance ID: memory [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\memory.d\conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 17, Total: 595
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
network (1.11.4)
----------------
Instance ID: network:e0204ad63d43c949 [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\network.d\conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 12, Total: 420
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 1ms
ntp
---
Instance ID: ntp:d884b5186b651429 [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\ntp.d\conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 1, Total: 35
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 35
Average Execution Time : 0s
tcp_check (2.3.0)
-----------------
Instance ID: tcp_check:TCP-443_check_AAA.example.com:801b3307e5549a11 [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\tcp_check.d\conf.yaml
Total Runs: 35
Metric Samples: Last Run: 2, Total: 70
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 35
Average Execution Time : 8ms
Instance ID: tcp_check:TCP-80_check_BBB.example.com:cb01989ad30c1161 [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\tcp_check.d\conf.yaml
Total Runs: 35
Metric Samples: Last Run: 2, Total: 70
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 1, Total: 35
Average Execution Time : 19ms
uptime
------
Instance ID: uptime [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\uptime.d\conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 1, Total: 35
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 0s
windows_service (2.1.0)
-----------------------
Instance ID: windows_service:89b0f21a4c4f699f [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\windows_service.d\conf.yaml
Total Runs: 34
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 5, Total: 170
Average Execution Time : 50ms
winproc
-------
Instance ID: winproc [OK]
Configuration Source: file:C:\ProgramData\Datadog\conf.d\winproc.d\conf.yaml.default
Total Runs: 35
Metric Samples: Last Run: 2, Total: 70
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 1ms
========
JMXFetch
========
Initialized checks
==================
no checks
Failed checks
=============
no checks
=========
Forwarder
=========
Transactions
============
CheckRunsV1: 35
Dropped: 0
DroppedOnInput: 0
Events: 0
HostMetadata: 0
IntakeV1: 2
Metadata: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 72
TimeseriesV1: 35
API Keys status
===============
API key ending with 123ab: API Key valid
==========
Endpoints
==========
https://app.datadoghq.com - API Key ending with:
- 123ab
==========
Logs Agent
==========
Logs Agent is not running
=========
Aggregator
=========
Checks Metric Sample: 3,321
Dogstatsd Metric Sample: 2,360
Event: 1
Events Flushed: 1
Number Of Flushes: 35
Series Flushed: 3,393
Service Check: 959
Service Checks Flushed: 993
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 2,359
Metric Parse Errors: 0
Service Check Packets: 0
Service Check Parse Errors: 0
Udp Bytes: 152,117
Udp Packet Reading Errors: 0
Udp Packets: 2,360
Uds Bytes: 0
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 0
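Reading through the full status output by eye is tedious, so here is a rough sketch that extracts only the tcp_check instances and their status from the text. It assumes the layout shown in the sample above (a check name, a line of dashes, then "Instance ID: ... [STATUS]" lines); the function name `check_instance_statuses` is hypothetical, and the pattern tolerates terminal color codes around the status word.

```python
import re

# Matches "Instance ID: <id> [OK]", with or without ANSI color codes.
INSTANCE_RE = re.compile(
    r"Instance ID:\s+(\S+)\s+\[(?:\x1b?\[\d+m)?(\w+)(?:\x1b?\[0m)?\]"
)


def check_instance_statuses(status_text, check_name="tcp_check"):
    """Return {instance_id: status} for one check in agent status output."""
    statuses = {}
    current = None   # name of the check section being read
    previous = ""    # the line immediately before the current one
    for raw in status_text.splitlines():
        line = raw.strip()
        # A line made only of dashes underlines a check header such as
        # "tcp_check (2.3.0)"; the header's first word is the check name.
        if line and set(line) == {"-"} and previous:
            current = previous.split()[0]
        match = INSTANCE_RE.search(line)
        if match and current == check_name:
            statuses[match.group(1)] = match.group(2)
        previous = line
    return statuses
```

Passing the text captured from the status command to `check_instance_statuses` returns a dictionary keyed by instance ID, so any entry whose value is not "OK" can be flagged after a restart.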
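Create the TCP connectivity monitor
- Once the Datadog Agent is configured, create the monitor for the TCP connectivity check.
- After conf.yaml is in place on the Agent, "tcp" becomes selectable. Next, select the endpoint (URL) that you configured in conf.yaml. A sample of the monitor creation screen follows.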
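This article creates the monitor through the UI. If you would rather keep the definition as code, the sketch below builds a request body of the kind accepted by Datadog's monitor creation API. Please treat it as an assumption-laden draft: the query string format, the "instance" tag used as a filter, the thresholds, and the `@ops-team` notification handle are illustrative and should be checked against the Monitors API documentation and the tags visible in your own account. The code only builds and prints JSON; it does not call Datadog.

```python
import json


def build_tcp_monitor(instance_name, notify="@ops-team"):
    """Build a request body for a service check monitor on tcp.can_connect."""
    # Assumption: service check query syntax and the "instance" tag name.
    query = (
        f'"tcp.can_connect".over("instance:{instance_name}")'
        '.by("host","instance","url").last(3).count_by_status()'
    )
    return {
        "name": f"TCP connectivity: {instance_name}",
        "type": "service check",
        "query": query,
        "message": f"TCP connection failed for {instance_name}. {notify}",
        # Illustrative thresholds: alert after 2 consecutive failures.
        "options": {"thresholds": {"critical": 2, "warning": 1, "ok": 1}},
    }


body = build_tcp_monitor("TCP-443_check_AAA.example.com")
print(json.dumps(body, indent=2))
```

Keeping the body in a function like this makes it easy to generate one monitor per conf.yaml instance and review the definitions before sending them.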
Reference information
Description of the network.tcp.response_time metric
- The response time of a given host and TCP port, tagged with url, e.g. 'url:192.168.1.100:22'. (Shown as second)
Description of the tcp.can_connect service check
- Returns DOWN if the Agent cannot connect to the configured host and port, otherwise UP.
Description of the network.tcp.can_connect metric
- Value of 1 if the agent can successfully establish a connection to the URL, 0 otherwise.
Other
- See the resources below.