Orchestrating Monitoring with LogicMonitor and Ansible

Previously, we announced the release of the LogicMonitor Community Ansible Module. Today, we're proud to announce that with the recent release of Ansible version 2.2, the LogicMonitor module is included in all official distributions of Ansible! To celebrate and get everyone up to speed, this blog shares several use cases and examples for working with the module.

Use Cases

Below are a few real-world use cases, with accompanying playbooks, for the LogicMonitor module for Ansible (complete information on the LogicMonitor and Ansible integration can be found here, in addition to the official module documentation, which can be found here).

Because every environment is different, you won't be able to copy and paste these playbooks directly, but they provide good examples of the different ways the LogicMonitor module can be used and the various syntax options available.
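
When you adapt these playbooks to your own environment, Ansible's built-in syntax check and dry-run modes are a quick way to validate your changes before running them against real hosts (the file name here matches the first use case below; note that tasks whose modules don't support check mode, which may include the logicmonitor module, are simply skipped during a dry run):

# Validate playbook syntax without contacting any hosts
ansible-playbook -i inventory use_case_1.yml --syntax-check

# Dry run: report what would change without actually making changes
ansible-playbook -i inventory use_case_1.yml --check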

Before we get started, here is the example inventory we'll be using for these use cases, along with a bit of information about the environment setup. The sample inventory simply contains a few mock groups and variables used by the playbooks; you don't need to update your existing inventory to use the module. The example environment setup is just a quick and easy way to pass LogicMonitor credentials into the module; please continue to use your own internal best practices for handling credentials with Ansible.

Finally, we'll give an overview of each use case and a summary of the valuable information it contains, but the essential technical details and explanations are included as comments in the playbooks themselves.

Inventory

[linux_hosts:children]
collector_hosts
application_hosts

[collector_hosts]
collector01.logicmonitor.com
collector02.logicmonitor.com

[application_hosts:children]
application_foo_hosts
application_bar_hosts

[application_foo_hosts]
foo01.app.logicmonitor.com lm_display_name=foo1
foo02.app.logicmonitor.com lm_display_name=foo2
foo03.app.logicmonitor.com lm_display_name=foo3

[application_bar_hosts]
bar01.app.logicmonitor.com
bar02.app.logicmonitor.com
bar03.app.logicmonitor.com

Environment Setup

export LM_COMPANY="AnsibleTest"
export LM_USER="ansible"
export LM_PASSWORD="mypassword"
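
If exporting plain-text credentials in the shell doesn't fit your internal practices, one common alternative is to keep the credentials in an Ansible Vault-encrypted vars file and reference those variables in tasks instead of the lookup('env', ...) calls used throughout this post. The following is only a rough sketch; the file path and variable names are illustrative, not required by the module:

# group_vars/all/vault.yml -- encrypt with: ansible-vault encrypt group_vars/all/vault.yml
lm_company: "AnsibleTest"
lm_user: "ansible"
lm_password: "mypassword"

# In a task, the credential parameters would then look like:
#   company="{{ lm_company }}"
#   user="{{ lm_user }}"
#   password="{{ lm_password }}"

Playbooks would then be run with --ask-vault-pass (or --vault-password-file) so Ansible can decrypt the file at runtime.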


Use Case 1: Provisioning an Application Stack with Monitoring

This use case demonstrates using Ansible to provision an application stack and add the newly provisioned hosts to LogicMonitor. It also applies when adding new hosts to an existing environment or application stack.

Cool points and examples:

• Installing LogicMonitor collectors
• Adding hosts to LogicMonitor monitoring
• Bonus: setting device groups and device properties at add time
• Creating LogicMonitor device groups
• Updating devices already monitored by LogicMonitor

Playbook
# This playbook provides an example use case for provisioning an application
# stack and orchestrating these hosts for monitoring within LogicMonitor. This
# includes provisioning LogicMonitor collectors, application-specific device
# groups, and monitoring application servers.
#
# NOTE: We're relying on shell environment variables for passing LogicMonitor
# credentials into the logicmonitor module. There are a variety of ways to
# achieve this goal, but for the purposes of this playbook, we're exporting the
# variables: LM_COMPANY, LM_USER, and LM_PASSWORD.
#
# Further documentation can be found here:
#   https://docs.ansible.com/ansible/logicmonitor_module.html

---
# Do some boilerplate, non-LogicMonitor orchestration tasks here. This task or
# tasks will obviously be specific to your own environment
- name: Provision hosts
  hosts: linux_hosts
  become: yes
  tasks:
    - name: Install telnet just for fun
      package:
        name=telnet
        state=present

# Install LogicMonitor collectors on designated hosts here.
# We need a collector before adding our provisioned hosts to monitoring
- name: Provision LogicMonitor Collectors
  hosts: collector_hosts
  become: yes
  tasks:
    - name: Install LogicMonitor collectors
      logicmonitor:
        target=collector
        action=add
        company="{{ lookup('env', 'LM_COMPANY') }}"
        user="{{ lookup('env', 'LM_USER') }}"
        password="{{ lookup('env', 'LM_PASSWORD') }}"

# Add all hosts into basic monitoring here. This is the baseline monitoring
#  config for all devices. We'll do app-specific customizations later.
- name: Add hosts to LogicMonitor monitoring
  hosts: linux_hosts
  become: no
  tasks:
    - name: Add all hosts into monitoring
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
        logicmonitor target=host
        action=add
        collector="collector01.logicmonitor.com"
        company="{{ lookup('env', 'LM_COMPANY') }}"
        user="{{ lookup('env', 'LM_USER') }}"
        password="{{ lookup('env', 'LM_PASSWORD') }}"
        groups="/servers/production,/test-datacenter"
        properties="{'snmp.community':'commstring','dc':'test', 'type':'prod'}"

# Create LogicMonitor device groups for different applications
#
# Note that there's some intelligence here when assigning the LogicMonitor device
# display name. In the inventory, I've assigned a host level variable
# lm_display_name for some hosts but not others. For the displayname parameter
# below, we're using a Jinja2 filter to set the displayname parameter using either
# the host variable lm_display_name or, if that variable isn't set, to default to
# using the device's hostname.
#
# Also note that, since there's only one device group in LogicMonitor per app type,
# we don't need to run this task for every host, so we've set run_once to true.
- name: Create LogicMonitor device groups for applications
  hosts: collector_hosts
  become: yes
  vars:
    app_names: ['foo', 'bar']
  tasks:
    - name: Create a host group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=hostgroup
          action=add
          displayname="{{ lm_display_name | default(inventory_hostname) }}"
          fullpath='/applications/{{ item }}'
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"
          properties="{'app.name':'{{ item }}'}"
      with_items: "{{ app_names }}"
      run_once: true

# Add 'foo' application servers to the 'foo' device group in LogicMonitor.
# This will be useful for more convenient management of these servers in the
#  portal and allow the device group properties configured above to be inherited.
- name: Add foo application hosts to foo device group
  hosts: application_foo_hosts
  become: no
  tasks:
    - name: Add foo application hosts to foo device group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
        logicmonitor target=host
          action=update
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"
          collector="collector01.logicmonitor.com"
          groups="/applications/foo"

# Add 'bar' application servers to the 'bar' device group in LogicMonitor.
# This will be useful for more convenient management of these servers in the
#  portal and allow the device group properties configured above to be inherited.
#
# Note that we're also updating the collector field, thereby moving these hosts
# to a new collector. This isn't strictly necessary, but shows an example of
# this process, and for our hypothetical situation, allows us to isolate each
# application on its own collector.
- name: Add bar application hosts to bar device group
  hosts: application_bar_hosts
  become: no
  tasks:
    - name: Add bar application hosts to bar device group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
        logicmonitor target=host
          action=update
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"
          collector="collector02.logicmonitor.com"
          groups="/applications/bar"

Running the Playbook

ansible-playbook -i inventory use_case_1.yml
Results
$ ansible-playbook -i inventory use_case_1.yml

PLAY [Provision hosts] *********************************************************

TASK [setup] *******************************************************************
ok: [collector01.logicmonitor.com]
ok: [bar01.app]
ok: [bar02.app.logicmonitor.com]
ok: [collector02.logicmonitor.com]
ok: [foo01.app.logicmonitor.com]
ok: [foo02.app.logicmonitor.com]

TASK [Install telnet just for fun] *********************************************
changed: [collector01.logicmonitor.com]
changed: [bar01.app]
changed: [bar02.app.logicmonitor.com]
changed: [foo02.app.logicmonitor.com]
changed: [foo01.app.logicmonitor.com]
changed: [collector02.logicmonitor.com]

PLAY [Provision LogicMonitor Collectors] ***************************************

TASK [setup] *******************************************************************
ok: [collector01.logicmonitor.com]
ok: [collector02.logicmonitor.com]

TASK [Install LogicMonitor collectors] *****************************************
changed: [collector01.logicmonitor.com]
changed: [collector02.logicmonitor.com]

PLAY [Add hosts to LogicMonitor monitoring] ************************************

TASK [setup] *******************************************************************
ok: [foo01.app.logicmonitor.com]
ok: [collector01.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]
ok: [collector02.logicmonitor.com]
ok: [bar01.app]
ok: [foo02.app.logicmonitor.com]

TASK [Add all hosts into monitoring] *******************************************
changed: [foo01.app.logicmonitor.com -> localhost]
changed: [foo02.app.logicmonitor.com -> localhost]
changed: [collector02.logicmonitor.com -> localhost]
changed: [collector01.logicmonitor.com -> localhost]
changed: [bar02.app.logicmonitor.com -> localhost]
changed: [bar01.app -> localhost]

PLAY [Create LogicMonitor device groups for applications] **********************

TASK [setup] *******************************************************************
ok: [collector01.logicmonitor.com]
ok: [collector02.logicmonitor.com]

TASK [Create a host group] *****************************************************
changed: [collector01.logicmonitor.com -> localhost] => (item=foo)
changed: [collector01.logicmonitor.com -> localhost] => (item=bar)

PLAY [Add foo application hosts to foo device group] ***************************

TASK [setup] *******************************************************************
ok: [foo01.app.logicmonitor.com]
ok: [foo02.app.logicmonitor.com]

TASK [Add foo application hosts to foo device group] ***************************
changed: [foo01.app.logicmonitor.com -> localhost]
changed: [foo02.app.logicmonitor.com -> localhost]

PLAY [Add bar application hosts to bar device group] ***************************

TASK [setup] *******************************************************************
ok: [bar01.app]
ok: [bar02.app.logicmonitor.com]

TASK [Add bar application hosts to bar device group] ***************************
changed: [bar02.app.logicmonitor.com -> localhost]
changed: [bar01.app -> localhost]

PLAY RECAP *********************************************************************
bar01.app.logicmonitor.com     : ok=6    changed=3    unreachable=0    failed=0
collector01.logicmonitor.com   : ok=8    changed=4    unreachable=0    failed=0
bar02.app.logicmonitor.com     : ok=6    changed=3    unreachable=0    failed=0
collector02.logicmonitor.com   : ok=8    changed=4    unreachable=0    failed=0
foo02.app.logicmonitor.com     : ok=6    changed=3    unreachable=0    failed=0
foo01.app.logicmonitor.com     : ok=6    changed=3    unreachable=0    failed=0


Use Case 2: Application Deployment

This use case demonstrates using the LogicMonitor module within an existing application deployment playbook to schedule downtime (SDT) for the affected applications and device groups. Use this approach with LogicMonitor to suppress spurious alerts while rolling out application updates.

Cool points and examples:

• SDTing a LogicMonitor datasource
• SDTing a LogicMonitor device group

Playbook
# This playbook provides an example use case for using the LogicMonitor module
# to schedule downtime (SDT) for monitored hosts during an application deploy in
# order to eliminate superfluous LogicMonitor alerts.
#
# There are two examples showing different options for SDTing an application.
# The first example will apply an SDT to an application-specific datasource ID.
# Currently this does require a bit of initial legwork to retrieve the ID from
# the LogicMonitor portal. A benefit of using this approach rather than SDTing
# the entire host is that you will still be alerted to non-deploy related alerts
# during the SDT duration.
#
# The second example demonstrates setting devices' SDT at the LogicMonitor
# device group level. This is more of a hypothetical example and not something
# we'd necessarily recommend as a best practice.
#
# For example, SDTing at the device level using the
# Ansible inventory allows for finer grained control of SDT during deploys,
# while SDTing at the device group level potentially provides broader SDT
# coverage, especially in situations where deploying a particular application
# may trigger alerts in other applications that aren't actually represented in
# the Ansible inventory.
#
# For the purposes of this example, our application deploy process will simply
# consist of downloading a war file and then copying it to an application
# directory. This playbook can obviously be rearranged to suit your needs,
# but we recommend sequencing your SDT task as close to the first production-
# impacting task as possible. For example, there's no need to SDT your hosts
# while waiting for a deploy artifact to download; this increases the chances of
# missing legitimate alerts that aren't related to the deployment.
#
# NOTE: We're relying on shell environment variables for passing LogicMonitor
# credentials into the logicmonitor module. There are a variety of ways to
# achieve this goal, but for the purposes of this playbook, we're exporting the
# variables: LM_COMPANY, LM_USER, and LM_PASSWORD.
#
# Further documentation can be found here:
#   https://docs.ansible.com/ansible/logicmonitor_module.html

---
# This playbook will demonstrate an application deployment utilizing a
# LogicMonitor SDT at the device level.
- name: Deploy app foo
  hosts: application_foo_hosts
  become: yes
  tasks:
    # - name: Download deploy artifact from release artifact server to temp location
    #   get_url:
    #     url: https://releases.logicmonitor.com/applications/foo/foo.war
    #     dest: /tmp/foo.war
    - name: Download deploy artifact from release artifact server to temp location
      command: touch /tmp/foo.war

    # Schedule downtime for the foo application datasource, lasting 5 minutes,
    # starting now.
    # We want to sequence this task as close to the first production-impacting
    # task as possible.
    - name: Schedule Downtime for application datasource
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=datasource
          action=sdt
          id='123'
          duration=5
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # For the sake of simplicity, we're assuming that foo is a Tomcat application
    # and that Tomcat is configured to automatically explode and start new wars
    - name: Deploy application by moving to webapps dir
      copy:
        remote_src=True
        src=/tmp/foo.war
        dest=/usr/local/bar/webapps/foo.war

    - name: Remove temp war
      file:
        path=/tmp/foo.war
        state=absent

    # We always like to use Ansible to verify that our application deploys were
    # successful. For the sake of this example, since every application is
    # different, we're going to cheat a bit and pretend that we already have a
    # functional verification script installed alongside the application. There
    # are a variety of different ways to implement this functionality natively
    # within Ansible, but that's a topic for a whole different blog.
    - name: Verify application was deployed successfully
      shell: "/usr/local/foo/bin/verify.sh status"
      register: result

    - debug: var=result.stdout_lines

# This playbook will demonstrate an application deployment utilizing a
# LogicMonitor SDT at the group level
- name: Deploy app bar
  hosts: application_bar_hosts
  become: yes
  tasks:
    # - name: Download deploy artifact from release artifact server to temp location
    #   get_url:
    #     url: https://releases.logicmonitor.com/applications/bar/bar.war
    #     dest: /tmp/bar.war
    - name: Download deploy artifact from release artifact server to temp location
      command: touch /tmp/bar.war

    # Schedule downtime for the bar device group, lasting 5 minutes, starting now.
    # We want to sequence this task as close to the first production-impacting
    # task as possible.
    #
    # Note that we're using the same device group that we created in the first
    # use case.
    - name: Schedule Downtime for application device group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=hostgroup
          action=sdt
          fullpath="/applications/bar"
          duration=5
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # For the sake of simplicity, we're assuming that bar is a Tomcat application
    # and that Tomcat is configured to automatically explode and start new wars
    - name: Deploy application by moving to webapps dir
      copy:
        remote_src=True
        src=/tmp/bar.war
        dest=/usr/local/bar/webapps/bar.war

    - name: Remove temp war
      file:
        path=/tmp/bar.war
        state=absent

    # We always like to use Ansible to verify that our application deploys were
    # successful. For the sake of this example, since every application is
    # different, we're going to cheat a bit and pretend that we already have a
    # functional verification script installed alongside the application. There
    # are a variety of different ways to implement this functionality natively
    # within Ansible, but that's a topic for a whole different blog.
    - name: Verify application was deployed successfully
      shell: "/usr/local/bar/bin/verify.sh status"
      register: result

    - debug: var=result.stdout_lines
Running the Playbook
ansible-playbook -i inventory use_case_2.yml
Results
$ ansible-playbook -i inventory use_case_2.yml

PLAY [Deploy app foo] **********************************************************

TASK [setup] *******************************************************************
ok: [foo01.app.logicmonitor.com.app]
ok: [foo02.app.logicmonitor.com.app]

TASK [Download deploy artifact from release artifact server to temp location] **
changed: [foo01.app.logicmonitor.com.app]
changed: [foo02.app.logicmonitor.com.app]

TASK [Schedule Downtime for devices] *******************************************
changed: [foo02.app.logicmonitor.com.app -> localhost]
changed: [foo01.app.logicmonitor.com.app -> localhost]

TASK [Deploy application by moving to webapps dir] *****************************
changed: [foo01.app.logicmonitor.com.app]
changed: [foo02.app.logicmonitor.com.app]

TASK [Remove temp war] *********************************************************
changed: [foo02.app.logicmonitor.com.app]
changed: [foo01.app.logicmonitor.com.app]

TASK [Verify application was deployed successfully] ****************************
changed: [foo01.app.logicmonitor.com.app]
changed: [foo02.app.logicmonitor.com.app]

TASK [debug] *******************************************************************
ok: [foo01.app.logicmonitor.com.app] => {
    "result.stdout_lines": [
        "Success! App is serving."
    ]
}
ok: [foo02.app.logicmonitor.com.app] => {
    "result.stdout_lines": [
        "Success! App is serving."
    ]
}
PLAY [Deploy app bar] **********************************************************

TASK [setup] *******************************************************************
ok: [bar01.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]

TASK [Download deploy artifact from release artifact server to temp location] **
changed: [bar02.app.logicmonitor.com]
changed: [bar01.app.logicmonitor.com]

TASK [Schedule Downtime for application device group] **************************
changed: [bar01.app.logicmonitor.com -> localhost]
changed: [bar02.app.logicmonitor.com -> localhost]

TASK [Deploy application by moving to webapps dir] *****************************
changed: [bar02.app.logicmonitor.com]
changed: [bar01.app.logicmonitor.com]

TASK [Remove temp war] *********************************************************
changed: [bar01.app.logicmonitor.com]
changed: [bar02.app.logicmonitor.com]

TASK [Verify application was deployed successfully] ****************************
changed: [bar02.app.logicmonitor.com]
changed: [bar01.app.logicmonitor.com]

TASK [debug] *******************************************************************
ok: [bar01.app.logicmonitor.com] => {
    "result.stdout_lines": [
        "Success! App is serving."
    ]
}
ok: [bar02.app.logicmonitor.com] => {
    "result.stdout_lines": [
        "Success! App is serving."
    ]
}

PLAY RECAP *********************************************************************
bar01.app.logicmonitor.com      : ok=7    changed=5    unreachable=0    failed=0
foo02.app.logicmonitor.com.app  : ok=7    changed=5    unreachable=0    failed=0
foo01.app.logicmonitor.com.app  : ok=7    changed=5    unreachable=0    failed=0
bar02.app.logicmonitor.com      : ok=7    changed=5    unreachable=0    failed=0

Use Case 3: Software Updates

This use case demonstrates scheduling downtime (SDT) in LogicMonitor as part of a system software update workflow. It is useful in most situations where routine maintenance is likely to trigger LogicMonitor alerts.

Cool points and examples:

• SDTing LogicMonitor devices
• Using a Jinja2 filter, Ansible variables, and the displayname parameter to avoid specifying a collector when performing 'host' actions
• Rebooting hosts as part of an Ansible play without interrupting Ansible execution

Playbook
# This playbook provides an example use case for using the LogicMonitor module
# to schedule downtime (SDT) for monitored hosts and collectors during system
# updates and perform a reboot.
#
# NOTE: We're relying on shell environment variables for passing LogicMonitor
# credentials into the logicmonitor module. There are a variety of ways to
# achieve this goal, but for the purposes of this playbook, we're exporting the
# variables: LM_COMPANY, LM_USER, and LM_PASSWORD.
#
# Further documentation can be found here:
#   https://docs.ansible.com/ansible/logicmonitor_module.html

---
- name: Perform system updates on application hosts and reboot
  hosts: application_hosts
  become: yes
  tasks:
    # For the sake of example, we're just going to update the telnet package we
    # installed in the first use case.
    - name: Update all packages on the system
      package:
        name=telnet
        state=latest

    # Schedule downtime for each host, lasting 15 minutes, starting now
    #
    # Note that we could also SDT at the device group level if we wanted to, but
    # for this use case, it's going to be easier and more reliable to apply this
    # at the device level. This ensures that all of the hosts being updated get
    # SDT and also ensures we don't unnecessarily SDT other hosts or mistakenly
    # miss SDTing an affected host.
    #
    # Also note that, since we're SDTing a wide range of hosts in our example
    # infrastructure, it becomes cumbersome to specify devices' collectors.
    # Instead, we'll specify the displayname, which allows the module to
    # dynamically lookup the correct collector for each host. As in the first
    # use case example, we're using a Jinja2 filter to add some intelligence to
    # the displayname parameter allowing us to use either the inventory variable
    # lm_display_name or default to the device's hostname.
    - name: Schedule Downtime for devices
      # All tasks except for target=collector should use local_action
      become: no
      local_action: >
          logicmonitor
          target=host
          action=sdt
          duration=15
          displayname="{{ lm_display_name | default(inventory_hostname) }}"
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # Here we're going to use a trick to reboot the systems without interrupting
    # our Ansible execution. To accomplish this, we'll asynchronously reboot the
    # hosts and then wait for them to become accessible again. We don't gain a
    # whole lot using this method in this particular example, but it's extremely
    # useful when performing additional tasks after the reboot.
    # Source: https://support.ansible.com/hc/en-us/articles/201958037-Reboot-a-server-and-wait-for-it-to-come-back
    - name: Reboot
      shell: sleep 2 && /sbin/shutdown -r now
      ignore_errors: true

    - name: Wait for server to reboot
      become: no
      local_action: >
        wait_for host={{ inventory_hostname }}
        port=22
        state=started
        delay=30
        timeout=300

# This playbook is very similar to the one above but handles the scenario of
# updating and rebooting hosts that have collectors on them. In order to prevent
# 'Collector Down' alerts, we'll also need to SDT the collectors.
#
# Note that we've already finished updating our other hosts before touching the
# collector hosts. This ensures that any issues encountered during the previous
# playbook are adequately detected by LogicMonitor monitoring before we begin
# touching our collectors.
- name: Perform system updates on collector hosts and reboot
  hosts: collector_hosts
  become: yes
  tasks:
    # For the sake of example, we're just going to naively update all of the installed
    # packages. That's totally safe, right, and couldn't possibly have adverse effects,
    # could it? ;)
    - name: Update all packages on the system
      package:
        name=*
        state=latest

    # Schedule downtime for each host, lasting 15 minutes, starting now
    #
    # Note that we could also SDT at the device group level if we wanted to, but
    # for this use case, it's going to be easier and more reliable to apply this
    # at the device level. This ensures that all of the hosts being updated get
    # SDT and also ensures we don't unnecessarily SDT other hosts or mistakenly
    # miss SDTing an affected host.
    #
    # Also note that, since we're SDTing a wide range of hosts in our example
    # infrastructure, it becomes cumbersome to specify devices' collectors.
    # Instead, we'll specify the displayname, which allows the module to
    # dynamically lookup the correct collector for each host. As in the first
    # use case example, we're using a Jinja2 filter to add some intelligence to
    # the displayname parameter allowing us to use either the inventory variable
    # lm_display_name or default to the device's hostname.
    - name: Schedule Downtime for devices
      # All tasks except for target=collector should use local_action
      become: no
      local_action: >
          logicmonitor target=host
          action=sdt
          duration=15
          displayname="{{ lm_display_name | default(inventory_hostname) }}"
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # In order to prevent spurious 'Collector Down' alerts, we're also going to
    # SDT the collectors on these hosts.
    - name: Schedule Downtime for collectors
      logicmonitor:
        target=collector
        action=sdt
        duration=15
        company="{{ lookup('env', 'LM_COMPANY') }}"
        user="{{ lookup('env', 'LM_USER') }}"
        password="{{ lookup('env', 'LM_PASSWORD') }}"

    # Same as above, we're now going to reboot the hosts and wait for them to
    # come back up.
    - name: Reboot
      shell: sleep 2 && /sbin/shutdown -r now
      async: 1
      poll: 0
      ignore_errors: true

    - name: Wait for server to reboot
      become: no
      local_action: >
        wait_for host={{ inventory_hostname }}
        port=22
        state=started
        delay=30
        timeout=300
Running the Playbook
ansible-playbook -i inventory use_case_3.yml

Results
$ ansible-playbook -i inventory use_case_3.yml

PLAY [Perform system updates on application hosts and reboot] ******************

TASK [setup] *******************************************************************
ok: [bar01.app.logicmonitor.com]
ok: [foo01.app.logicmonitor.com]
ok: [foo02.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]

TASK [Update all packages on the system] ***************************************
changed: [foo01.app.logicmonitor.com]
changed: [bar02.app.logicmonitor.com]
changed: [bar01.app.logicmonitor.com]
changed: [foo02.app.logicmonitor.com]

TASK [Schedule Downtime for devices] *******************************************
changed: [bar01.app.logicmonitor.com -> localhost]
changed: [foo01.app.logicmonitor.com -> localhost]
changed: [foo02.app.logicmonitor.com -> localhost]
changed: [bar02.app.logicmonitor.com -> localhost]

TASK [Reboot] ******************************************************************
fatal: [foo01.app.logicmonitor.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "", "msg": "MODULE FAILURE"}
...ignoring
fatal: [foo02.app.logicmonitor.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "", "msg": "MODULE FAILURE"}
...ignoring
fatal: [bar01.app.logicmonitor.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "", "msg": "MODULE FAILURE"}
...ignoring
fatal: [bar02.app.logicmonitor.com]: FAILED! => {"changed": false, "failed": true, "module_stderr": "", "module_stdout": "", "msg": "MODULE FAILURE"}
...ignoring

TASK [Wait for server to reboot] ***********************************************
ok: [bar01.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]
ok: [foo02.app.logicmonitor.com]
ok: [foo01.app.logicmonitor.com]

PLAY RECAP *********************************************************************
bar01.app.logicmonitor.com      : ok=5    changed=2    unreachable=0    failed=0
foo01.app.logicmonitor.com      : ok=5    changed=2    unreachable=0    failed=0
foo02.app.logicmonitor.com      : ok=5    changed=2    unreachable=0    failed=0
bar02.app.logicmonitor.com      : ok=5    changed=2    unreachable=0    failed=0

Use Case 4: Decommissioning Devices

This use case shows an example of decommissioning application hosts and removing those devices and their corresponding device group from LogicMonitor.

Cool points and examples:

• SDTing a LogicMonitor device group
• Removing devices from LogicMonitor
• Removing device groups from LogicMonitor
• Shutting down hosts as part of an Ansible play without interrupting Ansible execution

Playbook

# This playbook provides an example use case for decommissioning hosts and
# using the LogicMonitor Ansible module to remove them from monitoring. We'll
# pretend that we no longer need the bar application in our infrastructure
# and decommission all of those servers.
#
# NOTE: We're relying on shell environment variables for passing LogicMonitor
# credentials into the logicmonitor module. There are a variety of ways to
# achieve this goal, but for the purposes of this playbook, we're exporting the
# variables: LM_COMPANY, LM_USER, and LM_PASSWORD.
#
# Further documentation can be found here:
#   https://docs.ansible.com/ansible/logicmonitor_module.html

---
- name: Decommission all bar application hosts and remove from monitoring
  hosts: application_bar_hosts
  become: yes
  tasks:
      # Schedule downtime for the bar device group, lasting 60 minutes,
      # starting now. Since we're decommissioning hosts, the exact length of the
      # SDT isn't critical, so an hour gives us a pretty big buffer with no
      # adverse consequences
    - name: Schedule Downtime for application device group
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=hostgroup
          action=sdt
          fullpath="/applications/bar"
          duration=60
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    # Do some boilerplate, non-LogicMonitor orchestration tasks here. This task or
    # tasks will obviously be specific to your own environment. For this example,
    # we'll pretend that a simple halt is sufficient.
    #
    # Note that there's nothing strictly wrong with skipping the SDT and just
    # removing the devices from LogicMonitor before decommissioning. In this
    # example, by leaving the hosts in LogicMonitor until after decommissioning,
    # we can potentially detect hosts that are stranded in an unstable state.
    #
    # As in the previous use case, we're using a bit of Ansible magic to allow us
    # to shut down the host without stopping Ansible execution.
    - name: Decommission host
      shell: sleep 2 && /sbin/shutdown -h now
      async: 1
      poll: 0
      ignore_errors: true

    # Note that, since we're removing a wide range of hosts in our example
    # infrastructure, it becomes cumbersome to specify devices' collectors.
    # Instead, we'll specify the displayname, which allows the module to
    # dynamically lookup the correct collector for each host. As in the first
    # use case example, we're using a Jinja2 filter to add some intelligence to
    # the displayname parameter allowing us to use either the inventory variable
    # lm_display_name or default to the device's hostname.
    #
    # Also note that, since there's only one bar device group in LogicMonitor, we
    # don't need to run the group removal task below for every host, so we've set
    # run_once to true on it.
    - name: Remove devices from LogicMonitor
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=host
          action=remove
          displayname="{{ lm_display_name | default(inventory_hostname) }}"
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"

    - name: Remove bar application device group from LogicMonitor
      become: no
      # All tasks except for target=collector should use local_action
      local_action: >
          logicmonitor target=hostgroup
          action=remove
          fullpath="/applications/bar"
          company="{{ lookup('env', 'LM_COMPANY') }}"
          user="{{ lookup('env', 'LM_USER') }}"
          password="{{ lookup('env', 'LM_PASSWORD') }}"
      run_once: true
Running the Playbook
ansible-playbook -i inventory use_case_4.yml

Results
$ ansible-playbook -i inventory use_case_4.yml

PLAY [Decommission all bar application hosts and remove from monitoring] *******

TASK [setup] *******************************************************************
ok: [bar01.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]

TASK [Schedule Downtime for application device group] **************************
changed: [bar02.app.logicmonitor.com -> localhost]
changed: [bar01.app.logicmonitor.com -> localhost]

TASK [Decommission host] *******************************************************
ok: [bar01.app.logicmonitor.com]
ok: [bar02.app.logicmonitor.com]

TASK [Remove devices from LogicMonitor] ****************************************
changed: [bar01.app.logicmonitor.com -> localhost]
changed: [bar02.app.logicmonitor.com -> localhost]

TASK [Remove bar application device group from LogicMonitor] *******************
changed: [bar02.app.logicmonitor.com -> localhost]
changed: [bar01.app.logicmonitor.com -> localhost]

PLAY RECAP *********************************************************************
bar01.app.logicmonitor.com      : ok=5    changed=3    unreachable=0    failed=0
bar02.app.logicmonitor.com      : ok=5    changed=3    unreachable=0    failed=0

Summary

Scaling IT automation and managing complex environments means that keeping monitoring aligned with your infrastructure and operations is critical. With LogicMonitor and the Ansible module, users can unify their source of truth and their operations (including comprehensive monitoring) into one consistent, repeatable process using the Ansible playbooks they already know.