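CloudMonitor
Alibaba Cloud CloudMonitor gives cloud users an out-of-the-box, enterprise-grade, open, one-stop monitoring solution. It covers basic monitoring of IT infrastructure, dial-test monitoring of public network quality, and business monitoring based on events, custom metrics, and logs. The goal is a monitoring service that is more efficient, more comprehensive, and more cost-effective.
CloudMonitor monitors a wide range of system events from cloud products, and the set of supported events keeps growing. These events can trigger functions with custom handling logic, which makes it possible to automate custom processing of the related cloud resources.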
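Example scenario
Suppose an ECS instance restarts because of a system error. Operations staff or the ECS user would typically have to respond urgently and manually run checks or create snapshots. In this example, CloudMonitor triggers a function that automatically handles an instance that restarted because of a system error or an instance error, for example by creating snapshots once the restart has completed.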
Background knowledge
ECS system events
Cloud product system event monitoring
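Function code
In this example, an ECS restart-completed event from CloudMonitor triggers the function. The function looks up the cloud disks attached to the ECS instance and creates a snapshot of each disk.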
# -*- coding: utf-8 -*-
import json
import logging
import random
import string

from aliyunsdkcore import client
from aliyunsdkcore.auth.credentials import StsTokenCredential
from aliyunsdkecs.request.v20140526.CreateSnapshotRequest import CreateSnapshotRequest
from aliyunsdkecs.request.v20140526.DescribeDisksRequest import DescribeDisksRequest

LOGGER = logging.getLogger()
clt = None


def handler(event, context):
    global clt
    # Use the temporary STS credentials of the service role attached to the function.
    creds = context.credentials
    sts_token_credential = StsTokenCredential(creds.access_key_id, creds.access_key_secret, creds.security_token)
    '''
    Sample event:
    {
      "product": "ECS",
      "content": {
        "executeFinishTime": "2018-06-08T01:25:37Z",
        "executeStartTime": "2018-06-08T01:23:37Z",
        "ecsInstanceName": "timewarp",
        "eventId": "e-t4nhcpqcu8fqushpn3mm",
        "eventType": "InstanceFailure.Reboot",
        "ecsInstanceId": "i-bp18l0uopocfc98xxxx"
      },
      "resourceId": "acs:ecs:cn-hangzhou:12345678:instance/i-bp18l0uopocfc98xxxx",
      "level": "CRITICAL",
      "instanceName": "instanceName",
      "status": "Executing",
      "name": "Instance:SystemFailure.Reboot:Executing",
      "regionId": "cn-hangzhou"
    }
    '''
    evt = json.loads(event)
    content = evt.get("content")
    ecsInstanceId = content.get("ecsInstanceId")
    regionId = evt.get("regionId")
    clt = client.AcsClient(region_id=regionId, credential=sts_token_credential)
    # Compare event names case-insensitively.
    name = evt.get("name")
    name = name.lower()
    if name in ['Instance:SystemFailure.Reboot:Executing'.lower(), "Instance:InstanceFailure.Reboot:Executing".lower()]:
        # The reboot has started; do other things here if needed.
        pass
    if name in ['Instance:SystemFailure.Reboot:Executed'.lower(), "Instance:InstanceFailure.Reboot:Executed".lower()]:
        # The reboot has finished: snapshot every disk attached to the instance.
        request = DescribeDisksRequest()
        # Query the region reported by the event.
        request.add_query_param("RegionId", regionId)
        request.set_InstanceId(ecsInstanceId)
        response = _send_request(request)
        disks = response.get('Disks', {}).get('Disk', [])
        for disk in disks:
            diskId = disk["DiskId"]
            SnapshotId = create_ecs_snap_by_id(diskId)
            LOGGER.info("Create ecs snap success, ecs id = %s , disk id = %s , snapshot id = %s ", ecsInstanceId, diskId, SnapshotId)


def create_ecs_snap_by_id(disk_id):
    LOGGER.info("Create ecs snap, disk id is %s ", disk_id)
    request = CreateSnapshotRequest()
    request.set_DiskId(disk_id)
    # Name the snapshot "reboot_" plus six random lowercase letters.
    request.set_SnapshotName("reboot_" + ''.join(random.choice(string.ascii_lowercase) for _ in range(6)))
    response = _send_request(request)
    return response.get("SnapshotId")


# send open api request
def _send_request(request):
    request.set_accept_format('json')
    try:
        response_str = clt.do_action_with_exception(request)
        LOGGER.info(response_str)
        response_detail = json.loads(response_str)
        return response_detail
    except Exception as e:
        LOGGER.error(e)
        # Re-raise so that callers never receive None and the invocation is reported as failed.
        raise
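
The event matching in the handler can also be pulled out into a small pure function, which makes it possible to check the branching without the Alibaba Cloud SDK or real credentials. The following is a minimal sketch under that assumption. The name classify_event and the action labels "snapshot", "wait", and "ignore" are hypothetical and are not part of the original function; only the four event names come from the code above.

```python
import json

# Event names taken from the handler above, lower-cased for case-insensitive matching.
REBOOT_STARTED = {
    "instance:systemfailure.reboot:executing",
    "instance:instancefailure.reboot:executing",
}
REBOOT_FINISHED = {
    "instance:systemfailure.reboot:executed",
    "instance:instancefailure.reboot:executed",
}


def classify_event(raw_event):
    """Return (action, instance_id, region_id) for a CloudMonitor ECS event.

    action is a hypothetical label: "snapshot" once the reboot has finished,
    "wait" while it is still running, and "ignore" for any other event.
    """
    evt = json.loads(raw_event)
    name = (evt.get("name") or "").lower()
    instance_id = (evt.get("content") or {}).get("ecsInstanceId")
    region_id = evt.get("regionId")
    if name in REBOOT_FINISHED:
        action = "snapshot"
    elif name in REBOOT_STARTED:
        action = "wait"
    else:
        action = "ignore"
    return action, instance_id, region_id
```

With a helper like this, the handler would only need to create the client and call the ECS APIs when the returned action is "snapshot".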
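Steps
- Create the function, using the code from the "Function code" section. For how to create a function, see the Function Compute hello world guide.
  Note: remember to grant the role of the function's service permission to operate ECS.
- Log on to the CloudMonitor console and create an alert rule. The monitored events are the start and the end of an ECS restart caused by an instance error or a system error.
- Debug with a mock event.
- To simulate real ECS events, see the guide on drilling system event handlers.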
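
For the mock-debugging step, one option is to call the handler locally with a hand-built event and a stand-in context object. The sketch below assumes the event layout shown in the sample event and the three credential attributes that the handler reads (access_key_id, access_key_secret, security_token). The helpers make_mock_event and make_mock_context, the placeholder credential strings, and the choice to encode the event as bytes are all hypothetical.

```python
import json
from types import SimpleNamespace


def make_mock_event(status="Executed",
                    instance_id="i-bp18l0uopocfc98xxxx",
                    region_id="cn-hangzhou"):
    """Build a CloudMonitor-style ECS reboot event, JSON-encoded as bytes.

    Only the fields that the handler reads are filled in, plus a few
    descriptive ones copied from the sample event.
    """
    event = {
        "product": "ECS",
        "content": {"ecsInstanceId": instance_id},
        "level": "CRITICAL",
        "status": status,
        "name": "Instance:SystemFailure.Reboot:" + status,
        "regionId": region_id,
    }
    return json.dumps(event).encode("utf-8")


def make_mock_context(access_key_id="<access-key-id>",
                      access_key_secret="<access-key-secret>",
                      security_token="<security-token>"):
    """Build a stand-in for the function context with placeholder credentials."""
    credentials = SimpleNamespace(
        access_key_id=access_key_id,
        access_key_secret=access_key_secret,
        security_token=security_token,
    )
    return SimpleNamespace(credentials=credentials)


# Example, assuming the function code is importable as a module named "index"
# (an assumed name) and the placeholders are replaced with real STS credentials:
#   from index import handler
#   handler(make_mock_event("Executed"), make_mock_context(...))
```

Passing "Executing" instead of "Executed" exercises the branch that runs while the reboot is still in progress, which should not create any snapshots.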
Original article. Author: user submission. If you repost it, please credit the source: https://www.cloudads.cn/archives/33686.html