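Hello, this is Jeongmin Choi (최정민) from the WhaTap Labs DevOps team. In this post I will introduce WhaTap RDS Log, which lets you view AWS RDS logs and events in WhaTap Log Monitoring. I will cover why we built it, how it works, and its installation scripts.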
Previously, monitoring AWS RDS with WhaTap DB Monitoring alone worked as shown in the figure below: you installed WhaTap DB Monitoring for AWS RDS to check DB metrics, and then had to go to the AWS Console to check logs and events.
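We found this inconvenient. We also believed that WhaTap DB Monitoring could only offer complete monitoring of AWS RDS if RDS logs and events were visible inside the WhaTap DB Monitoring environment, so we started developing WhaTap RDS Log.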
The overall architecture of WhaTap RDS Log is shown in the figure below.
Logs and events generated by AWS RDS are delivered to WhaTap RDS Log through an AWS CloudWatch Logs subscription filter and an AWS EventBridge event rule, respectively.
WhaTap RDS Log is an AWS Lambda function. It runs each time a log or event arrives, parses the payload, and sends it to WhaTap.
Next, let's look at how WhaTap RDS Log handles incoming logs and events, along with the Golang code.
When an incoming event triggers WhaTap RDS Log, it first determines whether that event is an AWS RDS log or an AWS RDS event.
AWS RDS logs and events are delivered to WhaTap RDS Log in the formats shown in the examples below.
- AWS RDS LOG DATA EXAMPLE
{'awslogs': {'data': 'H4sIAAAAAAAAK15.....'}}
- AWS RDS EVENT DATA EXAMPLE
{
  "version": "0",
  "id": "12a345b6-78c9-01d2-34e5-123f4ghi5j6k",
  "detail-type": "RDS DB Instance Event",
  "source": "aws.rds",
  "account": "111111111111",
  "time": "2021-03-19T19:34:09Z",
  "region": "us-east-1",
  "resources": ["arn:aws:rds:us-east-1:111111111111:db:testdb"],
  "detail": {
    "EventCategories": ["notification"],
    "SourceType": "DB_INSTANCE",
    "SourceArn": "arn:aws:rds:us-east-1:111111111111:db:testdb",
    "Date": "2021-03-19T19:34:09.293Z",
    "Message": "DB instance stopped",
    "SourceIdentifier": "testdb",
    "EventID": "RDS-EVENT-0087"
  }
}
To tell the two apart, we defined an event type as shown below and applied it to the Lambda event handler.
type AwsRdsEventData struct {
	Version   string                `json:"version"`
	Id        string                `json:"id"`
	Type      string                `json:"detail-type"`
	Source    string                `json:"source"`
	Account   string                `json:"account"`
	Time      string                `json:"time"`
	Region    string                `json:"region"`
	Resources []string              `json:"resources"`
	Detail    AwsRdsEventDetail     `json:"detail"`
	Awslog    CloudwatchLogsRawData `json:"awslogs"`
}
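The struct above refers to an AwsRdsEventDetail type that is not shown in this post. As a rough sketch only, a definition could be inferred from the "detail" object of the sample event. The field names below are taken from that sample JSON; the struct layout, the parseDetail helper, and the trimmed sampleEvent constant are assumptions made for illustration, not the actual model package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AwsRdsEventDetail is a hypothetical definition inferred from the
// "detail" object of the sample AWS RDS event shown earlier.
type AwsRdsEventDetail struct {
	EventCategories  []string `json:"EventCategories"`
	SourceType       string   `json:"SourceType"`
	SourceArn        string   `json:"SourceArn"`
	Date             string   `json:"Date"`
	Message          string   `json:"Message"`
	SourceIdentifier string   `json:"SourceIdentifier"`
	EventID          string   `json:"EventID"`
}

// sampleEvent is a trimmed copy of the sample event from this post.
const sampleEvent = `{"source":"aws.rds","detail":{"EventCategories":["notification"],"SourceType":"DB_INSTANCE","SourceArn":"arn:aws:rds:us-east-1:111111111111:db:testdb","Date":"2021-03-19T19:34:09.293Z","Message":"DB instance stopped","SourceIdentifier":"testdb","EventID":"RDS-EVENT-0087"}}`

// parseDetail (hypothetical helper) extracts only the "detail" object.
func parseDetail(raw []byte) (AwsRdsEventDetail, error) {
	var envelope struct {
		Detail AwsRdsEventDetail `json:"detail"`
	}
	err := json.Unmarshal(raw, &envelope)
	return envelope.Detail, err
}

func main() {
	detail, err := parseDetail([]byte(sampleEvent))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(detail.SourceIdentifier, detail.EventID, detail.Message)
}
```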
package main

import (
	"log"
	"rds-forwarder/model"
	"rds-forwarder/whatap"

	"github.com/aws/aws-lambda-go/lambda"
)

func HandleLambdaEvent(event model.AwsRdsEventData) (model.Response, error) {
	log.SetFlags(log.LstdFlags | log.Lshortfile)

	// AWS RDS EVENT
	if event.Awslog.Data == "" {
		whatap.SendEvent(event)
		result := model.Response{Result: "Success to Send Whatap"}
		return result, nil
	} else { // AWS RDS LOG
		logdata, err := model.ParserEvent(event)
		if err != nil {
			// Log the decoding error itself; logdata may be empty or partial here.
			log.Panic(err)
		}
		result := model.Response{Result: "Success to Send Log"}
		whatap.SendLog(logdata)
		return result, nil
	}
}

func main() {
	lambda.Start(HandleLambdaEvent)
}
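If the incoming event is an AWS RDS log, only the Awslog field has a value and every other field is empty. If it is an AWS RDS event, the Awslog field is empty and the other fields have values. We use this difference to classify the data.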
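The classification rule can be tried locally without Lambda. The following is a minimal sketch under stated assumptions: the payload struct is a cut-down stand-in for AwsRdsEventData, classify is a hypothetical helper, and the log payload uses double quotes so that it is valid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload is a cut-down, hypothetical stand-in for AwsRdsEventData that keeps
// only the fields needed to tell a log apart from an event.
type payload struct {
	Source string `json:"source"`
	Awslog struct {
		Data string `json:"data"`
	} `json:"awslogs"`
}

// classify returns "log" when awslogs.data is present and "event" otherwise,
// mirroring the check on event.Awslog.Data in the handler.
func classify(raw []byte) (string, error) {
	var p payload
	if err := json.Unmarshal(raw, &p); err != nil {
		return "", err
	}
	if p.Awslog.Data == "" {
		return "event", nil
	}
	return "log", nil
}

func main() {
	samples := []string{
		`{"awslogs": {"data": "H4sIAAAAAAAAK15"}}`,
		`{"source": "aws.rds", "detail-type": "RDS DB Instance Event"}`,
	}
	for _, s := range samples {
		kind, err := classify([]byte(s))
		if err != nil {
			fmt.Println("invalid payload:", err)
			continue
		}
		fmt.Println(kind)
	}
}
```

Note that this rule treats any payload without awslogs.data as an event, so a stricter version might also check that source equals "aws.rds".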
An AWS RDS event needs no extra processing: it only has to be sent in the WhaTap log data format. An AWS RDS log, however, arrives gzip-compressed and base64-encoded, so it has to be decoded and decompressed first.
- AWS RDS LOG json
{'awslogs': {'data': 'H4sIAAAAAAAAK15.....'}}
- Result of decoding and decompressing the value of the data field
{
  "owner": "123456789012",
  "logGroup": "CloudTrail",
  "logStream": "123456789012_CloudTrail_us-east-1",
  "subscriptionFilters": ["Destination"],
  "messageType": "DATA_MESSAGE",
  "logEvents": [
    {
      "id": "31953106606966983378809025079804211143289615424298221568",
      "timestamp": 1432826855000,
      "message": "{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"
    },
    {
      "id": "31953106606966983378809025079804211143289615424298221569",
      "timestamp": 1432826855000,
      "message": "{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"
    },
    {
      "id": "31953106606966983378809025079804211143289615424298221570",
      "timestamp": 1432826855000,
      "message": "{\"eventVersion\":\"1.03\",\"userIdentity\":{\"type\":\"Root\"}"
    }
  ]
}
WhaTap RDS Log defines the following models for decoding this data.
type CloudwatchLogsRawData struct {
	Data string `json:"data"`
}

type CloudwatchLogsData struct {
	Owner               string               `json:"owner"`
	LogGroup            string               `json:"logGroup"`
	LogStream           string               `json:"logStream"`
	SubscriptionFilters []string             `json:"subscriptionFilters"`
	MessageType         string               `json:"messageType"`
	LogEvents           []CloudwatchLogEvent `json:"logEvents"`
}

type CloudwatchLogEvent struct {
	ID        string `json:"id"`
	Timestamp int64  `json:"timestamp"`
	Message   string `json:"message"`
}
Decoding then proceeds as follows.
import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
)

func ParserEvent(event AwsRdsEventData) (CloudwatchLogsData, error) {
	// Define an empty struct to hold the result.
	logdata := CloudwatchLogsData{}

	// Base64-decode the payload into binary data.
	rawDecodedText, err := base64.StdEncoding.DecodeString(event.Awslog.Data)
	if err != nil {
		return logdata, err
	}

	// Decompress the decoded binary data with gzip.NewReader.
	zipReader, err := gzip.NewReader(bytes.NewBuffer(rawDecodedText))
	if err != nil {
		return logdata, err
	}
	defer zipReader.Close()

	// Convert the decompressed data into the Go struct with a JSON decoder.
	dec := json.NewDecoder(zipReader)
	err = dec.Decode(&logdata)
	return logdata, err
}
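To exercise this decoding path locally, it can help to build a payload the same way CloudWatch Logs does (JSON, then gzip, then base64) and feed it back through the decoder. The round trip below is a sketch under stated assumptions: logsData and logEvent are reduced stand-ins for the models above, and encodePayload and decodePayload are hypothetical helper names, not part of WhaTap RDS Log.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"time"
)

// logEvent and logsData are reduced, hypothetical versions of
// CloudwatchLogEvent and CloudwatchLogsData.
type logEvent struct {
	ID        string `json:"id"`
	Timestamp int64  `json:"timestamp"`
	Message   string `json:"message"`
}

type logsData struct {
	LogGroup  string     `json:"logGroup"`
	LogEvents []logEvent `json:"logEvents"`
}

// encodePayload builds a test value for awslogs.data: JSON -> gzip -> base64.
func encodePayload(d logsData) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := json.NewEncoder(zw).Encode(d); err != nil {
		return "", err
	}
	// Close flushes the gzip footer; without it the stream is truncated.
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decodePayload reverses encodePayload: base64 -> gunzip -> JSON.
func decodePayload(data string) (logsData, error) {
	var d logsData
	raw, err := base64.StdEncoding.DecodeString(data)
	if err != nil {
		return d, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return d, err
	}
	defer zr.Close()
	err = json.NewDecoder(zr).Decode(&d)
	return d, err
}

func main() {
	in := logsData{
		LogGroup: "/aws/rds/database-1/error",
		LogEvents: []logEvent{
			{ID: "1", Timestamp: 1432826855000, Message: "sample message"},
		},
	}
	encoded, err := encodePayload(in)
	if err != nil {
		fmt.Println("encode error:", err)
		return
	}
	out, err := decodePayload(encoded)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, e := range out.LogEvents {
		// CloudWatch Logs timestamps are milliseconds since the Unix epoch.
		ts := time.UnixMilli(e.Timestamp).UTC()
		fmt.Println(out.LogGroup, ts.Format(time.RFC3339), e.Message)
	}
}
```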
The installation and configuration scripts for WhaTap RDS Log are written as AWS CloudFormation templates so that users can set everything up quickly and easily.
Parameters:
  ProjectAccessKey:
    Description: "Enter your Project Access Key (Management > Project management > Project access key)"
    Type: String
    AllowedPattern : ".+"
  Pcode:
    Description: "Enter your pcode (Management > Project management > pcode)"
    Type: String
    AllowedPattern : ".+"
  Host:
    Description: "Enter Whatap Server IP (Management > Agent Installation > Whatap Server)"
    Type: String
    Default: "13.124.11.223/13.209.172.35"
    AllowedPattern : ".+"
  Port:
    Type: String
    Default: 6600
  TimeOut:
    Description: "Lambda runs your code for a set amount of time before timing out. Timeout is the maximum amount of time in seconds that a Lambda function can run. The default value for this setting is 3 seconds, but you can adjust this in increments of 1 second up to a maximum value of 15 minutes."
    Type: Number
    Default: 150
  MemorySize:
    Description: "Lambda allocates CPU power in proportion to the amount of memory configured. Memory is the amount of memory available to your Lambda function at runtime. You can increase or decrease the memory and CPU power allocated to your function using the Memory (MB) setting. To configure the memory for your function, set a value between 128 MB and 10,240 MB in 1-MB increments. At 1,769 MB, a function has the equivalent of one vCPU (one vCPU-second of credits per second)."
    Type: Number
    Default: 1024
    MinValue: 128
    MaxValue: 3000
  UseReservedConcurrency:
    Description: "Reserve concurrency for a function to set the maximum number of simultaneous executions for a function. Provision concurrency to ensure that a function can scale without fluctuations in latency. Reserved concurrency applies to the entire function, including all versions and aliases."
    AllowedValues:
      - true
      - false
    Type: String
    Default: false
  ReservedConcurrency:
    Type: Number
    Description: "Set Reserved Concurrency"
    Default: 10
Conditions:
  useConcurrency: !Equals [ !Ref UseReservedConcurrency, true]
  noConcurrency: !Equals [ !Ref UseReservedConcurrency, false]
Resources:
  LambdaZipsBucket:
    Type: AWS::S3::Bucket
  CopyZips:
    Type: Custom::CopyZips
    Properties:
      ServiceToken: !GetAtt "CopyZipsFunction.Arn"
      DestBucket: !Ref "LambdaZipsBucket"
      SourceBucket: !Sub "whatapforwarder"
      Objects:
        - !Sub "WhaTapRDSLog.zip"
  CopyZipsFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Path: /
      Policies:
        - PolicyName: lambda-copier
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                Resource:
                  - !Sub "arn:aws:s3:::whatapforwarder/*"
              - Effect: Allow
                Action:
                  - s3:PutObject
                  - s3:DeleteObject
                Resource:
                  - !Sub "arn:aws:s3:::${LambdaZipsBucket}/*"
  CopyZipsFunction:
    Type: AWS::Lambda::Function
    Properties:
      Description: Copies objects from a source S3 bucket to a destination
      Handler: index.handler
      Runtime: python3.9
      Role: !GetAtt 'CopyZipsFunctionRole.Arn'
      Timeout: 240
      Code:
        ZipFile: |
          import json
          import logging
          import threading
          import boto3
          import cfnresponse

          logger = logging.getLogger()
          logger.setLevel(logging.INFO)
          s3 = boto3.client('s3')

          def copy_objects(source_bucket, dest_bucket, objects):
              """
              Copy specified objects from source to destination bucket
              :param source_bucket: source bucket name
              :param dest_bucket: destination bucket name
              :param prefix: source bucket prefix
              :param objects: list of objects to copy
              :return: None
              """
              for item in objects:
                  key = item
                  copy_source = {
                      'CopySource': '/{}/{}'.format(source_bucket, key),
                      'Bucket': source_bucket,
                      'Key': key
                  }
                  logger.info('copy_source: [{}]'.format(copy_source))
                  logger.info('dest_bucket: [{}]'.format(dest_bucket))
                  logger.info('key: [{}]'.format(key))
                  s3.copy_object(CopySource=copy_source, Bucket=dest_bucket, Key=key)

          def delete_objects(bucket, objects):
              """
              Delete specified s3 objects
              :param bucket: bucket name
              :param prefix: bucket prefix
              :param objects: list of object names
              :return None
              """
              objects = {'Objects': [{'Key': item} for item in objects]}
              s3.delete_objects(Bucket=bucket, Delete=objects)

          def timeout_handler(event, context):
              """
              Timeout handling
              :param event: lambda function event
              :param context: lambda function context
              :return None
              """
              logger.error('Execution is about to time out, sending failure response to CloudFormation')
              cfnresponse.send(event, context, cfnresponse.FAILED, {}, None)

          def handler(event, context):
              """
              Lambda function handler
              :param event: lambda function event
              :param context: lambda function context
              :return None
              """
              # make sure we send a failure to CloudFormation if the function
              # is going to timeout
              timer = threading.Timer((context.get_remaining_time_in_millis()
                                       / 1000.00) - 0.5, timeout_handler, args=[event, context])
              timer.start()
              logger.info('Event: [{}]'.format(event))
              status = cfnresponse.SUCCESS
              try:
                  source_bucket = event['ResourceProperties']['SourceBucket']
                  dest_bucket = event['ResourceProperties']['DestBucket']
                  objects = event['ResourceProperties']['Objects']
                  logging.info('SourceBucket=[{}], DestinationBucket=[{}], Objects=[{}]'.format(source_bucket, dest_bucket, objects))
                  if event['RequestType'] == 'Delete':
                      delete_objects(dest_bucket, objects)
                  else:
                      copy_objects(source_bucket, dest_bucket, objects)
              except Exception as e:
                  logger.error('Exception: %s' % e, exc_info=True)
                  status = cfnresponse.FAILED
              finally:
                  timer.cancel()
                  cfnresponse.send(event, context, status, {}, None)
  LambdaExecutionRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - "sts:AssumeRole"
      Policies:
        - PolicyDocument:
            Version: "2012-10-17"
            Statement:
              #Lambda Basic Excution Role
              - Effect: Allow
                Action:
                  - "logs:CreateLogGroup"
                  - "logs:CreateLogStream"
                  - "logs:PutLogEvents"
                Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/*"
          PolicyName: forwarderpolicy
  WhaTapRDSLogReserved:
    DependsOn: CopyZips
    Type: AWS::Lambda::Function
    Condition: useConcurrency
    Properties:
      Handler: "main"
      Code:
        S3Bucket:
          Ref: 'LambdaZipsBucket'
        S3Key: "WhaTapRDSLog.zip"
      Runtime: "go1.x"
      MemorySize:
        Ref : MemorySize
      Timeout:
        Ref : TimeOut
      ReservedConcurrentExecutions:
        Ref : ReservedConcurrency
      Environment:
        Variables:
          WHATAP_HOST:
            Ref : Host
          WHATAP_LICENSE:
            Ref : ProjectAccessKey
          WHATAP_PCODE:
            Ref : Pcode
          WHATAP_PORT:
            Ref : Port
      Role: !GetAtt LambdaExecutionRole.Arn
  CWPermissionReserved:
    Type: AWS::Lambda::Permission
    Condition: useConcurrency
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref WhaTapRDSLogReserved
      Principal: !Sub "logs.${AWS::Region}.amazonaws.com"
      SourceAccount: !Ref "AWS::AccountId"
  EventBridgePermissionReserved:
    Type: AWS::Lambda::Permission
    Condition: useConcurrency
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref WhaTapRDSLogReserved
      Principal: events.amazonaws.com
      SourceAccount: !Ref "AWS::AccountId"
  WhaTapRDSLog:
    DependsOn: CopyZips
    Type: AWS::Lambda::Function
    Condition: noConcurrency
    Properties:
      Handler: "main"
      Code:
        S3Bucket:
          Ref: 'LambdaZipsBucket'
        S3Key: "WhaTapRDSLog.zip"
      Runtime: "go1.x"
      MemorySize:
        Ref : MemorySize
      Timeout:
        Ref : TimeOut
      Environment:
        Variables:
          WHATAP_HOST:
            Ref : Host
          WHATAP_LICENSE:
            Ref : ProjectAccessKey
          WHATAP_PCODE:
            Ref : Pcode
          WHATAP_PORT:
            Ref : Port
      Role: !GetAtt LambdaExecutionRole.Arn
  CWPermission:
    Type: AWS::Lambda::Permission
    Condition: noConcurrency
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref WhaTapRDSLog
      Principal: !Sub "logs.${AWS::Region}.amazonaws.com"
      SourceAccount: !Ref "AWS::AccountId"
  EventBridgePermission:
    Type: AWS::Lambda::Permission
    Condition: noConcurrency
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref WhaTapRDSLog
      Principal: events.amazonaws.com
      SourceAccount: !Ref "AWS::AccountId"
Outputs:
  WhaTapRDSLogArn:
    Condition: noConcurrency
    Description: "WhaTapRDSLog's Arn"
    Value: !GetAtt WhaTapRDSLog.Arn
  WhaTapRDSLogArnReserved:
    Condition: useConcurrency
    Description: "WhaTapRDSLog's Arn"
    Value: !GetAtt WhaTapRDSLogReserved.Arn
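The template above hands the WhaTap connection settings to the Lambda function through the environment variables WHATAP_HOST, WHATAP_LICENSE, WHATAP_PCODE, and WHATAP_PORT. How the function consumes them is not shown in this post, so the following is only a guess at what that could look like. It assumes that WHATAP_HOST holds one or more addresses separated by "/", as the default value "13.124.11.223/13.209.172.35" suggests; the config type and the loadConfig helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// config is a hypothetical holder for the values the template injects.
type config struct {
	Hosts   []string
	Port    string
	License string
	Pcode   string
}

// loadConfig reads the settings through getenv so it can be tested without
// touching the real environment. Splitting WHATAP_HOST on "/" is an
// assumption based on the template's default value.
func loadConfig(getenv func(string) string) (config, error) {
	cfg := config{
		Port:    getenv("WHATAP_PORT"),
		License: getenv("WHATAP_LICENSE"),
		Pcode:   getenv("WHATAP_PCODE"),
	}
	for _, h := range strings.Split(getenv("WHATAP_HOST"), "/") {
		if h = strings.TrimSpace(h); h != "" {
			cfg.Hosts = append(cfg.Hosts, h)
		}
	}
	if len(cfg.Hosts) == 0 || cfg.License == "" || cfg.Pcode == "" {
		return cfg, errors.New("WHATAP_HOST, WHATAP_LICENSE and WHATAP_PCODE must be set")
	}
	if cfg.Port == "" {
		// Fall back to the template's default port.
		cfg.Port = "6600"
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for _, h := range cfg.Hosts {
		fmt.Printf("would connect to %s:%s\n", h, cfg.Port)
	}
}
```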
Description: "CloudFormation template for Setting WhaTapRDSLog"
Parameters:
  WhaTapRDSLogArn:
    Description: "Enter your WhatapRdsForwarder ARN (if WhatapRdsForwarder isn't installed, you have to install it first)"
    Type: String
  EventRuleName:
    Description: "Enter AWS EventBridge Rule Name"
    Type: String
    Default: "WhaTapRDSEventRule"
    AllowedPattern: ".+"
  AwsRdsNames:
    Description: "Enter AWS RDS Name (ex database-1, database-2)"
    Type: List
    Default: "none, none"
    AllowedPattern: ".+"
  RdsLogGroupName1:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see (ex /aws/rds/database-1/error)"
    Default: "none"
    Type: String
  RdsLogGroupName2:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see"
    Type: String
    Default: "none"
  RdsLogGroupName3:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see"
    Type: String
    Default: "none"
  RdsLogGroupName4:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see"
    Type: String
    Default: "none"
  RdsLogGroupName5:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see"
    Type: String
    Default: "none"
  RdsLogGroupName6:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see"
    Type: String
    Default: "none"
  RdsLogGroupName7:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see"
    Type: String
    Default: "none"
  RdsLogGroupName8:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see"
    Type: String
    Default: "none"
  RdsLogGroupName9:
    Description: "Enter your Cloudwatch LogGroup Name that you want to see"
    Type: String
    Default: "none"
Conditions:
  Rule: !Not [ !Equals [ !Select [0, !Ref AwsRdsNames], "none" ] ]
  RDS1 : !Not [ !Equals [ !Ref RdsLogGroupName1, "none" ] ]
  RDS2 : !Not [ !Equals [ !Ref RdsLogGroupName2, "none" ] ]
  RDS3 : !Not [ !Equals [ !Ref RdsLogGroupName3, "none" ] ]
  RDS4 : !Not [ !Equals [ !Ref RdsLogGroupName4, "none" ] ]
  RDS5 : !Not [ !Equals [ !Ref RdsLogGroupName5, "none" ] ]
  RDS6 : !Not [ !Equals [ !Ref RdsLogGroupName6, "none" ] ]
  RDS7 : !Not [ !Equals [ !Ref RdsLogGroupName7, "none" ] ]
  RDS8 : !Not [ !Equals [ !Ref RdsLogGroupName8, "none" ] ]
  RDS9 : !Not [ !Equals [ !Ref RdsLogGroupName9, "none" ] ]
Resources:
  WhatapEventBridgeRule:
    Type: AWS::Events::Rule
    Condition: Rule
    Properties:
      Description: "Event Rule For Monitoring RDS event"
      EventBusName: default
      EventPattern:
        source:
          - aws.rds
        detail:
          SourceIdentifier: !Ref AwsRdsNames
      State: ENABLED
      Name: !Ref EventRuleName
      Targets:
        - Id : "EventID"
          Arn : !Ref WhaTapRDSLogArn
  ConnectRDSLogwithWhatap1:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS1
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName1"
  ConnectRDSLogwithWhatap2:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS2
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName2"
  ConnectRDSLogwithWhatap3:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS3
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName3"
  ConnectRDSLogwithWhatap4:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS4
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName4"
  ConnectRDSLogwithWhatap5:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS5
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName5"
  ConnectRDSLogwithWhatap6:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS6
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName6"
  ConnectRDSLogwithWhatap7:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS7
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName7"
  ConnectRDSLogwithWhatap8:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS8
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName8"
  ConnectRDSLogwithWhatap9:
    Type: AWS::Logs::SubscriptionFilter
    Condition: RDS9
    Properties:
      DestinationArn: !Ref "WhaTapRDSLogArn"
      FilterPattern: " "
      LogGroupName: !Ref "RdsLogGroupName9"
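As a DevOps engineer I had mostly focused on using monitoring tools that others had built, so getting to develop one myself was a meaningful experience. Going forward, I will keep looking for gaps from the perspective of customers who operate and manage their services with WhaTap monitoring, and keep working to close them.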