# announcements
s
Hi, Flyte team. I would like to ask a question about notification emails. I have set up notifications in values.yaml and created the SQS queue and SNS topic in AWS. The logs show that the message is being published, but it did not arrive in my mailbox. Could anyone give some advice?
2022/08/18 08:08:18 /go/src/github.com/flyteorg/flyteadmin/pkg/repositories/gormimpl/execution_repo.go:61
[21.243ms] [rows:1] UPDATE "executions" SET "id"=358,"created_at"='2022-08-18 08:05:44.14',"updated_at"='2022-08-18 08:08:18.501',"execution_project"='flytesnacks',"execution_domain"='development',"execution_name"='f985b6eddf4c43d5f000',"launch_plan_id"=114,"workflow_id"=106,"phase"='SUCCEEDED',"closure"='<binary>',"spec"='<binary>',"started_at"='2022-08-18 08:05:49.251',"execution_created_at"='2022-08-18 08:05:44.139',"execution_updated_at"='2022-08-18 08:08:18.497',"duration"='2m29.245844975s',"mode"=1,"inputs_uri"='<s3://my-s3-bucket/metadata/flytesnacks/development/f985b6eddf4c43d5f000/inputs>',"user_inputs_uri"='<s3://my-s3-bucket/metadata/flytesnacks/development/f985b6eddf4c43d5f000/user_inputs>',"user"='a57ebf23-a6e8-45e7-bcb1-52b69f508f67',"state"=0 WHERE "execution_project" = 'flytesnacks' AND "execution_domain" = 'development' AND "execution_name" = 'f985b6eddf4c43d5f000'
{"json":{"exec_id":"f985b6eddf4c43d5f000","src":"execution_manager.go:1588"},"level":"debug","msg":"publishing notifications for execution [project:\"flytesnacks\" domain:\"development\" name:\"f985b6eddf4c43d5f000\" ] in state [SUCCEEDED] for notifications [[phases:SUCCEEDED email:\u003crecipients_email:\"<my email>\" \u003e ]]","ts":"2022-08-18T08:08:18Z"}
{"json":{"exec_id":"f985b6eddf4c43d5f000","src":"publisher.go:30"},"level":"debug","msg":"Publishing the following message [recipients_email:\"<my mail>\" sender_email:\"<sender mail setup in SES>\" subject_line:\"Notice: Execution \\\"flyte_fixedrate.positive_wf\\\" has succeeded in \\\"development\\\".\" body:\"Execution \\\\\\\"flyte_fixedrate.positive_wf [f985b6eddf4c43d5f000]\\\\\\\" has succeeded in \\\\\\\"development\\\\\\\". View details at \u003ca href=\\\\<http://flow.qraftpilot.com/console/projects/flytesnacks/domains/development/executions/f985b6eddf4c43d5f000>\u003e http://<flyte url>/console/projects/flytesnacks/domains/development/executions/f985b6eddf4c43d5f000\u003c/a\u003e. \\n\" ]","ts":"2022-08-18T08:08:18Z"}



{"json":{"exec_id":"faf3d92fb8453e24d000","node":"n0","src":"noop_notifications.go:32"},"level":"debug","msg":"call to noop publish with notification type [flyteidl.admin.NodeExecutionEventRequest] and proto message [event:\u003cid:\u003cnode_id:\"n0\" execution_id:\u003cproject:\"flytesnacks\" domain:\"development\" name:\"faf3d92fb8453e24d000\" \u003e \u003e producer_id:\"propeller\" phase:SUCCEEDED occurred_at:\u003cseconds:1660810338 nanos:222102672 \u003e input_uri:\"<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-faf3d92fb8453e24d000/n0/data/inputs.pb>\" output_uri:\"<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-faf3d92fb8453e24d000/n0/data/0/outputs.pb>\" task_node_metadata:\u003c\u003e spec_node_id:\"n0\" node_name:\"be_positive\" event_version:1 deck_uri:\"<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-faf3d92fb8453e24d000/n0/data/0/deck.html>\" \u003e ]","ts":"2022-08-18T08:12:18Z"}
thank you!
p
It seems to be using the noop notifier. Can you check the notifications type value in the config and confirm that it is set to aws?
{"json":{"exec_id":"faf3d92fb8453e24d000","node":"n0","src":"noop_notifications.go:32"},"level":"debug","msg":"call to noop publish with notification type [flyteidl.admin.NodeExecutionEventRequest]
s
Below is the setup my team did.
workflow_notifications:

    # enabled: false

    enabled: true

    config:

      notifications:

        type: aws

        region: "ap-northeast-2"

        publisher:

          topicName: "arn:aws:sns:ap-northeast-2:<account ID>:<sns name>"

        processor:

          queueName: "<sqs name>"

          accountId: "account id"

        emailer:

          subject: "Notice: Execution \"{{ workflow.name }}\" has {{ phase }} in \"{{ domain }}\"."

          sender:  "<ses vertified mail>"

          body: >

            Execution \"{{ workflow.name }} [{{ name }}]\" has {{ phase }} in \"{{ domain }}\". View details at

            <a href=\http://<flyte url>/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}>

            http://<flyte url>/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}</a>. {{ error }}
p
Hmm, that looks OK. Can you verify this by checking the same settings in the admin configmap?
kubectl get configmap -n <flyte-name-space> flyte-admin-base-config -o yaml
🙏 1
s
These are the configmap settings:
notifications.yaml: |
    notifications:
      emailer:
        body: |
          Execution \"{{ workflow.name }} [{{ name }}]\" has {{ phase }} in \"{{ domain }}\". View details at
          <a href=\http://<flyte  url>/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}>
          http://<flyte url>/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}</a>. {{ error }}
        sender: <ses vertified mail>
        subject: 'Notice: Execution "{{ workflow.name }}" has {{ phase }} in "{{ domain
          }}".'
      processor:
        accountId: "account id"
        queueName: <sqs name>
      publisher:
        topicName: arn:aws:sns:ap-northeast-2:<account id>:<sns name>
      region: ap-northeast-2
      type: aws
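The `{{ ... }}` placeholders in the emailer subject and body are filled in per execution. The following is a minimal sketch of that kind of substitution, not flyteadmin's actual implementation; the `render` helper and the regular expression are hypothetical, and the sample values are taken from the subject line that appears in the logs earlier in this thread.

```python
import re

# Matches placeholders such as {{ phase }} or {{ workflow.name }}.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace each {{ key }} with values[key]; unknown keys are left untouched."""
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


# Subject template as configured in the emailer section.
subject = 'Notice: Execution "{{ workflow.name }}" has {{ phase }} in "{{ domain }}".'

# Sample values (illustrative, copied from the log output in this thread).
values = {
    "workflow.name": "flyte_fixedrate.positive_wf",
    "phase": "succeeded",
    "domain": "development",
}

rendered_subject = render(subject, values)
print(rendered_subject)
```

Leaving unknown placeholders untouched makes a missing value easy to spot in the delivered email.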
p
OK, that looks correct. After adding this change, did you restart the flyteadmin pod? Also, can you grep for this string in the flyteadmin pod logs:
Using default noop notifications processor implementation for config type
j
Hi, I am a colleague of @SeungTaeKim and am answering your question on his behalf. There is no message saying that it is going to use the noop notifications processor. Below is my grepped log.
(base) qraft@ai-server0:~/k8s/temp$ k logs -n flyte flyteadmin-cc87d6cf6-dxhn6 | grep noop
{"json":{"src":"factory.go:104"},"level":"info","msg":"Using default noop workflow executor implementation for cloud provider type [local]","ts":"2022-08-18T10:31:14Z"}
{"json":{"exec_id":"f4648f22385a18194000","src":"noop_notifications.go:32"},"level":"debug","msg":"call to noop publish with notification type [flyteidl.admin.WorkflowExecutionEventRequest] and proto message [event:\u003cexecution_id:\u003cproject:\"flytesnacks\" domain:\"development\" name:\"f4648f22385a18194000\" \u003e producer_id:\"propeller\" phase:RUNNING occurred_at:\u003cseconds:1660818728 nanos:781959937 \u003e \u003e ]","ts":"2022-08-18T10:32:08Z"}
p
The logs indicate that it is somehow reading the type as local. Do you mind restarting the admin pod and grepping for the same string?
s
I checked the notification 10 minutes ago; there is an aws_processor and it's working now. Do I have to set up anything additional in AWS (SNS or SQS)?
p
Cool. I believe it picked up the right config after the restart. Are you still not receiving the email notification?
s
No, I haven't. So I am wondering whether I need to set up anything additional in AWS.
p
Can you share the updated logs from admin?
s
Here are the logs, grepped for my email address:
> k logs -n flyte flyteadmin-6f5c568d57-2x68w | grep seungtae.kim@qraftec.com
{"json":{"exec_id":"f2ede5902d2074e73000","src":"publisher.go:30"},"level":"debug","msg":"Publishing the following message [recipients_email:\"<my mail>\" sender_email:\"<ses vertified mail>\" subject_line:\"Flyte: 'examples.flyte_fixedrate.positive_wf' has succeeded in 'development'.\" body:\"Execution 'examples.flyte_fixedrate.positive_wf [f2ede5902d2074e73000]' has succeeded in 'development'. View details at\\n\u003ca href=\\\\http://<flyte url>/console/projects/flytesnacks/domains/development/executions/f2ede5902d2074e73000\u003e\\nhttp://<flyte url>/console/projects/flytesnacks/domains/development/executions/f2ede5902d2074e73000\u003c/a\u003e. \\n\" ]","ts":"2022-08-19T06:03:47Z"}
{"json":{"exec_id":"fbde63dfdd160676a000","src":"execution_manager.go:1602"},"level":"debug","msg":"publishing notifications for execution [project:\"flytesnacks\" domain:\"development\" name:\"fbde63dfdd160676a000\" ] in state [SUCCEEDED] for notifications [[phases:SUCCEEDED email:\u003crecipients_email:\"<my mail>\" \u003e ]]","ts":"2022-08-19T06:13:47Z"}
p
Can you paste more of the log from around that time period, without grepping?
👍 1
s
Oh, my team leader has restarted the flyteadmin pod for notification debugging... I will send the logs soon.
Error from server (BadRequest): container "flyteadmin" in pod "flyteadmin-6f64bf5bc5-r7m8f" is waiting to start: PodInitializing
This is the new log after the restart (I re-launched the workflow):
2022/08/19 07:04:33 /go/src/github.com/flyteorg/flyteadmin/pkg/repositories/gormimpl/task_execution_repo.go:84 SLOW SQL >= 200ms
[832.551ms] [rows:1] UPDATE "task_executions" SET "id"=545,"created_at"='2022-08-19 07:04:23.05',"updated_at"='2022-08-19 07:04:32.845',"deleted_at"=NULL,"phase"='RUNNING',"phase_version"=1,"input_uri"='<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a8p8cnvbtsw2b4j6bpn2/n0/data/inputs.pb>',"closure"='<binary>',"started_at"='2022-08-19 07:04:32.835',"task_execution_created_at"='2022-08-19 07:04:23.046',"task_execution_updated_at"='2022-08-19 07:04:32.835',"duration"='0s' WHERE "project" = 'flytesnacks' AND "domain" = 'development' AND "name" = 'noti_test.be_positive' AND "version" = 'fgllg7ge0r_5JKIdNVMGHQ==' AND "execution_project" = 'flytesnacks' AND "execution_domain" = 'development' AND "execution_name" = 'a8p8cnvbtsw2b4j6bpn2' AND "node_id" = 'n0' AND "retry_attempt" = 0
{"json":{"exec_id":"a8p8cnvbtsw2b4j6bpn2","node":"n0","src":"noop_notifications.go:32"},"level":"debug","msg":"call to noop publish with notification type [flyteidl.admin.TaskExecutionEventRequest] and proto message [event:\u003ctask_id:\u003cresource_type:TASK project:\"flytesnacks\" domain:\"development\" name:\"noti_test.be_positive\" version:\"fgllg7ge0r_5JKIdNVMGHQ==\" \u003e parent_node_execution_id:\u003cnode_id:\"n0\" execution_id:\u003cproject:\"flytesnacks\" domain:\"development\" name:\"a8p8cnvbtsw2b4j6bpn2\" \u003e \u003e phase:RUNNING producer_id:\"propeller\" logs:\u003curi:\"<https://logs.qraftpilot.com/app/discover#/?_g=(time:(from:now-1w,to:now))>\u0026_a=(columns:!(log),filters:!((query:(match_phrase:(kubernetes.namespace_name:flytesnacks-development))),(query:(match_phrase:(kubernetes.pod_name:a8p8cnvbtsw2b4j6bpn2-n0-0)))),sort:!(!('@timestamp',asc)))\" name:\"Kubernetes Logs (User)\" message_format:JSON \u003e occurred_at:\u003cseconds:1660892672 nanos:835657908 \u003e input_uri:\"<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a8p8cnvbtsw2b4j6bpn2/n0/data/inputs.pb>\" phase_version:1 task_type:\"python-task\" metadata:\u003cgenerated_name:\"a8p8cnvbtsw2b4j6bpn2-n0-0\" plugin_identifier:\"container\" \u003e event_version:1 \u003e ]","ts":"2022-08-19T07:04:33Z"}
{"json":{"exec_id":"a8p8cnvbtsw2b4j6bpn2","node":"n0","src":"task_execution_manager.go:219"},"level":"debug","msg":"Successfully recorded task execution event [task_id:\u003cresource_type:TASK project:\"flytesnacks\" domain:\"development\" name:\"noti_test.be_positive\" version:\"fgllg7ge0r_5JKIdNVMGHQ==\" \u003e parent_node_execution_id:\u003cnode_id:\"n0\" execution_id:\u003cproject:\"flytesnacks\" domain:\"development\" name:\"a8p8cnvbtsw2b4j6bpn2\" \u003e \u003e phase:RUNNING producer_id:\"propeller\" logs:\u003curi:\"<https://logs.qraftpilot.com/app/discover#/?_g=(time:(from:now-1w,to:now))>\u0026_a=(columns:!(log),filters:!((query:(match_phrase:(kubernetes.namespace_name:flytesnacks-development))),(query:(match_phrase:(kubernetes.pod_name:a8p8cnvbtsw2b4j6bpn2-n0-0)))),sort:!(!('@timestamp',asc)))\" name:\"Kubernetes Logs (User)\" message_format:JSON \u003e occurred_at:\u003cseconds:1660892672 nanos:835657908 \u003e input_uri:\"<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a8p8cnvbtsw2b4j6bpn2/n0/data/inputs.pb>\" phase_version:1 task_type:\"python-task\" metadata:\u003cgenerated_name:\"a8p8cnvbtsw2b4j6bpn2-n0-0\" plugin_identifier:\"container\" \u003e event_version:1 ]","ts":"2022-08-19T07:04:33Z"}
{"json":{"exec_id":"a8p8cnvbtsw2b4j6bpn2","node":"n0","src":"noop_notifications.go:32"},"level":"debug","msg":"call to noop publish with notification type [flyteidl.admin.TaskExecutionEventRequest] and proto message [event:\u003ctask_id:\u003cresource_type:TASK project:\"flytesnacks\" domain:\"development\" name:\"noti_test.be_positive\" version:\"fgllg7ge0r_5JKIdNVMGHQ==\" \u003e parent_node_execution_id:\u003cnode_id:\"n0\" execution_id:\u003cproject:\"flytesnacks\" domain:\"development\" name:\"a8p8cnvbtsw2b4j6bpn2\" \u003e \u003e phase:RUNNING producer_id:\"propeller\" logs:\u003curi:\"<https://logs.qraftpilot.com/app/discover#/?_g=(time:(from:now-1w,to:now))>\u0026_a=(columns:!(log),filters:!((query:(match_phrase:(kubernetes.namespace_name:flytesnacks-development))),(query:(match_phrase:(kubernetes.pod_name:a8p8cnvbtsw2b4j6bpn2-n0-0)))),sort:!(!('@timestamp',asc)))\" name:\"Kubernetes Logs (User)\" message_format:JSON \u003e occurred_at:\u003cseconds:1660892672 nanos:835657908 \u003e input_uri:\"<s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a8p8cnvbtsw2b4j6bpn2/n0/data/inputs.pb>\" phase_version:1 task_type:\"python-task\" metadata:\u003cgenerated_name:\"a8p8cnvbtsw2b4j6bpn2-n0-0\" plugin_identifier:\"container\" \u003e event_version:1 \u003e ]","ts":"2022-08-19T07:04:33Z"}
{"json":{"src":"handlers.go:237"},"level":"debug","msg":"Running authentication gRPC interceptor","ts":"2022-08-19T07:04:47Z"}
{"json":{"src":"token.go:83"},"level":"debug","msg":"Could not retrieve bearer token from metadata rpc error: code = Unauthenticated desc = Request unauthenticated with Bearer","ts":"2022-08-19T07:04:47Z"}
{"json":{"src":"handlers.go:247"},"level":"info","msg":"Failed to parse Access Token from context. Will attempt to find IDToken. Error: [JWT_VERIFICATION_FAILED] Could not retrieve bearer token from metadata, caused by: rpc error: code = Unauthenticated desc = Request unauthenticated with Bearer","ts":"2022-08-19T07:04:47Z"}
{"json":{"src":"token.go:103"},"level":"debug","msg":"Could not retrieve id token from metadata rpc error: code = Unauthenticated desc = Request unauthenticated with IDToken","ts":"2022-08-19T07:04:47Z"}
{"json":{"src":"handlers.go:193"},"level":"debug","msg":"gRPC server info in logging interceptor []method [/flyteidl.service.AdminService/CreateTaskEvent]\n","ts":"2022-08-19T07:04:47Z"}
p
It's still calling into the noop processor. Could you share the entire admin log from startup? Earlier you shared a log with aws_processor being registered. Did the config get reverted?
Only one processor is registered per pod.
s
# -- Resource manager configuration
  resource_manager:
    # -- resource manager configuration
    propeller:
      resourcemanager:
        type: noop
Should I change this type to aws in values.yaml?
p
yes
Along with the rest of the fields you already had. Did you revert your config?
s
No, nothing was reverted; my team just restarted the flyteadmin pod.
p
this is the configmap settings
```
notifications.yaml: |
    notifications:
      emailer:
        body: |
          Execution \"{{ workflow.name }} [{{ name }}]\" has {{ phase }} in \"{{ domain }}\". View details at
          <a href=\http://<flyte url>/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}>
          http://<flyte url>/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}</a>. {{ error }}
        sender: <ses vertified mail>
        subject: 'Notice: Execution "{{ workflow.name }}" has {{ phase }} in "{{ domain
          }}".'
      processor:
        accountId: "account id"
        queueName: <sqs name>
      publisher:
        topicName: arn:aws:sns:ap-northeast-2:<account id>:<sns name>
      region: ap-northeast-2
      type: aws
```
This was the config you shared earlier, which looked correct. Can you check your notifications config? The one you shared just now is from resourcemanager.
s
That setting is exactly the same as the one you shared.
j
I have one question. The log below says that my cloud provider type is 'local', and I think that is correct because I am using an on-premise cluster. However, I want to use AWS notifications even though I am on a local cluster. Could this be related to the emails not being sent? Thanks.
{
  "json": {
    "src": "factory.go:104"
  },
  "level": "info",
  "msg": "Using default noop workflow executor implementation for cloud provider type [local]",
  "ts": "2022-08-19T07:33:01Z"
}
p
This one comes from a different part of the configmap. It is part of the workflowExecutor module, not notifications, e.g.:
workflowExecutor:
    scheme: local
    local:
You can specify a different scheme for each module. In the case of notifications, it is defined by the type field. Can we debug this on a call, since we have gone back and forth on this?
Or you can repeat these steps and verify this:
"Also you can grep this for me in the flyteadmin pods
Using default noop notifications processor implementation for config type"
Later in our conversation, at this point, you had the right config:
"I have checked the notification 10 minutes ago, there is aws_processor and it's working now.
Do I have set additional things up in AWS (SNS or SQS)?"
Screenshot from 2022-08-19 10-40-46.png
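Since several modules log a "noop" message, it can help to group matching log lines by their `src` field so that, for example, the workflow executor message from factory.go is not mistaken for a notifications message. The following is a minimal sketch; the `group_by_source` helper is hypothetical and the sample lines are abbreviated versions of logs posted in this thread.

```python
import json
from collections import defaultdict


def group_by_source(log_lines, needle):
    """Return {src: [msg, ...]} for JSON log lines whose msg contains needle."""
    grouped = defaultdict(list)
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines, e.g. the plain-text SQL log lines
        if not isinstance(record, dict):
            continue
        msg = record.get("msg", "")
        if needle in msg:
            src = record.get("json", {}).get("src", "unknown")
            grouped[src].append(msg)
    return dict(grouped)


# Abbreviated sample lines (illustrative only).
sample = [
    '{"json":{"src":"factory.go:104"},"level":"info","msg":"Using default noop workflow executor implementation for cloud provider type [local]","ts":"2022-08-18T10:31:14Z"}',
    "[21.243ms] [rows:1] UPDATE executions SET phase = SUCCEEDED",
    '{"json":{"exec_id":"x","src":"noop_notifications.go:32"},"level":"debug","msg":"call to noop publish with notification type [flyteidl.admin.TaskExecutionEventRequest]","ts":"2022-08-19T07:04:33Z"}',
]

result = group_by_source(sample, "noop")
for src, messages in result.items():
    print(src, len(messages))
```

In practice the input could be the saved output of the `k logs` command used above, read line by line.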
👍 1
j
The previous problem was that no policy was attached to allow the AWS SNS topic to send messages to the SQS queue; now it is sending messages. However, there are now logs saying that flyteadmin cannot unmarshal the JSON message. My logs follow below.
{
  "json": {
    "src": "aws_processor.go:52"
  },
  "level": "error",
  "msg": "failed to unmarshall JSON message [ChhzZXVuZ3RhZS5raW1AcXJhZnRlYy5jb20SEW1sb3BzQHFyYWZ0ZWMuY29tGk1GbHl0ZTogJ2V4YW1wbGVzLmZseXRlX2ZpeGVkcmF0ZS5wb3NpdGl2ZV93ZicgaGFzIHN1Y2NlZWRlZCBpbiAnZGV2ZWxvcG1lbnQnLiLgAkV4ZWN1dGlvbiAnZXhhbXBsZXMuZmx5dGVfZml4ZWRyYXRlLnBvc2l0aXZlX3dmIFtmZWUyMmVjZDM2MDE4NWMzZDAwMF0nIGhhcyBzdWNjZWVkZWQgaW4gJ2RldmVsb3BtZW50Jy4gVmlldyBkZXRhaWxzIGF0CjxhIGhyZWY9XGh0dHA6Ly9mbG93LnFyYWZ0cGlsb3QuY29tL2NvbnNvbGUvcHJvamVjdHMvZmx5dGVzbmFja3MvZG9tYWlucy9kZXZlbG9wbWVudC9leGVjdXRpb25zL2ZlZTIyZWNkMzYwMTg1YzNkMDAwPgpodHRwOi8vZmxvdy5xcmFmdHBpbG90LmNvbS9jb25zb2xlL3Byb2plY3RzL2ZseXRlc25hY2tzL2RvbWFpbnMvZGV2ZWxvcG1lbnQvZXhlY3V0aW9ucy9mZWUyMmVjZDM2MDE4NWMzZDAwMDwvYT4uIAo=] from processor with err: invalid character 'C' looking for beginning of value",
  "ts": "2022-08-22T08:48:26Z"
}
I saw there is an option called enable64decoding in flyteadmin/pkg/async/notifications/factory.go, so I changed it to false, but no luck. The strange thing is that the message is entirely base64-encoded, but flyteadmin expects it to be JSON-formatted first, and only later decodes the message using base64.
👀 2
p
Can you check one thing for me? What is the value of "Raw message delivery" on the SQS subscription to the SNS topic?
I think the processor expects the JSON format of the message delivered to SQS, but in your case SNS is delivering the raw message and is not using the JSON format for delivery: https://docs.aws.amazon.com/sns/latest/dg/sns-large-payload-raw-message-delivery.html
cc : @katrina
When I decoded the message I got the output below, which is why I think you have raw message delivery enabled.
echo "ChhzZXVuZ3RhZS5raW1AcXJhZnRlYy5jb20SEW1sb3BzQHFyYWZ0ZWMuY29tGk1GbHl0ZTogJ2V4YW1wbGVzLmZseXRlX2ZpeGVkcmF0ZS5wb3NpdGl2ZV93ZicgaGFzIHN1Y2NlZWRlZCBpbiAnZGV2ZWxvcG1lbnQnLiLgAkV4ZWN1dGlvbiAnZXhhbXBsZXMuZmx5dGVfZml4ZWRyYXRlLnBvc2l0aXZlX3dmIFtmZWUyMmVjZDM2MDE4NWMzZDAwMF0nIGhhcyBzdWNjZWVkZWQgaW4gJ2RldmVsb3BtZW50Jy4gVmlldyBkZXRhaWxzIGF0CjxhIGhyZWY9XGh0dHA6Ly9mbG93LnFyYWZ0cGlsb3QuY29tL2NvbnNvbGUvcHJvamVjdHMvZmx5dGVzbmFja3MvZG9tYWlucy9kZXZlbG9wbWVudC9leGVjdXRpb25zL2ZlZTIyZWNkMzYwMTg1YzNkMDAwPgpodHRwOi8vZmxvdy5xcmFmdHBpbG90LmNvbS9jb25zb2xlL3Byb2plY3RzL2ZseXRlc25hY2tzL2RvbWFpbnMvZGV2ZWxvcG1lbnQvZXhlY3V0aW9ucy9mZWUyMmVjZDM2MDE4NWMzZDAwMDwvYT4uIAo=" |base64 --decode                                                        

seungtae.kim@qraftec.commlops@qraftec.comMFlyte: 'examples.flyte_fixedrate.positive_wf' has succeeded in 'development'."?Execution 'examples.flyte_fixedrate.positive_wf [fee22ecd360185c3d000]' has succeeded in 'development'. View details at
<a href=\http://flow.qraftpilot.com/console/projects/flytesnacks/domains/development/executions/fee22ecd360185c3d000>
http://flow.qraftpilot.com/console/projects/flytesnacks/domains/development/executions/fee22ecd360185c3d000</a>.
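The check above can be sketched in code: a body that parses as JSON and carries a `Message` field looks like the standard SNS notification envelope, while a body that is only valid base64 suggests raw message delivery is enabled. This is a minimal sketch under those assumptions, not flyteadmin's processor code; `classify_sqs_body` and the sample payload are hypothetical.

```python
import base64
import binascii
import json


def classify_sqs_body(body: str) -> str:
    """Guess whether an SQS message body is an SNS JSON envelope or a raw payload."""
    try:
        envelope = json.loads(body)
    except json.JSONDecodeError:
        envelope = None
    if isinstance(envelope, dict) and "Message" in envelope:
        return "sns-envelope"
    try:
        base64.b64decode(body, validate=True)
    except (binascii.Error, ValueError):
        return "unknown"
    return "raw-base64"


# Hypothetical payload bytes standing in for the serialized email message.
payload = base64.b64encode(b"\n\x13someone@example.com").decode()

# With raw message delivery disabled, SNS wraps the payload in a JSON envelope.
wrapped = json.dumps({"Type": "Notification", "Message": payload})

print(classify_sqs_body(wrapped))
print(classify_sqs_body(payload))
```

A bare base64 body fails JSON parsing at its first character, which is consistent with the "invalid character 'C' looking for beginning of value" error in the log above.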
k
Interesting. Do we capture in the docs that raw message delivery should not be enabled?
p
No, we don't have this mentioned, but I wanted to confirm that this is indeed the case.
s
@Prafulla Mahindrakar many thanks for helping me and @Jake Yoon solve the notification problem. We finally solved the issue and it works now. There was an access policy issue with the user in IAM.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "sqs:*",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "sns:*",
      "Resource": "arn:aws:sns:<aws region>:<account>:<sns service name>"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "ses:*",
      "Resource": "*"
    }
  ]
}
We had checked earlier that SNS and SQS worked without problems, but did not know why the emails were not arriving. I think the policy settings should be added to these docs: https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/lp_notifications.html Thank you!
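Besides the IAM user policy, the earlier fix in this thread (allowing the SNS topic to send messages to the SQS queue) is usually expressed as an access policy on the queue itself. The exact policy used in this thread is not shown, so the following is a sketch that follows the common AWS pattern; the `queue_policy` helper and the `Sid` value are hypothetical, and the ARNs are placeholders.

```python
import json


def queue_policy(queue_arn: str, topic_arn: str) -> dict:
    """Build an SQS access policy that lets one SNS topic send to one queue."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSnsTopicToSend",  # hypothetical statement id
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only messages coming from this specific topic are accepted.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


# Placeholder ARNs in the same style as the config shared in this thread.
policy = queue_policy(
    "arn:aws:sqs:ap-northeast-2:<account id>:<sqs name>",
    "arn:aws:sns:ap-northeast-2:<account id>:<sns name>",
)
print(json.dumps(policy, indent=2))
```

The printed JSON would go into the queue's access policy, separate from the user policy shown above.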
p
Would you be able to help with a doc contribution for this? That would really help the community. Also, did you need to disable raw message delivery?
👍 2
s
Yes, we have disabled raw message delivery, and the email now arrives without any base64 values.
p
Cool, it would be great if you could also mention this in the doc contribution, along with the gaps you saw.
👍 1
s
I have another idea for notifications: Grafana alert rules. Prometheus collects Flyte metrics, so Grafana can check each pod's status, and if a pod's status becomes pending, failed, or completed, we can send mail through Grafana. I think this is the easiest way to send email.
p
Yeah, that's also easier if you don't want to set up the notification infra. cc: @katrina to check whether these metrics can be relied on for notifications or whether they can be lossy.