Hi :raised_hands:we are trying to implement workfl...
# ask-the-community
m
Hi 🙌 we are trying to implement workflow Slack notifications. We have deployed Flyte on EKS with the flyte-binary chart. Our workflow has the notification set:
project_launch_plan = LaunchPlan.create(
    name=f"{schedule_config.name}-{pipeline.name}",
    workflow=pipeline,
    default_inputs=schedule_config.default_inputs,
    fixed_inputs=schedule_config.fixed_inputs,
    schedule=schedule_config.schedule,
    notifications=[
        Slack(
            phases=[
                WorkflowExecutionPhase.FAILED,
                WorkflowExecutionPhase.SUCCEEDED,
                WorkflowExecutionPhase.ABORTED,
                WorkflowExecutionPhase.TIMED_OUT,
            ],
            recipients_email=["<slack channel email>"],
        )
    ],
)
But nothing appears in our Slack channel when the launch plan succeeds. The documentation has this page that explains how to configure FlyteAdmin; how can we do that with the flyte-binary chart? We have also seen this PR that adds a webhook option. Will it allow us to send notifications without creating extra infrastructure?
y
i would not rely on the pr for now. the existing notifications mechanism should work fine.
configuration is the same regardless of the chart, except for the place you put it.
hmm
and that key doesn’t appear to be in the config map template https://github.com/flyteorg/flyte/blob/master/charts/flyte-binary/templates/configmap.yaml
m
Thank youu! I will try this 🙌
Also, how can I debug the notifications? Do I just run `kubectl logs <flyte-binary pod>`?
I have set the notifications config as follows:
notifications:
      type: "aws"
      region: "eu-west-1"
      publisher:
        topicName: "<sns arn>"
      processor:
        queueName: "<sqs name>"
        accountId: "< aws account id>"
      emailer:
        subject: "Notice: Execution \"{{ workflow.name }}\" has {{ phase }} in \"{{ domain }}\"."
        sender:  "<email>"
        body: >
          Execution \"{{ workflow.name }} [{{ name }}]\" has {{ phase }} in \"{{ domain }}\". View details at
          <a href=\"http://flyte.company.com/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}\">
          http://flyte.company.com/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}</a>. {{ error }}
But when I upgrade with Helm, I get:
parse error at (flyte-binary/templates/deployment.yaml:4): function "workflow" not defined
Is it possible that this is only available with the flyte-core chart?
p
I'm also curious how I can debug notifications, I don't see any logs in the flyte pod regarding workflow notification delivery failures, and the workflow completes yet I never receive any emails indicating its completion status. I have things set up very similarly to how @Marti Jorda Roca does here by the looks of it, and am seeing the same workflow is not defined error as them.
m
From what I have seen, notifications work only with the flyte-core chart and not with flyte-binary.
d
Does anyone know the difference between the `workflow_notifications` and `notifications` sections? Thanks! (I only see `workflow_notifications` here as an option in flyte-core.)
p
Yeah, I see the same `parse error at (flyte-binary/templates/deployment.yaml:4): function "workflow" not defined` error with the `flyte-binary` chart. Even putting it under `inline` as @Yee suggested above yields this error (I tried both `workflow-notifications` and `notifications`). Is there any way to get notifications going with just `flyte-binary`? Does anyone have a functioning example?
y
it should work in either case - from any chart, you need to make sure that ultimately the admin component can pick it up.
can you hop into the container image and cat the /etc/flyte/ config files just to confirm that it’s at least showing up as yaml correctly?
want to separate helm chart issues, from fundamental code/flyte config issues
also sns/sqs are configured correctly right? any indication on the aws console pages?
p
I'm able to successfully send test SNS/SQS messages via AWS, but something tricky is going on with `flyte-binary` and notifications. Do you see anything wrong with this indentation at first glance?
configuration:
  database:
    password: ***************
    host: **************
    dbname: some-name
  storage:
    metadataContainer: some-name-dev-meta
    userDataContainer: some-name-dev-user
    provider: s3
    providerConfig:
      s3:
        region: "us-east-1"
        authType: "iam"
  inline:
    plugins:
      k8s:
        inject-finalizer: true
        default-env-vars:
          - AWS_METADATA_SERVICE_TIMEOUT: #
          - AWS_METADATA_SERVICE_NUM_ATTEMPTS: ##
    storage:
      cache:
        max_size_mbs: ###
        target_gc_percent: ###
    tasks:
      task-plugins:
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
        default-for-task-types:
          - container: container
          - sidecar: sidecar
          - container_array: k8s-array
    notifications:
      enabled: true
      config:
        notifications:
          type: "aws"
          region: "us-east-1"
          publisher:
            topicName: "arn:aws:sns:us-east-1:############:some-name"
          processor:
            queueName: "some-name"
            accountId: "############"
          emailer:
            subject: 'Notice: Execution "{{ workflow.name }}" has {{ phase }} in "{{ domain }}".'
            sender: "email@address.com"
            body: >
              Execution \"{{ workflow.name }} [{{ name }}]\" has {{ phase }} in \"{{ domain }}\". View details at
              <a href="http://localhost:8088/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}">
              http://localhost:8088/console/projects/{{ project }}/domains/{{ domain }}/executions/{{ name }}</a>. {{ error }}

serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::############:role/#####-#########"
y
can you try deploying without the template for now?
body: "hi this is the email body"
same for subject
get the basic pattern working first.
p
I can see this for my inline config:
# cat 010-inline-config.yaml
notifications:
  config:
    notifications:
      emailer:
        body: hi this is the email body
        sender: email@address.com
        subject: hi this is the email subject
      processor:
        accountId: "###"
        queueName: some-name
      publisher:
        topicName: arn:aws:sns:us-east-1:##:some-name
      region: us-east-1
      type: aws
  enabled: true
plugins:
  k8s:
    default-env-vars:
    - AWS_METADATA_SERVICE_TIMEOUT: #
    - AWS_METADATA_SERVICE_NUM_ATTEMPTS: ##
    inject-finalizer: true
storage:
  cache:
    max_size_mbs: ###
    target_gc_percent: ###
tasks:
  task-plugins:
    default-for-task-types:
    - container: container
    - sidecar: sidecar
    - container_array: k8s-array
    enabled-plugins:
    - container
    - sidecar
    - k8s-array
y
and still not working?
wait
inside the file
notifications:
  config:
should not be there
dedent it two levels
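To make the "dedent it two levels" advice concrete, here is a minimal Python sketch that works on the already-parsed config as a plain dict. It assumes the wrapper has the shape `notifications.config.notifications` seen in the file above; `hoist_notifications` is a hypothetical helper name, not part of Flyte or the chart.

```python
from copy import deepcopy


def hoist_notifications(inline: dict) -> dict:
    """Return a copy of the inline config with the extra wrapper removed.

    Assumption: the wrapper looks like notifications.config.notifications,
    as in the rendered 010-inline-config.yaml shown above.
    """
    fixed = deepcopy(inline)
    section = fixed.get("notifications")
    if isinstance(section, dict):
        config = section.get("config")
        nested = config.get("notifications") if isinstance(config, dict) else None
        if isinstance(nested, dict):
            # Replace the wrapper (including "enabled") with the real settings.
            fixed["notifications"] = nested
    return fixed


wrapped = {
    "notifications": {
        "enabled": True,
        "config": {"notifications": {"type": "aws", "region": "us-east-1"}},
    },
    "plugins": {"k8s": {"inject-finalizer": True}},
}
flat = hoist_notifications(wrapped)
print(flat["notifications"])
```

After the change, `type`, `region`, `publisher`, `processor` and `emailer` would sit directly under the top-level `notifications` key; other sections such as `plugins` are left untouched.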
p
This format does indeed at least attempt to access your queue and topic for `flyte-binary`:
configuration:
  inline:
    notifications:
      type: "aws"
      region: "us-east-1"
      publisher:
        topicName: "arn:aws:sns:us-east-1:#########:some-name"
      processor:
        queueName: "some-name"
        accountId: "#########"
      emailer:
        sender: "email@address.com"
        subject: "hi this is the email subject"
        body: "hi this is the email body"
Unfortunately, if we switch to the more complex syntax (with `{{ workflow }}`), we still see `function "workflow" not defined` errors. Still trying to debug that side of things. Thank you again for all of your help @Yee, truly appreciate your time and effort in this thread.
Even with the simple subject and body:
subject: "hi this is the email subject"
body: "hi this is the email body"
We don't see that in the notification, so something fishy is going on with the `emailer` config here. The subject line of the email I get is `[EXTERNAL]flyteidl.admin.EmailNotification`.
And the body is a base64 encoded string, decoding it yields this (seems like the two email addresses are the sender's email address and the recipient's email address):
"email@address.com"email@address.comhi this is the email subject"hi this is the email body
y
just stick to the normal email body for now.
can you confirm that sqs is receiving items correctly?
if you’re getting an email then it should be.
and what emailer are you using? ses?
p
Yes, I'm using SES for email.
SNS messages come to my inbox (and they look like the regular text I'm inputting, not base64 encoded):
aws sns publish --topic-arn "arn:aws:sns:us-east-1:#######:some-name" --message "Test message"
Can confirm that sending SQS test messages also sends emails to my inbox (also not base64 encoded):
aws sqs send-message --queue-url "https://sqs.us-east-1.amazonaws.com/#####/some-name" --message-body "Direct test message"
y
any luck with this today?
i haven’t had time to think more yet
p
Still no luck unfortunately, thanks for continuing to dig into this! Yeah, I was looking at that code block, from what I can tell I don't see how it could get sent out without being decoded from base64 first. 🤔
Created an issue here for posterity: https://github.com/flyteorg/flyte/issues/4024
y
added some debug logging here: https://github.com/flyteorg/flyteadmin/pull/614, pulling it into here: https://github.com/flyteorg/flyte/pull/4053 and building a single binary image from that branch here: https://github.com/flyteorg/flyte/actions/runs/6242719645
the gh action job should produce a new single-binary image that you can use to debug. the print statements should be enough to get to the bottom of this.
if it’s not we can add more.
@Eduardo Apolinario (eapolinario) the gh action still works as intended right? i’m not making changes to admin in flyte, i’m changing admin in admin, and doing a go get of admin
e
correct
p
Thanks for spinning up this debug build for me @Yee I'm pulling it in via this (correct me if this is wrong?):
image:
  repository: ghcr.io/flyteorg/flyte-sandbox-bundled
  tag: sha-36d4f2fa621ac94028c060f4a45276817cee617f
In the logs I see this (weirdly nothing about base64, which makes me suspect that I might not have the right repository?):
{"json":{"src":"composite_workqueue.go:88"},"level":"debug","msg":"Subqueue handler batch round","ts":"2023-09-20T17:15:24Z"}
{"json":{"src":"composite_workqueue.go:98"},"level":"debug","msg":"Dynamically configured batch size [-1]","ts":"2023-09-20T17:15:24Z"}
{"json":{"src":"composite_workqueue.go:129"},"level":"debug","msg":"Exiting SubQueue handler batch round","ts":"2023-09-20T17:15:24Z"}
{"json":{"src":"aws_emailer.go:63"},"level":"debug","msg":"Sent email to [email@address.com] sub: hi this is the subject","ts":"2023-09-20T17:15:24Z"}
{"json":{"src":"composite_workqueue.go:88"},"level":"debug","msg":"Subqueue handler batch round","ts":"2023-09-20T17:15:25Z"}
{"json":{"src":"composite_workqueue.go:98"},"level":"debug","msg":"Dynamically configured batch size [-1]","ts":"2023-09-20T17:15:25Z"}
{"json":{"src":"composite_workqueue.go:129"},"level":"debug","msg":"Exiting SubQueue handler batch round","ts":"2023-09-20T17:15:25Z"}
{"json":{"src":"composite_workqueue.go:88"},"level":"debug","msg":"Subqueue handler batch round","ts":"2023-09-20T17:15:26Z"}
{"json":{"src":"composite_workqueue.go:98"},"level":"debug","msg":"Dynamically configured batch size [-1]","ts":"2023-09-20T17:15:26Z"}
{"json":{"src":"composite_workqueue.go:129"},"level":"debug","msg":"Exiting SubQueue handler batch round","ts":"2023-09-20T17:15:26Z"}
The fact that this isn't base64 encoded makes me think it could be a problem on the AWS side? But that doesn't explain why test AWS messages do not come through base64 encoded; they come through as the input message, as expected...
y
that’s the correct image yeah - can you confirm that that’s what the pod is actually running?
kubectl get pod -o yaml | grep image
p
Is this image built the exact same way as the default images? With the default repo and the `latest` tag we can deploy Flyte just fine, but with that image we fail readiness checks:
deployment:
  image:
    pullPolicy: IfNotPresent
    repository: "cr.flyte.org/flyteorg/flyte-binary" # default image works fine
    tag: "latest"
    # repository: ghcr.io/flyteorg/flyte-sandbox-bundled
    # tag: sha-36d4f2fa621ac94028c060f4a45276817cee617f
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m39s                  default-scheduler  Successfully assigned am-flyte-dev/am-flyte-flyte-binary-8596c6cf56-v2ncj to ip-###.###.###.###.ec2.internal
  Normal   Pulled     3m38s                  kubelet            Container image "postgres:15-alpine" already present on machine
  Normal   Created    3m38s                  kubelet            Created container wait-for-db
  Normal   Started    3m38s                  kubelet            Started container wait-for-db
  Warning  BackOff    3m14s                  kubelet            Back-off restarting failed container
  Normal   Pulled     3m8s (x3 over 3m37s)   kubelet            Container image "ghcr.io/flyteorg/flyte-sandbox-bundled:sha-36d4f2fa621ac94028c060f4a45276817cee617f" already present on machine
  Normal   Created    3m8s (x3 over 3m36s)   kubelet            Created container flyte
  Normal   Started    3m8s (x3 over 3m36s)   kubelet            Started container flyte
  Warning  Unhealthy  2m58s (x9 over 3m35s)  kubelet            Readiness probe failed: Get "http://192.168.108.142:8088/healthcheck": dial tcp ###.###.###.###:8088: connect: connection refused
  Warning  Unhealthy  2m58s (x3 over 3m28s)  kubelet            Liveness probe failed: Get "http://192.168.108.142:8088/healthcheck": dial tcp ###.###.###.###:8088: connect: connection refused
y
built exactly the same way.
any logs?
oh.. no wrong image
the `repository` field shouldn’t change
use the flyte-binary image, not flyte-sandbox-bundled
p
Here are the additional logs, does anything stand out to you @Yee?:
{
    "json": {
        "src": "aws_processor.go:46"
    },
    "level": "info",
    "msg": "debugb64 Original stringMsg [{\n  \"Type\" : \"Notification\",\n  \"MessageId\" : \"a85d47a2-1325-547d-ae72-1a2485c93d77\",\n  \"TopicArn\" : \"arn:aws:sns:us-east-1:##########:am-flyte\",\n  \"Subject\" : \"flyteidl.admin.EmailNotification\",\n  \"Message\" : \"CiJwZXRlci5rbGluZ2VsaG9mZXJAZW5lcmd5dmF1bHQuY29tEiJwZXRlci5rbGluZ2VsaG9mZXJAZW5lcmd5dmF1bHQuY29tGhZoaSB0aGlzIGlzIHRoZSBzdWJqZWN0IhNoaSB0aGlzIGlzIHRoZSBib2R5\",\n  \"Timestamp\" : \"2023-09-21T19:08:28.909Z\",\n  \"SignatureVersion\" : \"1\",\n  \"Signature\" : \"bIDTSNY2vjvRX7aX17oNTKH9Xh3ckqmy30XB/YU68TAlr42cIsuTDb0xS68V/n3KhrWlhxo4JPRHWLEdy4kP+jWO152M4tud5rx26jpKh0GH5tDhzt0rzebKrLdodNOnIyM5SKb8MzUZ36I699JFnIw8wtVkfUfgN6ciRIxTJAq9HqDq8f3DwLOu/OKl3SZSKkvRJecDxXgRdhHAA3oLCcpBVZvdG2UVikPNpHxJfzzl4V6IUJ7Ze30UWTr3BTHVWbBUsbcrR8J5jTMDjmSr+pUD8iU/ppJ3AXHc4PcM/3uUQCDYxMg3Pl6PXFLpg1C28v0ikKBBCbpUeC1/BrDtbQ==\",\n  \"SigningCertURL\" : \"<https://sns.us-east-1.amazonaws.com/SimpleNotificationService-#########.pem>\",\n  \"UnsubscribeURL\" : \"<https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe>\u0026SubscriptionArn=arn:aws:sns:us-east-1:##########:am-flyte:b######-860f-414d-bac4-5fe3e7dd2186\"\n}]",
    "ts": "2023-09-21T19:08:28Z"
}
{
    "json": {
        "src": "aws_processor.go:59"
    },
    "level": "info",
    "msg": "debugb64 snsJSONFormat [map[Message:CiJwZXRlci5rbGluZ2VsaG9mZXJAZW5lcmd5dmF1bHQuY29tEiJwZXRlci5rbGluZ2VsaG9mZXJAZW5lcmd5dmF1bHQuY29tGhZoaSB0aGlzIGlzIHRoZSBzdWJqZWN0IhNoaSB0aGlzIGlzIHRoZSBib2R5 MessageId:a85d47a2-1325-547d-ae72-1a2485c93d77 Signature:bIDTSNY2vjvRX7aX17oNTKH9Xh3ckqmy30XB/YU68TAlr42cIsuTDb0xS68V/n3KhrWlhxo4JPRHWLEdy4kP+jWO152M4tud5rx26jpKh0GH5tDhzt0rzebKrLdodNOnIyM5SKb8MzUZ36I699JFnIw8wtVkfUfgN6ciRIxTJAq9HqDq8f3DwLOu/OKl3SZSKkvRJecDxXgRdhHAA3oLCcpBVZvdG2UVikPNpHxJfzzl4V6IUJ7Ze30UWTr3BTHVWbBUsbcrR8J5jTMDjmSr+pUD8iU/ppJ3AXHc4PcM/3uUQCDYxMg3Pl6PXFLpg1C28v0ikKBBCbpUeC1/BrDtbQ== SignatureVersion:1 SigningCertURL:<https://sns.us-east-1.amazonaws.com/SimpleNotificationService-##########.pem> Subject:flyteidl.admin.EmailNotification Timestamp:2023-09-21T19:08:28.909Z TopicArn:arn:aws:sns:us-east-1:##########:am-flyte Type:Notification UnsubscribeURL:<https://sns.us-east-1.amazonaws.com/?Action=Unsubscribe>\u0026SubscriptionArn=arn:aws:sns:us-east-1:##########:am-flyte:######-860f-414d-bac4-5fe3e7dd2186]]",
    "ts": "2023-09-21T19:08:28Z"
}
{
    "json": {
        "src": "aws_processor.go:88"
    },
    "level": "info",
    "msg": "debugb64 Decoded valueString [CiJwZXRlci5rbGluZ2VsaG9mZXJAZW5lcmd5dmF1bHQuY29tEiJwZXRlci5rbGluZ2VsaG9mZXJAZW5lcmd5dmF1bHQuY29tGhZoaSB0aGlzIGlzIHRoZSBzdWJqZWN0IhNoaSB0aGlzIGlzIHRoZSBib2R5] to [[10 34 112 101 116 101 114 46 107 108 105 110 103 101 108 104 111 102 101 114 64 101 110 101 114 103 121 118 97 117 108 116 46 99 111 109 18 34 112 101 116 101 114 46 107 108 105 110 103 101 108 104 111 102 101 114 64 101 110 101 114 103 121 118 97 117 108 116 46 99 111 109 26 22 104 105 32 116 104 105 115 32 105 115 32 116 104 101 32 115 117 98 106 101 99 116 34 19 104 105 32 116 104 105 115 32 105 115 32 116 104 101 32 98 111 100 121]]",
    "ts": "2023-09-21T19:08:28Z"
}
{
    "json": {
        "src": "aws_emailer.go:54"
    },
    "level": "info",
    "msg": "debugb64 Sending email hi this is the body",
    "ts": "2023-09-21T19:08:28Z"
}
{
    "json": {
        "src": "aws_emailer.go:64"
    },
    "level": "debug",
    "msg": "Sent email to [email@address.com] sub: hi this is the subject",
    "ts": "2023-09-21T19:08:29Z"
}
m
Hi, I am using the flyte-core chart and the flyteadmin pods raise this error:
InvalidAddress: The address https://sqs.eu-west-1.amazonaws.com/ is not valid for this endpoint.\n\tstatus code: 404
Any idea what is wrong? PS: sorry that this is not related to what you are currently discussing.
d
@Marti Jorda Roca not sure if related, but there were some recent changes to the notifications config for `flyte-core` in this PR, and I don't think there's been a new release with this included yet
m
thanks I will try to downgrade the helm version
y
@Marti Jorda Roca let’s start a different thread for that.
i think maybe separate, and even if not, good to have all that context separate
@Peter Klingelhofer i’m not really sure what’s happening. these log lines show the correct thing happening.
if you decode the base64 shown in the message, you can see the structure of the `admin.EmailMessage` object, and ultimately, this is the log line that is basically right before the information is sent off to SES, and it is correctly showing `hi this is the body`
the only logic that happens after that is the `FlyteEmailToSesEmailInput` function, which just extracts the email.Body
we can add one more log line if you want, to just dump the final `ses.SendEmailInput` object before it’s sent, but I can’t see that being an issue
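For anyone who wants to inspect such a payload by hand, the sketch below unwraps an SNS-style JSON envelope, base64-decodes its `Message` field and lists the length-delimited protobuf fields, using only the standard library. The envelope and addresses are made-up stand-ins built inside the example, not the real message from the logs, and the parser assumes the message contains only string fields (which matches the byte dump above, where every key has wire type 2). It does not name the fields; mapping field numbers to the `admin.EmailMessage` definition is left to the flyteidl proto file.

```python
import base64
import json
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one protobuf varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def length_delimited_fields(buf: bytes) -> List[Tuple[int, str]]:
    """List (field_number, text) for a message made only of string fields."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:  # 2 = length-delimited (strings, bytes, sub-messages)
            raise ValueError(f"unexpected wire type {wire_type}")
        length, pos = read_varint(buf, pos)
        fields.append((field_number, buf[pos:pos + length].decode("utf-8")))
        pos += length
    return fields


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one short string field (demo helper; payload must be < 128 bytes)."""
    data = text.encode("utf-8")
    assert len(data) < 128 and field_number < 16
    return bytes([(field_number << 3) | 2, len(data)]) + data


# Hypothetical stand-in for the envelope seen in the debug logs.
payload = b"".join([
    encode_string_field(1, "to@example.com"),
    encode_string_field(2, "from@example.com"),
    encode_string_field(3, "hi this is the subject"),
    encode_string_field(4, "hi this is the body"),
])
envelope = json.dumps({
    "Type": "Notification",
    "Subject": "flyteidl.admin.EmailNotification",
    "Message": base64.b64encode(payload).decode("ascii"),
})

message = json.loads(envelope)["Message"]
decoded = length_delimited_fields(base64.b64decode(message))
for number, text in decoded:
    print(number, text)
```

This also accounts for the stray `"` characters in the hand-decoded string earlier in the thread: in the byte dump each address is preceded by a length byte of 34, which is the ASCII code for a double quote.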
p
@Yee thanks for looking into it. Not too worried about base64 decoding it if I have to, but the fact that I can’t use ‘workflow’ and those other variables are undefined in the helm chart is the main thing blocking notifications being useful over here. Any ideas as to how we could debug that, since it seems to be appearing inline as we would expect?
y
@Peter Klingelhofer sorry what are you talking about? is there a missing helm chart configuration?
is that something you think you could help add or would you need us to?
let’s start a new thread for that. this one is getting a bit long.
p
@Yee it’s the same error that @Marti Jorda Roca was originally getting (hence why I chimed in in this thread 😀) - ‘function workflow is not defined’ if you try to use the ‘workflow’ (or any of the other variables in the docs) in the template literal (like the example in the docs).
Happy to start a new thread if that’s preferable, but that error is the same as the one the original poster was getting :)
y
new thread plz
always better to re-synthesize
p
Solution via escaping brackets here in the new thread: https://flyte-org.slack.com/archives/CP2HDHKE1/p1695826300502159
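The linked thread is not reproduced here, so the following is only a sketch of one common way to escape braces. It assumes the chart passes the inline config through Helm's Go templating (the `parse error` at `deployment.yaml` suggests this) and that wrapping each `{{` and `}}` in a Go template string literal is what the fix amounts to. `escape_for_helm` is a hypothetical helper name.

```python
import re


def escape_for_helm(template: str) -> str:
    """Wrap every '{{' and '}}' in a Go template string literal.

    Helm then renders {{ "{{" }} back to a literal '{{' instead of trying
    to evaluate 'workflow', 'phase', etc. as template functions.
    """
    # One regex pass, so braces introduced by the replacement are not re-escaped.
    return re.sub(r"\{\{|\}\}", lambda m: '{{ "%s" }}' % m.group(0), template)


subject = 'Notice: Execution "{{ workflow.name }}" has {{ phase }} in "{{ domain }}".'
escaped = escape_for_helm(subject)
print(escaped)
```

Because the escaped text contains double quotes, it would need to go into the values file as a single-quoted or block scalar so the YAML stays valid.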