when running a `pytorch` job, the cloudwatch log l...
# flyte-support
c
When running a `pytorch` job, the cloudwatch log link is broken because it is missing some variables, as you can see in the screenshot. Currently I'm defining the template like this:
`https://console.aws.amazon.com/cloudwatch/home?region=${props.cluster.env.region}#logEventViewer:group=/aws/containerinsights/${props.cluster.clusterName}/application;stream={{ .nodeName }}-application.var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}-{{ .containerId }}.log`
This works fine for other regular python tasks. Is there a way I can get the logs to work properly for pytorch?
I see the logs definitely exist for the pytorch jobs too with the same format as defined above - it just seems like the variables `nodeName`, `containerName`, and `containerId` are missing for this job type
t
how is the template defined in the k8s propeller config?
c
If you are willing to contribute, you could refer to this PR I made a while ago to add support for some other templating vars in the kubeflow plugin.
If you are up to contributing but don’t know how to get started, I’m happy to jump on a call 🙂
These are the vars I would expect are available:
if taskType == PytorchTaskType && hasMaster {
	masterTaskLog, masterErr := logPlugin.GetTaskLogs(
		tasklog.Input{
			PodName:              name + "-master-0",
			Namespace:            namespace,
			LogName:              "master",
			PodRFC3339StartTime:  RFC3999StartTime,
			PodRFC3339FinishTime: RFC3999FinishTime,
			PodUnixStartTime:     startTime,
			PodUnixFinishTime:    finishTime,
			TaskExecutionID:      taskExecID,
			TaskTemplate:         taskTemplate,
		},
	)
It is true that those are fewer than e.g. here.
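To make the gap concrete, here is a minimal sketch that checks which placeholders in the stream part of the template above have no value when only the pod name and namespace are known. It is not the Flyte implementation: `unresolvedVars`, the regular expression, and the example values are hypothetical and only mirror the `{{ .name }}` syntax used in the template.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// placeholder matches template variables such as {{ .podName }}.
// Assumption: this pattern only imitates the syntax seen in the thread.
var placeholder = regexp.MustCompile(`\{\{\s*\.(\w+)\s*\}\}`)

// unresolvedVars (hypothetical helper) returns the sorted, de-duplicated
// names of placeholders in tmpl that have no entry in vars.
func unresolvedVars(tmpl string, vars map[string]string) []string {
	seen := map[string]bool{}
	for _, m := range placeholder.FindAllStringSubmatch(tmpl, -1) {
		if _, ok := vars[m[1]]; !ok {
			seen[m[1]] = true
		}
	}
	missing := make([]string, 0, len(seen))
	for name := range seen {
		missing = append(missing, name)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Stream part of the template from the question.
	tmpl := "{{ .nodeName }}-application.var.log.containers." +
		"{{ .podName }}_{{ .namespace }}_{{ .containerName }}-{{ .containerId }}.log"

	// Hypothetical values: only pod name and namespace are supplied,
	// mirroring the PodName and Namespace fields set in the snippet above.
	vars := map[string]string{
		"podName":   "example-master-0",
		"namespace": "example-namespace",
	}

	fmt.Println(unresolvedVars(tmpl, vars))
}
```

With these inputs the program lists `containerId`, `containerName`, and `nodeName` as unresolved, which matches the variables reported missing in the broken link.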
c
Awesome, I can give it a try and reach out to set up a quick call if I get blocked. For reference, what time zone are you in?
c
CET
Sounds good, ping me in case you are blocked