#ask-the-community

Terence Kent

02/28/2024, 1:30 AM
👋 Two more sanity-check questions. Just want to be sure we're "doing it right".

Q1: We have lots of tasks where there are two categories of information we want out of them:
• The useful output for downstream tasks. Usually, this is just a `FlyteFile` or `FlyteDirectory` or `str`.
• Information about the task process & output. For example, `records_scanned`, `batches_written`, or `uncompressed_content_size`.
To handle this, we keep defining `NamedTuple`s. This works, but I'm suspicious there is a better way I just haven't come across.

Q2: During task operations, it's often useful to see some stat sampling/logging. For longer-running tasks especially, it saves a lot of time if we can see some status output of the task before moving on to other work. To handle this, we've been writing out status information via stdout in our tasks and then viewing the container logs. This works too, but it does feel a bit odd. Are others doing about the same thing? Or are there better solutions I've just been missing?

Samhita Alla

02/28/2024, 10:15 AM
hi! Q1: when returning multiple outputs from a task, it's common to use namedtuple or dataclass. Q2: have you tried using gate nodes? https://docs.flyte.org/en/latest/user_guide/advanced_composition/waiting_for_external_inputs.html#waiting-for-external-inputs
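A minimal sketch of the namedtuple-plus-dataclass pattern mentioned for Q1, written in plain Python so it runs without flytekit installed. `ScanStats`, `ScanOutputs`, `scan`, and the output path are hypothetical names for illustration; the stats fields are the ones from the question. In a real Flyte project the function would carry the `@task` decorator and the path would typically be a `FlyteFile` or `FlyteDirectory`; whether a dataclass needs extra serialization support depends on the flytekit version in use.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class ScanStats:
    # Hypothetical stats container; field names come from the question.
    records_scanned: int
    batches_written: int
    uncompressed_content_size: int


class ScanOutputs(NamedTuple):
    # Primary output consumed by downstream tasks (a plain str path here;
    # in Flyte this would typically be a FlyteFile or FlyteDirectory).
    result_path: str
    # Everything describing the run itself travels as a single value.
    stats: ScanStats


def scan(records: list[str], batch_size: int = 2) -> ScanOutputs:
    # Assumption: in a Flyte project this function would be decorated
    # with @task; the decorator is omitted to keep the sketch standalone.
    batches = -(-len(records) // batch_size)  # ceiling division
    size = sum(len(r.encode("utf-8")) for r in records)
    return ScanOutputs(
        result_path="/tmp/scan-output.txt",  # placeholder path
        stats=ScanStats(
            records_scanned=len(records),
            batches_written=batches,
            uncompressed_content_size=size,
        ),
    )
```

Grouping the stats into one dataclass keeps the named tuple at two fields no matter how many counters a task reports, so adding a new stat does not change the task's output signature.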

Terence Kent

02/28/2024, 5:16 PM
@Samhita Alla - Thanks for the response! For Q2, I'm not sure I follow how gate nodes would help with what I'm referring to. I can see how that would help when passing data into tasks and especially approvals (very cool feature, btw). However, I'm looking more for per-task observability, specifically around the progress of a task. Flyte's API/UI does a great job of showing the DAG, how long tasks took, etc. But I don't know of any way to track how a single task is progressing other than looking at the pod logs. Today, we are mixing sampled stats & logs by writing them as JSON lines on stdout. This works great because logs are neatly separated per task execution, but it can get a little hard to read and it feels like we're missing something. I've noticed the monitoring docs and even that `stats` object available from `current_context()` (here) - but these do not refer to per-task stats. They seem to be my "flyte general stats".
All this is to say: What we have is working ok, but I want to be sure there isn't a better pattern for it that we're just missing.
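For reference, the JSON-lines-on-stdout approach described above might look roughly like the following. This is only a sketch using the standard library, not a Flyte API: `emit_progress`, `process`, and the `ts`/`stage` keys are hypothetical names chosen for illustration.

```python
import json
import sys
import time
from typing import Any, Optional, TextIO


def emit_progress(stage: str, stream: Optional[TextIO] = None, **stats: Any) -> str:
    # Write one JSON object per line so the container log stays
    # machine-parseable; flush so the line shows up while the task runs.
    record = {"ts": time.time(), "stage": stage, **stats}
    line = json.dumps(record, sort_keys=True)
    print(line, file=stream if stream is not None else sys.stdout, flush=True)
    return line


def process(records: list[str], sample_every: int = 2) -> int:
    # Hypothetical task body: emit a sampled stat every `sample_every` records.
    scanned = 0
    for scanned, _ in enumerate(records, start=1):
        if scanned % sample_every == 0:
            emit_progress("scan", records_scanned=scanned, total=len(records))
    emit_progress("done", records_scanned=scanned)
    return scanned
```

Keeping every line a single JSON object means the pod logs can be filtered by `stage` or plotted by `ts` with ordinary log tooling, which helps with the readability concern without changing where the output goes.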