Calvin Leather
08/18/2022, 11:51 AM
Dataset does not exist key warnings in the datacatalog logs. We have other map tasks that also ran uncached a little earlier than this one, and they didn't hit this error (i.e., from that fact plus the code, we're pretty sure this isn't the normal "cache missed" message).
The map task in question takes a FlyteFile (i.e., a List[FlyteFile] is passed to map_task()) and returns an int. Maybe this has something to do with FlyteFiles and the data catalog?
Will put full panic trace in thread.
Chris Antenesse
08/18/2022, 10:32 PM
pod that Flyte creates.” Does that custom pod template get created in the flytepropeller deployment? What should be included in that template? I'm assuming it should specify an image that has a docker config with the auth data, but then wouldn't that image have to be public? So I think I'm missing something there.
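For reference, registry credentials usually reach a pod through the `imagePullSecrets` field of its spec (pointing at a docker-registry secret in the same namespace), not through an image that carries a docker config. Below is a minimal sketch that builds such a PodTemplate manifest as a Python dict and prints it as JSON for `kubectl apply -f -`. The template name `flyte-template`, the placeholder container, the namespace and the secret name `ghcr` are hypothetical, and how FlytePropeller merges this template into task pods should be checked against the Flyte docs.

```python
import json
from typing import Any, Dict


def build_pod_template(name: str, namespace: str, pull_secret: str) -> Dict[str, Any]:
    """Return a PodTemplate manifest that references an existing docker-registry secret.

    All names are hypothetical placeholders; the secret must already exist in `namespace`.
    """
    return {
        "apiVersion": "v1",
        "kind": "PodTemplate",
        "metadata": {"name": name, "namespace": namespace},
        "template": {
            "spec": {
                # Assumption: the API server validates the embedded pod spec,
                # so one placeholder container is kept in the template.
                "containers": [{"name": "placeholder", "image": "placeholder"}],
                # Credentials come from the referenced secret, not from any image.
                "imagePullSecrets": [{"name": pull_secret}],
            }
        },
    }


if __name__ == "__main__":
    manifest = build_pod_template("flyte-template", "flytesnacks-development", "ghcr")
    print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```

Keeping the secret reference in the template means the image itself can stay private; only the secret holds the credentials.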
• The docs then say “Update FlytePropeller about the pod created in the previous step.” Does that mean updating the flyte-propeller-config configmap value co-pilot.yaml.plugins.k8s.default-pod-template-name to whatever the name value was in the pod template defined in the previous step?
Mücahit
08/19/2022, 9:17 AM
Eric Hsiao
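08/19/2022, 2:28 PM
from flytekit import Secret, current_context, task
@task(secret_requests=[
    Secret(group='hello', key='my_key'),
    Secret(group='hello2', key='my_key'),
])
def print_secret(group: str):
    # Look up 'my_key' in whichever requested secret group is passed in
    sm = current_context().secrets
    secret = sm.get(group, 'my_key')
    print(secret)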
Rupsha Chaudhuri
08/22/2022, 5:36 PM
flyte-pod-webhook instead of flyte-pod-webhook-secret. Is this expected? End result: the pod running the task is unable to access the secrets and fails.
Guilherme
08/23/2022, 9:01 PM
sidecar task, and the pod can be terminated as expected using the pod plugin. However, when running a workflow with a papermill notebook task, the task_type changes to nb-sidecar. After that, the pod for this task cannot be terminated, and I don't know why. It looks like the same issue we see when we launch a workflow without the proper settings for the pod plugin.
A code snippet for the task:
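import os
import pathlib
from flytekit import kwtypes
from flytekitplugins.papermill import NotebookTask
from flytekitplugins.pod import Pod
from kubernetes.client.models import V1Container, V1PodSpec
def generate_pod_spec_for_task():
    # The container name must match primary_container_name below
    primary_container = V1Container(name="primary")
    pod_spec = V1PodSpec(containers=[primary_container])
    return pod_spec
nb = NotebookTask(
    name="simple-nb",
    task_config=Pod(pod_spec=generate_pod_spec_for_task(), primary_container_name="primary"),
    # The notebook sits next to this module
    notebook_path=os.path.join(
        pathlib.Path(__file__).parent.absolute(), "nb-simple.ipynb"
    ),
    inputs=kwtypes(v=float),
    outputs=kwtypes(square=float),
)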
A log message from the flytepropeller pod:
{"json":{"exec_id":"ahkm5jd76h8l2gn46zt6","node":"n1","ns":"flyteexamples-development","res_ver":"209788243","routine":"worker-2","tasktype":"nb-sidecar","wf":"flyteexamples:development:flyte.workflows.simple.nb_to_python_wf"},"level":"warning","msg":"No plugin found for Handler-type [nb-sidecar], defaulting to [container]","ts":"2022-edited"}
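The warning reads as though FlytePropeller found no plugin for the `nb-sidecar` task type and fell back to the plain container handler, which would explain why the pod is not managed the way `sidecar` pods are. The toy model below only mirrors that lookup-then-fallback behaviour as the log line describes it; the registry dict is hypothetical and this is not FlytePropeller's code. Mapping `nb-sidecar` onto the sidecar plugin (for example via a `default-for-task-types` entry in the propeller task-plugins config) is an assumption to verify against the propeller configuration docs.

```python
from typing import Mapping


def resolve_handler(task_type: str, registered: Mapping[str, str], fallback: str = "container") -> str:
    """Return the plugin that handles `task_type`, falling back as the propeller warning describes.

    Toy model only: the lookup table and fallback name are assumptions, not FlytePropeller code.
    """
    plugin = registered.get(task_type)
    if plugin is None:
        print(f"No plugin found for Handler-type [{task_type}], defaulting to [{fallback}]")
        return fallback
    return plugin


# Hypothetical registry of task types that have a dedicated plugin.
REGISTERED = {"container": "container", "sidecar": "sidecar"}

if __name__ == "__main__":
    print(resolve_handler("sidecar", REGISTERED))     # sidecar
    print(resolve_handler("nb-sidecar", REGISTERED))  # warning line, then container
    # Assumed remedy: also route the notebook task type to the sidecar/pod plugin.
    print(resolve_handler("nb-sidecar", {**REGISTERED, "nb-sidecar": "sidecar"}))  # sidecar
```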
Does anyone have any insight into what could be going on?
James Evers
08/24/2022, 9:25 PM
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f167c07ba12604be28d7-n0-0] terminated with exit code (1). Reason [Error]. Message:
tar: Removing leading `/' from member names
Traceback (most recent call last):
  File "/opt/venv/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/venv/lib/python3.8/site-packages/flytekit/bin/entrypoint.py", line 502, in fast_execute_task_cmd
    subprocess.run(cmd, check=True)
  File "/usr/local/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['pyflyte-execute', '--inputs', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-f167c07ba12604be28d7/n0/data/inputs.pb', '--output-prefix', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-f167c07ba12604be28d7/n0/data/0', '--raw-output-data-prefix', 's3://my-s3-bucket/test/n1/f167c07ba12604be28d7-n0-0', '--checkpoint-path', 's3://my-s3-bucket/test/n1/f167c07ba12604be28d7-n0-0/_flytecheckpoints', '--prev-checkpoint', '""', '--dynamic-addl-distro', 's3://my-s3-bucket/ht/flytesnacks/development/XB5EEQT46XRRFMME7UC7ROF2JA======/scriptmode.tar.gz', '--dynamic-dest-dir', '/root', '--resolver', 'flytekit.core.python_auto_container.default_task_resolver', '--', 'task-module', 'feast_integration.feast_workflow', 'task-name', 'dummy_thing']' died with <Signals.SIGKILL: 9>.
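In this traceback the inner `pyflyte-execute` process was terminated by SIGKILL (signal 9), so the exit code 1 is a symptom of the child being killed, not of a Python error in the task. Python's subprocess module reports "killed by signal N" as a negative return code, which the sketch below decodes. Reading SIGKILL as a likely out-of-memory kill (for example the container hitting its memory limit) is a common cause but an assumption here. A container status reason of `OOMKilled` in `kubectl describe pod` would confirm it, and raising the memory requests/limits on the task would then be the usual fix.

```python
import signal
import subprocess
import sys
from typing import Sequence


def describe_returncode(returncode: int) -> str:
    """Explain a subprocess return code; negative values mean the child died from a signal (POSIX)."""
    if returncode < 0:
        sig = signal.Signals(-returncode)
        hint = " (often the OOM killer or a container memory limit)" if sig == signal.SIGKILL else ""
        return f"killed by {sig.name}{hint}"
    return f"exited with code {returncode}"


def run_and_explain(cmd: Sequence[str]) -> str:
    """Run `cmd` with check=True, as the flytekit entrypoint in the traceback does, and explain failures."""
    try:
        subprocess.run(cmd, check=True)
        return "succeeded"
    except subprocess.CalledProcessError as exc:
        return describe_returncode(exc.returncode)


if __name__ == "__main__":
    # A child that SIGKILLs itself reproduces the "died with <Signals.SIGKILL: 9>" case.
    child = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_explain(child))
```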
Rupsha Chaudhuri
08/25/2022, 5:53 AM
continueOnError suppress real errors?
Sebastian
08/25/2022, 11:54 AM
jobs:
  register-flyte-workflows:
    name: Register Flyte workflows
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup flytectl
        uses: unionai-oss/flytectl-setup-action@v0.0.1
      - name: Setup pyflyte
        run: pip install flytekit==1.1.*
      - name: Serialize project
        run: |
          mkdir serialized
          pyflyte --pkgs flyte.workflows serialize --local-source-root . --image ${{ env.DOCKER_IMAGE }} workflows -f serialized
gives the following errors. The 'Setup pyflyte' step prints:
ERROR: googleapis-common-protos 1.56.4 has requirement protobuf<5.0.0dev,>=3.15.0, but you'll have protobuf 3.6.1 which is incompatible.
ERROR: grpcio-status 1.47.0 has requirement protobuf>=3.12.0, but you'll have protobuf 3.6.1 which is incompatible.
ERROR: cookiecutter 2.1.1 has requirement requests>=2.23.0, but you'll have requests 2.22.0 which is incompatible.
ERROR: responses 0.21.0 has requirement urllib3>=1.25.10, but you'll have urllib3 1.25.8 which is incompatible.
but the workflow still continues, and then the 'Serialize project' step crashes with:
Traceback (most recent call last):
  File "/home/runner/.local/bin/pyflyte", line 5, in <module>
    from flytekit.clis.sdk_in_container.pyflyte import main
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/__init__.py", line 164, in <module>
    from flytekit.core.base_sql_task import SQLTask
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/core/base_sql_task.py", line 4, in <module>
    from flytekit.core.base_task import PythonTask, TaskMetadata
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/core/base_task.py", line 28, in <module>
    from flytekit.core.context_manager import ExecutionParameters, FlyteContext, FlyteContextManager, FlyteEntities
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/core/context_manager.py", line 30, in <module>
    from flytekit.clients import friendly as friendly_client  # noqa
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/clients/friendly.py", line 20, in <module>
    from flytekit.models import execution as _execution
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/models/execution.py", line 10, in <module>
    from flytekit.models import security
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/models/security.py", line 11, in <module>
    class Secret(_common.FlyteIdlEntity):
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/models/security.py", line 22, in Secret
    class MountType(Enum):
  File "/home/runner/.local/lib/python3.8/site-packages/flytekit/models/security.py", line 23, in MountType
    ANY = _sec.Secret.MountType.ANY
AttributeError: 'EnumTypeWrapper' object has no attribute 'ANY'
Error: Process completed with exit code 1.
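The pip output shows protobuf 3.6.1 in use while googleapis-common-protos requires protobuf>=3.15.0. The AttributeError is raised when flytekit reads `Secret.MountType.ANY` from the generated protobuf classes, so an outdated protobuf on the runner's Python is a plausible culprit. Sebastian's later workflow in this thread switches to actions/setup-python. A stdlib-only check like the one below can catch this in CI before pyflyte runs. The 3.15.0 minimum is taken from the pip message, and `parse_version` is a simplified stand-in for `packaging.version`. `python -m pip check` gives a broader report of broken requirements.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version: '3.6.1' -> (3, 6, 1), '4.21.0rc1' -> (4, 21, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # stop at a pre-release or local suffix such as 'rc1'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions after padding with zeros so that '3.15' equals '3.15.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_version(dist: str) -> Optional[str]:
    """Version of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("protobuf")
    ok = found is not None and meets_minimum(found, "3.15.0")
    print(f"protobuf {found}: {'OK' if ok else 'older than 3.15.0, flytekit imports may fail'}")
```

In a CI step, ending with `sys.exit(0 if ok else 1)` would turn the printout into a failing check.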
Can someone please help me figure out what's going on? The commands work when I run them locally using pyflyte 1.1.1.
Rupsha Chaudhuri
08/25/2022, 5:57 PM
https://splunk.something.com/en-US/app/search/search?q=search%20index%3Dmy_index%20kubernetes.namespace_name%3D{{.namespace}}%20kubernetes.container_name%3D{{.containerName}}%20kubernetes.pod_name%3D{{.podName}}&display.page.search.mode=smart&dispatch.sample_ratio=1&earliest=-72h&latest=now
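The template uses Go-template-style placeholders ({{.namespace}}, {{.containerName}}, {{.podName}}). To see what a fully rendered link should look like, here is a toy expander that fills those placeholders from a dict and leaves unknown ones untouched. It is an illustration only, not Flyte's implementation. Whether Flyte's log-link templating recognises exactly these variable names and this spacing is worth checking against the FlytePropeller logging configuration docs.

```python
import re
from typing import Mapping
from urllib.parse import quote

# Matches {{.name}} and {{ .name }}: optional spaces, a leading dot, then an identifier.
_PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_log_link(template: str, values: Mapping[str, str]) -> str:
    """Substitute known placeholders (URL-quoted) and leave unknown ones visible for debugging."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            return match.group(0)
        return quote(values[key], safe="")

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    link = "ns%3D{{.namespace}}%20c%3D{{.containerName}}%20p%3D{{ .podName }}"
    print(render_log_link(link, {"namespace": "flytesnacks-development", "podName": "abc-n0-0"}))
```

Because unknown placeholders survive verbatim, a misspelled variable name shows up directly in the rendered link instead of silently disappearing.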
However, the actual Splunk link generated in the Flyte console has none of the templated fields populated.
Sebastian
08/26/2022, 9:02 AM
jobs:
  register-flyte-workflows:
    name: Register Flyte workflows
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      # flytekit needs newer version than 3.8 which ships with ubuntu-latest
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install flytekit==1.1.*
      - name: Setup flytectl
        uses: unionai-oss/flytectl-setup-action@v0.0.1
      - name: Package workflows
        shell: bash
        run: |
          pyflyte \
            --pkgs flyte.workflows package \
            --image ${{ env.DOCKER_IMAGE }} \
            --output ${{ env.FLYTE_PACKAGE }}
      - name: Register workflows
        uses: unionai-oss/flyte-register-action@v0.0.2
        with:
          project: ${{ env.FLYTE_PROJECT }}
          version: ${{ env.VERSION }}
          proto: ${{ env.FLYTE_PACKAGE }}
          domain: ${{ env.FLYTE_DOMAIN }}
          config: ${{ env.FLYTE_CONFIG }}
      # OR
      # - name: Register workflows
      #   shell: bash
      #   run: |
      #     flytectl register files \
      #       --archive ${{ env.FLYTE_ARCHIVE }} \
      #       --project ${{ env.FLYTE_PROJECT }} \
      #       --domain ${{ env.FLYTE_DOMAIN }} \
      #       --config ${{ env.FLYTE_CONFIG }} \
      #       --version ${{ env.VERSION }}
`Package workflows` reports success, but the Register workflows step using the action fails with:
Error: input package have some invalid files. try to run pyflyte package again [flyte-package.tgz]
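Both registration paths complain about the contents of flyte-package.tgz, so one low-cost next step is to compare the archive produced in CI with one produced locally. The sketch below lists the files in the archive with their sizes using only the standard library (compression is auto-detected). The "empty or not .pb" heuristic is an assumption for spotting truncated or unexpected files. It is not the validation that flytectl or the register action actually performs.

```python
import tarfile
from typing import List, Tuple


def list_package_entries(archive_path: str) -> List[Tuple[str, int]]:
    """Return sorted (name, size) pairs for every regular file in a tar archive."""
    with tarfile.open(archive_path, mode="r:*") as tar:
        return sorted((m.name, m.size) for m in tar.getmembers() if m.isfile())


def suspicious_entries(entries: List[Tuple[str, int]]) -> List[str]:
    """Flag empty files and non-.pb files (hypothetical heuristics, not flytectl's rules)."""
    return [name for name, size in entries if size == 0 or not name.endswith(".pb")]
```

For example, running `list_package_entries("flyte-package.tgz")` on both the CI artifact and a local build and diffing the results shows whether CI is producing different or empty protobuf files.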
Running Register workflows using flytectl is even worse. It fails with a bunch of errors like Failed to unmarshal file /tmp/register789499772/00_flyte.workflows.workflow_name.pb, but it fails silently and still reports that it registered the resources successfully. A workflow IS indeed registered on the Flyte server, but it is broken and cannot be run. Packaging and registering work if I run them locally. Please advise on how to proceed with debugging this.
Sebastian Schulze
08/26/2022, 9:10 AM
from flytekit import Resources, task
from flytekitplugins.pod import Pod
from kubernetes.client.models import V1Container, V1PodSpec
@task(
    requests=Resources(mem="512Mi", cpu="1"),
    limits=Resources(mem="2Gi", cpu="1"),
    task_config=Pod(
        pod_spec=V1PodSpec(
            containers=[V1Container(name="primary")],
            service_account="<sa-name>",
            service_account_name="<sa-name>",
        ),
        primary_container_name="primary",
    ),
)
However, when executing the workflow, it seems that Flyte can no longer fetch the serialised Task inputs from the Flyte GCS bucket, and it fails with:
Error from command '['gsutil', 'cp', 'gs://flyte-store/metadata/propeller/default-development-fddb5e602ce594338828/n1/data/inputs.pb', '/tmp/flyte-tz9k8etn/sandbox/local_flytekit/inputs.pb']':
...
raise exceptions.CommunicationError(\napitools.base.py.exceptions.CommunicationError: Could not reach metadata service: Forbidden\n
Interestingly, when I use default as the sa-name, everything works fine, and both k8s service accounts are linked to the same GCP service account.
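On GKE, "Could not reach metadata service: Forbidden" is commonly what the metadata server returns when a Kubernetes service account (KSA) is not permitted to impersonate the Google service account (GSA) under Workload Identity. Annotating a second KSA with the same GSA is not sufficient by itself. The GSA also needs a `roles/iam.workloadIdentityUser` binding for that KSA's member string, which includes the namespace the task pod runs in. The sketch below only builds (does not run) the gcloud and kubectl commands for given values. The project, namespace, KSA and GSA names are placeholders, and whether this is the cause here is an assumption.

```python
import shlex
from typing import List


def workload_identity_member(project_id: str, namespace: str, ksa: str) -> str:
    """IAM member string GKE uses for a Kubernetes service account under Workload Identity."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"


def binding_commands(project_id: str, namespace: str, ksa: str, gsa_email: str) -> List[List[str]]:
    """argv lists (not executed) that let `ksa` in `namespace` act as `gsa_email`."""
    return [
        [
            "gcloud", "iam", "service-accounts", "add-iam-policy-binding", gsa_email,
            "--role", "roles/iam.workloadIdentityUser",
            "--member", workload_identity_member(project_id, namespace, ksa),
        ],
        [
            "kubectl", "annotate", "serviceaccount", ksa,
            "--namespace", namespace,
            f"iam.gke.io/gcp-service-account={gsa_email}",
        ],
    ]


if __name__ == "__main__":
    # Placeholder values; the namespace should be the one the task pod actually runs in.
    for argv in binding_commands("my-project", "default-development", "my-sa",
                                 "flyte-gsa@my-project.iam.gserviceaccount.com"):
        print(shlex.join(argv))
```

Comparing the IAM policy on the GSA for the working `default` KSA against the new one should show whether the second binding is missing.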
I would very much appreciate any pointers towards debugging this or other ways of setting up the Task to be executed with the new service account.
Cheers,
Seb
Chris Antenesse
08/26/2022, 12:57 PM
kubectl edit serviceaccount/default -n flyte
apiVersion: v1
imagePullSecrets:
- name: ghcr
kind: ServiceAccount
metadata:
  creationTimestamp: "2022-08-18T20:42:23Z"
  name: default
  namespace: flyte
  resourceVersion: "13612623"
  uid: 5bc39079-a6a5-4455-ae27-31eaed46c368
secrets:
- name: default-token-mfw8r
Then I created a secret resource that looks like this:
kubectl edit secret/ghcr -n flyte
apiVersion: v1
data:
  .dockerconfigjson: <REDACTED>
kind: Secret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Secret","metadata":{"annotations":{},"name":"ghcr","namespace":"flyte"},"stringData":{".dockerconfigjson":"{\"auths\":{\"ghcr.io\":{\"username\":\"<REDACTED>\",\"password\":\"<REDACTED>\",\"email\":\"<REDACTED>\",\"auth\":\"<REDACTED>\"}}}"},"type":"kubernetes.io/dockerconfigjson"}
  creationTimestamp: "2022-08-25T21:02:59Z"
  name: ghcr
  namespace: flyte
  resourceVersion: "13628687"
  uid: 24dc477e-71cc-42eb-a0be-1bc61af20f5c
type: kubernetes.io/dockerconfigjson
I used the config.json locally and was able to push and pull images to and from the private registry, but when executing a workflow I get an image pull error in the UI (below), and when I describe the pod I get this:
Normal Scheduled 18m default-scheduler Successfully assigned flytesnacks-development/ajjf7v4swhdz7jxzvhwc-n0-0 to ip-10-15-147-196.ec2.internal
Normal Pulling 17m (x4 over 18m) kubelet Pulling image "ghcr.io/predictap/symphony_hall:v0.0.14"
Warning Failed 17m (x4 over 18m) kubelet Failed to pull image "ghcr.io/predictap/symphony_hall:v0.0.14": rpc error: code = Unknown desc = Error response from daemon: Head "https://ghcr.io/v2/predictap/symphony_hall/manifests/v0.0.14": unauthorized
Warning Failed 17m (x4 over 18m) kubelet Error: ErrImagePull
Warning Failed 17m (x6 over 18m) kubelet Error: ImagePullBackOff
Normal BackOff 3m26s (x65 over 18m) kubelet Back-off pulling image "ghcr.io/predictap/symphony_hall:v0.0.14"
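One detail in the output above: the `ghcr` secret and the edited `default` service account live in the `flyte` namespace, while the pod was scheduled in `flytesnacks-development`. Secrets and service accounts are namespaced, and a pod can only reference image pull secrets from its own namespace, so the kubelet never sees these credentials. The sketch below builds (without running) the kubectl commands to recreate the secret in the execution namespace and attach it to that namespace's `default` service account. The secret name `ghcr` comes from the thread; the credentials are placeholders, and each project-domain namespace would need the same treatment.

```python
import json
import shlex
from typing import List


def pull_secret_commands(namespace: str, secret: str, server: str,
                         username: str, password: str) -> List[List[str]]:
    """argv lists (not executed) that make `secret` usable by pods in `namespace`."""
    patch = json.dumps({"imagePullSecrets": [{"name": secret}]})
    return [
        [
            "kubectl", "create", "secret", "docker-registry", secret,
            f"--docker-server={server}",
            f"--docker-username={username}",
            f"--docker-password={password}",
            "--namespace", namespace,
        ],
        # Attach the secret to the service account that execution pods use by default.
        ["kubectl", "patch", "serviceaccount", "default", "--namespace", namespace, "-p", patch],
    ]


if __name__ == "__main__":
    # Placeholder credentials; read real ones from a secret store, not from source code.
    for argv in pull_secret_commands("flytesnacks-development", "ghcr", "ghcr.io", "<user>", "<token>"):
        print(shlex.join(argv))
```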
Is there a good way to troubleshoot this? It seems like the docker config may not be present on the node that’s trying to pull the image (maybe one of the EC2 nodes associated with the EKS cluster?)
Eric Hsiao
08/29/2022, 6:26 PM
Andrew Achkar
08/30/2022, 3:03 PM
Rupsha Chaudhuri
08/31/2022, 4:44 PM
Rupsha Chaudhuri
08/31/2022, 10:28 PM
James Evers
09/01/2022, 2:17 AM
flytescheduler pod in the flyte k8s namespace
Claudio Andres Gauna
09/02/2022, 4:07 AM
Tony Vec
09/02/2022, 8:37 PM
Smriti Satyan
09/05/2022, 9:46 AM
Arshak Ulubabyan
09/06/2022, 1:39 PM
Andrew Achkar
09/08/2022, 5:11 PM
PythonTask and PythonInstanceTask. With the latter, I am having a hard time writing a passing unit test that verifies my workflow serialized correctly. More in 🧵
Hanno Küpers
09/09/2022, 10:23 AM
nodeSelector keyword in the values.yml, is that an option?
seunggs
09/10/2022, 8:38 PM
pyflyte package fails without it? For example, if task1's output is fed into task2 in the workflow code and task1 does not have a return type hint, I get this error:
seunggs
09/10/2022, 8:39 PM
AssertionError: Cannot pass output from task task1 that produces no outputs to a downstream task
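The error is consistent with flytekit deriving a task's outputs from its return type annotation: without one, task1 is treated as producing nothing, so there is nothing to wire into task2. The stdlib-only sketch below models that idea with `inspect` and `typing`. It is a simplified illustration, not flytekit's interface extraction: the `o0`, `o1` names mirror flytekit's default output naming, while NamedTuple outputs and `Tuple[int, ...]` are out of scope.

```python
import inspect
import typing
from typing import Any, Callable, Dict, Tuple


def declared_outputs(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Map output names to types using only the function's return annotation."""
    ret = inspect.signature(fn).return_annotation
    if ret is inspect.Signature.empty or ret is None or ret is type(None):
        return {}  # no annotation (or -> None): nothing to pass downstream
    if typing.get_origin(ret) is tuple:
        return {f"o{i}": t for i, t in enumerate(typing.get_args(ret))}
    return {"o0": ret}


def task1_without_hint(x: int):
    return x + 1


def task1_with_hint(x: int) -> int:
    return x + 1


def task1_returns_pair(x: int) -> Tuple[int, str]:
    return x, str(x)
```

Here `declared_outputs(task1_without_hint)` is empty even though the function returns a value at runtime, which is the situation the AssertionError describes. Adding `-> int` gives it a single `o0` output.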
seunggs
09/10/2022, 8:39 PM
Samhita Alla
09/11/2022, 1:13 PM
flytectl sandbox start -- source . He isn’t able to resolve the error when re-running the command.
Raunak Chowdhuri
09/11/2022, 8:28 PM
Raunak Chowdhuri
09/11/2022, 8:31 PM
Raunak Chowdhuri
09/11/2022, 8:31 PM
Ketan (kumare3)
09/11/2022, 11:04 PM
Raunak Chowdhuri
09/11/2022, 11:07 PM
Ketan (kumare3)
09/11/2022, 11:11 PM