# ask-ai
@ancient-wolf-19325 I am getting an error, but in my dataclass I am not specifying `DataClassJsonMixin` or `dataclass_json`:
```
"Failed to extract schema for object <class 'ml_pipeline_components.data_processing.dataset.Image'>, error: maximum recursion depth exceeded while calling a Python object\nPlease remove `DataClassJsonMixin` and `dataclass_json` decorator from the dataclass definition"}
```
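The error means Flytekit could not build a JSON schema for your dataclass because schema generation hit Python's recursion limit. The advice to remove `DataClassJsonMixin` and `dataclass_json` appears to be a generic hint attached to this message, so it can show up even if you use neither. A recursion error at this stage often points to a type that refers back to itself, directly or through another dataclass (for example, a field annotated with the class's own name). According to the Flyte documentation, if you're using Flytekit version >= v1.11.1, you don't need to decorate your dataclass with `@dataclass_json` or inherit from `DataClassJsonMixin`. Would you like more details on how to resolve this issue?
Sources: - Flyte Dataclass Documentation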
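As a quick local check for a self-referencing type, here is a sketch of a hypothetical helper, `find_cycle` (not part of Flytekit), that walks a dataclass's field annotations and reports the first chain of dataclasses that loops back on itself. It only follows dataclass types and the arguments of generics such as `List[...]` and `Optional[...]`; pass `namespace` when string annotations refer to names the helper cannot otherwise resolve. `TreeNode`, `Inner`, and `Outer` are illustrative types.

```python
import dataclasses
import typing
from typing import Any, Dict, Iterator, List, Optional, Tuple


def _nested_types(tp: Any) -> Iterator[Any]:
    """Yield tp plus every type nested inside generics such as List[...] or Optional[...]."""
    yield tp
    for arg in typing.get_args(tp):
        yield from _nested_types(arg)


def find_cycle(
    cls: type,
    namespace: Optional[Dict[str, Any]] = None,
    _path: Tuple[type, ...] = (),
) -> Optional[List[str]]:
    """Return the chain of dataclass names that loops back on itself, or None."""
    if cls in _path:
        return [c.__name__ for c in _path] + [cls.__name__]
    # Resolve string annotations such as "TreeNode" into real classes
    hints = typing.get_type_hints(cls, localns=namespace)
    for fld in dataclasses.fields(cls):
        for tp in _nested_types(hints[fld.name]):
            if isinstance(tp, type) and dataclasses.is_dataclass(tp):
                cycle = find_cycle(tp, namespace, _path + (cls,))
                if cycle:
                    return cycle
    return None


# Example types: one refers to itself, one is ordinary nesting
@dataclasses.dataclass
class TreeNode:
    name: str
    children: List["TreeNode"] = dataclasses.field(default_factory=list)


@dataclasses.dataclass
class Inner:
    a: int


@dataclasses.dataclass
class Outer:
    x: int
    y: Inner


if __name__ == "__main__":
    print(find_cycle(TreeNode, namespace=globals()))  # ['TreeNode', 'TreeNode']
    print(find_cycle(Outer, namespace=globals()))  # None
```

If `find_cycle` returns a chain for your `Image` dataclass, that reference is a likely culprit; replacing it with something non-recursive (for example, an ID) may let Flytekit extract the schema.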
@ancient-wolf-19325 How do I check my Flytekit version?
To check your Flytekit version, query the installed package from the same environment that runs your code. Flytekit does not install a `flytekit` command, so `flytekit --version` will not work; either of these does:
```
pip show flytekit
python -c "import flytekit; print(flytekit.__version__)"
```
Both print the version of Flytekit installed in your environment. Would you like more details on this?
Sources: - Flytekit CLI Documentation
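To check the version from Python, for example against the v1.11.1 threshold mentioned earlier, here is a standard-library sketch. `installed_version` and `at_least` are hypothetical helpers; the comparison only looks at the first three numeric parts and ignores pre-release suffixes, so `packaging.version.Version` is the more rigorous choice if that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist: str = "flytekit") -> Optional[str]:
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def release_tuple(v: str) -> Tuple[int, ...]:
    """Turn '1.13.0b2' into (1, 13, 0); missing parts count as 0, suffixes are ignored."""
    nums = []
    for piece in (v.split(".") + ["0", "0"])[:3]:
        m = re.match(r"\d+", piece)
        nums.append(int(m.group()) if m else 0)
    return tuple(nums)


def at_least(found: str, minimum: str) -> bool:
    return release_tuple(found) >= release_tuple(minimum)


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("flytekit is not installed in this environment")
    else:
        print(f"flytekit {v}; plain dataclasses supported: {at_least(v, '1.11.1')}")
```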
@ancient-wolf-19325 My Loguru logs are not showing in the Flyte console.
The Flyte console does not display a task's output stream itself; it shows links to wherever the pod's logs are stored (for example Kubernetes, CloudWatch, or Stackdriver), as configured in your Flyte deployment. According to Flyte's logging documentation, you set this up by connecting Flyte to such an external logging system. On the Python side, Loguru only has to write to the container's standard output or error (its default sink is `stderr`, which already qualifies). Here is an example that configures Loguru in a Flyte task and reuses Flytekit's logging level:
```
import os
import sys

from flytekit import task
from loguru import logger

# Flytekit reads FLYTE_SDK_LOGGING_LEVEL as a number (e.g. 20 = INFO, 10 = DEBUG);
# Loguru accepts the same numeric levels, so reuse the value for its sink.
level = int(os.getenv("FLYTE_SDK_LOGGING_LEVEL", "20"))

# Replace Loguru's default stderr sink with one that writes to stdout
logger.remove()
logger.add(sys.stdout, level=level)


# Example task
@task
def my_task():
    logger.info("This is an info message")
    logger.warning("This is a warning message")
    logger.error("This is an error message")
```
Set the `FLYTE_SDK_LOGGING_LEVEL` environment variable to the desired level as a number (e.g. 20 for INFO, 10 for DEBUG). This gets the messages into the pod's logs; to reach them from the Flyte console, log links must also be configured. Would you like more details on configuring Flyte for external logging systems?
Sources: - https://github.com/flyteorg/flyte/issues/4830 - https://github.com/flyteorg/flyte/issues/4829
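The same setup can be sketched with the standard `logging` module if you prefer not to depend on Loguru. This is a sketch under assumptions: `resolve_level`, `configure_stdout_logging`, and the logger name `my_pipeline` are hypothetical, and `resolve_level` also accepts level names such as `debug` for convenience, although Flytekit itself expects a number in `FLYTE_SDK_LOGGING_LEVEL`. The integer it returns can also be passed to Loguru's `logger.add(..., level=...)`.

```python
import logging
import os
import sys
from typing import Optional


def resolve_level(raw: Optional[str], default: int = logging.INFO) -> int:
    """Accept '20', 'debug', 'INFO', or None and return a numeric logging level."""
    if raw is None or not raw.strip():
        return default
    value = raw.strip()
    if value.isdigit():
        return int(value)
    level = logging.getLevelName(value.upper())  # returns an int for known names
    if not isinstance(level, int):
        raise ValueError(f"unknown logging level: {raw!r}")
    return level


def configure_stdout_logging() -> logging.Logger:
    """Send log records to stdout, where the container runtime collects them."""
    log = logging.getLogger("my_pipeline")  # hypothetical logger name
    log.setLevel(resolve_level(os.getenv("FLYTE_SDK_LOGGING_LEVEL")))
    if not log.handlers:  # avoid duplicate handlers if called twice
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        log.addHandler(handler)
    return log
```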
@ancient-wolf-19325 My dataclass has another dataclass nested inside it. Does Flyte support this?
Flyte currently has limited support for nested dataclasses. According to this GitHub issue, accessing attributes of a nested dataclass inside a workflow (for example, `outer.y.a`) can lead to type mismatches during workflow compilation. This limitation forces users to write extra wrappers for inputs and outputs, which complicates the workflow. Would you like more details on potential workarounds or further explanation of the issue?
Sources: - BUG: Accessing attributes fails on complex types - Docs: Explicitly note that nested dataclasses are not supported in the type system
@ancient-wolf-19325 If I want to pass around a nested dataclass, how can I do that in Flyte?
One workaround is to serialize the nested dataclass into a JSON string and deserialize it back into a dataclass inside the task that needs it. This sidesteps the current limitations of Flyte's type system around nested dataclasses. Here's an example:
1. Define your nested dataclasses:
```
from dataclasses import dataclass

from mashumaro.mixins.json import DataClassJSONMixin


@dataclass
class InnerDataClass(DataClassJSONMixin):
    a: int
    b: str


@dataclass
class OuterDataClass(DataClassJSONMixin):
    x: int
    y: InnerDataClass
```
2. Serialize and deserialize the dataclasses in your tasks:
```
# Assumes InnerDataClass and OuterDataClass from step 1 are defined in this module
from flytekit import task, workflow


@task
def serialize_dataclass(data: OuterDataClass) -> str:
    # to_json() comes from mashumaro's DataClassJSONMixin
    return data.to_json()


@task
def deserialize_dataclass(data_str: str) -> OuterDataClass:
    return OuterDataClass.from_json(data_str)


@workflow
def my_workflow() -> OuterDataClass:
    inner = InnerDataClass(a=1, b="test")
    outer = OuterDataClass(x=10, y=inner)
    serialized = serialize_dataclass(data=outer)
    deserialized = deserialize_dataclass(data_str=serialized)
    return deserialized
```
3. Run the workflow:
```
if __name__ == "__main__":
    result = my_workflow()
    print(result)
```
This approach passes the nested dataclass between tasks as a plain JSON string, which Flyte handles like any other `str`. Would you like more details on this approach or any other specific part of the implementation?
Sources: - BUG: Accessing attributes fails on complex types
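To see what the JSON-string round trip does without any extra dependency, here is a standard-library-only sketch. It is an illustration, not what Flyte does internally: `dump` and `load` are hypothetical helper names, and unlike mashumaro, the nested type has to be rebuilt by hand. Calling `json.dumps` on the dataclass itself raises `TypeError`, because `json` does not know how to encode dataclass instances; `dataclasses.asdict` converts it to nested dicts first.

```python
import dataclasses
import json


@dataclasses.dataclass
class InnerDataClass:
    a: int
    b: str


@dataclasses.dataclass
class OuterDataClass:
    x: int
    y: InnerDataClass


def dump(outer: OuterDataClass) -> str:
    # asdict() turns nested dataclasses into nested dicts, which json can encode
    return json.dumps(dataclasses.asdict(outer))


def load(data: str) -> OuterDataClass:
    raw = json.loads(data)
    # json.loads gives back plain dicts, so the inner dataclass is rebuilt explicitly
    return OuterDataClass(x=raw["x"], y=InnerDataClass(**raw["y"]))


if __name__ == "__main__":
    outer = OuterDataClass(x=10, y=InnerDataClass(a=1, b="test"))
    payload = dump(outer)
    print(payload)  # {"x": 10, "y": {"a": 1, "b": "test"}}
    print(load(payload) == outer)  # True
```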
@ancient-wolf-19325 I am trying to pass an enum in the workflow params, use its value, and pass it to a task, but I am getting an error:
KeyError: "Error encountered while executing 'prepare_dataset_workflow':\n Promise(node:.dataset_type.['value'])"
The `Promise(node:.dataset_type.['value'])` part of the error shows that `.value` was accessed on the enum input inside the workflow body. Inside a `@workflow`, inputs are promises rather than actual enum members, so `.value` cannot be resolved there. Pass the enum itself to the task and read `.value` inside the task. Flyte supports enums as task and workflow inputs and outputs, as long as the enum's values are strings. Here's a step-by-step example:
1. Define the Enum:
```
from enum import Enum

from flytekit import task, workflow


class DatasetType(Enum):
    IMAGE = "image"
    TEXT = "text"
```
2. Define the Task:
```
@task
def process_dataset(dataset_type: DatasetType) -> str:
    # Inside a task, dataset_type is a real DatasetType member, so .value works
    return f"Processing dataset of type: {dataset_type.value}"
```
3. Define the Workflow:
```
@workflow
def prepare_dataset_workflow(dataset_type: DatasetType) -> str:
    # Pass the enum through as-is; reading .value here is what produces the Promise error
    return process_dataset(dataset_type=dataset_type)
```
4. Run the Workflow:
```
if __name__ == "__main__":
    result = prepare_dataset_workflow(dataset_type=DatasetType.IMAGE)
    print(result)
```
Read `.value` only inside a task, where the input is a real `DatasetType` member; in the workflow body, pass the enum through unchanged. Also make sure every task and workflow imports the enum from the same module. Would you like more details on this example or any specific part of the implementation?
Sources: - Flyte Enum Type Documentation - Flytesnacks Enum Example
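Because Flyte accepts only enums with string values and transports a member as that string, a quick local check can catch problems before registering a workflow. The sketch below runs without Flyte: `check_string_values` and `describe` are hypothetical helpers, and `describe` stands in for the body of a task, where the input is already a real enum member.

```python
from enum import Enum


class DatasetType(Enum):
    IMAGE = "image"
    TEXT = "text"


def check_string_values(enum_cls: type) -> None:
    """Raise TypeError if any member's value is not a str."""
    bad = [member.name for member in enum_cls if not isinstance(member.value, str)]
    if bad:
        raise TypeError(f"{enum_cls.__name__} has non-string values: {bad}")


def describe(dataset_type: DatasetType) -> str:
    # The kind of code that belongs inside a @task: .value is safe here
    return f"Processing dataset of type: {dataset_type.value}"


if __name__ == "__main__":
    check_string_values(DatasetType)  # passes silently
    # Looking a member up by its string value gives back the same member
    member = DatasetType("image")
    print(member is DatasetType.IMAGE)  # True
    print(describe(member))  # Processing dataset of type: image
```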
@ancient-wolf-19325 I am manually serializing the dataclass with `json.dumps` but getting an error:
RecursionError: maximum recursion depth exceeded while calling a Python object
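A `RecursionError: maximum recursion depth exceeded while calling a Python object` during serialization usually means the object graph refers back to itself (for example, a child object holding a reference to its parent) or is nested extremely deeply. Since `json.dumps` cannot serialize a dataclass directly, the recursion most likely happens in whatever converts it to a dict first, such as `dataclasses.asdict` or a custom `default=` function. For ordinary nesting, you can use the `mashumaro` library, which Flyte already uses for dataclass serialization and deserialization. Here's how to use `mashumaro` to serialize and deserialize your dataclass:
1. Define your dataclasses with `mashumaro`: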
```
from dataclasses import dataclass

from mashumaro.mixins.json import DataClassJSONMixin


@dataclass
class InnerDataClass(DataClassJSONMixin):
    a: int
    b: str


@dataclass
class OuterDataClass(DataClassJSONMixin):
    x: int
    y: InnerDataClass
```
2. Serialize and deserialize using `mashumaro`:
```
# Assumes InnerDataClass and OuterDataClass from step 1 are defined in this module
from flytekit import task, workflow


@task
def serialize_dataclass(data: OuterDataClass) -> str:
    return data.to_json()


@task
def deserialize_dataclass(data_str: str) -> OuterDataClass:
    return OuterDataClass.from_json(data_str)


@workflow
def my_workflow() -> OuterDataClass:
    inner = InnerDataClass(a=1, b="test")
    outer = OuterDataClass(x=10, y=inner)
    serialized = serialize_dataclass(data=outer)
    deserialized = deserialize_dataclass(data_str=serialized)
    return deserialized
```
3. Run the workflow:
```
if __name__ == "__main__":
    result = my_workflow()
    print(result)
```
`mashumaro` handles nested dataclasses without extra code, but it does not detect circular references: if an object points back to one of its ancestors, `to_json()` recurses just as `dataclasses.asdict` does. In that case, break the cycle before serializing, for example by dropping the back-reference and restoring it after loading. Would you like more details on this approach or any other specific part of the implementation?
Sources: - Flyte Dataclass Documentation
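Here is a standard-library sketch of the cycle problem and one way around it. `Parent`, `Child`, and the two helper functions are hypothetical stand-ins for your own types: `dataclasses.asdict` hits the recursion limit on the circular graph, while serializing without the back-reference and restoring it after loading works. The same idea should carry over to mashumaro: leave the back-reference field out of the serialized data, or replace it with an identifier.

```python
import dataclasses
import json
from typing import List, Optional


@dataclasses.dataclass
class Child:
    name: str
    # Back-reference to the owner: this is what makes the object graph circular
    parent: Optional["Parent"] = None


@dataclasses.dataclass
class Parent:
    name: str
    children: List[Child] = dataclasses.field(default_factory=list)


def to_json_without_backrefs(parent: Parent) -> str:
    """Serialize Parent, leaving out each child's `parent` pointer to break the cycle."""
    return json.dumps({
        "name": parent.name,
        "children": [{"name": child.name} for child in parent.children],
    })


def from_json_with_backrefs(data: str) -> Parent:
    """Rebuild Parent from JSON and restore the back-references afterwards."""
    raw = json.loads(data)
    parent = Parent(name=raw["name"])
    parent.children = [Child(name=c["name"], parent=parent) for c in raw["children"]]
    return parent


if __name__ == "__main__":
    root = Parent(name="root")
    root.children.append(Child(name="a", parent=root))
    try:
        dataclasses.asdict(root)  # follows root -> child -> root -> ... until the limit
    except RecursionError:
        print("asdict hit the recursion limit on the cycle")
    payload = to_json_without_backrefs(root)
    print(payload)  # {"name": "root", "children": [{"name": "a"}]}
    restored = from_json_with_backrefs(payload)
    print(restored.children[0].parent is restored)  # True
```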
@ancient-wolf-19325 I am getting this after serializing my dataclass and trying to pass it to another task:
```
Workflow[krystal-test:development:workflows.prepare_dataset.prepare_dataset_workflow] failed. RuntimeExecutionError: max number of system retry attempts [11/10] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: output file @[s3://flyte-metadata.dev-cluster.degould.com/metadata/propeller/krystal-test-development-ax2mzg5vrzh2rjvs9mxp/n0/data/0/outputs.pb] is too large [3282219] bytes, max allowed [2097152] bytes
```
The error means the output of your task exceeds the maximum size Flyte accepts for outputs stored in its metadata store: your output is 3,282,219 bytes (about 3.1 MiB), while the limit is 2,097,152 bytes (2 MiB). A JSON string is stored inline in `outputs.pb`, so serializing to a string does not make the output any smaller. To handle large outputs, use Flyte's built-in support for offloading data to an object store (e.g., S3) with `FlyteFile` or `FlyteDirectory`: the data is stored externally and only a reference is passed between tasks. Here's how you can modify your workflow to use `FlyteFile` for large outputs:
1. Modify your tasks to use `FlyteFile`:
```
from dataclasses import dataclass

from flytekit import task, workflow
from flytekit.types.file import FlyteFile
from mashumaro.mixins.json import DataClassJSONMixin


@dataclass
class InnerDataClass(DataClassJSONMixin):
    a: int
    b: str


@dataclass
class OuterDataClass(DataClassJSONMixin):
    x: int
    y: InnerDataClass


@task
def serialize_dataclass(data: OuterDataClass) -> FlyteFile:
    # Write the JSON to a local file; Flyte uploads it to the object store
    # and passes only a reference to downstream tasks
    file_path = "/tmp/serialized_data.json"
    with open(file_path, "w") as f:
        f.write(data.to_json())
    return FlyteFile(file_path)


@task
def deserialize_dataclass(file: FlyteFile) -> OuterDataClass:
    # download() fetches the file locally and returns its path
    with open(file.download(), "r") as f:
        data_str = f.read()
    return OuterDataClass.from_json(data_str)
```
2. Modify your workflow to use the updated tasks:
```
@workflow
def my_workflow() -> OuterDataClass:
    inner = InnerDataClass(a=1, b="test")
    outer = OuterDataClass(x=10, y=inner)
    serialized_file = serialize_dataclass(data=outer)
    deserialized = deserialize_dataclass(file=serialized_file)
    return deserialized
```
3. Run the workflow:
```
if __name__ == "__main__":
    result = my_workflow()
    print(result)
```
With `FlyteFile`, the large serialized data lives in external storage, so it no longer counts against the size limit for outputs stored in Flyte's metadata. Would you like more details on this approach or any other specific part of the implementation?
Sources: - https://discuss.flyte.org/t/13152245/hi-i-just-experienced-this-error-failed-at-node-n4-bindingre/ - https://github.com/flyteorg/flyte/pull/5607
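If only some payloads are large, a size check can decide when to switch to a file. The sketch below is an approximation under stated assumptions: it measures the JSON string's UTF-8 size against the 2 MiB limit from the error, with an arbitrary `headroom` factor of 0.5 because the stored output adds some protobuf overhead. The helper names are hypothetical; inside a Flyte task, the path returned by `write_to_local_file` is what you would wrap in `FlyteFile(...)`.

```python
import os
import tempfile

# Limit reported in the error message: 2,097,152 bytes = 2 MiB
DEFAULT_LIMIT_BYTES = 2 * 1024 * 1024


def encoded_size(text: str) -> int:
    """Size of the string in UTF-8 bytes (the stored output adds a little overhead)."""
    return len(text.encode("utf-8"))


def needs_offload(text: str, limit_bytes: int = DEFAULT_LIMIT_BYTES, headroom: float = 0.5) -> bool:
    """True if the payload uses more than `headroom` of the limit and should go to a file."""
    return encoded_size(text) > limit_bytes * headroom


def write_to_local_file(text: str) -> str:
    """Write the payload to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```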
@ancient-wolf-19325 Can I not increase the output size limit instead?
Yes, you can raise the maximum allowed output size by setting the `max-output-size-bytes` parameter in the `propeller` section of the Flyte configuration, either through the Helm values file or directly in the configuration files. Here are the steps:
1. If using Helm:
- Update the `values.yaml` file for your Flyte deployment to include `max-output-size-bytes` under the `propeller` configuration:
```
configmap:
  core:
    propeller:
      max-output-size-bytes: 20971520  # 20MB
```
- Apply the changes by upgrading your Helm release:
```
helm upgrade -f values.yaml <release-name> flyteorg/flyte
```
2. If not using Helm:
- Edit the Flyte Propeller configuration directly: open the `flyte-propeller-config` ConfigMap (for example with `kubectl edit configmap flyte-propeller-config -n <namespace>`) and add `max-output-size-bytes` to the existing `propeller` section of `core.yaml`. Keep the rest of that section as it is; the excerpt below shows only the relevant key:
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: flyte-propeller-config
  namespace: <namespace>
data:
  core.yaml: |
    propeller:
      max-output-size-bytes: 20971520  # 20MB
```
- Restart Flyte Propeller so it picks up the new configuration. The deployment name depends on how Flyte was installed, so list the deployments first:
```
kubectl get deployments -n <namespace>
kubectl rollout restart deployment/<propeller-deployment> -n <namespace>
```
Raising `max-output-size-bytes` lets larger outputs be passed between tasks in your Flyte workflows. Would you like more details on this approach or any other specific part of the implementation?
Sources: - https://discuss.flyte.org/t/12964146/hi-there-how-to-increase-the-allowed-output-size-of-the-node/ - https://discuss.flyte.org/t/16023116/hi-everyone-is-it-possible-to-change-https-github-com-flyteo/
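The byte values in these settings are mebibyte multiples (20971520 is 20 × 1024 × 1024). Here is a small sketch for converting between the two, plus a hypothetical `suggested_limit` helper that sizes the limit with headroom above an observed output; the factor of 2 is an arbitrary choice.

```python
import math

MIB = 1024 * 1024  # bytes per mebibyte


def mib_to_bytes(mib: int) -> int:
    return mib * MIB


def bytes_to_mib(n: int) -> float:
    return n / MIB


def suggested_limit(observed_bytes: int, factor: float = 2.0) -> int:
    """Round observed_bytes * factor up to a whole number of MiB, in bytes."""
    return math.ceil(observed_bytes * factor / MIB) * MIB


if __name__ == "__main__":
    print(mib_to_bytes(20))  # 20971520
    print(round(bytes_to_mib(3282219), 2))  # 3.13, the output size from the error
    print(suggested_limit(3282219))  # 7340032 (7 MiB)
```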
@ancient-wolf-19325 How about in the flyte-binary Helm chart?
Sorry, there was an internal error when processing your request. Consider starting a new conversation? Please contact support if this problem persists.
@ancient-wolf-19325 How do I change `max-output-size-bytes` in the flyte-binary Helm chart?
To change the `max-output-size-bytes` parameter in the flyte-binary Helm chart, add it under a `propeller` key in the `configuration.inline` section of your `values.yaml`; the chart merges that inline block into Flyte's configuration. Here are the steps:
1. Update the `values.yaml` file:
- Add the `max-output-size-bytes` parameter under `propeller` in `configuration.inline`:
```
configuration:
  inline:
    propeller:
      max-output-size-bytes: 20971520  # 20MB
```
2. Apply the changes by upgrading your Helm release with the updated `values.yaml` file:
```
helm upgrade -f values.yaml <release-name> flyteorg/flyte-binary
```
These steps raise `max-output-size-bytes` in the flyte-binary Helm chart, which should allow larger outputs to be passed between tasks in your Flyte workflows. Would you like more details on this approach or any other specific part of the implementation?
Sources: - https://discuss.flyte.org/t/16023116/hi-everyone-is-it-possible-to-change-https-github-com-flyteo/ - https://github.com/flyteorg/flyte/issues/4951
@ancient-wolf-19325 I am still getting the same error:
```
RuntimeExecutionError: failed during plugin execution, caused by: output file @[s3://flyte-metadata.dev-cluster.degould.com/metadata/propeller/krystal-test-development-ansg74s78fmwshssndnp/n0/data/0/outputs.pb] is too large [3282219] bytes, max allowed [2097152] bytes
```
If you still see the same error after updating `max-output-size-bytes`, the change may not have reached the running process: either the configuration was not applied, or the pod was not restarted afterwards. Note that flyte-binary runs Propeller, Admin, and the console together in a single deployment, configured through a ConfigMap generated by the chart, so the ConfigMap and deployment names differ from those of the multi-component chart. Here are the steps to make sure the change is applied:
1. Verify the ConfigMap:
- Find the ConfigMap created by the flyte-binary chart (its name depends on your release name) and check that it contains `max-output-size-bytes` with the value you expect:
```
kubectl get configmaps -n <namespace>
kubectl get configmap <flyte-binary-configmap> -n <namespace> -o yaml
```
2. Restart the flyte-binary deployment:
- Restart it so the running process reloads the configuration (`kubectl get deployments -n <namespace>` shows its name):
```
kubectl rollout restart deployment/<flyte-binary-deployment> -n <namespace>
```
3. Check your flyte-binary `values.yaml`:
- Make sure it contains:
```
configuration:
  inline:
    propeller:
      max-output-size-bytes: 20971520  # 20MB
```
4. Apply the Helm upgrade:
- Run the Helm upgrade command again to make sure the changes are applied:
```
helm upgrade -f values.yaml <release-name> flyteorg/flyte-binary
```
5. Verify the changes:
- After restarting, check the flyte-binary pod's logs to confirm that it started cleanly with the new configuration:
```
kubectl logs <flyte-binary-pod-name> -n <namespace>
```
If the error still reports `max allowed [2097152] bytes`, the value you changed is not the limit being enforced, so raising it further will not help: if your setting were in effect, the reported maximum would match it. Would you like more details on this approach or any other specific part of the implementation?
Sources: - https://discuss.flyte.org/t/16023116/hi-everyone-is-it-possible-to-change-https-github-com-flyteo/ - https://github.com/flyteorg/flyte/issues/4951
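The check above can be automated: this sketch pulls the output size and the enforced maximum out of the error text and compares the maximum with the value you configured. `parse_size_error` and `diagnose` are hypothetical helpers, and the regular expression assumes the message keeps the exact wording shown in this thread.

```python
import re
from typing import Optional, Tuple

# Matches the wording of the error quoted in this thread
_SIZE_ERROR = re.compile(r"is too large \[(\d+)\] bytes, max allowed \[(\d+)\] bytes")


def parse_size_error(message: str) -> Optional[Tuple[int, int]]:
    """Return (output_bytes, enforced_limit_bytes) from the error text, or None."""
    match = _SIZE_ERROR.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def diagnose(message: str, configured_bytes: int) -> str:
    """Explain whether the configured limit is the one being enforced."""
    parsed = parse_size_error(message)
    if parsed is None:
        return "not an output-size error"
    size, enforced = parsed
    if enforced != configured_bytes:
        return f"setting not in effect: enforced {enforced}, configured {configured_bytes}"
    return f"setting in effect but too small: output {size} > limit {enforced}"


if __name__ == "__main__":
    error = (
        "output file @[s3://bucket/path/outputs.pb] is too large [3282219] bytes, "
        "max allowed [2097152] bytes"
    )
    print(diagnose(error, configured_bytes=20971520))
    # setting not in effect: enforced 2097152, configured 20971520
```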
@ancient-wolf-19325 I have verified that the ConfigMap has been updated, but I am still getting the same error when running the workflow.
If the ConfigMap has been updated and the pod restarted but you still get the same error, something other than the value you changed may be enforcing the limit. Here are some additional troubleshooting steps:
1. Verify the configuration inside the pod:
- Make sure the running pod sees the updated configuration by printing its mounted configuration files. Wrapping the command in `sh -c` lets the wildcard expand inside the container instead of on your machine; this assumes the image has a shell, and you may need to adjust the path if `kubectl describe pod` shows the configuration mounted elsewhere:
```
kubectl exec <flyte-binary-pod-name> -n <namespace> -- sh -c 'cat /etc/flyte/config*/*.yaml'
```
2. Check for other limits:
- The enforced maximum in the error, 2,097,152 bytes, is exactly 2 MiB, which appears to match the default of Flyte's storage download limit (`storage.limits.maxDownloadMBs`). That limit may be the one being hit; if so, raising it as well in `configuration.inline` (for example, `storage: limits: maxDownloadMBs: 20`) might be necessary.
3. Increase the limit further only if the reported maximum has changed:
- If the error now reports a larger maximum that is still too small, raise `max-output-size-bytes` again, for example to 50 MiB (52428800 bytes):
```
configuration:
  inline:
    propeller:
      max-output-size-bytes: 52428800  # 50MB
```
4. Check the flyte-binary logs:
- Look in the flyte-binary pod's logs for errors or warnings that might show why the limit is still being exceeded:
```
kubectl logs <flyte-binary-pod-name> -n <namespace>
```
5. Make sure every component uses the new configuration:
- In flyte-binary, Propeller, Admin, and the console run in a single deployment, so one restart reloads the configuration for all of them.
6. Reapply the Helm upgrade with your updated `values.yaml`:
```
helm upgrade -f values.yaml <release-name> flyteorg/flyte-binary
```
7. Restart the flyte-binary deployment:
```
kubectl rollout restart deployment/<flyte-binary-deployment> -n <namespace>
```
If the issue persists after these steps, sharing more details about your Flyte setup and configuration would help with further troubleshooting. Would you like more details on any specific part of the implementation or additional troubleshooting steps?
Sources: - https://github.com/flyteorg/flyte/issues/4951
@ancient-wolf-19325 The Flyte console is not showing the logs, but I can see them in my pod in the k8s cluster
If you can see the logs in the pod but not in the Flyte console, the console is missing a link to them: it does not show container output itself, but links to an external log viewer configured in the Flyte deployment. A link only points at logs and does not collect them, so your cluster must already ship container logs to that system (for example, CloudWatch). Here are the steps to configure the links:
1. Configure logging links:
- Make sure the Flyte configuration includes a log link template for your logging system (e.g., CloudWatch, Stackdriver). Example configuration for CloudWatch:
```
plugins:
  logs:
    templates:
      - displayName: "CloudWatch Logs"
        templateUris:
          - "https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/flyte-production/kubernetes;stream=var.log.containers.{{.podName}}_{{.namespace}}_{{.containerName}}-{{.containerId}}.log"
```
2. Update the Flyte configuration:
- In flyte-binary, add the log links to `configuration.inline` in your `values.yaml`:
```
configuration:
  inline:
    plugins:
      logs:
        templates:
          - displayName: "CloudWatch Logs"
            templateUris:
              - "https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/flyte-production/kubernetes;stream=var.log.containers.{{.podName}}_{{.namespace}}_{{.containerName}}-{{.containerId}}.log"
```
3. Apply the Helm upgrade:
- Apply the changes by upgrading your Helm release:
```
helm upgrade -f values.yaml <release-name> flyteorg/flyte-binary
```
4. Restart Flyte:
- In flyte-binary all components run in one deployment, so restart that deployment to pick up the new configuration:
```
kubectl rollout restart deployment/<flyte-binary-deployment> -n <namespace>
```
5. Verify the configuration:
- Open a task execution in the Flyte console and check that the log link appears and points to the right place.
6. Check the Flyte logs:
- If the links still don't show, check the flyte-binary pod's logs for errors or warnings that might provide more insight:
```
kubectl logs <flyte-binary-pod-name> -n <namespace>
```
With these steps, Flyte should show a link to each task's logs in the console. Would you like more details on this approach or any other specific part of the implementation?
Sources: - Configuring Logging Links in the UI
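Flyte fills in these template URIs with Go templates. Before rolling out a change, you can preview the URL a template produces and compare it with the stream names in your log system. The sketch below only handles simple `{{.name}}` placeholders, not full Go template syntax, and `render_log_link` plus the example values are hypothetical.

```python
import re
from typing import Dict

# Matches simple Go-template placeholders such as {{.podName}} or {{ .namespace }}
_PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render_log_link(template: str, values: Dict[str, str]) -> str:
    """Fill {{.name}} placeholders from `values`; raise KeyError for unknown names."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name!r}")
        return values[name]

    return _PLACEHOLDER.sub(substitute, template)


TEMPLATE = (
    "https://console.aws.amazon.com/cloudwatch/home?region=us-east-1"
    "#logEventViewer:group=/flyte-production/kubernetes;"
    "stream=var.log.containers.{{.podName}}_{{.namespace}}_{{.containerName}}-{{.containerId}}.log"
)

if __name__ == "__main__":
    # Hypothetical values; take real ones from `kubectl get pod <pod> -o yaml`
    print(render_log_link(TEMPLATE, {
        "podName": "my-task-pod",
        "namespace": "my-project-development",
        "containerName": "primary",
        "containerId": "abc123",
    }))
```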