# flyte-github
#3739 [BUG] error returned when using SparkDFExecutionEngine in Great Expectations and Flyte
Issue created by XinEDprob
Describe the bug
According to the Flyte documentation, Spark DataFrames are supported by Flyte/Great Expectations. However, when I use the following great_expectations.yml file as the Great Expectations configuration
datasource_raw_data:
    execution_engine:
      module_name: great_expectations.execution_engine
      class_name: SparkDFExecutionEngine
      force_reuse_spark_context: true
    module_name: great_expectations.datasource
    class_name: Datasource
    data_connectors:
      runtime_data_connector:
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector
        batch_identifiers:
          - pipeline_stage
to validate a Spark DataFrame, the code returned the following error:
Traceback (most recent call last):
  File "/Users/.pyenv/versions/3.9.8/envs/flyte/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py", line 555, in _process_direct_and_bundled_metric_computation_configurations
    ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
  File "/Users/.pyenv/versions/3.9.8/envs/flyte/lib/python3.9/site-packages/great_expectations/expectations/metrics/metric_provider.py", line 50, in inner_func
    return metric_fn(*args, **kwargs)
  File "/Users/.pyenv/versions/3.9.8/envs/flyte/lib/python3.9/site-packages/great_expectations/expectations/metrics/table_metrics/table_column_types.py", line 95, in _spark
    df.schema, include_nested=metric_value_kwargs["include_nested"]
  File "/Users/.pyenv/versions/3.9.8/envs/flyte/lib/python3.9/site-packages/pandas/core/generic.py", line 5902, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'schema'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/.pyenv/versions/3.9.8/envs/flyte/lib/python3.9/site-packages/great_expectations/validator/validation_graph.py", line 276, in _resolve
    self._execution_engine.resolve_metrics(
  File "/Users/.pyenv/versions/3.9.8/envs/flyte/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py", line 290, in resolve_metrics
    return self._process_direct_and_bundled_metric_computation_configurations(
  File "/Users/.pyenv/versions/3.9.8/envs/flyte/lib/python3.9/site-packages/great_expectations/execution_engine/execution_engine.py", line 559, in _process_direct_and_bundled_metric_computation_configurations
    raise gx_exceptions.MetricResolutionError(
great_expectations.exceptions.exceptions.MetricResolutionError: 'DataFrame' object has no attribute 'schema'

Expected behavior
There should be no error like the one in the description.
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyte
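
A note on reading the traceback: the AttributeError is raised from pandas/core/generic.py, which suggests that the object reaching the Spark metric was a pandas DataFrame rather than a Spark one (the `_spark` metric reads `df.schema`). The sketch below is only an illustration of a fail-early check one could run before validation. The helper name `ensure_spark_like` and both stand-in classes are hypothetical and are not part of Flyte or Great Expectations.

```python
def ensure_spark_like(df):
    """Return `df` unchanged if it exposes `.schema`; otherwise raise early.

    Hypothetical guard: the Spark metric in the traceback reads `df.schema`,
    so an object without that attribute would fail later inside the validator.
    """
    if not hasattr(df, "schema"):
        kind = f"{type(df).__module__}.{type(df).__qualname__}"
        raise TypeError(f"expected a Spark DataFrame with a .schema attribute, got {kind}")
    return df


class FakeSparkFrame:
    """Stand-in for a Spark DataFrame (illustration only)."""

    schema = ("id: bigint",)


class FakePandasFrame:
    """Stand-in for a pandas DataFrame, which has no `schema` attribute."""


if __name__ == "__main__":
    ensure_spark_like(FakeSparkFrame())  # passes silently
    try:
        ensure_spark_like(FakePandasFrame())
    except TypeError as exc:
        print(exc)
```

In a real task the stand-ins would be replaced by the actual object handed to the validator; printing its type at that point shows whether a conversion to pandas happened upstream.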