#1144 [Plugin][Flytekit] Support for TFRecord as loadable schema type
Issue created by kumare3
Why would this plugin be helpful to the Flyte community
Users often want to process data with Spark and then pass it to a TensorFlow training process. Parquet and other columnar formats are highly inefficient for training. The TF community has done some work to solve this problem. It would be wonderful if we could perform this conversion automatically depending on the context.
E.g., if a task accepts a TFRecord (data format) input as a Spark DataFrame, we can convert it automatically; similarly, if the user writes out a Spark DataFrame but annotates it as TFRecord, we can auto-convert on write.
Likewise, if the user reads the Spark DataFrame into a downstream process as TFRecords, we can perform the conversion there.
This library provides this capability:
https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector
LinkedIn has since published an improved version of this library:
https://github.com/linkedin/spark-tfrecord
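For context on what such a plugin would read and write, below is a minimal stdlib-only sketch of the TFRecord on-disk framing: each record is an 8-byte little-endian length, a 4-byte masked CRC-32C of those length bytes, the payload, and a 4-byte masked CRC-32C of the payload. The function names here are my own illustration, not part of any library; real plugin code would delegate to `tf.io` or the Spark connector rather than hand-rolling this.

```python
# Illustrative sketch of the TFRecord file framing (not plugin code).
import struct


def _make_crc32c_table():
    # CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def _masked_crc(data: bytes) -> int:
    # TFRecord stores CRCs "masked": rotate right by 15, add 0xA282EAD8.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path, payloads):
    """Write raw byte payloads as length-prefixed, CRC-guarded records."""
    with open(path, "wb") as f:
        for payload in payloads:
            header = struct.pack("<Q", len(payload))
            f.write(header)
            f.write(struct.pack("<I", _masked_crc(header)))
            f.write(payload)
            f.write(struct.pack("<I", _masked_crc(payload)))


def read_records(path):
    """Read records back, verifying both CRCs for each record."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            (length,) = struct.unpack("<Q", header)
            assert struct.unpack("<I", f.read(4))[0] == _masked_crc(header)
            payload = f.read(length)
            assert struct.unpack("<I", f.read(4))[0] == _masked_crc(payload)
            records.append(payload)
    return records
```

In practice each payload would be a serialized `tf.train.Example` proto; the Spark connectors above handle that serialization and the DataFrame schema mapping, which is the part the plugin would expose through Flyte's type system.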
Type of Plugin
☑︎ Python/Java interface only plugin
☐ Web Service (e.g. AWS Sagemaker, GCP DataFlow, Qubole etc...)
☐ Kubernetes Operator (e.g. TfOperator, SparkOperator, FlinkK8sOperator, etc...)
☐ Customized Plugin using native kubernetes constructs
☐ Other
Can you help us with the implementation?
☐ Yes
☐ No
flyteorg/flyte