# flytekit
## #1144 [Plugin][Flytekit] Support for TFRecord as loadable schema type

Issue created by kumare3

### Why would this plugin be helpful to the Flyte community

Users often process data with Spark but then feed that data to a TensorFlow training process. Parquet and other columnar formats are highly inefficient for training, and the TensorFlow community has built tooling to address this. It would be wonderful if Flytekit could perform the conversion automatically depending on context:

- If the user accepts a TFRecord (data format) as a Spark DataFrame, we can convert it.
- If the user writes a Spark DataFrame but annotates it as TFRecord, we can auto-convert it.
- Similarly, if the user reads the Spark DataFrame into a process as TFRecords, we can do the conversion.

The spark-tensorflow-connector library provides this capability: https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector

LinkedIn has since updated this library and improved it in some ways: https://github.com/linkedin/spark-tfrecord

### Type of Plugin

- [x] Python/Java interface only plugin
- [ ] Web Service (e.g. AWS Sagemaker, GCP DataFlow, Qubole etc...)
- [ ] Kubernetes Operator (e.g. TfOperator, SparkOperator, FlinkK8sOperator, etc...)
- [ ] Customized Plugin using native kubernetes constructs
- [ ] Other

### Can you help us with the implementation?

- [ ] Yes
- [ ] No

flyteorg/flyte
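For context on what such a conversion would produce, here is a minimal, dependency-free sketch of the TFRecord on-disk framing that the connectors above write (this is an illustration of the format, not Flytekit or connector code; the `write_record`/`read_records` helpers are hypothetical names). Each record is a little-endian `uint64` payload length, a masked CRC-32C of the length, the payload bytes (usually a serialized `tf.train.Example` proto), and a masked CRC-32C of the payload.

```python
import struct

# Pure-Python CRC-32C (Castagnoli, reflected polynomial 0x82F63B78),
# included here only so the example has no external dependencies.
_POLY = 0x82F63B78
_TABLE = []
for i in range(256):
    crc = i
    for _ in range(8):
        crc = (crc >> 1) ^ _POLY if crc & 1 else crc >> 1
    _TABLE.append(crc)

def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

def _masked_crc(data: bytes) -> int:
    # TFRecord stores a "masked" CRC: rotate right by 15 bits, add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF

def write_record(payload: bytes) -> bytes:
    """Frame one payload as a TFRecord entry (hypothetical helper)."""
    length = struct.pack("<Q", len(payload))
    return (length
            + struct.pack("<I", _masked_crc(length))
            + payload
            + struct.pack("<I", _masked_crc(payload)))

def read_records(blob: bytes):
    """Yield payloads back out of a concatenated TFRecord blob."""
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from("<Q", blob, offset)
        payload = blob[offset + 12 : offset + 12 + length]
        (stored_crc,) = struct.unpack_from("<I", blob, offset + 12 + length)
        assert stored_crc == _masked_crc(payload), "corrupt record"
        yield payload
        offset += 16 + length  # 8 (len) + 4 (len crc) + payload + 4 (data crc)

blob = b"".join(write_record(p) for p in [b"example-proto-1", b"example-proto-2"])
assert list(read_records(blob)) == [b"example-proto-1", b"example-proto-2"]
```

In practice the plugin would not hand-roll this framing; it would delegate to the connector (e.g. `spark.read.format(...)` / `df.write.format(...)` with the TFRecord data source registered, per the linked libraries' documentation) and only handle the type annotation and context detection.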