flat-australia-44060
09/13/2023, 1:43 PM
endpoint_url = "http://flyte-sandbox-minio.flyte:9000"
s3 = boto3.client(
    's3',
    endpoint_url=endpoint_url,
    aws_access_key_id="minio",
    aws_secret_access_key="miniostorage",
    use_ssl=False,  # a boolean; the string "false" would be truthy
)
for doc in inputs:
    doc["original_s3_path"] = doc["s3_path"]
    bucket, key = split_s3_path(doc["s3_path"])
    s3.copy_object(
        CopySource=f"{bucket}/{key}",
        Bucket='my-s3-bucket',
        Key=f"flytesnacks/development/{key}",
    )
    doc["s3_path"] = f"s3://my-s3-bucket/flytesnacks/development/{key}"
return [
    DocumentData(
        document_id=doc["document_id"],
        file=FlyteFile(path=f"{doc['s3_path']}"),
        metadata=DocumentMetaData(),
    )
    for doc in inputs
]
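(The split_s3_path helper called above is not shown in the thread. A minimal sketch of what it might look like, assuming it takes an s3://bucket/key URI and returns the bucket and key as a pair; the error handling is illustrative, not the poster's code.)

```python
from typing import Tuple


def split_s3_path(s3_path: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical stand-in for the helper used in the thread.
    """
    prefix = "s3://"
    if not s3_path.startswith(prefix):
        raise ValueError(f"not an S3 URI: {s3_path!r}")
    # Everything up to the first "/" after the scheme is the bucket,
    # the remainder is the object key.
    bucket, _, key = s3_path[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI must contain a bucket and a key: {s3_path!r}")
    return bucket, key
```

With this sketch, split_s3_path("s3://source-bucket/reports/doc.txt") would yield the bucket "source-bucket" and the key "reports/doc.txt", which are the two pieces the copy_object call needs.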
The 2nd task then does the following:
for doc in documents:
    # download the file from s3 and read the data
    doc.file.download()
    file = open(doc.file, "r")
    text = file.read()
    # detect the language of the document and assign to the DocumentData.metadata.language_code
    doc.metadata.language_code = detector.detect(text)
    file.close()
When trying to open the file in the 2nd task I just get a "No such file or directory" error. Any ideas?
magnificent-teacher-86590
09/13/2023, 4:34 PM
Is the s3.copy_object operation working? And what is the DocumentData class? I would suggest inspecting the doc object before the download. You can run a local execution to test the behavior.
flat-australia-44060
09/13/2023, 4:37 PM
s3.copy_object copies the file to the my-s3-bucket/flytesnacks/development/ location, and it can be seen in the MinIO browser.
DocumentData is a custom dataclass which wraps the FlyteFile and some other metadata. I have since realised that I should probably use doc.file.open() instead of the builtin open, e.g.
with doc.file.open('r') as file:
    text = file.read()
# detect the language of the document and assign to the DocumentData.metadata.language_code
doc.metadata.language_code = detector.detect(text)
magnificent-teacher-86590
09/13/2023, 4:39 PM
.download method
tall-lock-23197
flat-australia-44060
09/14/2023, 1:47 PM