Zach Palchick

    7 months ago
    Is there documentation on how to use container tasks beyond https://docs.flyte.org/projects/cookbook/en/latest/auto/core/containerization/raw_container.html? I'm trying with the sandbox env but not having luck. I've heard that it depends on the SYS_PTRACE capability, which may depend on one's K8s setup.
    That might not even be the issue. I'm getting this screenshot when I try to run it
    ...with no logs
    the task definition looks like this
    fastqc_raw = ContainerTask(
        name="fastqc_container",
        image="docker.io/staphb/fastqc:latest",
        command=[
            "fastqc", "--version", ">", "/var/outputs/version"
        ],
        outputs=kwtypes(version=str),
        output_data_dir="/var/outputs",
    )
    Eduardo Apolinario (eapolinario)

    7 months ago
    @Zach Palchick, where did you hear about the need for SYS_PTRACE? Can you replace the command you pass to the container with:
    command=[
        "sh",
        "-c",
        "fastqc --version | tee /var/outputs/version",
    ],
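A note on why the `sh -c` wrapper can matter: a command given as a list is generally executed directly, without a shell, so `>` and `|` are handed to the program as literal arguments rather than treated as redirection. The sketch below illustrates that general exec-versus-shell distinction locally with Python's `subprocess`. It is not a reproduction of Flyte internals: `echo` stands in for `fastqc`, and a temporary path stands in for `/var/outputs/version`.

```python
import os
import subprocess
import tempfile

# Stand-in for /var/outputs/version (assumption: any writable temp path will do).
out_path = os.path.join(tempfile.mkdtemp(), "version")

# Exec form: no shell is involved, so ">" and the path are just extra
# arguments to echo, and no file is written.
direct = subprocess.run(
    ["echo", "v1", ">", out_path], capture_output=True, text=True
)
direct_created_file = os.path.exists(out_path)

# Shell form: sh interprets the pipe, and tee writes the file.
shell = subprocess.run(
    ["sh", "-c", f"echo v1 | tee {out_path}"], capture_output=True, text=True
)
with open(out_path) as f:
    file_contents = f.read()
```

In the exec form the redirection never happens, which is consistent with an empty output directory; in the shell form the value lands in the file that the task's `outputs` declaration expects.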
    Zach Palchick

    7 months ago
    from Ketan, but I'll try what you have
    same deal. No luck
    Eduardo Apolinario (eapolinario)

    7 months ago
    Interesting. Is this the only ContainerTask in your workflow?
    Zach Palchick

    7 months ago
    yeah
    Eduardo Apolinario (eapolinario)

    7 months ago
    weird, I just tested this workflow in a brand new sandbox:
    from flytekit import ContainerTask, kwtypes, workflow
    
    fastqc_raw = ContainerTask(
        name="fastqc_container",
        image="docker.io/staphb/fastqc:latest",
        command=[
            "sh",
            "-c",
            "fastqc --version | tee /var/outputs/version",
        ],
        outputs=kwtypes(version=str),
        output_data_dir="/var/outputs",
    )
    
    @workflow
    def wf() -> str:
        return fastqc_raw()
    Zach Palchick

    7 months ago
    hmm, that is weird. Let me try verbatim what you are doing. It would be nice if there was something else going on
    ohh...interesting
    that did work
    that definitely raises some questions
    OK, I've made some good progress here. Is there a way to get the raw container input type to be a FlyteFile? I want the command I call in the container to operate on a file I generate in a previous task.
    This doesn't work (perhaps unexpectedly), I think because fastqc_file is just the S3 file path that I pass in from a previous task:
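    from flytekit import ContainerTask, kwtypes, workflow
    from flytekit.types.directory import FlyteDirectory
    from flytekit.types.file import FlyteFile
    
    fastqc_raw = ContainerTask(
        name="fastqc_container",
        image="docker.io/staphb/fastqc:latest",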
        command=[
            "sh",
            "-c",
            "fastqc -t 1 -q -o /var/outputs/ /var/inputs/fastqc_file"
        ],
        input_data_dir="/var/inputs",
        inputs=kwtypes(fastqc_file=FlyteFile),
        outputs=kwtypes(out=FlyteDirectory),
        output_data_dir="/var/outputs",
    )
    
    @workflow
    def wf(path: FlyteFile) -> FlyteDirectory:
        return fastqc_raw(fastqc_file=path)
    Eduardo Apolinario (eapolinario)

    7 months ago
    Unfortunately FlyteFile is a flytekit concept, so you'd have to pipe that into your raw container task (and use Python to download the file locally). Another option is to have a script to download the file from S3 into your container prior to running fastqc.
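The second option might look roughly like the sketch below: a small wrapper script that copies the object from S3 to a local path and then invokes `fastqc` on it. This is only an illustration under several assumptions not confirmed in the thread: that the container image has Python and boto3 installed, that S3 credentials are available inside the container, and that the input arrives as an `s3://` URI. `parse_s3_uri` and `download_and_run` are hypothetical helper names.

```python
import subprocess
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_and_run(uri: str, local_path: str, out_dir: str) -> None:
    """Download the object to local_path, then run fastqc on it."""
    import boto3  # assumption: boto3 is installed in the image

    bucket, key = parse_s3_uri(uri)
    boto3.client("s3").download_file(bucket, key, local_path)
    # Same fastqc flags as in the task definition above.
    subprocess.run(
        ["fastqc", "-t", "1", "-q", "-o", out_dir, local_path], check=True
    )
```

The container command would then call this script with the S3 path as an argument instead of pointing `fastqc` at `/var/inputs/fastqc_file` directly.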