I'm working on an array job for a small pipeline and need to run a specific command depending on file size. I found this post and similar ones that describe how to do it. At the moment I'm using the following:
find $d/*.fasta -size +100M -exec sh -c '
chromap -i -r $1 -o $1.index
chromap --preset hic -x $1.index -r $1 -1 $d/hi-c/${ID}_1.fq.gz -2 $d/hi-c/${ID}_2.fq.gz --SAM -o /dev/stdout -t 48 |
samtools view -bS -@ 48 | samtools sort -n -@ 48 | samtools view -h | sed -e 's//.//' | samtools view -bS -o ${ID}.bam -@ 48
' sh {} \;
The bioinformatics tools themselves all work, but only the first command, chromap -i -r $1 -o $1.index, seems to run. As soon as the second command starts, the script returns:
Cannot find sequence file /hi-c/_1.fq.gz
Does this mean the inline script does not know about the variables ($d and $ID) I have used successfully so far, or that it cannot run more than one command at a time? I have no idea… I also tried something simpler, e.g.:
mkdir $d/scaffolding
find $d/*.fasta -size +100M -exec sh -c '
chromap -i -r $1 -o $1.index && mv $1 $1.index $d/scaffolding
' sh {} \;
but it fails with: mv: the destination '/scaffolding' is not a directory.
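The empty values in both error messages (/hi-c/_1.fq.gz and /scaffolding) are consistent with the first explanation. A few lines are enough to reproduce the symptom, assuming `d` and `ID` are assigned in the calling script without `export` (the values below are hypothetical stand-ins). Because the inline script is single-quoted, the calling shell never expands `$d` or `$ID`; the child `sh` looks them up in its own environment, and only exported variables end up there.

```shell
#!/bin/sh
# Hypothetical values standing in for the pipeline's $d and $ID.
unset d ID          # start clean: neither variable is in the environment
d=/tmp/example      # plain assignment: a shell variable, not exported
ID=sample1

# The single quotes keep the calling shell from expanding $d and $ID,
# so the child shell sees only its environment, where they are absent.
# prints: before export: dir=[] id=[]
sh -c 'echo "before export: dir=[$d] id=[$ID]"'

# export copies both into the environment inherited by child processes.
# prints: after export:  dir=[/tmp/example] id=[sample1]
export d ID
sh -c 'echo "after export:  dir=[$d] id=[$ID]"'
```

In the original script, adding `export d ID` before the `find` call should have the same effect, assuming `ID` is already set at that point in the array job.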
What should I do to get either one (or both) to work? Am I missing something? If anyone has insight into this issue, please let me know. Thanks in advance.
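An alternative that leaves the environment alone is to pass the values to the inline script as extra arguments after `{}`: with `sh -c 'script' sh {} "$d"`, the word `sh` becomes `$0`, the file found becomes `$1` and the directory becomes `$2`. Below is a minimal sketch of the `mv` attempt, assuming a throwaway directory from `mktemp -d` stands in for `$d` and empty placeholder files stand in for a FASTA and its index (so the `-size +100M` test is left out).

```shell
#!/bin/sh
# Throwaway stand-ins for the real data directory and files (hypothetical).
d=$(mktemp -d)
mkdir "$d/scaffolding"
touch "$d/a.fasta" "$d/a.fasta.index"

# "sh" fills $0, {} fills $1 and "$d" fills $2 inside the inline script.
# Quoting "$1" and "$2" keeps paths with spaces intact.
find "$d"/*.fasta -exec sh -c '
  mv "$1" "$1.index" "$2/scaffolding"
' sh {} "$d" \;

# Both files should now be inside the scaffolding directory.
ls "$d/scaffolding"
```

For the chromap pipeline, the same idea would mean appending "$ID" as well, which then becomes `$3` inside the script. Separately, the inline script is itself wrapped in single quotes, so the single quotes around the sed expression actually close and reopen it, leaving the expression unquoted in the calling shell; writing that expression in double quotes inside the script, or spelling each inner quote as `'\''`, avoids the clash.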