A script to split a file tree into separate trees - one per file extension present in the original tree
Purpose
Have you ever had a tree of files from which you needed only certain types of file? For example, I had an iTunes library in which Apple files from another iTunes account were mixed in with a large number of MP3s, and I wanted to pull out a tree of only the MP3s. You can build such a tree by passing rsync a combination of flags that makes it do an exclusive include: copy whatever matches a pattern and nothing else.
How?
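Pass the following flags to rsync to make it copy only the files that match a globbing pattern. The first --include keeps every directory so that rsync can descend the whole tree, the second --include keeps the files with the extension you want, and the final --exclude drops everything else. If you want to use this line on its own, fill in the variables yourself: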
rsync -av --include '*/' --include "*.${extension}" --exclude '*' "${source_directory}/" "${top_directory_of_results}/${extension}/"
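As a rough sketch of how that line could be wrapped for reuse, the function below takes the extension and the two directories as arguments. The names copy_one_extension and RSYNC_COMMAND are made up for this illustration, and --prune-empty-dirs is an optional extra that is not in the line above: it tells rsync to skip directories that end up with no matching files. Setting RSYNC_COMMAND=echo prints the arguments instead of copying anything.

```shell
#!/bin/bash
# Sketch only: copy_one_extension and RSYNC_COMMAND are hypothetical names.
# The body runs in a subshell, "( ... )", so its variables stay private.
copy_one_extension () (
    extension="$1"
    source_directory="$2"
    top_directory_of_results="$3"

    # --include '*/'        lets rsync descend into every directory
    # --include "*.ext"     keeps files with the wanted extension
    # --exclude '*'         drops everything not matched above
    # --prune-empty-dirs    (optional addition) skips directories left empty
    ${RSYNC_COMMAND:-rsync} -av --prune-empty-dirs \
        --include '*/' --include "*.${extension}" --exclude '*' \
        "${source_directory}/" "${top_directory_of_results}/${extension}/"
)

# Preview the arguments by substituting echo for rsync.
RSYNC_COMMAND=echo copy_one_extension mp3 ./iTunes ./split
```

Without RSYNC_COMMAND set, the function runs rsync for real. Adding -n to the flags would make rsync itself do a dry run.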
The script:
==========================================================
This tool reads a directory of files that have extensions and then copies each type of file to its own tree.
The location of each file in the subtree matches that file's location in the original tree.
Usage:
./split_by_file_extension.sh \
    {-s source directory|--source-dir=source directory} \
    {-t top directory of results|--top-directory-of-results=top directory of results} \
    {-e comma,separated,list,of,extensions|--extensions=comma,separated,list,of,extensions}
==========================================================
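To make the layout of the results concrete, here is a small sketch that computes where a given file from the source tree should land. The helper destination_for and the example paths are hypothetical; the script itself leaves this mapping to rsync.

```shell
#!/bin/bash
# Sketch only: destination_for is a hypothetical helper, not part of the script.
# The body runs in a subshell, "( ... )", so its variables stay private.
destination_for () (
    source_directory="${1%/}"            # strip one trailing slash, if any
    top_directory_of_results="${2%/}"
    file="$3"                            # a file somewhere under source_directory

    relative_path="${file#"${source_directory}"/}"   # path relative to the source tree
    extension="${file##*.}"                          # text after the last dot
    printf '%s\n' "${top_directory_of_results}/${extension}/${relative_path}"
)

destination_for ./iTunes ./split "./iTunes/Music/Artist/Album/01 Track.mp3"
# prints: ./split/mp3/Music/Artist/Album/01 Track.mp3
```

Each extension gets its own subtree under the results directory, and the path below that subtree is the file's path relative to the source directory.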
#!/bin/bash

set -e
set -u

usage () {
    echo "=========================================================="
    echo "This tool reads a directory of files that have extensions"
    echo "and then copies each type of file to its own tree."
    echo ""
    echo "The location of each file in the subtree matches that"
    echo "file's location in the original tree."
    echo ""
    echo "Usage: $0 {-s source directory|--source-dir=source directory} \\"
    echo "       {-t top directory of results|--top-directory-of-results=top directory of results} \\"
    echo "       {-e comma,separated,list,of,extensions|--extensions=comma,separated,list,of,extensions}"
    echo "=========================================================="
}

# Print "true" if both arguments resolve to the same directory, otherwise "false".
# Each cd runs in a subshell, so the caller's working directory is untouched.
are_these_the_same_path () {
    first_directory="$(cd "$1" && pwd -P)"
    second_directory="$(cd "$2" && pwd -P)"
    if [ "${first_directory}" = "${second_directory}" ]
    then
        echo true
    else
        echo false
    fi
}

# Print an error message and the usage text to stderr, then exit.
fail () {
    echo "" 1>&2
    echo "$1" 1>&2
    echo "" 1>&2
    usage 1>&2
    exit 1
}

if [ $# -eq 0 ]
then
    usage
    exit 1
fi

# Start empty so that "set -u" does not trip on an option that was never given.
source_directory=""
top_directory_of_results=""
extensions=""

while [ $# -gt 0 ]
do
    case "$1" in
        -s|--source-dir=*)
            if [ "$1" = "-s" ]
            then
                [ $# -ge 2 ] || fail "-s needs a value."
                source_directory="$2"
                shift 2
            else
                source_directory="${1#--source-dir=}"
                shift
            fi
            echo "Source Directory: ${source_directory}"
            [ -d "${source_directory}" ] || fail "source_directory is not a directory."
            ;;
        -e|--extensions=*)
            if [ "$1" = "-e" ]
            then
                [ $# -ge 2 ] || fail "-e needs a value."
                extensions="$2"
                shift 2
            else
                extensions="${1#--extensions=}"
                shift
            fi
            echo "Extensions: ${extensions}"
            ;;
        -t|--top-directory-of-results=*)
            if [ "$1" = "-t" ]
            then
                [ $# -ge 2 ] || fail "-t needs a value."
                top_directory_of_results="$2"
                shift 2
            else
                top_directory_of_results="${1#--top-directory-of-results=}"
                shift
            fi
            echo "Target Directory: ${top_directory_of_results}"
            [ -d "${top_directory_of_results}" ] || fail "top_directory_of_results is not a directory."
            ;;
        -h|--help)
            usage
            exit 0
            ;;
        *)
            fail "Unrecognized flag."
            ;;
    esac
done

if [ -z "${source_directory}" ] || [ -z "${top_directory_of_results}" ] || [ -z "${extensions}" ]
then
    fail "All of the options must be set."
fi

if [ "$(are_these_the_same_path "${source_directory}" "${top_directory_of_results}")" = true ]
then
    fail "source_directory and top_directory_of_results cannot be the same."
fi

#######################################
#
# Main Process.
#
# For each extension in the list:
#   Make a directory for that extension in the target directory.
#   Copy the matching files from the source tree into it with rsync,
#   keeping each file's path relative to the source directory.
#
#######################################

# Split the comma-separated list into an array.
IFS=',' read -r -a extension_list <<< "${extensions}"

for extension in "${extension_list[@]}"
do
    # Skip empty entries such as the one in "mp3,,m4a".
    [ -n "${extension}" ] || continue

    mkdir -p "${top_directory_of_results}/${extension}"
    rsync -av --include '*/' --include "*.${extension}" --exclude '*' \
        "${source_directory}/" "${top_directory_of_results}/${extension}/"
done
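One way to split the comma-separated extension list inside the shell itself, without calling an external tool, is to set IFS to a comma and let word splitting do the work. The sketch below is only an illustration: split_extensions is a hypothetical helper, and stripping a leading dot (so that ".mp3" is accepted as well as "mp3") is an assumption about what would be convenient, not something the script does.

```shell
#!/bin/bash
# Sketch only: split_extensions is a hypothetical helper.
# The body runs in a subshell, so the IFS and "set -f" changes do not leak out.
split_extensions () (
    IFS=','
    set -f                               # no globbing, so an entry like "*" stays literal
    for extension in $1
    do
        extension="${extension#.}"       # assumption: accept ".mp3" as well as "mp3"
        if [ -n "${extension}" ]         # skip empty entries such as in "mp3,,m4a"
        then
            printf '%s\n' "${extension}"
        fi
    done
)

# Prints one extension per line: mp3, m4a, flac
split_extensions "mp3,.m4a,,flac"
```

The output can feed a loop such as: split_extensions "${extensions}" | while read -r extension; do ...; done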