A script to split a file tree into separate trees, one per file extension present in the original tree

Purpose

Have you ever had a tree of files from which you only needed certain types of file? For example, I had an iTunes library that mixed some Apple files from another iTunes account with a large number of MP3s. I wanted to pull out only the MP3s while keeping their directory structure. You can build such a tree by passing rsync a combination of filter flags that make it do an exclusive include: copy only the files that match a pattern and skip everything else.

How?

Pass the following flags to rsync to make it do an exclusive include for files matching a glob pattern. If you use this line on its own, substitute your own values for the shell variables:

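rsync -av --include '*/' --include "*.${extension}" --exclude '*' "${source_directory}/" "${top_directory_of_results}/${extension}/"

rsync checks each file and directory against these rules in order and applies the first one that matches: '*/' includes every directory so that rsync descends into the whole tree, "*.${extension}" includes the files you want, and '*' excludes everything else. The quotes around the paths keep names that contain spaces, which are common in iTunes libraries, in one piece.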

The script:

==========================================================

This tool reads a directory of files that have extensions and then copies each type of file to its own tree.

The location of each file in the subtree matches that file's location in the original tree.

Usage:

./split_by_file_extension.sh \
    {-s source directory|--source-dir=source directory} \
    {-t top directory of results|--top-directory-of-results=top directory of results} \
    {-e comma,separated,list,of,extensions|--extensions=comma,separated,list,of,extensions}


==========================================================
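The -e option expects you to already know which extensions the tree contains. As a possible first step, here is a minimal sketch of a hypothetical helper, list_extensions (not part of the original script), that counts the extensions present under a directory using find, sed, sort, and uniq. It treats everything after the last dot in a file name as the extension, so a name such as archive.tar.gz counts as gz.

```shell
#!/bin/bash
# Hypothetical helper (not part of split_by_file_extension.sh): count the
# file extensions present in a tree, to help choose the values for -e.
set -eu

list_extensions () {
 # Keep only the base name of each file that contains a dot, strip
 # everything up to and including the last dot, then count each
 # extension and sort the counts from most to least frequent.
 find "$1" -type f -name '*.*' \
  | sed 's,.*/,,; s,.*\.,,' \
  | sort | uniq -c | sort -rn
}

# Build a small throwaway tree to try it on.
demo="$(mktemp -d)"
mkdir -p "$demo/Artist/Album"
touch "$demo/Artist/Album/01.mp3" "$demo/Artist/Album/02.mp3" \
      "$demo/Artist/Album/cover.jpg"

list_extensions "$demo"
```

On the demo tree this prints one line per extension, mp3 with a count of 2 followed by jpg with a count of 1; the exact padding before each count depends on your uniq.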

#!/bin/bash

set -e
set -u

usage () {

 echo "=========================================================="
 echo "This tool reads a directory of files that have extensions"
 echo "and then copies each type of file to its own tree."
 echo ""
 echo "The location of each file in the subtree matches that"
 echo "file's location in the original tree."
 echo ""
 echo "Usage: $0 {-s source directory|--source-dir=source directory} \ "
 echo "          {-t top directory of results|--top-directory-of-results=top directory of results} \ "
 echo "          {-e comma,separated,list,of,extensions | --extensions=comma,separated,list,of,extensions} "
 echo "=========================================================="
}

are_these_the_same_path () {

 # Resolve each path in a subshell, so the working directory never changes.
 # pwd -P also resolves symbolic links, so a symlink to the source
 # directory is detected as the same path.
 first_directory="$(cd "$1" && pwd -P)"
 second_directory="$(cd "$2" && pwd -P)"

 if [ "${first_directory}" = "${second_directory}" ]
 then
  echo true
 else
  echo false
 fi

}

if [ $# -eq 0 ]
then
 usage
 exit 1
fi

needed_number_of_arguments_set=0

while [ $# -gt 0 ]
do
 case $1 in
  -s|--source-dir=*)
   if [ "$1" = "-s" ]
   then
    shift
    source_directory="$1"
    shift
   else
    # Strip the "--source-dir=" prefix with parameter expansion.
    source_directory="${1#--source-dir=}"
    shift
   fi
   echo "Source Directory: ${source_directory}"
   # Quote the path so directory names with spaces are tested correctly.
   if [ ! -d "${source_directory}" ]
   then
    echo "" 1>&2
    echo "source_directory is not a directory." 1>&2
    echo "" 1>&2
    usage
    exit 1
   fi
   needed_number_of_arguments_set=$((needed_number_of_arguments_set + 1))
  ;;
  -e|--extensions=*)
   if [ "$1" = "-e" ]
   then
    shift
    extensions="$1"
    shift
   else
    # Strip the "--extensions=" prefix with parameter expansion.
    extensions="${1#--extensions=}"
    shift
   fi
   echo "Extensions: ${extensions}"
   needed_number_of_arguments_set=$((needed_number_of_arguments_set + 1))
  ;;
  -t|--top-directory-of-results=*)
   if [ "$1" = "-t" ]
   then
    shift
    top_directory_of_results="$1"
    shift
   else
    # Strip the "--top-directory-of-results=" prefix with parameter expansion.
    top_directory_of_results="${1#--top-directory-of-results=}"
    shift
   fi
   echo "Target Directory: ${top_directory_of_results}"
   # Quote the path so directory names with spaces are tested correctly.
   if [ ! -d "${top_directory_of_results}" ]
   then
    echo "" 1>&2
    echo "top_directory_of_results is not a directory." 1>&2
    echo "" 1>&2
    usage
    exit 1
   fi
   needed_number_of_arguments_set=$((needed_number_of_arguments_set + 1))
  ;;
  -h|--help)
   usage
   exit 0
  ;;
  *)
   echo "" 1>&2
   echo "Unrecognized flag: $1" 1>&2
   usage
   exit 1
  ;;
 esac
done

if [ "${needed_number_of_arguments_set}" -ne "3" ]
then
 echo "" 1>&2
 echo "All of the options must be set." 1>&2
 usage
 exit 1
fi

are_source_directory_and_top_directory_of_results_the_same="$(are_these_the_same_path "${source_directory}" "${top_directory_of_results}")"

if [ "${are_source_directory_and_top_directory_of_results_the_same}" = true ]
then
 echo ""
 echo "source_directory and top_directory_of_results cannot be the same." 1>&2
 echo ""
 usage
 exit 1
fi

#######################################
#
# Main Process.
#
# For each extension in the comma-separated list:
# Make a subdirectory named after the extension in the top directory of results.
# Run rsync with include/exclude filters so that only files with that
# extension, plus the directories that contain them, are copied from the
# source tree into that subdirectory, keeping their relative paths.
#
#######################################

for extension in `echo "${extensions}" | sed 's/,/ /g'`
do
  # mkdir -p does nothing if the directory already exists.
  mkdir -p "${top_directory_of_results}/${extension}"
done

for extension in `echo "${extensions}" | sed 's/,/ /g'`
do
  # Quote the paths so directory names with spaces survive intact.
  rsync -av --include '*/' --include "*.${extension}" --exclude '*' "${source_directory}/" "${top_directory_of_results}/${extension}/"
done
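The loops in the script split the comma-separated list with sed and an unquoted command substitution, so each extension also goes through word splitting and glob expansion. A bash-only alternative, sketched here with made-up values for the variables, reads the list into an array using IFS=, and read -a, then prints (rather than runs) the rsync command for each extension, quoted with printf %q.

```shell
#!/bin/bash
# Sketch: split a comma-separated extension list into a bash array and
# print one rsync command per extension. All values are hypothetical.
set -eu

extensions="mp3,m4a,flac"
source_directory="/path/to/iTunes Music"      # hypothetical placeholder
top_directory_of_results="/path/to/results"   # hypothetical placeholder

# Split on commas only: no word splitting on spaces, no glob expansion.
IFS=, read -r -a extension_list <<< "${extensions}"

for extension in "${extension_list[@]}"
do
 # Printed rather than executed, so this sketch is safe to run as-is.
 # printf %q writes each argument in a form bash can read back as input.
 printf '%q ' rsync -av --include '*/' --include "*.${extension}" --exclude '*' \
  "${source_directory}/" "${top_directory_of_results}/${extension}/"
 echo
done
```

Replacing the printf and echo lines with a direct rsync call turns this listing into the real copy.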
