Thoughts on Organizing Files By Storage Capacity

  • Adam Douglas
  • hacking

Late one night, lying in bed, I began thinking about a problem I was recently presented with: organizing a bunch of files into directories, each with a maximum storage usage that would fit on a DVD (single layer, 4.7 GB). Each directory would represent a DVD or any other storage medium. The medium itself really doesn’t matter; the real issue is that whatever medium is chosen always has a maximum storage capacity. In the past I’ve always done this process manually, which in all honesty is not an enjoyable experience by any means. On my mobile phone I began searching online for a solution, but the search results didn’t turn up much. I’m sure someone out there has already done this, but then I thought, why not figure this out myself? It could be a good learning experience and a challenge to overcome.

I decided I would write the script in BASH and use the find command to collect the files for processing. One of the first issues that came to mind was how to determine which file goes where in order to make the best use of each storage medium’s capacity. For now I think I will just handle the files in the order returned by the find command, and think later about how to better optimize placement based on each file’s size rather than only the maximum storage usage per output directory. Another issue I just realized is that I will need to convert kilobytes, megabytes, gigabytes and terabytes to bytes, so that the user can enter the maximum storage medium capacity more easily.
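
The unit conversion mentioned above could look something like the following. This is only a sketch: the function name to_bytes is hypothetical, it assumes decimal units (1 KB = 1000 bytes, which is how DVD capacities are usually quoted), and it accepts whole numbers only, so 4.7 GB would be entered as 4700MB.

```shell
#!/usr/bin/env bash

# to_bytes SIZE
# Convert a size such as "700MB" or "4GB" to bytes (hypothetical helper).
# Assumption: decimal units, so 1 KB = 1000 bytes. A bare number is bytes.
to_bytes() {
    local input=$1 number multiplier

    # Pick the multiplier from the two-letter unit suffix, ignoring case.
    case $input in
        *[Kk][Bb]) multiplier=1000 ;;
        *[Mm][Bb]) multiplier=1000000 ;;
        *[Gg][Bb]) multiplier=1000000000 ;;
        *[Tt][Bb]) multiplier=1000000000000 ;;
        *)         multiplier=1 ;;
    esac

    # Strip the unit suffix, if there was one.
    if (( multiplier != 1 )); then
        number=${input%??}
    else
        number=$input
    fi

    # Accept whole numbers only; reject empty or non-numeric input.
    case $number in
        ''|*[!0-9]*)
            echo "Invalid size: $input" >&2
            return 1
            ;;
    esac

    # 10# forces base 10 so that a leading zero is not read as octal.
    echo $(( 10#$number * multiplier ))
}
```

For example, to_bytes 4700MB prints 4700000000. Swapping the multipliers for powers of 1024 would give binary units instead.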

This is what I’ve come up with thus far.

Assumptions

  • All files processed will be moved
  • All files processed will be stored in the root of a subdirectory
  • You don’t care about present file organization
  • Intended to aid in transferring files to a storage medium such as USB flash drive, external storage, DVD, etc.
  • Output directory will contain subdirectories starting at 1 and incremented by one until completed
  • Maximum storage capacity input is in bytes
  • This is not intended as a backup solution

Environment

  • BASH
  • Linux
  • Command-line interface

Required parameters

  • Source directory
  • Output directory
  • Maximum storage capacity

General Logic

  • Validate
  • Get desired files
  • Loop files
    • If file doesn’t exceed maximum storage capacity
      • Move file to sub-directory
    • Else create sub-directory
      • Move file to sub-directory
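
The logic above can be tried out on plain numbers before any files are touched. The sketch below is an illustration only: plan_dirs is a hypothetical name, the sizes are made up, and it treats a file that exactly fills the remaining space as fitting. A file larger than the maximum still gets a sub-directory of its own.

```shell
#!/usr/bin/env bash

# plan_dirs MAX SIZE...
# Print, one per line, the sub-directory number each size would land in
# when sizes are handled strictly in the order given (hypothetical helper).
plan_dirs() {
    local max=$1
    shift
    local dir=1 used=0 size

    for size in "$@"; do
        # Start a new sub-directory when this file would overflow the
        # current one, unless the current one is still empty.
        if (( used + size > max && used > 0 )); then
            dir=$(( dir + 1 ))
            used=0
        fi
        used=$(( used + size ))
        echo "$dir"
    done
}
```

With a maximum of 100, plan_dirs 100 60 30 20 50 50 10 prints 1, 1, 2, 2, 3, 3 (one number per line): the third file would push the first sub-directory to 110, so it opens the second.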

Command Reference

Get total storage usage of a directory in bytes

$ du -sb dir_output | cut -f1

Get all files recursively within a directory

$ find dir_source -type f

Get the size of a file in bytes

$ ls -l filename | cut -d' ' -f5
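
Inside a script, the commands above are easier to use when wrapped in small functions. The following is a sketch under a few assumptions: GNU stat and find are available (as on most Linux systems), the function names are hypothetical, and stat is used for the file size instead of parsing ls output. The total is summed from the files themselves because du also counts the space taken by the directory entries, which on many filesystems adds a few kilobytes.

```shell
#!/usr/bin/env bash

# file_size_bytes FILE
# Print the size of one file in bytes (GNU stat assumed).
file_size_bytes() {
    stat -c '%s' -- "$1"
}

# files_total_bytes DIR
# Print the combined size in bytes of every regular file below DIR.
# Names are read NUL-separated so spaces and newlines in them are safe.
files_total_bytes() {
    local total=0 file

    while IFS= read -r -d '' file; do
        total=$(( total + $(file_size_bytes "$file") ))
    done < <(find "$1" -type f -print0)

    echo "$total"
}
```

Calling files_total_bytes dir_output then gives a number that can be compared directly against the maximum capacity in bytes.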

Pseudocode

#!/usr/bin/env bash
SET dependencies = ("cut" "du" "find" "mkdir" "mv" "ls" "tr")
SET dir_source = INPUT $1
SET dir_output_root = INPUT $2
SET dir_max_size = INPUT $3

prepare() {
    CALL checkDependencies

    IF NOT dir_source AND NOT dir_output_root AND NOT dir_max_size THEN
        CALL quit with "Error: no parameters provided. Must provide 3 parameters,
        directory source, directory output and maximum storage medium capacity
        (e.g. 4KB, 4MB, 4GB, 4TB)."
    ENDIF

    CALL validate
}

checkDependencies() {
    FOR i in "${dependencies[@]}"
        CALL checkCmdExists "$i"
    ENDFOR

    IF cmdNotFound != '' THEN
        CALL quit with "Error: The following command(s) were not found:${cmdNotFound::-1}."
    ENDIF
}

checkCmdExists() {
    IF NOT executable command -v $1 THEN
        SET cmdNotFound = "$cmdNotFound $1,"
    ENDIF
}

validate() {
    IF NOT dir_source exists THEN
        CALL quit with "Source directory doesn't exist or permission was denied."
    ENDIF

    IF NOT dir_output_root exists THEN
        CALL quit with "Destination directory doesn't exist or permission was denied."
    ENDIF

    IF dir_max_size < 1 THEN
        CALL quit with "Invalid maximum medium capacity size."
    ENDIF
}

run() {
    SET files = GET files from dir_source
    INIT dir_count = 1
    INIT dir_output = dir_output_root/dir_count/

    FOR file in files
        SET dir_size = GET storage used in dir_output
        SET file_size = GET size of file in bytes
        SET file_dir_size = file_size + dir_size

        IF file_dir_size <= dir_max_size THEN
            IF dir_output exists THEN
                move file to dir_output
            ELSE
                create dir_output
                move file to dir_output
            ENDIF
        ELSE
            INCREMENT dir_count
            SET dir_output = dir_output_root/dir_count/
            create dir_output
            move file to dir_output
        ENDIF
    ENDFOR
}

quit() {
    IF parameter1 == 0 THEN
        exit 0
    ELSE
        PRINT parameter1 error message
        exit 1
    ENDIF
}

prepare
run
quit 0;
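
For comparison, here is one way the pseudocode might translate into working BASH. It is a sketch rather than a finished tool, and it departs from the pseudocode in a few labeled ways: it assumes GNU find, stat and mv on Linux, takes the maximum capacity in bytes, keeps a running total instead of calling du for every file, and skips any file whose name is already taken in the target sub-directory. The function names organize and main are my own choices. It also assumes the output directory is not inside the source directory.

```shell
#!/usr/bin/env bash
# Sketch of the pseudocode. Assumes GNU find, stat and mv on Linux.

quit() {
    # "quit 0" exits cleanly; anything else is printed as an error.
    if [[ $1 == 0 ]]; then exit 0; fi
    echo "$1" >&2
    exit 1
}

check_dependencies() {
    local cmd missing=""
    for cmd in find mkdir mv stat; do
        command -v "$cmd" >/dev/null 2>&1 || missing+=" $cmd,"
    done
    if [[ -n $missing ]]; then
        quit "Error: The following command(s) were not found:${missing%,}."
    fi
}

validate() {
    local source=$1 output_root=$2 max_size=$3
    [[ -d $source && -r $source ]] ||
        quit "Source directory doesn't exist or permission was denied."
    [[ -d $output_root && -w $output_root ]] ||
        quit "Destination directory doesn't exist or permission was denied."
    # A positive whole number of bytes, without leading zeros.
    [[ $max_size =~ ^[1-9][0-9]*$ ]] ||
        quit "Invalid maximum medium capacity size."
}

organize() {
    local source=$1 output_root=$2 max_size=$3
    local dir_count=1 dir_size=0 file file_size target
    local -a files=()

    # Collect the list first so moving files cannot disturb the traversal.
    while IFS= read -r -d '' file; do
        files+=("$file")
    done < <(find "$source" -type f -print0)

    for file in "${files[@]}"; do
        file_size=$(stat -c '%s' -- "$file")

        # Move on to the next sub-directory when the file would not fit,
        # unless the current sub-directory is still empty.
        if (( dir_size + file_size > max_size && dir_size > 0 )); then
            dir_count=$(( dir_count + 1 ))
            dir_size=0
        fi

        target="$output_root/$dir_count"
        mkdir -p -- "$target"

        # Files end up flat in the sub-directory, so names can collide.
        if [[ -e "$target/${file##*/}" ]]; then
            echo "Skipping $file: name already used in $target" >&2
            continue
        fi

        mv -- "$file" "$target/"
        dir_size=$(( dir_size + file_size ))
    done
}

main() {
    check_dependencies
    (( $# == 3 )) || quit "Error: expected 3 parameters: source directory, output directory and maximum capacity in bytes."
    validate "$@"
    organize "$@"
    quit 0
}

# Run only when arguments are supplied, so the functions can also be
# sourced and tried out one at a time.
if (( $# > 0 )); then
    main "$@"
fi
```

Saved under a name of your choosing, it would be called with the source directory, the output directory and the capacity in bytes, in that order. Because the files are taken in whatever order find returns them, the sub-directories will usually not be packed as tightly as they could be.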

I’m publishing this as part of 100 Days To Offload. You can join in yourself by visiting 100DaysToOffload.com.