18 October 2010

Strange Filenames

Inevitably, while working on a server, one will come across a file with a
strange name that cannot easily be operated upon.  Common examples are
"!" and "@", often created by accident while working in vi.  Other
examples may be "--", "*", "    .  ^   ", "     ", "$", "|", etc.  These
files present somewhat of a challenge as they have the potential to be
misinterpreted by the user's shell or by the program that is acting upon
them, such as 'rm' or 'less'.  As an example of how this can go wrong,
say one comes across a file named "*" in a directory listing and attempts
to remove it with rm.  If the user is not running with an effective UID
of root and issues the following command:

        $ rm *

the most this will do is remove every file in that directory that the
user is permitted to delete.  Had the command included the flags "-rf",
it would also have recursed into every subdirectory and deleted its
contents.  While this is harmful enough for an ordinary user, had it been
executed as root in the wrong directory, the user could have destroyed
the box.  The reason is that the shell treats an unquoted * as a wildcard
and expands it to every non-hidden name in the directory before rm ever
runs.  Another example of how these files can be difficult is the file
"--".  Because many commands accept '--' to signify the end of options,
when passed "--" as the file argument, the command will not act upon "--"
as a file, instead producing an error about needing a file to act upon.
The following illustrates:

        $ less --
        Missing filename ("less --help" for help)
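
The wildcard behaviour can be observed safely by counting the arguments
a command would receive.  The following is only a sketch: it works in a
throwaway directory from mktemp, and the helper count_args and the file
names are made up for illustration.

```shell
#!/bin/sh
# Scratch directory so that nothing real is touched.
dir=$(mktemp -d) || exit 1

# Three files, one of them literally named "*".
touch -- "$dir/*" "$dir/notes.txt" "$dir/data.txt"

# Hypothetical helper: print how many arguments it was given.
count_args() { echo "$#"; }

# Unquoted, the shell replaces * with every non-hidden name in the directory.
unquoted=$(cd "$dir" && count_args *)
# Quoted, the single character * is passed through untouched.
quoted=$(cd "$dir" && count_args '*')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"

rm -rf -- "$dir"
```

Whatever count the unquoted form reports is the number of files that
'rm *' would have been asked to delete.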

While it's nice to know that these files can exist, how do you make
them more manageable?  Fortunately, there are several ways, as is usually
the case in UNIX.  That said, some ways work better than others depending
on the situation.  One means is to escape the characters in question
with a backslash "\".  This stops the shell from expanding the character
into something other than a literal character in the name of a file.
Similarly, single quotes around the filename prohibit shell expansion.
Neither helps when it is the command itself, rather than the shell, that
gives the name a special meaning.  In that case one must explicitly
state that the argument is a file and not an option, as with "--".  An
example would be:

        $ less -- --

This allows less to open file "--" (the second parameter to less) as
the first '--' is used to signify the end of command options.
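
The three approaches can be tried side by side in a scratch directory.
This is a minimal sketch assuming an rm that honours '--' as the end of
options; keep.txt is a made-up bystander file, there to show that
nothing else gets removed.

```shell
#!/bin/sh
# Scratch directory so that nothing real is touched.
dir=$(mktemp -d) || exit 1

# Awkwardly named files plus one ordinary file.
touch -- "$dir/*" "$dir/--" "$dir/\$" "$dir/keep.txt"

(
    cd "$dir" || exit 1
    rm \*       # backslash: the shell hands a literal * to rm
    rm '$'      # single quotes: no shell expansion of any kind
    rm -- --    # first -- ends the options, second is the file name
)

# Only the ordinary file should be left.
remaining=$(ls -A "$dir")
echo "remaining: $remaining"

rm -rf -- "$dir"
```

Prefixing the name with "./", as in "rm ./--", is another common way to
keep a leading dash from being read as an option.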

In cases where the filename contains unprintable characters, one can
see their octal codes via the '-b' parameter to 'ls':

        $ /usr/bin/ls -b
        $          *          --           \015.  ^

A clearer picture comes from adding a flag to 'ls' that marks where one
filename ends and the next begins:

        $ /usr/bin/ls -mb
        $, *, --,   \015.  ^  ,

The '-m' parameter tells 'ls' to list files separated by a comma and a
space.  In the fourth file listed above, we can see that the filename is
actually:

        'space','space','carriage return','period','space','space','circumflex','space','space'

To look up which character an octal code represents, one simply needs to
pull up the ascii manpage:

        $ man ascii
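
Where 'ls -b' is not available, piping the listing through od gives the
same information byte by byte.  The sketch below recreates the fourth
file from the listing in a scratch directory; the tr and sed steps only
normalise od's spacing, which differs between implementations.

```shell
#!/bin/sh
# Scratch directory so that nothing real is touched.
dir=$(mktemp -d) || exit 1

# Two spaces, carriage return, period, two spaces, circumflex, two spaces.
name=$(printf '  \r.  ^  ')
touch -- "$dir/$name"

# Dump every byte of the listing in octal (-b) without offsets (-An).
# The final 012 is the newline that ls prints after the name.
octal=$(ls "$dir" | od -An -b | tr -s ' \n' ' ' | sed 's/^ *//; s/ *$//')
echo "$octal"

rm -rf -- "$dir"
```

Each three-digit group can then be matched against the octal column of
the ascii manpage: 040 is a space, 015 a carriage return, and so on.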

Finally, a dependable means of working with these files is on the basis
of their inode number.  As each file has an inode number that is unique
within the filesystem upon which the file resides, the inode number can
be used to act upon a file.  Using the above file listing, we can add
the '-i' flag to 'ls' to obtain the inode of the file in question:

        $ ls -lib
        total 0
        11984832 -rw-r--r-- 1 troy troy 0 2007-03-20 11:59 $
        11962616 -rw-r--r-- 1 troy troy 0 2007-03-20 11:56 *
        12982418 -rw-r--r-- 1 troy troy 0 2007-03-20 12:09 --
         4665190 -rw-r--r-- 1 troy troy 0 2007-03-20 11:58 \ \ \r.\ \ ^\ \
        12982610 -rw-r--r-- 1 troy troy 0 2007-03-20 11:57 \ \ \ \ \ \

Now it is evident that the fifth file is simply:

        'space','space','space','space','space','space'

We can also see that its inode number is 12982610 and that its size is
0 bytes.  To give this file a more manageable name, we could use 'find'
to rename it for us:

        $ find /tmp -inum 12982610 -exec mv {} newfilename \;
        $ ls -lib
        total 0
        11984832 -rw-r--r-- 1 troy troy 0 2007-03-20 11:59 $
        11962616 -rw-r--r-- 1 troy troy 0 2007-03-20 11:56 *
        12982418 -rw-r--r-- 1 troy troy 0 2007-03-20 12:09 --
         4665190 -rw-r--r-- 1 troy troy 0 2007-03-20 11:58 \ \ \r.\ \ ^\ \
        12982610 -rw-r--r-- 1 troy troy 0 2007-03-20 11:57 newfilename

In the first line, we use 'find' to locate the file based on its inode
number (-inum) and rename the file via mv to "newfilename".  Because the
destination is a relative name, the renamed file lands in the directory
the command is run from, here assumed to be /tmp.  The resulting output
from 'ls' shows that while the filename has been changed, the inode
number has stayed the same, as it should.  Once the file is located,
'find' substitutes its pathname for {} in the command given to -exec.
Going back to the first example, dealing with file "*", as this file can
be risky to handle with 'rm', we could simply use 'find' again to remove
this file:

        $ find /tmp -inum 11962616 -exec rm {} \;
        $ ls -lib
        total 0
        11984832 -rw-r--r-- 1 troy troy 0 2007-03-20 11:59 $
        12982418 -rw-r--r-- 1 troy troy 0 2007-03-20 12:09 --
         4665190 -rw-r--r-- 1 troy troy 0 2007-03-20 11:58 \ \ \r.\ \ ^\ \
        12982610 -rw-r--r-- 1 troy troy 0 2007-03-20 11:57 newfilename

The 'find' command has removed the file "*" for us via 'rm' without us
having to escape anything: find hands the name to rm directly, with no
shell in between to expand "*", thus alleviating the potential of
destroying the filesystem.
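
The inode approach can also be scripted so that the number never has to
be copied by hand.  The following is a sketch under a few assumptions:
it works in a scratch directory from mktemp, keep.txt is a made-up
bystander file, and '-xdev' is added so that find stays on one
filesystem, since inode numbers are only unique within a filesystem.

```shell
#!/bin/sh
# Scratch directory so that nothing real is touched.
dir=$(mktemp -d) || exit 1
touch -- "$dir/*" "$dir/     " "$dir/keep.txt"

# Look up the inode numbers.  The quotes keep the shell from expanding
# the names, and -d with -- keeps ls from misreading them.
star_inode=$(ls -di -- "$dir/*" | awk '{print $1}')
blank_inode=$(ls -di -- "$dir/     " | awk '{print $1}')

# Remove the file named "*".  find passes the path straight to rm
# without a shell, so the * is never expanded.
find "$dir" -xdev -inum "$star_inode" -exec rm -- {} \;

# Rename the all-spaces file, giving mv an explicit destination path.
find "$dir" -xdev -inum "$blank_inode" -exec mv -- {} "$dir/newfilename" \;

listing=$(ls -A "$dir")
echo "$listing"

rm -rf -- "$dir"
```

Giving mv a full destination path avoids any dependence on the current
directory.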

The above examples are just a few of the various means of dealing with
strangely named files.  At this point, additional means shall be left
as an exercise for the reader.