Globbing on the Command Line
Written By: Devin Carraway
Now, let's suppose that in your long-running correspondence with Mom,
sometimes you'd saved your files beginning with "letter_to_mom" and other
times with "letter_to_Mom" -- the difference being that capital M. Now, in
UNIX, filenames are case-sensitive; that means that 'a' and 'A' aren't the
same. If you try to specify the files on the commandline as
'letter_to_mom*', you'll miss the ones that have the capital M. In such a
case, the ? and [] meta-characters are useful:
mv letter_to_?om* mom/
... the ? means that any single character can be there, m or M included,
while the *, as before, means "anything or nothing." Thus, the
capitalization problem is quickly avoided. An even neater way to do the
same thing would be this:
mv letter_to_[mM]om* mom/
... in this case, [mM] means "either 'm' or 'M'." It's called a "character
class," or more simply a "character list" (actually, you can call it
whatever you like). When you use [], you can put as many letters in it as
you'd like -- even spaces or most punctuation -- and it will match any of
the letters you've listed. This is useful because it gives you more exact
control -- in the ? example earlier, ? matched M or m in Mom, but it also
would have matched "dom" and "tom" -- whereas [Mm] only matches "Mom" and
"mom", and that's it. You can also use "ranges" with [] -- that's where you
say "any character between these two characters" -- the easy example is
[0-9], which means "any digit." You might also see [a-z], which means "any
lowercase letter." Any number of ranges can be included in a [], even
alongside other characters you've put in there: [a-zA-Z] is another common
one, meaning "any upper- or lower-case letter"; [a-z13579] means "any
lowercase letter, or the digit 1, 3, 5, 7 or 9." You'll probably
find [] most useful when you want to extract a fairly strictly limited set
of files out of a long list. Returning to our correspondence example, you
might want to get letters to Mom #3 through #6, so you'd use this:
mv letter_to_mom-[3-6].txt mom/
... in this example, the [3-6] means "3, 4, 5 or 6" -- thus
letter_to_mom-4.txt will be moved to the mom directory, but
letter_to_mom-2.txt will not.
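To see the difference between ? and [mM] for yourself, you can let echo print what the shell turns each pattern into. This is a small sketch assuming bash; the scratch directory and the file names in it are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash. All file names here are hypothetical.
export LC_ALL=C                        # plain byte-order sorting, ASCII-only ranges
sandbox=$(mktemp -d) || exit 1
trap 'rm -rf "$sandbox"' EXIT          # clean up the scratch directory afterwards
cd "$sandbox" || exit 1

for n in 1 2 3 4 5 6; do touch "letter_to_mom-$n.txt"; done
touch letter_to_Mom-7.txt letter_to_tom-1.txt

# echo shows exactly which names each pattern expands into
question=$(echo letter_to_?om*)        # ? matches any single character, even the t in tom
bracket=$(echo letter_to_[mM]om*)      # [mM] matches only m or M
range=$(echo letter_to_mom-[3-6].txt)  # [3-6] matches 3, 4, 5 or 6

echo "?     : $question"
echo "[mM]  : $bracket"
echo "[3-6] : $range"
```

The first line picks up letter_to_tom-1.txt as well; the second one doesn't.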
Note that we've used - to indicate a range. While the '-' character
generally acts like any other, inside a [] it's a meta-character. If you'd
like to match a literal '-' in a character class, you can "escape" it with a
backslash (\) character (in many places on the commandline and elsewhere, \
means "take the next character literally"), or simply put the - first or
last in the class, where it can't indicate a range. So if you had two files,
"myfile-1" and "myfile_2", you could match them both with myfile[_\-]*: the
_ is a normal character in the character class, and the \ indicates that
the - should be treated as one also; that is to say, it isn't being used to
indicate a range, just a normal character. myfile[_-]* does the same thing.
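A quick check in a scratch directory, assuming bash and made-up file names, shows a backslash-escaped - and a - placed last in the class behaving the same way. The trailing * matters: without it the pattern would have to match a file named just "myfile-" or "myfile_".

```shell
#!/usr/bin/env bash
# Sketch, assuming bash. The file names are hypothetical.
export LC_ALL=C
sandbox=$(mktemp -d) || exit 1
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox" || exit 1
touch myfile-1 myfile_2 myfile.3

escaped=$(echo myfile[_\-]*)   # \- is a literal hyphen, not a range
trailing=$(echo myfile[_-]*)   # a - at the end of the class is literal too
exact=$(echo myfile[_-])       # no trailing *: nothing matches, so bash
                               # passes the pattern through unchanged

echo "escaped : $escaped"
echo "trailing: $trailing"
echo "no *    : $exact"
```

myfile.3 is left out in both cases, since . is in neither class.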
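One other trick about character classes: they can be "negated" if the
first character is a caret (^). The caret changes the meaning of the class
from "any one of these characters" to "any character other than these." So,
while [0-9] means "any digit," [^0-9] means "any single character that isn't
a digit." Bash also accepts an exclamation mark in that position ([!0-9]);
that is the form the POSIX standard specifies, so it is the more portable
spelling.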
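Here is a small sketch, assuming bash, comparing the caret with the ! form that POSIX shells use. The file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash. The file names are hypothetical.
export LC_ALL=C
sandbox=$(mktemp -d) || exit 1
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox" || exit 1
touch report1 report2 reportA reportb

bang=$(echo report[!0-9])    # POSIX spelling: one character that is not a digit
caret=$(echo report[^0-9])   # bash accepts ^ for the same thing

echo "[!0-9] : $bang"
echo "[^0-9] : $caret"
```

Both patterns skip report1 and report2 and keep reportA and reportb.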
The problem with character classes is that each one refers to only a single
character, and it's often a pain to type several of them in a row, especially
if they're long and complicated. Often you just want to refer to one of a few
different words, and character classes are unwieldy. That's where the {},
or "alternative list," comes in. {} contains a list of words, separated by
commas, any one of which may appear at that point in a match. Once again, let's say you have
your letters to Mom and Dad as letter_to_mom-NNN.txt and
letter_to_dad-NNN.txt. Also suppose you have a friend named Dominique, and
her letters are named letter_to_dom-NNN.txt. Now, if you were to use
character classes as above, you might try:
cp letter_to_[md][oa][md]* parents/ # note: this is wrong
... this is somewhat hard to read. It means "any file whose name begins
with 'letter_to_', and then has either an 'm' or a 'd', then either an 'o'
or an 'a', then either an 'm' or a 'd', then any number of characters." It's
complicated, and beyond that, it doesn't work: while it will indeed
copy all of the letter_to_mom* and letter_to_dad* files, the character
classes allow letter_to_dom* to match too (d, o and m each come from one of
the classes, just as d, a, d and m, o, m did). This is an excellent place to use
{} -- just specify {mom,dad} instead of the messy character classes, and you
have:
cp letter_to_{mom,dad}* parents/
... which is much more readable, and also has a simpler meaning: "any file
beginning with 'letter_to_', then having either 'mom' or 'dad', then any
number of characters." You're also allowed to use the globbing characters
inside the alternative lists -- for example, suppose you did want to get
your letters to Dominique also:
cp letter_to_{[md]om,dad}* correspondence/
... This is the same as the previous example, except that instead of
matching 'mom', the shell will match either an 'm' or a 'd' followed by
'om'. Likewise, you're allowed to use * in alternative lists. Suppose that
once in a while you'd saved a letter to Dominique as
letter_to_dominique-NNN.txt instead of letter_to_dom-NNN.txt. (Most people
when they create a lot of files wind up trying to use some sort of
consistent scheme for naming them -- and most of those people find
themselves breaking their own scheme sometimes; shells help by making such
inconsistencies easier to cope with). If you wanted to collect your
first few letters to Mom, Dad and Dom, you could use:
cp letter_to_{mom,dom*,dad}-[1-3].txt correspondence/
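... The 'dom*' in the alternative list means "dom followed by zero or more
characters." You could also have written {mom,dom,dominique,dad}, but this
was terser. (In bash, strictly speaking, {} is handled before any wildcards:
the shell first rewrites the pattern into one word per alternative, here
letter_to_mom-[1-3].txt, letter_to_dom*-[1-3].txt and letter_to_dad-[1-3].txt,
and only then matches each of those against your files.)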
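The difference between the character-class attempt and the alternative lists shows up clearly if you let echo print the expansions. This sketch assumes bash; the file names are hypothetical. The last pattern shows that {} produces its words whether or not any such file exists, because braces are expanded before the shell looks at the disk.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash. The file names are hypothetical.
export LC_ALL=C
sandbox=$(mktemp -d) || exit 1
trap 'rm -rf "$sandbox"' EXIT
cd "$sandbox" || exit 1
touch letter_to_mom-1.txt letter_to_mom-4.txt letter_to_dad-1.txt \
      letter_to_dom-1.txt letter_to_dominique-2.txt

classes=$(echo letter_to_[md][oa][md]*)           # also catches Dominique's letters
braces=$(echo letter_to_{mom,dad}*)               # only Mom and Dad
mixed=$(echo letter_to_{mom,dom*,dad}-[1-3].txt)  # a * inside one alternative
phantom=$(echo nosuch_{mom,dad}.txt)              # braces never consult the disk

printf '%-9s %s\n' classes: "$classes" braces: "$braces" \
                   mixed: "$mixed" phantom: "$phantom"
```

Notice that the {mom,dad} results come out in the order of the list rather than alphabetically: each alternative becomes its own word first, and each word is then matched against the files separately.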
One final quickie for holding out this long -- ~. Almost anytime you move
around a UNIX system, you'll be moving into or out of your home directory
(which is usually something like /home/yourname/). When used at the
beginning of a pathname, ~ means "my home directory." Suppose you had a
directory "myfiles" in your home directory, and wanted to move some files
there from /tmp. If you had already cd'ed to /tmp, you could then move the
files with a command such as:
mv letter_to_* ~/myfiles/
... in this example, ~ is replaced by the shell with the full path to your
home directory (e.g. /home/yourname/). Under the bash and tcsh shells, the
~ character can be followed by a username, in which case it will refer to
the home directory of that user, rather than your own home directory. For
example, if you were copying some files from a floppy disk as root to your
normal user home directory, you might use:
cp /mnt/floppy/* ~bob
... when the shell gets this commandline, it replaces ~bob with the path to
bob's home directory (e.g. /home/bob/).
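You can watch tilde expansion happen with echo, too. A sketch, assuming bash and that $HOME is set as usual; "no_such_user_here" is a deliberately made-up name, and the root account is used as an example of another user.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash and a normal $HOME.
home=$(echo ~)                      # ~ on its own: your home directory
files=$(echo ~/myfiles)             # ~ at the start of a path
literal=$(echo "~")                 # quoted, so the shell leaves it alone
unknown=$(echo ~no_such_user_here)  # hypothetical user: bash leaves it unexpanded

echo "home:    $home"
echo "files:   $files"
echo "quoted:  $literal"
echo "unknown: $unknown"
echo "root:    $(echo ~root)"       # another user's home directory, often /root
```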
Conclusion
This is pretty much all there is to using shell wildcards -- taken together,
wildcards strike a good balance between simplicity and power in identifying
files precisely and quickly according to fairly straightforward rules.
These rules are summarized in the "Pathname Expansion" section of the
bash(1) manpage ({} and ~ are covered under "Brace Expansion" and "Tilde
Expansion"), and in the "Filename substitution" section of the tcsh(1)
manpage.
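Wildcards resemble a simplified form of what are called "regular expressions"
(often abbreviated "regexps"), which are extremely powerful devices for text
matching (more powerful than are usually required for moving files around).
The notation differs, though: the wildcard * corresponds to the regexp .*,
and ? corresponds to the regexp . (a single dot).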
Regular expressions are very important in some areas of Linux, UNIX and the
Internet, especially if you find yourself learning UNIX or CGI programming.
For more on regexps, see
http://www.tuxedo.org/~esr/jargon/html/entry/regexp.html, the
sed(1) and Perl documentation, or O'Reilly's book, Mastering Regular Expressions.
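To make the correspondence concrete, here is a sketch, assuming bash 4 or later (for mapfile) and a grep that supports -E, that filters the same list of names once with a wildcard and once with the equivalent regexp. The names are hypothetical and no files are created; a case statement is used for the wildcard side because it applies the same pattern rules as filename matching.

```shell
#!/usr/bin/env bash
# Sketch, assuming bash 4+ and grep -E. The names are hypothetical.
names=(letter_to_mom-3.txt letter_to_Mom-5.txt letter_to_tom-4.txt
       letter_to_mom-9.txt letter_to_mom-3xtxt)

# Wildcard version: case patterns follow the same rules as globbing.
glob_hits=()
for name in "${names[@]}"; do
  case $name in
    letter_to_[mM]om-[3-6].txt) glob_hits+=("$name") ;;
  esac
done

# Regexp version: anchored with ^ and $ (a wildcard always matches the whole
# name), and the . escaped, since a bare . means "any character" in a regexp.
mapfile -t regex_hits < <(printf '%s\n' "${names[@]}" |
  grep -E '^letter_to_[mM]om-[3-6]\.txt$')

# Forgetting the backslash lets letter_to_mom-3xtxt slip through.
loose=$(printf '%s\n' "${names[@]}" | grep -cE '^letter_to_[mM]om-[3-6].txt$')

echo "wildcard: ${glob_hits[*]}"
echo "regexp:   ${regex_hits[*]}"
echo "unescaped . matches $loose names"
```

Both filters keep letter_to_mom-3.txt and letter_to_Mom-5.txt; the unescaped version also accepts the odd letter_to_mom-3xtxt.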