Globbing on the Command Line
Written By: Devin Carraway
This article explains the details of referring to multiple files on the
command line using wildcards -- particularly the places where the UNIX
wildcards differ from the simpler wildcards used by DOS and Windows.
Assumptions
This article assumes you know how to access the command line, and are
familiar with the meanings of files and directories. The commands ls(1),
cp(1) and mv(1) (list files, copy files, and move files, respectively) are
used as examples -- it's helpful if you know what they do, or have read
their manpages. The differences between the Linux shell and the DOS
command line are explained, but DOS experience isn't really required.
If you'd like to try out the examples given here, you'll find the touch(1)
and mkdir(1) commands useful -- see their manpages or run 'touch --help' or
'mkdir --help' to find out how to use them. You can use touch to create
empty files with the names you choose, and work through the examples with
them. The mkdir command can be used to make the directories for the copy
and move examples.
The examples given here work with both bash and csh, the most common shells
on Linux systems.
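If you'd like a ready-made practice area, the short script below is one way to set it up. It is only a sketch: the filenames are made-up samples modeled on the ones used later in this article, and it assumes your system has the mktemp(1) command, so everything lands in a fresh scratch directory instead of somewhere that matters. It is written for bash; csh users can type the touch and mkdir lines as they are, but would write the first line with backquotes (cd `mktemp -d`) instead of $(...).

```shell
# Make a fresh scratch directory and move into it (assumes mktemp(1) exists)
cd "$(mktemp -d)" || exit 1

# touch creates an empty file for each name given (made-up sample names)
touch myfile myfile2 myfile1.txt myfile47.jpeg
touch letter_to_mom-001.txt letter_to_mom-002.txt letter_to_dad-001.txt
touch photo_for_mom-001.jpeg

# mkdir creates the target directories for the copy and move examples
mkdir myletters mom

# Show what we made
ls
```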
"Globbing" is like a lot of terms you hear from UNIX users -- it sounds
amusing, but its meaning isn't necessarily apparent. Actually, globbing
isn't something you do yourself -- it's something you tell the shell
(e.g. bash or csh) to do for you. Globbing happens when you use "wildcard"
patterns on the command line as a shorthand for a whole group of filenames
(see the Jargon File definition of "glob" at
http://www.tuxedo.org/~esr/jargon/html/entry/glob.html).
If you've used DOS before, you probably remember the expression "*.*"
(usually pronounced "star-dot-star.") That was how you told DOS that
whatever command you'd just given, you wanted it to apply to all the files
in the directory. It had to be written that way because DOS can only
conceive of files with names like MYFILE.TXT -- up to eight letters, then a
period, then up to three letters. It's a limitation that got put there a
long time ago.
Linux, like other kinds of UNIX, doesn't have the kinds of limitations that
DOS does. (You may notice that most of the DOS/Windows history has led to
limitations, like 8.3 filenames, whereas most of the Linux/UNIX heritage has
led to neat tools, like bash. If not, hang on and we'll see.) Under a
Linux shell, we use the * character (usually pronounced as "star") to mean
"anything or nothing" in a filename -- pretty much the same as DOS, but
without that confusing .* thing.
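To see the difference from DOS in practice, the bash sketch below (made-up filenames, in a scratch directory created with mktemp(1)) compares a lone * with the DOS-style *.*. Under a Linux shell, *.* is not "all files": it matches only names that contain a period, so a file called readme is left out. One more detail worth knowing: * doesn't match names that begin with a period (so-called hidden files, like .hidden below).

```shell
# Scratch directory with a few sample files (names are made up)
cd "$(mktemp -d)" || exit 1
touch notes.txt photo.jpeg readme .hidden

# A lone * matches every name except hidden ones starting with a period
echo *      # notes.txt photo.jpeg readme

# *.* matches only (non-hidden) names that contain a period
echo *.*    # notes.txt photo.jpeg
```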
Using characters like * to indicate a list of files is a time-saving
measure; it's useful because it's much easier to write "ls myfile*" than it
is to write "ls myfile1.txt myfile2.txt myfile47.jpeg" and so forth. When
you use a * in that way, it means "all files beginning with 'myfile'" -- or,
put another way, "all files beginning with 'myfile' and having any other
characters, or none, after that." Okay, so the first way is clearer, but
the shell thinks of it the second way. To the shell, "*" means "zero or
more characters." So when you say "ls myfile*", the shell, before it runs
the ls(1) command, goes looking for files whose names begin with "myfile".
It assembles a list of them and then gives that list to ls -- so even though
you aren't typing "ls myfile1.txt myfile2.txt" and so on, that's what ls
sees. The shell saved you the work.
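An easy way to watch the shell do this is to put echo(1) in front of a command. echo simply prints whatever arguments it receives, so it shows the list the shell built from the pattern. This is a bash sketch with made-up filenames; nothing is actually listed or changed.

```shell
cd "$(mktemp -d)" || exit 1
touch myfile1.txt myfile2.txt myfile47.jpeg other.txt

# The shell replaces myfile* with the matching names before echo runs,
# so this prints the command line ls would really have received
echo ls myfile*     # ls myfile1.txt myfile2.txt myfile47.jpeg
```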
By the way, the * on a command line is called a "meta-character." That's
just a way of saying that it doesn't literally mean an asterisk, but that it
stands for something else. The meta-characters you'll probably find most
useful are, in order, *, ?, [] and {} (ah, you say -- but [] and {} are two
characters each -- true enough, but that will become clearer in a moment).
Because these characters have special meanings to the shell, files whose
names contain them can be tricky to refer to when you have to type those
names in, so many programs will try to discourage you from using them in
filenames.
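Here is a small illustration of that trickiness, again a bash sketch with made-up names. It relies on one thing this article doesn't otherwise cover: putting a name in single quotes stops the shell from treating ? (or *) as a meta-character.

```shell
cd "$(mktemp -d)" || exit 1

# Single quotes make ? an ordinary character, so this creates a file
# literally named "what?" along with one named "whats"
touch 'what?' whats

# Unquoted, what? is a pattern, so it matches both files
echo what?

# Quoted, it is just the one literal name
echo 'what?'
```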
Unlike in DOS, a * can be used anywhere in a filename -- at the beginning,
at the end, or in the middle. Also unlike DOS, you can use as many *s as
you like. Meta-characters can be combined in almost any way.
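A quick bash sketch of * in different positions, using made-up filenames in a scratch directory:

```shell
cd "$(mktemp -d)" || exit 1
touch letter_to_mom-001.txt letter_to_dad-001.txt photo_for_mom-001.jpeg notes.txt

echo *.txt          # * at the start: letter_to_dad-001.txt letter_to_mom-001.txt notes.txt
echo photo*         # * at the end: photo_for_mom-001.jpeg
echo letter*mom*    # two *s, one in the middle: letter_to_mom-001.txt
```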
We might as well also explain that the ? meta-character means "any single
character": ? matches A, or q, or 6. Unlike *, though, it always matches
exactly one character, never zero -- myfile* would match "myfile" and
"myfile2", but myfile? would match only "myfile2".
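Side by side in bash, with three made-up files in a scratch directory, the difference between * and ? looks like this:

```shell
cd "$(mktemp -d)" || exit 1
touch myfile myfile2 myfile47

echo myfile*     # myfile myfile2 myfile47  (zero or more characters)
echo myfile?     # myfile2                  (exactly one character)
echo myfile??    # myfile47                 (exactly two characters)
```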
Let's say you had a lot of files; some were named "letter_to_mom-NNN.txt",
where NNN was some number (you write to Mom a lot). Also suppose you had
some pictures named "photo_for_mom-NNN.jpeg" (again NNN being some number),
and some more files named things like "letter_to_dad-NNN.txt". Now suppose
you wanted to copy (using the cp(1) command) all the letters to a directory
called myletters:
cp letter* myletters/
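To try this one yourself, the bash sketch below builds a few sample files (three letters and one photo; the names are made up to follow the pattern described above) and runs the same cp command:

```shell
cd "$(mktemp -d)" || exit 1
touch letter_to_mom-001.txt letter_to_mom-002.txt letter_to_dad-001.txt
touch photo_for_mom-001.jpeg
mkdir myletters

# letter* expands to the three letter files; the photo doesn't match
cp letter* myletters/

# Lists the three letter_to_... files, but not the photo
ls myletters
```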
That one's pretty simple -- when you say letter* to the shell, it means "all
the files whose names begin with 'letter'", and so all the letter* files
will get copied to the myletters directory. Now suppose you were making a
directory of correspondence with Mom -- now you need to copy not only the
letters to her, but also the photos you've been collecting. One simple way
would be:
cp letter_to_mom* mom/
cp photo_for_mom* mom/
... but that's way too much typing. A somewhat shorter way would be:
cp letter_to_mom* photo_for_mom* mom/
... that works too, because cp(1) can copy any number of files at once, so
long as the last thing on the command line is the directory you'd like the
files put in. But there's an even shorter way:
cp *_mom-* mom/
... this tells cp to copy all the files that have "_mom-" anywhere in their
names into the mom directory. The _ (underscore) and - (hyphen) are there
because they come right before and after "mom" in the original filenames --
that way your correspondence with Cardamom and Electro-Mom won't get mixed in.
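With the filenames used in this example, "mom" is preceded by an underscore and followed by a hyphen, so the pattern that picks out Mom's files is *_mom-*. To check that it really leaves out the other Moms, the bash sketch below adds two extra files for them (letter_to_cardamom-001.txt and letter_to_electro-mom-001.txt are hypothetical names). Running echo first is a cheap way to preview what a pattern matches before cp acts on it.

```shell
cd "$(mktemp -d)" || exit 1
touch letter_to_mom-001.txt photo_for_mom-001.jpeg letter_to_dad-001.txt
# Hypothetical files for Cardamom and Electro-Mom
touch letter_to_cardamom-001.txt letter_to_electro-mom-001.txt
mkdir mom

# Preview: only names containing "_mom-" match
echo *_mom-*    # letter_to_mom-001.txt photo_for_mom-001.jpeg

cp *_mom-* mom/
ls mom
```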