Linux/Unix naming conventions...
John Chambers,,,781-647-1813
jc at trillian.mit.edu
Thu Dec 23 11:38:19 EST 1999
Derek Atkins writes:
Newer GUI interfaces, in particular, may be more finicky about how
files are named. OTOH, a better way to figure out what kind of data
is in a file is to run the "file" command. :)
It's probably worth pointing out that there's an important reason
that this generally won't work well, one that also explains why Unix
plays fast and loose with "file extensions". One of the original
motives for developing Unix at Bell Labs back in the early 70s was to
solve the problem of computers that couldn't communicate with each
other because of incompatible file systems. The very first Unix systems
were built with "generic" I/O so that they could act as middlemen or
adapters, connecting systems X and Y and translating between them.
The Unix guys figured out pretty fast that this meant that their new
system should have as few built-in prejudices as possible about file
formats, naming conventions, and so on.
Things haven't gotten any better since then. Even two machines that
are both running "standard" Microsoft Windows are likely to have
different file extensions for the same type of data, or different and
incompatible file formats for the same extension. Unix systems,
especially those with web servers, are often called on to download
such files exactly, with the same names, from several different and
incompatible systems. The fact that the Unix file system can handle
this with few problems is one of the reasons that most network
servers are Unix machines, despite a huge marketing budget from MS to
get people to use NT servers.
Since this is an important Unix niche, a bit of thought makes it
clear that building fixed suffix-to-type rules into Unix software is
generally not a very good idea. Software that does this will not be
good at handling files from all those other incompatible machines out
there on the Net.
You should understand that the Unix file system itself doesn't know
anything about these "suffixes". A dot is just another character, and
has no special meaning. The only characters that have special meaning
to the Unix file system are slash and null. Anything else is handled
at the application level. This is done so that Unix software can
handle the naming conventions of other computers. But it follows that
Unix software can implement any naming convention at all, and this
can sometimes result in a bit of confusion for the users.
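
As a rough illustration of the point above, here is a minimal Python
sketch, assuming a Unix-like file system. The helper name
create_and_list and the temporary directory are made up for this
example. It creates files with dotted names and shows that any
"suffix" is something the application computes, not something the
file system records:

```python
import os
import tempfile

# Dotted names like the ones discussed in this thread. The file system
# stores them as plain strings and attaches no meaning to the dots.
NAMES = ["206.147.162.62", "sunsite.anu.edu.au", "a.b.c.d.e"]

def create_and_list(names):
    """Create empty files with the given names; return the sorted listing."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            with open(os.path.join(tmp, name), "w"):
                pass  # an empty file is enough for this demonstration
        return sorted(os.listdir(tmp))

listing = create_and_list(NAMES)

# os.path.splitext is purely an application-level convention: it splits
# at the last dot, so an IP address appears to have the "extension" .62.
suffixes = {name: os.path.splitext(name)[1] for name in listing}
```

Whether ".62" or ".au" means anything is entirely up to whatever
program reads the name.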
Here's an example from a web package that I've been working on. It
has a number of directories containing files with names like:
    206.147.162.62
    sunsite.anu.edu.au
    ftp-swiss.ai.mit.edu
    comhlan.erin.krakow.pl
Now, it's probably pretty obvious what's going on here. Are there any
non-Unix systems that could handle such a simple and obvious naming
scheme? I've had no trouble with it on the Linux, FreeBSD and Solaris
machines that it's running on. Of course, there was a temporary
problem with the web server delivering those .au files as audio MIME
types and the .pl files as Perl scripts. But I fixed that by linking
into the directories a .htaccess file that contains:
    # Declare everything here to be plain text:
    ForceType text/plain
The Apache server now sends all these files as text/plain.
This is also self-explanatory, and understood by all derivatives of
the Apache web server, as well as by a growing population of apps
that have simply copied the Apache convention. The .htaccess file can
also do arbitrary suffix-to-type mappings, on a per-directory basis.
You may also find a .mimetypes file in your home directory that does
something very similar, and is used by browsers and some other apps.
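
To make the two mechanisms concrete, here is a small Python sketch of
a fixed suffix table combined with a per-directory override. It is
not how Apache implements this. The tables SUFFIX_TYPES and
FORCED_TYPES, the directory /data/hosts, and the function
content_type are all hypothetical, and unlike a real .htaccess file
the override here applies only to the exact directory, not to its
subdirectories:

```python
import posixpath

# Hypothetical fixed suffix-to-type table, of the kind a server ships with.
SUFFIX_TYPES = {".au": "audio/basic", ".pl": "application/x-perl"}

# Hypothetical per-directory overrides, playing the role of a .htaccess
# file with a ForceType line: every file in the directory gets one type.
FORCED_TYPES = {"/data/hosts": "text/plain"}

def content_type(path, default="application/octet-stream"):
    """Return a MIME type: directory override first, then suffix table."""
    directory = posixpath.dirname(path)
    if directory in FORCED_TYPES:
        return FORCED_TYPES[directory]
    suffix = posixpath.splitext(path)[1].lower()
    return SUFFIX_TYPES.get(suffix, default)

# Without an override, a host name in Australia looks like an audio file.
wrong = content_type("/srv/other/sunsite.anu.edu.au")
# With the override in place, the same name is served as plain text.
right = content_type("/data/hosts/sunsite.anu.edu.au")
```

The suffix table alone gets the host-name files wrong; the directory
override is what restores the intended type.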
In general, the ways that Unix systems are widely used preclude any
fixed suffix-to-type mapping scheme. You can do a mapping within an
application, but then that application can't handle files from
computers that use a different convention. You can organize files
into directories and implement some scheme like the .htaccess or
.mimetypes files. Or you can do as the 'file' command does, and
examine the first N bytes of the file. All of these are used on Unix
systems, with varying degrees of success.
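
The last approach, looking at the first N bytes, can be sketched in
Python roughly as follows. This is only an illustration of the idea,
not the algorithm the 'file' command uses. The MAGIC table holds a
handful of well-known signatures, and the names sniff_bytes and
sniff_file, the 512-byte default, and the crude text test are
assumptions made for the example:

```python
# A few well-known leading byte signatures. The real 'file' command
# consults a much larger database of such patterns.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b".snd", "audio/basic"),  # Sun/NeXT .au audio header
]

def sniff_bytes(head):
    """Guess a MIME type from the leading bytes of a file."""
    for magic, mime in MAGIC:
        if head.startswith(magic):
            return mime
    # Crude fallback: pure ASCII with no NUL bytes is treated as text.
    if b"\0" in head:
        return "application/octet-stream"
    try:
        head.decode("ascii")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"

def sniff_file(path, n=512):
    """Read the first n bytes of path and guess its type, ignoring its name."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(n))
```

Here a file named sunsite.anu.edu.au is reported as audio only if its
contents really begin with the .snd header, whatever its name says.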