Linux/Unix naming conventions...

Thu Dec 23 11:38:19 EST 1999

Derek Atkins writes:
	Newer GUI interfaces, in particular, may be more finicky about how
	files are named.  OTOH, a better way to figure out what a kind of data
	is in a file is to run the "file" command. :)

It's probably worth pointing out that there's an important reason
that this generally won't work well, and the same reason explains why
Unix plays fast and loose with "file extensions".  One of the original
motives for developing Unix at Bell Labs back in the early '70s was to
solve problems of computers that couldn't communicate with each other
because of incompatible file systems.  The very first Unix systems
were built with "generic" I/O so that they could act as middlemen  or
adapters,  connecting  systems  X and Y and translating between them.
The Unix guys figured out pretty fast that this meant that their  new
system  should  have as few builtin prejudices as possible about file
formats, naming conventions, and so on.

Things haven't gotten any better since then.  Even two machines  that
are  both  running  "standard"  Microsoft  Windows are likely to have
different file extensions for the same type of data, or different and
incompatible  file  formats  for  the  same extension.  Unix systems,
especially those with web servers, are often called  on  to  download
such  files  exactly, with the same names, from several different and
incompatible systems.  The fact that the Unix file system can  handle
this  with  few  problems  is  one  of  the reasons that most network
servers are Unix machines, despite a huge marketing budget from MS to
get people to use NT servers.

Since this is an important Unix niche, a  bit  of  thought  makes  it
clear  that building fixed suffix-to-type rules into Unix software is
generally not a very good idea.  Software that does this will not  be
good at handling files from all those other incompatible machines out
there on the Net.

You should understand that the Unix file system itself  doesn't  know
anything about these "suffixes". A dot is just another character, and
has no special meaning. The only characters that have special meaning
to the Unix file system are slash and null.  Anything else is handled
at the application level.  This is done so  that  Unix  software  can
handle the naming conventions of other computers. But it follows that
Unix software can implement any naming convention at  all,  and  this
can sometimes result in a bit of confusion for the users.
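
The point about dots is easy to check for yourself.  Here is a minimal
Python sketch, assuming a Unix-like system; the file names and the helper
functions create_all and is_rejected are made up for illustration.  It
creates files with dot-heavy and otherwise odd names, then shows that
only a null byte or a slash gets in the way.

```python
import os
import tempfile

# Made-up names for illustration; none of them is special to a Unix
# file system.
names = [
    "a.b.c.d",
    "notes.au",
    "...",
    "name with spaces.and.dots",
    "*?.tar.gz.txt",
]

def create_all(directory, names):
    """Create an empty file for each name and return the sorted listing."""
    for name in names:
        with open(os.path.join(directory, name), "w"):
            pass
    return sorted(os.listdir(directory))

def is_rejected(directory, name):
    """Return True if a file called `name` cannot be created in `directory`."""
    try:
        with open(os.path.join(directory, name), "w"):
            pass
    except (ValueError, OSError):
        # A null byte is refused outright.  A slash is read as a path
        # separator, so the call fails when the implied subdirectory
        # does not exist.
        return True
    return False

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(create_all(d, names))
        print(is_rejected(d, "a\0b"), is_rejected(d, "a/b"))
```

Note that the shell may give characters like * and ? a meaning of its
own, but that is the shell (an application), not the file system.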

Here is an example from a web package that I've been working on.  It
has a number of directories containing files with names like:
   206.147.162.62
   sunsite.anu.edu.au
   ftp-swiss.ai.mit.edu
   comhlan.erin.krakow.pl

Now, it's probably pretty obvious what's going on here.  Are there any
non-Unix systems that could handle such a simple and obvious naming
scheme? I've had no trouble with it on the Linux, FreeBSD and Solaris
machines that it's running on.  Of course, there was a temporary
problem with the web server delivering the .au files with an audio
MIME type and treating the .pl files as Perl scripts.  But I fixed that
by linking into the directories a .htaccess file that contains:
   # Declare everything here to be plain text:
   ForceType text/plain
The Apache server now sends all these files as text/plain.
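
To make the failure and the fix concrete, here is a rough Python sketch
of the two behaviors.  It is not how Apache is implemented.  The table
SUFFIX_TYPES and the functions type_by_suffix and type_for are
hypothetical, and the forced_type argument only stands in for a
per-directory override like the ForceType line above.

```python
import os

# A deliberately tiny, hypothetical suffix table.  Real servers ship
# much larger ones, and the exact types they list vary.
SUFFIX_TYPES = {
    ".au": "audio/basic",
    ".pl": "application/x-perl",
    ".html": "text/html",
}

DEFAULT_TYPE = "application/octet-stream"

def type_by_suffix(filename):
    """Guess a MIME type from whatever follows the last dot."""
    _, suffix = os.path.splitext(filename)
    return SUFFIX_TYPES.get(suffix.lower(), DEFAULT_TYPE)

def type_for(filename, forced_type=None):
    """Apply a per-directory override first, then fall back to the table."""
    if forced_type is not None:
        return forced_type
    return type_by_suffix(filename)

if __name__ == "__main__":
    for name in ("sunsite.anu.edu.au", "comhlan.erin.krakow.pl"):
        # First column: the fixed table gets it wrong.
        # Second column: the directory-level override wins.
        print(name, type_for(name), type_for(name, "text/plain"))
```

The fixed table sees a host name in Australia as an audio file, while
the override ignores the name entirely.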

This is also self-explanatory, and understood by all derivatives of
the Apache web server, as well as by a growing population of apps
that have simply copied the Apache convention. The .htaccess file can
also  do arbitrary suffix-to-type mappings, on a per-directory basis.
You may also find a .mimetypes file in your home directory that  does
something very similar, and is used by browsers and some other apps.

In general, the ways that Unix systems are widely used  preclude  any
fixed  suffix-to-type mapping scheme.  You can do a mapping within an
application, but  then  that  application  can't  handle  files  from
computers  that  use  a different convention.  You can organize files
into directories and implement some  scheme  like  the  .htaccess  or
.mimetypes files.  Or you can do as the 'file' command does, and
examine the first N bytes of the file.  All of these are used on Unix
systems, with varying degrees of success.
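
For the last approach, here is a minimal Python sketch of content
sniffing.  It is only an illustration of the idea, not the algorithm
the 'file' command actually uses, which consults a much larger magic
database.  The MAGIC table, the function names sniff and sniff_file,
and the 512-byte default are my own choices; the byte signatures
themselves are the well-known ones for these formats.

```python
# (signature at start of file, MIME type) pairs; a tiny illustrative subset.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/gzip"),
    (b".snd", "audio/basic"),  # Sun/NeXT audio, the real ".au" format
]

def sniff(data):
    """Guess a MIME type from the leading bytes, ignoring the file name."""
    for magic, mime in MAGIC:
        if data.startswith(magic):
            return mime
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return "application/octet-stream"
    # No known signature, but everything is plain ASCII.
    return "text/plain"

def sniff_file(path, n=512):
    """Read the first n bytes of the file at `path` and sniff them."""
    with open(path, "rb") as f:
        return sniff(f.read(n))

if __name__ == "__main__":
    print(sniff(b".snd\x00\x00\x00\x18"))
    print(sniff(b"some notes about sunsite.anu.edu.au\n"))
```

With this, a text file that happens to be named sunsite.anu.edu.au is
reported as text, and only data that starts with the audio signature
is reported as audio.  The weakness is the usual one: a short prefix
can be ambiguous, which is why the results vary.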
