Keyword indexing for Word documents (doc / docx)

Currently, the external applications used for full-text indexing are selected by MIME type, via settings in config.inc.php.

I’ve found that both doc and docx documents are detected as “application/msword” on our RHEL-based systems, so a single converter has to cope with both formats. I haven’t found a version of catdoc or antiword that handles both doc and docx in the same executable. I’ve resorted to a shell script that runs an extended file test and then hands the file to catdoc (for doc) or to docx2txt (for docx).
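
For anyone who wants to try the same idea, here is a minimal sketch of such a dispatcher. It is not my actual script: it is written in Python rather than shell, and instead of calling the file utility it looks at the file contents directly (the OLE2 signature for doc, a ZIP archive containing word/document.xml for docx). The function names are made up for illustration, and the command lines are assumptions: catdoc printing to stdout when given a file name, and docx2txt being installed under that name and accepting "-" as the output file. Adjust both to match your installation.

```python
import subprocess
import sys
import zipfile

# Signature at the start of OLE2 compound files (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def detect_word_format(path):
    """Return "doc", "docx", or None, based on file contents only."""
    with open(path, "rb") as fh:
        header = fh.read(8)
    if header == OLE2_MAGIC:
        # Note: other legacy Office formats share this signature.
        return "doc"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # A docx package keeps its main text in word/document.xml.
            if "word/document.xml" in zf.namelist():
                return "docx"
    return None


def build_command(path):
    """Return the converter command for path, or None if unrecognised."""
    kind = detect_word_format(path)
    if kind == "doc":
        # Assumption: catdoc writes the extracted text to stdout.
        return ["catdoc", path]
    if kind == "docx":
        # Assumption: the executable is named docx2txt and "-" means stdout.
        return ["docx2txt", path, "-"]
    return None


def main(argv):
    """Run the matching converter; its stdout goes to the indexer."""
    if len(argv) != 2:
        print("usage: word2txt FILE", file=sys.stderr)
        return 2
    cmd = build_command(argv[1])
    if cmd is None:
        print("not a recognised Word document: " + argv[1], file=sys.stderr)
        return 1
    return subprocess.run(cmd).returncode
```

To use it as the converter configured for “application/msword”, save it as an executable script and end it with sys.exit(main(sys.argv)). Checking the contents rather than the file extension means a misnamed upload still goes to the right tool.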

How are others dealing with this (or avoiding the problem)?