scrcnt

This is a program I wrote a long time ago to break down the programming language composition of software projects, similar to the language meter on GitHub. It's a simple little self-contained C program, with a short GNU Makefile. I have been meaning to write a better version that operates on tar archives fed through the standard input, but have not gotten around to this yet.

The beauty of this program is that it does not use any complex data structures at runtime to map file extensions to languages. Instead, a build script generates a radix tree-like set of character comparisons directly in the C code, which maps each extension one-to-one to a language. Ambiguities are then resolved in a second pass by checking the first few lines of an ambiguous file against a few magic structures (e.g. the shebang in a shell script). This approach is both very resource-efficient and very fast, even on a large directory tree (think the entire source code of Mozilla Firefox).

Usage is fairly simple. Run make to build it. The first argument is the directory tree to walk. By default, the program prints the most common language. If the second argument is --ignore, a newline-delimited list of languages to ignore is read from the standard input (this can be the content of a .gitignore-ish file stored in a cloned Git repository, for example).
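
As a sketch of how the ignore list and the default output could fit together, the following reads one name per line from a stream and then picks the largest count among the languages that are not ignored. This is an assumption about the logic, not the program's actual source: MAX_NAME, read_ignore_list, is_ignored and most_common are hypothetical names, and the counts stand for whatever unit the real program tallies.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAME 64   /* hypothetical limit on the length of a language name */

/* Read one name per line from `in` into `list`; returns how many were stored.
 * Lines longer than MAX_NAME - 1 characters are split by fgets in this sketch. */
size_t read_ignore_list(FILE *in, char list[][MAX_NAME], size_t max)
{
    size_t n = 0;
    while (n < max && fgets(list[n], MAX_NAME, in) != NULL) {
        list[n][strcspn(list[n], "\r\n")] = '\0';  /* strip the line ending */
        if (list[n][0] != '\0')                     /* skip blank lines */
            n++;
    }
    return n;
}

/* Returns 1 if `name` appears in the ignore list, 0 otherwise. */
int is_ignored(const char *name, char list[][MAX_NAME], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Index of the language with the largest count that is not ignored,
 * or -1 if every language is ignored or has a zero count. */
int most_common(const char *const names[], const size_t counts[], size_t n_langs,
                char ignored[][MAX_NAME], size_t n_ignored)
{
    int best = -1;
    for (size_t i = 0; i < n_langs; i++) {
        if (counts[i] == 0 || is_ignored(names[i], ignored, n_ignored))
            continue;
        if (best < 0 || counts[i] > counts[(size_t)best])
            best = (int)i;
    }
    return best;
}
```

A caller would pass stdin to read_ignore_list when --ignore is given, and an empty list otherwise.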