Some Useful Procmail Recipes
Scott Wiersdorf
<scott@perlcode.org>
Created: Thu Feb 20 13:50:56 MST 2003
Updated: $Date: 2005/04/25 16:11:49 $

****************************************************
Danger! Achtung! Aviso!

If you are unsure about changes you make to your recipes, always put
in a safety net at the top of your (first) recipe file:

    :0c:
    /var/mail/backup

This will put copies of all mail in /var/mail/backup (an absolute
path; change it to a location your account can write to). You may wish
to read the procmailex(5) manpage, which gives an example of how to
keep only the last 32 (or whatever number you specify) messages.

Careless procmail rules with no safety net may result in lost email
messages (e.g., mail delivered to /dev/null cannot be recovered)!

I am not responsible for lost or damaged mail. If you doubt the safety
of these procmail recipes, do not use them.
****************************************************
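
The "keep only the last 32 messages" idea mentioned in the warning
above can be tried outside procmail first. What follows is a minimal
sketch, not the manpage's exact recipe. It assumes the backup is a
directory holding one file per message named msg.*; the function name
prune_backups and the scratch directory are hypothetical and exist
only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest $2 files named msg.* in
# directory $1 (both the helper and the msg.* naming are assumptions).
prune_backups() {
    dir=$1
    keep=$2
    # ls -t lists newest first; sed drops the first $keep names, so only
    # the older files reach rm. "dummy" gives rm an operand even when
    # there is nothing to delete.
    ( cd "$dir" && rm -f dummy $(ls -t msg.* | sed -e "1,${keep}d") )
}

# Demonstration on a scratch directory, never on real mail.
scratch=$(mktemp -d)
i=1
while [ "$i" -le 40 ]; do
    : > "$scratch/msg.$i"
    i=$((i + 1))
done
prune_backups "$scratch" 32
ls "$scratch" | wc -l        # number of files left after pruning
rm -rf "$scratch"
```

Try a command like this on a scratch directory before pointing it at
a real backup, since it deletes files.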

Conventions

* SPAMMY

I use $SPAMMY as a location where I would put spam. Perhaps $HOME/spam
would work for you, or /dev/null if you're feeling audacious.

* WS

This denotes whitespace:

    WS=" 	"

This is a space followed by a tab.

* NL

This denotes a newline:

    NL="
    "

****************************************************


1) Retrieving Spam Assassin's "hits" score

==================================================
## set this to the highest Spam Assassin score you will tolerate
MAXSPAM=9.71

:0
* ^X-Spam-Status: Yes
* ^X-Spam-Report:[      ]+\/[0-9\.]+
{
        :0:
        * $ -${MAXSPAM}^0
        * $ ${MATCH}^0
        $SPAMMY
}

## for Spam Assassin 2.55, this works (some changes to the syntax):

MAXSPAM=9.71
:0
* ^X-Spam-Flag: Yes
* ^X-Spam-Report:[      ]*.*results($|.)*[       ]*\/[0-9\.]+
{
    :0:
    * $ -${MAXSPAM}^0
    * $ ${MATCH}^0
    $SPAMMY
}

## for Spam Assassin 2.6x:

MAXSPAM=7.00
:0
* ^X-Spam-Flag: YES
* ^X-Spam-Status:.*hits=\/[0-9\.]+
{
    :0:
    * $ -${MAXSPAM}^0
    * $ ${MATCH}^0
    $SPAMMY
}


=========================
Discussion

This recipe should be placed after you have filtered mail through Spam
Assassin (e.g., |/usr/local/bin/spamassassin). It looks for messages
that are flagged as spam (X-Spam-Status: Yes, or X-Spam-Flag: YES in
later versions). It then "grabs" the spam score from the X-Spam-Report
line (X-Spam-Status in 2.6x) and saves it in the MATCH procmail
variable. A nested scoring recipe is then triggered: the message is
first given negative MAXSPAM as a score, then procmail adds the
message's real Spam Assassin score (which was saved in the MATCH
variable). If the Spam Assassin score is greater than MAXSPAM, the
total is positive and the mail message is delivered to $SPAMMY;
otherwise it flows through to any recipes after or is delivered as
usual.
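
The arithmetic behind the two weighted conditions can be checked at a
shell prompt. This is a rough sketch that only mimics the comparison;
it is not how procmail evaluates scores internally. The helper names
spam_score and over_threshold and the sample header are made up for
illustration, and a 2.6x-style "hits=" header is assumed.

```shell
#!/bin/sh
# Hypothetical helper: print the score found after "hits=" in an
# X-Spam-Status header read from standard input.
spam_score() {
    sed -n 's/^X-Spam-Status:.*hits=\([0-9.][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when score - max is positive, the same
# test the recipe expresses as "-MAXSPAM^0" followed by "MATCH^0".
over_threshold() {
    awk -v score="$1" -v max="$2" 'BEGIN { exit !((score - max) > 0) }'
}

# Made-up sample header.
header='X-Spam-Status: Yes, hits=12.4 required=5.0 tests=EXAMPLE'
score=$(printf '%s\n' "$header" | spam_score)
if over_threshold "$score" 9.71; then
    echo "score $score: would be filed in \$SPAMMY"
else
    echo "score $score: would fall through to later recipes"
fi
```

A score exactly equal to MAXSPAM gives a total of zero, which is not
positive, so such a message is not filed as spam.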

=========================
Variations (added Mon Mar 17 15:19:08 MST 2003)

How to send a spam with a score above 20 to /dev/null and a spam with
a score between 4.5 and 20 to a "likely spam" folder:

    TWENTY=20.00
    FOUR_FIVE=4.50

    :0
    * ^X-Spam-Report:[      ]+\/[0-9\.]+
    {
        ## if spam is > 20.00, trash it
        :0
        * $ -${TWENTY}^0
        * $ ${MATCH}^0
        /dev/null

        ## else if spam is > 4.50, send it to spam folder
        :0E:
        * $ -${FOUR_FIVE}^0
        * $ ${MATCH}^0
        $SPAMMY
    }

==================================================

2) Counting the number of Cc'd users in a mail message

From: "Dallman Ross" <dman@nomotek.com>
Date: Wed, 28 Aug 2002 06:02:43 -0400 (EDT)
Message-id: <200208281002.g7SA2hD20739@panix5.panix.com>

    TO=`formail -xTo:`
    CC=`formail -xCc:`

    MAXAT=4
    
    ## count the number of @ in To:
    :0
    * 1^1 TO ?? @
    { ATCOUNT = $= }
    
    ## count the number of @ in Cc: & add to To:
    :0
    * $ ${ATCOUNT}^0
    * 1^1 CC ?? @
    { ATCOUNT = $= }
    
    ## allow up to 4 @ in To: and Cc: combined, otherwise spam
    :0:
    * $ ${ATCOUNT}^0
    * $ -${MAXAT}^0
    $SPAMMY

and

From: "Dallman Ross" <dman@nomotek.com>
Date: Thu, 29 Aug 2002 14:20:47 +0200
Message-id: <!~!UENERkVCMDkAAQACAAAAAAAAAAAAAAAAABgAAAAAAAAAfIuZ1uAZ0BGnLERFU1QAAMKAAAAQAAAADD1qqdEo1UONSVQpnFuhQQEAAAAA@nomotek.com>

 :0  # find and save value of Cc:(s), if such exist(s):
    * $ ^Cc:.*\/[^$WS].*(^Cc:.*)*
   { CC = $MATCH }
 :0 E  # else check for empty Cc:
    * $ ^Cc:[$WS]*$
   { CC = [empty] }


=========================
Discussion (added Mon Mar 17 17:14:32 MST 2003)

This recipe counts the number of email addresses in the To: and Cc:
lines of an email message.

We first set a variable 'MAXAT' to 4. We've decided arbitrarily that
any message with more than 4 addresses in the To: and Cc: lines
combined is likely a spam (this is really not a good criterion by
itself, but in conjunction with other recipes might help yield a good
profile).

Next we read the message's 'To:' header, looking for '@' signs. Each
'@' we find in the 'To:' field adds one point to the recipe's score,
which we save in 'ATCOUNT'. Note the handy use of '$='. This is a
built-in procmail variable (documented in procmailrc(5)) that contains
the score of the last recipe.

The second recipe counts the '@' signs in the 'Cc:' field and adds it
to our previous count in the 'To:' field. The first line of the recipe:

    * $ ${ATCOUNT}^0

is a little tricky for the uninitiated. The '*' starts a procmail
condition (as usual). The first '$' tells procmail to "evaluate the
remainder of this condition according to sh(1) substitution rules
inside double quotes, skip leading whitespace, then reparse it." (from
procmailrc(5)). That means that '${ATCOUNT}^0' will be "interpolated".
That is, any procmail variables found inside the expression (after
the '$') will be replaced with their current value, then passed back
to procmail for comparison against the message.

In our context, that means that the string '${ATCOUNT}' in this
condition line will be replaced by the score of the previous recipe
(which was probably 1, if the message was just addressed to us). Now
we have this (after sh(1)-style interpolation):

    * 1^0

which simply means "give this message one point" (or more precisely,
"give this message the score that it finished with in the previous
recipe"). It's how we pass along the previous recipe's score to the
current recipe.

The next line:

    * 1^1 CC ?? @

counts the '@' signs in the 'Cc:' field and increments our score
accordingly (one point for each '@' found). The final line:

    { ATCOUNT = $= }

assigns the score of this recipe to the $ATCOUNT variable again (as
we did in the first recipe). This time, however, we added the score
of the previous recipe in at the start of this second recipe so that
$ATCOUNT now has the cumulative score for both the first and second
recipes.

The final recipe works a lot like the second recipe. We assign right
off the bat the previous score (which is the cumulative score) to this
recipe:

    * $ ${ATCOUNT}^0

Next we subtract our maximum allowed number of '@' signs from the
score:

    * $ -${MAXAT}^0

According to procmailsc(5), anytime a message has a positive score,
the recipe is considered a "match" and the action line will be
executed. In this case, we deliver the message to $SPAMMY if $ATCOUNT
is more than $MAXAT. Q.E.D.
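
The same count can be reproduced at a shell prompt to sanity-check a
threshold before putting it in a recipe. This is only a sketch of the
arithmetic: the helper name count_at and the sample addresses are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count the "@" characters in all arguments combined.
count_at() {
    # tr -cd keeps only "@"; the last tr strips padding some wc versions add
    printf '%s' "$*" | tr -cd '@' | wc -c | tr -d ' '
}

MAXAT=4
# Made-up header values.
to='alice@example.org, bob@example.org'
cc='carol@example.org, dave@example.org, erin@example.org'

atcount=$(count_at "$to" "$cc")
# Same test as the final recipe: ATCOUNT - MAXAT must be positive.
if [ $((atcount - MAXAT)) -gt 0 ]; then
    echo "$atcount addresses: would be filed in \$SPAMMY"
else
    echo "$atcount addresses: delivered as usual"
fi
```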

==================================================

3) Detecting a message with something to hide via Base64 encoding

    :0:
    ## the following recipe condition is all one line
    * 1^0 B ?? Content-Type: text/(html|plain)(;[   ]*$?[   ]*charset="iso-8859-1")?($Content-Disposition: inline)?$Content-Transfer-Encoding: base64
    $SPAMMY

=========================
Discussion (added Mon Mar 17 17:14:54 MST 2003)

We're looking for variations of:

    Content-Type: text/html; charset="iso-8859-1"
    Content-Transfer-Encoding: base64

including this (which has the charset on a separate line):

    Content-Type: text/html;
	    charset="iso-8859-1"
    Content-Transfer-Encoding: base64

This is heavy-duty regular expression work. The "English" explanation
of the procmail condition follows:

  *  a new procmail condition

    1^0  if the message matches the following criterion, add one point
         to the total score for this message.

	B ??  match the following pattern against the Body of the
              message (not the headers).

This takes us up to the beginning of the regular expression. Following
now is an "English" explanation of the regular expression:

    Content-Type: text/(html|plain)

Match the literal string "Content-Type: text/" followed by either
"html" or "plain". This will match either of the following two lines:

    Content-Type: text/html
    Content-Type: text/plain

and it could likewise be written:

    (Content-Type: text/html|Content-Type: text/plain)

which is longer and somewhat more confusing, especially if you have
more regex (as we do). Remembering our original regular expression:

    * 1^0 B ?? Content-Type: text/(html|plain)(;[   ]*$?[   ]*charset="iso-8859-1")?($Content-Disposition: inline)?$Content-Transfer-Encoding: base64

we move on to the next part:

    (;[   ]*$?[   ]*charset="iso-8859-1")?

Match a semi-colon followed by zero or more spaces or tabs (between
[ and ] is a space character and a tab character). This is followed
by an optional newline (the '$' is a newline in procmail regular
expressions). This is followed by zero or more spaces or tabs,
followed by the literal string 'charset="iso-8859-1"'. The trailing
'?' applies to the entire expression between parentheses, meaning this
entire part is optional. If it shows up in the email message, great;
if not, no problem because we have more regular expression following.

    ($Content-Disposition: inline)?

Match a newline, followed by the literal string "Content-Disposition:
inline". This entire expression is also optional (the trailing '?'
applies again to everything between the parentheses). Finally, we have:

    $Content-Transfer-Encoding: base64

Match a newline, followed by the literal string
"Content-Transfer-Encoding: base64".


Thus the original recipe:

    :0:
    * 1^0 B ?? Content-Type: text/(html|plain)(;[   ]*$?[   ]*charset="iso-8859-1")?($Content-Disposition: inline)?$Content-Transfer-Encoding: base64
    $SPAMMY

finds messages that encode their otherwise readable body with base64
encoding: a common technique among spammers.
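
To check a saved message against this idea without touching your
procmailrc, a line-by-line approximation can be run from the shell.
The sketch below is deliberately looser than the procmail regular
expression (it anchors each header at the start of a line and is not
a translation of it); the helper name has_base64_text_part and the
sample input are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when standard input contains a text/html
# or text/plain part header followed by a base64 transfer encoding.
has_base64_text_part() {
    awk '
        { line = tolower($0) }
        line ~ /^content-type: text\/(html|plain)/ { armed = 1; next }
        # folded charset line directly after the Content-Type line
        armed && line ~ /^[ \t]+charset="iso-8859-1"/ { next }
        armed && line ~ /^content-disposition: inline/ { next }
        armed && line ~ /^content-transfer-encoding: base64/ { found = 1; exit }
        { armed = 0 }    # any other line breaks the sequence
        END { exit !found }
    '
}

# Made-up sample with the charset on a continuation line.
has_base64_text_part <<'EOF' && echo "base64-encoded text part found"
Content-Type: text/html;
    charset="iso-8859-1"
Content-Transfer-Encoding: base64
EOF
```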

==================================================

4) Implementing quotas (revised: Mon Apr 25 10:18:28 MDT 2005)


NL="
"
LOG="====================${NL}"
QUOTA=3042880
DROPPRIVS=yes
LOG="QUOTA:     $QUOTA${NL}"

## David W. Tamkin <3F1EA16E.7040102@panix.com>
## Recommendation to use :0i from Leow Hock Seng on 25 Apr 2005
:0i
INBOXSIZE=| set -- `ls -l $DEFAULT`; echo $5

LOG="INBOXSIZE: $INBOXSIZE${NL}"
:0
* $ -${INBOXSIZE}^0
* $ ${QUOTA}^0
{
    MAXMSG = $=
    LOG="MAXMSG:    $MAXMSG${NL}"
    :0
    * $ > ${MAXMSG}
    {
        LOG="Bouncing (message too big!)${NL}"
        EXITCODE=69
        HOST
    }
}

:0E
{
    LOG="Bouncing (inbox already full!)${NL}"
    EXITCODE=69
    HOST
}

=========================
Discussion (added Tue Apr  1 12:16:39 MST 2003)

Quotas are best handled at the filesystem level or at least by the
MTA. However, occasionally neither of these is an option on your
system. This recipe set is a crude (yet effective) way to enforce mail
quotas on systems that do not have them.

We begin with setting some commonly used variables for logging (e.g.,
NL). The QUOTA variable contains the maximum quota that this mailbox
should accept. INBOXSIZE contains the current size of this mailbox
(this could possibly be done more efficiently).

When we receive a message, we start a scoring recipe by subtracting
$INBOXSIZE from the score. Then we add $QUOTA to it. $MAXMSG is
assigned the resulting score. For an inbox that is not over quota,
this number will be the difference $QUOTA-$INBOXSIZE (some positive
number): the number of bytes still available.

Next we compare the current message size to $MAXMSG with:

    * $ > ${MAXMSG}

If this succeeds (i.e., the message is bigger than $MAXMSG), this
means that we cannot accept the message because it will put us over
quota, so we bounce it back with:

        EXITCODE=69
        HOST

Your MTA might define different EXITCODEs (under sendmail, 69 is
EX_UNAVAILABLE). If, on the other hand, the mailbox is already over
quota, the scoring recipe does not match and we fall through to the
last bouncing recipe:

    :0E
    {
        LOG="Bouncing (inbox already full!)${NL}"
        EXITCODE=69
        HOST
    }
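
The three possible outcomes of the quota recipes can be traced with
plain shell arithmetic. This sketch only mirrors the decision logic;
the helper name quota_verdict and the sample sizes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report what the quota recipes would decide.
#   $1 = quota in bytes, $2 = current inbox size, $3 = new message size
quota_verdict() {
    quota=$1; inbox=$2; msg=$3
    maxmsg=$((quota - inbox))      # room left, like MAXMSG in the recipe
    if [ "$maxmsg" -le 0 ]; then
        echo "bounce: inbox already full"
    elif [ "$msg" -gt "$maxmsg" ]; then
        echo "bounce: message too big"
    else
        echo "accept"
    fi
}

quota_verdict 3042880 3000000 10000    # fits in the remaining 42880 bytes
quota_verdict 3042880 3000000 50000    # larger than the remaining room
quota_verdict 3042880 3100000 10       # inbox is already over quota
```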


==================================================

5) Decoding message body before filtering

==================================================
:0ic:
DBODY=|perl -p0777 -e 's{=\n}{}g'

:0
* DBODY ?? stringtomatch
matched_file

==================================================
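Discussion

The first recipe pipes the message through perl and stores the output
in the DBODY variable. The -0777 switch makes perl read the whole
input as a single string, so the substitution can delete every '='
that ends a line together with its newline. These are the "soft line
breaks" that quoted-printable encoding inserts into long lines, and
they can split a word you are trying to match. Note that this removes
only the soft line breaks; it does not decode '=XX' escapes or base64.

The second recipe then tests its condition against DBODY instead of
the raw body ('stringtomatch' and 'matched_file' are placeholders).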

==================================================

6) Rejecting all HTML email, except for whitelisted senders

The format of the whitelist file is:

    joe@domain.tld
    otherdomain.tld

==================================================
WHITELIST=$HOME/whitelist

## simple HTML scanner
## Scott Wiersdorf
## Tue Aug 26 11:40:08 MDT 2003
##
## You can add more html tags as needed below.
##
## Note that this is EXTREMELY INEFFICIENT to scan the body for all
## incoming emails; the tag searching is also risky because some of
## these tags may be used legitimately in non-html messages. YMMV.
:0
* 9876543210^0 B ?? ^Content-Type: text/html
* 9876543210^0 B ?? ()<(html|body|head|h(1|2|3)|b|i|strong|em|font|a href|img).*>
{
    :0
    * ! $ ? formail -z -xFrom: -xSender: -xResent-From: | fgrep -iqf $WHITELIST
    {
        :0
        * ! ^FROM_DAEMON
        * ! ^X-HTML-Bounce: rejected
        {
            SUBJECT=`formail -zxSubject:`

            :0 h
            | (formail -r -I"Precedence: junk" -A"X-HTML-Bounce: rejected" \
              -I"Subject: HTML_REJECT (was $SUBJECT)"; \
              echo "Your mail was rejected because it contains HTML."; \
              echo "Please resend your message in plain text.") | $SENDMAIL -t
        }
    }
}

## trash bounces from our autoreply
:0
* ^FROM_DAEMON
* B ?? X-HTML-Bounce: rejected
/dev/null

=========================
Discussion

This set of recipes enforces a plain-text-only policy for the server
or account it is added to: HTML mail from anyone not on the whitelist
is answered with a rejection notice. Don't count on this sort of thing
winning you any friends. Many email clients by default will send HTML
as well as plaintext in the same message (via a multipart/alternative
MIME message), allowing the MIME-aware email client to choose its
preferred message format.
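
The whitelist lookup is a fixed-string, case-insensitive substring
match, which is worth trying by hand before trusting it. The sketch
below uses "grep -F", which behaves like the fgrep in the recipe; the
helper name is_whitelisted, the temporary file, and the sample
senders are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when any line of whitelist file $1
# occurs, ignoring case, as a substring of the sender text in $2.
is_whitelisted() {
    printf '%s\n' "$2" | grep -F -iqf "$1"
}

# Throwaway whitelist in the format described above.
whitelist=$(mktemp)
printf '%s\n' 'joe@domain.tld' 'otherdomain.tld' > "$whitelist"

for sender in 'Joe <joe@domain.tld>' 'ann@otherdomain.tld' 'spammer@example.com'; do
    if is_whitelisted "$whitelist" "$sender"; then
        echo "$sender: whitelisted, HTML accepted"
    else
        echo "$sender: not whitelisted, HTML would be rejected"
    fi
done
rm -f "$whitelist"
```

Because this is a substring match, an entry such as "domain.tld" also
matches "notmydomain.tld", and an empty line in the whitelist matches
every sender.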

==================================================

7) Email-to-shell gateway (or running shell commands from an email
message)

Properly authenticated, pipe the body of a message (stripping
signature) through sh then send a reply back to the sender with the
original commands with their results.

=========================

MY_XLOOP="$LOGNAME@$HOST"
FORMAIL=/usr/local/bin/formail
SHELL=/bin/sh

:0
*   H ?? ^From: .*joe@schmoe\.org
*   H ?? ^Subject: shell command$
*   H ?? ^X-Command: sekrit_passw0rd
* $ ! H ?? ^X-Loop: $LOGNAME@$HOST
{
  ## store the body in COMMAND, stripping any signature
  :0 b
  COMMAND=| sed -e '/^-- /,$ d'

  ## reply to sender with original command and results
  :0
  | ($FORMAIL -rt                                \
     -I "X-Loop: $MY_XLOOP"                      \
     -I "Precedence: junk";                      \
     echo "Your command:"; echo "$COMMAND";      \
     echo "Results:"; echo "$COMMAND" | sh 2>&1; \
     ) | $SENDMAIL -oi -t
}

=========================
Discussion

While this should never be used under normal circumstances, it may be
useful in certain times and seasons. Being able to remotely administer
your server from a text-messaging cell phone or remote email account
can be a handy thing.

The recipe first makes sure the message is from you, though you must
remember that this can easily be forged. It is simply one extra step
that an attacker must guess before gaining access. We next check that
the subject line contains only the words "shell command". We check for
a custom header containing "sekrit_passw0rd" as an additional layer
of authentication. Finally, we abort the message if this is an
autoreply from ourselves.

In the action block, we pipe the body of the message through sed and
strip out any signature lines that may be there. We then send an
autoreply back to the sender containing the original command and the
command results; we pipe the command through the Bourne shell to get
our work done. This means you can write any legitimate Bourne shell
script in your email message.

Here is a sample message we might send to our email-to-shell gateway:

    pwd
    cd ~/calendar
    cat calendar-200311.txt
    test -d backups || mkdir backups
    mv calendar-200311.txt backups

We can do any series of commands as if we were at a shell prompt.
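
The signature-stripping step is easy to check on its own. In this
sketch the sed command is the one from the recipe; the wrapper name
strip_signature and the sample body are made up, and nothing in the
sample is executed.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed command from the recipe: delete
# everything from the first signature separator ("-- ") to the end.
strip_signature() {
    sed -e '/^-- /,$ d'
}

# Made-up message body; the commands are only printed, never run.
printf '%s\n' 'pwd' 'ls' '-- ' 'Joe Schmoe' 'joe@schmoe.org' |
    strip_signature
```

Only lines that begin with dash, dash, space count as the separator,
so a mail client that strips the trailing space would leave its
signature in the command stream.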

==================================================

8) Detect long subject lines

We want to trap emails with long subject lines.

=========================

MAX_SUBJECT_CHARS=25

## grab subject
:0
* ^Subject:\/.*
{
    SUBJECT = $MATCH

    ## strip off leading spaces, if any
    :0
    * SUBJECT ?? [      ]*\/[^  ].*
    { SUBJECT = $MATCH }

    ## count chars in subject
    :0
    * $ -${MAX_SUBJECT_CHARS}^0
    *                       1^1 SUBJECT ?? .
    {
        ## count total number of chars
        CHARS = $=
        :0
        * $             ${CHARS}^0
        * $ ${MAX_SUBJECT_CHARS}^0
        { LOG="Reason: Subject contains $= characters${NL}" }

        ## put this message somewhere special
        :0:
        $PMDIR/subject_too_big
    }
}

=========================
Discussion

This set of recipes uses MAX_SUBJECT_CHARS to determine how long is
too long for an email subject line.

We first grab all the characters after the word 'Subject:' (including
any spaces between Subject: and the actual subject line):

    ## grab subject
    * ^Subject:\/.*

The subject is assigned, then, to the SUBJECT procmail variable.

Next we strip off any leading spaces and put the results back into the
SUBJECT variable:

    ## strip off leading spaces, if any
    :0
    * SUBJECT ?? [      ]*\/[^  ].*
    { SUBJECT = $MATCH }

And finally we create a scoring recipe that starts the score at the
negative of our maximum allowed number of characters in a subject
(MAX_SUBJECT_CHARS), then adds one point for each character in the
SUBJECT variable:

    ## count chars in subject
    :0
    * $ -${MAX_SUBJECT_CHARS}^0
    *                       1^1 SUBJECT ?? .
    {

If the result is positive (>0), we move into the nested block and run
two more recipes. The first recipe simply logs the length of the
subject line in the log file currently assigned to LOGFILE (this
recipe is therefore optional):

        ## count total number of chars
        CHARS = $=
        :0
        * $             ${CHARS}^0
        * $ ${MAX_SUBJECT_CHARS}^0
        { LOG="Reason: Subject contains $= characters${NL}" }

The second recipe actually delivers the mail with long subject lines
to a special file (PMDIR must already be set to a directory of your
choosing), and we close our nested action block:

        ## put this message somewhere special
        :0:
        $PMDIR/subject_too_big
    }
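
To pick a sensible MAX_SUBJECT_CHARS, it can help to measure a few
real subjects first. This is a small sketch of the same counting done
in the shell; the helper name subject_length and the sample subject
are hypothetical, and plain ASCII subjects are assumed.

```shell
#!/bin/sh
# Hypothetical helper: print the length of a subject after removing
# leading whitespace, as the second recipe does.
subject_length() {
    subject=$(printf '%s' "$1" | sed -e 's/^[[:space:]]*//')
    echo "${#subject}"
}

MAX_SUBJECT_CHARS=25
# Made-up subject with leading spaces.
subject='  Re: an unusually long subject line for a short message'
len=$(subject_length "$subject")
# Same test as the scoring recipe: length - MAX_SUBJECT_CHARS > 0.
if [ $((len - MAX_SUBJECT_CHARS)) -gt 0 ]; then
    echo "subject has $len characters: too long"
else
    echo "subject has $len characters: fine"
fi
```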


==================================================