Some Useful Procmail Recipes

Scott Wiersdorf
Created: Thu Feb 20 13:50:56 MST 2003
Updated: $Date: 2005/04/25 16:11:49 $

****************************************************
Danger! Achtung! Aviso!

If you are unsure about changes you make to your recipes, always put
in a safety net at the top of your (first) recipe file:

    :0c:
    $HOME/var/mail/backup

This will put copies of all mail in ~/var/mail/backup. You may wish to
read the procmailex(5) manpage, which gives an example of how to keep
only the last 32 (or whatever number you specify) messages.

Careless procmail rules with no safety net (e.g., delivering to
/dev/null has no safety net) may result in lost email messages! I am
not responsible for lost or damaged mail. If you doubt the safety of
these procmail recipes, do not use them.
****************************************************
Conventions

* SPAMMY

  I use $SPAMMY as a location where I would put spam. Perhaps
  $HOME/spam would work for you, or /dev/null if you're feeling
  audacious.

* WS

  This denotes whitespace:

    WS=" 	"

  This is a space followed by a tab.

* NL

  This denotes a newline:

    NL="
"

****************************************************
1) Retrieving Spam Assassin's "hits" score
==================================================

    ## define this as your highest Spam Assassin score you will tolerate
    MAXSPAM=9.71

    :0
    * ^X-Spam-Status: Yes
    * ^X-Spam-Report:[ ]+\/[0-9\.]+
    {
      :0:
      * $ -${MAXSPAM}^0
      * $  ${MATCH}^0
      $SPAMMY
    }

    ## for Spam Assassin 2.55, this works (some changes to the syntax):
    MAXSPAM=9.71

    :0
    * ^X-Spam-Flag: Yes
    * ^X-Spam-Report:[ ]*.*results($|.)*[ ]*\/[0-9\.]+
    {
      :0:
      * $ -${MAXSPAM}^0
      * $  ${MATCH}^0
      $SPAMMY
    }

    ## for Spam Assassin 2.6x:
    MAXSPAM=7.00

    :0
    * ^X-Spam-Flag: YES
    * ^X-Spam-Status:.*hits=\/[0-9\.]+
    {
      :0:
      * $ -${MAXSPAM}^0
      * $  ${MATCH}^0
      $SPAMMY
    }

=========================
Discussion

This recipe should be placed after you have filtered spam with Spam
Assassin (e.g., |/usr/local/bin/spamassassin). This recipe looks for
messages that are flagged as spam (X-Spam-Status: Yes).
It then "grabs" the spam score from the X-Spam-Report line and saves it
in the MATCH procmail variable. A dummy recipe is then triggered and
the message is given negative MAXSPAM as a score. Procmail then adds
the message's real Spam Assassin score (which was saved in the MATCH
variable). If the Spam Assassin score is greater than MAXSPAM, the
total is positive and the mail message is delivered to $SPAMMY;
otherwise it flows through to any recipes after or is delivered as
usual.

=========================
Variations (added Mon Mar 17 15:19:08 MST 2003)

How to send a spam with a score above 20 to /dev/null and a spam with
a score between 4.5 and 20 to a "likely spam" folder:

    TWENTY=20.00
    FOUR_FIVE=4.50

    :0
    * ^X-Spam-Report:[ ]+\/[0-9\.]+
    {
      ## if spam is > 20.00, trash it
      :0
      * $ -${TWENTY}^0
      * $  ${MATCH}^0
      /dev/null

      ## else if spam is > 4.50, send it to spam folder
      :0E:
      * $ -${FOUR_FIVE}^0
      * $  ${MATCH}^0
      $SPAMMY
    }

==================================================
2) Counting the number of Cc'd users in a mail message

From: "Dallman Ross"
Date: Wed, 28 Aug 2002 06:02:43 -0400 (EDT)
Message-id: <200208281002.g7SA2hD20739@panix5.panix.com>

    TO=`formail -xTo:`
    CC=`formail -xCc:`
    MAXAT=4

    ## count the number of @ in To:
    :0
    * 1^1 TO ?? @
    { ATCOUNT = $= }

    ## count the number of @ in Cc: & add to To:
    :0
    * $ ${ATCOUNT}^0
    * 1^1 CC ?? @
    { ATCOUNT = $= }

    ## allow 4 @ in both To: and Cc:, otherwise spam
    :0:
    * $ ${ATCOUNT}^0
    * $ -${MAXAT}^0
    $SPAMMY

and

From: "Dallman Ross"
Date: Thu, 29 Aug 2002 14:20:47 +0200
Message-id:

    :0 # find and save value of Cc:(s), if such exist(s):
    * $ ^Cc:.*\/[^$WS].*(^Cc:.*)*
    { CC = $MATCH }

    :0 E # else check for empty Cc:
    * $ ^Cc:[$WS]*$
    { CC = [empty] }

=========================
Discussion (added Mon Mar 17 17:14:32 MST 2003)

This recipe counts the number of email addresses in the To: and Cc:
lines of an email message. We first set a variable 'MAXAT' to 4.
We've decided arbitrarily that any message with more than 4 addresses
in the To: and Cc: lines combined is likely a spam (this is really not
a good criterion by itself, but in conjunction with other recipes
might help yield a good profile).

Next we read the message 'To:', looking for '@' signs. Each '@' we
find in the 'To:' field adds one point to the score, which we save in
'ATCOUNT'. Note the handy use of '$='. This is a built-in procmail
variable (documented in procmailrc(5)) that contains the score of the
last recipe.

The second recipe counts the '@' signs in the 'Cc:' field and adds
them to our previous count from the 'To:' field. The first line of the
recipe:

    * $ ${ATCOUNT}^0

is a little tricky for the uninitiated. The '*' starts a procmail
condition (as usual). The first '$' tells procmail to "evaluate the
remainder of this condition according to sh(1) substitution rules
inside double quotes, skip leading whitespace, then reparse it." (from
procmailrc(5)).

That means that '${ATCOUNT}^0' will be "interpolated". That is, any
procmail variables found inside the expression (after the '$') will be
replaced with their current value, then passed back to procmail for
comparison against the message.

In our context, that means that the string '${ATCOUNT}' in this
condition line will be replaced by the score of the previous recipe
(which was probably 1, if the message was just addressed to us). Now
we have this (after sh(1)-style interpolation):

    * 1^0

which simply means "give this message one point" (or more precisely,
"give this message the score that it finished with in the previous
recipe"). It's how we pass along the previous recipe's score to the
current recipe.

The next line:

    * 1^1 CC ?? @

counts the '@' signs in the 'Cc:' field and increments our score
accordingly (one point for each '@' found).

The final line:

    { ATCOUNT = $= }

assigns the score of this recipe to the $ATCOUNT variable again (as we
did in the first recipe).
This time, however, we added the score of the previous recipe in at
the start of this second recipe so that $ATCOUNT now has the
cumulative score for both the first and second recipes.

The final recipe works a lot like the second recipe. We assign right
off the bat the previous score (which is the cumulative score) to this
recipe:

    * $ ${ATCOUNT}^0

Next we subtract our maximum allowed number of '@' signs from the
score:

    * $ -${MAXAT}^0

According to procmailsc(5), any time a recipe finishes with a positive
score, the recipe is considered a "match" and the action line will be
executed. In this case, we deliver the message to $SPAMMY if $ATCOUNT
is more than $MAXAT. Q.E.D.

==================================================
3) Detecting a message with something to hide via Base64 encoding

    :0:
    ## the following recipe condition is all one line
    * 1^0 B ?? Content-Type: text/(html|plain)(;[ ]*$?[ ]*charset="iso-8859-1")?($Content-Disposition: inline)?$Content-Transfer-Encoding: base64
    $SPAMMY

=========================
Discussion (added Mon Mar 17 17:14:54 MST 2003)

We're looking for variations of:

    Content-Type: text/html; charset="iso-8859-1"
    Content-Transfer-Encoding: base64

including this (which has the charset on a separate line):

    Content-Type: text/html;
            charset="iso-8859-1"
    Content-Transfer-Encoding: base64

This is heavy-duty regular expression work. The "English" explanation
of the procmail condition follows:

    *

A new procmail condition.

    1^0

If the message matches the following criterion, add one point to the
total score for this message.

    B ??

Match the following pattern against the Body of the message (not the
headers).

This takes us up to the beginning of the regular expression. Following
now is an "English" explanation of the regular expression:

    Content-Type: text/(html|plain)

Match the literal string "Content-Type: text/" followed by either
"html" or "plain".
This will match either of the following two lines:

    Content-Type: text/html
    Content-Type: text/plain

and it could likewise be written:

    (Content-Type: text/html|Content-Type: text/plain)

which is longer and somewhat more confusing, especially if you have
more regex (as we do).

Remembering our original regular expression:

    * 1^0 B ?? Content-Type: text/(html|plain)(;[ ]*$?[ ]*charset="iso-8859-1")?($Content-Disposition: inline)?$Content-Transfer-Encoding: base64

we move on to the next part:

    (;[ ]*$?[ ]*charset="iso-8859-1")?

Match a semi-colon followed by zero or more spaces or tabs (between [
and ] is a space character and a tab character). This is followed by
an optional newline (the '$' is a newline in procmail regular
expressions). This is followed by zero or more spaces or tabs,
followed by the literal string 'charset="iso-8859-1"'.

The trailing '?' applies to the entire expression between parentheses,
meaning this entire part is optional. If it shows up in the email
message, great; if not, no problem because we have more regular
expression following.

    ($Content-Disposition: inline)?

Match a newline, followed by the literal string "Content-Disposition:
inline". This entire expression is also optional (the trailing '?'
applies again to everything between the parentheses).

Finally, we have:

    $Content-Transfer-Encoding: base64

Match a newline, followed by the literal string
"Content-Transfer-Encoding: base64".

Thus the original recipe:

    :0:
    * 1^0 B ?? Content-Type: text/(html|plain)(;[ ]*$?[ ]*charset="iso-8859-1")?($Content-Disposition: inline)?$Content-Transfer-Encoding: base64
    $SPAMMY

finds messages that encode their otherwise readable body with base64
encoding: a common technique among spammers.

==================================================
4) Implementing quotas (revised: Mon Apr 25 10:18:28 MDT 2005)

    NL="
"
    LOG="====================${NL}"
    QUOTA=3042880
    DROPPRIVS=yes
    LOG="QUOTA: $QUOTA${NL}"

    ## David W. Tamkin <3F1EA16E.7040102@panix.com>
    ## Recommendation to use :0i from Leow Hock Seng on 25 Apr 2005
    :0i
    INBOXSIZE=| set -- `ls -l $DEFAULT`; echo $5
    LOG="INBOXSIZE: $INBOXSIZE${NL}"

    :0
    * $ -${INBOXSIZE}^0
    * $  ${QUOTA}^0
    {
      MAXMSG = $=
      LOG="MAXMSG: $MAXMSG${NL}"

      :0
      * $ > ${MAXMSG}
      {
        LOG="Bouncing (message too big!)${NL}"
        EXITCODE=69
        HOST
      }
    }

    :0E
    {
      LOG="Bouncing (inbox already full!)${NL}"
      EXITCODE=69
      HOST
    }

=========================
Discussion (added Tue Apr 1 12:16:39 MST 2003)

Quotas are best handled at the filesystem level or at least the MTA.
However, occasionally neither of these is an option for your system.
This recipe set implements a crude (yet effective) form of mail quotas
on systems that do not have them.

We begin with setting some commonly used variables for logging (e.g.,
NL). The QUOTA variable contains the maximum size in bytes that this
mailbox should be allowed to reach. INBOXSIZE contains the current
size of this mailbox (this could possibly be done more efficiently).

When we receive a message, we subtract $INBOXSIZE from its score. Then
we add $QUOTA to it. $MAXMSG contains the current score. For an inbox
that is not over quota, this number will be the difference of
$QUOTA-$INBOXSIZE (some positive number).

Next we compare the current message size to $MAXMSG with:

    * $ > ${MAXMSG}

If this succeeds (i.e., the message is bigger than $MAXMSG), this
means that we cannot accept the message because it will put us over
quota, so we bounce it back with:

    EXITCODE=69
    HOST

Your MTA (mine is sendmail) might define different EXITCODEs.

If, on the other hand, the mailbox is already over quota, the scoring
recipe does not match and we fall through to the last bouncing recipe:

    :0E
    {
      LOG="Bouncing (inbox already full!)${NL}"
      EXITCODE=69
      HOST
    }

==================================================
5) Decoding message body before filtering
==================================================

    :0ic:
    DBODY=|perl -p0777 -e 's{=\n}{}g'

    :0
    * DBODY ?? stringtomatch
    matched_file

==================================================
FIXME: need discussion here

==================================================
6) Rejecting all HTML email, except for whitelisted senders

The format of the whitelist file is:

    joe@domain.tld
    otherdomain.tld

==================================================

    WHITELIST=$HOME/whitelist

    ## simple HTML scanner
    ## Scott Wiersdorf
    ## Tue Aug 26 11:40:08 MDT 2003
    ##
    ## You can add more html tags as needed below.
    ##
    ## Note that this is EXTREMELY INEFFICIENT to scan the body for all
    ## incoming emails; the tag searching is also risky because some of
    ## these tags may be used legitimately in non-html messages. YMMV.
    :0
    * 9876543210^0 B ?? ^Content-Type: text/html
    * 9876543210^0 B ?? ()<(html|body|head|h(1|2|3)|b|i|strong|em|font|a href|img).*>
    {
      :0
      * ! $ ? formail -z -xFrom: -xSender: -xResent-From: | fgrep -iqf $WHITELIST
      {
        :0
        * ! ^FROM_DAEMON
        * ! ^X-HTML-Bounce: rejected
        {
          SUBJECT=`formail -zxSubject:`

          :0 h
          | (formail -r -I"Precedence: junk" -A"X-HTML-Bounce: rejected" \
               -I"Subject: HTML_REJECT (was $SUBJECT)"; \
             echo "Your mail was rejected because it contains HTML."; \
             echo "Please resend your message in plain text.") | $SENDMAIL -t
        }
      }
    }

    ## trash bounces from our autoreply
    :0
    * ^FROM_DAEMON
    * B ?? X-HTML-Bounce: rejected
    /dev/null

=========================
Discussion

This set of recipes enforces a plain-text-only policy for the server
or account it is added to. Don't count on this sort of thing winning
you any friends. Many email clients by default will send HTML as well
as plaintext in the same message (via a multipart/alternative MIME
message), allowing the MIME-aware email client to choose its preferred
message format. Because the first condition looks for a text/html
Content-Type anywhere in the body, such messages are rejected too,
even though they carry a plain-text alternative.
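
The whitelist test in the recipe above comes down to a case-insensitive,
fixed-string search of the sender headers against the whitelist file.
Here is a minimal shell sketch of that check that can be tried outside
procmail. The helper name is_whitelisted and the sample addresses are
made up for illustration, and grep -F stands in for the fgrep spelling
used in the recipe. Keep in mind that this is substring matching, so
"otherdomain.tld" would also accept "someone@nototherdomain.tld", and a
blank line in the whitelist file matches every sender.

```shell
#!/bin/sh
# Sketch only: mimics "formail ... | fgrep -iqf $WHITELIST" outside procmail.
# is_whitelisted is a hypothetical helper, not part of procmail or formail.

# usage: is_whitelisted SENDER WHITELIST_FILE
# Succeeds (exit 0) if any whitelist line occurs in SENDER, ignoring case.
is_whitelisted() {
    printf '%s\n' "$1" | grep -iqF -f "$2"
}

# Sample whitelist in the format shown above: one address or domain per line.
WHITELIST=$(mktemp) || exit 1
printf '%s\n' 'joe@domain.tld' 'otherdomain.tld' > "$WHITELIST"

for sender in 'Joe <JOE@domain.tld>' 'ann@otherdomain.tld' 'bob@elsewhere.tld'; do
    if is_whitelisted "$sender" "$WHITELIST"; then
        echo "accept: $sender"
    else
        echo "reject: $sender"
    fi
done

rm -f "$WHITELIST"
```

With the sample file, the first two senders are accepted and the third
is rejected. Running a check like this by hand is a cheap way to test a
whitelist before letting the recipe bounce real mail.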
==================================================
7) Email-to-shell gateway (or running shell commands from an email message)

Properly authenticated, pipe the body of a message (stripping
signature) through sh, then send a reply back to the sender with the
original commands and their results.

=========================

    MY_XLOOP="$LOGNAME@$HOST"
    FORMAIL=/usr/local/bin/formail
    SHELL=/bin/sh

    :0
    * H ?? ^From: .*joe@schmoe\.org
    * H ?? ^Subject: shell command$
    * H ?? ^X-Command: sekrit_passw0rd
    * $ ! H ?? ^X-Loop: $MY_XLOOP
    {
      ## store the body in COMMAND, stripping any signature
      :0 b
      COMMAND=| sed -e '/^-- /,$ d'

      ## reply to sender with original command and results
      :0
      | ($FORMAIL -rt \
           -A "X-Loop: $MY_XLOOP" \
           -I "Precedence: junk"; \
         echo "Your command:"; echo "$COMMAND"; \
         echo "Results:"; echo "$COMMAND" | sh 2>&1; \
        ) | $SENDMAIL -oi -t
    }

=========================
Discussion

While this should never be used under normal circumstances, it may be
useful in certain times and seasons. Being able to remotely administer
your server from a text-messaging cell phone or remote email account
can be a handy thing.

The recipe first makes sure the message is from you, though you must
remember that this can easily be forged. It is simply one more thing
that an attacker must guess before gaining access.

We next check that the subject line contains only the words "shell
command".

We check for a custom header containing "sekrit_passw0rd" as an
additional layer of authentication.

Finally, we skip the message if it is an autoreply from ourselves
(that is, if it already carries our X-Loop header).

In the action block, we pipe the body of the message through sed and
strip out any signature lines that may be there.

We then send an autoreply back to the sender containing the original
command and the command results; we pipe the command through the
Bourne shell to get our work done. This means you can write any
legitimate Bourne shell script in your email message.
Here is a sample message we might send to our email-to-shell gateway:

    pwd
    cd ~/calendar
    cat calendar-200311.txt
    test -d backups || mkdir backups
    mv calendar-200311.txt backups

We can do any series of commands as if we were at a shell prompt.

==================================================
8) Detect long subject lines

We want to trap emails with long subject lines.

=========================

    MAX_SUBJECT_CHARS=25

    ## grab subject
    :0
    * ^Subject:\/.*
    {
      SUBJECT = $MATCH

      ## strip off leading spaces, if any
      :0
      * SUBJECT ?? [ ]*\/[^ ].*
      { SUBJECT = $MATCH }

      ## count chars in subject
      :0
      * $ -${MAX_SUBJECT_CHARS}^0
      * 1^1 SUBJECT ?? .
      {
        ## count total number of chars
        CHARS = $=

        :0
        * $ ${CHARS}^0
        * $ ${MAX_SUBJECT_CHARS}^0
        { LOG="Reason: Subject contains $= characters${NL}" }

        ## put this message somewhere special
        :0:
        $PMDIR/subject_too_big
      }
    }

=========================
Discussion

This set of recipes uses MAX_SUBJECT_CHARS to determine how long is
too long for an email subject line.

We first grab all the characters after the word 'Subject:' (including
any spaces between Subject: and the actual subject line):

    ## grab subject
    * ^Subject:\/.*

The subject is assigned, then, to the SUBJECT procmail variable.

Next we strip off any leading spaces and put the results back into the
SUBJECT variable:

    ## strip off leading spaces, if any
    :0
    * SUBJECT ?? [ ]*\/[^ ].*
    { SUBJECT = $MATCH }

And finally we create a scoring recipe and assign the score to the
negative of our maximum allowed number of characters in a subject
(MAX_SUBJECT_CHARS), then add the total number of characters in the
SUBJECT variable to the recipe score:

    ## count chars in subject
    :0
    * $ -${MAX_SUBJECT_CHARS}^0
    * 1^1 SUBJECT ?? .
    {

If the result is positive (>0), we move into the nested block and run
two more recipes.
The first recipe simply logs the length of the subject line in the log
file currently assigned to LOGFILE (this recipe is therefore
optional). CHARS holds the number of characters over the limit, so the
recipe adds MAX_SUBJECT_CHARS back to it and $= ends up holding the
full length of the subject:

    ## count total number of chars
    CHARS = $=

    :0
    * $ ${CHARS}^0
    * $ ${MAX_SUBJECT_CHARS}^0
    { LOG="Reason: Subject contains $= characters${NL}" }

The second recipe actually delivers the mail with long subject lines
to a special file, and we close our nested action block:

      ## put this message somewhere special
      :0:
      $PMDIR/subject_too_big
    }

==================================================
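
The scoring trick in recipe 8 is ordinary arithmetic: start at minus the
limit, add one point per character, and treat a positive total as "too
long". The following shell sketch works through the same numbers outside
procmail, which may help when choosing a value for MAX_SUBJECT_CHARS.
The helper name subject_score and the sample subjects are invented for
illustration, and the sketch assumes plain ASCII subjects, since shells
and procmail may count multi-byte characters differently.

```shell
#!/bin/sh
# Sketch only: the arithmetic behind the scoring recipe, outside procmail.
# subject_score is a hypothetical helper name used for illustration.

# usage: subject_score SUBJECT MAX
# Prints (length of SUBJECT) - MAX, i.e. -MAX plus one point per character.
subject_score() {
    echo $(( ${#1} - $2 ))
}

MAX_SUBJECT_CHARS=25

for subject in 'Lunch?' 'You have been specially selected for a prize'; do
    score=$(subject_score "$subject" "$MAX_SUBJECT_CHARS")
    if [ "$score" -gt 0 ]; then
        # Adding the maximum back recovers the full length, as the
        # logging recipe does with CHARS and MAX_SUBJECT_CHARS.
        echo "too long ($(( score + MAX_SUBJECT_CHARS )) chars): $subject"
    else
        echo "ok: $subject"
    fi
done
```

A subject of exactly MAX_SUBJECT_CHARS characters scores zero and is
therefore not trapped; only a strictly positive score counts as a match.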