QP/Base64 decoding of RFC 2822 header fields
As I mentioned previously, I have a disdain for UCE (unsolicited commercial email). On my home Exchange server I'm constantly trying to come up with new ways to block this junk email. For quite some time I've relied on RBLs, which are now very easy to use with Exchange 2003. The only problem with RBLs is that worms and viruses create new open proxies so quickly that spammers usually get a couple of hours to exploit a new slave machine before it is blacklisted. Being an insider lets me test-drive new technologies like IMF.
One cool aspect of IMF is that it can be set up to archive messages that exceed a certain level of spamminess. Of course, you are then left with a bunch of messages in a directory that you have to browse through.
Being the geek that I am, I decided to write my own browser to parse the messages so I could decide to resubmit them, delete them, or report them to SpamCop. Everything was going great until I found that quite a few spammers have started either QP or Base64 encoding the headers per RFC 1342 (since superseded by RFC 2047). I quickly figured out how to solve the Base64 issue with C#, but there didn't seem to be any support for QP decoding. Searching online only turned up other people looking for the same solution. So after going through the spec I ended up rolling my own solution as follows:
//using System;
//using System.Collections;
//using System.Globalization;
//using System.Text;
//using System.Text.RegularExpressions;
private static string DecodeField(string field)
{
    string result = field;
    int isoStart = field.IndexOf("=?");
    // Length of the "charset?encoding?text" portion between "=?" and "?="
    int isoEnd = field.IndexOf("?=") - isoStart - 2;
    if (isoStart > -1 && isoEnd > -1)
    {
        string text = field.Substring(isoStart + 2, isoEnd);
        string[] parts = text.Split('?');
        // Only continue parsing if there are 3 parts
        // charset?encoding?text
        if (parts.Length == 3)
        {
            Encoding coder = Encoding.GetEncoding(parts[0]);
            byte[] bSubject;
            if (parts[1].ToLower().Equals("b"))
                bSubject = Convert.FromBase64String(parts[2]);
            else
                // In header Q encoding, "_" stands for a space (RFC 2047, section 4.2)
                bSubject = QPGetString(parts[2].Replace('_', ' '));
            result = coder.GetString(bSubject);
            // Re-attach any text after the closing "?="
            if (isoStart + isoEnd + 4 < field.Length)
                result += field.Substring(isoStart + isoEnd + 4);
            // Re-attach any text before the opening "=?"
            if (isoStart > 0)
                result = field.Substring(0, isoStart) + result;
        }
        else
            throw new ArgumentException(String.Format("Invalid number of parameters! Expected 3, found {0}. Correct format is =?charset?encoding?text?=", parts.Length));
    }
    else if (isoStart > -1 || isoEnd > -1)
        throw new ArgumentException("Invalid encoding detected! Correct format is =?charset?encoding?text?=");
    return result;
}
private static byte[] QPGetString(string line)
{
    if (line == null)
        throw new ArgumentNullException("line");
    ArrayList byteArray = new ArrayList();
    // look for =XX where XX is hex
    Regex reg = new Regex("(\\=([0-9A-F][0-9A-F]))", RegexOptions.IgnoreCase);
    for (int i = 0; i < line.Length; i++)
    {
        if (line.Length >= i + 3 && line[i] == '=' && reg.IsMatch(line.Substring(i, 3)))
        {
            // =XX: store the byte with hex value XX
            byteArray.Add(Convert.ToByte(int.Parse(line.Substring(i + 1, 2), NumberStyles.HexNumber)));
            // skip two chars
            i += 2;
        }
        else
        {
            // any other character is copied through as a single byte
            byteArray.Add(Convert.ToByte(line[i]));
        }
    }
    return (byte[]) byteArray.ToArray(typeof(byte));
}
Updated with error checking
3 Comments:
Hi James,
I wanted to let you know, I too have a Base64 app available on my blog, Artificial Intelligence-One Man's Quest. It's in VBScript and is implemented as a Windows Script component. Simple to use for either encoding or decoding.
Charles
Thank you very much for the example,
but this would be more precise:
int isoEnd = field.LastIndexOf( "?=") - isoStart -2;
No, but it does bring up the issue of detecting errors. The format for encoding is:
=?charset?encoding?text?=
Ideally the function would throw if the string is not in this format. As it is, it will just return the junk back to you.
If you had a properly formatted string with multiple encodings, the function would work as is if you called it repeatedly until the entire string was decoded:
while (field.Contains("=?"))
{
    field = DecodeField(field);
}
I'll see about editing the original function for error catching.
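For what it's worth, here is a minimal sketch of another way to handle several encoded words in one field, assuming a newer C# than the original code targets. Instead of calling DecodeField in a loop, it uses a regular expression to find each =?charset?encoding?text?= run and replaces it in a single pass. The names EncodedWordSketch, DecodeAll, and DecodeQ are made up for illustration and are not part of the code in the post. Because the pattern only accepts "?=" after the third "?", Q text that begins with an escaped byte (for example ?Q?=41...) should not be mistaken for the closing delimiter. It also treats "_" as a space, which RFC 2047 specifies for Q encoding in headers. It does not drop the whitespace that RFC 2047 says to ignore between adjacent encoded words, and on .NET Core some charsets may need the code pages encoding provider registered before Encoding.GetEncoding will find them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class EncodedWordSketch
{
    // =?charset?encoding?text?= where the text contains no '?' or whitespace.
    private static readonly Regex EncodedWord = new Regex(
        @"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=");

    // Decodes every encoded word in a header value; other text is left untouched.
    public static string DecodeAll(string field)
    {
        if (field == null) throw new ArgumentNullException(nameof(field));
        return EncodedWord.Replace(field, m =>
        {
            Encoding charset = Encoding.GetEncoding(m.Groups[1].Value);
            bool isBase64 = m.Groups[2].Value.Equals("B", StringComparison.OrdinalIgnoreCase);
            byte[] bytes = isBase64
                ? Convert.FromBase64String(m.Groups[3].Value)
                : DecodeQ(m.Groups[3].Value);
            return charset.GetString(bytes);
        });
    }

    // Q decoding: "=XX" is a hex-escaped byte and "_" stands for a space.
    private static byte[] DecodeQ(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bool isEscape = c == '=' && i + 2 < text.Length
                && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]);
            if (isEscape)
            {
                bytes.Add(byte.Parse(text.Substring(i + 1, 2),
                    NumberStyles.HexNumber, CultureInfo.InvariantCulture));
                i += 2; // skip the two hex digits
            }
            else
            {
                // Encoded text is expected to be ASCII, so a plain cast is used here.
                bytes.Add(c == '_' ? (byte)' ' : (byte)c);
            }
        }
        return bytes.ToArray();
    }
}
```

Calling EncodedWordSketch.DecodeAll("Subject: =?iso-8859-1?Q?=A1Hola,_se=F1or!?=") should give back "Subject: ¡Hola, señor!", and a field with no encoded words comes back unchanged. Unlike DecodeField, this sketch never throws on malformed delimiters; it simply leaves anything that does not match the pattern alone, which may or may not be what you want.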