John, the biggest problem with that would be anyone trying to send a message in a language that uses a different character set, or even an English message that contains special punctuation or names from other languages. It would be secure, but overkill.
Mat, if you are the only recipient of the emails and you are using Gmail, then I honestly don't see any potential problems at all, even using raw data. Gmail protects you well enough, and if you're not using HTML anyway, it should be fine. Gmail probably strips malicious JS (most of it anyway) even if you ARE using HTML. They scan every attachment for viruses before allowing you to download it, so I'd expect they'd be just as careful with the text of the message.
One warning: don't click any links in the email unless you feel that you can trust the sender, or else open them on a secured computer (for example, don't use Windows [which is the target of most viruses], and use a new browser session so that your cookies won't hold any important information).
But realistically you're worrying way too much about this if you are not planning to automatically post the information publicly.
htmlentities() will be enough to stop XSS in any situation that is likely to be relevant here, where the text is displayed as ordinary page content. No complex regex is needed.
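As a rough illustration of that, here is a minimal sketch of escaping a message before printing it into a page. The $message value is made up for the example, and the ENT_QUOTES flag and 'UTF-8' charset are my assumptions about a sensible setup, not something you have to use.

```php
<?php
// Hypothetical message text, as it might arrive from a contact form.
$message = '<script>alert("hi")</script> Tom & Jerry';

// ENT_QUOTES also converts single quotes; 'UTF-8' is an assumed page encoding.
$safe = htmlentities($message, ENT_QUOTES, 'UTF-8');

// Prints: &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; Tom &amp; Jerry
echo $safe, "\n";
```

The browser then shows the script tag as visible text instead of running it.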
But again, XSS is just one type of malicious code that can be injected. It is the only* kind that is a problem related to HTML-- the rest comes up when you are using databases or doing something else-- you're not, so don't worry about it (for now).
(*Malicious HTML in general, not just XSS, but XSS is more or less the only dangerous kind-- the rest is just annoying, such as something that would change the content of your page, perhaps inserting an advertisement or breaking the code so the page doesn't display properly, but not something truly "dangerous" like XSS, which is a possible security threat. Regardless, ALL of these HTML problems will be solved by htmlentities()-- it entirely disables the operator characters that make HTML do anything: < > " and &. In fact, for text displayed between tags you only really need to disable < and > for security. There, " and & just create code-instability problems and errors, but can't in themselves generate XSS or other dangerous code. The one place " does matter for security is inside an attribute value, where an unescaped quote can end the attribute early.)
In summary, use htmlentities() if you want to be extra careful and account for all situations including those where you would actually be using this text WITHIN HTML. If you never use it within HTML (including email, unless you're specifically sending it as HTML) then you don't even need to do that.
Getting back to the big picture, ask yourself this question: what characters make HTML work? The answer is < and >, and nothing else. You cannot start a tag (and thereby bring in JS/CSS) without < and >, so for text displayed as page content those are the ONLY characters you need to disable. htmlentities() is one way to do that. There are other ways if you prefer.
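One of those other ways, sketched very roughly, is to replace just those two characters yourself. The function name disable_tags() is something I made up for this example, and the sketch assumes the text is only ever printed between tags, never inside an attribute value.

```php
<?php
// Hypothetical helper: replace only the two tag delimiters, leave everything else alone.
function disable_tags(string $text): string
{
    return str_replace(['<', '>'], ['&lt;', '&gt;'], $text);
}

// Prints: &lt;b onclick="x()"&gt;Tom & "Jerry"&lt;/b&gt;
echo disable_tags('<b onclick="x()">Tom & "Jerry"</b>'), "\n";
```

Note that " and & pass through untouched here, which is the difference from htmlentities().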
(The only exception to that would be a very unusual case where you have incorrectly mixed character encodings so that multibyte characters, such as Chinese characters [typically 2 to 4 bytes each, depending on the encoding], are read one byte at a time, in which case one byte of the multibyte character MIGHT be interpreted as < or >, but the circumstances for this occurring are so bizarre that you can basically ignore it. You would need to have mixed character encodings, a problem in itself, set up in exactly such a way that your check for < and > recognizes the full multibyte characters while the final output displays them separately and recognizes individual bytes as < and >. More importantly, for someone to actually exploit this, they would need to predict all of it-- almost impossible and certainly not worth the effort. But if you want to protect yourself specifically from this, ALWAYS use Unicode [UTF-8] for everything. That's a good idea anyway in almost all cases. To be clear, though, worrying about this is a waste of time, in my opinion.)
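If you did want to be strict about using UTF-8 for everything, one possible sketch is to refuse input that isn't valid UTF-8 and then escape it with the same charset named explicitly. The function name escape_utf8() is hypothetical, and returning null for bad input is just one choice I picked for the example.

```php
<?php
// Hypothetical helper: accept only valid UTF-8, then escape it as UTF-8.
function escape_utf8(string $text): ?string
{
    // With the /u modifier, preg_match() fails on input that is not valid UTF-8.
    if (preg_match('//u', $text) !== 1) {
        return null; // invalid byte sequence; the caller decides what to do
    }
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

var_dump(escape_utf8("\xC3\x28"));   // invalid UTF-8, so this is rejected
echo escape_utf8('中文 <b>'), "\n";   // the Chinese characters pass through unchanged
```

You would also want the page itself to be served as UTF-8, so that the check and the final output agree on the encoding.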

