Regex-matching the contents of an HTML tag

09/16/2008

I have a folder in my Mail.app client dedicated to code snippets that I've emailed to myself. It's usually stuff that I don't use a whole lot, or stuff that I did at the end of a day that I know I won't remember tomorrow. There's a whole load of mod_rewrite and shell stuff in there! A blog seems like a pretty sensible place to keep this sort of thing, so every time I get something I might need, I'll add it here as a pastebin-type thing. I could always use pastebin, but that's just one more site to keep track of!

So, the following matches the value of the href attribute in an a tag:

<a(?!href).*href=["']{1}([^"']*)["']{1}[^>]*>
Obviously, if you wanted to match the src attribute of an img tag, you'd change it up in the following way:
<img(?!src).*src=["']{1}([^"']*)["']{1}[^>]*>
The former matches the href value in the following tags:
<a href="awesome">
<a class="brilliant" href="awesome">
<a href="awesome" id="excellent">
<a class="brilliant" href="awesome" id="excellent">
<a href='awesome'>
<a href=''>
The captured value is awesome on each row, except for the last row, where it's an empty string.
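If you want to check this for yourself, here's a rough Python sketch that runs the pattern over the tags listed above. The attr_pattern helper and the sample img tag are my own inventions for illustration, not part of the original snippet, and it assumes the quantifiers shown in the pattern (the * after the ., inside the capture group, and before the closing >).

```python
import re

def attr_pattern(tag, attr):
    # Hypothetical helper: builds the pattern from this post for any
    # tag/attribute pair, e.g. ("a", "href") or ("img", "src").
    tag, attr = re.escape(tag), re.escape(attr)
    return re.compile(
        "<" + tag + "(?!" + attr + ").*" + attr
        + r"""=["']{1}([^"']*)["']{1}[^>]*>"""
    )

HREF = attr_pattern("a", "href")

# The sample tags from the post, one per entry.
SAMPLES = [
    '<a href="awesome">',
    '<a class="brilliant" href="awesome">',
    '<a href="awesome" id="excellent">',
    '<a class="brilliant" href="awesome" id="excellent">',
    "<a href='awesome'>",
    "<a href=''>",
]

for tag in SAMPLES:
    match = HREF.search(tag)
    # group(1) is the captured attribute value
    print(repr(match.group(1)), "<-", tag)

# Same idea for the src attribute of an img tag (made-up sample tag).
SRC = attr_pattern("img", "src")
print(repr(SRC.search('<img class="x" src="cat.png" alt="">').group(1)))
```

One thing to watch: the .* is greedy, so if a single line holds two a tags, the capture comes from the last href on that line rather than the first. The pattern also only looks for a tag name starting with a, so other tags that begin with that letter and carry an href would match too. For one tag at a time, like the rows above, it does the job.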

This revelation is inspired by the following post on statiksoft. I really needed this the other day!