So you wanna learn Regex? - Part 2

Welcome to So You Wanna Learn Regex? Part 2. In our last exercise, we looked at a simple way to add a new attribute to an HTML tag. This was accomplished by making a pattern, defining a group and using a backreference. This time we will look at a slightly more complicated use case.

Assume this set of declarations:

product.setColor(arguments.color);
product.setSize(arguments.size);
product.setCondition(arguments.condition);
product.setRating(arguments.rating);
product.setReliability(arguments.reliability);
product.setNeedsBatteries(arguments.needsBatteries);

What we want is to turn product.setColor(arguments.color); into product.setColor( htmlEditFormat(arguments.color) );

Normally, this would be a forearm/wrist fatiguing flail on the keyboard, furiously cutting/pasting and generally flapping about. Not so with Regular Expressions. A Regex is a pattern matcher, and it can also rewrite whatever it matches. We can see our code is repetitive and the pattern we want is: Take Everything Inside The Parentheses, and Wrap It In An htmlEditFormat() Function. (Same stuff we'd do over and over via cut/paste/etc, isn't it?)

We can express this pattern in the gobbledegook of a regular expression. Read one chunk at a time, that gobbledegook actually makes sense. We'll go through the exercise, then look at why it worked.

In Eclipse, perform the following:

Editor's Note:

Simply reading these blog posts isn't going to help you. Open Eclipse, and copy/paste this stuff into your find/replace dialog. You'll learn more, or your money back!

  1. Open a new file and paste in the above set of declarations (remember the chunk above starting with product.setColor(arguments.color);...)
  2. Open the find dialog (I use CTRL+F) and make sure the Regular Expression option is ticked
  3. Enter the following in the Find: Input (\([^\)]+\))
  4. Enter the following in the Replace: Input ( htmlEditFormat$1 )
  5. Press Find and make sure the pattern matches what we want
  6. Lastly, press Replace All

You Should Have This:

product.setColor( htmlEditFormat(arguments.color) );
product.setSize( htmlEditFormat(arguments.size) );
product.setCondition( htmlEditFormat(arguments.condition) );
product.setRating( htmlEditFormat(arguments.rating) );
product.setReliability( htmlEditFormat(arguments.reliability) );
product.setNeedsBatteries( htmlEditFormat(arguments.needsBatteries) );

(If not, you missed a step. Look at the image and compare it with what you have in your Find/Replace dialog. Make sure there is no extra whitespace in the find expression.)
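
If you'd like to check the result outside Eclipse, here is a minimal sketch of the same find/replace in Python. This is not part of the Eclipse exercise, just an assumed stand-in: the variable names are made up for illustration, and Python spells the backreference \1 where Eclipse uses $1.

```python
import re

# The six declarations from the top of this post.
source = """\
product.setColor(arguments.color);
product.setSize(arguments.size);
product.setCondition(arguments.condition);
product.setRating(arguments.rating);
product.setReliability(arguments.reliability);
product.setNeedsBatteries(arguments.needsBatteries);
"""

# Same Find pattern as step 3.
find_pattern = r"(\([^\)]+\))"

# Same Replace text as step 4, with Python's \1 standing in for Eclipse's $1.
replace_text = r"( htmlEditFormat\1 )"

# re.sub replaces every match, which is what Replace All does in the dialog.
result = re.sub(find_pattern, replace_text, source)
print(result)
```

Each of the six lines should come out wrapped exactly once, matching the listing above.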

Blamo! Your code is now all properly HTMLEditFormatted and you didn't even get carpal tunnel syndrome! Let's decode the code, shall we?

Here is the find portion of the regular expression: (\([^\)]+\))

  • (  The first character chunk is an open parenthesis. This basically defines a group. You can see the entire expression is surrounded by parentheses, so we will be treating what is found as a group.
  • \(  The next chunk is a backslash followed by an open parenthesis. A backslash is most often an escaping character, which means treat the next character as a literal character we want to find in our string. Taken together, we can see we want to find an open parenthesis.
  • [^\)]  The next chunk defines any character that is not a close parenthesis. Note it starts with an open bracket, used to define a set. Inside the open bracket is a caret. This means it is opposite day and our set should NOT INCLUDE whatever follows. What follows is a backslash and a close parenthesis, regexese for a literal ). Then the close bracket.
  • +\)  The next chunk is a plus symbol, followed by a backslash and close parenthesis. A plus symbol means 1 or more of whatever came just before it, which here is the set we just defined, so the match keeps eating characters that are not a close parenthesis. The backslash and close parenthesis then match the literal ) that ends the argument list.
  • )  Last chunk, the closing parenthesis defining the end of our group.

All of that defines boundaries for a character-walking regular expression gnome to grab the parentheses, along with the stuff inside them, and hold on to it.
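
To see what the gnome is actually holding on to, here is a small sketch in Python (again an assumed stand-in for Eclipse, with illustrative names). It writes the same Find pattern one chunk per line using Python's re.VERBOSE flag, then prints the captured group for one of our declarations.

```python
import re

# The same Find pattern, spread out so each chunk gets its own comment.
find_pattern = re.compile(r"""
    (           # open the capturing group (group 1)
      \(        # a literal open parenthesis
      [^\)]+    # one or more characters that are not a close parenthesis
      \)        # a literal close parenthesis
    )           # close the capturing group
""", re.VERBOSE)

line = "product.setColor(arguments.color);"
match = find_pattern.search(line)

# Group 1 includes the parentheses themselves, not just what is inside them.
captured = match.group(1)
print(captured)  # (arguments.color)
```

Note that the group holds the parentheses too, which is why the Replace text can glue htmlEditFormat directly onto the front of it.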

Then in the Replace section, we used: ( htmlEditFormat$1 )

  • The surrounding parentheses are literal, as is the htmlEditFormat.
  • The $1 refers to the group we defined in the Find input. (remember the term backreference?)

So in plain English, we asked the regular expression find/replace gnome to: Take each pair of parentheses and the stuff inside them, and replace it with ( htmlEditFormat+GROUPTEXT+ ).
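
The plain-English version can be spelled out literally. The sketch below (Python once more, as an assumed stand-in for the Eclipse dialog; the wrap function is a hypothetical helper) builds the replacement by string concatenation instead of a backreference, which is roughly what $1 does for us behind the scenes.

```python
import re

find_pattern = r"(\([^\)]+\))"

def wrap(match):
    # match.group(1) is what $1 stands for in the Replace input:
    # the parentheses and everything between them.
    return "( htmlEditFormat" + match.group(1) + " )"

line = "product.setRating(arguments.rating);"

# Passing a function to re.sub calls it once per match.
wrapped = re.sub(find_pattern, wrap, line)
print(wrapped)  # product.setRating( htmlEditFormat(arguments.rating) );
```

The backreference form is shorter, but both produce the same text.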

I'm sure you can agree this was much easier than a copy/paste extravaganza... Stay tuned for part three...
