So you wanna learn Regex? - Part 5

Welcome to So You Wanna Learn Regex? Part 5. This is our last part of this series, mostly because I don't know a whole lot more than this. If you want more tutorials about regex, go bug Ben. He knows more about this than I ever will and I hear he has a blog...

In our last exercise, we looked at a simple way to add cfqueryparam to a bunch of queries. This was accomplished by making a pattern consisting of 3 groups then using one of the groups to populate a literal string.

In this exercise, we are going to clean up some data scripts. Let's suppose you are generating database scripts and your script generator puts the primary key in there. For whatever reason, you want to remove this.

Assume this set of declarations:

INSERT INTO `memberchallenge` VALUES ('11', '1', '19', null, '2008-11-14 14:07:59', '2008-11-14 14:07:59', '1', '2008-11-14 14:07:59', '0');
INSERT INTO `memberchallenge` VALUES ('12', '2', '19', null, '2008-11-14 15:40:51', '2008-11-14 15:40:51', '1', '2008-11-14 15:40:51', '0');
INSERT INTO `memberchallenge` VALUES ('14', '5', '19', null, '2008-11-14 20:14:26', '2008-11-14 20:14:26', '5', '2008-11-14 20:14:26', '0');
INSERT INTO `memberchallenge` VALUES ('15', '1', '20', null, '2008-11-23 18:19:31', '2008-11-23 18:19:31', '1', '2008-11-23 18:19:30', '0');
INSERT INTO `memberchallenge` VALUES ('16', '2', '20', null, '2008-11-23 18:20:09', '2008-11-23 18:20:09', '1', '2008-11-23 18:20:09', '0');
INSERT INTO `memberchallenge` VALUES ('17', '1', '21', null, '2008-11-25 20:32:44', '2008-11-25 20:32:44', '1', '2008-11-25 20:32:44', '0');
INSERT INTO `memberchallenge` VALUES ('18', '2', '21', null, '2008-11-25 20:33:01', '2008-11-25 20:33:01', '1', '2008-11-25 20:33:01', '0');

What we want, is to turn:INSERT INTO `memberchallenge` VALUES ('11', '1', '19', null, '2008-11-14 14:07:59', '2008-11-14 14:07:59', '1', '2008-11-14 14:07:59', '0'); into: INSERT INTO `memberchallenge` VALUES ('1', '19', null, '2008-11-14 14:07:59', '2008-11-14 14:07:59', '1', '2008-11-14 14:07:59', '0');

Note the first value in the VALUES statement has vanished.. this would be the primary key in our dataload script.

So as you know, we define this pattern in the gobbledegook of regular expressions. When read one chunk at a time, these actually make sense. We'll go through the exercise, then look at why it worked.

In Eclipse, perform the following:

  1. Open a new file and paste the above set of declarations: ( the whole query, yes, the whole thing)
  2. Open the find dialogue (I use CTRL+F) and make sure the Regular Expression option is ticked
  3. Enter the following in the Find: Input \('[0-9]+',\s
  4. Enter the following in the Replace: Input (
  5. Press Find and make sure the pattern matches what we want
  6. Lastly, press Replace All

You Should Have This:

INSERT INTO `memberchallenge` VALUES ('1', '19', null, '2008-11-14 14:07:59', '2008-11-14 14:07:59', '1', '2008-11-14 14:07:59', '0');
INSERT INTO `memberchallenge` VALUES ('2', '19', null, '2008-11-14 15:40:51', '2008-11-14 15:40:51', '1', '2008-11-14 15:40:51', '0');
INSERT INTO `memberchallenge` VALUES ('5', '19', null, '2008-11-14 20:14:26', '2008-11-14 20:14:26', '5', '2008-11-14 20:14:26', '0');
INSERT INTO `memberchallenge` VALUES ('1', '20', null, '2008-11-23 18:19:31', '2008-11-23 18:19:31', '1', '2008-11-23 18:19:30', '0');
INSERT INTO `memberchallenge` VALUES ('2', '20', null, '2008-11-23 18:20:09', '2008-11-23 18:20:09', '1', '2008-11-23 18:20:09', '0');
INSERT INTO `memberchallenge` VALUES ('1', '21', null, '2008-11-25 20:32:44', '2008-11-25 20:32:44', '1', '2008-11-25 20:32:44', '0');
INSERT INTO `memberchallenge` VALUES ('2', '21', null, '2008-11-25 20:33:01', '2008-11-25 20:33:01', '1', '2008-11-25 20:33:01', '0');

(if not, you missed a step. Look at the image and compare with what you have in your Find/Replace dialog. Make sure there is no extra whitespace in the find expression)

Blamo! The primary key section of each SQL statement has gone bye bye. The cool thing is you didn't even get carpal tunnel syndrome! Let's decode the code, shall we?

Here is the find portion of the regular expression: \('[0-9]+',\s

  • \('  The first character chunk is an escaped open parenthesis along with a single quote.
  • [0-9]+  The next chunk is a character set, defined by the square brackets, consisting of all numbers from 0-9. We also have a plus sign indicating we are expecting one or more of the character class.
  • ',  Next is a single quote literal along with a comma literal.
  • \s  Last chunk, is the whitespace symbol set.
All of that defines boundaries for a character walking regular expression gnome to start at the beginning of value for each SQL statement, and remove what is inside the initial single quote boundary.

Then in the Replace section, we used: (

  • This replaces the open parenthesis in the found set with an open parenthesis. We could also have used a group here, if we wanted to, it would just be a few more characters in the find expression.

So in plain English, we asked the regular expression find/replace gnome to: Define the parenthesis and the first set of single quotes. Then in the replace pass, wipe it out an replace the opening parenthesis.

I'm sure you can agree this was much easier than a copy/paste extravaganza... I hope you enjoyed this blog series on Regex. If you want more of these, go bug Ben Nadel... His brain is a mobius strip of interesting regular expression patterns...

Related Blog Entries

There are no comments for this entry.

Add Comment Subscribe to Comments