Regex and a Long-Running Page in ASP.NET

Saturday, 12 May 2007 20:41 by Admin

Recently at work, I ran into a problem where a page appeared to be hanging the whole ASP.NET process. Being a person with very little patience, I ran the handy dandy iisreset after about one minute. As it turns out, it wasn't actually hanging; it was merely taking 3-5 minutes to load the page! So I dropped into debug mode and hit the pause button on the debugger. Behold, the Regex object was the culprit.

We have a complex web app that parses the links on each page and appends certain querystring variables to them. The page that was bombing out appeared to contain some characters that were throwing the Regex object off.
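For readers who haven't done this kind of link rewriting, here is a minimal VB.NET sketch of the general idea using Regex.Replace with a match evaluator. It is not our framework's code: the LinkRewriter module, the href pattern, and the "sid=9" parameter are hypothetical, and the multi-line lambda needs VB 2010 or later.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LinkRewriter
    ' Hypothetical pattern: captures the URL inside a double-quoted href attribute.
    Private ReadOnly HrefPattern As New Regex("href=""([^""]*)""")

    ' Appends extraQuery (for example "sid=9") to the URL of every href in the HTML,
    ' using "?" if the URL has no querystring yet and "&" if it already has one.
    Public Function AppendQuery(ByVal html As String, ByVal extraQuery As String) As String
        Return HrefPattern.Replace(html,
            Function(m As Match) As String
                Dim url As String = m.Groups(1).Value
                Dim separator As String = If(url.Contains("?"), "&", "?")
                Return "href=""" & url & separator & extraQuery & """"
            End Function)
    End Function

    Sub Main()
        Dim html As String = "<a href=""a.aspx"">A</a> <a href=""b.aspx?x=1"">B</a>"
        ' Prints: <a href="a.aspx?sid=9">A</a> <a href="b.aspx?x=1&sid=9">B</a>
        Console.WriteLine(AppendQuery(html, "sid=9"))
    End Sub
End Module
```

A real rewriter would also have to cope with single-quoted attributes, fragments, and javascript: links, which is exactly the kind of complexity that makes these patterns fragile on messy HTML.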

I put my detective hat on and went Googling. An excellent post by Scott Hanselman led me to a clue: it appears Scott's application didn't like a missing bracket in his regex. However, my problem seemed a little more difficult to diagnose. My string to parse was an entire page's worth of HTML! I looked through the code and found that it is supposed to strip off the viewstate first. Makes sense; why match on that stuff? A co-worker sent the HTML output through the HTML Tidy app at the W3C. Well, that was no help, since our HTML had a LOT of issues. We obviously don't run strict mode within our pages.

Another co-worker suggested I save the output of the page and put it in a sandbox. I wrote a local WinForms app that took the source of the page, opened the contents in a StringReader, and applied the regex to each line of text. It immediately shot my processors to 50% each. I hit pause to find that the culprit was VIEWSTATE. What??? I thought that was stripped out! The web app is built on a proprietary framework that all of our software uses for common tasks. In this case, it uses a class originally written for .NET 1.1 to format the links. In the step where it strips out the viewstate, it was matching on this regex pattern:
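A sandbox of this kind can be sketched as a small console app instead of a WinForms one. The sketch below is an illustration under assumptions, not the app described here: the RegexSandbox module, FindSlowestLine, and the href pattern in Main are hypothetical stand-ins, since the framework's real link regex isn't shown. It times the regex on each line of the saved page source and reports the slowest line, which points at the offending markup without needing the debugger.

```vb
Imports System
Imports System.Diagnostics
Imports System.IO
Imports System.Text.RegularExpressions

Module RegexSandbox
    ' Applies rx to each line of pageSource and returns the 1-based number of the
    ' line that took longest to scan (0 if there are no lines); slowestMs receives
    ' that line's time in milliseconds (-1 if there are no lines).
    Public Function FindSlowestLine(ByVal pageSource As String, ByVal rx As Regex,
                                    ByRef slowestMs As Long) As Integer
        Dim slowestLine As Integer = 0
        Dim lineNumber As Integer = 0
        slowestMs = -1
        Using reader As New StringReader(pageSource)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                lineNumber += 1
                Dim sw As Stopwatch = Stopwatch.StartNew()
                ' Reading Count forces the engine to find every match on the line.
                Dim matchCount As Integer = rx.Matches(line).Count
                sw.Stop()
                If sw.ElapsedMilliseconds > slowestMs Then
                    slowestMs = sw.ElapsedMilliseconds
                    slowestLine = lineNumber
                End If
                line = reader.ReadLine()
            End While
        End Using
        Return slowestLine
    End Function

    Sub Main(ByVal args As String())
        If args.Length = 0 Then
            Console.WriteLine("Usage: RegexSandbox <saved-page.html>")
            Return
        End If
        ' Hypothetical stand-in for the framework's link-parsing pattern.
        Dim linkRegex As New Regex("<a\s+[^>]*href=""([^""]*)""", RegexOptions.IgnoreCase)
        Dim source As String = File.ReadAllText(args(0))
        Dim ms As Long
        Dim lineNo As Integer = FindSlowestLine(source, linkRegex, ms)
        Console.WriteLine("Slowest line: {0} ({1} ms)", lineNo, ms)
    End Sub
End Module
```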

Dim rgxViewState As New Regex("<input type=""hidden"" name=""__VIEWSTATE"" value=""[\w\+/\s=]+""")

As it turns out, ASP.NET 2.0 renders an extra id="__VIEWSTATE" attribute on the hidden viewstate field, between name and value, so the 1.1-era pattern never matched. Thus, the viewstate was not stripped out. Something about the viewstate on this particular page was causing the link-parsing regex to puke. I added id="__VIEWSTATE" to the pattern and that solved the problem.
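To make the mismatch concrete, here is a small VB.NET sketch that runs the original pattern and the patched one against a hidden field shaped like ASP.NET 2.0 output. The sample HTML and viewstate value are made up, and the ViewStateStripDemo module and StripViewState function are hypothetical names, not the framework's.

```vb
Imports System
Imports System.Text.RegularExpressions

Module ViewStateStripDemo
    ' Pattern from the .NET 1.1-era class: expects value= immediately after name=.
    Public ReadOnly OldPattern As New Regex(
        "<input type=""hidden"" name=""__VIEWSTATE"" value=""[\w\+/\s=]+""")

    ' Same pattern plus the id attribute that ASP.NET 2.0 renders between name and value.
    Public ReadOnly NewPattern As New Regex(
        "<input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""[\w\+/\s=]+""")

    ' Removes the viewstate field (up to the closing quote of its value) before link parsing.
    Public Function StripViewState(ByVal html As String) As String
        Return NewPattern.Replace(html, String.Empty)
    End Function

    Sub Main()
        ' Made-up sample shaped like ASP.NET 2.0 output; the value is not a real viewstate.
        Dim html As String =
            "<input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""dDwtMTA4MTQ=="" />" &
            "<a href=""page.aspx"">link</a>"

        Console.WriteLine(OldPattern.IsMatch(html))   ' prints False: id sits between name and value
        Console.WriteLine(NewPattern.IsMatch(html))   ' prints True
        Console.WriteLine(StripViewState(html))       ' prints " /><a href="page.aspx">link</a>" (leading space)
    End Sub
End Module
```

Because the pattern stops at the closing quote of the value, the trailing " />" of the input tag is left behind; that does no harm if the only goal is to keep the viewstate text away from the link regex.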
