Wednesday, August 4, 2010

Extracting tags from XHTML content

There are two ways to do this. The quick and dirty way is to use the Mid function in VB.Net or the IndexOf string method; the more appropriate way is to use regular expressions.
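For comparison, here is a minimal sketch of the IndexOf approach in C#. The class and method names (IndexOfTitleDemo, ExtractTitle) are made up for illustration, and the sketch assumes the title tag has no attributes.

```csharp
using System;

public static class IndexOfTitleDemo
{
    // Hypothetical helper: returns the text between <title> and </title>,
    // or null if either tag is missing.
    public static string ExtractTitle(string html)
    {
        const string open = "<title>";
        const string close = "</title>";

        // Find the opening tag, ignoring case.
        int start = html.IndexOf(open, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += open.Length;

        // Find the closing tag, searching only after the opening tag.
        int end = html.IndexOf(close, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end - start);
    }
}
```

It works for simple pages, but every variation (attributes, extra spaces) needs more index arithmetic, which is why the regular expression route tends to be easier to maintain.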

The following code gets the title using a regular expression:
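// Singleline lets '.' match newlines, so a title that spans several lines is still found.
Regex regex = new Regex("<title>(?<title>.*?)</title>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match titleMatch = regex.Match(html);
// Value is an empty string when there is no match.
string title = titleMatch.Groups["title"].Value;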
The following code gets the content of the meta tag named My_Meta:

Regex metaRegex = new Regex("<META +NAME=\"(?<name>My_Meta)\" +CONTENT=\"(?<content>.*?)\" */?>", RegexOptions.IgnoreCase);
Match metaMatch = metaRegex.Match(html);

string metaContent = metaMatch.Groups["content"].Value;
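
Both lookups can be wrapped in a small helper that also checks whether the match succeeded. This is only a sketch: the TagExtractor class and its method names are made up for illustration, and it assumes the name attribute comes before content, as in the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Singleline lets '.' match newlines, so a title split across lines is still found.
    private static readonly Regex TitleRegex = new Regex(
        "<title>(?<title>.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the trimmed title text, or null when no <title> element is found.
    public static string GetTitle(string html)
    {
        Match m = TitleRegex.Match(html);
        return m.Success ? m.Groups["title"].Value.Trim() : null;
    }

    // Returns the content attribute of the named meta tag, or null when it is absent.
    public static string GetMetaContent(string html, string metaName)
    {
        // Regex.Escape keeps characters such as '.' in the name
        // from being read as pattern syntax.
        string pattern = "<meta\\s+name=\"" + Regex.Escape(metaName)
                       + "\"\\s+content=\"(?<content>.*?)\"\\s*/?>";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups["content"].Value : null;
    }
}
```

Returning null on a failed match makes it possible to tell a missing tag from an empty one, which Groups[...].Value alone does not do.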
