Monday, 21 November 2016

.net - Regular expression for parsing links from a webpage?



I'm looking for a .NET regular expression to extract all the URLs from a webpage, but I haven't found one comprehensive enough to cover all the different ways you can specify a link.



And a side question:



Is there one regex to rule them all? Or am I better off using a series of less complicated regular expressions and making multiple passes against the raw HTML? (Speed vs. maintainability)


Answer



((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)



I took this from regexlib.com.
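
To see what this pattern actually captures, here is a minimal sketch that runs it over a small piece of markup. It uses Python's re module rather than .NET, on the assumption that nothing in the pattern is engine-specific; in .NET the same pattern string would be passed to Regex.Matches. The sample HTML and the extract_urls helper are made up for illustration.

```python
import re

# Pattern copied verbatim from the answer above.
URL_PATTERN = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)")

# Hypothetical sample markup, invented for this illustration.
SAMPLE_HTML = (
    '<a href="http://example.com/page">home</a> '
    "<a href='mailto:someone@example.com'>mail</a> "
    "plain text ftp://example.com/file.txt and more"
)


def extract_urls(html):
    """Return the full text of every match (group 0), in document order."""
    return [m.group(0) for m in URL_PATTERN.finditer(html)]


urls = extract_urls(SAMPLE_HTML)
for url in urls:
    print(url)
# Prints:
# http://example.com/page">home</a>
# mailto:someone@example.com'>mail</a>
# ftp://example.com/file.txt
```

Because \S+ consumes everything up to the next whitespace, the two matches taken from href attributes also include the closing quote and the markup that follows, so the results would need trimming before use. Relative links such as href="/about" are not matched at all, since the pattern requires a scheme prefix.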



[editor's note: the {1} has no real function in this regex; see this post]
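
The editor's point can be checked directly: a group with no quantifier already matches exactly once, so {1} changes nothing. The sketch below compares the pattern with and without {1} on a made-up sample string. It is written with Python's re module as a stand-in for .NET, and the sample text and full_matches helper are hypothetical.

```python
import re

# The answer's pattern, and the same pattern with the {1} quantifier removed.
WITH_QUANTIFIER = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)")
WITHOUT_QUANTIFIER = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://)\S+)")

# Hypothetical sample text, invented for this illustration.
SAMPLE_TEXT = (
    "see https://example.com/a and news://example.org/b "
    "or mailto:me@example.com"
)


def full_matches(pattern, text):
    """Return the full text of every match (group 0), in order."""
    return [m.group(0) for m in pattern.finditer(text)]


with_q = full_matches(WITH_QUANTIFIER, SAMPLE_TEXT)
without_q = full_matches(WITHOUT_QUANTIFIER, SAMPLE_TEXT)

print(with_q == without_q)  # True: both patterns find the same matches
```

A single sample is not a proof, but it is consistent with the semantics of {1}, which means "exactly once" and is therefore the default for any unquantified group.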


