Can you provide some examples of why it is hard to parse XML and HTML with a regex?

One mistake I see people making over and over again is trying to parse XML or HTML with a regex. Here are a few of the reasons parsing XML and HTML is hard:

People want to treat a file as a sequence of lines, but this is valid:

<pre><code>&lt;tag
attr="5"
/&gt;</code></pre>
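
A minimal Python sketch of this failure, assuming Python 3 and only the standard-library <code>re</code> and <code>html.parser</code> modules. The sample string and helper names are hypothetical, and an img element stands in for the generic tag. A regex applied one line at a time never sees a complete tag, while an event-based parser reads straight across the line breaks:

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: an img tag whose attribute sits on its own line.
SAMPLE = 'before\n<img\nalt="5"\n/>\nafter'

# Naive approach: look for a complete tag on each line separately.
ONE_LINE_TAG = re.compile(r"<(\w+)[^<>]*>")


def tags_by_line(text):
    """Return the tag names found when the text is scanned line by line."""
    names = []
    for line in text.splitlines():
        names.extend(ONE_LINE_TAG.findall(line))
    return names


class TagCollector(HTMLParser):
    """Record (tag, attributes) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag() also routes self-closing tags here.
        self.tags.append((tag, dict(attrs)))


def tags_by_parser(text):
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


print(tags_by_line(SAMPLE))    # []
print(tags_by_parser(SAMPLE))  # [('img', {'alt': '5'})]
```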

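People want to treat <code>&lt;</code> or <code>&lt;tag</code> as the start of a tag, but stuff like this exists in the wild (a raw <code>&lt;</code> is allowed inside a quoted HTML attribute value, though not in XML):

<pre><code>&lt;img src="imgtag.gif" alt="&lt;img&gt;" /&gt;</code></pre>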
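
A hedged Python sketch of the consequence, using only the standard-library <code>re</code> and <code>html.parser</code> modules (the sample and helper names are hypothetical). The common tag-stripping pattern stops at the first &gt; it meets, which here sits inside the alt value, so attribute debris leaks into the output; a parser that knows about quoted attribute values returns only the text:

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: a quoted attribute value that contains angle brackets.
SAMPLE = '<img src="imgtag.gif" alt="<img>" /> caption'

# Naive tag stripper: a tag is "<", then anything except ">", then ">".
NAIVE_TAG = re.compile(r"<[^>]*>")


def naive_strip(text):
    return NAIVE_TAG.sub("", text)


class TextCollector(HTMLParser):
    """Keep only character data; tags and their attributes are dropped."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parser_strip(text):
    collector = TextCollector()
    collector.feed(text)
    collector.close()
    return "".join(collector.chunks)


print(repr(naive_strip(SAMPLE)))   # '" /> caption'
print(repr(parser_strip(SAMPLE)))  # ' caption'
```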

People often want to match start tags to end tags, but XML and HTML allow an element to contain other elements of the same type, nested to any depth (which traditional regexes cannot handle at all):

<pre><code>&lt;span id="outer"&gt;&lt;span id="inner"&gt;foo&lt;/span&gt;&lt;/span&gt;</code></pre>
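
A small Python sketch of the pairing problem, assuming the markup is well-formed enough for the standard-library <code>xml.etree.ElementTree</code> (helper names are hypothetical). The lazy regex pairs the outer start tag with the inner end tag. A greedy <code>(.*)</code> would happen to work on this one input but over-match as soon as two sibling spans appear, whereas the parser tracks the nesting:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a span nested directly inside another span.
SAMPLE = '<span id="outer"><span id="inner">foo</span></span>'

# "Everything between the outer start tag and the next end tag."
LAZY = re.compile(r'<span id="outer">(.*?)</span>')


def regex_inner(text):
    """The lazy match stops at the FIRST end tag, which belongs to the inner span."""
    return LAZY.search(text).group(1)


def parser_inner(text):
    """An XML parser pairs each end tag with its own start tag."""
    outer = ET.fromstring(text)
    inner = outer.find("span")
    return ET.tostring(inner, encoding="unicode")


print(regex_inner(SAMPLE))   # inner start tag and "foo", but no end tag
print(parser_inner(SAMPLE))  # the complete inner element
```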

People often want to match against the content of a document (such as the famous "find all phone numbers on a given page" problem), but the data may be marked up (even if it appears to be normal when viewed):

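<pre><code>&lt;span class="phonenum"&gt;(&lt;span class="area code"&gt;703&lt;/span&gt;)
&lt;span class="prefix"&gt;348&lt;/span&gt;-&lt;span class="linenum"&gt;3020&lt;/span&gt;&lt;/span&gt;</code></pre>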
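
A hedged Python sketch, using the standard library only (the markup and helper names are hypothetical). A phone-number regex finds nothing in the raw markup because tags sit between the digits, but it matches once a parser has reduced the markup to its character data:

```python
import re
from html.parser import HTMLParser

# Hypothetical markup for a phone number split across several spans.
SAMPLE = (
    '<span class="phonenum">(<span class="area code">703</span>)\n'
    '<span class="prefix">348</span>-<span class="linenum">3020</span></span>'
)

# US-style number such as "(703) 348-3020"; \s* also allows a line break.
PHONE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")


class TextCollector(HTMLParser):
    """Collect the character data found between the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(markup):
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


print(PHONE.findall(SAMPLE))                # []
print(PHONE.findall(visible_text(SAMPLE)))  # ['(703)\n348-3020']
```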

Comments may contain poorly formatted or incomplete tags:

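<pre><code>&lt;a href="foo"&gt;foo&lt;/a&gt;
&lt;!-- FIXME:
    &lt;a href="
--&gt;
&lt;a href="bar"&gt;bar&lt;/a&gt;</code></pre>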
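
A Python sketch of the comment problem, using the standard library only (the sample and helper names are hypothetical). Grabbing every <code>href="..."</code> value with a regex picks up the half-written link inside the comment, and because that value's quote is never closed, the match runs on into the real <code>bar</code> link and swallows it. <code>html.parser</code> treats the comment as a comment and reports just the two real links:

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: two real links with a half-written link in a comment.
SAMPLE = (
    '<a href="foo">foo</a>\n'
    '<!-- FIXME:\n    <a href="\n-->\n'
    '<a href="bar">bar</a>'
)

# Naive extraction: every href="..." value anywhere in the raw text.
HREF = re.compile(r'href="([^"]*)"')


class LinkCollector(HTMLParser):
    """Collect href values from real start tags; comments never reach here."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def parser_hrefs(markup):
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


print(HREF.findall(SAMPLE))  # second item is comment debris; "bar" is lost
print(parser_hrefs(SAMPLE))  # ['foo', 'bar']
```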

What other gotchas are you aware of?

Answer

Here's some fun well-formed XML for you:

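<pre><code>&lt;!DOCTYPE x [ &lt;!ENTITY y "a]&gt;b"&gt; ]&gt;
&lt;x&gt;
    &lt;a b="&amp;y;&gt;" /&gt;
    &lt;![CDATA[[a&gt;b &lt;a&gt;b &lt;a]]&gt;
    &lt;?x &lt;a&gt; &lt;!-- &lt;b&gt; ?&gt; c --&gt; d
&lt;/x&gt;</code></pre>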
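
A hedged Python sketch of why this XML defeats simple patterns, using a cut-down sample. As an assumption, only an entity declaration whose value contains ]&gt;, one attribute and one CDATA section are kept, so the standard-library <code>xml.etree.ElementTree</code> can parse it. A regex that assumes the internal DTD subset ends at the first ]&gt; stops inside the entity value, while the XML parser expands the entity and reads the CDATA correctly:

```python
import re
import xml.etree.ElementTree as ET

# Cut-down hypothetical sample: an entity value containing "]>", an attribute
# value containing ">", and a CDATA section containing ">".
DOC = '<!DOCTYPE x [ <!ENTITY y "a]>b"> ]>\n<x b="&y;>"><![CDATA[a>b]]></x>'


def naive_strip_doctype(text):
    """Assume the DOCTYPE's internal subset ends at the first ']>' (wrong here)."""
    return re.sub(r"^.*?\]>", "", text, count=1, flags=re.DOTALL)


root = ET.fromstring(DOC)
print(naive_strip_doctype(DOC).splitlines()[0])  # b"> ]>
print(root.get("b"))  # a]>b>
print(root.text)      # a>b
```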

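And this little bundle of joy is valid HTML:

<pre><code>&lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd" [
    &lt;!ENTITY % e "href='hello'"&gt;
    &lt;!ENTITY e "&lt;a %e;&gt;"&gt;
]&gt;
    &lt;title&gt;x&lt;/TITLE&gt;
&lt;/head&gt;
    &lt;p id  =  a:b center&gt;
    &lt;span / hello &lt;/span&gt;
    &amp;amp&lt;br left&gt;
    &lt;!---- &gt;t&lt;!---&gt; &lt; --&gt;
    &amp;e link &lt;/a&gt;
&lt;/body&gt;</code></pre>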
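
Python's standard-library <code>html.parser</code> is not an SGML parser, so it does not implement the entity declarations or SGML comment rules that this example leans on. A hedged sketch of a few constructs from it that the parser does handle (the fragment is hypothetical): spaces around =, an unquoted value, a value-less attribute, and &amp;amp without its semicolon. A regex that expects every attribute to look like <code>name="value"</code> finds neither attribute here:

```python
from html.parser import HTMLParser

# Hypothetical fragment: spaces around "=", an unquoted value, a value-less
# attribute, and the legacy entity reference "&amp" without its semicolon.
SAMPLE = '<p id  =  a:b center>x &amp y</p>'


class Collector(HTMLParser):
    """Record start tags with their attributes, plus the decoded text."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entity references
        self.starttags = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append((tag, attrs))

    def handle_data(self, data):
        self.chunks.append(data)


collector = Collector()
collector.feed(SAMPLE)
collector.close()
print(collector.starttags)        # [('p', [('id', 'a:b'), ('center', None)])]
print("".join(collector.chunks))  # x & y
```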

Not to mention all the browser-specific parsing for invalid constructs.

Good luck pitting regex against that!

EDIT (Jörg W Mittag): Here is another nice piece of well-formed, valid HTML 4.01:

  "http://www.w3.org/TR/html4/strict.dtd"> 
      /<br/>    <p/><br/></code></pre><br/>    </div>
<div style='clear: both;'></div>
</div>
<div class='post-footer'>
<div class='post-footer-line post-footer-line-1'>
<span class='post-author vcard'>
</span>
<span class='post-timestamp'>
-
<meta content='https://stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html' itemprop='url'/>
<a class='timestamp-link' href='https://stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html' rel='bookmark' title='permanent link'><abbr class='published' itemprop='datePublished' title='2016-09-29T13:27:00-07:00'>September 29, 2016</abbr></a>
</span>
<span class='post-comment-link'>
</span>
</div>
<div class='post-footer-line post-footer-line-2'>
<span class='post-labels'>
</span>
</div>
<div class='post-footer-line post-footer-line-3'>
<span class='post-location'>
</span>
</div>
</div>
</div>
</div>

        </div></div>
      
</div>
</div><div class='widget FeaturedPost' data-version='1' id='FeaturedPost1'>
<style type='text/css'>
    .image {
      width: 100%;
    }
  </style>
<div class='clear'></div>
</div><div class='widget PopularPosts' data-version='1' id='PopularPosts1'>
</div></div>
</div>
</div>
<div class='column-left-outer'>
<div class='column-left-inner'>
<aside>
</aside>
</div>
</div>
<div class='column-right-outer'>
<div class='column-right-inner'>
<aside>
<div class='sidebar section' id='sidebar-right-1'><div class='widget BlogSearch' data-version='1' id='BlogSearch1'>
</div><div class='widget BlogArchive' data-version='1' id='BlogArchive1'>
</div></div>
<table border='0' cellpadding='0' cellspacing='0' class='section-columns columns-2'>
<tbody>
<tr>
<td class='first columns-cell'>
<div class='sidebar no-items section' id='sidebar-right-2-1'></div>
</td>
<td class='columns-cell'>
<div class='sidebar no-items section' id='sidebar-right-2-2'></div>
</td>
</tr>
</tbody>
</table>
<div class='sidebar no-items section' id='sidebar-right-3'></div>
</aside>
</div>
</div>
</div>
<div style='clear: both'></div>
<!-- columns -->
</div>
<!-- main -->
</div>
</div>
<div class='main-cap-bottom cap-bottom'>
<div class='cap-left'></div>
<div class='cap-right'></div>
</div>
</div>
<footer>
<div class='footer-outer'>
<div class='footer-cap-top cap-top'>
<div class='cap-left'></div>
<div class='cap-right'></div>
</div>
<div class='fauxborder-left footer-fauxborder-left'>
<div class='fauxborder-right footer-fauxborder-right'></div>
<div class='region-inner footer-inner'>
<div class='foot no-items section' id='footer-1'></div>
<table border='0' cellpadding='0' cellspacing='0' class='section-columns columns-2'>
<tbody>
<tr>
<td class='first columns-cell'>
<div class='foot no-items section' id='footer-2-1'></div>
</td>
<td class='columns-cell'>
<div class='foot no-items section' id='footer-2-2'></div>
</td>
</tr>
</tbody>
</table>
<!-- outside of the include in order to lock Attribution widget -->
<div class='foot section' id='footer-3' name='Footer'><div class='widget Attribution' data-version='1' id='Attribution1'>
<div class='widget-content' style='text-align: center;'>
Theme images by <a href='http://www.istockphoto.com/file_closeup.php?id=9505737&platform=blogger' target='_blank'>Ollustrator</a>. Powered by <a href='https://www.blogger.com' target='_blank'>Blogger</a>.
</div>
<div class='clear'></div>
</div></div>
</div>
</div>
<div class='footer-cap-bottom cap-bottom'>
<div class='cap-left'></div>
<div class='cap-right'></div>
</div>
</div>
</footer>
<!-- content -->
</div>
</div>
<div class='content-cap-bottom cap-bottom'>
<div class='cap-left'></div>
<div class='cap-right'></div>
</div>
</div>
</div>
<script type='text/javascript'>
    window.setTimeout(function() {
        document.body.className = document.body.className.replace('loading', '');
      }, 10);
  </script>

<script type="text/javascript" src="https://www.blogger.com/static/v1/widgets/457131501-widgets.js"></script>
<script type='text/javascript'>
window['__wavt'] = 'AOuZoY6uoOBdP2fZvsaSXrOCcZGeCT_iPQ:1745892798852';_WidgetManager._Init('//www.blogger.com/rearrange?blogID\x3d8010773932506618868','//stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html','8010773932506618868');
_WidgetManager._SetDataContext([{'name': 'blog', 'data': {'blogId': '8010773932506618868', 'title': 'Blog', 'url': 'https://stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html', 'canonicalUrl': 'https://stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html', 'homepageUrl': 'https://stklowf.blogspot.com/', 'searchUrl': 'https://stklowf.blogspot.com/search', 'canonicalHomepageUrl': 'https://stklowf.blogspot.com/', 'blogspotFaviconUrl': 'https://stklowf.blogspot.com/favicon.ico', 'bloggerUrl': 'https://www.blogger.com', 'hasCustomDomain': false, 'httpsEnabled': true, 'enabledCommentProfileImages': true, 'gPlusViewType': 'FILTERED_POSTMOD', 'adultContent': false, 'analyticsAccountNumber': '', 'encoding': 'UTF-8', 'locale': 'en-GB', 'localeUnderscoreDelimited': 'en_gb', 'languageDirection': 'ltr', 'isPrivate': false, 'isMobile': false, 'isMobileRequest': false, 'mobileClass': '', 'isPrivateBlog': false, 'isDynamicViewsAvailable': true, 'feedLinks': '\x3clink rel\x3d\x22alternate\x22 type\x3d\x22application/atom+xml\x22 title\x3d\x22Blog - Atom\x22 href\x3d\x22https://stklowf.blogspot.com/feeds/posts/default\x22 /\x3e\n\x3clink rel\x3d\x22alternate\x22 type\x3d\x22application/rss+xml\x22 title\x3d\x22Blog - RSS\x22 href\x3d\x22https://stklowf.blogspot.com/feeds/posts/default?alt\x3drss\x22 /\x3e\n\x3clink rel\x3d\x22service.post\x22 type\x3d\x22application/atom+xml\x22 title\x3d\x22Blog - Atom\x22 href\x3d\x22https://www.blogger.com/feeds/8010773932506618868/posts/default\x22 /\x3e\n\n\x3clink rel\x3d\x22alternate\x22 type\x3d\x22application/atom+xml\x22 title\x3d\x22Blog - Atom\x22 href\x3d\x22https://stklowf.blogspot.com/feeds/6139975210327932969/comments/default\x22 /\x3e\n', 'meTag': '', 'adsenseHostId': 'ca-host-pub-1556223355139109', 'adsenseHasAds': true, 'adsenseAutoAds': false, 'boqCommentIframeForm': true, 'loginRedirectParam': '', 'view': '', 'dynamicViewsCommentsSrc': 
'//www.blogblog.com/dynamicviews/4224c15c4e7c9321/js/comments.js', 'dynamicViewsScriptSrc': '//www.blogblog.com/dynamicviews/dfa3d49a0fed8ecc', 'plusOneApiSrc': 'https://apis.google.com/js/platform.js', 'disableGComments': true, 'interstitialAccepted': false, 'sharing': {'platforms': [{'name': 'Get link', 'key': 'link', 'shareMessage': 'Get link', 'target': ''}, {'name': 'Facebook', 'key': 'facebook', 'shareMessage': 'Share to Facebook', 'target': 'facebook'}, {'name': 'BlogThis!', 'key': 'blogThis', 'shareMessage': 'BlogThis!', 'target': 'blog'}, {'name': 'X', 'key': 'twitter', 'shareMessage': 'Share to X', 'target': 'twitter'}, {'name': 'Pinterest', 'key': 'pinterest', 'shareMessage': 'Share to Pinterest', 'target': 'pinterest'}, {'name': 'Email', 'key': 'email', 'shareMessage': 'Email', 'target': 'email'}], 'disableGooglePlus': true, 'googlePlusShareButtonWidth': 0, 'googlePlusBootstrap': '\x3cscript type\x3d\x22text/javascript\x22\x3ewindow.___gcfg \x3d {\x27lang\x27: \x27en_GB\x27};\x3c/script\x3e'}, 'hasCustomJumpLinkMessage': false, 'jumpLinkMessage': 'Read more', 'pageType': 'item', 'postId': '6139975210327932969', 'postImageUrl': 'imgtag.gif', 'pageName': 'Can you provide some examples of why it is hard to parse XML and HTML\nwith a regex?', 'pageTitle': 'Blog: Can you provide some examples of why it is hard to parse XML and HTML\nwith a regex?'}}, {'name': 'features', 'data': {}}, {'name': 'messages', 'data': {'edit': 'Edit', 'linkCopiedToClipboard': 'Link copied to clipboard', 'ok': 'Ok', 'postLink': 'Post link'}}, {'name': 'template', 'data': {'name': 'custom', 'localizedName': 'Custom', 'isResponsive': false, 'isAlternateRendering': false, 'isCustom': true}}, {'name': 'view', 'data': {'classic': {'name': 'classic', 'url': '?view\x3dclassic'}, 'flipcard': {'name': 'flipcard', 'url': '?view\x3dflipcard'}, 'magazine': {'name': 'magazine', 'url': '?view\x3dmagazine'}, 'mosaic': {'name': 'mosaic', 'url': '?view\x3dmosaic'}, 'sidebar': {'name': 'sidebar', 
'url': '?view\x3dsidebar'}, 'snapshot': {'name': 'snapshot', 'url': '?view\x3dsnapshot'}, 'timeslide': {'name': 'timeslide', 'url': '?view\x3dtimeslide'}, 'isMobile': false, 'title': 'Can you provide some examples of why it is hard to parse XML and HTML\nwith a regex?', 'description': '        One mistake I see people making over  and over again  is trying to parse XML or HTML with a regex.  Here are a few of the reasons pa...', 'featuredImage': 'https://lh3.googleusercontent.com/blogger_img_proxy/AEn0k_vAnxn8b91WkHY6p0LV2mDUZMROzhC0A9wD6TJk7j8QmlyKnSWHn3h1r9NA3wGASP_DDg', 'url': 'https://stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html', 'type': 'item', 'isSingleItem': true, 'isMultipleItems': false, 'isError': false, 'isPage': false, 'isPost': true, 'isHomepage': false, 'isArchive': false, 'isLabelSearch': false, 'postId': 6139975210327932969}}]);
_WidgetManager._RegisterWidget('_HeaderView', new _WidgetInfo('Header1', 'header', document.getElementById('Header1'), {}, 'displayModeFull'));
_WidgetManager._RegisterWidget('_BlogView', new _WidgetInfo('Blog1', 'main', document.getElementById('Blog1'), {'cmtInteractionsEnabled': false, 'lightboxEnabled': true, 'lightboxModuleUrl': 'https://www.blogger.com/static/v1/jsbin/2637434619-lbx__en_gb.js', 'lightboxCssUrl': 'https://www.blogger.com/static/v1/v-css/3681588378-lightbox_bundle.css'}, 'displayModeFull'));
_WidgetManager._RegisterWidget('_FeaturedPostView', new _WidgetInfo('FeaturedPost1', 'main', document.getElementById('FeaturedPost1'), {}, 'displayModeFull'));
_WidgetManager._RegisterWidget('_PopularPostsView', new _WidgetInfo('PopularPosts1', 'main', document.getElementById('PopularPosts1'), {}, 'displayModeFull'));
_WidgetManager._RegisterWidget('_BlogSearchView', new _WidgetInfo('BlogSearch1', 'sidebar-right-1', document.getElementById('BlogSearch1'), {}, 'displayModeFull'));
_WidgetManager._RegisterWidget('_BlogArchiveView', new _WidgetInfo('BlogArchive1', 'sidebar-right-1', document.getElementById('BlogArchive1'), {'languageDirection': 'ltr', 'loadingMessage': 'Loading\x26hellip;'}, 'displayModeFull'));
_WidgetManager._RegisterWidget('_AttributionView', new _WidgetInfo('Attribution1', 'footer-3', document.getElementById('Attribution1'), {}, 'displayModeFull'));
</script>
</body>
</html>
