Thursday, 29 September 2016

Can you provide some examples of why it is hard to parse XML and HTML with a regex?




One mistake I see people making over and over again is trying to parse XML or HTML with a regex. Here are a few of the reasons parsing XML and HTML is hard:



People want to treat a file as a sequence of lines, but this is valid:

<tag
attr="5"
/>
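A minimal sketch of the line-based trap, assuming a hypothetical `<root>` wrapper around a tag split across three lines. The names `DOC`, `LINE_TAG`, `attrs_by_line` and `attrs_by_parser` are made up for illustration; only `re` and `xml.etree.ElementTree` from the Python standard library are used.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: one element whose start tag spans three lines.
DOC = '<root>\n<tag\nattr="5"\n/>\n</root>'

# A line-oriented pattern that expects the whole tag on a single line.
LINE_TAG = re.compile(r'<tag\s+attr="(\d+)"\s*/>')


def attrs_by_line(text):
    """Apply the regex to each line separately, as line-based tools do."""
    found = []
    for line in text.splitlines():
        found.extend(LINE_TAG.findall(line))
    return found


def attrs_by_parser(text):
    """Let an XML parser locate the elements, regardless of line breaks."""
    root = ET.fromstring(text)
    return [el.get("attr") for el in root.iter("tag")]
```

The per-line scan finds nothing because no single line holds a complete tag, while the parser reports the attribute value `"5"`.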


People want to treat < or <tag as the start of a tag, but stuff like this exists in the wild:

<img src="imgtag.gif" alt="<img>" />
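To illustrate, here is a rough sketch that counts `img` tags two ways. It assumes a sample tag whose `alt` text contains the literal string `<img>`; `SNIPPET`, `count_img_regex`, `ImgCounter` and `count_img_parser` are hypothetical names, and the parser is the standard library's `html.parser.HTMLParser`.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: the alt text itself contains "<img>".
SNIPPET = '<img src="imgtag.gif" alt="<img>" />'


def count_img_regex(text):
    """Count every occurrence of '<img', wherever it appears."""
    return len(re.findall(r"<img\b", text))


class ImgCounter(HTMLParser):
    """Count img start tags as seen by a tokenizing HTML parser."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via handle_startendtag.
        if tag == "img":
            self.count += 1
            self.alts.append(dict(attrs).get("alt"))


def count_img_parser(text):
    """Return (number of img tags, list of their alt values)."""
    parser = ImgCounter()
    parser.feed(text)
    parser.close()
    return parser.count, parser.alts
```

The regex reports two tags, because it cannot tell that the second `<img` sits inside a quoted attribute value. The parser sees one tag whose `alt` is `<img>`.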


People often want to match starting tags to ending tags, but XML and HTML allow tags to contain themselves (which traditional regexes cannot handle at all):



<span id="outer"><span id="inner">foo</span></span>
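A small sketch of the nesting problem, assuming a `span` nested inside another `span`. The sample string and the helper names `first_span_regex` and `span_depths` are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a span nested inside another span.
NESTED = '<span id="outer"><span id="inner">foo</span></span>'


def first_span_regex(text):
    """Pair the first <span> with the nearest </span> (non-greedy)."""
    match = re.search(r"<span[^>]*>(.*?)</span>", text)
    return match.group(1) if match else None


def span_depths(text):
    """Walk the parsed tree and record each span's id and nesting depth."""
    depths = []

    def walk(element, depth):
        if element.tag == "span":
            depths.append((element.get("id"), depth))
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(text), 0)
    return depths
```

The non-greedy pattern pairs the outer start tag with the inner end tag and returns `<span id="inner">foo`. A greedy pattern would fix this case but run past the end as soon as two sibling spans appear. The tree walk reports `outer` at depth 0 and `inner` at depth 1.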


People often want to match against the content of a document (such as the famous "find all phone numbers on a given page" problem), but the data may be marked up (even if it appears to be normal when viewed):



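<span class="phonenum">(<span class="area code">703</span>)
<span class="prefix">348</span>-<span class="linenum">3020</span></span>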
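One way around this, sketched under the assumption that the fragment is well-formed enough for an XML parser: extract the visible text first, then search it. The class names in `MARKED_UP` and the pattern `PHONE` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup: renders as "(703) 348-3020" but every part is wrapped.
MARKED_UP = (
    '<span class="phonenum">(<span class="area code">703</span>)\n'
    '<span class="prefix">348</span>-<span class="linenum">3020</span></span>'
)

# A simple US-style phone pattern, for illustration only.
PHONE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")


def phones_in_raw(markup):
    """Search the raw markup directly."""
    return PHONE.findall(markup)


def phones_in_text(markup):
    """Strip the markup with a parser, then search the visible text."""
    root = ET.fromstring(markup)
    visible = "".join(root.itertext())
    return PHONE.findall(visible)
```

Searching the raw markup finds nothing, since tags interrupt the digits. Searching the extracted text finds the number, line break included.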



Comments may contain poorly formatted or incomplete tags:



<a href="foo">foo</a>
<!-- FIXME:
    <a href="
-->
<a href="bar">bar</a>
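Here is a sketch of how a half-written tag inside a comment can derail a link-extracting regex. The sample text and the names `hrefs_regex`, `LinkCollector` and `hrefs_parser` are assumptions for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: two real links around a comment holding half a tag.
SAMPLE = (
    '<a href="foo">foo</a>\n'
    '<!-- FIXME:\n'
    '    <a href="\n'
    '-->\n'
    '<a href="bar">bar</a>'
)


def hrefs_regex(text):
    """Grab whatever sits between href=" and the next double quote."""
    return re.findall(r'<a\s+href="([^"]*)"', text)


class LinkCollector(HTMLParser):
    """Collect href values from real <a> tags; comments are skipped."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href")


def hrefs_parser(text):
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.hrefs
```

The regex returns `foo` plus a chunk of comment residue, and it swallows the real `bar` link while doing so. The parser treats the comment as a comment and returns `foo` and `bar`.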


What other gotchas are you aware of?


Answer



Here's some fun valid XML for you:



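<!DOCTYPE x [ <!ENTITY y "a]>b"> ]>
<x>
    <a b="&y;>" />
    <![CDATA[[a>b <a>b <a]]>
    <?x <a> <!-- <b> ?> c --> d
</x>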
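As a rough check of how misleading such a document is, the sketch below compares a naive tag-name regex with Python's `xml.etree.ElementTree`. The string `TRICKY` is an assumed document in the spirit of the example above (entity declaration, CDATA section, processing instruction), and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed sample: entity in the DTD, CDATA, and a processing instruction,
# all containing text that looks like tags but is not.
TRICKY = '''<!DOCTYPE x [ <!ENTITY y "a]>b"> ]>
<x>
    <a b="&y;>" />
    <![CDATA[[a>b <a>b <a]]>
    <?x <a> <!-- <b> ?> c --> d
</x>'''


def tag_names_regex(text):
    """Treat every '<' followed by a name as the start of an element."""
    return re.findall(r"<([A-Za-z]\w*)", text)


def tag_names_parser(text):
    """List the elements a real XML parser actually finds."""
    return [element.tag for element in ET.fromstring(text).iter()]
```

The regex reports six "elements", four of them from inside the CDATA section and the processing instruction. The parser finds two: `x` and its single child `a`.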



And this little bundle of joy is valid HTML:



    
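<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" [
    <!ENTITY % e "href='hello'">
    <!ENTITY e "<a %e;>">
]>
    <title>x</TiTlE>
</head>
    <p id  =  a:b center>
    <span / hello </span>
    &amp<br left>
    <!---- >t<!---> < -->
    &e link </a>
</body>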




Not to mention all the browser-specific parsing for invalid constructs.



Good luck pitting regex against that!



EDIT (Jörg W Mittag): Here is another nice piece of well-formed, valid HTML 4.01:



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<HTML/
  <HEAD/
    <TITLE/>/
    <P/>
/<br/> <p/><br/></code></pre><br/> </div> <div style='clear: both;'></div> </div> <div class='post-footer'> <div class='post-footer-line post-footer-line-1'> <span class='post-author vcard'> </span> <span class='post-timestamp'> - <meta content='https://stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html' itemprop='url'/> <a class='timestamp-link' href='https://stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html' rel='bookmark' title='permanent link'><abbr class='published' itemprop='datePublished' title='2016-09-29T13:27:00-07:00'>September 29, 2016</abbr></a> </span> <span class='post-comment-link'> </span> <span class='post-icons'> <span class='item-control blog-admin pid-1083048888'> <a href='https://www.blogger.com/post-edit.g?blogID=8010773932506618868&postID=6139975210327932969&from=pencil' title='Edit Post'> <img alt='' class='icon-action' height='18' src='https://resources.blogblog.com/img/icon18_edit_allbkg.gif' width='18'/> </a> </span> </span> <div class='post-share-buttons goog-inline-block'> <a class='goog-inline-block share-button sb-email' href='https://www.blogger.com/share-post.g?blogID=8010773932506618868&postID=6139975210327932969&target=email' target='_blank' title='Email This'><span class='share-button-link-text'>Email This</span></a><a class='goog-inline-block share-button sb-blog' href='https://www.blogger.com/share-post.g?blogID=8010773932506618868&postID=6139975210327932969&target=blog' onclick='window.open(this.href, "_blank", "height=270,width=475"); return false;' target='_blank' title='BlogThis!'><span class='share-button-link-text'>BlogThis!</span></a><a class='goog-inline-block share-button sb-twitter' href='https://www.blogger.com/share-post.g?blogID=8010773932506618868&postID=6139975210327932969&target=twitter' target='_blank' title='Share to X'><span class='share-button-link-text'>Share to X</span></a><a class='goog-inline-block share-button sb-facebook' 
href='https://www.blogger.com/share-post.g?blogID=8010773932506618868&postID=6139975210327932969&target=facebook' onclick='window.open(this.href, "_blank", "height=430,width=640"); return false;' target='_blank' title='Share to Facebook'><span class='share-button-link-text'>Share to Facebook</span></a><a class='goog-inline-block share-button sb-pinterest' href='https://www.blogger.com/share-post.g?blogID=8010773932506618868&postID=6139975210327932969&target=pinterest' target='_blank' title='Share to Pinterest'><span class='share-button-link-text'>Share to Pinterest</span></a> </div> </div> <div class='post-footer-line post-footer-line-2'> <span class='post-labels'> </span> </div> <div class='post-footer-line post-footer-line-3'> <span class='post-location'> </span> </div> </div> </div> <div class='comments' id='comments'> <a name='comments'></a> <h4>No comments:</h4> <div id='Blog1_comments-block-wrapper'> <dl class='avatar-comment-indent' id='comments-block'> </dl> </div> <p class='comment-footer'> <div class='comment-form'> <a name='comment-form'></a> <h4 id='comment-post-message'>Post a Comment</h4> <p> </p> <a href='https://www.blogger.com/comment/frame/8010773932506618868?po=6139975210327932969&hl=en-GB' id='comment-editor-src'></a> <iframe allowtransparency='true' class='blogger-iframe-colorize blogger-comment-from-post' frameborder='0' height='410px' id='comment-editor' name='comment-editor' src='' width='100%'></iframe> <script src='https://www.blogger.com/static/v1/jsbin/2315299244-comment_from_post_iframe.js' type='text/javascript'></script> <script type='text/javascript'> BLOG_CMT_createIframe('https://www.blogger.com/rpc_relay.html'); </script> </div> </p> </div> </div> </div></div> </div> <div class='blog-pager' id='blog-pager'> <span id='blog-pager-newer-link'> <a class='blog-pager-newer-link' href='https://stklowf.blogspot.com/2016/09/c-how-would-i-run-async-task-method.html' id='Blog1_blog-pager-newer-link' title='Newer Post'>Newer Post</a> </span> 
<span id='blog-pager-older-link'> <a class='blog-pager-older-link' href='https://stklowf.blogspot.com/2016/09/how-to-get-get-query-string-variables.html' id='Blog1_blog-pager-older-link' title='Older Post'>Older Post</a> </span> <a class='home-link' href='https://stklowf.blogspot.com/'>Home</a> </div> <div class='clear'></div> <div class='post-feeds'> <div class='feed-links'> Subscribe to: <a class='feed-link' href='https://stklowf.blogspot.com/feeds/6139975210327932969/comments/default' target='_blank' type='application/atom+xml'>Post Comments (Atom)</a> </div> </div> </div><div class='widget FeaturedPost' data-version='1' id='FeaturedPost1'> <div class='post-summary'> <h3><a href='https://stklowf.blogspot.com/2017/06/c-does-curly-brackets-matter-for-empty_20.html'>c++ - Does curly brackets matter for empty constructor?</a></h3> <p> Those brackets declare an empty, inline constructor. In that case, with them, the constructor does exist, it merely does nothing more than t... </p> </div> <style type='text/css'> .image { width: 100%; } </style> <div class='clear'></div> </div><div class='widget PopularPosts' data-version='1' id='PopularPosts1'> <div class='widget-content popular-posts'> <ul> <li> <div class='item-content'> <div class='item-title'><a href='https://stklowf.blogspot.com/2016/11/analysis-were-parts-of-dark-knight.html'>analysis - Were parts of The Dark Knight Rises a commentary on the Occupy movement? - Movies & TV</a></div> <div class='item-snippet'>A fair amount of the second act of The Dark Knight Rises has a class warfare plotline. This is foreshadowed in the trailers with Selina Ky...</div> </div> <div style='clear: both;'></div> </li> <li> <div class='item-content'> <div class='item-title'><a href='https://stklowf.blogspot.com/2017/03/javascript-create-multidimensional.html'>javascript - Create multidimensional array from string</a></div> <div class='item-snippet'> I want to create an options array from a string. 
How can i create an array as { width : 100, height : 200 } from a string ...</div> </div> <div style='clear: both;'></div> </li> <li> <div class='item-content'> <div class='item-title'><a href='https://stklowf.blogspot.com/2017/02/c-how-to-fix-body-of-cannot-be-iterator.html'>c# - How to fix "The body of 'display(List)' cannot be an iterator block because 'string' is not an iterator interface type"?</a></div> <div class='item-snippet'>I'm new to Programming. I would like to implement a program with a yield keyword . So That, I have created a new List and ask the user ...</div> </div> <div style='clear: both;'></div> </li> </ul> <div class='clear'></div> </div> </div></div> </div> </div> <div class='column-left-outer'> <div class='column-left-inner'> <aside> </aside> </div> </div> <div class='column-right-outer'> <div class='column-right-inner'> <aside> <div class='sidebar section' id='sidebar-right-1'><div class='widget BlogSearch' data-version='1' id='BlogSearch1'> <h2 class='title'>Search This Blog</h2> <div class='widget-content'> <div id='BlogSearch1_form'> <form action='https://stklowf.blogspot.com/search' class='gsc-search-box' target='_top'> <table cellpadding='0' cellspacing='0' class='gsc-search-box'> <tbody> <tr> <td class='gsc-input'> <input autocomplete='off' class='gsc-input' name='q' size='10' title='search' type='text' value=''/> </td> <td class='gsc-search-button'> <input class='gsc-search-button' title='search' type='submit' value='Search'/> </td> </tr> </tbody> </table> </form> </div> </div> <div class='clear'></div> </div><div class='widget BlogArchive' data-version='1' id='BlogArchive1'> <h2>Blog Archive</h2> <div class='widget-content'> <div id='ArchiveList'> <div id='BlogArchive1_ArchiveList'> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2017/'> 2017 </a> <span class='post-count' 
dir='ltr'>(2404)</span> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2017/06/'> June 2017 </a> <span class='post-count' dir='ltr'>(276)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2017/05/'> May 2017 </a> <span class='post-count' dir='ltr'>(434)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2017/04/'> April 2017 </a> <span class='post-count' dir='ltr'>(433)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2017/03/'> March 2017 </a> <span class='post-count' dir='ltr'>(450)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2017/02/'> February 2017 </a> <span class='post-count' dir='ltr'>(379)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2017/01/'> January 2017 </a> <span class='post-count' dir='ltr'>(432)</span> </li> </ul> </li> </ul> <ul class='hierarchy'> <li class='archivedate expanded'> <a class='toggle' href='javascript:void(0)'> <span class='zippy toggle-open'> ▼  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/'> 2016 
</a> <span class='post-count' dir='ltr'>(3825)</span> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/12/'> December 2016 </a> <span class='post-count' dir='ltr'>(446)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/11/'> November 2016 </a> <span class='post-count' dir='ltr'>(421)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/10/'> October 2016 </a> <span class='post-count' dir='ltr'>(458)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate expanded'> <a class='toggle' href='javascript:void(0)'> <span class='zippy toggle-open'> ▼  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/09/'> September 2016 </a> <span class='post-count' dir='ltr'>(374)</span> <ul class='posts'> <li><a href='https://stklowf.blogspot.com/2016/09/get-int-value-from-enum-in-c.html'>Get int value from enum in C#</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/r-faq-how-to-make-great-r-reproducible.html'>r faq - How to make a great R reproducible example</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-typeerror-is-not-function.html'>javascript - TypeError: "this..." 
is not a function</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/iterating-javascript-object-properties.html'>Iterating a JavaScript object's properties using j...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-how-to-start-programming-from-scratch.html'>c# - How to start programming from scratch?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/how-to-set-limits-for-axes-in-ggplot2-r.html'>How to set limits for axes in ggplot2 R plots?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/assembly-how-does-division-by-constant.html'>assembly - How does division by constant work in a...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-where-and-why-do-i-have-to-put-and.html'>c++ - Where and why do I have to put the "template...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/security-best-practices-salting.html'>security - Best Practices: Salting & peppering pas...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/how-to-get-today-date-in-java-in.html'>How to get today's Date in java in the following p...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/android-how-to-change-font-on-textview.html'>android - How to change the font on the TextView?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/function-what-is-scope-of-variables-in.html'>function - What is the scope of variables in JavaS...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/unit-testing-what-is-mocking.html'>unit testing - What is Mocking?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/plot-explanation-why-did-grandfather.html'>plot explanation - Why did Grandfather insist on A...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/a-quick-and-easy-way-to-join-array.html'>A quick and easy way to join array elements with a...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/angular-what-is-difference-between.html'>angular - What is the difference between Promises 
...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/please-explain-use-of-javascript.html'>Please explain the use of JavaScript closures in l...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-fatal-error-allowed-memory-size-of.html'>php - Fatal Error: Allowed Memory Size of 13421772...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/vba-deleting-duplicate-copy-of-chart.html'>VBA deleting a duplicate copy of chart object fail...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/floating-point-general-way-of-comparing.html'>floating point - General way of comparing numerics...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-parse-error-syntax-error-unexpected.html'>php - Parse error: syntax error, unexpected 'endif...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/excel-select-multiple-ranges-with-vba.html'>excel - Select multiple ranges with VBA</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-how-would-i-run-async-task-method.html'>c# - How would I run an async Task method synchron...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/can-you-provide-some-examples-of-why-it.html'>Can you provide some examples of why it is hard to...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/how-to-get-get-query-string-variables.html'>How to get GET (query string) variables in Express...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/java-onpostexecute-is-only-sometimes.html'>java - onPostExecute is only sometimes called in A...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/breaking-bad-why-is-walter-jr-being.html'>breaking bad - Why is Walter Jr. being called "Fly...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/python-unboundlocalerror-at-inversing.html'>python - UnboundLocalError at inversing a string</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/what-is-meant-by-ems-android-textview_29.html'>What is meant by Ems? 
(Android TextView)</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/interleave-lists-in-r.html'>Interleave lists in R</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-moveuploadedfile-wont-move-file-to.html'>php move_uploaded_file wont move the file to the h...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-using-user-supplied-database.html'>php - Using user-supplied database credentials acr...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-is-text-considered-node-too.html'>javascript - Is text considered a node too in the ...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/using-sql-server-2008-r2-express-with-c.html'>Using SQL Server 2008 R2 Express with C# Express 2010</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-uncaught-error-call-to-undefined.html'>php - Uncaught Error: Call to undefined function m...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/passing-2d-array-to-c-function.html'>Passing a 2D array to a C++ function</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/convert-associative-array-to-simple.html'>Convert an associative array to a simple array of ...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-why-do-i-get-sql-error-when.html'>php - Why do I get a SQL error when preparing a st...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/redirect-from-html-page.html'>Redirect from an HTML page</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/android-how-to-get-device-uuid-without.html'>android - How to get device UUID without permission</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/what-is-meant-by-ems-android-textview.html'>What is meant by Ems? 
(Android TextView)</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-instantiate-new-object-from-variable.html'>php - Instantiate new object from variable</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-securityexception-1000-even.html'>javascript - SecurityException 1000, even though u...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/how-do-i-declare-namespace-in-javascript_28.html'>How do I declare a namespace in JavaScript?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/java-what-is-this-date-format-2011-08.html'>java - What is this date format? 2011-08-12T20:17:...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/css-selectors-difference-between-and.html'>CSS Selectors - difference between and when to use...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/css-transitions-with-jquery-not-working.html'>CSS Transitions with jquery not working</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/best-way-to-find-if-item-is-in.html'>Best way to find if an item is in a JavaScript array?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-html5-local-storage-fallback.html'>javascript - HTML5 Local Storage fallback solutions</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-how-do-i-use-wmain-entry-point-in.html'>c++ - How do I use the wmain() entry point in Code...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/shell-how-do-i-split-string-on.html'>shell - How do I split a string on a delimiter in ...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/debugging-how-can-i-get-useful-error.html'>debugging - How can I get useful error messages in...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/regex-regular-expression-for-remove.html'>regex - Regular expression for remove html links</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/performance-when-to-use-couchdb-over.html'>performance - When to use CouchDB over 
MongoDB and...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-want-to-get-all-values-of-checked.html'>php - Want to get all values of checked checkbox u...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-sql-injection-that-gets-around.html'>php - SQL injection that gets around mysql_real_es...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/html-list-tag-not-working-in-android.html'>Html List tag not working in android textview. wha...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-mysql-get-hack-prevention.html'>PHP MySQL $_GET Hack prevention</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/r-how-to-achieve-hand-drawn-pencil-fill.html'>r - how to achieve a hand-drawn pencil fill in ggp...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-how-to-send-html-in-attachment.html'>c# - How to send html in attachment?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-phpass-producing-warning-isreadable.html'>php - PHPass producing warning: is_readable() [fun...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/can-this-c-vector-initialization-cause.html'>Can this c++ vector initialization cause memory leak?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-what-way-is-best-way-to-hash.html'>php - What way is the best way to hash a password?</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/how-to-can-apply-multithreading-for-for.html'>How to can apply multithreading for a for loop in ...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-good-cryptographic-hash-functions.html'>php - Good cryptographic hash functions</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/is-there-any-advantage-of-using.html'>Is there any advantage of using references instead...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/how-to-deal-with-floating-point-number.html'>How to deal with floating point number precision i...</a></li> <li><a 
href='https://stklowf.blogspot.com/2016/09/wordpress-how-to-echo-taxonomy-tags-in.html'>wordpress - How to echo taxonomy tags in the wp_dr...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/generate-random-number-between-2.html'>generate random number between 2 variables jquery</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-how-does-free-know-size-of-memory-to.html'>c - how does free know the size of memory to be fr...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/stdstring-vs-string-in-c.html'>std::string vs string in c++</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-sorting-array-of-objects-by.html'>javascript sorting array of objects by string prop...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-implement-promises-pattern.html'>javascript - Implement promises pattern</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-how-to-check-if-jquery.html'>javascript - How to check if jQuery object exist i...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-strange-with-nodejsjs-in.html'>javascript - Strange with nodejs/js in using "this...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/how-does-python-super-work-with.html'>How does Python's super() work with multiple inher...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-phpspec-catching-typeerror-in-php7.html'>php - PHPSpec Catching TypeError in PHP7</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-file-name-or-path-doesn-exist-or-used.html'>c# - The file name or path doesn't exist or used b...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/java-jframe-class-not-working-in-main.html'>java - JFrame class not working in Main</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-flood-of-unresolved-external-symbol.html'>c++ - flood of unresolved external symbol errors</a></li> <li><a 
href='https://stklowf.blogspot.com/2016/09/c-while-loop-doesn-seem-to-finish-after.html'>c - While loop doesn't seem to finish after EOF</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-undefined-reference-to-classfunction.html'>c++ - undefined reference to CLASS::function()</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/jquery-cannot-read-property-of.html'>jquery - "TypeError: Cannot read property 'setStat...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/how-to-generate-random-five-digit.html'>How to generate a random five digit number Java</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/plot-explanation-in-kane-does-bernstein.html'>plot explanation - In "Citizen Kane" does Bernstei...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-structure-initialization.html'>C++ Structure Initialization</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/java-is-there-any-performance.html'>java - Is there any performance difference between...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-mysqlfetcharraymysqlfetchassocmysql.html'>php - mysql_fetch_array()/mysql_fetch_assoc()/mysq...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-actionscript-does-not-see-changes.html'>php - actionscript does not see changes to the tex...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/zend-framework-requireonce-gives-php.html'>zend framework - Require_Once gives PHP Division B...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/css-how-to-style-placeholder-attribute.html'>css - How to style placeholder attribute across al...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/java-pass-by-value-reference-variables.html'>Java, pass-by-value, reference variables</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-division-giving-wrong-answer.html'>javascript division giving wrong answer?</a></li> <li><a 
href='https://stklowf.blogspot.com/2016/09/c-pass-by-pointer-pass-by-reference.html'>c++ - Pass by pointer & Pass by reference</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/sql-how-to-lowercase-whole-string.html'>sql - How to lowercase the whole string keeping th...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/javascript-js-round-to-2-decimal-places.html'>javascript - JS round to 2 decimal places</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/c-error-lnk2019-unresolved-external.html'>c++ - error LNK2019: unresolved external symbol er...</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/php-mysql-chinese-pinyin-encoding-issue.html'>php - MySQL Chinese pinyin encoding issue</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/cant-connect-my-database-with-php.html'>Cant connect my database with php</a></li> <li><a href='https://stklowf.blogspot.com/2016/09/java-how-do-i-get-object-from-hashmap.html'>java - How do I get object from HashMap respectively?</a></li> </ul> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/08/'> August 2016 </a> <span class='post-count' dir='ltr'>(369)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/07/'> July 2016 </a> <span class='post-count' dir='ltr'>(355)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/06/'> June 2016 </a> <span class='post-count' dir='ltr'>(306)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' 
href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/05/'> May 2016 </a> <span class='post-count' dir='ltr'>(305)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/04/'> April 2016 </a> <span class='post-count' dir='ltr'>(311)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/03/'> March 2016 </a> <span class='post-count' dir='ltr'>(269)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/02/'> February 2016 </a> <span class='post-count' dir='ltr'>(145)</span> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2016/01/'> January 2016 </a> <span class='post-count' dir='ltr'>(66)</span> </li> </ul> </li> </ul> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2015/'> 2015 </a> <span class='post-count' dir='ltr'>(11)</span> <ul class='hierarchy'> <li class='archivedate collapsed'> <a class='toggle' href='javascript:void(0)'> <span class='zippy'> ►  </span> </a> <a class='post-count-link' href='https://stklowf.blogspot.com/2015/12/'> December 2015 </a> <span class='post-count' dir='ltr'>(11)</span> </li> </ul> </li> </ul> </div> </div> <div class='clear'></div> </div> 
</div></div> <table border='0' cellpadding='0' cellspacing='0' class='section-columns columns-2'> <tbody> <tr> <td class='first columns-cell'> <div class='sidebar no-items section' id='sidebar-right-2-1'></div> </td> <td class='columns-cell'> <div class='sidebar no-items section' id='sidebar-right-2-2'></div> </td> </tr> </tbody> </table> <div class='sidebar no-items section' id='sidebar-right-3'></div> </aside> </div> </div> </div> <div style='clear: both'></div> <!-- columns --> </div> <!-- main --> </div> </div> <div class='main-cap-bottom cap-bottom'> <div class='cap-left'></div> <div class='cap-right'></div> </div> </div> <footer> <div class='footer-outer'> <div class='footer-cap-top cap-top'> <div class='cap-left'></div> <div class='cap-right'></div> </div> <div class='fauxborder-left footer-fauxborder-left'> <div class='fauxborder-right footer-fauxborder-right'></div> <div class='region-inner footer-inner'> <div class='foot no-items section' id='footer-1'></div> <table border='0' cellpadding='0' cellspacing='0' class='section-columns columns-2'> <tbody> <tr> <td class='first columns-cell'> <div class='foot no-items section' id='footer-2-1'></div> </td> <td class='columns-cell'> <div class='foot no-items section' id='footer-2-2'></div> </td> </tr> </tbody> </table> <!-- outside of the include in order to lock Attribution widget --> <div class='foot section' id='footer-3' name='Footer'><div class='widget Attribution' data-version='1' id='Attribution1'> <div class='widget-content' style='text-align: center;'> Theme images by <a href='http://www.istockphoto.com/file_closeup.php?id=9505737&platform=blogger' target='_blank'>Ollustrator</a>. Powered by <a href='https://www.blogger.com' target='_blank'>Blogger</a>. 
</div> <div class='clear'></div> </div></div> </div> </div> <div class='footer-cap-bottom cap-bottom'> <div class='cap-left'></div> <div class='cap-right'></div> </div> </div> </footer> <!-- content --> </div> </div> <div class='content-cap-bottom cap-bottom'> <div class='cap-left'></div> <div class='cap-right'></div> </div> </div> </div> <script type='text/javascript'> window.setTimeout(function() { document.body.className = document.body.className.replace('loading', ''); }, 10); </script> </body> </html>