Tuesday 31 May 2016

Finding a non-ASCII character








I'm struggling to find a way to locate a non-ASCII character in a very large file of XML data. I do not want to convert the non-ASCII characters; I just want to identify where in the data file the character is located so I can ask the source to remove the value. The non-ASCII data (it seems to be a single character) is causing my processing program to fail. Unfortunately, the error output does not help me determine where in the file the offending character is located. The XML file contains data records, and the character is most likely in a description or name field.



I have tried using text tools, but the file is so large (>32 MB of text) that it is overwhelming. Is there a way to run a regex that finds any character outside the 7-bit ASCII character set in a tool like PSPad or TextPad?
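For reference, the character class `[^\x00-\x7F]` matches anything outside the 7-bit ASCII range; whether a given editor's regex search accepts that syntax is something I have not confirmed for PSPad or TextPad. As a fallback that does not depend on any editor, here is a minimal Python sketch that reports the line and byte position of every non-ASCII byte. The function name `find_non_ascii` is just an illustrative choice, and the sketch assumes the file is in a single-byte or UTF-8 encoding (it reads raw bytes, so a single UTF-8 character shows up as two or more consecutive hits).

```python
import re
import sys

# Matches any byte outside the 7-bit ASCII range 0x00-0x7F.
NON_ASCII = re.compile(rb"[^\x00-\x7F]")


def find_non_ascii(path):
    """Yield (line_number, column, byte_value) for each non-ASCII byte.

    Hypothetical helper: line numbers and columns are 1-based, and the
    column is a byte offset within the line, not a character offset.
    """
    # Binary mode avoids decode errors and makes no encoding assumption
    # beyond "ASCII bytes are below 0x80".
    with open(path, "rb") as handle:
        for line_number, line in enumerate(handle, start=1):
            for match in NON_ASCII.finditer(line):
                yield line_number, match.start() + 1, line[match.start()]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Print one report line per offending byte in the file named on the command line.
    for line_number, column, value in find_non_ascii(sys.argv[1]):
        print(f"line {line_number}, column {column}: byte 0x{value:02X}")
```

Saved as a script, this could be run with the XML file's path as its only argument. It reads the file one line at a time, so a file of 32 MB or more should not be a problem unless the whole file is a single line.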
