python - finding elements by attribute with lxml

Friday, 29 July 2016

I need to parse an XML file to extract some data.
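I only need the elements with certain attributes. Here's an example document:

<root>
    <articles>
        <article type="news">
             <content>some text</content>
        </article>
        <article type="info">
             <content>some text</content>
        </article>
        <article type="news">
             <content>some text</content>
        </article>
    </articles>
</root>

Here I would like to get only the articles with the type "news".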

What's the most efficient and elegant way to do it with lxml?

I tried with the find method but it's not very nice:

from lxml import etree
f = etree.parse("myfile")
root = f.getroot()
articles = root.getchildren()[0]
article_list = articles.findall('article')
for article in article_list:
    if "type" in article.keys():
        if article.attrib['type'] == 'news':
            content = article.find('content')
            content = content.text

Answer

You can use XPath, e.g. root.xpath("//article[@type='news']")

This XPath expression will return a list of all <article/> elements that have a "type" attribute with the value "news". You can then iterate over it to do what you want, or pass it wherever.
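As a minimal sketch of iterating over that list: the sample document below is hypothetical (the element names other than article and content, and the distinct text values, are assumptions chosen so the result is easy to read), and findtext is just one convenient way to read each child's text.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<root>
    <articles>
        <article type="news"><content>first</content></article>
        <article type="info"><content>second</content></article>
        <article type="news"><content>third</content></article>
    </articles>
</root>
"""

root = etree.fromstring(SAMPLE)

# xpath() returns a plain Python list of the matching elements.
news_articles = root.xpath("//article[@type='news']")

# Read the text of each matching article's <content> child.
news_texts = [article.findtext("content") for article in news_articles]
print(news_texts)  # ['first', 'third']
```

Because the result is an ordinary list, it can be sliced, counted with len(), or handed to any other function.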

To get just the text content, you can extend the xpath like so:

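from lxml import etree

root = etree.fromstring("""
<root>
    <articles>
        <article type="news">
             <content>some text</content>
        </article>
        <article type="info">
             <content>some text</content>
        </article>
        <article type="news">
             <content>some text</content>
        </article>
    </articles>
</root>
""")

print(root.xpath("//article[@type='news']/content/text()"))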

This will output ['some text', 'some text']. If you want the content elements themselves rather than their text, use "//article[@type='news']/content", and so on.
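The element-returning variant might look like the sketch below. The sample document is hypothetical, as are its root and articles element names. The findall call at the end is not part of the answer above: it is an alternative that relies on the limited ElementPath syntax, which also understands [@attr='value'] predicates.

```python
from lxml import etree

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<root>
    <articles>
        <article type="news"><content>first</content></article>
        <article type="info"><content>second</content></article>
        <article type="news"><content>third</content></article>
    </articles>
</root>
"""

root = etree.fromstring(SAMPLE)

# Select the <content> elements themselves instead of their text nodes.
contents = root.xpath("//article[@type='news']/content")
content_texts = [content.text for content in contents]
print(content_texts)  # ['first', 'third']

# Alternative: findall() with an ElementPath attribute predicate.
# The leading ".//" searches all descendants of root.
found = root.findall(".//article[@type='news']")
print(len(found))  # 2
```

xpath() supports the full XPath 1.0 language, while findall() covers only a subset, so the first form is the safer choice once expressions get more complex.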
