Get second element text with XPath? (lxml)

Accepted answer
Score: 42

I tried this but it doesn't work.

t = item.findtext('.//span[@class="python"]//a[2]')

This is a FAQ about the // abbreviation.

.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So this may select more than one element or no element at all, depending on the concrete XML document.

To put it more simply, the [] operator has higher precedence than //.
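Here is a quick lxml sketch of that precedence. The sample documents and variable names are made up for illustration. lxml's `xpath()` method evaluates full XPath 1.0, so the predicate binds to the child step exactly as described: one match per parent that has a second `a` child, or none at all.

```python
from lxml import etree

# Hypothetical sample: two <p> parents, each with two <a> children.
root = etree.fromstring(
    "<div>"
    "<p><a>p1-first</a><a>p1-second</a></p>"
    "<p><a>p2-first</a><a>p2-second</a></p>"
    "</div>"
)

# [2] applies to each child step, i.e. "every <a> that is the second
# <a> child of its own parent", so there is one match per <p> here.
per_parent = [a.text for a in root.xpath(".//a[2]")]
print(per_parent)  # ['p1-second', 'p2-second']

# If no parent has two <a> children, the same expression matches nothing.
flat = etree.fromstring("<div><p><a>x</a></p><p><a>y</a></p></div>")
print(flat.xpath(".//a[2]"))  # []
```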

If you want just one (the second) of all the nodes returned, you have to use parentheses to force the precedence you want:

(.//a)[2]

This really selects the second a descendant of the current node.
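A small sketch of the difference, again using lxml's `xpath()` on a made-up document where the first `p` has one `a` and the second has two. The grouped form collects all `a` descendants in document order before indexing, so the two expressions pick different elements.

```python
from lxml import etree

# Hypothetical sample: the first <p> has one <a>, the second has two.
root = etree.fromstring(
    "<div><p><a>one</a></p><p><a>two</a><a>three</a></p></div>"
)

# Without parentheses, [2] is applied per parent: only the second <p>
# has a second <a> child.
per_parent = [a.text for a in root.xpath(".//a[2]")]
print(per_parent)  # ['three']

# With parentheses, all <a> descendants are gathered first (document
# order: one, two, three) and [2] picks the second node of that set.
overall = [a.text for a in root.xpath("(.//a)[2]")]
print(overall)  # ['two']
```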

For the actual expression used in the question, change it to:

(.//span[@class="python"]//a)[2]

or, to get the text node directly, change it to:

(.//span[@class="python"]//a)[2]/text()
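One caveat when applying this in lxml: `findtext()` uses ElementPath, a limited XPath subset that (as far as I know) has no grouping parentheses and no `text()`, so the corrected expressions need to go through `xpath()` instead. The sketch below uses a hypothetical stand-in for the question's `item`, with the two links under different parents.

```python
from lxml import etree

# Hypothetical stand-in for the question's `item`: the two <a> elements
# have different parents.
item = etree.fromstring(
    '<div><span class="python">'
    "<span><a>google</a></span>"
    "<a>chrome</a>"
    "</span></div>"
)

# The original expression finds nothing: neither <a> is a second child.
print(item.xpath('.//span[@class="python"]//a[2]'))  # []

# Element.xpath() evaluates full XPath 1.0, including grouping parentheses.
second = item.xpath('(.//span[@class="python"]//a)[2]')
print(second[0].text)  # chrome

# With /text() the result is a list of strings instead of elements.
print(item.xpath('(.//span[@class="python"]//a)[2]/text()'))  # ['chrome']
```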
Score: 2

I'm not sure what the problem is...

>>> d = """<span class='python'>
...   <a>google</a>
...   <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>
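For comparison, the question's own `findtext()` call also returns the expected text on this simplified document, because both `a` elements are children of the same `span`. That suggests the real HTML nests the links differently. A sketch, parsing the same snippet with `etree.HTML` as above:

```python
from lxml import etree

# Same snippet as the session above; etree.HTML wraps it in <html><body>.
root = etree.HTML(
    "<span class='python'>\n  <a>google</a>\n  <a>chrome</a>\n</span>"
)

# The question's original ElementPath expression, via findtext().
text = root.findtext('.//span[@class="python"]//a[2]')
print(text)  # chrome
```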


Score: 2

From Comments:

or the simplification of the actual HTML I posted is too simple

You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:

self::node()
 /descendant-or-self::node()
  /child::span[attribute::class="python"]
   /descendant-or-self::node()
    /child::a[position()=2]
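To check that the abbreviated and unabbreviated spellings really are the same location path, one can evaluate both with lxml and compare the results. The document below is a hypothetical flat case so that both actually match something.

```python
from lxml import etree

# Hypothetical flat document: both <a> elements share one parent.
doc = etree.fromstring(
    '<div><span class="python"><a>google</a><a>chrome</a></span></div>'
)

abbreviated = './/span[@class="python"]//a[2]'
expanded = (
    "self::node()"
    "/descendant-or-self::node()"
    '/child::span[attribute::class="python"]'
    "/descendant-or-self::node()"
    "/child::a[position()=2]"
)

# Both spellings should select the same nodes.
short_hits = [e.text for e in doc.xpath(abbreviated)]
long_hits = [e.text for e in doc.xpath(expanded)]
print(short_hits, long_hits)  # ['chrome'] ['chrome']
```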

It will finally select every a that is the second a child of its parent (position() counts along the child axis). So nothing will be selected if your document looks like:

<span class='python'> 
  <span> 
    <span> 
      <img></img> 
      <a>google</a><!-- This is the first "a" child of its parent --> 
    </span> 
    <a>chrome</a><!-- This is also the first "a" child of its parent --> 
  </span> 
</span> 
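Running the expression against that sample with lxml confirms it. This is a sketch: the XML comments are dropped, since they don't affect which elements match.

```python
from lxml import etree

# The sample document above, without the XML comments.
doc = etree.fromstring("""<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>""")

# Both <a> elements are present...
all_links = [a.text for a in doc.xpath(".//a")]
print(all_links)  # ['google', 'chrome']

# ...but each is the only <a> child of its parent, so a[2] never matches.
hits = doc.xpath('.//span[@class="python"]//a[2]')
print(hits)  # []
```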

If you want the second of all descendants, use:

descendant::span[@class="python"]/descendant::a[2]
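A sketch of that expression in lxml. One assumption to note: the sample is wrapped in a hypothetical `div`, because the `descendant::` axis excludes the context node itself, so evaluating it from the outer `span` would not find that `span`.

```python
from lxml import etree

# The sample document, wrapped in a hypothetical <div> so that the
# context node sits above the outer span.
doc = etree.fromstring("""<div>
<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
</div>""")

# On the descendant axis, [2] counts <a> descendants of the span in
# document order (google, chrome), not children of a single parent.
hits = doc.xpath('descendant::span[@class="python"]/descendant::a[2]')
print([a.text for a in hits])  # ['chrome']
```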
