Get second element text with XPath? (lxml)
I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]')
This is a FAQ about the // abbreviation.
.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So this may select more than one element, or no element at all, depending on the concrete XML document.
To put it more simply, the [] operator has higher precedence than //.
If you want just one (the second) of all the nodes returned, you have to use parentheses to force the precedence you want:
(.//a)[2]
This really selects the second a descendant of the current node.
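A minimal sketch of the difference, assuming lxml is installed. The sample document and the variable names are made up for illustration; they are not from the question.

```python
from lxml import etree

# Hypothetical sample: three <p> parents, two of which have two <a> children.
doc = etree.fromstring(
    "<div>"
    "<p><a>one</a></p>"
    "<p><a>two</a><a>three</a></p>"
    "<p><a>four</a><a>five</a></p>"
    "</div>"
)

# [2] binds to the a step: "every a that is the second a child of its parent".
per_parent = [a.text for a in doc.xpath(".//a[2]")]

# Parentheses build the whole node-set first, then [2] picks its second member.
overall = [a.text for a in doc.xpath("(.//a)[2]")]

print(per_parent)  # ['three', 'five']
print(overall)     # ['two']
```

The first expression returns one node per qualifying parent, while the second returns at most one node for the whole document.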
For the actual expression used in the question, change it to:
(.//span[@class="python"]//a)[2]
or, to select the text node directly, change it to:
(.//span[@class="python"]//a)[2]/text()
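One thing to watch out for in lxml: find(), findall() and findtext() only implement the ElementPath subset, so as far as I know a parenthesized expression like the ones above needs the xpath() method instead. Here is a rough sketch under that assumption; the HTML snippet and the names item, nodes, second_text and texts are invented for the example.

```python
from lxml import etree

# Hypothetical markup: each <a> is the only <a> child of its parent <b>.
html = etree.HTML(
    "<div><span class='python'>"
    "<b><a>google</a></b>"
    "<b><a>chrome</a></b>"
    "</span></div>"
)
item = html.find(".//div")

# The original attempt: no <a> is the second <a> child of its parent here.
attempt = item.findtext('.//span[@class="python"]//a[2]')

# Full XPath through xpath(): select the element, then read its text.
nodes = item.xpath('(.//span[@class="python"]//a)[2]')
second_text = nodes[0].text if nodes else None

# Or select the text node directly; xpath() returns a list of strings.
texts = item.xpath('(.//span[@class="python"]//a)[2]/text()')

print(attempt)      # None
print(second_text)  # chrome
print(texts)        # ['chrome']
```

Note that xpath() always returns a list for node-set expressions, so check for an empty result before indexing into it.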
I'm not sure what the problem is...
>>> d = """<span class='python'>
... <a>google</a>
... <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>
From Comments:
or the simplification of the actual HTML I posted is too simple
You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:
self::node()
/descendant-or-self::node()
/child::span[attribute::class="python"]
/descendant-or-self::node()
/child::a[position()=2]
It will finally select the second a child (fn:position() refers to the child axis). So nothing will be selected if your document looks like this:
<span class='python'>
<span>
<span>
<img></img>
<a>google</a><!-- This is the first "a" child of its parent -->
</span>
<a>chrome</a><!-- This is also the first "a" child of its parent -->
</span>
</span>
If you want the second of all descendants, use:
descendant::span[@class="python"]/descendant::a[2]
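A quick check of this against the nested document above, sketched with lxml. The wrapping div, the second sample (two) and all variable names are my own additions for illustration.

```python
from lxml import etree

# The nested document from above, wrapped in a hypothetical <div> root.
root = etree.fromstring(
    "<div><span class='python'>"
    "<span><span><img/><a>google</a></span><a>chrome</a></span>"
    "</span></div>"
)

# Abbreviated form: position is counted among the a children of each parent.
abbrev = root.xpath('.//span[@class="python"]//a[2]')

# Explicit axis: position is counted among all a descendants of the span.
by_axis = [a.text for a in
           root.xpath('descendant::span[@class="python"]/descendant::a[2]')]

print(abbrev)   # []
print(by_axis)  # ['chrome']

# Caveat: descendant::a[2] is evaluated once per matching span, so two
# matching spans give two results; the parenthesized form gives one.
two = etree.fromstring(
    "<div>"
    "<span class='python'><a>a1</a><a>a2</a></span>"
    "<span class='python'><a>b1</a><a>b2</a></span>"
    "</div>"
)
per_span = [a.text for a in
            two.xpath('descendant::span[@class="python"]/descendant::a[2]')]
single = [a.text for a in
          two.xpath('(descendant::span[@class="python"]//a)[2]')]

print(per_span)  # ['a2', 'b2']
print(single)    # ['a2']
```

So the axis form is enough when only one span matches; if several can match and you want exactly one node, the parenthesized form is the safer choice.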