[ACCEPTED]-Get text between HTML tags-preg-match

Accepted answer
Score: 24

Don't parse HTML via preg_match; use this PHP class instead:

The DOMDocument class

Example:

<?php

$html = "<p>hi</p>
<h1>H1 title</h1>
<h2>H2 title</h2>
<h3>H3 title</h3>";

// create a new DOM object
$dom = new DOMDocument('1.0', 'utf-8');
// discard redundant whitespace; set this before loading so it can take effect
$dom->preserveWhiteSpace = false;
// load the HTML into the object
$dom->loadHTML($html);
// use your desired tag name here
$hTwo = $dom->getElementsByTagName('h2');
// prints "H2 title"
echo $hTwo->item(0)->nodeValue;
?>
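
If you need the text of every matching element rather than only the first one, the same approach can be wrapped in a small helper. This is just a sketch: the function name textsByTag is made up for illustration, and the libxml error handling is an assumption about wanting to silence warnings on imperfect markup.

```php
<?php
// Hypothetical helper (not part of the answer above):
// returns the trimmed text of every <$tag> element in $html.
function textsByTag(string $html, string $tag): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is often not perfectly well-formed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    // getElementsByTagName returns a DOMNodeList, which can be iterated.
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = "<p>hi</p><h2>First</h2><h2>Second</h2>";
print_r(textsByTag($html, 'h2'));
```

When the tag is not present, the helper returns an empty array, so the caller can check count() before using index 0.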

Reference

Score: 12

Using regular expressions is generally a good idea for your problem.

When you look at http://php.net/preg_match you see that $matches will be an array, since it holds the full match plus one entry per parenthesized group. Try

print_r($matches);

to get an idea of how the result looks, and then pick the right index.

EDIT:

If there is a match, you can get the text captured by the first parenthesized group with

print($matches[1]);

If you had more than one parenthesized group, they would be numbered 2, 3, etc. You should also consider the case when there is no match, in which case the array will have a size of 0.
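
Putting the points above together, a minimal sketch could look like this. The helper name firstGroup, the pattern, and the sample strings are all assumptions for illustration, not taken from the question.

```php
<?php
// Hypothetical helper: returns the text captured by the first
// parenthesized group, or null when there is no match.
function firstGroup(string $pattern, string $subject): ?string
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $subject, $matches) === 1) {
        // $matches[0] is the whole match, $matches[1] the first group.
        // The ?? guards against patterns that contain no group at all.
        return $matches[1] ?? null;
    }
    // No match: $matches is empty, so there is no index 1 to read.
    return null;
}

// Assumed pattern: lazily capture everything between <h2 ...> and </h2>.
$pattern = '/<h2[^>]*>(.*?)<\/h2>/is';

var_dump(firstGroup($pattern, '<h2 class="x">H2 title</h2>'));
var_dump(firstGroup($pattern, '<p>no heading here</p>'));
```

Checking the return value of preg_match before touching $matches[1] avoids an undefined-index warning in the no-match case.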

Score: 2

You could do it this way:

$h1 = preg_replace('/<h1[^>]*?>([\\s\\S]*?)<\/h1>/',
'\\1', $h1);

This will strip off the <h1></h1> tags, unwrapping the text they contain.
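
To see what the replacement does to a concrete string, here is a small sketch. The function name unwrapH1 and the sample inputs are made up for illustration, and the i flag is an addition (an assumption that upper-case <H1> tags should be handled too).

```php
<?php
// Hypothetical wrapper around the preg_replace call above.
// The "i" flag is an added assumption so that <H1> matches as well.
function unwrapH1(string $html): string
{
    // '$1' refers to the first captured group, i.e. the heading's content.
    return preg_replace('/<h1[^>]*?>([\s\S]*?)<\/h1>/i', '$1', $html);
}

echo unwrapH1('<h1 class="title">Hello world</h1>'), "\n";
echo unwrapH1('<p>a</p><H1>T</H1>'), "\n";
```

Note that preg_replace only rewrites the part of the string that matched, so any markup outside the <h1> element is left in place; the second call keeps the <p>a</p> part and only unwraps the heading.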
