<?php

// Capture groups: 1 = optional quote character, 2 = href value,
// 3 = remaining attributes after href, 4 = link text.
// Flags: s = dot matches newlines, i = case-insensitive,
// U = quantifier greediness inverted.
$hrefPattern = '/<a\\s+[^>]*href=(["\']??)([^"\'>]*?)\\1([^>]*)>(.*)<\/a>/siU';

$html = <<<HTML
<p>If you find any cases where this code falls down, let us know using the Feedback link below.</p>
<p>Before using this or similar scripts to fetch pages from other websites, we suggest you read through the related article on <a href="/php/parse-robots/">setting a user agent and parsing robots.txt</a>.</p>
<h2>First checking robots.txt</h2>
<p>As mentioned above, before using a script to download files you should always <a href="/php/parse-robots/">check the robots.txt file</a>. Here we're making use of the <tt>robots_allowed</tt> function from the article linked above to determine whether we're allowed to access files:</p>
HTML;

// PREG_SET_ORDER groups the results per match: $matches[n] holds one link.
preg_match_all($hrefPattern, $html, $matches, PREG_SET_ORDER);
var_dump($matches);
Output for 4.3.0 - 5.6.21, hhvm-3.10.0 - 3.12.0, 7.0.0 - 7.1.0RC4
array(2) {
  [0]=>
  array(5) {
    [0]=>
    string(76) "<a href="/php/parse-robots/">setting a user agent and parsing robots.txt</a>"
    [1]=>
    string(1) """
    [2]=>
    string(18) "/php/parse-robots/"
    [3]=>
    string(0) ""
    [4]=>
    string(43) "setting a user agent and parsing robots.txt"
  }
  [1]=>
  array(5) {
    [0]=>
    string(59) "<a href="/php/parse-robots/">check the robots.txt file</a>"
    [1]=>
    string(1) """
    [2]=>
    string(18) "/php/parse-robots/"
    [3]=>
    string(0) ""
    [4]=>
    string(26) "check the robots.txt file"
  }
}
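The dump above shows that, with PREG_SET_ORDER, index 2 of each match holds the href value and index 4 the link text. A minimal sketch of reducing the matches to href/text pairs could look like the following. The sample `$html` string and the `$links` array are hypothetical and not part of the original script, and `strip_tags` is used here only as one possible way to drop markup nested inside the link text.

```php
<?php

// Same pattern as in the script above.
$hrefPattern = '/<a\\s+[^>]*href=(["\']??)([^"\'>]*?)\\1([^>]*)>(.*)<\/a>/siU';

// Hypothetical sample input (an assumption for illustration only).
$html = '<p>See <a href="/php/parse-robots/">parsing robots.txt</a> and '
      . '<a class="ext" href=\'/about/\'>the <b>about</b> page</a>.</p>';

preg_match_all($hrefPattern, $html, $matches, PREG_SET_ORDER);

// Keep only the href value (group 2) and the plain link text (group 4).
$links = array();
foreach ($matches as $m) {
    $links[] = array(
        'href' => $m[2],
        'text' => trim(strip_tags($m[4])),
    );
}

print_r($links);
```

For this sample input, `$links` should contain two entries: `/php/parse-robots/` with the text `parsing robots.txt`, and `/about/` with the text `the about page`. The second link exercises the single-quoted href and the extra attribute before it.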