3v4l.org

<?php

// Pattern for extracting <a href="..."> links: group 1 captures the
// (optional) quote character, group 2 the link address, group 3 any
// remaining attributes and group 4 the link text.
$hrefPattern = '/<a\\s+[^>]*href=(["\']??)([^"\'>]*?)\\1([^>]*)>(.*)<\/a>/siU';

// Sample markup to test against. A nowdoc (quoted marker) is used so the
// $variables appearing in the embedded article code are not interpolated
// into the test string.
$html = <<<'HTML'
<p>If you find any cases where this code falls down, let us know using the Feedback link below.</p>

<p>Before using this or similar scripts to fetch pages from other websites, we suggest you read through the related article on <a href="/php/parse-robots/">setting a user agent and parsing robots.txt</a>.</p>

<h2>First checking robots.txt</h2>

<p>As mentioned above, before using a script to download files you should always <a href="/php/parse-robots/">check the robots.txt file</a>. Here we're making use of the <tt>robots_allowed</tt> function from the article linked above to determine whether we're allowed to access files:</p>

<code class="final">&lt;?PHP
<i>// Original PHP code by Chirp Internet: www.chirp.com.au
// Please acknowledge use of this code by including this header.</i>

<span>  ini_set('user_agent', '<i>NameOfAgent (http://www.example.net)</i>');</span>

  $url = &quot;http://www.example.net/somepage.html&quot;;

<span>  if(robots_allowed($url, &quot;<i>NameOfAgent</i>&quot;)) {</span>

    $input = @file_get_contents($url) or die(&quot;Could not access file: $url&quot;);

    $regexp = &quot;<tt>&lt;a\s[^&gt;]*href=(\&quot;??)([^\&quot; &gt;]*?)\\1[^&gt;]*&gt;(.*)&lt;\/a&gt;</tt>&quot;;

    if(preg_match_all(&quot;/$regexp/siU&quot;, $input, $matches, PREG_SET_ORDER)) {
      foreach($matches as $match) {
        <i>// $match[2] = link address
        // $match[3] = link text</i>
      }
    }

<span>  } else {
    die('Access denied by robots.txt');
  }</span>
?&gt;</code>
HTML;

preg_match_all($hrefPattern, $html, $matches);
var_dump($matches);
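
A minimal sketch of actually consuming the matches (not part of the original paste): with PREG_SET_ORDER each entry holds the captures for one link, so $match[2] is the link address and $match[4] the link text. For the sample markup above this should print the two /php/parse-robots/ links.

<?php
// Assumes $hrefPattern and $html from the snippet above.
if (preg_match_all($hrefPattern, $html, $links, PREG_SET_ORDER)) {
    foreach ($links as $match) {
        // e.g. "/php/parse-robots/ => setting a user agent and parsing robots.txt"
        printf("%s => %s\n", $match[2], strip_tags($match[4]));
    }
}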
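
The embedded article relies on a robots_allowed() helper defined in the linked /php/parse-robots/ article, which isn't reproduced in this paste. Purely as an illustration of the idea, here is a rough sketch of such a check, assuming a simple User-agent/Disallow parse; the real function from the article may differ.

<?php
// Hypothetical sketch only; not the original robots_allowed() implementation.
function robots_allowed(string $url, string $useragent): bool
{
    $parts = parse_url($url);
    $robotsUrl = $parts['scheme'] . '://' . $parts['host'] . '/robots.txt';

    // No reachable robots.txt: assume access is allowed.
    $lines = @file($robotsUrl, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return true;
    }

    // Collect Disallow rules from sections that apply to this agent.
    $applies = false;
    $disallowed = [];
    foreach ($lines as $line) {
        if (preg_match('/^\s*User-agent:\s*(\S+)/i', $line, $m)) {
            $applies = ($m[1] === '*' || stripos($useragent, $m[1]) !== false);
        } elseif ($applies && preg_match('/^\s*Disallow:\s*(\S*)/i', $line, $m)) {
            $disallowed[] = $m[1];
        }
    }

    // Blocked if the request path falls under any collected rule.
    $path = $parts['path'] ?? '/';
    foreach ($disallowed as $rule) {
        if ($rule !== '' && strpos($path, $rule) === 0) {
            return false;
        }
    }
    return true;
}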
