3v4l.org

run code in 300+ PHP versions simultaneously
<?php

$hrefPattern = '/<a\\s+[^>]*href=(["\']??)([^"\'>]*?)\\1([^>]*)>(.*)<\/a>/siU';

$html = <<<HTML
<p>If you find any cases where this code falls down, let us know using
the Feedback link below.</p>

<p>Before using this or similar scripts to fetch pages from other
websites, we suggest you read through the related article on <a href="/php/parse-robots/">setting a user agent and parsing robots.txt</a>.</p>


<h2>First checking robots.txt</h2>

<p>As mentioned above, before using a script to download files you
should always <a href="/php/parse-robots/">check the robots.txt
file</a>.  Here we're making use of the <tt>robots_allowed</tt> function
from the article linked above to determine whether we're allowed to
access files:</p>

<code class="final">&lt;?PHP
  <i>// Original PHP code by Chirp Internet: www.chirp.com.au
  // Please acknowledge use of this code by including this header.</i>

<span>  ini_set('user_agent', '<i>NameOfAgent (http://www.example.net)</i>');</span>

  $url = &quot;http://www.example.net/somepage.html&quot;;
<span>  if(robots_allowed($url, &quot;<i>NameOfAgent</i>&quot;)) {</span>
    $input = @file_get_contents($url) or die(&quot;Could not access file: $url&quot;);
    $regexp = &quot;<tt>&lt;a\s[^&gt;]*href=(\&quot;??)([^\&quot; &gt;]*?)\\1[^&gt;]*&gt;(.*)&lt;\/a&gt;</tt>&quot;;
    if(preg_match_all(&quot;/$regexp/siU&quot;, $input, $matches, PREG_SET_ORDER)) {
      foreach($matches as $match) {
        <i>// $match[2] = link address
        // $match[3] = link text</i>
      }
    }
<span>  } else {
    die('Access denied by robots.txt');
  }</span>
?&gt;</code>
HTML;

preg_match_all($hrefPattern, $html, $matches);

var_dump($matches);
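To make the role of each capture group in $hrefPattern easier to see, here is a minimal sketch that runs the same pattern against a short input. The $sample string is hypothetical and not part of the original paste, and PREG_SET_ORDER is used here only so that each link's groups stay together.

```php
<?php
// Same pattern as in the script above.
$hrefPattern = '/<a\\s+[^>]*href=(["\']??)([^"\'>]*?)\\1([^>]*)>(.*)<\/a>/siU';

// Hypothetical sample input (assumption, not taken from the original page).
$sample = '<p>See <a href="/php/parse-robots/">robots.txt</a> and '
        . '<a class="x" href=\'/about\' id="y">About us</a>.</p>';

// PREG_SET_ORDER returns one array per match instead of one array per group.
preg_match_all($hrefPattern, $sample, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // $match[1] = quote character around the href value (may be empty)
    // $match[2] = href value
    // $match[3] = any attributes that follow href
    // $match[4] = link text
    printf("%s => %s\n", $match[2], $match[4]);
}
```

With this sample the loop should print the two href values next to their link text. Note that the U modifier inverts greediness, so `??` and `*?` are the greedy quantifiers in this pattern and the plain `*` ones are lazy.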
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename:       /in/c3qaN
function name:  (null)
number of ops:  40
compiled vars:  !0 = $hrefPattern, !1 = $html, !2 = $url, !3 = $input, !4 = $regexp, !5 = $matches, !6 = $match
line      #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
    3     0  E >   ASSIGN                                                   !0, '%2F%3Ca%5Cs%2B%5B%5E%3E%5D%2Ahref%3D%28%5B%22%27%5D%3F%3F%29%28%5B%5E%22%27%3E%5D%2A%3F%29%5C1%28%5B%5E%3E%5D%2A%29%3E%28.%2A%29%3C%5C%2Fa%3E%2FsiU'
    6     1        ROPE_INIT                                    27  ~11     '%3Cp%3EIf+you+find+any+cases+where+this+code+falls+down%2C+let+us+know+using%0Athe+Feedback+link+below.%3C%2Fp%3E%0A%0A%3Cp%3EBefore+using+this+or+similar+scripts+to+fetch+pages+from+other%0Awebsites%2C+we+suggest+you+read+through+the+related+article+on+%3Ca+href%3D%22%2Fphp%2Fparse-robots%2F%22%3Esetting+a+user+agent+and+parsing+robots.txt%3C%2Fa%3E.%3C%2Fp%3E%0A%0A%0A%3Ch2%3EFirst+checking+robots.txt%3C%2Fh2%3E%0A%0A%3Cp%3EAs+mentioned+above%2C+before+using+a+script+to+download+files+you+%0Ashould+always+%3Ca+href%3D%22%2Fphp%2Fparse-robots%2F%22%3Echeck+the+robots.txt+%0Afile%3C%2Fa%3E.++Here+we%27re+making+use+of+the+%3Ctt%3Erobots_allowed%3C%2Ftt%3E+function+%0Afrom+the+article+linked+above+to+determine+whether+we%27re+allowed+to+%0Aaccess+files%3A%3C%2Fp%3E%0A%0A%3Ccode+class%3D%22final%22%3E%26lt%3B%3FPHP%0A++%3Ci%3E%2F%2F+Original+PHP+code+by+Chirp+Internet%3A+www.chirp.com.au%0A++%2F%2F+Please+acknowledge+use+of+this+code+by+including+this+header.%3C%2Fi%3E%0A%0A%3Cspan%3E++ini_set%28%27user_agent%27%2C+%27%3Ci%3ENameOfAgent+%28http%3A%2F%2Fwww.example.net%29%3C%2Fi%3E%27%29%3B%3C%2Fspan%3E%0A%0A++'
   27     2        ROPE_ADD                                      1  ~11     ~11, !2
          3        ROPE_ADD                                      2  ~11     ~11, '+%3D+%26quot%3Bhttp%3A%2F%2Fwww.example.net%2Fsomepage.html%26quot%3B%3B%0A%3Cspan%3E++if%28robots_allowed%28'
   28     4        ROPE_ADD                                      3  ~11     ~11, !2
          5        ROPE_ADD                                      4  ~11     ~11, '%2C+%26quot%3B%3Ci%3ENameOfAgent%3C%2Fi%3E%26quot%3B%29%29+%7B%3C%2Fspan%3E%0A++++'
   29     6        ROPE_ADD                                      5  ~11     ~11, !3
          7        ROPE_ADD                                      6  ~11     ~11, '+%3D+%40file_get_contents%28'
          8        ROPE_ADD                                      7  ~11     ~11, !2
          9        ROPE_ADD                                      8  ~11     ~11, '%29+or+die%28%26quot%3BCould+not+access+file%3A+'
         10        ROPE_ADD                                      9  ~11     ~11, !2
         11        ROPE_ADD                                     10  ~11     ~11, '%26quot%3B%29%3B%0A++++'
   30    12        ROPE_ADD                                     11  ~11     ~11, !4
         13        ROPE_ADD                                     12  ~11     ~11, '+%3D+%26quot%3B%3Ctt%3E%26lt%3Ba%5Cs%5B%5E%26gt%3B%5D%2Ahref%3D%28%5C%26quot%3B%3F%3F%29%28%5B%5E%5C%26quot%3B+%26gt%3B%5D%2A%3F%29%5C1%5B%5E%26gt%3B%5D%2A%26gt%3B%28.%2A%29%26lt%3B%5C%2Fa%26gt%3B%3C%2Ftt%3E%26quot%3B%3B%0A++++if%28preg_match_all%28%26quot%3B%2F'
   31    14        ROPE_ADD                                     13  ~11     ~11, !4
         15        ROPE_ADD                                     14  ~11     ~11, '%2FsiU%26quot%3B%2C+'
         16        ROPE_ADD                                     15  ~11     ~11, !3
         17        ROPE_ADD                                     16  ~11     ~11, '%2C+'
         18        ROPE_ADD                                     17  ~11     ~11, !5
         19        ROPE_ADD                                     18  ~11     ~11, '%2C+PREG_SET_ORDER%29%29+%7B%0A++++++foreach%28'
   32    20        ROPE_ADD                                     19  ~11     ~11, !5
         21        ROPE_ADD                                     20  ~11     ~11, '+as+'
         22        ROPE_ADD                                     21  ~11     ~11, !6
         23        ROPE_ADD                                     22  ~11     ~11, '%29+%7B%0A++++++++%3Ci%3E%2F%2F+'
   33    24        FETCH_DIM_R                                      ~8      !6, 2
         25        ROPE_ADD                                     23  ~11     ~11, ~8
         26        ROPE_ADD                                     24  ~11     ~11, '+%3D+link+address%0A++++++++%2F%2F+'
   34    27        FETCH_DIM_R                                      ~9      !6, 3
         28        ROPE_ADD                                     25  ~11     ~11, ~9
         29        ROPE_END                                     26  ~10     ~11, '+%3D+link+text%3C%2Fi%3E%0A++++++%7D%0A++++%7D%0A%3Cspan%3E++%7D+else+%7B%0A++++die%28%27Access+denied+by+robots.txt%27%29%3B%0A++%7D%3C%2Fspan%3E%0A%3F%26gt%3B%3C%2Fcode%3E'
    5    30        ASSIGN                                                   !1, ~10
   43    31        INIT_FCALL                                               'preg_match_all'
         32        SEND_VAR                                                 !0
         33        SEND_VAR                                                 !1
         34        SEND_REF                                                 !5
         35        DO_ICALL                                                 
   45    36        INIT_FCALL                                               'var_dump'
         37        SEND_VAR                                                 !5
         38        DO_ICALL                                                 
         39      > RETURN                                                   1
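The string operands in the dump above appear to be URL-encoded (for example `%2F` for `/` and `+` for a space). As a small sketch, assuming that encoding, PHP's urldecode() can turn an operand back into readable text. The operand below is copied from op #0.

```php
<?php
// Operand of op #0 (the ASSIGN to !0), copied verbatim from the dump.
$operand = '%2F%3Ca%5Cs%2B%5B%5E%3E%5D%2Ahref%3D%28%5B%22%27%5D%3F%3F%29'
         . '%28%5B%5E%22%27%3E%5D%2A%3F%29%5C1%28%5B%5E%3E%5D%2A%29%3E'
         . '%28.%2A%29%3C%5C%2Fa%3E%2FsiU';

// urldecode() decodes %XX sequences and also maps "+" back to a space.
$decoded = urldecode($operand);

echo $decoded, "\n";
```

The decoded text should be the regular expression held in $hrefPattern at run time, that is, after PHP has processed the escapes in the single-quoted literal.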

Generated using Vulcan Logic Dumper, using php 8.0.0
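The ROPE_INIT and ROPE_ADD ops, together with compiled variables such as $url and $input, suggest that the heredoc interpolates the `$...` names inside the embedded HTML instead of keeping them as literal text. If literal text is what was intended (an assumption about the author's intent), a nowdoc avoids the interpolation. The following sketch contrasts the two forms on a shortened, hypothetical snippet.

```php
<?php
// Defined here only so the effect of interpolation is visible;
// in the original script $url is never assigned.
$url = 'http://www.example.net/somepage.html';

// Heredoc: "$url" inside the text is parsed as a variable and replaced.
$heredoc = <<<HTML
<code>$url = &quot;x&quot;;</code>
HTML;

// Nowdoc (identifier in single quotes): the text is kept exactly as written.
$nowdoc = <<<'HTML'
<code>$url = &quot;x&quot;;</code>
HTML;

echo $heredoc, "\n";
echo $nowdoc, "\n";
```

Only the nowdoc line keeps the literal `$url`. Switching the original script to a nowdoc would also change its opcode dump, since the string would then compile to a single constant.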

