3v4l.org

<?php

function plaintext_from_HTML($HTML_string)
{
	$document_data = new DOMDocument();
	$document_data->loadHTML($HTML_string);
	return go_through_recursively($document_data->getElementsByTagName('body')[0]);
}

function go_through_recursively(DomNode $node)
{
	global $buffer;

	if (!isset($buffer))
		$buffer = '';

	$node_name = $node->nodeName;
	$parent_node_name = $node->parentNode->nodeName;
	$text_contents = $node->textContent;

	// This is some text content, meaning we are inside an element such as a <p> or <h1>.
	if ($node_name == '#text')
	{
		if ($parent_node_name == 'h1')
			$buffer .= 'H1: ' . $text_contents . "\n\n";

		if ($parent_node_name == 'h2')
			$buffer .= 'H2: ' . $text_contents . "\n\n";

		if ($parent_node_name == 'h3')
			$buffer .= 'H3: ' . $text_contents . "\n\n";

		if ($parent_node_name == 'p')
			$buffer .= $text_contents . "\n\n";

		if ($parent_node_name == 'strong')
			$buffer .= '**' . $text_contents . '**';

		if ($parent_node_name == 'em')
			$buffer .= '*' . $text_contents . '*';

		if ($parent_node_name == 'a')
			$buffer .= $text_contents . ' ( ' . 'this is supposed to be the URL, but I can\'t figure out how to grab the "href"...' . ' )';
	}
	else // It's an actual element.
	{
		if ($node_name == 'br')
			$buffer .= "\n";

		if ($node_name == 'hr')
			$buffer .= '---------------' . "\n" . "\n";
	}

	if ($node->childNodes)
	{
		foreach ($node->childNodes as $node)
			go_through_recursively($node);
	}

	return $buffer;
}

$HTML_string = '

	<h1>Test of h1</h1>
	<p>This is a p test.</p>

	<h2>Test of h2</h2>
	<p>This is a p test with a <strong>strong emphasis</strong> followed by this.</p>

	<h3>Test of h3</h3>
	<p>This here is a link: <a href="http://www.example.com/1">Example.com</a>.<br>
	And this is a linebreak.</p>

	<p>Another paragraph, followed by a horizontal line:</p>

	<hr>

	<p>A final paragraph with <a href="http://www.example.com/2"><em>some emphasis</em> inside a link</a>.</p>
	';


echo plaintext_from_HTML($HTML_string);

/*

DESIRED/EXPECTED OUTPUT:

H1: Test of h1

This is a p test.

H2: Test of h2

This is a p test with a **strong emphasis** followed by this.

H3: Test of h3

This here is a link: Example.com ( http://www.example.com/1 ).
And this is a linebreak.

Another paragraph, followed by a horizontal line:

---------------

A final paragraph with *some emphasis* inside a link ( http://www.example.com/2 ).

ACTUAL OUTPUT:

H1: Test of h1

This is a p test.

H2: Test of h2

This is a p test with a **strong emphasis** followed by this.

H3: Test of h3

This here is a link: Example.com ( this is supposed to be the URL, but I can't figure out how to grab the "href"... ).
And this is a linebreak.

Another paragraph, followed by a horizontal line:

---------------

A final paragraph with *some emphasis* inside a link ( this is supposed to be the URL, but I can't figure out how to grab the "href"... ).

*/
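The snippet above leaves a placeholder where the link URL should go. One possible way to fill it in is sketched below; it is not the original author's code. Within the existing structure, the smallest change would probably be to read the attribute from the text node's parent, i.e. `$node->parentNode->getAttribute('href')`, since `getAttribute()` is a method of `DOMElement`. The sketch goes a step further and appends the URL once per `<a>` element after its children have been rendered, so a link containing nested markup does not repeat the URL. The function names `html_to_plaintext` and `render_node` are hypothetical, and returning a string instead of writing to `global $buffer` is a design assumption made here so that repeated calls do not accumulate earlier output.

```php
<?php

// Hypothetical helper: renders one DOM node and its descendants as plain text.
function render_node(DOMNode $node): string
{
    // Text nodes: keep the text only when the parent is one of the tags the
    // original snippet handles, so whitespace directly under <body> is dropped.
    if ($node instanceof DOMText) {
        $kept_parents = ['h1', 'h2', 'h3', 'p', 'strong', 'em', 'a'];
        return in_array($node->parentNode->nodeName, $kept_parents, true)
            ? $node->nodeValue
            : '';
    }

    // Render the children first so that markers can be wrapped around them.
    $inner = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $inner .= render_node($child);
        }
    }

    switch ($node->nodeName) {
        case 'h1':
        case 'h2':
        case 'h3':
            return strtoupper($node->nodeName) . ': ' . $inner . "\n\n";
        case 'p':
            return $inner . "\n\n";
        case 'strong':
            return '**' . $inner . '**';
        case 'em':
            return '*' . $inner . '*';
        case 'a':
            // getAttribute() is defined on DOMElement, not on DOMNode.
            $href = $node instanceof DOMElement ? $node->getAttribute('href') : '';
            return $inner . ' ( ' . $href . ' )';
        case 'br':
            return "\n";
        case 'hr':
            return '---------------' . "\n\n";
        default:
            return $inner;
    }
}

// Hypothetical entry point mirroring plaintext_from_HTML() from the snippet.
function html_to_plaintext(string $html): string
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $document->getElementsByTagName('body')->item(0);
    return $body === null ? '' : render_node($body);
}
```

It could be called in the same way as the original entry point, for example `echo html_to_plaintext($HTML_string);`. Because each text node's content is kept verbatim, the newline and tab that follow `<br>` in the test input would still appear in the result.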
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename:       /in/KSEjo
function name:  (null)
number of ops:  6
compiled vars:  !0 = $HTML_string
line      #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   63     0  E >   ASSIGN                                                   !0, '%0A%0A%09%3Ch1%3ETest+of+h1%3C%2Fh1%3E%0A%09%3Cp%3EThis+is+a+p+test.%3C%2Fp%3E%0A%0A%09%3Ch2%3ETest+of+h2%3C%2Fh2%3E%0A%09%3Cp%3EThis+is+a+p+test+with+a+%3Cstrong%3Estrong+emphasis%3C%2Fstrong%3E+followed+by+this.%3C%2Fp%3E%0A%0A%09%3Ch3%3ETest+of+h3%3C%2Fh3%3E%0A%09%3Cp%3EThis+here+is+a+link%3A+%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%2F1%22%3EExample.com%3C%2Fa%3E.%3Cbr%3E%0A%09And+this+is+a+linebreak.%3C%2Fp%3E%0A%0A%09%3Cp%3EAnother+paragraph%2C+followed+by+a+horizontal+line%3A%3C%2Fp%3E%0A%0A%09%3Chr%3E%0A%0A%09%3Cp%3EA+final+paragraph+with+%3Ca+href%3D%22http%3A%2F%2Fwww.example.com%2F2%22%3E%3Cem%3Esome+emphasis%3C%2Fem%3E+inside+a+link%3C%2Fa%3E.%3C%2Fp%3E%0A%09'
   83     1        INIT_FCALL                                               'plaintext_from_html'
          2        SEND_VAR                                                 !0
          3        DO_FCALL                                      0  $2      
          4        ECHO                                                     $2
  187     5      > RETURN                                                   1

Function plaintext_from_html:
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename:       /in/KSEjo
function name:  plaintext_from_HTML
number of ops:  18
compiled vars:  !0 = $HTML_string, !1 = $document_data
line      #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
    3     0  E >   RECV                                             !0      
    5     1        NEW                                              $2      'DOMDocument'
          2        DO_FCALL                                      0          
          3        ASSIGN                                                   !1, $2
    6     4        INIT_METHOD_CALL                                         !1, 'loadHTML'
          5        SEND_VAR_EX                                              !0
          6        DO_FCALL                                      0          
    7     7        INIT_FCALL_BY_NAME                                       'go_through_recursively'
          8        CHECK_FUNC_ARG                                           
          9        INIT_METHOD_CALL                                         !1, 'getElementsByTagName'
         10        SEND_VAL_EX                                              'body'
         11        DO_FCALL                                      0  $6      
         12        SEPARATE                                         $6      $6
         13        FETCH_DIM_FUNC_ARG                               $7      $6, 0
         14        SEND_FUNC_ARG                                            $7
         15        DO_FCALL                                      0  $8      
         16      > RETURN                                                   $8
    8    17*     > RETURN                                                   null

End of function plaintext_from_html

Function go_through_recursively:
Finding entry points
Branch analysis from position: 0
2 jumps found. (Code = 43) Position 1 = 5, Position 2 = 6
Branch analysis from position: 5
2 jumps found. (Code = 43) Position 1 = 15, Position 2 = 51
Branch analysis from position: 15
2 jumps found. (Code = 43) Position 1 = 17, Position 2 = 20
Branch analysis from position: 17
2 jumps found. (Code = 43) Position 1 = 22, Position 2 = 25
Branch analysis from position: 22
2 jumps found. (Code = 43) Position 1 = 27, Position 2 = 30
Branch analysis from position: 27
2 jumps found. (Code = 43) Position 1 = 32, Position 2 = 34
Branch analysis from position: 32
2 jumps found. (Code = 43) Position 1 = 36, Position 2 = 39
Branch analysis from position: 36
2 jumps found. (Code = 43) Position 1 = 41, Position 2 = 44
Branch analysis from position: 41
2 jumps found. (Code = 43) Position 1 = 46, Position 2 = 50
Branch analysis from position: 46
1 jumps found. (Code = 42) Position 1 = 57
Branch analysis from position: 57
2 jumps found. (Code = 43) Position 1 = 59, Position 2 = 67
Branch analysis from position: 59
2 jumps found. (Code = 77) Position 1 = 61, Position 2 = 66
Branch analysis from position: 61
2 jumps found. (Code = 78) Position 1 = 62, Position 2 = 66
Branch analysis from position: 62
1 jumps found. (Code = 42) Position 1 = 61
Branch analysis from position: 61
Branch analysis from position: 66
1 jumps found. (Code = 62) Position 1 = -2
Branch analysis from position: 66
Branch analysis from position: 67
Branch analysis from position: 50
Branch analysis from position: 44
Branch analysis from position: 39
Branch analysis from position: 34
Branch analysis from position: 30
Branch analysis from position: 25
Branch analysis from position: 20
Branch analysis from position: 51
2 jumps found. (Code = 43) Position 1 = 53, Position 2 = 54
Branch analysis from position: 53
2 jumps found. (Code = 43) Position 1 = 56, Position 2 = 57
Branch analysis from position: 56
2 jumps found. (Code = 43) Position 1 = 59, Position 2 = 67
Branch analysis from position: 59
Branch analysis from position: 67
Branch analysis from position: 57
Branch analysis from position: 54
Branch analysis from position: 6
filename:       /in/KSEjo
function name:  go_through_recursively
number of ops:  69
compiled vars:  !0 = $node, !1 = $buffer, !2 = $node_name, !3 = $parent_node_name, !4 = $text_contents
line      #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   10     0  E >   RECV                                             !0      
   12     1        BIND_GLOBAL                                              !1, 'buffer'
   14     2        ISSET_ISEMPTY_CV                                 ~5      !1
          3        BOOL_NOT                                         ~6      ~5
          4      > JMPZ                                                     ~6, ->6
   15     5    >   ASSIGN                                                   !1, ''
   17     6    >   FETCH_OBJ_R                                      ~8      !0, 'nodeName'
          7        ASSIGN                                                   !2, ~8
   18     8        FETCH_OBJ_R                                      ~10     !0, 'parentNode'
          9        FETCH_OBJ_R                                      ~11     ~10, 'nodeName'
         10        ASSIGN                                                   !3, ~11
   19    11        FETCH_OBJ_R                                      ~13     !0, 'textContent'
         12        ASSIGN                                                   !4, ~13
   22    13        IS_EQUAL                                                 !2, '%23text'
         14      > JMPZ                                                     ~15, ->51
   24    15    >   IS_EQUAL                                                 !3, 'h1'
         16      > JMPZ                                                     ~16, ->20
   25    17    >   CONCAT                                           ~17     'H1%3A+', !4
         18        CONCAT                                           ~18     ~17, '%0A%0A'
         19        ASSIGN_OP                                     8          !1, ~18
   27    20    >   IS_EQUAL                                                 !3, 'h2'
         21      > JMPZ                                                     ~20, ->25
   28    22    >   CONCAT                                           ~21     'H2%3A+', !4
         23        CONCAT                                           ~22     ~21, '%0A%0A'
         24        ASSIGN_OP                                     8          !1, ~22
   30    25    >   IS_EQUAL                                                 !3, 'h3'
         26      > JMPZ                                                     ~24, ->30
   31    27    >   CONCAT                                           ~25     'H3%3A+', !4
         28        CONCAT                                           ~26     ~25, '%0A%0A'
         29        ASSIGN_OP                                     8          !1, ~26
   33    30    >   IS_EQUAL                                                 !3, 'p'
         31      > JMPZ                                                     ~28, ->34
   34    32    >   CONCAT                                           ~29     !4, '%0A%0A'
         33        ASSIGN_OP                                     8          !1, ~29
   36    34    >   IS_EQUAL                                                 !3, 'strong'
         35      > JMPZ                                                     ~31, ->39
   37    36    >   CONCAT                                           ~32     '%2A%2A', !4
         37        CONCAT                                           ~33     ~32, '%2A%2A'
         38        ASSIGN_OP                                     8          !1, ~33
   39    39    >   IS_EQUAL                                                 !3, 'em'
         40      > JMPZ                                                     ~35, ->44
   40    41    >   CONCAT                                           ~36     '%2A', !4
         42        CONCAT                                           ~37     ~36, '%2A'
         43        ASSIGN_OP                                     8          !1, ~37
   42    44    >   IS_EQUAL                                                 !3, 'a'
         45      > JMPZ                                                     ~39, ->50
   43    46    >   CONCAT                                           ~40     !4, '+%28+'
         47        CONCAT                                           ~41     ~40, 'this+is+supposed+to+be+the+URL%2C+but+I+can%27t+figure+out+how+to+grab+the+%22href%22...'
         48        CONCAT                                           ~42     ~41, '+%29'
         49        ASSIGN_OP                                     8          !1, ~42
         50    > > JMP                                                      ->57
   47    51    >   IS_EQUAL                                                 !2, 'br'
         52      > JMPZ                                                     ~44, ->54
   48    53    >   ASSIGN_OP                                     8          !1, '%0A'
   50    54    >   IS_EQUAL                                                 !2, 'hr'
         55      > JMPZ                                                     ~46, ->57
   51    56    >   ASSIGN_OP                                     8          !1, '---------------%0A%0A'
   54    57    >   FETCH_OBJ_R                                      ~48     !0, 'childNodes'
         58      > JMPZ                                                     ~48, ->67
   56    59    >   FETCH_OBJ_R                                      ~49     !0, 'childNodes'
         60      > FE_RESET_R                                       $50     ~49, ->66
         61    > > FE_FETCH_R                                               $50, !0, ->66
   57    62    >   INIT_FCALL_BY_NAME                                       'go_through_recursively'
         63        SEND_VAR_EX                                              !0
         64        DO_FCALL                                      0          
   56    65      > JMP                                                      ->61
         66    >   FE_FREE                                                  $50
   60    67    > > RETURN                                                   !1
   61    68*     > RETURN                                                   null

End of function go_through_recursively

Generated using Vulcan Logic Dumper, using php 8.0.0