3v4l.org

run code in 300+ PHP versions simultaneously
<?php

// Clown Emoji
// "🤡".length = 2 in JavaScript (Firefox)
// '🤡abc'.length = 5 in JavaScript (Firefox)

// . = one byte, () = the bytes of one code point (a surrogate pair in UTF-16)
$x = '🤡';    // 4 bytes (UTF-8 (....))
$y = '🤡abc'; // 4 + 3 = 7 bytes (UTF-8 (....) . . .)

// In PHP, strings are simply raw byte streams. Right now $x and $y are stored as UTF-8 because
// I copy-pasted them from my browser.

echo "--- These are UTF-8 ---"."\n";
echo "\$x Bytes: ".strlen($x)."\n";
echo "\$x Unicode Codepoint Count (\"characters\"): ".mb_strlen($x, "UTF-8")."\n";
echo "\$x Hex Representation: ".bin2hex($x)."\n";
echo "\$y Bytes: ".strlen($y)."\n";
echo "\$y Unicode Codepoint Count (\"characters\"): ".mb_strlen($y, "UTF-8")."\n";
echo "\$y Hex Representation: ".bin2hex($y)."\n";
echo "--- End ---"."\n";

// Now, let's convert them to UTF-16, where each code unit is 2 bytes: a code point inside the BMP takes
// one code unit (2 bytes) and a code point outside the BMP takes a surrogate pair (4 bytes).
$x1 = mb_convert_encoding($x, "UTF-16", "UTF-8"); // Still 4 bytes! (UTF-16 (.. ..))
$y1 = mb_convert_encoding($y, "UTF-16", "UTF-8"); // 4 + 6 = 10 bytes (UTF-16 (.. ..) .. .. ..)


echo "--- These are UTF-16 ---"."\n";
echo "\$x1 Bytes: ".strlen($x1)."\n";
echo "\$x1 Unicode Codepoint Count (\"characters\"): ".mb_strlen($x1, "UTF-16")."\n";
echo "\$x1 Hex Representation: ".bin2hex($x1)."\n";
echo "\$y1 Bytes: ".strlen($y1)."\n";
echo "\$y1 Unicode Codepoint Count (\"characters\"): ".mb_strlen($y1, "UTF-16")."\n";
echo "\$y1 Hex Representation: ".bin2hex($y1)."\n";
echo "--- End ---"."\n";

// Now, JavaScript's String is sort of like PHP's raw string byte stream, except:
// >>>>>>
// JavaScript treats code units as individual characters, while humans generally think in terms of Unicode characters.
// This has some unfortunate consequences for Unicode characters outside the BMP. Since surrogate pairs consist of
// two code units, '𝌆'.length == 2, even though there’s only one Unicode character there. The individual surrogate
// halves are being exposed as if they were characters: '𝌆' == '\uD834\uDF06'.
// <<<<<< https://mathiasbynens.be/notes/javascript-encoding

// What this basically means is that while counting Unicode code points would count a surrogate pair (.. ..) as
// length 1, JavaScript counts its two code units separately, as .. .. = length 2.

// So, our strings $x1 and $y1 are counted in JavaScript as:
// $x1 | .. .. = 2
// $y1 | .. .. .. .. .. = 5

// Now it looks obvious that, to emulate JavaScript's behaviour, we simply need to count the number of bytes
// in the UTF-16 encoding and divide that by 2.
// Every UTF-16 code unit is exactly 2 bytes, so bytes / 2 is the number of code units.


echo "--- These are UTF-16 ---"."\n";
echo "\$x1 Javascript Emulated strlen/2: ".(strlen($x1)/2)."\n";
echo "\$y1 Javascript Emulated strlen/2: ".(strlen($y1)/2)."\n";
echo "--- End ---"."\n";

// And we can see that JavaScript's length behaviour is emulated.
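The strlen/2 idea in the script can be wrapped in a small reusable helper. The following is a minimal sketch, not part of the original paste: the function names js_length() and js_length_by_code_points() are hypothetical, the input is assumed to be valid UTF-8, and the mbstring extension is assumed to be loaded. The second function is only a cross-check: it counts code points and adds one for every code point above U+FFFF, since each of those needs a second UTF-16 code unit.

```php
<?php
// Hypothetical helper (assumption, not from the original script):
// returns the number of UTF-16 code units, which is what JavaScript's
// String.prototype.length reports. Assumes $utf8 is valid UTF-8.
function js_length(string $utf8): int
{
    // Name the byte order explicitly so the result does not depend on defaults.
    $utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

    // Every UTF-16 code unit is exactly 2 bytes.
    return intdiv(strlen($utf16), 2);
}

// Hypothetical cross-check: code points, plus one extra unit for each
// code point outside the BMP (those are stored as surrogate pairs).
function js_length_by_code_points(string $utf8): int
{
    $codePoints = mb_strlen($utf8, 'UTF-8');
    $outsideBmp = preg_match_all('/[\x{10000}-\x{10FFFF}]/u', $utf8);

    return $codePoints + (int) $outsideBmp;
}

echo js_length('🤡'), "\n";                    // 2
echo js_length('🤡abc'), "\n";                 // 5
echo js_length_by_code_points('🤡abc'), "\n";  // 5
```

Using intdiv() instead of / keeps the result an integer in every case, and both functions should agree on any valid UTF-8 input.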
Finding entry points
Branch analysis from position: 0
1 jumps found. (Code = 62) Position 1 = -2
filename:       /in/7Xo9j
function name:  (null)
number of ops:  99
compiled vars:  !0 = $x, !1 = $y, !2 = $x1, !3 = $y1
line      #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
    8     0  E >   ASSIGN                                                   !0, '%F0%9F%A4%A1'
    9     1        ASSIGN                                                   !1, '%F0%9F%A4%A1abc'
   14     2        ECHO                                                     '---+These+are+UTF-8+---%0A'
   15     3        STRLEN                                           ~6      !0
          4        CONCAT                                           ~7      '%24x+Bytes%3A+', ~6
          5        CONCAT                                           ~8      ~7, '%0A'
          6        ECHO                                                     ~8
   16     7        INIT_FCALL                                               'mb_strlen'
          8        SEND_VAR                                                 !0
          9        SEND_VAL                                                 'UTF-8'
         10        DO_ICALL                                         $9      
         11        CONCAT                                           ~10     '%24x+Unicode+Codepoint+Count+%28%22characters%22%29%3A+', $9
         12        CONCAT                                           ~11     ~10, '%0A'
         13        ECHO                                                     ~11
   17    14        INIT_FCALL                                               'bin2hex'
         15        SEND_VAR                                                 !0
         16        DO_ICALL                                         $12     
         17        CONCAT                                           ~13     '%24x+Hex+Representation%3A+', $12
         18        CONCAT                                           ~14     ~13, '%0A'
         19        ECHO                                                     ~14
   18    20        STRLEN                                           ~15     !1
         21        CONCAT                                           ~16     '%24y+Bytes%3A+', ~15
         22        CONCAT                                           ~17     ~16, '%0A'
         23        ECHO                                                     ~17
   19    24        INIT_FCALL                                               'mb_strlen'
         25        SEND_VAR                                                 !1
         26        SEND_VAL                                                 'UTF-8'
         27        DO_ICALL                                         $18     
         28        CONCAT                                           ~19     '%24y+Unicode+Codepoint+Count+%28%22characters%22%29%3A+', $18
         29        CONCAT                                           ~20     ~19, '%0A'
         30        ECHO                                                     ~20
   20    31        INIT_FCALL                                               'bin2hex'
         32        SEND_VAR                                                 !1
         33        DO_ICALL                                         $21     
         34        CONCAT                                           ~22     '%24y+Hex+Representation%3A+', $21
         35        CONCAT                                           ~23     ~22, '%0A'
         36        ECHO                                                     ~23
   21    37        ECHO                                                     '---+End+---%0A'
   25    38        INIT_FCALL                                               'mb_convert_encoding'
         39        SEND_VAR                                                 !0
         40        SEND_VAL                                                 'UTF-16'
         41        SEND_VAL                                                 'UTF-8'
         42        DO_ICALL                                         $24     
         43        ASSIGN                                                   !2, $24
   26    44        INIT_FCALL                                               'mb_convert_encoding'
         45        SEND_VAR                                                 !1
         46        SEND_VAL                                                 'UTF-16'
         47        SEND_VAL                                                 'UTF-8'
         48        DO_ICALL                                         $26     
         49        ASSIGN                                                   !3, $26
   29    50        ECHO                                                     '---+These+are+UTF-16+---%0A'
   30    51        STRLEN                                           ~28     !2
         52        CONCAT                                           ~29     '%24x1+Bytes%3A+', ~28
         53        CONCAT                                           ~30     ~29, '%0A'
         54        ECHO                                                     ~30
   31    55        INIT_FCALL                                               'mb_strlen'
         56        SEND_VAR                                                 !2
         57        SEND_VAL                                                 'UTF-16'
         58        DO_ICALL                                         $31     
         59        CONCAT                                           ~32     '%24x1+Unicode+Codepoint+Count+%28%22characters%22%29%3A+', $31
         60        CONCAT                                           ~33     ~32, '%0A'
         61        ECHO                                                     ~33
   32    62        INIT_FCALL                                               'bin2hex'
         63        SEND_VAR                                                 !2
         64        DO_ICALL                                         $34     
         65        CONCAT                                           ~35     '%24x1+Hex+Representation%3A+', $34
         66        CONCAT                                           ~36     ~35, '%0A'
         67        ECHO                                                     ~36
   33    68        STRLEN                                           ~37     !3
         69        CONCAT                                           ~38     '%24y1+Bytes%3A+', ~37
         70        CONCAT                                           ~39     ~38, '%0A'
         71        ECHO                                                     ~39
   34    72        INIT_FCALL                                               'mb_strlen'
         73        SEND_VAR                                                 !3
         74        SEND_VAL                                                 'UTF-16'
         75        DO_ICALL                                         $40     
         76        CONCAT                                           ~41     '%24y1+Unicode+Codepoint+Count+%28%22characters%22%29%3A+', $40
         77        CONCAT                                           ~42     ~41, '%0A'
         78        ECHO                                                     ~42
   35    79        INIT_FCALL                                               'bin2hex'
         80        SEND_VAR                                                 !3
         81        DO_ICALL                                         $43     
         82        CONCAT                                           ~44     '%24y1+Hex+Representation%3A+', $43
         83        CONCAT                                           ~45     ~44, '%0A'
         84        ECHO                                                     ~45
   36    85        ECHO                                                     '---+End+---%0A'
   58    86        ECHO                                                     '---+These+are+UTF-16+---%0A'
   59    87        STRLEN                                           ~46     !2
         88        DIV                                              ~47     ~46, 2
         89        CONCAT                                           ~48     '%24x1+Javascript+Emulated+strlen%2F2%3A+', ~47
         90        CONCAT                                           ~49     ~48, '%0A'
         91        ECHO                                                     ~49
   60    92        STRLEN                                           ~50     !3
         93        DIV                                              ~51     ~50, 2
         94        CONCAT                                           ~52     '%24y1+Javascript+Emulated+strlen%2F2%3A+', ~51
         95        CONCAT                                           ~53     ~52, '%0A'
         96        ECHO                                                     ~53
   61    97        ECHO                                                     '---+End+---%0A'
   63    98      > RETURN                                                   1

Generated using Vulcan Logic Dumper, using php 8.0.0

