
MySQL's utf8 charset only supports symbols of up to 3 bytes. Inserting four-byte symbols (U+010000 to U+10FFFF) can be done maliciously to break HTML mark-up.

The ideal solution would be to convert to MySQL's utf8mb4 charset, but then we would lose support for MySQL < 5.5.3. In this fix, incompatible characters are encoded as HTML numeric character references (e.g. &#65536;) and are simply stripped from body_nomarkup.
pull/40/head
Michael Foster 11 years ago
parent
commit
461084d400
  1. 2
      inc/functions.php
  2. 2
      post.php

2
inc/functions.php

@@ -1523,7 +1523,7 @@ function markup(&$body, $track_cites = false) {
 }
 function utf8tohtml($utf8) {
-	return htmlspecialchars($utf8, ENT_NOQUOTES, 'UTF-8');
+	return mb_encode_numericentity(htmlspecialchars($utf8, ENT_NOQUOTES, 'UTF-8'), array(0x010000, 0xffffff, 0, 0xffffff), 'UTF-8');
 }
 function ordutf8($string, &$offset) {
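
A minimal standalone sketch of what the changed line does, assuming PHP 7+ with the mbstring extension. The function name `encode_astral` is hypothetical and not part of the codebase; the conversion map is copied from the diff.

```php
<?php
// Hypothetical helper mirroring the patched utf8tohtml(); the name is illustrative only.
function encode_astral(string $utf8): string {
    // Escape HTML special characters first (quotes are left alone with ENT_NOQUOTES).
    $escaped = htmlspecialchars($utf8, ENT_NOQUOTES, 'UTF-8');
    // Conversion map: [first code point, last code point, offset, mask].
    // Only code points from U+10000 upwards are rewritten as &#NNNN; references.
    $convmap = array(0x010000, 0xffffff, 0, 0xffffff);
    return mb_encode_numericentity($escaped, $convmap, 'UTF-8');
}

// U+1F600 needs four bytes in UTF-8, so it is the kind of symbol MySQL's utf8 cannot store.
echo encode_astral("a<b \u{1F600}"), "\n";
```

Characters in the Basic Multilingual Plane pass through unchanged, so ordinary text that already fits in three bytes should be unaffected.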

2
post.php

@@ -378,7 +378,7 @@ if (isset($_POST['delete'])) {
 wordfilters($post['body']);
-$post['body_nomarkup'] = $post['body'];
+$post['body_nomarkup'] = preg_replace('/[\x{010000}-\x{ffffff}]/u', '', $post['body']);
 if (!($mod && isset($post['raw']) && $post['raw']))
 	$post['tracked_cites'] = markup($post['body'], true);
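
The stripping step can be sketched on its own as follows, assuming PHP 7+ with PCRE Unicode support. `strip_astral` is a hypothetical name, and the upper bound used here is U+10FFFF (the range named in the commit message) rather than the `\x{ffffff}` written in the diff.

```php
<?php
// Hypothetical helper; not part of the codebase.
function strip_astral(string $text): string {
    // With the /u modifier the class matches whole code points, not bytes.
    // U+10000..U+10FFFF are exactly the code points that take four bytes in UTF-8.
    $result = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);
    // preg_replace() returns null on failure (for example, invalid UTF-8 input);
    // this sketch falls back to an empty string in that case.
    return $result ?? '';
}

echo strip_astral("hi \u{1F600}!"), "\n";
```

Falling back to an empty string on failure is a design choice of this sketch; the commit itself assigns the result of `preg_replace` directly.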
