There are two types of Unicode characters which this extension replaces with emoji images:
- standardized characters, which Unicode officially describes as representing a certain image, for example U+1F6B2 BICYCLE
- private use characters, found in the Private Use Areas of Unicode, which certain specific systems happen to have used to represent certain symbols (I am guessing this is what iOS 4 and earlier used before Unicode standardized the emoji in Unicode 6.0), for example U+E154, which renders as the ATM symbol and is equivalent to the standardized character U+1F3E7 AUTOMATED TELLER MACHINE
The problem is that PUA (Private Use Area) characters can be used by any application for any purpose. For example, MathJax, a math formula engine for websites (used by Wikipedia if the user enables it), uses PUA characters to represent certain parts of mathematical symbols.
I've explained the issue in an image here: http://i.imgur.com/bE5S7Ug.png
Therefore, I think the chardict.json file should be redesigned to specify, for each character, whether it is a standardized symbol or a PUA character. For example, you could have:
{
    "name": "AUTOMATED TELLER MACHINE",
    "id": "automated_teller_machine",
    "image": "1f3e7.png",
    "chars": [
        "\uD83C\uDFE7"
    ],
    "pua_chars": [
        "\uE154"
    ]
},
Alternatively, you could distinguish which ones are PUA chars based on their Unicode ranges. All PUA chars are in U+E000..U+F8FF, U+F0000..U+FFFFD, or U+100000..U+10FFFD.
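A rough sketch of what that range check could look like. The function names `isPrivateUseCodePoint` and `isPrivateUseChar` are made up for illustration and are not part of the extension:

```javascript
// Hypothetical helper (not part of the extension): returns true if the
// code point falls in one of the three Unicode Private Use Areas.
function isPrivateUseCodePoint(cp) {
  return (
    (cp >= 0xE000 && cp <= 0xF8FF) ||      // BMP Private Use Area
    (cp >= 0xF0000 && cp <= 0xFFFFD) ||    // Supplementary Private Use Area-A
    (cp >= 0x100000 && cp <= 0x10FFFD)     // Supplementary Private Use Area-B
  );
}

// Hypothetical wrapper: checks the first code point of a string.
// codePointAt() decodes surrogate pairs, so "\uD83C\uDFE7" is read as U+1F3E7.
function isPrivateUseChar(str) {
  return isPrivateUseCodePoint(str.codePointAt(0));
}
```

With this, `isPrivateUseChar("\uE154")` is true, while `isPrivateUseChar("\uD83C\uDFE7")` (the standardized ATM character) is false.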
Then you could include an option to disable the replacement of PUA characters with images so that users can replace just the standardized characters.
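To illustrate how such an option might work with the `chars` / `pua_chars` split proposed above, here is a minimal sketch. The function `buildReplacementMap` and the option name `replacePua` are assumptions of mine, and I don't know how the extension actually builds its lookup:

```javascript
// Hypothetical sketch: build a character -> image lookup from
// chardict-style entries, optionally leaving out PUA characters.
// "replacePua" is an assumed option name, defaulting to true.
function buildReplacementMap(entries, options) {
  const includePua = !options || options.replacePua !== false;
  const map = new Map();
  for (const entry of entries) {
    // Standardized characters are always included; PUA ones only on request.
    const chars = includePua
      ? entry.chars.concat(entry.pua_chars || [])
      : entry.chars;
    for (const ch of chars) {
      map.set(ch, entry.image);
    }
  }
  return map;
}

// Example entry in the proposed format.
const exampleEntries = [
  {
    name: "AUTOMATED TELLER MACHINE",
    id: "automated_teller_machine",
    image: "1f3e7.png",
    chars: ["\uD83C\uDFE7"],
    pua_chars: ["\uE154"]
  }
];

const withPua = buildReplacementMap(exampleEntries, { replacePua: true });
const withoutPua = buildReplacementMap(exampleEntries, { replacePua: false });
```

Here `withoutPua` only contains the standardized character, so a page using U+E154 for something else (like MathJax does) would be left alone.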
Yet another way to solve the problem would be simply to stop supporting the PUA chars altogether, as most (if not all) have a standardized equivalent which can be used instead, although this would break compatibility with very old messages.