See the implementations of [linkify](https://github.com/markdown-it/markdown-it/blob/master/lib/rules_core/linkify.js) and [emoji](https://github.com/markdown-it/markdown-it-emoji/blob/master/lib/replace.js): both split text tokens.
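
The splitting idea can be sketched without the library. In the minimal example below, plain objects stand in for markdown-it tokens, and `splitTextToken` and its `matchType` parameter are hypothetical names introduced only for illustration. It is a sketch of the general technique, not the code used by linkify or emoji.

```javascript
// Hypothetical helper, NOT part of markdown-it: split one text token's
// content around every match of `pattern` (which must have the global flag).
// Plain objects are used here in place of real Token instances.
function splitTextToken(token, pattern, matchType) {
  const result = [];
  let last = 0;

  for (const m of token.content.matchAll(pattern)) {
    // Keep the plain text that precedes the match, if any.
    if (m.index > last) {
      result.push({ type: 'text', content: token.content.slice(last, m.index) });
    }
    // Emit the matched fragment as its own token.
    result.push({ type: matchType, content: m[0] });
    last = m.index + m[0].length;
  }

  // Keep the trailing plain text, if any.
  if (last < token.content.length) {
    result.push({ type: 'text', content: token.content.slice(last) });
  }
  return result;
}

const pieces = splitTextToken(
  { type: 'text', content: 'hi :smile: there' },
  /:\w+:/g,
  'emoji'
);
console.log(pieces.map((t) => t.type));
```

In a real plugin the resulting pieces would be token instances that replace the original text token in the children of its parent inline token.
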
The inline parser skips portions of text for the best speed. It stops only on a [small set of chars](https://github.com/markdown-it/markdown-it/blob/master/lib/rules_inline/text.js) that can start a token. We did not make this list extendable, also for performance reasons.
If you are absolutely sure that something important is missing there, create a
ticket and we will add new charcodes.
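
To make the skipping behaviour concrete, here is a minimal sketch of the idea. The `TERMINATORS` set is an illustrative subset chosen for this example, not the real list from `text.js`, and `skipPlainText` and `scan` are hypothetical names.

```javascript
// Illustrative subset only (an assumption for this sketch); the real list
// of charcodes lives in lib/rules_inline/text.js.
const TERMINATORS = new Set(['\n', '*', '_', '`', '[', ']', '\\', '<', '&']);

// Hypothetical helper: return the index of the next character that could
// start an inline token, or text.length if there is none.
function skipPlainText(text, pos) {
  while (pos < text.length && !TERMINATORS.has(text[pos])) {
    pos++;
  }
  return pos;
}

// Split a string into plain-text runs and single terminator characters.
// Other inline rules would only get a chance to run at the terminators.
function scan(text) {
  const pieces = [];
  let pos = 0;
  while (pos < text.length) {
    const end = skipPlainText(text, pos);
    if (end > pos) {
      pieces.push(text.slice(pos, end)); // plain run, skipped in one step
      pos = end;
    } else {
      pieces.push(text[pos]); // possible token start
      pos++;
    }
  }
  return pieces;
}

console.log(scan('foo *bar* #tag'));
```

In this sketch `#` is not in the set, so ` #tag` is swallowed as one plain run and a rule keyed on `#` would never be tried at that position. That is the kind of case where a new charcode would have to be added to the list.
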
#### Why do you reject some useful things?
We are building a markdown parser, and it should keep the "markdown spirit". Other things should
be kept separate (in plugins, for example). We have no clear criteria, sorry.
You will probably find it useful to read the [CommonMark forum](http://talk.commonmark.org/) to understand us better.
Of course, if you find the architecture of this parser interesting for another type
of markup, you are welcome to reuse it in another project.