Ticket T755326

RichEditControl: pasted bullet list symbols lead to Unicode characters in the private use area when converting to plain text

created 6 years ago

In our application, text with bullet lists is edited using the RichEditControl. The resulting RTF text is obtained from the "RtfText" property and stored in a database. Later, we load that RTF text from the database into an instance of RichEditDocumentServer and obtain plain text by reading the "Text" property.

When a bullet list is pasted into the editor, the underlying RTF for the bullet symbol looks like this:

\u-3913\'3f\tab

The same bullet list symbol when saved from Word 2016 looks like this:

\f3 \'b7\tab

When formatting a bullet list directly in the RichEditControl as opposed to pasting one, the bullet list symbol looks like this:

\u183\'b7\tab

In the first case, the character with code f0b7 is written (-3913 when treated as a signed 16-bit integer), followed by the character with code 3f (the ? character). No font is defined. Some magic causes this to be rendered with the correct font, even though most fonts do not have glyphs for characters in that range (it's the private use area, after all). When converting to plain text, this character comes out unchanged and appears as an "invalid character" symbol when viewed in most text editors.
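
To make the arithmetic concrete, here is a minimal sketch of how a signed \uN value maps back to a UTF-16 code unit and how to check whether it falls in the private use area (U+E000 to U+F8FF). The names RtfUnicode, DecodeSigned16 and IsPrivateUse are illustrative only and are not part of the DevExpress API.

```csharp
using System;

static class RtfUnicode
{
    // RTF writes \uN as a signed 16-bit number; reinterpret the same
    // 16 bits as an unsigned UTF-16 code unit.
    public static char DecodeSigned16(int rtfValue) =>
        unchecked((char)(ushort)(short)rtfValue);

    // Basic Multilingual Plane private use area: U+E000..U+F8FF.
    public static bool IsPrivateUse(char c) => c >= '\uE000' && c <= '\uF8FF';
}

static class Demo
{
    static void Main()
    {
        char pasted = RtfUnicode.DecodeSigned16(-3913); // pasted bullet
        char typed = RtfUnicode.DecodeSigned16(183);    // bullet created in the editor

        Console.WriteLine($"U+{(int)pasted:X4} private use: {RtfUnicode.IsPrivateUse(pasted)}");
        Console.WriteLine($"U+{(int)typed:X4} private use: {RtfUnicode.IsPrivateUse(typed)}");
    }
}
```

Under these assumptions, -3913 decodes to U+F0B7 (private use) and 183 decodes to U+00B7 (not private use).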

In the second case, font 3 is defined, which is "Symbol" in this document, followed by the character with code b7, which is the mid point character. The "Symbol" font has that character, and converting to plain text is no problem.

In the third case, no font is defined either, but the mid point character is written. Converting to plain text is also no problem.

The issue is obviously that pasting into the editor vs. formatting directly in the editor gives different results. I would expect pasting to leave the original character b7 unchanged instead of changing it to private use area character f0b7.

A small demo project is attached to demonstrate this.

Comments (2)

    Just tested with 19.1.3 as well, same behavior.

    DevExpress Support Team 6 years ago

      Hello Markus,

      I have reproduced the behavior you described. However, I need additional time to research it. I will contact you as soon as I have any results.

      Answers approved by DevExpress Support

      created 6 years ago

      >>When a bullet list is pasted into the editor, the underlying RTF for the bullet symbol looks like this:
      \u-3913\'3f\tab

      The same bullet list symbol when saved from Word 2016 looks like this:
      \f3 \'b7\tab<<

      I discussed this behavior with our developers, and we decided to keep the current behavior unchanged in order not to break existing documents of other customers.

      Our RichEditControl reads and writes RTF documents according to the Word 97 RTF specification. The specification also provides a flat text representation of each number (in the \listtext destination), so RTF readers that don't understand Word 97 numbering will get a paragraph number, along with appropriate character properties, inserted into their document at the beginning of the paragraph. See the "Bullets and Numbering" sub-topic of the Word 97 RTF specification: "Any RTF reader that does understand Word 97 numbering should ignore the entire \listtext destination."

      Thus, our RichEditControl reads numbered and bulleted list settings from the list table (the \listtable destination). In particular, the bullet character is declared by the number format (the \leveltext destination) of each list level. The \leveltext destination for the first list level in your document is written as follows:
      {\leveltext\leveltemplateid67567617\'01\u-3913 ?;}

      So, our RichEditControl loads and writes the bullet character as \u-3913.

      It looks like Microsoft Office Word 2016 uses the \listtext destination in this case:
      {\listtext\pard\plain\ltrpar \s15 \rtlch\fcs1 \af31507\afs22 \ltrch\fcs0 \f3\fs22\insrsid144010 \loch\af3\dbch\af31505\hich\f3 \'b7\tab}

      Microsoft Office Word behavior is not documented and does not meet the Word 97 RTF specification our RichEditControl follows.

      >>When formatting a bullet list directly in the RichEditControl as opposed to pasting one, the bullet list symbol looks like this:
      \u183\'b7\tab <<

      It appears that another symbol for bullets was used for bulleted lists. Note that you can configure bulleted lists in the RichEditControl document programmatically. Please refer to the Lists help topic to get started.

      Here is a sample code snippet that demonstrates how to change the bullet character for the first list level in your document:

      C#
      var abstractLists = richEditControl1.Document.AbstractNumberingLists;
      AbstractNumberingList defaultList = abstractLists[0];
      defaultList.Levels[0].DisplayFormatString = "\u00B7";
      // Export plain text
      var s = richEditControl1.Text;

      >>In the first case, the character with code f0b7 is written (-3913 when treated as signed 16bit integer), followed by the character with code 3f (? character). No font is defined. Some magic causes this to be rendered with the correct font,…<<

      The character properties are defined for each list level in the list table in RTF documents. The first list level in the document you provided is written as follows:
      {\listlevel\levelnfc23
      \levelnfcn23\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace360\levelindent0{\leveltext\leveltemplateid67567617\'01\u-3913 ?;}{\levelnumbers;}\f3\fbias0 \fi-360\li720\lin720 }

      This list level is formatted with the f3 font number declared in the font table as follows:
      {\f3\fbidi \froman\fcharset2\fprq2{\*\panose 05050102010706020507}Symbol;}

      Thus, bullets of the first list level are formatted with the Symbol font. The same logic is used to obtain character settings of the second list level.
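
For readers who cannot change the list definitions, one possible workaround (an assumption, not something recommended in this ticket) is to post-process the exported plain text and replace the private use bullet with a regular character. The sketch below uses a hypothetical helper, PlainTextBullets.Normalize, and assumes that U+F0B7 should become U+2022 (BULLET); substitute "\u00B7" if you prefer the middle dot that the RichEditControl writes for its own lists.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class PlainTextBullets
{
    // Hypothetical mapping: extend it for other Symbol-font characters
    // that occur in your documents.
    static readonly Dictionary<char, char> Replacements = new Dictionary<char, char>
    {
        { '\uF0B7', '\u2022' }, // pasted Symbol-font bullet -> BULLET (assumed target)
    };

    // Returns a copy of the text with every mapped character replaced;
    // all other characters are copied unchanged.
    public static string Normalize(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
            sb.Append(Replacements.TryGetValue(c, out char mapped) ? mapped : c);
        return sb.ToString();
    }
}

static class NormalizeDemo
{
    static void Main()
    {
        string exported = "\uF0B7\tFirst item";
        Console.WriteLine(PlainTextBullets.Normalize(exported) == "\u2022\tFirst item");
    }
}
```

In the scenario from the question, the string passed to Normalize would be the value read from the "Text" property of the RichEditDocumentServer instance.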
