Paragraph.Text returns deleted text

Apr 10, 2013 at 8:40 PM
Edited Apr 15, 2013 at 6:11 AM
I'm not sure if this is a bug or if I'm missing something, but when I read a .docx file that was edited with Track Changes enabled, the Text property of each Paragraph in DocX.Paragraphs includes the text of deleted runs.

The original text read "But he is" but was corrected to "He is".

In Word 2013 with changes expanded:
[Image]

In Word 2013, standard view:
[Image]

When I access the Text property of the given paragraph, I see:
HBut he is...
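
Looking at the source code, it appears Paragraph.Text concatenates deleted runs (w:delText) as well as normal runs (w:t). That would explain the output: the inserted "H", then the deleted "But h", then the unchanged "e is".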

Is this a bug or am I missing something?

The document part:
<w:ins w:author="A Reviewer" w:date="2013-03-14T13:52:00Z" w:id="1">
  <w:r w:rsidR="00B076C7">
    <w:rPr>
      <w:sz w:val="24" />
    </w:rPr>
    <w:t>H</w:t>
  </w:r>
</w:ins>
<w:del w:author="A reviewer" w:date="2013-03-14T13:52:00Z" w:id="2">
  <w:r w:rsidRPr="00FF0859" w:rsidDel="00B076C7">
    <w:rPr>
      <w:sz w:val="24" />
    </w:rPr>
    <w:delText>But h</w:delText>
  </w:r>
</w:del>
<w:r w:rsidRPr="00FF0859">
  <w:rPr>
    <w:sz w:val="24" />
  </w:rPr>
  <w:t>e is</w:t>
</w:r>
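
For reference, here is a minimal LINQ to XML sketch (not DocX code) that reproduces both readings of this fragment. It assumes the runs sit inside a w:p element that declares the standard WordprocessingML namespace, and it drops the formatting attributes for brevity. The names RevisionTextDemo, ReadText and SampleParagraph are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class RevisionTextDemo
{
    // Standard WordprocessingML main namespace.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: concatenates w:t text in document order,
    // and w:delText as well when includeDeleted is true.
    public static string ReadText(XElement paragraph, bool includeDeleted)
    {
        var parts = paragraph.Descendants()
            .Where(e => e.Name == W + "t"
                     || (includeDeleted && e.Name == W + "delText"))
            .Select(e => e.Value);
        return string.Concat(parts);
    }

    // Trimmed-down copy of the fragment above, wrapped in an assumed w:p.
    public static XElement SampleParagraph()
    {
        return XElement.Parse(
            @"<w:p xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
                <w:ins w:id=""1""><w:r><w:t>H</w:t></w:r></w:ins>
                <w:del w:id=""2""><w:r><w:delText>But h</w:delText></w:r></w:del>
                <w:r><w:t>e is</w:t></w:r>
              </w:p>");
    }
}
```

With includeDeleted set to true this yields "HBut he is", which matches what I see from Paragraph.Text; with false it yields "He is", which matches Word's standard view.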
Apr 24, 2013 at 7:26 PM
Upon further investigation, it appears that the inclusion of deleted runs is intentional:

HelperFunctions.cs:
internal static string ToText(XElement e)
{
    switch (e.Name.LocalName)
    {
        case "tab":
            return "\t";
        case "br":
            return "\n";
        case "t":
            goto case "delText";
        case "delText":
            {
                if (e.Parent != null && e.Parent.Name.LocalName == "r")
                {
                    XElement run = e.Parent;
                    var rPr = run.Elements().FirstOrDefault(a => a.Name.LocalName == "rPr");
                    if (rPr != null)
                    {
                        var caps = rPr.Elements().FirstOrDefault(a => a.Name.LocalName == "caps");

                        if (caps != null)
                            return e.Value.ToUpper();
                    }
                }

                return e.Value;
            }
        case "tr":
            goto case "br";
        case "tc":
            goto case "tab";
        default: return "";
    }
}

This really opens a can of worms. One solution would be to add some sort of "include/exclude revisions" property to the base class and have Paragraph.Text include or exclude delText according to that property.
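
As a rough sketch of that idea, and nothing more: the variant of ToText below takes a hypothetical includeDeleted flag, which Paragraph.Text could pass through from whatever property gets added. Neither the flag nor the TextExtractionSketch and RunText names exist in DocX; they are assumptions made for illustration. The caps handling is copied from the current code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class TextExtractionSketch
{
    // Hypothetical variant of ToText; includeDeleted is not a DocX parameter.
    public static string ToText(XElement e, bool includeDeleted)
    {
        switch (e.Name.LocalName)
        {
            case "tab":
            case "tc":
                return "\t";
            case "br":
            case "tr":
                return "\n";
            case "t":
                return RunText(e);
            case "delText":
                // Deleted text is returned only when the caller asks for it.
                return includeDeleted ? RunText(e) : "";
            default:
                return "";
        }
    }

    // Same behaviour as the existing code: upper-case the text
    // when the parent run's properties contain a caps element.
    private static string RunText(XElement e)
    {
        XElement run = e.Parent;
        if (run != null && run.Name.LocalName == "r")
        {
            var rPr = run.Elements().FirstOrDefault(a => a.Name.LocalName == "rPr");
            if (rPr != null && rPr.Elements().Any(a => a.Name.LocalName == "caps"))
                return e.Value.ToUpper();
        }
        return e.Value;
    }
}
```

Defaulting the flag to true would keep the current behaviour for existing callers. Inserted runs (w:ins) use ordinary w:t elements, so they would still be included either way. Showing the document as it was before the revisions would need separate handling.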