Consolidating docx tags

Oct 8, 2009 at 11:31 PM

When I used the Aspose plugin for Word 2007 to create an epub file from a docx file, I noticed that the html files in the unzipped epub directory contained a lot of contiguous <span> tags, all with the same styling. I compared those <span> tags with the tags in the docx file and found a one-to-one correspondence. In some cases the docx tagging makes sense, e.g. when the font changes. But where the tagging only preserves the history of changes or similar bookkeeping, I would like to consolidate those regions of text and enclose them in as few docx tags as possible, to rid the resulting epub files of useless artifacts. Can I do this with the DocX library?


Oct 10, 2009 at 5:32 PM
Hi John,

this kind of functionality isn't built into DocX. DocX hides the horrible inner workings of the OpenXML format from the user, so you cannot use it to dig into the internal parts of a docx package, let alone an epub.

Have you had a look at the Open XML SDK? You might find it useful.
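
To make the idea concrete, here is a rough sketch of the consolidation done directly on the WordprocessingML markup. It uses neither DocX nor the Open XML SDK, only the Python standard library, and it rests on some assumptions: the "docx tags" in question are `<w:r>` runs, two runs count as identical when their `<w:rPr>` formatting blocks match, and attributes on the run itself (such as revision ids) can be ignored. The names `merge_adjacent_runs` and `consolidate` are made up for this example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"
W = "{" + W_NS + "}"

# Keep the conventional "w:" prefix when the tree is serialized again.
ET.register_namespace("w", W_NS)


def _signature(elem):
    """Comparable description of an element subtree (None for a missing element)."""
    if elem is None:
        return None
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(_signature(child) for child in elem),
    )


def _is_plain_text_run(run):
    """True if the run holds nothing but an optional <w:rPr> and <w:t> children."""
    return all(child.tag in (W + "rPr", W + "t") for child in run)


def _run_text(run):
    """Concatenated text of all <w:t> children of a run."""
    return "".join(t.text or "" for t in run.findall(W + "t"))


def merge_adjacent_runs(paragraph):
    """Merge neighbouring text-only runs of one <w:p> that share formatting."""
    previous = None
    for child in list(paragraph):
        if child.tag != W + "r" or not _is_plain_text_run(child):
            # Anything else (bookmarks, fields, drawings...) ends the current group.
            previous = None
            continue
        same_format = previous is not None and _signature(
            previous.find(W + "rPr")
        ) == _signature(child.find(W + "rPr"))
        if same_format:
            combined = _run_text(previous) + _run_text(child)
            for t in previous.findall(W + "t"):
                previous.remove(t)
            merged = ET.SubElement(previous, W + "t")
            merged.text = combined
            # Assumption: always preserve whitespace so merged text keeps its spaces.
            merged.set("{" + XML_NS + "}space", "preserve")
            paragraph.remove(child)
        else:
            previous = child


def consolidate(document_xml):
    """Return the document XML (bytes) with mergeable runs consolidated."""
    root = ET.fromstring(document_xml)
    for paragraph in list(root.iter(W + "p")):
        merge_adjacent_runs(paragraph)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

You would feed `consolidate` the bytes of `word/document.xml` taken from the unzipped docx. Treat it as an illustration only: it is deliberately conservative (any element between two runs blocks the merge), and round-tripping a real document through ElementTree can drop namespace declarations that Word expects, so a proper tool such as the Open XML SDK is the safer route for real files.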

