The file 'Output.docx' cannot be opened because there are problems with the contents

Mar 20, 2012 at 7:57 AM

Hi all,

I stumbled across this great little project when refactoring some legacy web app code that was populating RTF file templates and stitching them together to create contiguous documents.

I'd love to use this little library, but when rolling a little prototype I hit a snag. I took one of the old RTF templates, opened it in Word (Office 2010), saved it as a docx, converted it to the latest version, and then saved it again. I then wrote a little console app to read this new template twice and stitch the two copies together in a new docx using DocX.InsertDocument().

When I try to open the new document in Word, I get the following error:

The file Output.docx cannot be opened because there are problems with the contents.

Clicking 'Details' just shows:

The file is corrupt and cannot be opened.

Clicking OK gives you:

Word found unreadable content in Output.docx. Do you want to recover the contents of the document? If you trust the source of the document, click Yes.

After clicking Yes, the file opens and looks OK, but there's no real indication of what was wrong with the structure. Obviously I need to get rid of the corrupt document warnings.

I think this is possibly the same issue that I saw in another thread on here, but there was no resolution. I've tried re-saving the corrupt document to see what Word changes (subsequent opens work fine), and there are additional files in the re-saved package. Unfortunately I don't know enough about the structure of these docx packages to be much more help than that.

Can anyone help get me up and running with this? I'd be happy to help where I can, repro'ing the issue and testing fixes if required. I'd love to be able to replace that old code with this component ;)

Cheers,

Alan

Mar 20, 2012 at 8:01 AM

Ugh. Sorry. Just saw this documented as issue 10317. 

Developer
Mar 20, 2012 at 8:46 AM

We've been waiting for a fix for "merging" documents for a long while. Cathal said the merging code is very young, so all we can do is wait until it gets priority.

Mar 21, 2012 at 1:59 AM
Yes, I saw that in another thread as well. I've subscribed to the discussion and hope to see this issue get picked up sometime in the near future.

Cheers,
Alan


Coordinator
Apr 3, 2012 at 8:51 PM

I finally got around to working on this feature. Please check out change-set: 75223
Thanks for your patience on this one, guys.

Here is an example of its use.

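// Load the base document; its headers, footers and defaults are the ones kept.
using (DocX a = DocX.Load("a.docx"))
{
    // Insert the contents of b.docx into a.
    using (DocX b = DocX.Load("b.docx"))
    {
        a.InsertDocument(b);
    }

    // Save() writes the merged result back to a.docx.
    a.Save();
}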

Note: Headers, Footers and document specific defaults are not merged. So if you want your merged document to contain Headers and Footers your initial document (a.docx in the above example) must have them. You could of course add Headers and Footers after merging.


Developer
Apr 3, 2012 at 9:12 PM
Edited Apr 3, 2012 at 9:13 PM

Yay! Hip Hip Hurray for Coffey :-) Does it contain a fix for the last thing I sent you?

Coordinator
Apr 3, 2012 at 9:39 PM

Madboy: Are you referring to the default document settings problem? 
I have been thinking about this a lot today.

Firstly, for the others who will read this and could not have read the emails we sent each other about this issue :-)
Here is an explanation of the problem.

Internally in a document each piece of text is marked with a Style attribute.
This Style contains, among other things, font information. Below is a horribly simplified pseudo example:

<p style="style_1">Line one</p>
<p style="style_2">Line two</p>
.
.

<p style="style_n">Line n</p>

However, most documents re-use the same Style for most text... so the idea of a default Style was introduced. Now only text which uses a non-default Style needs to state its Style.

<p>Line one</p>
<p>Line two</p>
.
.

<p style="weird_style"></p>
.
.

<p>Line n</p>

This document-specific default style idea causes a problem when you merge two documents which have different default styles. A document can only have one default style... so all the text in the second document which doesn't specify a style inherits the default style from the first document.
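
To make the inheritance problem concrete, here is a minimal sketch that works on the simplified pseudo-markup above rather than on real docx XML. The `defaultStyle` attribute, the style names and the `EffectiveStyle` helper are all hypothetical, made up for illustration; none of this is the DocX API or the way DocX stores styles.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Two toy "documents" with different default styles (hypothetical attribute).
XElement first = XElement.Parse(
    "<doc defaultStyle='calibri_12'><p>A one</p></doc>");
XElement second = XElement.Parse(
    "<doc defaultStyle='calibri_10'><p>B one</p><p style='weird_style'>B two</p></doc>");

// Naive merge: copy the second document's paragraphs into the first, unchanged.
XElement naive = new XElement(first);
naive.Add(second.Elements("p").Select(p => new XElement(p)));

// Print the style each paragraph ends up with after the merge.
foreach (XElement p in naive.Elements("p"))
    Console.WriteLine($"{p.Value}: {EffectiveStyle(naive, p)}");

// A paragraph without an explicit style falls back to its document's default.
static string EffectiveStyle(XElement doc, XElement p) =>
    p.Attribute("style")?.Value ?? doc.Attribute("defaultStyle")?.Value ?? "";
```

In this toy model "B one" was written against the second document's default, but after the merge it resolves to the first document's default, which is the kind of silent font change described above. "B two" is unaffected because it states its style explicitly.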

The only solution I can think of (and I don't like this solution) would be:

1) Create a new Style which is the default style from document 2.
2) Explicitly set the Style of every piece of text in the second document to this new Style.
3) Merge the documents.

Will this fix the issue? Yes it will, the document will look perfect.
Is this a good idea? I don't think so: 1) it would bloat the document, and 2) what if you decide to change the style of a section? You would have to remove the explicit Style from each piece of text.
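
The three steps above can be sketched roughly as follows, again on the simplified pseudo-markup and not on real docx XML. The `defaultStyle` attribute, the style names and the `MergeKeepingStyles` helper are hypothetical and exist only for this illustration; this is not how DocX implements InsertDocument.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Two toy "documents" with different default styles (hypothetical attribute).
XElement first = XElement.Parse(
    "<doc defaultStyle='calibri_12'><p>A one</p><p style='weird_style'>A two</p></doc>");
XElement second = XElement.Parse(
    "<doc defaultStyle='calibri_10'><p>B one</p><p style='weird_style'>B two</p></doc>");

XElement merged = MergeKeepingStyles(first, second);
Console.WriteLine(merged);

// Returns a copy of 'first' with the paragraphs of 'second' appended.
static XElement MergeKeepingStyles(XElement first, XElement second)
{
    // Step 1: treat the second document's default as an ordinary named style.
    string secondDefault = second.Attribute("defaultStyle")?.Value ?? "";

    XElement result = new XElement(first);
    foreach (XElement p in second.Elements("p"))
    {
        XElement copy = new XElement(p);

        // Step 2: paragraphs that relied on the default now state it explicitly.
        if (copy.Attribute("style") == null)
            copy.SetAttributeValue("style", secondDefault);

        // Step 3: merge the paragraph into the result.
        result.Add(copy);
    }
    return result;
}
```

The sketch also shows the cost mentioned above: every paragraph that used to be a bare `<p>` in the second document now carries its own style attribute.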

Developer
Apr 3, 2012 at 9:45 PM

But how come those 2 documents have different default styles? They are almost identical, with small changes in values, and were created by the same code (DocX). Each line in its place has exactly the same style. How would you recommend I fix this in my code so that I get the document formatted as it is? In my case the documents are supposed to be printed out for a customer, so there's no need to worry about future editing of the docx (although I understand the concern that the method needs to work for everyone, not just my specific scenario).

 

Coordinator
Apr 3, 2012 at 9:50 PM

Have you ever (using Microsoft Word) opened and saved any of the documents being merged?
Word does some very strange things sometimes.

If you generate a document using DocX, then open it in Word and re-save it, the internal document could be changed dramatically by Word. I would really like to talk to (give out to) the team developing MS Word :-)

Developer
Apr 3, 2012 at 9:59 PM

No, but I guess it won't fix my issue with 12 Calibri becoming 10 Calibri :-)

Coordinator
Apr 3, 2012 at 10:02 PM
Ha... wishful thinking.

If you send me the documents again (I wiped them from my machine), I will investigate the missing 2 units of font size further.

Apr 15, 2013 at 9:51 AM
I have a similar problem: sometimes the merged document has multiple default styles, which causes Word to try to fix it.

Have you ever tried implementing the suggestion you made a few comments above? That is:
The only solution I can think of (and I don't like this solution) would be:

1) Create a new Style which is the default style from document 2.
2) Explicitly set the Style of every piece of text in the second document to this new Style.
3) Merge the documents.
Does it work (setting aside the increased size of the document for a moment)?

Dec 20, 2014 at 2:52 PM
Here is one of the easiest and quickest solutions to fix this error: http://repairwordfile.blogspot.com/2010/03/fixing-word-found-unreadable-content.html

I successfully resolved a similar problem with the help of the solution given there!