How do you reference objects on an existing MS Word document?

Aug 13, 2014 at 11:32 AM
Hi. I would like to use DocX to parse data from documents generated in MS Word, using a template Word doc that I control. I have a feeling I am missing something very fundamental, however, because I can't figure out how to reference paragraphs, images, and tables in a document. I created a sample document in Word 2010 with several tables, images, and paragraphs of text. I wrote a simple ASPX page to read the doc into a MemoryStream and try to spit back some properties. I can't seem to get it to recognize that the document contains any content. Sample snippet is below.
using (MemoryStream stream = new MemoryStream(uploadedFile))
{
   DocX document = DocX.Create(stream);
   Response.Write("Page height: " + document.PageHeight.ToString());    //Returns 1122

   Response.Write("# Paragraphs: " + document.Paragraphs.Count.ToString()); //Returns 0
   Response.Write("# Images: " + document.Images.Count.ToString()); //Returns 0
   Response.Write("# Tables: " + document.Tables.Count.ToString());     //Returns 0
}
I know the DocX document was created properly because there is a page height being returned. Any ideas on what I'm missing here? I have read other posts about referencing tables and paragraphs by iterating through the collections. I'm not able to do that because the collection contains no elements.

I have used DocX in the past to create Word docs (brilliant software!), but this is my first attempt at trying to parse data from an existing Word document. Thank you for your help.
Aug 15, 2014 at 12:55 PM
You are using DocX.Create, which creates a new, blank document. Since it's a new document, there are no paragraphs in it. The PageHeight is a property that is automatically set to a default value when a new DocX document is created, so that's why you do get a value for that property.

Try using DocX.Load(stream) to open your document.
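
Here is a minimal sketch of the corrected snippet, not a definitive implementation. It assumes the uploaded bytes are a valid .docx file and that the library is referenced under the Novacode namespace, which DocX releases from this period used. The DocXSummary class and Describe method are hypothetical names made up for illustration. The only substantive change from the original snippet is the call to DocX.Load instead of DocX.Create.

```csharp
using System.IO;
using System.Text;
using Novacode; // assumption: namespace used by DocX releases of this period

public static class DocXSummary
{
    // Hypothetical helper: uploadedFile holds the raw bytes of an existing .docx file.
    public static string Describe(byte[] uploadedFile)
    {
        var summary = new StringBuilder();

        using (MemoryStream stream = new MemoryStream(uploadedFile))
        using (DocX document = DocX.Load(stream)) // Load reads existing content; Create starts a blank document
        {
            summary.AppendLine("# Paragraphs: " + document.Paragraphs.Count);
            summary.AppendLine("# Images: " + document.Images.Count);
            summary.AppendLine("# Tables: " + document.Tables.Count);

            // With a loaded document the collections are populated and can be iterated.
            foreach (Paragraph paragraph in document.Paragraphs)
            {
                summary.AppendLine(paragraph.Text);
            }
        }

        return summary.ToString();
    }
}
```

From the ASPX page, this could be called as Response.Write(DocXSummary.Describe(uploadedFile)), where uploadedFile is the same byte array as in the original snippet.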
Marked as answer by skinnee on 8/18/2014 at 10:20 AM
Aug 18, 2014 at 5:18 PM
That's exactly what I needed. I can now loop through all of the existing elements in the document. Thank you for your help! It figures it would be something that simple.