C# Duplicate DOCX File Using OpenXML
https://stackoverflow.com/questions/16069435/c-sharp-duplicate-docx-file-using-openxml
I'm trying to duplicate the contents of a docx file and save the result within the same file, using OpenXML in C#.
Here is the code:
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(wordFileNamePath, true))
{
    foreach (OpenXmlElement element in wordDoc.MainDocumentPart.Document.ChildElements)
    {
        OpenXmlElement cloneElement = (OpenXmlElement)element.Clone();
        wordDoc.MainDocumentPart.Document.Append(cloneElement);
    }
    wordDoc.MainDocumentPart.Document.Save();
}
The code is working fine and does what I need. My problem is that the resulting docx file is partially corrupted. When I open my file I get the following two messages:
Clicking 'OK' and then 'Yes' opens the file normally. However, the file remains corrupted until I 'save as' it (under the same or a different name). Only the newly saved file is fixed.
By using the Open XML SDK 2.5 Productivity Tool for Microsoft Office, I can Validate the file and see the reflected code. Validating the file will give the following 5 errors:
So I think the "Clone" function I use in my code copies each element exactly as it is, so that when the copy is appended to the document, some IDs end up duplicated.
Any idea how to get a properly working DOCX file after duplicating its contents? Any alternative code is appreciated.
The problem with your method is that it creates invalid Open XML markup. Here is why.
Let's say you have a very simple Word document that is represented by the following markup:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>First paragraph</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:t>Second paragraph</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
In your foreach loop, wordDoc.MainDocumentPart.Document.ChildElements will be a single-element list that only contains the w:body element. Thus, you create a deep clone of the w:body element and append that to the w:document. The resulting Open XML markup looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>First paragraph</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:t>Second paragraph</w:t>
      </w:r>
    </w:p>
  </w:body>
  <w:body>
    <w:p>
      <w:r>
        <w:t>First paragraph</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:t>Second paragraph</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
The above is a w:document with two w:body child elements, which is invalid Open XML markup, as w:document must have exactly one w:body child element. Thus, Word shows that error message.
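If you would rather catch this kind of problem in code than in Word, the SDK ships an OpenXmlValidator class that reports schema violations. The following is a minimal sketch, not a definitive implementation; the class name DocxChecks and the method name Validate are hypothetical names chosen for illustration, and the exact error texts depend on the SDK version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class DocxChecks
{
    // Hypothetical helper: returns one line per validation error found in the file.
    public static List<string> Validate(string path)
    {
        // Open read-only (second argument false) so the check cannot alter the file.
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(path, false))
        {
            var validator = new OpenXmlValidator();

            // ToList() forces the validation to run while the document is still open.
            return validator.Validate(wordDoc)
                .Select(error => $"{error.Path?.XPath}: {error.Description}")
                .ToList();
        }
    }
}
```

Calling DocxChecks.Validate(wordFileNamePath) after your loop should list the second w:body as an unexpected child element, while a well-formed document should yield an empty list.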
To fix this, you need to work with Document.Body wherever you currently use Document. The following streamlined example shows how to do it.
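using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(wordFileNamePath, true))
{
    Body body = wordDoc.MainDocumentPart.Document.Body;
    // Deep-clone the body's children (not the body itself) and materialize the list first.
    IEnumerable<OpenXmlElement> clonedElements = body
        .Elements()
        .Select(e => e.CloneNode(true))
        .ToList();
    body.Append(clonedElements);
}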
You'll see that I did not save the Document explicitly, as that is not necessary due to the using statement and the fact that those documents are auto-saved by default. Secondly, I used ToList() to materialize the collection before appending. This avoids any issues with enumerating elements while they are being changed.
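One caveat that the example above does not cover: the body of a real document usually ends with a w:sectPr element (page size, margins and so on), and the schema allows only one of those directly under w:body, as its last child. Cloning every child would therefore copy it as well. The following is a sketch of one way to handle that, assuming you want to keep a single trailing w:sectPr; the class name DocxDuplicator and the method name DuplicateBodyContent are hypothetical. It does not attempt to renumber IDs inside the cloned content (for example bookmark IDs), which the validator may still flag.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocxDuplicator
{
    // Hypothetical helper: appends a copy of the body content while
    // keeping at most one w:sectPr as the last child of the body.
    public static void DuplicateBodyContent(Body body)
    {
        // Body-level section properties, or null if the body has none.
        SectionProperties sectionProperties =
            body.Elements<SectionProperties>().LastOrDefault();

        // Clone everything except the section properties; materialize
        // the list before the body is modified.
        List<OpenXmlElement> clones = body.Elements()
            .Where(e => !(e is SectionProperties))
            .Select(e => e.CloneNode(true))
            .ToList();

        if (sectionProperties != null)
        {
            // Insert each clone just before w:sectPr so it stays last.
            foreach (OpenXmlElement clone in clones)
            {
                sectionProperties.InsertBeforeSelf(clone);
            }
        }
        else
        {
            body.Append(clones);
        }
    }
}
```

Inside the using block from the example above, you would call DocxDuplicator.DuplicateBodyContent(wordDoc.MainDocumentPart.Document.Body) in place of the clone-and-append lines.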