C# - Convert Docx to HTML using Open XML PowerTools without formatting
I'm using Open XML PowerTools in a project to convert a Word document (.docx) to HTML. The code provided with the SDK produces an elegant duplicate of the document in HTML form (GitHub link: https://github.com/officedev/open-xml-powertools/blob/vnext/openxmlpowertoolsexamples/htmlconverter01/htmlconverter01.cs).
However, looking at the HTML markup, the HTML has embedded styling.
Is there a way of turning this off and using plain, simple <h1> and <p> tags?
The formatting will be taken care of by Bootstrap instead.
The embedded styling is as follows:
<p dir="ltr" style="font-family: calibri;font-size: 11pt;line-height: 115.0%;margin-bottom: 0;margin-left: 0;margin-right: 0;margin-top: 0;"> <span xml:space="preserve" style="font-size: 11pt;font-style: normal;font-weight: normal;margin: 0;padding: 0;"> </span> </p>
This is fine if you want a direct copy, but not if you want to control the style yourself.
In the C# code I have made the following adjustments:
- AdditionalCss commented out
- FabricateCssClasses set to false
- CssClassPrefix commented out
Many thanks.
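Since those settings alone still leave inline styling in the output, a different approach is to post-process the converted markup. A minimal sketch, assuming (as in the linked example) that `HtmlConverter.ConvertToHtml` returns an `XElement`: remove presentational attributes and any `<style>` elements with LINQ to XML before serializing. The class name `HtmlStyleStripper` and the choice of `style`, `class` and `dir` as the attributes to drop are illustrative assumptions, not part of PowerTools.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class HtmlStyleStripper
{
    // Attributes treated as presentational (an assumption; adjust to taste).
    private static readonly string[] PresentationalAttributes = { "style", "class", "dir" };

    // Strips inline styling from the tree in place and returns the same element.
    public static XElement StripPresentation(XElement html)
    {
        // Drop embedded <style> blocks; Extensions.Remove snapshots the sequence first.
        html.Descendants()
            .Where(e => e.Name.LocalName == "style")
            .Remove();

        // Snapshot the elements so the tree is not enumerated while being modified.
        foreach (var element in html.DescendantsAndSelf().ToList())
        {
            // Materialize the matching attributes before removing them.
            var toRemove = element.Attributes()
                .Where(a => PresentationalAttributes.Contains(a.Name.LocalName))
                .ToList();
            foreach (var attribute in toRemove)
            {
                attribute.Remove();
            }
        }
        return html;
    }
}
```

Call it on the `XElement` returned by the converter and serialize the result as before; Bootstrap (or your own stylesheet) then styles the bare tags. Note that `xml:space` attributes are kept, because their local name is `space`.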
If you can, use XmlReader and XmlWriter to obtain bare-bones HTML. It is a little overkill, but only the tags and the text content are kept.
using System.IO;
using System.Text;
using System.Xml;

public static class HtmlHelper
{
    /// <summary>
    /// Keeps only the opening and closing tags and the text content of the HTML.
    /// </summary>
    public static string Cleanup(string html)
    {
        var output = new StringBuilder();
        using (var reader = XmlReader.Create(new StringReader(html)))
        {
            var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
            using (var writer = XmlWriter.Create(output, settings))
            {
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            // Attributes (style, dir, ...) are deliberately not copied.
                            writer.WriteStartElement(reader.Name);
                            // Self-closing elements such as <br/> produce no EndElement node,
                            // so close them here to keep the output correctly nested.
                            if (reader.IsEmptyElement)
                            {
                                writer.WriteEndElement();
                            }
                            break;
                        case XmlNodeType.Text:
                            writer.WriteString(reader.Value);
                            break;
                        case XmlNodeType.EndElement:
                            writer.WriteFullEndElement();
                            break;
                    }
                }
            }
        }
        return output.ToString();
    }
}
The resulting output:
<p> <span></span> </p>
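The cleaned output still contains a `<span>` that existed only to carry styling. A hedged sketch, assuming the markup is available as an `XElement`, that unwraps every `<span>` so its text and child elements move up into the parent and empty spans disappear; the class name `SpanUnwrapper` is hypothetical.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SpanUnwrapper
{
    // Replaces every <span> with its own child nodes, so
    // "<p><span>Hi</span></p>" becomes "<p>Hi</p>" and "<span></span>" vanishes.
    public static XElement UnwrapSpans(XElement root)
    {
        // Snapshot the spans in reverse document order so inner spans are
        // flattened before the spans that contain them are replaced.
        var spans = root.Descendants()
            .Where(e => e.Name.LocalName == "span")
            .Reverse()
            .ToList();

        foreach (var span in spans)
        {
            // Child nodes that still have a parent are cloned into the new position.
            span.ReplaceWith(span.Nodes());
        }
        return root;
    }
}
```

Processing in reverse document order is the design choice that makes nested spans work: flattening an outer span first would leave the inner span in the list attached to a detached copy of the tree.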