Strip comments from an XML file and pretty-print it (sh)
You can use tidy:
$ tidy -quiet -asxml -xml -indent -wrap 1024 --hide-comments 1 tomcat-users.xml
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
<user username="qwerty" password="ytrewq" roles="manager-gui" />
</tomcat-users>
Run your XML through an identity-transform XSLT with an empty template for comments.
All of the XML content, except for the comments, will be passed through to the output.
To format the output nicely, set indent="yes" on xsl:output:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <!--Match on Attributes, Elements, text nodes, and Processing Instructions-->
    <xsl:template match="@*| * | text() | processing-instruction()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <!--Empty template prevents comments from being copied into the output -->
    <xsl:template match="comment()"/>
</xsl:stylesheet>
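The answer does not say how to apply the stylesheet. One common option is xsltproc from libxslt, which takes the stylesheet first and the input document second. The following is only a minimal sketch under those assumptions: xsltproc must be installed, and the names strip-comments.xsl and sample.xml, the temporary directory, and the sample content are all made up for illustration.

```shell
#!/bin/sh
# Sketch: apply the comment-stripping identity transform with xsltproc.
# Assumptions: xsltproc (libxslt) is installed; all file names are hypothetical.
workdir=$(mktemp -d)

# The stylesheet from the answer, written to a scratch file.
cat > "$workdir/strip-comments.xsl" <<'EOF'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="@*|*|text()|processing-instruction()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="comment()"/>
</xsl:stylesheet>
EOF

# A small made-up input containing one comment.
cat > "$workdir/sample.xml" <<'EOF'
<tomcat-users>
  <!-- demo account -->
  <user username="qwerty" password="ytrewq" roles="manager-gui"/>
</tomcat-users>
EOF

# Usage: xsltproc STYLESHEET INPUT; the result is written to stdout.
if command -v xsltproc >/dev/null 2>&1; then
    stripped=$(xsltproc "$workdir/strip-comments.xsl" "$workdir/sample.xml")
    printf '%s\n' "$stripped"
else
    echo "xsltproc not found; install libxslt to run this sketch" >&2
fi
```

Any other XSLT 1.0 processor should work just as well; xsltproc is used here only because it is widely packaged.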
You might want to look at the xmllint tool. It has several options (one of which, --format, will do a pretty print), but I can't figure out how to remove the comments using this tool.
Also, check out XMLStarlet, a bunch of command line tools to do anything you would want to with XML. Then do:
xml c14n --without-comments # XML file canonicalization w/o comments
EDIT: OP eventually used this line:
xmlstarlet c14n --without-comments old.xml > new.xml
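Canonicalization removes the comments but does not re-indent the document, so one way to get both results is to pipe the c14n output into xmllint --format, the option mentioned above. This is a sketch, not something the OP reported using: it assumes both xmlstarlet and xmllint are installed, and the temporary directory and sample content are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: strip comments with XMLStarlet, then re-indent with xmllint.
# Assumptions: both tools are installed; on some systems the XMLStarlet
# binary is called xml rather than xmlstarlet.
workdir=$(mktemp -d)

# Made-up single-line input so the effect of --format is visible.
cat > "$workdir/old.xml" <<'EOF'
<tomcat-users><!-- demo account --><user username="qwerty" password="ytrewq" roles="manager-gui"/></tomcat-users>
EOF

if command -v xmlstarlet >/dev/null 2>&1 && command -v xmllint >/dev/null 2>&1; then
    # c14n drops the comments; xmllint reads the result from stdin (-)
    # and pretty-prints it.
    xmlstarlet c14n --without-comments "$workdir/old.xml" \
        | xmllint --format - > "$workdir/new.xml"
    cat "$workdir/new.xml"
else
    echo "xmlstarlet and/or xmllint not found" >&2
fi
```

Keep in mind that canonical XML normalizes more than comments (attribute order, for example), so the result can differ from the input in other ways too.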
To tidy up something simple like Tomcat's server.xml, I use
sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' | grep -zv '^<!--' | tr -d '\0' | grep -v "^\s*$"
I.e.
# Print the XML file given as $1 with comments and blank lines removed (needs GNU sed and grep).
tidy() {
    sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' "$1" | grep -zv '^<!--' | tr -d '\0' | grep -v '^\s*$'
}
tidy server.xml
... will print the XML without comments.
NOTE: while it works reasonably well for simple things, it will fail with certain CDATA blocks and in some other situations. Only use it for controlled XML files that do not, and never will, need to contain a literal <!-- or --> anywhere outside a comment!
First, sed marks each comment's start and end with 0x0 characters. Then grep with -z treats 0x0 as the only line delimiter and searches for lines starting with a comment; its -v inverts the filter, leaving only the meaningful lines. Finally, tr -d '\0' deletes all these 0x0 characters, and to polish it up, another grep removes empty lines: voilà.
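To see the stages working together, here is a small sketch that feeds a made-up document through the same pipeline. It assumes GNU sed and GNU grep, since \x0 in sed and the -z option of grep are GNU extensions; the sample content and the variable names are purely illustrative.

```shell
#!/bin/sh
# Sketch: run the sed/grep/tr pipeline on a small, made-up document.
# Assumption: GNU sed and GNU grep are the sed and grep found on PATH.
sample='<Server port="8005">
  <!-- single-line comment -->
  <Listener className="demo.Listener" />
  <!-- multi-line
       comment -->
  <Service name="Catalina" /> <!-- trailing comment -->
</Server>'

cleaned=$(printf '%s\n' "$sample" \
    | sed 's/<!--/\x0<!--/g;s/-->/-->\x0/g' \
    | grep -zv '^<!--' \
    | tr -d '\0' \
    | grep -v '^\s*$')

# Only the four element lines should remain; all three comments are dropped.
printf '%s\n' "$cleaned"
```

The multi-line comment is handled because, after sed has inserted the markers, grep -z sees the whole comment as a single NUL-delimited record regardless of the newlines inside it.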