This project forked from daniel-sc/xml_normalize

Normalizes XML files. Options include sorting siblings by a given attribute, removing nodes, normalizing whitespace, trimming, and pretty-printing.

XML Normalize

This program normalizes arbitrary XML files. The following normalization steps can be configured:

  • sort sibling elements based on some attribute value
  • remove unwanted nodes
  • trim texts
  • normalize whitespace and line breaks

This can be used as a pre- or post-processing step to keep diffs small for generated XML files.

Usage

Either install via npm i -g xml_normalize or run directly with npx xml_normalize.

Usage: npx xml_normalize [options]

Options:
  -i, --input-file <inputFile>       input file
  -o, --output-file <outputFile>     output file - if not provided result is printed to stdout
  -r, --remove-path <removePath...>  simple XPath(s) to remove elements - e.g. "/html/head[1]/script"
  -s, --sort-path <sortPath>         simple XPath that references an attribute to sort - e.g. "/html/head[1]/script/@src"
  --no-pretty                        Disable pretty format output
  --no-trim                          Disable trimming of whitespace at the beginning and end of text nodes (trims only pure text nodes)
  --no-attribute-trim                Disable trimming whitespace at the beginning and end of attribute values
  -tf, --trim-force                  Trim the whitespace at the beginning and end of text nodes (trims as well text adjacent to nested nodes)
  -n, --normalize-whitespace         Normalize whitespaces inside text nodes and attribute values
  -d, --debug                        enable debug output
  -h, --help                         display help for command

Options and Examples

Sorting

Sorts sibling elements that share a tag name at a specific path, lexicographically by the value of a given attribute.

Example:

<root>
  <node>
    <child id="z">should be last</child>
    <child id="a">should be first</child>
  </node>
  <node>
    <child id="y">should be last</child>
    <child id="b">should be first</child>
  </node>
</root>

npx xml_normalize -s /root/node/child/@id will create:

<root>
  <node>
    <child id="a">should be first</child>
    <child id="z">should be last</child>
  </node>
  <node>
    <child id="b">should be first</child>
    <child id="y">should be last</child>
  </node>
</root>
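
The sorting rule can be illustrated with a minimal sketch. It is not taken from the xml_normalize source: the Elem model and the sortChildren function are hypothetical, and two behaviours are assumptions made for the sketch only (elements without the attribute sort as if it were empty, and siblings with other tag names keep their positions).

```typescript
// Hypothetical minimal element model; the real tool works on a parsed XML document.
interface Elem {
  tag: string;
  attrs: Record<string, string>;
  children: Elem[];
}

// Sorts the children named `tag` by the value of attribute `attr`.
// Children with other tag names stay where they are (assumption).
function sortChildren(parent: Elem, tag: string, attr: string): void {
  // Remember the positions currently occupied by matching children.
  const slots: number[] = [];
  parent.children.forEach((c, i) => {
    if (c.tag === tag) slots.push(i);
  });
  // Sort the matching children by attribute value (plain string comparison).
  const sorted = slots
    .map((i) => parent.children[i])
    .sort((a, b) => {
      const x = a.attrs[attr] ?? ""; // missing attribute treated as "" (assumption)
      const y = b.attrs[attr] ?? "";
      return x < y ? -1 : x > y ? 1 : 0;
    });
  // Write them back into the same positions, now in sorted order.
  slots.forEach((slot, k) => {
    parent.children[slot] = sorted[k];
  });
}

const firstNode: Elem = {
  tag: "node",
  attrs: {},
  children: [
    { tag: "child", attrs: { id: "z" }, children: [] },
    { tag: "child", attrs: { id: "a" }, children: [] },
  ],
};
sortChildren(firstNode, "child", "id");
console.log(firstNode.children.map((c) => c.attrs["id"]).join(",")); // prints "a,z"
```

Applying the same function to each node element in turn reproduces the output shown above.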

Removing

Removes nodes at a specific path.

Example:

<root>
  <node>
    <child id="z">should be removed</child>
    <child id="a">should be removed</child>
  </node>
  <node>
    <child id="y">should stay</child>
    <child id="b">should stay</child>
  </node>
</root>

npx xml_normalize -r /root/node[1]/child will create:

<root>
  <node/>
  <node>
    <child id="y">should stay</child>
    <child id="b">should stay</child>
  </node>
</root>

npx xml_normalize -r /root/node/child, by contrast, will create:

<root>
  <node/>
  <node/>
</root>
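
The difference between the two commands comes down to the 1-based index in the path. The following sketch shows that selection logic on a hypothetical in-memory model; Elem, Step, selectChildren and removeAt are illustrative names, not part of xml_normalize, and predicates other than a numeric index are left out.

```typescript
// Hypothetical minimal model; the real tool operates on a parsed XML document.
interface Elem { tag: string; children: Elem[]; }
interface Step { name: string; index?: number; } // index is 1-based, as in XPath

// Children of `parent` selected by one step: filter by name, then by position.
function selectChildren(parent: Elem, step: Step): Elem[] {
  const named = parent.children.filter(
    (c) => step.name === "*" || c.tag === step.name
  );
  if (step.index === undefined) return named;
  const hit = named[step.index - 1];
  return hit === undefined ? [] : [hit];
}

// Removes every element matched by the steps below `root`.
// steps[0] must name the root element; the root itself is never removed here.
function removeAt(root: Elem, steps: Step[]): void {
  if (steps.length < 2 || steps[0].name !== root.tag) return;
  // Walk the intermediate steps to find the parents of the elements to remove.
  let parents: Elem[] = [root];
  for (const step of steps.slice(1, -1)) {
    const next: Elem[] = [];
    for (const p of parents) next.push(...selectChildren(p, step));
    parents = next;
  }
  // Drop the children selected by the final step from each parent.
  const last = steps[steps.length - 1];
  for (const p of parents) {
    const doomed = selectChildren(p, last);
    p.children = p.children.filter((c) => doomed.indexOf(c) === -1);
  }
}

const leaf = (tag: string): Elem => ({ tag, children: [] });
const tree: Elem = {
  tag: "root",
  children: [
    { tag: "node", children: [leaf("child"), leaf("child")] },
    { tag: "node", children: [leaf("child"), leaf("child")] },
  ],
};
// Equivalent of the path /root/node[1]/child
removeAt(tree, [{ name: "root" }, { name: "node", index: 1 }, { name: "child" }]);
console.log(tree.children.map((n) => n.children.length).join(",")); // prints "0,2"
```

Dropping the index from the second step empties both node elements, matching the second example above.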

Normalize whitespace

This option replaces any run of consecutive spaces, tabs, and line breaks with a single space, in text nodes and attribute values.

Example:

<root>
  <node>
    <child id="z">some    xml
    has messed up 
    formatting
    </child>
      
      
    <child id="sometimes      even attributes are messed 
    up">some more     mess</child>
  </node>
</root>

npx xml_normalize --normalize-whitespace will create:

<root>
  <node>
    <child id="z">some xml has messed up formatting</child>
    <child id="sometimes even attributes are messed up">some more mess</child>
  </node>
</root>
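
The effect on a single text or attribute value can be sketched with a regular expression. This is an illustration of the rule, not the tool's actual implementation; collapseWhitespace and normalizeText are hypothetical names.

```typescript
// Hypothetical helper, not taken from the xml_normalize source.
// Collapses every run of whitespace (spaces, tabs, line breaks) into one space.
function collapseWhitespace(value: string): string {
  return value.replace(/\s+/g, " ");
}

// Collapsing alone can leave one leading or trailing space; trimming
// (enabled by default in the CLI, see --no-trim) removes it.
function normalizeText(value: string): string {
  return collapseWhitespace(value).trim();
}

const messy = "some    xml\n    has messed up \n    formatting\n    ";
console.log(normalizeText(messy)); // prints "some xml has messed up formatting"
```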

Paths for sorting and removing

Paths use a simple subset of XPath.

/ROOT/NODE_NAME[INDEX]/ANOTHER_NODE

Supported:

  • Only absolute paths
  • Index access (note that XPath indices are 1-based!)
  • Simple predicates using the following functions (parameters can be strings in double quotes or XPaths):
    • starts-with(str,prefix)
    • contains(str,contained)
  • Node wildcard - e.g. /root/* to select all nodes in root of any type.
  • Attribute reference in last node - e.g. /root/node/@id.
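
To make the path shape concrete, here is a minimal sketch of a parser for part of this subset: absolute paths, element names, the wildcard, numeric indices, and a trailing attribute reference. It is an illustration only. Step, SimplePath and parseSimplePath are hypothetical names, and the starts-with/contains predicates are deliberately not handled.

```typescript
// Hypothetical types; not the data structures used by xml_normalize itself.
interface Step {
  name: string;   // element name, or "*" for the wildcard
  index?: number; // 1-based position, as in XPath
}
interface SimplePath {
  steps: Step[];
  attribute?: string; // set when the path ends in /@name
}

function parseSimplePath(path: string): SimplePath {
  if (path.charAt(0) !== "/") {
    throw new Error("only absolute paths are supported: " + path);
  }
  const parts = path.slice(1).split("/");
  const result: SimplePath = { steps: [] };
  parts.forEach((part, i) => {
    if (part.charAt(0) === "@") {
      // An attribute reference is only allowed as the last segment.
      if (i !== parts.length - 1) {
        throw new Error("attribute must be the last segment: " + path);
      }
      result.attribute = part.slice(1);
      return;
    }
    // A name (or "*"), optionally followed by a numeric index like [1].
    const match = /^([^\[\]@]+)(?:\[(\d+)\])?$/.exec(part);
    if (match === null) {
      throw new Error("unsupported step in this sketch: " + part);
    }
    const step: Step = { name: match[1] };
    if (match[2] !== undefined) step.index = Number(match[2]);
    result.steps.push(step);
  });
  return result;
}

console.log(JSON.stringify(parseSimplePath("/html/head[1]/script/@src")));
// prints {"steps":[{"name":"html"},{"name":"head","index":1},{"name":"script"}],"attribute":"src"}
```

Splitting on "/" is what keeps the sketch short. It is also why function predicates are out of scope here, since their parameters may themselves be XPaths containing slashes.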

What is this good for?

This helps bring XML into a standardized form, so that changes can easily be spotted in a diff tool or a Git pull request.

For example, you could run it as a post-processing or pre-commit script when regenerating XLIFF translation files (or when getting them back from your beloved translator in a messed-up form).

Contribute

PRs always welcome :-)

Contributors

daniel-sc, tkiefer24, bartcorremans