xmlutils.py

A set of Python scripts for processing XML files serially, namely converting them to other formats (SQL, CSV, JSON). The scripts use ElementTree.iterparse() to iterate through the nodes of an XML file, so the whole DOM never has to be loaded into memory. This lets them churn through large XML files (albeit slowly :P) without memory hiccups. (Note: The XML files are NOT validated by the scripts.)
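The streaming approach described above can be sketched roughly as follows. This is a minimal illustration, not the scripts' actual code: the helper name `iter_records` and the inline sample data are assumptions made for the example, and it only reads the direct children of each record element.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, tag):
    """Yield one dict per <tag> element (hypothetical helper, not part of xmlutils)."""
    # "end" events fire once an element and all of its children have been parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Collect the text of each direct child into a flat record.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Clear the element's children, text and attributes so processed
            # records do not keep their content in memory.
            elem.clear()

# Inline sample standing in for a large file on disk.
sample = io.BytesIO(
    b"<fruits>"
    b"<item><name>apple</name><color>red</color></item>"
    b"<item><name>banana</name><color>yellow</color></item>"
    b"</fruits>"
)
records = list(iter_records(sample, "item"))
```

Because records are yielded one at a time, a caller can write each one out and move on instead of holding the whole document in memory.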

Kailash Nadh, October 2011

License: MIT License

Documentation: http://kailashnadh.name/code/xmlutils.py

xml2csv.py

Convert an XML document to a CSV file.

python xml2csv.py --input "samples/fruits.xml" --output "samples/fruits.csv" --tag "item"

options

--input Input XML document's filename*
--output Output CSV file's filename*
--tag The tag of the node that represents a single record (Eg: item, record)*
--delimiter Delimiter for separating items in a row. Default is , (a comma followed by a space)
--ignore A space separated list of element tags in the XML document to ignore.
--no-header Skip adding the CSV header (list of fields) to the first line; Default is False.
--encoding Character encoding of the document. Default is utf-8
--limit Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--buffer The number of records to be kept in memory before it is written to the output CSV file. Helps reduce the number of disk writes. Default is 1000.
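As a rough picture of what the --buffer option implies, the sketch below collects rows in a list and hands them to the CSV writer in batches. It is an assumption-laden illustration rather than the script's real implementation: `write_buffered` and its parameters are hypothetical names, and the standard `csv` module is used here for the actual writing.

```python
import csv
import io

def write_buffered(rows, out, buffer_size=1000, delimiter=","):
    """Write rows to `out` in batches of `buffer_size`; return the number of flushes."""
    writer = csv.writer(out, delimiter=delimiter)
    pending = []
    flushes = 0
    for row in rows:
        pending.append(row)
        if len(pending) >= buffer_size:
            # Buffer is full: write the whole batch in one call.
            writer.writerows(pending)
            pending.clear()
            flushes += 1
    if pending:
        # Write whatever is left once the input is exhausted.
        writer.writerows(pending)
        flushes += 1
    return flushes

out = io.StringIO()
flushes = write_buffered(([str(i), "x"] for i in range(5)), out, buffer_size=2)
```

A larger buffer means fewer write calls at the cost of holding more records in memory at once.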

xml2sql.py

Convert an XML document to an SQL file.

python xml2sql.py --input "samples/fruits.xml" --output "samples/fruits.sql" --tag "item" --table "myfruits"

options

--input Input XML document's filename*
--output Output SQL file's filename*
--tag The tag of the node that represents a single record (Eg: item, record)*
--ignore A space separated list of element tags in the XML document to ignore.
--encoding Character encoding of the document. Default is utf-8
--limit Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--packet Maximum size of a single INSERT query in MBs. Default is 8. Set based on MySQL's max_allowed_packet configuration.
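One way to think about the --packet limit is as a byte budget per INSERT statement. The sketch below groups rows into multi-row INSERTs and starts a new statement when the budget would be exceeded. It is only an illustration under stated assumptions: `batch_inserts` is a hypothetical helper, the quoting is deliberately naive, the budget is given in bytes (so --packet 8 would correspond to 8 * 1024 * 1024), and a single row larger than the budget is still emitted on its own.

```python
def batch_inserts(table, columns, rows, max_bytes):
    """Yield multi-row INSERT statements that fit within max_bytes where possible."""
    head = "INSERT INTO {} ({}) VALUES ".format(table, ", ".join(columns))
    head_size = len(head.encode("utf-8"))
    batch, size = [], head_size
    for row in rows:
        # Naive quoting for illustration only: double any single quotes.
        values = "(" + ", ".join(
            "'" + str(v).replace("'", "''") + "'" for v in row
        ) + ")"
        cost = len(values.encode("utf-8")) + 2  # +2 covers the ", " separator
        if batch and size + cost > max_bytes:
            # Adding this row would exceed the budget: emit the current batch.
            yield head + ", ".join(batch) + ";"
            batch, size = [], head_size
        batch.append(values)
        size += cost
    if batch:
        yield head + ", ".join(batch) + ";"

statements = list(batch_inserts(
    "myfruits",
    ["name", "color"],
    [("apple", "red"), ("banana", "yellow"), ("kiwi", "green")],
    max_bytes=90,
))
```

Keeping each statement under the server's max_allowed_packet avoids the query being rejected for being too large.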

xml2json.py

Convert XML to JSON.

Unlike xml2sql and xml2csv, xml2json is not a standalone utility but a library. It also supports hierarchies nested to any number of levels.

usage

from xml2json import *

# given an ElementTree Element, return its json
json = xml2json(elem)


# __________ Working with files
# xml2json_file(input_filename, output_filename[optional], prettyprint[True or False], file_encoding[default: utf-8])

# read an xml file and return json
json = xml2json_file("samples/fruits.xml")

# read an xml file and write json to a file
xml2json_file("samples/fruits.xml", "samples/fruits.json")
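To illustrate what converting hierarchies nested to any depth involves, here is a small recursive sketch. It is not xml2json's implementation, and the output shape is an assumption: `element_to_data` is a hypothetical helper that ignores attributes, maps leaf elements to their text, and gathers repeated sibling tags into a list.

```python
import json
import xml.etree.ElementTree as ET

def element_to_data(elem):
    """Recursively turn an Element into nested dicts, lists and strings."""
    children = list(elem)
    if not children:
        # Leaf node: use its text content (attributes are ignored in this sketch).
        return (elem.text or "").strip()
    data = {}
    for child in children:
        value = element_to_data(child)
        if child.tag in data:
            # Repeated tags are gathered into a list.
            if not isinstance(data[child.tag], list):
                data[child.tag] = [data[child.tag]]
            data[child.tag].append(value)
        else:
            data[child.tag] = value
    return data

root = ET.fromstring(
    "<fruits>"
    "<item><name>apple</name><origin><country>India</country></origin></item>"
    "<item><name>kiwi</name></item>"
    "</fruits>"
)
result = json.dumps({root.tag: element_to_data(root)}, sort_keys=True)
```

The recursion bottoms out at leaf elements, so the nesting depth of the output simply follows that of the input document.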