Python scripts for processing XML documents and converting them to SQL, CSV, and JSON

junskang/xmlutils.py

 
 

xmlutils.py

A set of Python scripts for processing XML files serially, mainly for converting them to other formats (SQL, CSV, JSON). The scripts use ElementTree.iterparse() to iterate through the nodes of an XML file, so the whole DOM never has to be loaded into memory. This lets them churn through large XML files without memory hiccups (albeit slowly :P). (Note: The XML files are NOT validated by the scripts.)
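The streaming approach described above can be sketched roughly as follows. This is a minimal illustration of ElementTree.iterparse(), not the code the scripts actually use; the helper name `iter_records` and the sample XML are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield one dict per <tag> element without building the full tree.

    Hypothetical helper for illustration; not part of xmlutils.py.
    """
    # "end" events fire once an element and all its children are parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Map each direct child tag to its text content.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Discard the record's children so memory use stays roughly flat.
            elem.clear()


# Made-up sample data, not taken from the repository's samples folder.
sample = io.BytesIO(
    b"<fruits>"
    b"<item><name>apple</name><color>red</color></item>"
    b"<item><name>banana</name><color>yellow</color></item>"
    b"</fruits>"
)
records = list(iter_records(sample, "item"))
```

Each record is handed to the caller as soon as its closing tag is seen, which is why a large file can be processed without holding all of it in memory.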

Kailash Nadh, October 2011

License: MIT License

Documentation: http://kailashnadh.name/code/xmlutils.py

xml2csv.py

Convert an XML document to a CSV file.

python xml2csv.py --input "samples/fruits.xml" --output "samples/fruits.csv" --tag "item"

options

--input Input XML document's filename*
--output Output CSV file's filename*
--tag The tag of the node that represents a single record (Eg: item, record)*
--delimiter Delimiter for separating items in a row. Default is ", " (a comma followed by a space)
--ignore A space separated list of element tags in the XML document to ignore.
--no-header Skip adding the CSV header (list of fields) to the first line; Default is False.
--encoding Character encoding of the document. Default is utf-8
--limit Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--buffer The number of records to be kept in memory before it is written to the output CSV file. Helps reduce the number of disk writes. Default is 1000.
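The idea behind --buffer and --no-header can be sketched like this. It is an assumption-laden illustration rather than the script's own code: the function `write_buffered` and its parameters are hypothetical, and it uses a single-character delimiter because Python's csv module requires one.

```python
import csv
import io


def write_buffered(records, out, fieldnames, buffer_size=1000, header=True, delimiter=","):
    """Write dict records as CSV, flushing to `out` every `buffer_size` rows.

    Hypothetical helper for illustration; not part of xml2csv.py.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=delimiter,
                            extrasaction="ignore")
    if header:
        writer.writeheader()
    pending = []
    written = 0
    for record in records:
        pending.append(record)
        if len(pending) >= buffer_size:
            # Buffer is full: write all held rows in one go.
            writer.writerows(pending)
            written += len(pending)
            pending.clear()
    if pending:
        # Write whatever is left after the last full buffer.
        writer.writerows(pending)
        written += len(pending)
    return written


# Made-up rows; a tiny buffer size is used only to exercise the flush path.
rows = [
    {"name": "apple", "color": "red"},
    {"name": "banana", "color": "yellow"},
    {"name": "cherry", "color": "red"},
]
out = io.StringIO()
count = write_buffered(rows, out, ["name", "color"], buffer_size=2)
```

A larger buffer means fewer, bigger writes at the cost of holding more records in memory at once.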

xml2sql.py

Convert an XML document to an SQL file.

python xml2sql.py --input "samples/fruits.xml" --output "samples/fruits.sql" --tag "item" --table "myfruits"

options

--input Input XML document's filename*
--output Output SQL file's filename*
--tag The tag of the node that represents a single record (Eg: item, record)*
--ignore A space separated list of element tags in the XML document to ignore.
--encoding Character encoding of the document. Default is utf-8
--limit Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--packet Maximum size of a single INSERT query in MBs. Default is 8. Set based on MySQL's max_allowed_packet configuration.
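To illustrate what --packet is for, here is a rough sketch of splitting rows into multi-row INSERT statements that each stay under a byte limit. The function `batch_inserts`, its naive quoting, and the sample rows are all assumptions for the example; the real script may build and escape its queries differently.

```python
def batch_inserts(table, columns, rows, max_packet_bytes=8 * 1024 * 1024):
    """Yield multi-row INSERT statements, each at most max_packet_bytes long.

    Hypothetical helper for illustration; not part of xml2sql.py.
    A single row larger than the limit is still emitted on its own.
    """
    def quote(value):
        # Naive escaping for illustration only; prefer a database driver.
        return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"

    head = "INSERT INTO {} ({}) VALUES ".format(table, ", ".join(columns))
    base_size = len(head.encode("utf-8")) + 1  # +1 for the trailing ';'
    values, size = [], base_size
    for row in rows:
        tup = "(" + ", ".join(quote(v) for v in row) + ")"
        tup_size = len(tup.encode("utf-8")) + 2  # +2 for the ", " separator
        if values and size + tup_size > max_packet_bytes:
            # Adding this row would exceed the limit: emit what we have.
            yield head + ", ".join(values) + ";"
            values, size = [], base_size
        values.append(tup)
        size += tup_size
    if values:
        yield head + ", ".join(values) + ";"


# Made-up rows; the 80-byte limit is unrealistically small, chosen to force splits.
fruit_rows = [("apple", "red"), ("banana", "yellow"), ("cherry", "red")]
one_query = list(batch_inserts("myfruits", ["name", "color"], fruit_rows))
small_queries = list(batch_inserts("myfruits", ["name", "color"], fruit_rows,
                                   max_packet_bytes=80))
```

Keeping each statement below the server's max_allowed_packet avoids the query being rejected when the generated file is imported.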

xml2json.py

Convert XML to JSON.

Unlike xml2sql and xml2csv, xml2json is not a standalone utility but a library. It also supports hierarchies nested to any number of levels.

usage

from xml2json import *

# given an ElementTree Element, return its json
json = xml2json(elem)


# __________ Working with files
# xml2json_file(input_filename, output_filename[optional], prettyprint[True or False], file_encoding[default: utf-8])

# read an xml file and return json
json = xml2json_file("samples/fruits.xml")

# read an xml file and write json to a file
xml2json_file("samples/fruits.xml", "samples/fruits.json")
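For readers curious how arbitrarily nested elements can map onto JSON, here is a small recursive sketch. It is not the library's implementation and its output shape may differ from what xml2json produces; the function `element_to_dict` is hypothetical, and it ignores attributes and mixed text for brevity.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element's children into nested dicts and lists.

    Hypothetical helper for illustration; not part of xml2json.py.
    """
    children = list(elem)
    if not children:
        # Leaf node: use its text content.
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Made-up sample data, not taken from the repository's samples folder.
root = ET.fromstring(
    "<fruits>"
    "<item><name>apple</name><origin><country>India</country></origin></item>"
    "<item><name>banana</name></item>"
    "</fruits>"
)
json_text = json.dumps({root.tag: element_to_dict(root)}, indent=2)
```

Because the function calls itself for every child, the nesting depth of the output simply follows the nesting depth of the XML.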
