Australian government public data extraction, formatting and exploratory analysis.
The main goal is to understand what data Australia makes available on its data portal (data.gov.au). The planned steps are:
- Create the dataset.
- Perform some exploratory analysis using summaries and visualisations.
The dataset will be created by web scraping the data portal. The programming language is yet to be decided (R or Python).
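As a rough illustration of the scraping step, the sketch below parses a dataset listing into one row per dataset and file format. Python is used here only for illustration, since the language is still undecided. The HTML in `SAMPLE`, the class names `dataset`, `title` and `format`, and the helpers `parse_listing` and `tidy_rows` are all assumptions made for this example; they do not describe the portal's real markup, which would need to be inspected first.

```python
from html.parser import HTMLParser


class DatasetListingParser(HTMLParser):
    """Collect titles and file formats from a HYPOTHETICAL listing page.

    Assumed markup (not the portal's real HTML):
      <li class="dataset"><a class="title">...</a>
        <span class="format">CSV</span> ... </li>
    """

    def __init__(self):
        super().__init__()
        self.records = []   # one dict per dataset found so far
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if tag == "li" and css_class == "dataset":
            # A new dataset entry starts here.
            self.records.append({"title": "", "formats": []})
        elif self.records and css_class in ("title", "format"):
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "title":
            self.records[-1]["title"] = text
        else:
            # Normalise case so "csv" and "CSV" count as the same format.
            self.records[-1]["formats"].append(text.upper())

    def handle_endtag(self, tag):
        # Any closing tag ends the field being read.
        self._field = None


def parse_listing(html):
    """Return a list of {"title": ..., "formats": [...]} dicts."""
    parser = DatasetListingParser()
    parser.feed(html)
    parser.close()
    return parser.records


def tidy_rows(records):
    """Flatten to one row per (dataset, format) pair."""
    return [
        {"title": r["title"], "format": f}
        for r in records
        for f in r["formats"]
    ]


# Invented sample page, standing in for a downloaded listing.
SAMPLE = """
<ul>
  <li class="dataset"><a class="title" href="#">Rainfall</a>
    <span class="format">csv</span><span class="format">JSON</span></li>
  <li class="dataset"><a class="title" href="#">Roads</a>
    <span class="format">shp</span></li>
</ul>
"""

rows = tidy_rows(parse_listing(SAMPLE))
for row in rows:
    print(row)
```

Keeping one row per dataset and format is one possible tidy layout; it makes counting formats straightforward, at the cost of repeating the dataset title.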
As the goal is to understand the main characteristics and features of the data, the expected results are, first, a tidy dataset and, second, some insights into what kinds of data are uploaded to the portal, how many different formats there are and how they are distributed, what the main tags or categories of the downloadable datasets are, how new or old the data is, and other questions that emerge during the analysis. The dataset to be created is very basic, so the expected results are also fairly basic.
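The summaries mentioned above could start as simple frequency counts. The sketch below is only an illustration under assumptions: the three records in `datasets` are invented, and the field names `formats`, `tags` and `modified` are hypothetical stand-ins for whatever the scraped dataset ends up containing.

```python
from collections import Counter
from datetime import date

# Invented records, standing in for the scraped dataset (field names are assumptions).
datasets = [
    {"title": "Rainfall", "formats": ["CSV", "JSON"],
     "tags": ["weather"], "modified": "2015-03-01"},
    {"title": "Roads", "formats": ["SHP"],
     "tags": ["transport", "spatial"], "modified": "2012-07-15"},
    {"title": "Bus stops", "formats": ["CSV"],
     "tags": ["transport"], "modified": "2015-11-30"},
]


def summarise(records):
    """Count formats, tags and last-modified years across all records."""
    format_counts = Counter(f for r in records for f in r["formats"])
    tag_counts = Counter(t for r in records for t in r["tags"])
    year_counts = Counter(
        date.fromisoformat(r["modified"]).year for r in records
    )
    return format_counts, tag_counts, year_counts


formats, tags, years = summarise(datasets)
print("Formats:", formats.most_common())
print("Tags:", tags.most_common())
print("Last modified, by year:", sorted(years.items()))
```

Counts like these can be passed directly to a bar chart for the visualisation step.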