Project 1: File Format Conversions

Description

Converting between three plain-text data representations: CSV, JSON, and XML. You need to write Python 3 code in the scaffold provided that can read in a given file, store its contents in some common internal representation, and output them in the desired format. You may only use built-in modules of Python 3, and we strongly encourage you to make use of the "csv", "json", and "xml.etree" modules.

What needs to be done

In your git repository, in the "Project1" directory, there is a file named "project.py". There are 6 functions that need to be implemented. The "project.py" file is the only file that should be modified for this project (other than "README.txt").

To run your code, I've provided a second Python file that imports the "project.py" module, named "cli.py" (short for command-line interface). The "cli.py" script does the work of figuring out which of the functions you've written should be called for the file conversion. It is a Python executable that takes two arguments:

  1. an input file path (to be converted)
  2. an output file path (to be created)

The input and output files must have an extension of csv, json, or xml. The script infers the desired conversion from the file extensions provided.
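You don't need to write this inference yourself, since "cli.py" already handles it, but as an illustration, a minimal sketch of how extensions might be mapped to a conversion could look like this (the function name and error handling here are assumptions, not the actual "cli.py" code):

```python
import os

def infer_conversion(input_path, output_path):
    """Illustrative only: return (source, destination) formats
    based on the file extensions of the two paths."""
    src = os.path.splitext(input_path)[1].lstrip(".").lower()
    dst = os.path.splitext(output_path)[1].lstrip(".").lower()
    allowed = {"csv", "json", "xml"}
    if src not in allowed or dst not in allowed:
        raise ValueError("extensions must be csv, json, or xml")
    return src, dst

# infer_conversion("input.csv", "output.json") -> ("csv", "json")
```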

To run the executable on a specific file, run the command:

./cli.py Test_Suite/convert_to_json.01.simple.csv output.json

If you've implemented the functions in "project.py", this should create a file named "output.json", and this file should match (character for character) the file "Test_Suite/01.simple.json".

In addition to implementing the 6 functions in "project.py", you need to create a "README.txt" in the Project1 directory. The "README.txt" file must contain: your name, your MSU Net ID, and any sources you drew from for implementing this project.

You can only edit two files for this project, "project.py" and "README.txt".

Testing

To facilitate the successful completion of this project, there is a script called "run_tests.py". It runs all of the tests in the Test_Suite, reports the success/failure state for each, and yields a tentative grade. Command to run the tests:

./run_tests.py

You can also run an individual test using the "run_single_test.py" script. Example:

./run_single_test.py Test_Suite/convert_to_json.01.simple.csv

This will show the output from your conversion. You can specify the flags --input and --correct to display the test input contents and the correct (expected) output from the test.

Details

Order of Columns/Attributes

Although the order of the columns/attributes doesn't matter in practice, for ease of testing, order the columns/attributes of each file lexicographically.
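Python's built-in sorted() gives lexicographical (string) order directly, so one way to get a consistent column ordering is:

```python
# Column names in arbitrary order, as they might come from an input file.
columns = ["name", "id", "zip"]

# sorted() on strings compares them lexicographically.
ordered = sorted(columns)
# ordered == ["id", "name", "zip"]
```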

CSV Specific Instruction

You need to have a header line denoting the columns.
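One way to handle the header line is with csv.DictWriter and csv.DictReader from the built-in "csv" module. This sketch assumes a list-of-dicts internal representation, which is a common choice but not mandated by the scaffold:

```python
import csv
import io

# Assumed internal representation: one dict per record.
records = [{"id": "1", "name": "Josh Nahum"},
           {"id": "2", "name": "Tyler Derr"}]

# Write with a header row, columns in lexicographical order.
columns = sorted(records[0].keys())
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns)
writer.writeheader()
writer.writerows(records)
csv_text = buf.getvalue()

# Read it back: DictReader uses the header line as the keys.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
```

Compare your output against the files in Test_Suite to confirm details such as line endings, since the grader checks output character for character.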

JSON Specific Instructions

The tabular JSON format that you will be using for input and output involves a file with a single array. Each record (e.g. a line in a CSV file) becomes one object in the JSON array. A record object has a property (i.e. key) for each column, whose value is that record's value for that column.
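With the built-in "json" module, that format can be produced from a list of dicts. Whether the grader expects compact separators is an assumption here; compare against the files in Test_Suite:

```python
import json

records = [{"id": "1", "name": "Josh Nahum"},
           {"id": "2", "name": "Tyler Derr"}]

# sort_keys=True puts the properties in lexicographical order;
# compact separators avoid extraneous whitespace.
text = json.dumps(records, sort_keys=True, separators=(",", ":"))
# text == '[{"id":"1","name":"Josh Nahum"},{"id":"2","name":"Tyler Derr"}]'

# json.loads reverses the process for reading input files.
parsed = json.loads(text)
```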

XML Specific Instructions

XML gives you a lot of freedom in how you structure your data. For this project, each XML file should have a single "data" node, with as many "record" nodes within it as needed. In each record, there will be a column node corresponding to each column. In each column node, the text content should be that record's value for that column. There should be no attributes on any element.

Example:

<data>
     <record>
          <id>1</id>
          <name>Josh Nahum</name>
     </record>
     <record>
          <id>2</id>
          <name>Tyler Derr</name>
     </record>
</data>

Note: This output has been prettified, the correct output is all on one line (no extraneous whitespace).
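A structure like the example above can be built with the built-in "xml.etree.ElementTree" module. This is a sketch assuming a list-of-dicts internal representation; ET.tostring adds no indentation, so the result naturally stays on one line:

```python
import xml.etree.ElementTree as ET

records = [{"id": "1", "name": "Josh Nahum"},
           {"id": "2", "name": "Tyler Derr"}]

data = ET.Element("data")
for record in records:
    node = ET.SubElement(data, "record")
    for column in sorted(record):            # lexicographical column order
        child = ET.SubElement(node, column)  # one node per column, no attributes
        child.text = record[column]

# encoding="unicode" returns a str instead of bytes; no whitespace is added.
xml_text = ET.tostring(data, encoding="unicode")
```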

Testing your code

CSE Servers

As the autograder will be run on the CSE servers ("arctic" specifically), you are strongly encouraged to test your code on the CSE servers to ensure your results match the grade we will give. You can run the provided Test_Suite using "./run_tests.py" inside the directory for this project. Note: the grade given is tentative and we reserve the right to add more tests if problems arise. But if you pass all the tests (and you've tested your code manually), you will very likely get full credit.

Continuous Integration

As we are using GitHub for project turn-in, we can only grade the work that you commit and push to GitHub. Each time you push commits to GitHub, go to https://travis-ci.com/ to see the tests run on the latest commit (the content we will be grading). These tests should match the results from running the test script locally. If they do not, you haven't committed and/or pushed all of your code to GitHub. Pushes after the deadline will be considered late, so ensure that Travis shows the test results that you expect prior to the deadline.

Submission and Late Turn In

At the time listed as the due date for the project, we will fetch the latest commit on the "master" branch of your GitHub repos. This commit will be graded and you should receive an email informing you of the results within a few hours.

To handle late submissions, the autograder will run every 24 hours for the next 4 days, fetching the latest commit on GitHub and grading it. If there is an improvement in grade (despite the late penalty), you should receive an email informing you of the grade change. No additional action is needed on your part except finishing the project within the late window.

What are the example datasets?

There are three example data files provided for you (from https://support.spatialkey.com/spatialkey-sample-csv-data/ with some minor modifications):

  • "*crime*" contains 7,584 crime records, as made available by the Sacramento Police Department.
  • "*sales*" contains some “sanitized” sales transactions during the month of January.
  • "*realestate*" is a list of 985 real estate transactions in the Sacramento area reported over a five-day period, as reported by the Sacramento Bee.