Project 3: UPDATE, WHERE, JOIN

Description

This project will be extending the "project" module, which solves Project 2, and emulates the behaviour of the built in module, "sqlite3". Your module will be able to execute SQL statements corresponding to: (from Project 2) creating tables, inserting rows, and selecting rows; (and adding) various syntax improvements, updating and deleting rows, joins and the distinct keyword.

Files in Project3/

  • Test_Suite/: Folder containing test SQL files. Each file contains a number of SQL statements.
  • project.py: You need to supply this file. You should probably copy the project.py file from Project2/ or from Project2_solution/.
  • cli.py: This file imports your project.py. It takes 2 required arguments (a test_sql_file and an output_file) and one optional argument (--sqlite). When executed with ./cli.py test.sql output.txt, it loads the test.sql file, passes it to the database (your project.py file) and writes a report to output.txt. If you give it the optional argument (./cli.py --sqlite test.sql output.txt), it passes the SQL statements to sqlite instead. You can use this to see the correct output.
  • run_tests.py: Run this file ("./run_tests.py") to run every test in Test_Suite with cli.py and get your current grade.
  • README.txt: You need to create this file, and it should contain your name, MSU email, feedback, and any sources beyond this course you used.

What you need to do

All of the code you write for this project must be in "project.py", you may not modify any of the existing tests or test runners.

"cli.py" imports "project.py" and expects to find a "connect" function with one parameter. It passes a filename (specifically ":memory:"), which isn't used in this project. The "connect" function should return an object that has two methods "execute" and "close". For each SQL statement in a test file, "cli.py" will pass that SQL statement as a string to the "execute" method. The method should return an empty list, unless a select statement was executed that yielded rows. In that case, a list of tuples with each denoting a row, should be returned. The "close" method doesn't do anything, yet.

There is no need to write any database output to any files. Persistence will be covered in a later project. Each test will be done on a clean, empty database.

Your program is allowed to use any built in modules of python, with the exception of sqlite3. sqlite3 is the reference implementation for this project, your output will be compared against it for correctness. Importing the sqlite3 module in project.py will be considered academic dishonesty for these projects.

You may (and should) write additional functions and classes in "project.py" to perform the needed actions.

SQL Keywords

All SQL keywords will be in ALL_CAPS. For this project, the SQL keywords you need to handle are as follows (red are new for this project):

CREATE TABLE
INSERT INTO VALUES
SELECT FROM ORDER BY
UPDATE SET DELETE
WHERE DISTINCT
LEFT OUTER JOIN

New SQL Syntax

Extensions over Project 2

  • Identifiers (table and column names) must start with a letter (a-zA-Z), but also can have underscores (_), numbers (0-9), and additional letters.
  • Single quoted strings may have single quotes in them (escaped with a preceding single quote), you need to store and return the python string after removing this escaping. Example: 'My dog''s name' -> "My dog's name".
  • Any time a column name is used it may be qualified with a table name ("id" is the same thing as "students.id" if the table being SELECT'ed is "students").
  • The "*" column name may be qualified or included as one of multiple columns (SELECT *, name ..., SELECT student.*, name ...).

INSERT INTO

The INSERT INTO statement can specify the columns and order of the VALUES being inserted: INSERT INTO students (id, name) VALUES ... If not all the columns of a table are specified, the absent columns will have NULL inserted. Multiple rows can be inserted with a single INSERT INTO statement: INSERT INTO students (name, grade) VALUES ('Josh', 3.7), ('Tyler', 2.5), ('Hangchen', 3.9);

WHERE clause

SELECT (as well as, UPDATE and DELETE) statements can have a WHERE clause that specifies what rows should be processed. SELECT * WHERE id > 4; To make things easier, all WHERE clauses will be in this form: WHERE column_name operator value The column_name may be qualified. The operator will be one of: >, <, =, !=, IS NOT, IS. The value will be a constant (not a different column or expression). There won't be any parentheses, ANDs or ORs in the projects.

DELETE statement

DELETE works much like UPDATE, but instead removes all rows from a table, (unless a WHERE clause is added, in that case only removes the rows which pass the predicate). DELETE FROM students; DELETE FROM students WHERE id > 4;

UPDATE statement

You will need to add the UPDATE statement: UPDATE table_name SET col1 = value1, col2 = value2; An UPDATE statement changes the associated columns to the value. For simplicity, the value will always be a constant. If a WHERE clause is added, only those rows will be updated (see above). UPDATE student SET grades=4.0 WHERE name = 'Josh';

DISTINCT keyword

The DISTINCT keyword specifies you only want the unique values (no duplicates). For simplicity, we the tests will only use the DISTINCT keyword with a single output column. Example: SELECT DISTINCT column_name FROM ... ORDER BY column_name;

LEFT OUTER JOIN clause

The LEFT OUTER JOIN is the only join you need to implement for this project. If will be of the form: SELECT "columns" FROM "table_a" LEFT OUTER JOIN "table_b" ON "column from table_a" = "column from table_b" ORDER BY "columns"; As shown above, the ON clause will always match on equality with a column from the left table then the right table.