CS 242 Fall 2009 : Assignment 3.1

This page last changed on Oct 12, 2009 by cemeyer2.

Assignment 3.1 – Parsing Data Feeds Into a Database

Goals:

  1. Learn the basics of a web scripting language such as php
  2. Learn the basics of parsing data feeds
  3. Learn the basics of databases such as MySQL

For this assignment, please commit your code to a directory named Assignment3.1 in your subversion repository.

Language: a web scripting language, XML, and SQL

Note: To complete this assignment, you will need access to a web server that can run pages generated by a scripting language such as php, perl, or jsp. You will also need access to a database server to store the data that you parse. As part of your enrolment in this course, you have been provided with an account on the csil-projects server. Csil-projects provides each student with web space to run php code and access to a single MySQL database. Csil-projects is the only supported platform for this assignment; if you wish to host your code somewhere else, it is up to you to set up the infrastructure. You must also be able to demonstrate your code in class.

This week we will be parsing and storing data; next week we will be taking that data from your database and mashing it with a web 2.0 technology such as Google Maps.

Csil-projects

Documentation from TSG on csil-projects can be found at https://agora.cs.illinois.edu/display/CSIL/csil-projects.cs.uiuc.edu. There is one important change to note: database names are of the form cs242_netid, so my database would be named cs242_cemeyer2. Passwords are of the form netiddbpw, so my database password would be cemeyer2dbpw. Please contact me if you cannot access your web space or database. Note that for this course we will be using the MySQL database, not Oracle.

Intro to php

Here are several good tutorials to get you started on the basics of php:

Intro to MySQL

For this assignment, you will only need to create your tables, insert data into them, and select from them.
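
Those three operations map onto three SQL statements. Here is a minimal sketch using php's PDO extension, assuming a hypothetical table named awards with columns recipient and amount. The table, the column names, and the function names are illustrative assumptions, not part of the assignment; design your own table around the feed you pick.

```php
<?php
// Hypothetical table and column names; adapt them to the feed you choose.

// Create the table if it does not exist yet.
function createAwardsTable(PDO $db)
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS awards (
            recipient VARCHAR(255) NOT NULL,
            amount    DECIMAL(15,2) NOT NULL
        )'
    );
}

// Insert one row. The prepared statement keeps feed data out of the SQL text.
function insertAward(PDO $db, $recipient, $amount)
{
    $stmt = $db->prepare('INSERT INTO awards (recipient, amount) VALUES (?, ?)');
    $stmt->execute(array($recipient, $amount));
}

// Select every row as an array of associative arrays.
function selectAwards(PDO $db)
{
    return $db->query('SELECT recipient, amount FROM awards')
              ->fetchAll(PDO::FETCH_ASSOC);
}
```

To use these against MySQL you would pass in a connection such as new PDO('mysql:host=localhost;dbname=cs242_netid', 'netid', 'netiddbpw'). The host name and user name shown here are assumptions, so check the TSG documentation for the real values.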

The assignment

Your job for this week is to take one of the given data feeds below and parse it into one or more database tables. You will then create a simple page that shows the data from your database in table form.

Steps:

  1. pick an xml feed
  2. design your database table
  3. write php to fetch the feed, parse the feed, and insert it into the database
  4. write php to select all the data in your table and show it in table form (we learned HTML tables in assignment 1.2)
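
For step 4, one possible approach is a small function that turns the selected rows into an HTML table. This is only a sketch: the function name renderTable is hypothetical, and it assumes each row is an associative array keyed by column name (which is what PDO returns with PDO::FETCH_ASSOC). Values are escaped with htmlspecialchars so that feed data cannot break the page markup.

```php
<?php
// Build an HTML table from rows fetched as associative arrays.
// The column names of the first row become the header cells.
function renderTable(array $rows)
{
    if (count($rows) === 0) {
        return "<p>No rows found.</p>\n";
    }
    $html = "<table>\n<tr>";
    foreach (array_keys($rows[0]) as $column) {
        $html .= '<th>' . htmlspecialchars((string) $column) . '</th>';
    }
    $html .= "</tr>\n";
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $value) {
            // Escape every value so characters like < and & display literally.
            $html .= '<td>' . htmlspecialchars((string) $value) . '</td>';
        }
        $html .= "</tr>\n";
    }
    return $html . "</table>\n";
}
```

The display page would then simply echo the result of renderTable() on whatever rows your select query returns.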

It is sufficient for this week to have the php that parses your feed simply do its job when the page loads, but be careful not to duplicate data in your tables if you run your code multiple times. One idea might be to have your php drop the table if it exists and then recreate it each time the parser page is invoked. Also, if your API has multiple parameters, such as the year of data to fetch, the state to fetch for, etc., either have your parser take those parameters as input or parse all possibilities into your database. The documentation for each feed should be attached to each feed.
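
Both ideas above can be sketched in a few lines. The first function drops and recreates a hypothetical awards table so that reloading the parser page never duplicates rows; the second builds a feed URL from an array of parameters using http_build_query. The table layout and both function names are assumptions made for illustration.

```php
<?php
// Start from an empty table on every run so repeated page loads
// do not insert the same feed data twice. (Hypothetical table layout.)
function resetAwardsTable(PDO $db)
{
    $db->exec('DROP TABLE IF EXISTS awards');
    $db->exec(
        'CREATE TABLE awards (
            recipient VARCHAR(255) NOT NULL,
            amount    DECIMAL(15,2) NOT NULL
        )'
    );
}

// Build the feed URL from parameters; http_build_query URL-encodes the values.
function buildFeedUrl($baseUrl, array $params)
{
    return $baseUrl . '?' . http_build_query($params, '', '&');
}
```

A parser page could read a value such as the fiscal year from $_GET, pass it to buildFeedUrl(), and call resetAwardsTable() before inserting. Note that dropping the whole table only makes sense if each run reloads everything you want to keep.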

To fetch a remote data feed into a string, use something like:

<?php
# enable error reporting, otherwise errors will not be printed to screen
ini_set('display_errors', '1');
error_reporting(E_ALL);

# ....some code before this....

$feed = file_get_contents('http://www.usaspending.gov/faads/faads.php?datype=X&detail=2&recipient_name=Smith&fiscal_year=2006');

# file_get_contents returns false on failure, so stop cleanly if the fetch did not work
if ($feed === false) {
    die('Could not fetch the data feed.');
}

# $feed now contains the contents of the feed, as a string that can be parsed
# ....some code after this....
?>
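
Once the feed is in a string, php's SimpleXML extension is one way to parse it. The sketch below assumes a made-up feed layout with record elements containing recipient and amount children; these element names are hypothetical, so substitute the ones documented for your feed. The function returns null if the string is not well-formed XML.

```php
<?php
// Parse an XML string into a list of rows ready for insertion.
// The element names <record>, <recipient>, and <amount> are hypothetical.
function parseRecords($xml)
{
    // Collect parse errors internally instead of printing warnings.
    libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return null; // not well-formed XML
    }
    $rows = array();
    foreach ($doc->record as $record) {
        $rows[] = array(
            'recipient' => (string) $record->recipient,
            'amount'    => (float) (string) $record->amount,
        );
    }
    return $rows;
}

// A small hand-written sample standing in for a real feed.
$sample = '<records>'
        . '<record><recipient>Smith</recipient><amount>12.50</amount></record>'
        . '<record><recipient>Jones</recipient><amount>7</amount></record>'
        . '</records>';
$rows = parseRecords($sample);
```

Each entry in $rows is an associative array, so it can be handed straight to a prepared INSERT statement.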

Data Feeds

For this assignment, we will be using government-generated data to parse into our database. Note that some feeds may require you to sign up for an API key before you can access them. This should be a short process. In addition, think about how you might mash the data with a web 2.0 technology such as Google Maps, Facebook, or Twitter, since this will be the assignment for next week. Also, make sure you pick a feed that is in XML format, not another format such as CSV or KML.

Rubric

  • Usability
    • Your program exits cleanly in all cases: 2
    • It is clear how your program should be used: 2
    • There are no unwanted side effects (e.g. clobbered files): 2
  • Documentation
    • Your code contains ample comments: 4
    • Your program provides useful feedback to the user: 4
  • Code Design
    • Your code is modular: 4
    • Functions only do one thing: 2
    • Each function is no longer than 25 lines of code: 2
    • There were no code smells in your code (-2 for each): 10
    • Your database design makes sense for your data feed: 4
  • Functionality
    • You can read in a remote data feed: 4
    • You can parse the xml: 4
    • You can insert the data into the database: 4
    • You wrote a simple page to display the data in the database: 4
  • Participation
    • You actively participated in discussion: 4
    • You came prepared based on your assigned code reading: 4
Total 56 Points
