CMPU 101 - Assignment 8

Due: by 11:59 PM on Monday, May 8th

$Revision: 1.5 $

In this assignment you will write a program to tabulate the popularity of baby names in different decades, based on data from the Social Security Administration.

Acknowledgment: the idea for this project comes from Nick Parlante at Stanford University.

The Data

In this project you will read data from a large text file, babynames.txt.  Each line of the file is a single record.  Each record represents the popularity of a single name in a particular decade, and consists of the following data items:

Decade
    The decade
Rank
    The popularity of the name (1 for most popular, 1000 for least popular)
Sex
    Either 'M' or 'F'
Name
    The name
Count
    The total number of babies given this name

The data items in a record are separated by colon (':') characters.  The first line of the file is:

1900:1:M:John:84602

This means that in the 1900s, John was the most popular name recorded for boys (Rank = 1), and that 84602 babies were given this name (Count = 84602).
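To make the record layout concrete, here is a minimal sketch that breaks the first line of the file into its five data items.  The class name RecordParseSketch and its field names are made up for illustration, and it uses String.split rather than the Scanner approach suggested later in this handout.

```java
// Illustrative only: the class and field names are assumptions, not part of the assignment.
class RecordParseSketch {
    final int decade;
    final int rank;
    final char sex;
    final String name;
    final int count;

    RecordParseSketch(int decade, int rank, char sex, String name, int count) {
        this.decade = decade;
        this.rank = rank;
        this.sex = sex;
        this.name = name;
        this.count = count;
    }

    // Splits one colon-separated line into Decade, Rank, Sex, Name, and Count.
    static RecordParseSketch parse(String line) {
        String[] parts = line.trim().split(":");
        if (parts.length != 5) {
            throw new IllegalArgumentException("Expected 5 data items: " + line);
        }
        return new RecordParseSketch(
                Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                parts[2].charAt(0),
                parts[3],
                Integer.parseInt(parts[4]));
    }

    public static void main(String[] args) {
        RecordParseSketch r = parse("1900:1:M:John:84602");
        // prints "John ranked 1 in the 1900s"
        System.out.println(r.name + " ranked " + r.rank + " in the " + r.decade + "s");
    }
}
```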

Your Task

Write a program that performs the following tasks:

  1. Prompts the user for the name of the file containing the baby name records.  Reads the file into a data structure containing each record.

  2. Prompts the user for a name and sex (M/F).

  3. Plots a graph showing the popularity of the name for each decade, from the 1900s through the 1990s.  Each point on the graph is plotted as a single line of text as shown below.

  4. After plotting the graph, asks the user whether another name should be plotted.  If the user answers yes, returns to step 2; otherwise the program ends.
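Steps 2 through 4 form a loop, which could be organized roughly as in the sketch below.  This is only one possible structure: the class name PromptLoopSketch and the method run are invented for illustration, the prompt strings are copied from the example transcript in this handout, and the plotting itself is left as a comment.

```java
import java.io.PrintStream;
import java.util.Scanner;

// Illustrative only: PromptLoopSketch and run are assumed names.
class PromptLoopSketch {
    // Repeats steps 2-4 until the user answers something other than "y".
    // Returns the number of plots requested.
    static int run(Scanner in, PrintStream out) {
        int plots = 0;
        String answer;
        do {
            out.print("Name to plot: ");
            String name = in.nextLine().trim();
            out.print("Male or female? (M/F): ");
            String sex = in.nextLine().trim().toUpperCase();
            out.println("Ranks for " + name + " (" + sex + ")");
            // The graph for this name and sex would be plotted here (step 3).
            plots++;
            out.print("Another plot? (y/n): ");
            answer = in.nextLine().trim();
        } while (answer.equalsIgnoreCase("y"));
        return plots;
    }

    public static void main(String[] args) {
        run(new Scanner(System.in), System.out);
    }
}
```

Passing the Scanner and PrintStream as parameters is a design choice that makes the loop easy to try out with canned input; reading directly from System.in inside main would work just as well.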

Here is an example transcript (the user's input is the text following each prompt):

Name of baby names file: babynames.txt
Reading file...
Done.
Name to plot: Eli
Male or female? (M/F): M
Ranks for Eli (M)
     |---------|---------|---------|---------|---------|---------|---------|---------|
1900: --------------------------------------------------------* (290)
1910: ------------------------------------------------------* (326)
1920: --------------------------------------------------* (374)
1930: -------------------------------------------* (455)
1940: -----------------------------------* (558)
1950: ----------------------------* (640)
1960: --------------------------* (674)
1970: ---------------------------------------------* (428)
1980: ----------------------------------------------------* (351)
1990: ------------------------------------------------------* (321)
Another plot? (y/n): y
Name to plot: Madison
Male or female? (M/F): F
Ranks for Madison (F)
     |---------|---------|---------|---------|---------|---------|---------|---------|
1900: * (1001)
1910: * (1001)
1920: * (1001)
1930: * (1001)
1940: * (1001)
1950: * (1001)
1960: * (1001)
1970: * (1001)
1980: -------------------------------------* (538)
1990: -----------------------------------------------------------------------------* (29)
Another plot? (y/n): n

This session shows plots for two names: Eli (male) and Madison (female).  The popularity of a name in one decade is plotted as a "bar" consisting of a number of dash ('-') characters, followed by an asterisk ('*'), followed by the name's rank in that decade in parentheses.  The maximum length of any bar should be 80 dashes (for a name that ranks number 1 in the decade).  The minimum length should be 0 (for a name that is the least popular, or is not ranked).  Ranks that fall between least and most popular should be plotted with a length between 0 and 80, based on where the rank lies proportionally between least popular and most popular.  For example, a name that ranks 500 should be plotted as length 40.

An unranked name (not appearing at all for a particular decade) should be considered the least popular, with rank equal to 1001.  In the example above, the name "Madison" for girls does not appear in the data until the 1980s.
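The handout does not give an exact formula for the bar length.  The sketch below shows one linear mapping that agrees with the stated endpoints (rank 1 gives 80, ranks 1000 and 1001 give 0) and with the rank 500 example; treat it as an assumption, not as the required method.  The names BarSketch, barLength, and plotLine are invented for illustration.

```java
// Illustrative only: the formula and all names here are assumptions.
class BarSketch {
    static final int MAX_BAR = 80;     // length of the bar for rank 1
    static final int UNRANKED = 1001;  // rank used for a name that does not appear

    // Maps a rank in 1..1001 to a bar length in 0..80 using integer division.
    static int barLength(int rank) {
        return MAX_BAR * (UNRANKED - rank) / 1000;
    }

    // Builds one line of the graph: decade, dashes, asterisk, rank in parentheses.
    static String plotLine(int decade, int rank) {
        StringBuilder sb = new StringBuilder();
        sb.append(decade).append(": ");
        int length = barLength(rank);
        for (int i = 0; i < length; i++) {
            sb.append('-');
        }
        sb.append("* (").append(rank).append(')');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(plotLine(1970, 1001)); // prints "1970: * (1001)"
        System.out.println(plotLine(1980, 538));
    }
}
```

With this mapping, barLength(500) is 80 * 501 / 1000, which integer division truncates to 40.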

Suggested Approach

You can approach this problem in much the same way as Lab 10.  Use two classes: one to represent a single record in the file, and another to represent the data for the entire file.  The second class (the tabulator) can have a method that reads the file whose filename is given as a parameter, and a method that looks up the rank for a given name, sex, and decade.

Using the two classes described above, write a third class called NameSurfer containing a main method that performs the tasks described above.
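As a rough picture of how the record class and the tabulator might fit together, consider the sketch below.  The names NameRecord, NameTabulator, add, and findRank are hypothetical, the use of an ArrayList is an assumption, and the file-reading method is omitted.  The lookup compares names exactly, so "john" would not match "John".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one record from the file. All names are assumptions.
class NameRecord {
    final int decade;
    final int rank;
    final String sex;
    final String name;
    final int count;

    NameRecord(int decade, int rank, String sex, String name, int count) {
        this.decade = decade;
        this.rank = rank;
        this.sex = sex;
        this.name = name;
        this.count = count;
    }
}

// Illustrative only: holds every record and answers rank queries.
class NameTabulator {
    static final int UNRANKED = 1001;  // rank reported when no record matches
    private final List<NameRecord> records = new ArrayList<>();

    void add(NameRecord record) {
        records.add(record);
    }

    // Linear search for the matching record; returns 1001 if there is none.
    int findRank(String name, String sex, int decade) {
        for (NameRecord r : records) {
            if (r.decade == decade && r.sex.equals(sex) && r.name.equals(name)) {
                return r.rank;
            }
        }
        return UNRANKED;
    }

    public static void main(String[] args) {
        NameTabulator tabulator = new NameTabulator();
        tabulator.add(new NameRecord(1900, 1, "M", "John", 84602));
        System.out.println(tabulator.findRank("John", "M", 1900));    // prints 1
        System.out.println(tabulator.findRank("Madison", "F", 1900)); // prints 1001
    }
}
```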

Reading the File

Two suggestions for reading the file:

  1. Use a BufferedReader object to read the file, using the readLine method to read each record.

  2. You can create a Scanner object to read the data items out of the String containing one line of the file as follows:
    Scanner lineScanner = new Scanner(line).useDelimiter(":");
    

    This assumes that the String you want to decode is in a variable called line.  Once you have created the scanner, you can use the next and nextInt methods to read Strings and ints, respectively.  The Scanner takes care of the job of separating the input based on the colon (':') characters that separate the data items.  You will need a new Scanner object for each line you read.
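The two suggestions can be combined as in the following sketch, which reads every line with a BufferedReader and decodes it with a fresh Scanner.  The class name ReadSketch and the method readAll are made up for illustration, and returning each record as a formatted String is a placeholder; in your program you would presumably build a record object instead.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Illustrative only: ReadSketch and readAll are assumed names.
class ReadSketch {
    // Reads all records from the given source, one per line.
    static List<String> readAll(Reader source) throws IOException {
        List<String> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // A new Scanner is needed for each line.
                Scanner lineScanner = new Scanner(line.trim()).useDelimiter(":");
                int decade = lineScanner.nextInt();
                int rank = lineScanner.nextInt();
                String sex = lineScanner.next();
                String name = lineScanner.next();
                int count = lineScanner.nextInt();
                // Placeholder: a real program would store a record object here.
                rows.add(decade + " " + rank + " " + sex + " " + name + " " + count);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String filename = args.length > 0 ? args[0] : "babynames.txt";
        for (String row : readAll(new FileReader(filename))) {
            System.out.println(row);
        }
    }
}
```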

Testing Your Program

Assuming your main method is in a class called NameSurfer, you can run it in either the DrJava interactions window or a terminal window (in both cases, using the command java NameSurfer).  In either case, make sure that the directory containing the Java source files also contains the file "babynames.txt".

Submitting

You may either:

  1. Submit a zipfile containing all of your Java source files, or

  2. Submit each Java source file individually.