Google Certificate Capstone Project

These files document the capstone project for Google’s Data Analytics Professional Certificate.

I downloaded the CSV from Stats Canada, removed the rows about other subjects (e.g. ethnicity, dwellings), and added columns to allow me to separate language families. To do this, I created one column per family and, within that column, entered “0” if the language was not in that family, “1” if it was a parent of other languages within that family, and “2” if it was a child language within that family. This allowed me to visualize groups of languages without including the duplicate values that were already counted in the parent family rows.
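
The flag columns described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used in the project: the language names, the column names (`family_a`, `family_b`), and the speaker counts are all made up, and the real file has many more rows and families.

```python
# Hypothetical rows: names and counts are invented for illustration,
# not taken from the census file.
rows = [
    {"language": "Family A (total)", "family_a": 1, "family_b": 0, "speakers": 300},
    {"language": "Language A1", "family_a": 2, "family_b": 0, "speakers": 100},
    {"language": "Language A2", "family_a": 2, "family_b": 0, "speakers": 200},
    {"language": "Family B (total)", "family_a": 0, "family_b": 1, "speakers": 50},
    {"language": "Language B1", "family_a": 0, "family_b": 2, "speakers": 50},
]

# Flag values used in each family column (0 means "not in this family").
PARENT, CHILD = 1, 2


def select(rows, family_column, code):
    """Return the rows whose flag in family_column equals code."""
    return [row for row in rows if row[family_column] == code]


def total_speakers(rows):
    """Sum the speaker counts of the given rows."""
    return sum(row["speakers"] for row in rows)


children_a = select(rows, "family_a", CHILD)
parents_a = select(rows, "family_a", PARENT)

# Summing every row in the family counts the parent row on top of its children.
naive_total = total_speakers([r for r in rows if r["family_a"] != 0])
# Summing only child rows (or only the parent row) avoids the double count.
child_total = total_speakers(children_a)
parent_total = total_speakers(parents_a)

print(naive_total, child_total, parent_total)
```

In this made-up example the naive sum is twice the child-only sum, which is the duplication the flag columns are meant to filter out.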

Because the R visualizations covered in this course seem to work best over a timeline, I also downloaded the CSV from the 2011 census. However, only a few languages were accounted for in 2011, and earlier censuses (e.g. 2001) didn’t have any language information beyond the official languages. R works well for cleaning and summarizing this data, but Tableau seems best for visualizations. For such a relatively small data set, SQL was probably overkill; it seemed possible to learn the same things from a pivot table and filtering/sorting in Excel.