SQL Data Cleaning and Management


Avoiding Parking Tickets

When visiting a major city, there are many things to do and interesting people. But in car-centric cities, finding parking can be a hassle. And paying for parking, much less getting a parking ticket is even worse.

  • If I were to visit a neighboring city on vacation, how can I strategize to avoid a parking ticket?

Sub Questions:

  • What time of day am I most likely to get a ticket?
  • What is the most common violation, and where does it occur?
  • Is there a specific Month or Weekday I am less likely to get a ticket?

Overview

  • Data sourced from DOT in Montgomery, Maryland. Link: Data.gov
  • 150,000+ rows of data, SQL is best use for this volume
  • Designed ERD to ensure Entity and Relationship Integrity
  • Processed Raw Data (cleaned addresses, classified violations, etc…)
    • See Cleaning Strategy for details
  • Loaded cleaned data into schema design
ERD done in PostGres Admin

Results

Click here to see interactive dashboard. See my Github to view all SQL used.

Recommendations for avoiding tickets:

  1. 80% of tickets are expired parking. Be punctual, or pay for extra time.
  2. If you are neither punctual or have deep pockets, park away from areas with lots of activity. Locations near malls and food hotspots such as Woodmont Ave. ,Cameron St., and Bethesda make up more than 20% of ticketing locations
  3. Take advantage of “weekend slumps”. Visit and park on Monday or Saturdays when workers are less active
  4. Morning is the most risky time to park, and should be avoided. There is a skew in the data collection. No tickets occurred from 6pm-12am. This could show a gap in DOT staffing.

More on Cleaning Strategy

  1. Remove Duplicate and Null Rows
  2. Use window function to create unique vehicle MakeID, to reduce redundancy
  3. Clean Ticket Locations by splitting into address, street, and lot #
  4. Replaced redundant false-null lot # for addresses that had both null and non-null values
  5. Classify time of ticketing to time of day: Morning, Afternoon, Night, and Late Night
  6. Condense Parking Violation Types (From 41 similar types, down to 7 distinct)
2) Window function to create MakeID for vehicles table
–Creating Unique MakeID for Vehicle Table
UPDATE "DOTclean"
SET
car_make = temp."Vehicle Make",
car_type = temp."Vehicle Type",
car_color = temp."Vehicle Color",
"makeID" = temp."makeID"
FROM (SELECT "Ticket Number",
"Vehicle Make",
"Vehicle Type",
"Vehicle Color",
DENSE_RANK() OVER(ORDER BY ("Vehicle Make","Vehicle Type","Vehicle Color")) as "makeID"
FROM "DOTraw"
) temp
Where temp."Ticket Number" = "ticketID";
view raw makeID.sql hosted with ❤ by GitHub
3) Cleaning Locations
— CTE to remove lot/garage in front of address
WITH CTE as (
Select "Ticket Location",
"Ticket Number",
CASE
when (SPLIT_PART("Ticket Location",'-',2) = '') is TRUE Then SPLIT_PART("Ticket Location",'-',1)
when (SPLIT_PART("Ticket Location",'-',2) = '') is FALSE Then SPLIT_PART("Ticket Location",'-',2)
END Address,
CASE
when (SPLIT_PART("Ticket Location",'-',2) = '') is FALSE Then SPLIT_PART("Ticket Location",'-',1)
Else NULL
END Lot
From "DOTraw"
)
–Storing address in clean table. And removing
–numbers to get street get street name, using regex. \D is any non-digit & a{m,n} finds between m and n of a
UPDATE "DOTclean"
SET address = CTE.Address,
"lot/garage" = CTE.Lot
From CTE
Where CTE."Ticket Number" = "ticketID"
AND length(substring(CTE.Address,'\D{1,36}')) > 4; –discludes 174 addresses with VERY dirty addresses
UPDATE "DOTclean"
SET street = tem.street
FROM (Select substring(CTE.Address,'\D{1,36}') as street,
"Ticket Number"
from CTE) tem
Where tem."Ticket Number" = "ticketID"
AND length(tem.street) > 4; –discludes 174 addresses with VERY dirty addresses
DELETE FROM "DOTclean"
Where Address is NULL;
4) Clean Locations Table – filtered addresses that had both null/non-null lots
INSERT INTO "Locations" (address, street, "lot/garage")
Select address,
TRIM(max(street)) as street,
max("lot/garage") as "lot/garage"
from "DOTclean"
group by address;
5) Create Time Of Day Categories
  • Morning: 6am-noon
  • Afternoon: noon-6pm
  • Evening: 6pm-midnight
  • Late Night: midnight-6am
–time of day
UPDATE "DOTclean"
SET tick_date = temp."Date",
tick_time = temp."Time",
time_of_day = temp."time_of_day"
FROM (
Select "Ticket Number",
"Date",
"Time",
CASE
When "Time" between '06:00:00' and '12:00:00' Then 'Morning'
When "Time" between '12:00:00' and '18:00:00' Then 'Afternoon'
When "Time" between'18:00:00' and '24:00:00' Then 'Evening'
Else 'Late Night'
End time_of_day
From "DOTraw"
) temp
Where temp."Ticket Number" = "ticketID"
view raw time_of_day.sql hosted with ❤ by GitHub
Condense Parking Violations to 7 types

Originally there were 41 classifications of violations. But many types were self-similar. For example, ‘off street’ and ‘on street’ are of the same overtime parking violation.

— Condense 41 violation codes into 7, called vio_type
UPDATE "DOTclean"
SET vio_code = temp."Violation Code",
vio_description = temp."Violation Description",
vio_type = temp."violation_type"
FROM (
Select "Ticket Number",
"Violation Code",
"Violation Description",
CASE
When "Violation Code" = 10 or "Violation Code" = 12 or "Violation Code" = 17 or "Violation Code" = 22 or "Violation Code" = 22 or "Violation Code" = 23 or "Violation Code" = 25 or "Violation Code" = 31 or "Violation Code" = 32 or "Violation Code" = 57 or "Violation Code" = 11 or "Violation Code" = 18 Then 'No Standing Parking'
When "Violation Code" = 3 or "Violation Code" = 5 or "Violation Code" = 34 or "Violation Code" = 37 Then 'No Parking Anytime'
When "Violation Code" = 7 or "Violation Code" = 41 or "Violation Code" = 50 Then 'Expired/Overtime'
When "Violation Code" = 9 or "Violation Code" = 35 or "Violation Code" = 36 or "Violation Code" = 43 or "Violation Code" = 54 or "Violation Code" = 1 or "Violation Code" = 47 or "Violation Code" = 55 Then 'Sign Violation or Impeding Traffic'
When "Violation Code" = 2 or "Violation Code" = 38 or "Violation Code" = 42 or "Violation Code" = 48 or "Violation Code" = 39 or "Violation Code" = 40 or "Violation Code" = 46 or "Violation Code" = 56 Then 'Prohibited Vehicle'
When "Violation Code" = 8 or "Violation Code" = 19 or "Violation Code" = 20 or "Violation Code" = 24 or "Violation Code" = 27 or "Violation Code" = 28 or "Violation Code" = 59 or "Violation Code" = 45 or "Violation Code" = 4 Then 'Orientation Violation'
Else NULL
END violation_type
From "DOTraw"
) temp
Where temp."Ticket Number" = "ticketID"
view raw vio_type.sql hosted with ❤ by GitHub