Assignment 1 - Tiffany Luo #26

Open
wants to merge 2 commits into
base: main
32 changes: 31 additions & 1 deletion README.md
Expand Up @@ -66,32 +66,62 @@ Write a query to answer each of the questions below.

2. [What is the percent change in trips in Q3 2022 as compared to Q3 2021?](query02.sql)

**Result:** 3.98%

3. [What is the average duration of a trip for 2021?](query03.sql)

**Result:** 18.86 minutes

4. [What is the average duration of a trip for 2022?](query04.sql)

**Result:** 17.88 minutes

5. [What is the longest duration trip across the two quarters?](query05.sql)

_Why are there so many trips of this duration?_

**Answer:** The longest trip in the Indego dataset lasts 1440 minutes, which is exactly 24 hours. The most likely reason so many trips share this duration is that riders forgot to end their trips, or a bike failed to lock at the dock, so the check-in was never registered. The service may cap a trip at a daily maximum charge, in which case every unterminated trip would be recorded at exactly 1440 minutes. In addition, some of these trips have identical start and end positions (the latitude and longitude match exactly), which suggests systemic issues with certain bikes.


**Result:** 1440 minutes

6. [How many trips in each quarter were shorter than 10 minutes?](query06.sql)

**Result:** 2021 Q3: 124,582 trips; 2022 Q3: 137,372 trips

7. [How many trips started on one day and ended on a different day?](query07.sql)

**Result:** 2021 Q3: 2,301 trips; 2022 Q3: 2,060 trips

8. [Give the five most popular starting stations across all years between 7am and 9:59am.](query08.sql)

_Hint: Use the `EXTRACT` function to get the hour of the day from the timestamp._

**Result:** The five most popular station IDs are 3032, 3102, 3012, 3066, and 3007.

9. [List all the passholder types and number of trips for each across all years.](query09.sql)

**Result:**

- Day Pass: 61,659
- Indego30: 441,856
- Indego365: 109,251
- Null: 43
- Walk-up: 2

10. [Using the station status dataset, find the distance in meters of each station from Meyerson Hall.](query10.sql)

11. [What is the average distance (in meters) of all stations from Meyerson Hall?](query11.sql)

**Result:** 3 km (about 3,000 meters)

12. [How many stations are within 1km of Meyerson Hall?](query12.sql)

**Result:** 16

13. [Which station is furthest from Meyerson Hall?](query13.sql)

**Result:** Manayunk Bridge Station

14. [Which station is closest to Meyerson Hall?](query14.sql)

**Result:** 34th and Spruce Station
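
The explanation given for question 5 can be checked against the data. The query below is a minimal sketch and is not part of the submitted queries: it counts the trips that reach the maximum duration and how many of those start and end at the same coordinates. The `end_lat` and `end_lon` column names are assumptions, since only `start_lat` and `start_lon` appear in this PR.

```sql
-- Sketch: how many trips share the maximum duration, and how many of those
-- start and end at the same coordinates?
-- Assumption: the trips tables have end_lat / end_lon columns that mirror
-- start_lat / start_lon.
WITH all_trips AS (
    SELECT duration, start_lat, start_lon, end_lat, end_lon
    FROM indego.trips_2021_q3
    UNION ALL
    SELECT duration, start_lat, start_lon, end_lat, end_lon
    FROM indego.trips_2022_q3
)

SELECT
    COUNT(*) AS num_max_duration_trips,
    COUNT(*) FILTER (
        WHERE start_lat = end_lat AND start_lon = end_lon
    ) AS num_same_start_end
FROM all_trips
WHERE duration = (SELECT MAX(duration) FROM all_trips);
```

A large `num_same_start_end` relative to `num_max_duration_trips` would support the "bike never left or never checked in" reading. A small one would point more toward riders simply forgetting to dock.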
1 change: 1 addition & 0 deletions __scripts__/create_trip_tables.sql
Expand Up @@ -22,6 +22,7 @@ create table indego.trips_2021_q3 (

drop table if exists indego.trips_2022_q3;


create table indego.trips_2022_q3 (
trip_id text,
duration integer,
Expand Down
5 changes: 4 additions & 1 deletion query01.sql
@@ -1,8 +1,11 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
Example: How many bike trips in Q3 2021. Name the resulting column
num_trips.
*/

-- 300,432 trips

-- Enter your SQL query here
select count(*) as num_trips
from indego.trips_2021_q3
25 changes: 25 additions & 0 deletions query02.sql
@@ -1,3 +1,4 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
What is the percent change in trips in Q3 2022 as compared to Q3 2021?

Expand All @@ -9,8 +10,32 @@
Remember you can do calculations in the select clause.
*/

-- 3.98

-- Enter your SQL query here

WITH Trips AS (
SELECT
COUNT(*) AS num_trips_21,
0 AS num_trips_22
FROM indego.trips_2021_q3
UNION ALL
SELECT
0 AS num_trips_21,
COUNT(*) AS num_trips_22
FROM indego.trips_2022_q3
),
AggregatedCounts AS (
SELECT
SUM(num_trips_21) AS num_trips_21,
SUM(num_trips_22) AS num_trips_22
FROM Trips
)

SELECT
ROUND(((num_trips_22 - num_trips_21)::numeric / num_trips_21) * 100, 2) AS perc_change
FROM AggregatedCounts;
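
The `UNION ALL` plus `SUM` pattern above works, but the same percent change can be computed more directly. The following is a sketch of one alternative, using one scalar subquery per quarter. The CTE name `counts` is made up for illustration; the output column stays `perc_change`.

```sql
-- Sketch: same calculation with one scalar subquery per quarter.
WITH counts AS (
    SELECT
        (SELECT COUNT(*) FROM indego.trips_2021_q3) AS num_trips_21,
        (SELECT COUNT(*) FROM indego.trips_2022_q3) AS num_trips_22
)

SELECT
    -- Cast before dividing so the division is not done in integer arithmetic.
    ROUND((num_trips_22 - num_trips_21)::numeric / num_trips_21 * 100, 2) AS perc_change
FROM counts;
```

The cast to `numeric` matters here because `COUNT(*)` returns `bigint`, and dividing two integers in PostgreSQL truncates the result.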



/*
Expand Down
3 changes: 3 additions & 0 deletions query03.sql
@@ -1,3 +1,4 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
What is the average duration of a trip for 2021?

Expand All @@ -6,3 +7,5 @@
*/

-- Enter your SQL query here
SELECT ROUND(AVG(duration)::NUMERIC, 2) AS avg_duration
FROM indego.trips_2021_q3;
5 changes: 5 additions & 0 deletions query04.sql
@@ -1,3 +1,4 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
What is the average duration of a trip for 2022?

Expand All @@ -6,3 +7,7 @@
*/

-- Enter your SQL query here
SELECT
ROUND(AVG(duration)::NUMERIC, 2) AS avg_duration
FROM
indego.trips_2022_q3;
10 changes: 10 additions & 0 deletions query05.sql
@@ -1,7 +1,17 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
What is the longest duration trip across the two quarters?

Your result should have a single row with a single column named max_duration.
*/

-- Enter your SQL query here
WITH TwoTables AS (
SELECT duration FROM indego.trips_2021_q3
UNION ALL
SELECT duration FROM indego.trips_2022_q3
)

SELECT MAX(duration) AS max_duration
FROM TwoTables;

15 changes: 15 additions & 0 deletions query06.sql
@@ -1,3 +1,4 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
How many trips in each quarter were shorter than 10 minutes?

Expand All @@ -7,3 +8,17 @@
*/

-- Enter your SQL query here

SELECT
2021 AS trip_year,
'3' AS trip_quarter,
COUNT(*) AS num_trips
FROM indego.trips_2021_q3
WHERE duration < 10
UNION
SELECT
2022 AS trip_year,
'3' AS trip_quarter,
COUNT(*) AS num_trips
FROM indego.trips_2022_q3
WHERE duration < 10
15 changes: 14 additions & 1 deletion query07.sql
@@ -1,3 +1,4 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
How many trips started on one day and ended on a different day?

Expand All @@ -6,7 +7,19 @@
*/

-- Enter your SQL query here

SELECT
2021 AS trip_year,
'3' AS trip_quarter,
COUNT(*) AS num_trips
FROM indego.trips_2021_q3
WHERE CAST(start_time AS date) != CAST(end_time AS date)
UNION
SELECT
2022 AS trip_year,
'3' AS trip_quarter,
COUNT(*) AS num_trips
FROM indego.trips_2022_q3
WHERE CAST(start_time AS date) != CAST(end_time AS date)


/*
Expand Down
40 changes: 40 additions & 0 deletions query08.sql
@@ -1,3 +1,4 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
Give the five most popular starting stations across all years between 7am
and 9:59am.
Expand All @@ -10,6 +11,45 @@

-- Enter your SQL query here

WITH
trips_2021_q3 AS (
SELECT
start_station AS station_id,
ST_GEOMFROMTEXT('POINT(' || start_lon::text || ' ' || start_lat::text || ')')::geography AS station_geog
FROM indego.trips_2021_q3
WHERE EXTRACT(HOUR FROM start_time) >= 7 AND EXTRACT(HOUR FROM start_time) < 10
),
trips_2022_q3 AS (
SELECT
start_station AS station_id,
ST_GEOMFROMTEXT('POINT(' || start_lon::text || ' ' || start_lat::text || ')')::geography AS station_geog
FROM indego.trips_2022_q3
WHERE EXTRACT(HOUR FROM start_time) >= 7 AND EXTRACT(HOUR FROM start_time) < 10
),
combined_trips AS (
SELECT * FROM trips_2021_q3
UNION ALL
SELECT * FROM trips_2022_q3
)

SELECT
station_id,
station_geog,
COUNT(*) AS num_trips
FROM combined_trips
GROUP BY station_id, station_geog
ORDER BY num_trips DESC
LIMIT 5;
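
One possible concern with the query above: `station_geog` is built from each trip's own `start_lat` and `start_lon`, so if a station's recorded coordinates differ between trips, grouping by `station_geog` could split that station across several rows. The following is a sketch that ranks by `station_id` alone. It leaves out the geography column, which would need to be added back if the expected output requires it, and the CTE name `morning_trips` is made up for illustration.

```sql
-- Sketch: rank starting stations by trip count between 7:00 and 9:59,
-- grouping on the station ID only.
WITH morning_trips AS (
    SELECT start_station AS station_id
    FROM indego.trips_2021_q3
    WHERE EXTRACT(HOUR FROM start_time) BETWEEN 7 AND 9
    UNION ALL
    SELECT start_station AS station_id
    FROM indego.trips_2022_q3
    WHERE EXTRACT(HOUR FROM start_time) BETWEEN 7 AND 9
)

SELECT
    station_id,
    COUNT(*) AS num_trips
FROM morning_trips
GROUP BY station_id
ORDER BY num_trips DESC
LIMIT 5;
```

`BETWEEN 7 AND 9` on the extracted hour covers 7:00:00 through 9:59:59, the same window as `>= 7 AND < 10`.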











/*
Hint: Use the `EXTRACT` function to get the hour of the day from the
Expand Down
11 changes: 11 additions & 0 deletions query09.sql
Expand Up @@ -7,3 +7,14 @@
*/

-- Enter your SQL query here
SELECT
passholder_type,
COUNT(*) AS num_trips
FROM (
SELECT passholder_type
FROM indego.trips_2021_q3
UNION ALL
SELECT passholder_type
FROM indego.trips_2022_q3
) AS all_trips
GROUP BY passholder_type;
9 changes: 9 additions & 0 deletions query10.sql
@@ -1,3 +1,4 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
Using the station status dataset, find the distance in meters of each
station from Meyerson Hall. Use latitude 39.952415 and longitude -75.192584
Expand All @@ -8,3 +9,11 @@
*/

-- Enter your SQL query here
SELECT
id AS station_id,
geog AS station_geog,
ROUND(ST_DISTANCE(geog, 'POINT(-75.192584 39.952415)'::geography) / 50) * 50 AS distance
FROM
indego.station_statuses;
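
The `ROUND(x / 50) * 50` expression above snaps a distance to the nearest 50 meters: dividing by 50 counts 50-meter steps, rounding picks the nearest whole step, and multiplying by 50 converts back to meters. A small standalone illustration, using a made-up distance of 1234.0 meters:

```sql
-- Sketch: 1234.0 / 50 = 24.68, which rounds to 25 steps, i.e. 1250 meters.
SELECT ROUND(1234.0 / 50) * 50 AS distance;
-- distance: 1250
```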


3 changes: 3 additions & 0 deletions query11.sql
@@ -1,7 +1,10 @@
-- Active: 1707329957355@@localhost@5432@m509A1
/*
What is the average distance (rounded to the nearest km) of all stations
from Meyerson Hall? Your result should have a single record with a single
column named avg_distance_km.
*/

-- Enter your SQL query here
SELECT ROUND(AVG(ST_DISTANCE(geog, ST_MAKEPOINT(-75.192584, 39.952415)::GEOGRAPHY)::NUMERIC) / 1000, 0) AS avg_distance_km
FROM indego.station_statuses
3 changes: 3 additions & 0 deletions query12.sql
Expand Up @@ -6,3 +6,6 @@
*/

-- Enter your SQL query here
SELECT COUNT(*) AS num_stations
FROM indego.station_statuses
WHERE ST_DISTANCE(geog, ST_MAKEPOINT(-75.192584, 39.952415)::GEOGRAPHY) < 1000
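
PostGIS also provides `ST_DWithin`, which expresses "within a given distance" directly and can use a spatial index on `geog` if one exists. The following is a sketch of the same count written that way. Note one difference from the query above: `ST_DWithin` is inclusive, so a station at exactly 1000 meters would be counted here but not by `< 1000`.

```sql
-- Sketch: count stations within 1 km of Meyerson Hall using ST_DWithin.
-- For geography arguments the distance is given in meters.
SELECT COUNT(*) AS num_stations
FROM indego.station_statuses
WHERE ST_DWithin(
    geog,
    ST_MakePoint(-75.192584, 39.952415)::geography,
    1000
);
```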
8 changes: 8 additions & 0 deletions query13.sql
Expand Up @@ -7,3 +7,11 @@
*/

-- Enter your SQL query here
SELECT
id AS station_id,
name AS station_name,
ROUND(ST_DISTANCE(geog, 'POINT(-75.192584 39.952415)'::geography) / 50) * 50 AS distance
FROM
indego.station_statuses
ORDER BY distance DESC
LIMIT 1
7 changes: 7 additions & 0 deletions query14.sql
Expand Up @@ -7,3 +7,10 @@
*/

-- Enter your SQL query here
SELECT
id AS station_id,
name AS station_name,
ROUND(ST_DISTANCE(geog, ST_MAKEPOINT(-75.192584, 39.952415)::GEOGRAPHY)::NUMERIC / 50, 0) * 50 AS distance
FROM indego.station_statuses
ORDER BY distance ASC
LIMIT 1
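
Because `distance` is rounded to the nearest 50 meters, two stations that fall in the same 50-meter bucket would tie in `ORDER BY distance`, and which one `LIMIT 1` returns is then not guaranteed. The following is a sketch of one way to avoid that, by ordering on the unrounded distance while still reporting the rounded value:

```sql
-- Sketch: report the rounded distance, but rank by the exact distance.
SELECT
    id AS station_id,
    name AS station_name,
    ROUND(
        ST_Distance(geog, ST_MakePoint(-75.192584, 39.952415)::geography)::numeric / 50,
        0
    ) * 50 AS distance
FROM indego.station_statuses
ORDER BY ST_Distance(geog, ST_MakePoint(-75.192584, 39.952415)::geography) ASC
LIMIT 1;
```

The same idea applies to the furthest-station query, with `DESC` in place of `ASC`.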