Commit

Update
ThePyProgrammer committed Feb 19, 2022
1 parent 15d0be1 commit bff9fa2
Showing 100 changed files with 4,554 additions and 77 deletions.
140 changes: 116 additions & 24 deletions research/Data Retrieval and Generation.ipynb
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "markdown",
"id": "1309caee",
"id": "a49bdfe8",
"metadata": {},
"source": [
"# Data Retrieval and Generation\n",
@@ -12,7 +12,7 @@
},
{
"cell_type": "markdown",
"id": "77bf1190",
"id": "5773bdb2",
"metadata": {
"toc": true
},
@@ -23,7 +23,7 @@
},
{
"cell_type": "markdown",
"id": "4cec12af",
"id": "638e2bd9",
"metadata": {},
"source": [
"## Set-Up and Imports\n",
@@ -37,8 +37,8 @@
"id": "62d2265e",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-18T03:58:09.174279Z",
"start_time": "2022-02-18T03:58:06.955285Z"
"end_time": "2022-02-19T07:22:40.100008Z",
"start_time": "2022-02-19T07:22:34.867555Z"
}
},
"outputs": [],
@@ -52,7 +52,7 @@
},
{
"cell_type": "markdown",
"id": "ff48bb29",
"id": "d7ed0258",
"metadata": {},
"source": [
"### Listing out all functions"
@@ -64,8 +64,8 @@
"id": "467895d1",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-18T03:58:09.204278Z",
"start_time": "2022-02-18T03:58:09.178280Z"
"end_time": "2022-02-19T07:22:40.131360Z",
"start_time": "2022-02-19T07:22:40.106012Z"
}
},
"outputs": [
@@ -88,28 +88,28 @@
},
{
"cell_type": "markdown",
"id": "b562bdaf",
"id": "9db2306c",
"metadata": {},
"source": [
"## Last-Minute Retrieval of more Articles"
]
},
{
"cell_type": "markdown",
"id": "4c10cb91",
"id": "0f3cf576",
"metadata": {},
"source": [
"### The New York Times"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ba15bb17",
"execution_count": 3,
"id": "cc6e85c1",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-18T04:22:28.144449Z",
"start_time": "2022-02-18T04:22:05.628527Z"
"end_time": "2022-02-19T07:22:43.459281Z",
"start_time": "2022-02-19T07:22:40.136360Z"
}
},
"outputs": [],
@@ -120,7 +120,7 @@
},
{
"cell_type": "markdown",
"id": "3abef8a4",
"id": "d3399017",
"metadata": {},
"source": [
"## Unique URLs\n",
@@ -130,12 +130,12 @@
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e3433d42",
"execution_count": 4,
"id": "08625ece",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-18T04:22:28.240449Z",
"start_time": "2022-02-18T04:22:28.210454Z"
"end_time": "2022-02-19T07:22:43.522282Z",
"start_time": "2022-02-19T07:22:43.465282Z"
}
},
"outputs": [
@@ -158,7 +158,7 @@
},
{
"cell_type": "markdown",
"id": "f9b1e4db",
"id": "b8635c98",
"metadata": {},
"source": [
"## Retrieval"
@@ -170,8 +170,8 @@
"id": "cc00d6ab",
"metadata": {
"ExecuteTime": {
"end_time": "2022-02-18T04:00:02.613608Z",
"start_time": "2022-02-18T03:58:13.419446Z"
"end_time": "2022-02-19T07:23:58.450141Z",
"start_time": "2022-02-19T07:22:43.527281Z"
},
"scrolled": false
},
@@ -180,11 +180,16 @@
"name": "stdout",
"output_type": "stream",
"text": [
"datagen/nytimes/at-yosemite-a-waterfall-turns-into-a-firefall.txt\n",
"datagen/nytimes/what-to-do-when-you-dont-want-to-run.txt\n",
"datagen/nytimes/seasonal-depression-covid.txt\n",
"datagen/nytimes/women-stem-pandemic.txt\n",
"datagen/nytimes/ai-education-neural-networks.txt\n",
"datagen/nytimes/motivation-energy-advice.txt\n",
"datagen/nytimes/smartphones-iphone-android.txt\n",
"datagen/nytimes/depression-anxiety-physical-health.txt\n",
"datagen/nytimes/hearing-aids-fda.txt\n",
"datagen/nytimes/yosemite-falls.txt\n",
"datagen/nytimes/google-facebook-advertising.txt\n",
"datagen/nytimes/metaverse-politics-disinformation-society.txt\n",
"datagen/nytimes/apple-face-computers.txt\n",
@@ -194,13 +199,17 @@
"datagen/nytimes/metaverse-gaming-definition.txt\n",
"datagen/nytimes/how-excited-are-you-about-the-metaverse.txt\n",
"datagen/nytimes/facebook-experiments.txt\n",
"datagen/nytimes/olympics-beijing-xi-putin.txt\n",
"datagen/nytimes/fact-check-joe-rogan-robert-malone.txt\n",
"datagen/nytimes/ukraine-conflict-russia-military.txt\n",
"datagen/nytimes/cecil-taylor-return-concert.txt\n",
"datagen/nytimes/food-english-foreign-languages.txt\n",
"datagen/nytimes/death-certificate-cause.txt\n",
"datagen/nytimes/natural-wines.txt\n",
"datagen/nytimes/burnout-work-stress.txt\n",
"datagen/nytimes/basquiat-painting-orlando-mumford-museum.txt\n",
"datagen/nytimes/finland-bordertown-piece-of-my-heart.txt\n",
"datagen/nytimes/uc-berkeley-admissions-court-ruling.txt\n",
"datagen/nytimes/covid-depression-anxiety.txt\n",
"datagen/nytimes/flight-attendants-covid.txt\n",
"datagen/nytimes/law-order-svu-organized-crime.txt\n",
@@ -213,17 +222,29 @@
"datagen/nytimes/oddity-ceramics-surrealism-art.txt\n",
"datagen/nytimes/metamates-googlers.txt\n",
"datagen/nytimes/energy-savings-nest.txt\n",
"datagen/nytimes/focus-johann-hari.txt\n",
"datagen/nytimes/olympics-china-american-athletes.txt\n",
"datagen/nytimes/andrew-prince-charles-charity.txt\n",
"datagen/nytimes/teresa-reichlen-retiring-from-new-york-city-ballet.txt\n",
"datagen/nytimes/faith-ringgold-new-museum.txt\n",
"datagen/nytimes/stonehenge-british-museum.txt\n",
"datagen/nytimes/gregory-peck-mockingbird-sequel.txt\n",
"datagen/nytimes/kanye-west-jeen-yuhs-documentary.txt\n",
"datagen/nytimes/rokia-kone-jacknife-lee-bamanan-review.txt\n",
"datagen/nytimes/spotify-joe-rogan-misinformation.txt\n",
"datagen/nytimes/comedy-jewish-identity.txt\n",
"datagen/nytimes/sam-waterston-law-and-order.txt\n",
"datagen/nytimes/severance-review.txt\n",
"datagen/nytimes/state-of-the-union-painting-with-john.txt\n",
"datagen/nytimes/things-to-do-this-weekend.txt\n",
"datagen/nytimes/ukraine-donald-trump-beijing-olympics.txt\n",
"datagen/nytimes/jeff-koons-bmw.txt\n",
"datagen/nytimes/jobs-hiring-fraud.txt\n",
"datagen/nytimes/oscars-vaccine-mandate-coronavirus.txt\n",
"datagen/nytimes/watches-obscure-brands-switzerland.txt\n",
"datagen/nytimes/babies-work-meeting.txt\n",
"datagen/nytimes/berlin-film-festival-2022.txt\n",
"datagen/nytimes/dog-review.txt\n",
"datagen/nytimes/cuomo-melissa-derosa-sexual-harassment.txt\n",
"datagen/nytimes/letitia-james-ny-attorney-general.txt\n",
"datagen/nytimes/mall-fight-bridgewater-commons-nj.txt\n",
@@ -244,18 +265,83 @@
"datagen/nytimes/ukraine-russia-us-troops.txt\n",
"datagen/nytimes/basketball-celtics-ime-udoka.txt\n",
"datagen/nytimes/kamila-valieva-falls-fourth-figure-skating.txt\n",
"datagen/nytimes/family-birthday-reminders-social-qs.txt\n",
"datagen/nytimes/wall-street-hotel.txt\n",
"datagen/nytimes/ski-tricks-utah.txt\n",
"datagen/nytimes/durham-right-wing-media-trump.txt\n",
"datagen/nytimes/fourth-dose-covid-vaccine.txt\n",
"datagen/nytimes/high-risk-covid-immunocompromised.txt\n",
"datagen/nytimes/oakland-hills-country-club-fire.txt\n",
"datagen/nytimes/biden-immigration-public-charge-trump.txt\n",
"datagen/nytimes/blinken-russia-ukraine-predictions.txt\n",
"datagen/nytimes/durham-right-wing-media-trump.txt\n",
"datagen/nytimes/justice-department-cybersecurity.txt\n",
"datagen/nytimes/kevin-mccarthy-harriet-hageman-liz-cheney.txt\n",
"datagen/nytimes/senate-spending-bill-shutdown.txt\n",
"datagen/nytimes/tiktok-ava-majury.txt\n",
"datagen/nytimes/san-francisco-school-board-parents.txt\n",
"datagen/nytimes/help-friend-support.txt\n",
"datagen/nytimes/felicity-ace-vessel-fire.txt\n",
"datagen/nytimes/stanytsia-lushankya-shelling.txt\n",
"datagen/nytimes/ukraine-conflict-russia-military.txt\n",
"datagen/nytimes/tyshawn-sorey-rothko-chapel.txt\n",
"datagen/nytimes/trevor-noah-russia-ukraine.txt\n",
"datagen/nytimes/red-covid-partisan-deaths-vaccines.txt\n",
"datagen/nytimes/china-coronavirus-vaccines.txt\n",
"datagen/nytimes/federal-reserve-trading-restrictions.txt\n",
"datagen/nytimes/allison-gollust-cnn-cuomo.txt\n",
"datagen/nytimes/what-to-cook-this-weekend.txt\n",
"datagen/nytimes/girls-eating-disorders-pandemic.txt\n",
"datagen/nytimes/sacklers-opioids-lawsuit.txt\n",
"datagen/nytimes/eileen-gu-chinese-american.txt\n",
"datagen/nytimes/hadley-palmer-greenwich.txt\n",
"datagen/nytimes/homeless-people-subway-trains-mta.txt\n",
"datagen/nytimes/christopher-buckley-pj-orourke.txt\n",
"datagen/nytimes/congress-stock-trading-ban.txt\n",
"datagen/nytimes/ezra-klein-podcast-alex-tabarrok.txt\n",
"datagen/nytimes/inflation-us-consumer-surveys.txt\n",
"datagen/nytimes/susan-collins-eca-reform.txt\n",
"datagen/nytimes/space-china-billionaires.txt\n",
"datagen/nytimes/child-tax-credit-poverty-benefits.txt\n",
"datagen/nytimes/taxes-remote-work.txt\n",
"datagen/nytimes/the-gilded-age.txt\n",
"datagen/nytimes/us-history-censorship.txt\n",
"datagen/nytimes/lab-grown-meat-sleep-airtags.txt\n",
"datagen/nytimes/covid-nursing-shortages.txt\n",
"datagen/nytimes/home-buyer-risks-bad-credit-savings.txt\n",
"datagen/nytimes/phil-mickelson-saudi-golf-tour.txt\n",
"datagen/nytimes/norway-medals-winter-olympics.txt\n",
"datagen/nytimes/olympics-skating-valieva-age.txt\n",
"datagen/nytimes/pairs-figure-skating-short-program.txt\n",
"datagen/nytimes/pairs-figure-skating-short-program.txt\n",
"datagen/nytimes/american-girl-cafe-harry-hill-serena-kerrigan.txt\n",
"datagen/nytimes/modern-love-i-tried-so-hard-to-be-good.txt\n",
"datagen/nytimes/china-olympics-propaganda.txt\n",
"datagen/nytimes/australia-tourism-covid.txt\n",
"datagen/nytimes/bucket-list-travel.txt\n",
"datagen/nytimes/ahmaud-arbery-mcmichael-trial.txt\n",
"datagen/nytimes/california-state-chancellor-resigns.txt\n",
"datagen/nytimes/kim-potter-sentence-manslaughter.txt\n",
"datagen/nytimes/nashville-gerrymandering-republican-democrat.txt\n",
"datagen/nytimes/biden-ukraine-russia.txt\n",
"datagen/nytimes/civil-suits-trump-jan-6.txt\n",
"datagen/nytimes/congress-russia-sanctions.txt\n",
"datagen/nytimes/jim-hagedorn-dead.txt\n",
"datagen/nytimes/melania-trump-charity-donation.txt\n",
"datagen/nytimes/prosecutors-midterms-crime.txt\n",
"datagen/nytimes/putin-ukraine.txt\n",
"datagen/nytimes/submarine-spy-guilty-plea.txt\n",
"datagen/nytimes/supreme-court-remain-in-mexico-asylum.txt\n",
"datagen/nytimes/texas-primary-voting-law.txt\n",
"datagen/nytimes/trump-archives-white-house.txt\n",
"datagen/nytimes/seattle-bicycle-helmet.txt\n",
"datagen/nytimes/adhd-dating-relationships.txt\n",
"datagen/nytimes/afghanistan-boy-dies-well.txt\n",
"datagen/nytimes/canada-protest-arrests.txt\n",
"datagen/nytimes/eunice-storm-damage.txt\n",
"datagen/nytimes/london-highgate-cemetery-dispatch.txt\n",
"datagen/nytimes/putin-russia-ukraine.txt\n",
"datagen/nytimes/ukraine-russia-separatists-shelling.txt\n",
"datagen/nature/d41586-021-00954-8.txt\n",
"datagen/nature/d41586-021-01170-0.txt\n",
"datagen/nature/d41586-021-01812-3.txt\n",
@@ -267,7 +353,13 @@
"datagen/nature/d41586-022-00227-y.txt\n",
"datagen/nature/d41586-022-00334-w.txt\n",
"datagen/nature/d41586-022-00335-9.txt\n",
"datagen/nature/d41586-022-00414-x.txt\n",
"datagen/nature/d41586-022-00414-x.txt\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"datagen/forbes/work-experience-resume-overqualified-job-forbes-woman-leadership-career.txt\n",
"datagen/forbes/why-is-work-experience-so-undervalued.txt\n"
]
@@ -289,7 +381,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "112156d0",
"id": "616b2108",
"metadata": {},
"outputs": [],
"source": []
