Week 11, pandas #16

Open: wants to merge 1 commit into base: main
1 change: 1 addition & 0 deletions exercicios/para-casa/faixas_filtradas.json

Large diffs are not rendered by default.

43 changes: 43 additions & 0 deletions exercicios/para-casa/joyce_pandas.py
@@ -0,0 +1,43 @@
import pandas as pd

#['Track', 'Album Name', 'Artist', 'Release Date', 'ISRC','All Time Rank', 'Track Score', 'Spotify Streams','Spotify Playlist Count', 'Spotify Playlist Reach','Spotify Popularity', 'YouTube Views', 'YouTube Likes', 'TikTok Posts','TikTok Likes', 'TikTok Views', 'YouTube Playlist Reach','Apple Music Playlist Count', 'AirPlay Spins', 'SiriusXM Spins','Deezer Playlist Count', 'Deezer Playlist Reach','Amazon Playlist Count', 'Pandora Streams', 'Pandora Track Stations','Soundcloud Streams', 'Shazam Counts', 'TIDAL Popularity','Explicit Track']

df_musicas = pd.read_csv('../../material/mais_ouvidas_2024.csv')

print(df_musicas.head()) # show the first rows of the DataFrame
print(df_musicas.columns) # show all column names

# 2 - Identify the columns that contain numbers, such as 'Spotify Streams', 'YouTube Views', etc., and convert them to a numeric type if they are stored in another format. (Use replace() and astype())

colunas = ['Track', 'Album Name', 'Artist', 'Release Date', 'ISRC','All Time Rank', 'Track Score', 'Spotify Streams','Spotify Playlist Count', 'Spotify Playlist Reach','Spotify Popularity', 'YouTube Views', 'YouTube Likes', 'TikTok Posts','TikTok Likes', 'TikTok Views', 'YouTube Playlist Reach','Apple Music Playlist Count', 'AirPlay Spins', 'SiriusXM Spins','Deezer Playlist Count', 'Deezer Playlist Reach','Amazon Playlist Count', 'Pandora Streams', 'Pandora Track Stations','Soundcloud Streams', 'Shazam Counts', 'TIDAL Popularity','Explicit Track']
Collaborator

If these are all the columns, why not use df.columns?

nulos = df_musicas.isnull() # boolean mask of the null values
print(nulos.sum()) # count the null values per column
print(df_musicas.dtypes)

for col in colunas:
    if df_musicas[col].dtypes == 'object':
        df_musicas[col] = df_musicas[col].str.replace(',' , '').astype(float, errors='ignore')
Comment on lines +17 to +19
Collaborator

Be careful here: you only needed errors='ignore' because you are trying to convert object columns that really do hold text, not numbers.
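
A minimal sketch of one way to address this comment, under stated assumptions: convert only the columns that are expected to be numeric, and let unparseable values become NaN instead of silently leaving the column as text. Note that it swaps `astype(float, errors='ignore')` for `pd.to_numeric(..., errors='coerce')`, which is not what the exercise statement asks for. The sample DataFrame and the `numeric_cols` list are hypothetical stand-ins for the real CSV.

```python
import pandas as pd

# Hypothetical sample standing in for mais_ouvidas_2024.csv.
df = pd.DataFrame({
    "Track": ["A", "B", "C"],
    "Spotify Streams": ["1,234", "5,678,901", None],
    "YouTube Views": ["10", "2,000", "abc"],
})

# Assumption: these are the columns known to hold numbers stored as text.
numeric_cols = ["Spotify Streams", "YouTube Views"]

for col in numeric_cols:
    # Drop the thousands separators, then parse; bad values become NaN.
    cleaned = df[col].str.replace(",", "", regex=False)
    df[col] = pd.to_numeric(cleaned, errors="coerce")

print(df.dtypes)
```

Text columns such as 'Track' are never touched, so no error suppression is needed.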


# 3 - Convert the 'Release Date' column to datetime format.

df_musicas['Release Date'] = pd.to_datetime(df_musicas['Release Date'], format='mixed')
print(df_musicas.dtypes)

# 4 - Create a new column called 'Streaming Popularity' holding the mean popularity across 'Spotify Popularity', 'YouTube Views', 'TikTok Likes', and 'Shazam Counts'. (remember that means and other math operations only work on numeric types)

df_musicas ['Streaming Popularity'] = df_musicas[['Spotify Popularity', 'YouTube Views', 'TikTok Likes', 'Shazam Counts']].median(axis=1)
Collaborator

Suggested change
- df_musicas ['Streaming Popularity'] = df_musicas[['Spotify Popularity', 'YouTube Views', 'TikTok Likes', 'Shazam Counts']].median(axis=1)
+ df_musicas ['Streaming Popularity'] = df_musicas[['Spotify Popularity', 'YouTube Views', 'TikTok Likes', 'Shazam Counts']].mean(axis=1)

median() computes the median; what you need here is the mean 😄
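
To make the difference concrete, here is a small sketch on made-up numbers (the column names `a` to `d` are hypothetical): with one large value in the row, the row-wise mean and median diverge sharply.

```python
import pandas as pd

# One row with a single large value to show how the two statistics differ.
df = pd.DataFrame({"a": [1.0], "b": [2.0], "c": [3.0], "d": [100.0]})

row_mean = df.mean(axis=1)      # (1 + 2 + 3 + 100) / 4 = 26.5
row_median = df.median(axis=1)  # midpoint of 2 and 3 = 2.5

print(row_mean.iloc[0], row_median.iloc[0])
```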

print(df_musicas['Streaming Popularity'])

# 5 - Create a 'Total Streams' column by summing 'Spotify Streams', 'YouTube Views', 'TikTok Views', 'Pandora Streams', and 'Soundcloud Streams'.

df_musicas ['Total Streams'] = df_musicas[['Spotify Streams', 'YouTube Views', 'TikTok Views', 'Pandora Streams','Soundcloud Streams']].sum(axis=1)
print(df_musicas['Total Streams'])

# 6 - Keep only the tracks whose Spotify popularity ('Spotify Popularity') is greater than 80 and that have more than 1 million total streams ('Total Streams').

filtrar = df_musicas[(df_musicas['Spotify Popularity'] > 80) & (df_musicas['Total Streams'] > 1_000_000)]
print(filtrar.head())

# 7 - Save the resulting DataFrame to a new JSON file called 'faixas_filtradas.json'. Make sure the file was saved correctly.

filtrar.to_json('./faixas_filtradas.json', index= False)
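
Step 7 asks to confirm that the file was saved correctly, which the script does not do. One possible check, sketched under assumptions, is to read the file back and compare it with the DataFrame in memory. The sample data, the temporary path, and the choice of `orient="records"` are illustrative and not taken from the PR.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the filtered DataFrame.
filtered = pd.DataFrame({
    "Track": ["A", "B"],
    "Total Streams": [2_000_000.0, 3_500_000.0],
})

# Write to a temporary directory so the sketch leaves no files behind.
path = os.path.join(tempfile.mkdtemp(), "faixas_filtradas.json")
filtered.to_json(path, orient="records")  # one JSON object per row, no index

# Read the file back and compare shape and column names with the original.
reloaded = pd.read_json(path)
same_shape = reloaded.shape == filtered.shape
same_columns = list(reloaded.columns) == list(filtered.columns)
print(same_shape, same_columns)
```

Dtypes are deliberately not compared, since `read_json` may infer a different numeric type than the one that was written.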
35 changes: 35 additions & 0 deletions exercicios/para-sala/ETL_pandas.py
@@ -0,0 +1,35 @@
import pandas as pd

# ['TransactionID', 'Date', 'MobileModel', 'Brand', 'Price', 'UnitsSold','TotalRevenue', 'CustomerAge', 'CustomerGender', 'Location','PaymentMethod']

df = pd.read_csv("../../material/mobile_sales.csv")

print(df.head())
print(df.columns)
df_valores_nulos = df.isnull()
print(df_valores_nulos.sum())
print(df.duplicated().sum())
print(df.dtypes)

df["Date"] = pd.to_datetime(df["Date"], format= 'mixed')
print(df.dtypes)

print(df["Date"]) # show the values of the selected column
print('Date')

df["Total Sales Value"] = df["Price"] * df["UnitsSold"] # create a new column named Total Sales Value as the product Price x UnitsSold

print(df["Total Sales Value"]) # print the new column

print(df.columns)

profit_per_product = 0.30

df['Profit Margin'] = (df['Price']*profit_per_product)* df['UnitsSold']
print(df['Profit Margin'])

filtered_df = df[(df["Total Sales Value"] > 100_000) & (df["Profit Margin"] > 20_000)]

print(filtered_df.head())

filtered_df.to_csv("./filtered_list.csv", index=False)
63 changes: 63 additions & 0 deletions exercicios/para-sala/filtered_list.csv
@@ -0,0 +1,63 @@
TransactionID,Date,MobileModel,Brand,Price,UnitsSold,TotalRevenue,CustomerAge,CustomerGender,Location,PaymentMethod,Total Sales Value,Profit Margin
79397f68-61ed-4ea8-bcb2-f918d4e6c05b,2024-01-06,direction,Green Inc,1196.95,85,28002.8,32,Female,Port Erik,Online,101740.75,30522.225
f7e98db9-cb87-453e-8179-e48ba5443932,2024-03-07,idea,"Massey, Nicholson and Young",1498.13,70,9703.89,45,Female,Port Daryl,Debit Card,104869.1,31460.730000000003
e59a8eb1-8448-4719-8502-2c97407d0ff9,2024-01-08,free,Nelson and Sons,1333.31,79,49676.78,45,Female,East Brianstad,Online,105331.48999999999,31599.447
b5119fd6-e0d7-44ee-8f87-44f91d42de3f,2024-05-23,law,Roach-Strong,1236.37,89,26408.0,43,Female,Victorview,Credit Card,110036.93,33011.079
03d675d2-f4c7-4860-b159-f7df7142b87e,2024-07-07,special,Weaver Ltd,1418.24,99,1877.76,42,Other,Port Ericstad,Credit Card,140405.76,42121.727999999996
9b4f4a39-8512-411a-8533-2b1d99cf4e64,2024-06-24,travel,"James, Garcia and Brown",1141.24,94,98242.2,28,Female,Bellview,Cash,107276.56,32182.968
8d757a3c-6ffc-4b44-97aa-de01f6bc3b56,2024-06-06,bar,Jordan-Williams,1409.49,81,29360.24,40,Other,Alberttown,Credit Card,114168.69,34250.606999999996
8201d6e3-4f18-4911-9b28-c50627fd1640,2024-02-11,test,Walker-White,1444.64,89,12646.8,51,Female,East Kenneth,Cash,128572.96,38571.888
8499624a-1c49-4645-9c65-df13c57c87d9,2024-07-01,matter,Andrews LLC,1269.71,96,72588.74,41,Female,Powellmouth,Credit Card,121892.16,36567.648
7f36c3a2-ff43-483b-adb8-668e37d16534,2024-07-15,partner,Hebert Inc,1352.48,95,20333.88,32,Other,Crawfordville,Online,128485.6,38545.68
56ad37f3-bb16-4f08-90ba-140b27eebd4a,2024-07-01,century,"Bates, Pearson and Hardy",1245.2,81,59695.92,34,Female,Thomasfort,Cash,100861.2,30258.36
5a5ebad6-dab7-4388-9961-411136b68e27,2024-01-24,eight,Martin-Carson,1256.1,90,25697.72,26,Other,South Benjamin,Cash,113048.99999999999,33914.7
49ee7bcb-01a3-4c71-9273-49d4afd465a4,2024-03-01,son,Anderson-White,1285.81,82,27464.7,44,Female,Port Williamshire,Online,105436.42,31630.926
7ebd3c9c-21a9-48d4-802b-50b2ae3d74e6,2024-07-25,play,Cabrera-White,1358.1,83,30654.0,40,Other,New Christina,Credit Card,112722.29999999999,33816.689999999995
96845a93-75b3-4a60-b213-201181842f96,2024-04-16,skill,"White, Ford and Andrews",1360.71,84,65262.78,29,Female,Snowfurt,Online,114299.64,34289.892
5633dd9e-0ad3-455e-8b00-7236c2379e1b,2024-07-25,security,"Ware, May and Lopez",1485.6,68,51970.4,47,Female,Michaelland,Credit Card,101020.79999999999,30306.239999999998
fdf59c83-bd14-459c-a73d-1c17b86a5a94,2024-01-13,effect,"Martin, Smith and Patterson",1317.86,84,8388.09,54,Female,Davisbury,Credit Card,110700.23999999999,33210.07199999999
07602482-1535-4857-b739-94bbfbbb2ef8,2024-04-05,property,Torres Inc,1285.42,91,16868.06,58,Other,Stacyborough,Credit Card,116973.22,35091.966
74df89fb-b693-457d-a68e-e017937d9fe0,2024-03-20,middle,"Cooper, Mcclain and Cook",1276.97,85,119864.64,50,Female,West Alice,Debit Card,108542.45,32562.735
3b05d54f-0bdf-444c-a021-19e842585d6b,2024-05-28,involve,Figueroa LLC,1184.54,89,23249.54,32,Female,South Melissa,Online,105424.06,31627.217999999997
b09fb488-824a-416d-8800-bbd06fd3530e,2024-02-07,nothing,"Miller, Hill and Lawson",1278.82,91,61818.67,49,Female,Thomasview,Credit Card,116372.62,34911.78599999999
da4115cc-24fb-4ca6-835c-6eeaf8d03cf2,2024-05-27,mention,"Skinner, Ramirez and Kelley",1486.29,68,39261.45,63,Male,Herreraborough,Cash,101067.72,30320.316
97b019b5-ee7f-47bb-a9fe-8a23c0ca4b55,2024-05-11,woman,Williamson-Clay,1378.49,80,10696.96,22,Female,Christopherbury,Debit Card,110279.2,33083.759999999995
0cc43991-586c-4734-94cc-ff0163883c1f,2024-04-15,situation,Stein-Bridges,1426.72,84,94828.04,41,Male,Espinozamouth,Online,119844.48,35953.344000000005
821fa086-7197-4c69-9733-1dcc939af665,2024-04-06,sing,Cobb LLC,1178.88,89,33021.82,53,Male,Jimchester,Online,104920.32,31476.096000000005
c07fff34-92b2-42e0-982b-6dd160b071e1,2024-04-02,operation,"Duncan, Mendoza and Mcdowell",1477.14,90,66427.94,51,Other,South Brandon,Debit Card,132942.6,39882.78
57a14899-fdc3-4bc4-82fb-dee041cc5085,2024-03-29,build,Andrews-Martin,1134.19,95,55740.27,21,Male,East Brian,Credit Card,107748.05,32324.415
1bcdf294-8f88-4cdd-aa1b-5296be5466b4,2024-03-30,expect,"Jackson, White and Brown",1409.21,76,8286.95,43,Other,Jordanfurt,Credit Card,107099.96,32129.987999999998
918c1785-2c85-4757-a15e-cfddb38dd38e,2024-05-27,bad,"Mcmahon, Jones and Baker",1317.59,86,102433.02,28,Other,New Charles,Credit Card,113312.73999999999,33993.822
0835e90c-955f-47dc-87df-98c314e501a7,2024-06-25,former,Weaver-Thompson,1196.33,87,115585.69,35,Other,Brandonton,Online,104080.70999999999,31224.212999999996
d0da1b38-58fd-4b3f-85fa-d9a8f45ee603,2024-01-20,practice,Wilcox PLC,1293.55,95,85958.04,44,Male,Mclaughlinburgh,Cash,122887.25,36866.175
83f22533-6d79-48b7-9d4c-e78aadf0595a,2024-01-23,protect,"Burns, Davila and Camacho",1297.7,85,68618.72,36,Female,North Johnport,Debit Card,110304.5,33091.35
1c06d99c-d59c-47e8-83ba-38a40a00be41,2024-04-01,artist,Smith-Tucker,1294.76,99,9072.42,42,Male,Greeneview,Debit Card,128181.24,38454.372
56f4a3f9-ee2d-4c28-b836-8f183b0333b0,2024-04-05,they,"Kirby, Oneill and Carter",1345.42,94,31312.96,25,Female,Steelemouth,Online,126469.48000000001,37940.844000000005
784b0c63-1eb4-42bf-a8e1-de0d1f9bbbb2,2024-05-18,blood,Fleming Group,1465.14,80,11440.5,60,Male,Newmantown,Cash,117211.20000000001,35163.36
1cafe067-7e81-46ff-9990-579adaebc2cf,2024-07-08,senior,Jensen-Lowe,1483.91,94,12619.62,29,Female,West Susan,Online,139487.54,41846.262
077487a5-61c4-4f29-bd88-49901d7b47e7,2024-04-30,painting,Harris-Bell,1385.88,83,9700.6,41,Other,North Samuel,Online,115028.04000000001,34508.412000000004
4fe52be3-0c3e-4098-bc58-8f391cc3fb26,2024-04-29,born,Cunningham-Hawkins,1390.89,79,60684.54,28,Female,Port Fernandomouth,Credit Card,109880.31000000001,32964.093
0740c846-3424-4692-afee-088248bbfd37,2024-07-22,figure,Flowers-Erickson,1408.39,97,44118.9,30,Other,South Holly,Online,136613.83000000002,40984.149
c1dd718f-8c25-47a9-bc37-5dd6c767199e,2024-02-26,rule,"Vasquez, Roberts and Johnson",1458.6,85,6503.25,19,Other,Bowentown,Credit Card,123980.99999999999,37194.299999999996
b563bdbe-055d-41a8-8cea-06be0db9c82c,2024-05-02,possible,"Johnson, Mcconnell and May",1385.71,75,24011.36,57,Male,Sullivanmouth,Credit Card,103928.25,31178.475000000002
15b09d84-166a-4a3d-a92b-d6cddc7e46cf,2024-05-15,experience,Young Inc,1341.15,93,99642.96,55,Male,Lawsonbury,Credit Card,124726.95000000001,37418.085
6c234dd7-845d-49d8-a506-0d0525939c52,2024-05-14,most,"Weaver, Young and King",1235.36,98,22969.48,25,Female,Micheleshire,Online,121065.27999999998,36319.583999999995
173d6e2c-d2d4-4a78-9b56-8ff05194c0be,2024-07-20,thus,Anderson-Burns,1253.69,94,65396.79,23,Male,Brownburgh,Online,117846.86,35354.058000000005
26760d0b-ece5-48d9-906f-6511c119a434,2024-04-08,fine,Sampson-Kennedy,1179.65,89,50670.65,39,Other,South Christina,Credit Card,104988.85,31496.655000000002
9230af26-83a1-4066-8d7a-8f32cb65a58c,2024-06-12,industry,"Barrett, Figueroa and White",1384.31,86,64326.15,45,Other,North Jeffrey,Debit Card,119050.65999999999,35715.198
95bee8ce-4701-4a6b-8a17-a68f2e661677,2024-03-10,director,Dennis-Sanchez,1343.65,96,3301.35,19,Female,Lake Christopher,Online,128990.40000000001,38697.12
05dc416c-5c87-4726-88bb-c3f02350f9d4,2024-06-18,easy,Jones-Nguyen,1354.53,88,16047.56,63,Other,West Kayla,Credit Card,119198.64,35759.592
06cbf9e5-391e-4727-a5b0-24c93b3f88df,2024-07-03,option,"Hanson, Barron and Castillo",1110.66,93,65983.14,25,Female,Dunnland,Debit Card,103291.38,30987.414000000004
385776a8-dd02-47b3-ac81-848385c53e01,2024-01-25,face,"Hester, Lee and Kirby",1309.52,80,52684.5,55,Female,Kellyton,Cash,104761.6,31428.48
2ad868ea-e6ec-4c08-90df-28627a36cd19,2024-05-27,science,"Daniels, Rojas and Pearson",1137.5,96,14628.86,29,Male,Sheilaburgh,Online,109200.0,32760.0
21cdcebc-fa3e-413a-9702-8fbd7b1d8682,2024-05-20,plant,Thomas Ltd,1217.74,96,65193.3,45,Other,Mooreburgh,Cash,116903.04000000001,35070.912
976cc526-100f-41ea-a8fa-6beb56f959f9,2024-02-13,decision,Miller-Jordan,1251.55,95,31085.22,55,Female,Michaelhaven,Debit Card,118897.25,35669.174999999996
c4c114b1-252e-4e40-9c57-c08f2d7388bc,2024-02-01,particularly,"Myers, Wilcox and Beck",1466.37,86,28288.0,22,Male,Danielbury,Online,126107.81999999999,37832.346
c390e049-3c4f-4b58-ad23-f52c64d7768f,2024-01-18,resource,"Fox, Stevens and Bell",1114.73,91,7388.25,64,Male,East Robertahaven,Debit Card,101440.43000000001,30432.128999999997
903f5961-35c7-47b9-a1f1-5492aa15b049,2024-04-06,four,Robinson-Thompson,1223.95,95,8217.44,43,Female,East Adam,Credit Card,116275.25,34882.575
cf0ec4eb-6751-4904-9980-9cbacd679c14,2024-03-15,hope,Hamilton-Garcia,1348.6,79,44273.85,55,Female,Tannerfort,Debit Card,106539.4,31961.82
f503c272-a176-4704-9011-e47781852269,2024-03-19,huge,Allen-Mays,1349.26,83,98113.14,52,Male,Danielport,Online,111988.58,33596.574
246f33f9-10a0-4d0d-82f5-e7c2164caf37,2024-07-14,around,"Carroll, Brown and Bates",1486.13,75,2336.76,20,Female,Amybury,Online,111459.75000000001,33437.925
b4df370f-821b-43aa-a876-841b99222c0b,2024-07-01,discussion,"Santiago, Yoder and Stevens",1447.46,73,73070.87,59,Male,Andreaview,Online,105664.58,31699.374
fcf20873-f45d-4ae1-ba0a-6333c35a01f6,2024-01-23,watch,Morrison-Stanley,1424.36,79,35283.6,63,Other,Gibbston,Credit Card,112524.43999999999,33757.331999999995
41f08915-addb-4966-8628-038c479c619a,2024-01-28,challenge,Brooks Ltd,1386.69,76,28865.7,39,Male,Ronaldchester,Credit Card,105388.44,31616.532