CREDITS: Credit belongs to the authors of all linked resources
MOTIVATION: This repository was created to help upcoming aspirants and others working in the data science field
Business understanding
1.Data collection
Data is of 3 kinds
a.Structured data (tabular data,etc...)
b.Unstructured data (images,text,audio,etc...)
c.Semi-structured data (XML,JSON,etc...)
Variables are of 2 kinds
a.Qualitative (nominal,ordinal,binary)
b.Quantitative (discrete,continuous)
a.Web scraping best article to refer-https://towardsdatascience.com/choose-the-best-python-web-scraping-library-for-your-application-91a68bc81c4f
https://www.bigdatanews.datasciencecentral.com/profiles/blogs/top-30-free-web-scraping-software
https://medium.com/analytics-vidhya/master-web-scraping-completly-from-zero-to-hero-38051423256b
1.BeautifulSoup
2.Scrapy
3.Selenium
4.Requests (to access data over HTTP)
5.AUTOSCRAPER - https://github.com/alirezamika/autoscraper
6.Twitter scraping tool (twint or tweepy)-https://github.com/twintproject/twint
https://analyticsindiamag.com/complete-tutorial-on-twint-twitter-scraping-without-twitters-api/
Scraping Instagram -instaloader
7.urllib
8.pattern
9.Octoparse Easy Web Scraping https://www.octoparse.com/
ParseHub https://www.parsehub.com/ https://analyticsindiamag.com/parsehub-no-code-gui-based-web-scraping-tool/
Diffbot https://analyticsindiamag.com/diffbot/
Trustpilot
ScrapingBee https://analyticsindiamag.com/scrapingbee-api/
MechanicalSoup https://analyticsindiamag.com/mechanicalsoup-web-scraping-custom-dataset-tutorial/
Scrape HTML tables https://www.youtube.com/watch?v=6U5xJ3mXRKA&feature=youtu.be
pandas(read_html)
b.Web Crawling
https://python.libhunt.com/scrapy-alternatives
b.3rd party APIs
c.Creating own data (manual collection, eg: Google Docs,surveys,etc...) primary data
d.Databases
Databases are of 2 kinds: SQL (relational) and NoSQL (non-relational) databases
SQL,SQLite,MySQL,MongoDB,Hadoop,Elasticsearch,Cassandra,Amazon S3,Hive,Google Bigtable,AWS DynamoDB,HBase,Oracle DB
e.Online resources - ultimate resource https://datasetsearch.research.google.com/
1)kaggle-https://www.kaggle.com/datasets
2)movielens-https://grouplens.org/datasets/movielens/latest/
3)data.gov-https://data.gov.in/
4)uci-https://archive.ics.uci.edu/ml/datasets.php
5)Group Lens dataset https://grouplens.org/
6)data.world https://data.world/ , World Bank
7)Google Cloud BigQuery public datasets
Google Public Datasets-cloud.google.com/bigquery/public-data/
Google Cloud Data Catalog https://cloud.google.com/data-catalog
Academic Torrents-https://academictorrents.com/check.htm?returnto=%2Fbrowse.php
8)online hackathons
9)image data from google_images_download
https://www.visualdata.io/discovery
http://xviewdataset.org/#dataset
https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html
10)image data from Bing_Search
11)https://www.columnfivemedia.com/100-best-free-data-sources-infographic
12)Reddit:https://lnkd.in/dv5UCD4
13)https://datasets.bifrost.ai/?ref=producthunt
14)data.world:https://lnkd.in/gEK897K
15)https://data.world/datasets/open-data
https://tinyletter.com/data-is-plural
16)FiveThirtyEight :- https://lnkd.in/gyh-HDj , https://data.fivethirtyeight.com/
17)BuzzFeed :- https://lnkd.in/gzPWyHj
Buzzfeed News -github.com/BuzzFeedNews
Socrata - https://opendata.socrata.com/
18)Google public datasets :- https://lnkd.in/g5dH8qE
19)Quandl :- https://www.quandl.com stock data
statista : https://www.statista.com/ stock data
20)Socrata Open Data :- https://lnkd.in/gea7JMz
21)Academic Torrents :- https://lnkd.in/g-Ur9Xy
22)Image labeling tools (labelme,labelImg):- https://github.com/wkentaro/labelme , https://github.com/tzutalin/labelImg
Labelbox-https://labelbox.com/
Playment-https://playment.io/
SuperAnnotate -https://www.superannotate.com/
CVAT-https://github.com/openvinotoolkit/cvat
Lionbridge- https://lionbridge.ai/
LinkedAI: A No-code Data Annotations- https://analyticsindiamag.com/linkedai/
Dataturks
V7 Darwin The Rapid Image Annotator https://docs.v7labs.com/docs/loading-a-dataset-in-python https://github.com/v7labs/darwin-py#usage-as-a-python-library
https://waliamrinal.medium.com/top-and-easy-to-use-open-source-image-labelling-tools-for-machine-learning-projects-ffd9d5af4a20
https://github.com/heartexlabs/awesome-data-labeling
23)tensorflow_datasets as tfds https://www.tensorflow.org/datasets (import tensorflow_datasets as tfds)
25)https://ourworldindata.org/
26)https://data.worldbank.org/
27)google open images:https://storage.googleapis.com/openimages/web/download.html
28)https://data.gov.in/
29)imagenet dataset-http://www.image-net.org/
30)https://parulpandey.com/2020/08/09/getting-datasets-for-data-analysis-tasks%e2%80%8a-%e2%80%8aadvanced-google-search/
31)https://storage.googleapis.com/openimages/web/index.html ,
https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=segmentation&r=false&c=%2Fm%2F09qck
https://console.cloud.google.com/marketplace/browse?filter=solution-type:dataset&_ga=2.35328417.1459465882.1589693499-869920574.1589693499
https://catalog.data.gov/dataset?groups=education2168#topic=education_navigation
https://vincentarelbundock.github.io/Rdatasets/datasets.html
32)coco dataset https://cocodataset.org/#explore
33)huggingface datasets-https://github.com/huggingface/datasets https://huggingface.co/datasets https://huggingface.co/languages
34)Big Bad NLP Database-https://datasets.quantumstat.com/
https://github.com/niderhoff/nlp-datasets
35)https://www.edureka.co/blog/25-best-free-datasets-machine-learning/
36)bigquery public dataset ,Google Public Data Explorer
37)inbuilt library data eg:iris dataset,mnist dataset,etc...
tf.data.Datasets for TensorFlow Datasets
38)data.gov.be ,data.egov.bg/ ,data.gov.cz/english ,portal.opendata.dk,govdata.de,opendata.riik.ee,data.gov.ie,data.gov.gr,datos.gob.es,data.gouv.fr,data.gov.hr
dati.gov.it,data.gov.cy,opendata.gov.lt,data.gov.lv,data.public.lu,data.gov.mt,data.overheid.nl,data.gv.at,danepubliczne.gov.pl,dados.gov.pt,data.gov.ro,podatki.gov.si
data.gov.sk,avoindata.fi,oppnadata.se,https://data.adb.org/ ,https://data.iadb.org/ ,https://www.weforum.org/agenda/2018/03/latin-america-smart-cities-big-data/
https://data.fivethirtyeight.com/ , https://wiki.dbpedia.org/ ,https://www.europeandataportal.eu/en ,https://data.europa.eu/ ,https://www.census.gov/,
https://www.who.int/data/gho ,https://data.unicef.org/open-data/ ,http://data.un.org/ ,https://data.oecd.org/ ,https://data.worldbank.org/
39.Awesome Public Dataset- https://github.com/awesomedata/awesome-public-datasets
40.Datasets for Machine Learning on Graphs-https://ogb.stanford.edu/
41.https://www.johnsnowlabs.com/data/
42.30 largest tensorflow datasets-https://lionbridge.ai/datasets/tensorflow-datasets-machine-learning/
43. coco dataset-https://cocodataset.org/#home
Google Open images-https://opensource.google/projects/open-images-dataset https://storage.googleapis.com/openimages/web/index.html
50+ Object Detection Datasets-https://medium.com/towards-artificial-intelligence/50-object-detection-datasets-from-different-industry-domains-1a53342ae13d
70+ Image Classification Datasets from different Industry domains-https://medium.com/towards-artificial-intelligence/70-image-classification-datasets-from-different-industry-domains-part-2-cd1af6e48eda
tensorflow_datasets.object_detection - https://storage.googleapis.com/openimages/web/index.html
https://github.com/google-research-datasets/Objectron/ https://ai.googleblog.com/2020/11/announcing-objectron-dataset.html?m=1
http://idd.insaan.iiit.ac.in/ http://database.mmsp-kn.de/koniq-10k-database.html
ImageNet data -http://image-net.org/
ApolloScape Dataset-http://apolloscape.auto/
44.https://github.com/fivethirtyeight/data
45.Recommender Systems Datasets-https://cseweb.ucsd.edu/~jmcauley/datasets.html
46.indiadataportal-https://indiadataportal.com/
47.US Government Open Dataset: https://www.data.gov/
https://censusreporter.org/ https://data.census.gov/cedsci/
48.AWS Public Data Sets:https://registry.opendata.aws/ https://aws.amazon.com/opendata/?wwps-cards.sort-by=item.additionalFields.sortDate&wwps-cards.sort-order=desc
49.https://the-eye.eu/public/AI/pile_preliminary_components/
Reddit -https://www.reddit.com/r/datasets/
wikipedia-https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
http://opendata.cern.ch/ , https://www.imf.org/en/Data
Global Health Observatory data repository-https://apps.who.int/gho/data/node.main
CERN Open Data Portal-http://opendata.cern.ch/
50.openblender- https://www.openblender.io/#/welcome
51.Top 10 Datasets For Cybersecurity Projects- https://analyticsindiamag.com/top-10-datasets-for-cybersecurity-projects/
52.Datasets from Web Crawl Data (nlp)-http://data.statmt.org/cc-100/
53.https://www.springboard.com/blog/free-public-data-sets-data-science-project/
54.NASA - https://nasa.github.io/data-nasa-gov-frontpage/ace
55.Academic Torrents,GitHub Datasets,CERN Open Data Portal,Global Health Observatory Data Repository
56.32 Data Sets to Uplift your Skills in Data Science-https://blog.datasciencedojo.com/data-sets-data-science-skills/?utm_content=144243072&utm_medium=social&utm_source=linkedin&hss_channel=lcp-3740012
57.OpenDaL-https://opendatalibrary.com/
Data Is Plural-https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk/edit#gid=0
VisualData-https://www.visualdata.io/discovery
https://medium.com/towards-artificial-intelligence/best-datasets-for-machine-learning-data-science-computer-vision-nlp-ai-c9541058cf4f
58.Pandas Data Reader-https://pandas-datareader.readthedocs.io/en/latest/remote_data.html
59.ieee-dataport-https://ieee-dataport.org/datasets
https://github.com/neomatrix369/awesome-ai-ml-dl/blob/master/data/datasets.md#datasets-and-sources-of-raw-data
60.Faker is a Python package that generates fake data-https://github.com/joke2k/faker
61.Text Data Annotator Tool - Datasaur https://datasaur.ai/
62.Google Analytics cost data import https://segmentstream.com/google-analytics?utm_source=twitter&utm_medium=cpc&utm_campaign=ga_costs_import_en&utm_content=guide
63.https://lionbridge.ai/services/crowdsourcing/ https://lionbridge.ai/ https://www.clickworker.com/ https://appen.com/ https://www.globalme.net/
64.Azure Open Datasets https://azure.microsoft.com/en-us/services/open-datasets/ https://azure.microsoft.com/en-in/services/open-datasets/catalog/
Yelp Open Dataset https://www.yelp.com/dataset
https://data.world/
ODK Open Data Kit- https://getodk.org/
World Bank Open Data https://data.worldbank.org/
https://analyticsindiamag.com/10-biggest-data-breaches-that-made-headlines-in-2020/
https://data.mendeley.com/
https://github.com/iamtekson/geospatial-data-download-sites
https://eugeneyan.com/writing/data-discovery-platforms/
65.https://medium.com/towards-artificial-intelligence/best-datasets-for-machine-learning-data-science-computer-vision-nlp-ai-c9541058cf4f
https://github.com/MTG/freesound-datasets
https://dataform.co/
https://github.com/rfordatascience/tidytuesday https://www.youtube.com/watch?v=vCBeGLpvoYM
https://www.analyticsvidhya.com/blog/2020/12/top-15-datasets-of-2020-that-every-data-scientist-should-add-to-their-portfolio/?utm_source=linkedin&utm_medium=AV|link|high-performance-blog|blogs|44181|0.375
66.https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
https://archive.org/details/datasets
https://commoncrawl.org/
2.Feature engineering
Data cleaning-Pyjanitor-https://analyticsindiamag.com/beginners-guide-to-pyjanitor-a-python-tool-for-data-cleaning/
Remove duplicate rows from the dataset
a.Handle missing values
Types of missing values
i.Missing completely at random (MCAR): missingness has no relation to the observed or the missing data, so rows can be deleted without distorting the data distribution
ii.Missing at random (MAR): missingness is related to other observed features but not to the missing value itself; deleting rows can distort the data distribution, so prefer imputation
iii.Missing not at random (MNAR): there is a reason for the missing value and it is directly related to the value itself
1.If the amount of missing data is very small, delete it: a.row deletion b.column deletion c.pairwise deletion
2.Replace by a statistic: mean (influenced by outliers),median (not influenced by outliers),mode
3.Apply a classifier algorithm to predict the missing value
4.Iterative imputer,KNN imputer,multivariate imputation
5.Apply unsupervised learning
6.Random sample imputation
7.Adding a variable to capture NaN (missing indicator)
8.Arbitrary value imputation
9.Hot deck imputation
10.Regression imputation
11.End of distribution imputation
12.Frequent category imputation
13.MICE imputation
14.autoimpute-https://github.com/kearnz/autoimpute
https://stefvanbuuren.name/fimd/want-the-hardcopy.html
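A minimal sketch of statistical imputation (item 2) and the missing indicator (item 7) above, using only the Python standard library. The function names `impute` and `missing_indicator` and the sample values are hypothetical; libraries such as scikit-learn provide production-grade imputers.

```python
import statistics

def impute(values, strategy="median"):
    """Replace None entries with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

def missing_indicator(values):
    """Extra 0/1 feature recording where a value was missing."""
    return [1 if v is None else 0 for v in values]

ages = [22, 25, None, 24, 90, None]      # made-up data; 90 is an outlier
print(impute(ages, "mean"))              # mean is pulled up by the outlier
print(impute(ages, "median"))            # median is not
print(missing_indicator(ages))
```

On this sample the mean fill is 40.25 while the median fill is 24.5, which illustrates why the median is the safer choice when outliers are present.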
b.Handle imbalance
1.Under sampling - usually not preferred because data is lost
2.Over sampling: RandomOverSampler (new points are copies of existing minority points),SMOTE (new points are synthesized between a minority point and its nearest neighbours, so it takes longer),SMOTETomek (SMOTE followed by Tomek-link cleaning),Borderline SMOTE,KMeans SMOTE,SVM SMOTE,ADASYN,SMOTE-NC https://towardsdatascience.com/5-smote-techniques-for-oversampling-your-imbalance-data-b8155bdbe2b5
https://towardsdatascience.com/7-over-sampling-techniques-to-handle-imbalanced-data-ec51c8db349f
3.class_weight: give more importance (weight) to the minority class
4.Use stratified k-fold to keep the ratio of classes constant across folds
5.Weighted neural network
https://machinelearningmastery.com/framework-for-imbalanced-classification-projects/
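The class-weight and random over-sampling ideas above can be sketched in plain Python. The weight formula n_samples / (n_classes * class_count) is the common "balanced" heuristic; the function names and the toy labels are assumptions made for illustration.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """weight(c) = n_samples / (n_classes * count(c)); rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for c, cnt in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == c]
        for _ in range(target - cnt):
            out_rows.append(rng.choice(pool))   # exact copy of an existing point
            out_labels.append(c)
    return out_rows, out_labels

labels = [0] * 8 + [1] * 2                      # made-up 80/20 imbalance
rows = [[i] for i in range(10)]
weights = balanced_class_weights(labels)
new_rows, new_labels = random_oversample(rows, labels)
print(weights, Counter(new_labels))
```

Over-sample only the training split; duplicating points before splitting leaks copies into the validation and test data.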
c.Remove noise data
d.Format data
e.Handle categorical data (ordinal,nominal,cyclic,binary categorical variables)
1.One hot encoding
2.Count or frequency encoding
3.Target guided ordinal encoding
4.Mean encoding
5.Probability ratio encoding
6.Label encoding
7.WoE (weight of evidence)
8.One hot encoding with many categories (keep only the most frequent categories)
9.Feature hashing
10.Sparse CSR matrix
11.Entity embeddings
12.Binary encoding
13.Rare label encoding
14.Leave-one-out (LOO) encoding
https://towardsdatascience.com/beyond-one-hot-17-ways-of-transforming-categorical-features-into-numeric-features-57f54f199ea4
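Three of the encodings listed above (one hot, frequency and ordinal label encoding) can be sketched without any library. The helper names and the sample categories are hypothetical; the category order for the ordinal case is supplied by hand.

```python
from collections import Counter

def one_hot(values):
    """One 0/1 column per category; categories are sorted for a stable column order."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

def frequency_encode(values):
    """Replace each category by its relative frequency in the column."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]

def ordinal_encode(values, order):
    """Label encoding that respects an explicit category order."""
    mapping = {c: i for i, c in enumerate(order)}
    return [mapping[v] for v in values]

cities = ["pune", "delhi", "pune", "goa"]        # nominal feature (made-up)
sizes = ["low", "high", "medium"]                # ordinal feature (made-up)
columns, matrix = one_hot(cities)
print(columns, matrix)
print(frequency_encode(cities))
print(ordinal_encode(sizes, ["low", "medium", "high"]))
```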
f.Scaling of data
1.Normalisation
2.Standardization
3.Robust scaler: not influenced by outliers because it uses the median and IQR
4. Min Max Scaling
5.Mean normalization
6.maximum absolute scaling
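A small sketch comparing min-max scaling, standardization and robust scaling on the same made-up feature. The function names are hypothetical, the population standard deviation is assumed for standardization, and quartiles use the standard library's "inclusive" method.

```python
import statistics

def min_max_scale(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def standardize(values):
    """Zero mean and unit variance: (x - mean) / std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]

def robust_scale(values):
    """(x - median) / IQR, so one extreme value barely changes the scale."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in values]

incomes = [1, 2, 3, 4, 100]                      # 100 is an outlier
print(min_max_scale(incomes))                    # inliers squeezed near 0
print(standardize(incomes))
print(robust_scale(incomes))                     # inliers keep a usable spread
```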
A Q-Q plot or the Shapiro-Wilk normality test is used to check whether a feature is Gaussian (normally distributed). Linear regression and logistic regression often perform better with roughly Gaussian features; if a feature is not Gaussian, use the methods below to bring it closer to a Gaussian distribution
a.Gaussian transformation
b.Logarithmic transformation
c.Reciprocal transformation
d.Square root transformation
e.Exponential transformation
f.Box-Cox transformation
g.log(1+x) transformation
h.Johnson transformation
g.Remove low variance features by using VarianceThreshold
h.If a feature holds only one constant value, remove the feature
i.Outliers: whether to remove outliers depends on the problem we are solving
2 types of outliers: global outliers, local outliers
eg: in fraud detection outliers are very important
Methods to find outliers: standard deviation,z-score,boxplot,scatter plot,IQR,TensorFlow Data Validation
Automatic outlier detection: Isolation Forest,Local Outlier Factor,Minimum Covariance Determinant,Robust Random Cut Forest,DBSCAN clustering
Outlier treatment: mean/median/random imputation,drop,discretization (binning)
If outliers are present, use robust scaling
alibi-detect https://github.com/SeldonIO/alibi-detect#adversarial-detection https://docs.seldon.io/projects/alibi-detect/en/latest/
https://medium.com/towards-artificial-intelligence/outlier-detection-and-treatment-a-beginners-guide-c44af0699754
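The IQR and z-score rules mentioned above can be sketched as follows. The function names, the 1.5 x IQR fence, the z-score thresholds and the sample readings are all assumptions chosen for illustration.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def zscore_outliers(data, threshold=3.0):
    """Flag points whose distance from the mean exceeds threshold * std."""
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    if std == 0:
        return []
    return [x for x in data if abs(x - mean) / std > threshold]

readings = [10, 12, 11, 13, 12, 11, 10, 95]      # made-up data; 95 is suspicious
print(iqr_outliers(readings))
print(zscore_outliers(readings))                 # default threshold of 3
print(zscore_outliers(readings, threshold=2.5))
```

In this 8-point sample the extreme value inflates both the mean and the standard deviation, so the z-score rule with a threshold of 3 flags nothing, while the IQR rule still catches it. That is one reason quartile-based rules are often preferred on small or heavily skewed samples.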
j.Anomaly
clustering techniques to find it
Isolation Forest(for Big Data)
k.Sampling techniques
a.biased sampling
b.unbiased sampling
3.Exploratory Data Analysis(eda)
Explore the dataset by using python or microsoft excel or tableau or powerbi, etc...
Data visualization (Matplotlib,Seaborn,Plotly,pyqtgraph,Bokeh,Pygal,Dash,Pydot,Geoplotlib,ggplot,visualizer,etc...)
Scatterplot,multi line plot,bubble chart,bar chart,histogram,boxplot,distplot,bubble charts,area plot,heat map,index plot,violin plot,time series plot,density plot,dot plot,strip plot,plotly,Choropleth Map,Kepler,PDF,Kernel density function,networkx,Scatter_matrix,Bootstrap_plot,functionvis,Higher-Dimensional Plots,3-D Plots,Word Clouds,HoloViz
https://towardsdatascience.com/top-6-python-libraries-for-visualization-which-one-to-use-fe43381cd658
Keras Model Visualization generator(ann-visualizer)- pip3 install graphviz
univariate and bivariate and multivariate analysis
model visualization Tensorboard,netron,playground tensorflow,plotly,TensorDash,Dash,Microscope,Lucid
distributions (discrete,continuous)
data distributions-Normal Distribution,Standard Normal Distribution,Student's t-Distribution,Bernoulli Distribution,Binomial Distribution,Poisson Distribution,Uniform Distribution,F Distribution,Covariance and Correlation
Types of Statistics
1.Descriptive
2.Inferential
Types of data
1) Categorical (nominal,ordinal)
2) Numerical (discrete,continuous)
random variable (discrete random variable,continuous random variable)
Central Limit Theorem,Bayes Theorem,Confidence Interval,Hypothesis Testing,z test,t test,F test,one-tailed test,two-tailed test,chi-square test,ANOVA test,A/B testing
4.Feature selection
1.Filter methods (correlation,chi-square test,t-test,ANOVA test,mutual information,hypothesis test,information gain etc...)
2.Wrapper methods (recursive feature elimination,Boruta,forward selection,backward elimination,stepwise selection etc...)
3.Embedded methods (lasso,ridge regression,elastic net,tree based etc...)
4.Feature Importance
a.ExtraTreesClassifier,ExtraTreesregressor
b.SelectKBest
c.Logistic Regression
d.Random_forest_importance
e.decision tree
f.Linear Regression
g.xgboost
5.Curse of dimensionality (as the number of dimensions increases, data becomes sparse and model performance tends to decrease)
6.If features are highly correlated, keep only 1 of them (multicollinearity)
7.dimension reduction
8.lasso regression to penalise unimportant features
9.VarianceThreshold
10.model based selection
11.Mutual Information Feature Selection
12.remove features with very low variance (quasi constant feature dropping)
13.Univariate feature selection
14.importance of feature (random forest importance)
15.feature importance with decision trees
16.PyImpetus
17.drop constant features (variance=0)
18.variance inflation factor(vif)
19.Recursive Feature Elimination
20.Exhaustive feature selection
https://machinelearningmastery.com/feature-selection-with-real-and-categorical-data/ https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/
https://www.analyticsvidhya.com/blog/2020/10/a-comprehensive-guide-to-feature-selection-using-wrapper-methods-in-python/
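Two of the filter-style steps above, dropping constant features and dropping one of each highly correlated pair, can be sketched in plain Python. The function names, the 0.9 correlation limit and the toy feature table are assumptions; constant columns are removed first because Pearson correlation is undefined for them.

```python
import statistics

def variance_threshold(columns, threshold=0.0):
    """Keep the names of features whose variance is above the threshold."""
    return [name for name, col in columns.items()
            if statistics.pvariance(col) > threshold]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drop_correlated(columns, names, limit=0.9):
    """Greedily keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name in names:
        if all(abs(pearson(columns[name], columns[k])) < limit for k in kept):
            kept.append(name)
    return kept

features = {                                     # made-up feature table
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],       # same information as height_cm
    "constant": [1, 1, 1, 1],                    # variance = 0
    "age": [30, 22, 41, 25],
}
non_constant = variance_threshold(features)
selected = drop_correlated(features, non_constant)
print(non_constant, selected)
```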
5.Data splitting
Splitting ratio of data depends on the size of the available dataset
Training data,Validation data,Testing data
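A minimal sketch of a shuffled train/validation/test split. The 70/15/15 ratio, the seed and the function name are assumptions; the right ratio depends on how much data is available, and time series data should be split by time order instead of shuffling.

```python
import random

def train_val_test_split(rows, val=0.15, test=0.15, seed=42):
    """Shuffle once, then cut the list into test, validation and training parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test)
    n_val = round(len(rows) * val)
    test_set = rows[:n_test]
    val_set = rows[n_test:n_test + n_val]
    train_set = rows[n_test + n_val:]
    return train_set, val_set, test_set

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```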
6.Model selection
Machine learning
A.Supervised learning (has labelled data)
1.Regression (output feature is continuous)
linear regression,polynomial regression,robust regression,support vector regression,decision tree regression,random forest regression,
least squares method,XGBoost,ridge (L2 regularization),lasso (L1 regularization (more sparse)),CatBoost,gradient boosting,AdaBoost,
elastic net,LightGBM,ordinary least squares,CART
use cases:
2.Classification (output feature is categorical)
Logistic Regression,K-Nearest Neighbors,Support Vector Machine,Kernel SVM,Naive Bayes,Decision Tree Classification,
Random Forest Classification,XGBoost (Extreme Gradient Boosting),AdaBoost,Gradient Boosting,CatBoost,Gaussian NB,LGBMClassifier,LinearDiscriminantAnalysis,passive aggressive classifier,CART,C4.5,C5.0
B.Unsupervised learning (no label (target) data)
1.Dimensionality reduction - PCA,SVD,LDA,SOM,t-SNE,PLSR,PCR,autoencoders,KPCA,LSA
2.Clustering :https://scikit-learn.org/stable/modules/clustering.html
3.Association rule learning - support,lift,confidence,Apriori,Eclat,FP-growth,FP-tree construction,association_rules
4.Recommendation system -
a.collaborative Recommendation system (model based, memory based(item based,user based)) user-item interaction matrix
b.content based Recommendation system
similarity based(user-user similarity,item-item similarity)
matrix factorization
c.utility based Recommendation system
d.knowledge based Recommendation system
e.demographic based Recommendation system
f.hybrid based Recommendation system
g.Average Weighted Recommendation
h.using K Nearest Neighbor
i.cosine distance recommender system
j.TensorFlow Recommenders https://www.tensorflow.org/recommenders
k.Surprise baseline model
https://analyticsindiamag.com/top-open-source-recommender-systems-in-python-for-your-ml-project/
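The item-item similarity and cosine distance ideas above can be sketched on a tiny user-item interaction matrix. The ratings, the item names and the helper names are made up, and a 0 is assumed to mean "not rated".

```python
import math

def cosine(u, v):
    """Cosine similarity of two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def most_similar(item, ratings):
    """Return the other item whose rating vector is closest to the given item's."""
    scores = {other: cosine(ratings[item], vec)
              for other, vec in ratings.items() if other != item}
    return max(scores, key=scores.get)

# One rating vector per item; each position is one user (made-up data).
ratings = {
    "item_a": [5, 4, 0, 1],
    "item_b": [4, 5, 0, 1],
    "item_c": [1, 0, 5, 4],
}
print(most_similar("item_a", ratings))
```

A user who liked item_a would be recommended item_b here, because the same users rated both items highly.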
C.Ensemble methods
1.Stacking models
2.Bagging models
3.Boosting models
4.Blending
5.Voting (Hard Voting,Soft Voting)
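Hard and soft voting can disagree on the same set of models, which the sketch below illustrates. The three "models" are just hand-written probability dictionaries and the function names are hypothetical.

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote over the class labels predicted by each model."""
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(probabilities):
    """Average the predicted class probabilities, then pick the largest."""
    classes = probabilities[0].keys()
    avg = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    return max(avg, key=avg.get)

# Made-up outputs of three classifiers for one sample.
model_probs = [
    {"cat": 0.90, "dog": 0.10},   # very confident "cat"
    {"cat": 0.40, "dog": 0.60},   # weak "dog"
    {"cat": 0.45, "dog": 0.55},   # weak "dog"
]
model_labels = [max(p, key=p.get) for p in model_probs]
print(hard_vote(model_labels))    # two of three models say "dog"
print(soft_vote(model_probs))     # the averaged probability favours "cat"
```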
D.Reinforcement learning
2 types: a)model free b)model based
agent,environment,policy (on-policy vs off-policy),reward function,value function,state,action,episode,actor-critic
The agent applies an action to the environment and receives the corresponding reward, which is how it learns the environment
1.Q-Learning
2.Deep Q-Learning (DQN)
3.Deep Convolutional Q-Learning
4.Deep Deterministic Policy Gradient (DDPG),Twin Delayed DDPG (TD3)
5.A3C (Asynchronous Advantage Actor-Critic)
6.Advantage weighted actor critic (AWAC)
7.XCS
8.Genetic algorithm,SARSA
https://simoninithomas.github.io/deep-rl-course/
Environments-OpenAI Gym, DeepMind Lab, Unity ML-Agents
https://analyticsindiamag.com/8-best-free-resources-to-learn-deep-reinforcement-learning-using-tensorflow/
https://neptune.ai/blog/best-reinforcement-learning-tutorials-examples-projects-and-courses
Open AI Gym - https://gym.openai.com/
KerasRL https://github.com/keras-rl/keras-rl
pyqlearning
tensorforce https://tensorforce.readthedocs.io/en/latest/index.html
rl_coach https://github.com/IntelLabs/coach#installation MushroomRL https://mushroomrl.readthedocs.io/en/latest/
TFAgents https://github.com/tensorflow/agents (https://www.tensorflow.org/agents) https://deepmind.com/blog/article/trfl
OpenAI Baselines https://github.com/openai/baselines (Stable Baselines is a fork of this repository)
https://www.youtube.com/playlist?list=PL_iWQOsE6TfURIIhCrlt-wj9ByIVpbfGc
https://neptune.ai/blog/the-best-tools-for-reinforcement-learning-in-python?utm_source=twitter&utm_medium=tweet&utm_campaign=blog-the-best-tools-for-reinforcement-learning-in-python
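The agent, environment, reward and Q-Learning terms above can be tied together with a tabular Q-learning sketch. Everything here is a toy assumption: a 5-state corridor with a reward of 1 at the right end, a purely random behaviour policy (Q-learning is off-policy, so it can still learn the greedy values), and hand-picked values for alpha and gamma.

```python
import random

N_STATES = 5                          # states 0..4; state 4 is the terminal goal
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Environment: move along the corridor, reward 1.0 on reaching the goal."""
    nxt = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(list(ACTIONS))           # explore at random
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[nxt].values())
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = nxt
    return q

q = q_learning()
policy = [max(q[s], key=q[s].get) for s in range(N_STATES - 1)]
print(policy)
```

The greedy policy read from the learned table moves right in every non-terminal state, and the value of each state decays by a factor of gamma per step away from the goal.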
Semi-Supervised Learning-small amount of labeled data with a large amount of unlabeled data during training
E.Deep learning (use when data is huge and highly complex; state of the art for unstructured data)
Frameworks:PyTorch,TensorFlow,Keras,Caffe,Theano,MXNet,MATLAB,Microsoft Cognitive Toolkit,Opacus (train PyTorch models with differential privacy)
1.Multilayer perceptron(MLP)
1.Regression task
2.Classification task
2.Convolutional neural network (used for image data)
1.Classification of images
create own model,LeNet,AlexNet,ResNet,GoogLeNet,Inception,VGG,EfficientNet,NASNet,STN,NASNet-A,SENet,AmoebaNet-C
2.Localization of objects in images
3.Object detection and object segmentation
R-CNN,Fast R-CNN,Faster R-CNN,TensorFlow Object Detection,YOLO v1,YOLO v2,YOLO v3,YOLO v4,Fast YOLO,YOLO tiny,YOLO lite,YOLO tiny++,YOLACT++,
Mask R-CNN,SSD,Detectron,Detectron2,MobileNet,RetinaNet,R-FCN,DETR (Facebook),U-Net,UNet++,EfficientDet
3 kinds of object segmentation are available: semantic segmentation,instance segmentation,panoptic segmentation
PyTorch based low code object detection-https://github.com/alankbi/detecto
https://awesomeopensource.com/project/hoya012/deep_learning_object_detection
4.Object tracking (mean shift,optical flow,Kalman filter)
Tracktor++,Track R-CNN,JDE,DeepSORT
5.Deepdream,Neural style transfer, Pose estimation
Visualizing what CNNs 'see' - filter visualizations,heatmaps,saliency maps
imageai.Detection for Object detection
DEEP LEARNING METHODS FOR 2D POSE ESTIMATION: OpenPose,DeepPose,MultiPoseNet,AlphaPose,VIBE,DeeperCut,Mask R-CNN,DeepCut,Convolutional Pose Machines,PoseNet
3D POSE ESTIMATION
DEEP LEARNING METHODS FOR 3D: 3D human pose estimation = 2D pose estimation + matching,Integral Human Pose Regression,Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach,A Simple Yet Effective Baseline for 3D Human Pose Estimation
Data augmentation is applied to increase the size of the dataset and the performance of the model
Object Detection with 10 lines of code-https://www.datasciencecentral.com/profiles/blogs/object-detection-with-10-lines-of-code
Remo Improves Image Management https://www.freecodecamp.org/news/manage-computer-vision-datasets-in-python-with-remo/
3.Recurrent neural network (used for sequential data)
1.RNN
2.GRU
3.LSTM (has a memory cell,forget gate etc..)
all 3 models above also have bidirectional variants; use them when the problem statement calls for it
4.Generative adversarial network https://poloclub.github.io/ganlab/ https://developers.google.com/machine-learning/gan/training
CycleGAN,DCGAN,SRGAN,InfoGAN,StarGAN,AttnGAN,StyleGAN,PixelRNN,DiscoGAN,LSGAN,Conditional GAN (Pix2Pix),Progressive GANs (produce higher resolution images),Image-to-Image Translation,Face Inpainting,Super-resolution
https://github.com/hindupuravinash/the-gan-zoo
5.Autoencoder
1.sparse Autoencoder
2.denoising Autoencoder
3.Contractive Autoencoder
4.stacked Autoencoder
5.deep Autoencoder
6.variational autoencoder
6.Boltzmann Machines,Restricted Boltzmann Machine,deep belief network,deep Boltzmann Machines
7.Self Organizing Maps (SOM)
8.Natural language processing
Clean data (removing stopwords depending on the problem,lowercasing,tokenization,POS tagging,stemming or lemmatization depending on the problem,skip-gram,n-gram,chunking)
NLTK,spaCy,Gensim,TextBlob,iNLTK,Pattern,Stanza,OpenNLP,Polyglot,CoreNLP,PyDictionary,Hugging Face,Spark NLP,AllenNLP,Rasa NLU,Megatron,Texthero,Flair,textacy,finetune,gluon-nlp,VnCoreNLP libraries
NLU,NLG,NER,text summarization,Sentiment Analysis,Text Classifications,machine translation,chat bot,Text Generation,Speech Recognition
1.Bag of words
2.TF-IDF
3.Word embedding
a.using pretrained model
i)word2vec (CBOW,skip-gram)
ii)GloVe
iii)fastText
b.creating own embedding (use when data is huge)
i)word2vec library
ii)Keras embedding
ELMo (captures the meaning of a word in its context)
4.Document embedding-Doc2vec
5.Sentence embedding
sense2vec,Sent2Vec,Universal Sentence Encoder
6.Using RNN,LSTM,GRU
the 3 models above also have bidirectional variants
7.Encoder and Decoder(sequence to sequence), ProphetNet(new pretrained seq2seq model)
8.attention
self attention,Global Attention,Multi-Head Attention,Local Attention (monotonic,predictive) https://github.com/uzaymacar/attention-mechanisms
9.Transformer (big breakthrough in NLP) - http://jalammar.github.io/illustrated-transformer/
Shrinking Transformers (reduce size): quantization,distillation,pruning
10.BERT,Quantized MobileBERT,ALBERT,ELECTRA,Transformer-XL,DistilBERT,ELMo,RoBERTa,XLNet,XLM-RoBERTa,T5,GPT,GPT-2,GPT-3,PRADO,PET,BORT
http://jalammar.github.io/ http://jalammar.github.io/illustrated-bert/ http://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
11.Speech
speech to text
text to speech
Acoustic model,Speaker diarisation,apis
googletrans (google Translator)
https://medium.com/towards-artificial-intelligence/natural-language-processing-nlp-with-python-tutorial-for-beginners-1f54e610a1a0
MT5-https://venturebeat.com/2020/10/26/google-open-sources-mt5-a-multilingual-model-trained-on-over-101-languages/?utm_content=144321587&utm_medium=social&utm_source=linkedin&hss_channel=lcp-3740012
VADER does not require any training data https://pypi.org/project/vaderSentiment/ https://analyticsindiamag.com/sentiment-analysis-made-easy-using-vader/
APPLICATIONS OF MACHINE TRANSLATION-Text-to-text,Text-to-speech,Speech-to-text,Speech-to-speech,Image (of words)-to-text
Google-GNMT (Tensorflow),Facebook-fairseq (Torch),Amazon-Sockeye (MXNet),NEMATUS (Theano),THUMT (Theano),OpenNMT (PyTorch),StanfordNMT (Matlab),DyNet-lamtram (CMU),EUREKA (MangoNMT)
https://www.kdnuggets.com/2020/05/best-nlp-deep-learning-course-free.html https://analyticsindiamag.com/flair-hands-on-guide-to-robust-nlp-framework-built-upon-pytorch/
https://medium.com/modern-nlp/nlp-metablog-a-blog-of-blogs-693e3a8f1e0c
classification,clustering,recommender systems,topic modelling,sentiment analysis,semantic analysis,summarization,machine translation,conversational interface,named entity recognition
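Bag of words and TF-IDF, the first two text representations listed in this section, can be sketched in a few lines. This assumes whitespace tokenization, tf = count / document length and idf = log(N / document frequency); real libraries usually add smoothing and normalization, so their numbers differ.

```python
import math
from collections import Counter

def bag_of_words(doc):
    """Word counts of one document (lowercased, split on whitespace)."""
    return Counter(doc.lower().split())

def tf_idf(docs):
    """Per-document dict of term -> tf * idf, without smoothing."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

docs = ["the cat sat", "the dog sat", "the cat ran"]   # made-up corpus
scores = tf_idf(docs)
print(bag_of_words("The cat saw the dog"))
print(scores[1])
```

A word that appears in every document ("the") scores 0, while a word unique to one document ("dog") scores highest there, which is the property that makes TF-IDF more informative than raw counts.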
F.Time Series
here the data split is different: train,validation and test sets must follow time order (no random shuffling)
here handling of missing data is also different
methods generally used to impute data in time series
1.ffill (forward fill)
2.bfill (backward fill)
3.take the mean of the previous or next x samples and impute
4.take the previous season's value and impute (data with trend)
5.mean,mode,median,random sample imputation (data without trend and without seasonality)
6.linear interpolation (data with trend and without seasonality)
7.seasonal adjustment + interpolation (data with trend and with seasonality)
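Forward fill, backward fill and linear interpolation from the list above can be sketched on a plain list where `None` marks a missing observation. The function names and the sample series are hypothetical; pandas offers equivalent operations for real data.

```python
def ffill(series):
    """Carry the last observed value forward; leading gaps stay None."""
    out, last = [], None
    for value in series:
        if value is not None:
            last = value
        out.append(last)
    return out

def bfill(series):
    """Carry the next observed value backward; trailing gaps stay None."""
    return ffill(series[::-1])[::-1]

def linear_interpolate(series):
    """Fill gaps between two observed points on a straight line."""
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        slope = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = series[left] + slope * (i - left)
    return out

sales = [1.0, None, None, 4.0, None, 6.0]        # made-up series with a trend
print(ffill(sales))
print(bfill(sales))
print(linear_interpolate(sales))
```

On a trending series like this one, forward and backward fill produce flat steps, while interpolation follows the trend.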
here model selection depends on properties of the data such as stationarity,trend,seasonality,cyclicity
adfuller test (Augmented Dickey-Fuller) for stationarity
models
1.ARIMA,auto ARIMA,seasonal ARIMA
2.Autoregressive
3.Moving average,exponential moving average,exponential smoothing
4.LSTM (neural network)
5.Naive forecasts
6.Facebook Prophet (note: expects the date column as ds and the target column as y)
NeuralProphet Model- https://ourownstory.github.io/neural_prophet/model-overview/
7.Holt-Winters,Holt's linear trend
8.AutoTS-https://analyticsindiamag.com/hands-on-guide-to-autots-effective-model-selection-for-multiple-time-series/
9.Temporal Convolutional Neural Network
10.Atspy For Automating The Time-Series Forecasting-https://analyticsindiamag.com/hands-on-guide-to-atspy-for-automating-the-time-series-forecasting/
11.Darts-https://analyticsindiamag.com/hands-on-guide-to-darts-a-python-tool-for-time-series-forecasting/
12.Bayesian Neural Network , TsEuler
13.PyFlux (easy way to compare different models)-https://analyticsindiamag.com/pyflux-guide-python-library-for-time-series-analysis-and-prediction/
14.Orbit , DeepAR ,NeuralProphet(https://github.com/ourownstory/neural_prophet https://ourownstory.github.io/neural_prophet/model-overview/)
best article-https://www.analyticsvidhya.com/blog/2018/02/time-series-forecasting-methods/,
time series visualization tool https://plotjuggler.io/
https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/
https://www.machinelearningplus.com/time-series/time-series-analysis-python/
https://github.com/Apress/hands-on-time-series-analylsis-python
https://otexts.com/fpp2/simple-methods.html
https://analyticsindiamag.com/top-time-series-deep-learning-methods/
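The simplest baselines from the model list above (naive forecast, moving average and simple exponential smoothing) can be sketched as one-step-ahead forecasts. The function names, the window of 3, alpha of 0.5 and the sample series are assumptions; these baselines are mainly useful as a yardstick for heavier models.

```python
def naive_forecast(series):
    """Next value = last observed value."""
    return series[-1]

def moving_average_forecast(series, window=3):
    """Next value = mean of the last `window` observations."""
    return sum(series[-window:]) / window

def exponential_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing: level = alpha * value + (1 - alpha) * level."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

demand = [10, 12, 14, 16]                        # made-up series with an upward trend
print(naive_forecast(demand))
print(moving_average_forecast(demand))
print(exponential_smoothing_forecast(demand))
```

All three lag behind a steady trend (the next value would plausibly be 18), which is why trend-aware methods such as Holt's linear trend exist.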
G.Semi supervised learning,Self-Supervised Learning,Multi-Instance Learning
H.Active learning,Multi-Task Learning,Online Learning
I.Transfer learning(Inductive Transfer learning(similar domain,different task),Unsupervised Transfer Learning(different task,different domain but similar enough) ,Transductive Transfer Learning(similar task,different domain))
https://github.com/artix41/awesome-transfer-learning
J.Deep dream,Style transfer
K.One-shot learning,Zero-shot learning
TYPES OF ACTIVATION FUNCTIONS: LINEAR ACTIVATION,RELU,LEAKY RELU,SIGMOID ACTIVATION,TANH ACTIVATION,elu,PReLU,Softmax,Swish,Softplus
Optimizers- Gradient Descent (Batch Gradient Descent,Stochastic Gradient Descent,Mini-batch Gradient Descent),SGD with momentum,Adagrad,RMSProp,Adam,AdaBelief
Regularization- L1,L2,dropout,early stopping,data augmentation,batch normalisation,tree pruning
Different Normalization Layers - https://towardsdatascience.com/different-normalization-layers-in-deep-learning-1a7214ff71d6
Hyperparameters- number of hidden layers,dropout,activation function,weight initialization,learning rate,epochs,iterations and batch size
Hyperparameter tuning
a.GridSearchCV (tries every given parameter combination, so it takes a long time)
b.RandomizedSearchCV (samples parameter combinations at random, which cuts down search time)
c.Bayesian Optimization , Hyperopt
d.Sequential Model Based Optimization(Tuning a scikit-learn estimator with skopt)
e.Optuna
f.Genetic Algorithms
g.Keras tuner
h.Scikit-Optimize
i.ray[tune] and aisaratuners https://towardsdatascience.com/choosing-a-hyperparameter-tuning-library-ray-tune-or-aisaratuners-b707b175c1d7
https://towardsdatascience.com/10-hyperparameter-optimization-frameworks-8bc87bc8b7e3
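The difference between exhaustive grid search and random search can be sketched without any library. In the example below, `validation_score` is a hypothetical stand-in for "train a model with these parameters and return its validation score", and the parameter names and values are assumptions for illustration; this is not how GridSearchCV or RandomizedSearchCV are implemented internally.

```python
import itertools
import random

def validation_score(params):
    # Hypothetical objective: peaks at learning_rate=0.1 and max_depth=5.
    return -(params["learning_rate"] - 0.1) ** 2 - 0.01 * (params["max_depth"] - 5) ** 2

def grid_search(grid):
    # Evaluate every combination in the grid.
    names = list(grid)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*grid.values())]
    return max(candidates, key=validation_score), len(candidates)

def random_search(grid, n_iter, seed=0):
    # Evaluate only n_iter randomly sampled combinations.
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in grid.items()}
                  for _ in range(n_iter)]
    return max(candidates, key=validation_score), len(candidates)

grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "max_depth": [3, 5, 7, 9]}
best_grid, n_grid = grid_search(grid)             # evaluates all 16 combinations
best_rand, n_rand = random_search(grid, n_iter=6)  # evaluates only 6
```

Grid search is guaranteed to find the best point on the grid at the cost of evaluating all of it; random search trades that guarantee for a fixed, smaller budget.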
Cross validation techniques- https://towardsdatascience.com/understanding-8-types-of-cross-validation-80c935a4976d
1.LOOCV (leave-one-out cross-validation)
2.K-fold cross-validation
3.Stratified cross-validation
4.Time Series cross-validation
5.Holdout cross-validation
6.Repeated cross-validation
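As a sketch of what some of these splitting schemes do, the two generators below produce index splits for k-fold and for time-series (expanding window) cross-validation. The function names are hypothetical helpers written for this example, and no shuffling or stratification is performed. Leave-one-out is the special case of k-fold where k equals the number of samples.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def time_series_indices(n_samples, n_splits):
    """Yield expanding-window splits where training data always precedes test data."""
    test_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - i + 1) * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))

for train_idx, test_idx in time_series_indices(10, 3):
    print(train_idx, test_idx)
```

The time-series variant never trains on observations that come after the test window, which is the property ordinary k-fold does not preserve.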
TensorBoard,Neptune for visualization of model performance
Distributed Training with TensorFlow
6.Testing model
Generally used metrics
Always check bias variance tradeoff to know how model is performing
Model can be overfitting(low bias,high variance),underfitting(high bias,low variance),good fit(low bias,low variance)
https://scikit-learn.org/stable/modules/model_evaluation.html https://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model
https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
1.Regression task - mean-squared-error, Root-Mean-Squared-Error,mean-absolute error, R², Adjusted R²,Mean percentage error
2.Classification task-Accuracy,confusion matrix,Precision,Recall,F1 Score,Binary Crossentropy,Categorical Crossentropy,AUC-ROC curve,log loss,Average precision,Mean average precision
3.Reinforcement learning - generally use rewards
4.In case of machine translation, use BLEU score
5.Clustering - External: Adjusted Rand index, Jaccard Score, Purity Score; Internal: silhouette_score, Davies-Bouldin Index, Dunn Index
6.Object Detection loss-localization loss,classification loss,Focal Loss,IOU,L2 loss
7.Distance Metrics - Euclidean Distance,Manhattan Distance,Minkowski Distance,Hamming Distance
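To show how a few of the regression and classification metrics above are computed, here is a small pure-Python sketch. The function names and the example data are assumptions for illustration; in practice the scikit-learn functions linked above are the usual choice.

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes R^2 for the number of features used.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae,
            "r2": r2, "adjusted_r2": adj_r2}

def classification_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"confusion_matrix": [[tn, fp], [fn, tp]],
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

reg = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 3.0], n_features=1)
clf = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

The confusion matrix is laid out as [[TN, FP], [FN, TP]], which is an assumption of this sketch; check the convention of whichever library you use.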
metric-Built-in metrics, Custom metric without external parameters,Custom metric with external parameters,Subclassing custom metric layer
https://medium.com/swlh/custom-loss-and-custom-metrics-using-keras-sequential-model-api-d5bcd3a4ff28
loss-Built-in loss, Custom loss without external parameters,Custom loss with external parameters,Subclassing loss layer
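The difference between a custom loss "without" and "with" external parameters can be sketched in plain Python. Frameworks such as Keras accept a callable that takes `(y_true, y_pred)`; the usual way to pass extra parameters is a closure that captures them. The example below works on Python lists rather than tensors, and `make_huber_loss` and `delta` are names chosen for this illustration.

```python
def mean_squared_error(y_true, y_pred):
    # Custom loss without external parameters: the signature is just (y_true, y_pred).
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def make_huber_loss(delta):
    # Custom loss with an external parameter: the outer function captures delta,
    # and the inner function keeps the (y_true, y_pred) signature.
    def huber_loss(y_true, y_pred):
        total = 0.0
        for t, p in zip(y_true, y_pred):
            err = abs(t - p)
            if err <= delta:
                total += 0.5 * err * err              # quadratic for small errors
            else:
                total += delta * (err - 0.5 * delta)  # linear for large errors
        return total / len(y_true)
    return huber_loss

loss_fn = make_huber_loss(delta=1.0)
value = loss_fn([0.0, 0.0], [0.5, 3.0])
```

The same closure pattern applies to custom metrics with external parameters.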
Docker and Kubernetes
7.deployment
1.Azure
2.Heroku
3.Amazon Web Services
4.Google cloud platform
MODEL DEPLOYMENT USING TF SERVING
TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines https://www.tensorflow.org/tfx
Models visualization using Tensorboard,netron, TensorBoard.dev
Python web Frameworks for App Development- Flask,Streamlit,fastapi,Django,Web2py,Pyramid,CherryPy,Voila,Kivy and Kivymd https://analyticsindiamag.com/top-8-python-tools-for-app-development/
snapyml Deploy AI Models For Free -http://snapyml.snapy.ai/
H2O Wave apps https://github.com/h2oai/wave-apps https://h2oai.github.io/wave/docs/installation/
Web-Based GUI (Gradio)- https://analyticsindiamag.com/guide-to-gradio-create-web-based-gui-applications-for-machine-learning/
web application(dash)- https://dash.plotly.com/
Jupyter Notebook into an interactive dashboard (voila)-https://voila.readthedocs.io/en/stable/
high-level app and dashboarding solution(Panel)-https://panel.holoviz.org/
https://github.com/gradio-app/gradio
Tensorflow lite:Use of tensorflow lite to reduce size of model https://www.tensorflow.org/lite https://codelabs.developers.google.com/codelabs/recognize-flowers-with-tensorflow-on-android-beta/#0 https://tfhub.dev/s?deployment-format=lite https://www.tensorflow.org/lite/examples https://www.tensorflow.org/lite/microcontrollers https://www.tensorflow.org/lite/models
model optimization (architecture)
TinyML https://blog.tensorflow.org/2020/08/the-future-of-ml-tiny-and-bright.html
Post-training Quantization in TensorFlow Lite https://www.tensorflow.org/lite/performance/post_training_quantization
pruning
leverage of model architecture
Quantization:Use Quantization to reduce size of model
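To illustrate why quantization reduces model size, the sketch below maps float weights to 8-bit integers with a scale and a zero point, then maps them back. This is a generic affine quantization example under assumed settings (unsigned 8-bit range, per-tensor scale); it is not claimed to be the exact scheme used by TensorFlow Lite.

```python
def quantize(values, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0 so that 0.0 maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = int(round(qmin - lo / scale))
    quantized = [min(qmax, max(qmin, int(round(v / scale)) + zero_point))
                 for v in values]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    return [scale * (q - zero_point) for q in quantized]

weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
```

Each weight is now stored in 1 byte instead of 4 (for float32), at the cost of a rounding error of at most about one quantization step (`scale`).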
8.Monitoring model
CI CD pipeline used- circleci , jenkins
In real world project use pipeline -https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
1.easy debugging
2.better readability
BIG DATA: hadoop,apache spark
research paper-https://arxiv.org/ ,https://arxiv.org/list/cs.LG/recent, https://www.kaggle.com/Cornell-University/arxiv
code for Research Papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil
Summarise Research Papers - https://www.semanticscholar.org/
programming languages for data science are Python,R,Julia,Java,Scala,JavaScript(Tensorflow.js)
IDE:jupyter notebook,spyder,pycharm,visual studio
BEST ONLINE COURSES
1.COURSERA
2.UDEMY
3.EDX
4.DATACAMP
5.Udacity
6.https://www.skillbasics.com/
BEST YOUTUBE CHANNEL TO FOLLOW
1.Krish Naik-https://www.youtube.com/user/krishnaik06
2.Codebasics-https://www.youtube.com/channel/UCh9nVJoWXmFb7sLApWGcLPQ
3.Abhishek thakur-https://www.youtube.com/user/abhisheksvnit
4.AIEngineering-https://www.youtube.com/channel/UCwBs8TLOogwyGd0GxHCp-Dw
5.Ineuron-https://www.youtube.com/channel/UCb1GdqUqArXMQ3RS86lqqOw
6.Ken jee-https://www.youtube.com/c/KenJee1/featured
7.3Blue1Brown-https://www.youtube.com/c/3blue1brown/featured
8.The AI Guy -https://www.youtube.com/channel/UCrydcKaojc44XnuXrfhlV8Q
9.Unfold Data Science-https://www.youtube.com/channel/UCh8IuVJvRdporrHi-I9H7Vw
BEST BLOGS TO FOLLOW
https://www.cybrhome.com/topic/data-science-blogs
1.Towards data science-https://towardsdatascience.com/
2.Analyticsvidhya-https://www.analyticsvidhya.com/blog/?utm_source=feed&utm_medium=navbar
3.Medium-https://medium.com/
4.Machinelearningmastery-https://machinelearningmastery.com/blog/
5.ML+ -https://www.machinelearningplus.com/
BEST RESOURCES
1.paperswithcode-https://paperswithcode.com/methods
2.madewithml-https://madewithml.com/topics/ https://madewithml.com/courses/applied-ml-in-production/
Weights & Biases-https://wandb.ai/gallery sotabench-https://sotabench.com/
3.Deep learning-https://course.fullstackdeeplearning.com/#course-content
4.pytorch deep learning-https://atcold.github.io/pytorch-Deep-Learning/
PyTorch Lightning-https://github.com/PyTorchLightning/pytorch-lightning
jax- https://github.com/google/jax
incubator-mxnet - https://github.com/apache/incubator-mxnet
ignite-https://github.com/pytorch/ignite
fastText - https://github.com/facebookresearch/fastText
5.deep-learning-drizzle-https://deep-learning-drizzle.github.io/ https://deep-learning-drizzle.github.io/index.html
6.Fastaibook-https://github.com/fastai/fastbook , https://course.fast.ai/
neptune.ai-https://docs.neptune.ai/index.html
7.TopDeepLearning-https://github.com/aymericdamien/TopDeepLearning
8.NLP-progress-https://github.com/sebastianruder/NLP-progress
9.EasyOCR-https://github.com/JaidedAI/EasyOCR
10.Awesome-pytorch-list-https://github.com/bharathgs/Awesome-pytorch-list https://shivanandroy.com/awesome-nlp-resources/
11.free-data-science-books-https://github.com/chaconnewu/free-data-science-books
12.arcgis-https://github.com/Esri/arcgis-python-api https://geemap.org/
13.data-science-ipython-notebooks-https://github.com/donnemartin/data-science-ipython-notebooks
14.julia-https://github.com/JuliaLang/julia , https://docs.julialang.org/en/v1/
15.google-research-https://github.com/google-research/google-research
16.reinforcement-learning-https://github.com/dennybritz/reinforcement-learning
17.keras-applications-https://github.com/keras-team/keras-applications , https://github.com/keras-team/keras
18.opencv-https://github.com/opencv/opencv
19.transformers-https://github.com/huggingface/transformers
20.code implementations for research papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil
21.regarding satellite images - Geo AI,Arcgis,geemap
Esri ArcGIS-https://www.esri.com/en-us/arcgis/about-arcgis/overview
earthcube-https://www.earthcube.eu/
geemap-https://geemap.org/
22.Monk_Object_Detection-https://github.com/Tessellate-Imaging/Monk_Object_Detection
24.interview-question-data-science-https://github.com/iNeuronai/interview-question-data-science-
25.recommenders-https://github.com/microsoft/recommenders
26.Awesome-NLP-Resources -https://github.com/Robofied/Awesome-NLP-Resources https://shivanandroy.com/awesome-nlp-resources/ https://github.com/keon/awesome-nlp
27.Tool for visualizing attention in the Transformer model-https://github.com/jessevig/bertviz
28.TransCoder-https://github.com/facebookresearch/TransCoder
29.Tessellate-Imaging-https://github.com/Tessellate-Imaging/monk_v1
Monk_Object_Detection-https://github.com/Tessellate-Imaging/Monk_Object_Detection/tree/master/application_model_zoo
Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials- https://github.com/TarrySingh/Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials
30.Machine-Learning-with-Python-https://github.com/tirthajyoti/Machine-Learning-with-Python
31.huggingface contain almost all nlp pretrained model and all tasks related to nlp field
https://github.com/huggingface https://github.com/huggingface/transformers https://huggingface.co/transformers/ https://huggingface.co/transformers/master/ https://github.com/huggingface/tokenizers
32.multi-task-NLP-https://github.com/hellohaptik/multi-task-NLP
33.gpt-2 - https://github.com/openai/gpt-2
34.Powerful and efficient Computer Vision Annotation Tool (CVAT)-https://github.com/openvinotoolkit/cvat, https://github.com/abreheret/PixelAnnotationTool
https://github.com/UniversalDataTool/universal-data-tool http://www.robots.ox.ac.uk/~vgg/software/via/
35.Data augmentation for NLP-https://github.com/makcedward/nlpaug
36.awesome Data Science-https://github.com/academic/awesome-datascience
37.mlops-https://github.com/visenger/awesome-mlops
38.gym-https://github.com/openai/gym
39.Super Duper NLP Repo-https://notebooks.quantumstat.com/ https://models.quantumstat.com/ https://miro.com/app/board/o9J_kqndLls=/ https://datasets.quantumstat.com/
40.papers summarizing the advances in the field-https://github.com/eugeneyan/ml-surveys
41.deep-translator-https://github.com/nidhaloff/deep-translator
42.detext-https://github.com/linkedin/detext
44.ipython-sql-https://github.com/catherinedevlin/ipython-sql
45.libra-https://github.com/Palashio/libra
47.learnopencv-https://github.com/spmallick/learnopencv , https://www.learnopencv.com/
48.math is fun-https://www.mathsisfun.com/ , https://pabloinsente.github.io/intro-linear-algebra, https://hadrienj.github.io/posts/Deep-Learning-Book-Series-Introduction/
49.DEEP LEARNING WITH PYTORCH: A 60 MINUTE BLITZ - https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
50.Spark Release 3.0.1-https://spark.apache.org/releases/spark-release-3-0-1.html
51.for more cheatsheets-https://github.com/FavioVazquez/ds-cheatsheets , https://medium.com/swlh/the-ultimate-cheat-sheet-for-data-scientists-d1e247b6a60c
52.text2emotion-https://pypi.org/project/text2emotion/
53.ExploriPy-https://analyticsindiamag.com/hands-on-tutorial-on-exploripy-effortless-target-based-eda-tool/
54.TCN-https://github.com/philipperemy/keras-tcn
55.deeplearning-models-https://github.com/rasbt/deeplearning-models
56.earthengine-py-notebooks-https://github.com/giswqs/earthengine-py-notebooks
58.numerical-linear-algebra -https://github.com/fastai/numerical-linear-algebra
60.reinforcement learning by using PyTorch-https://github.com/SforAiDl/genrl
61.chatbot- from scratch,google dialogflow,rasa nlu,azure luis, chatterbot,Amazon lex,Wit.ai,Luis.ai,IBM Watson etc...
https://blog.ubisend.com/optimise-chatbots/chatbot-training-data
- No Code Machine Learning / Deep Learning
Teachable Machine-https://teachablemachine.withgoogle.com/
Microsoft Lobe -https://lobe.ai/
WEKA - https://www.cs.waikato.ac.nz/ml/weka/
Monk_Gui-https://github.com/Tessellate-Imaging/Monk_Gui
ENNUI-https://math.mit.edu/ennui/ https://github.com/martinjm97/ENNUI https://www.youtube.com/watch?v=4VRC5k0Qs2w
Knime https://www.knime.com/
Accord.net http://accord-framework.net/
Rapid Miner https://rapidminer.com/
opennn https://www.opennn.net/
orange https://orange.biolab.si/
OpenBlender https://openblender.io/#/welcome
64.tensorflow development-https://blog.tensorflow.org/
TensorFlow Hub (trained ready-to-deploy machine learning models in one place) - https://tfhub.dev/
TensorBoard.dev - https://tensorboard.dev/
tutorials-https://www.tensorflow.org/tutorials https://www.tensorflow.org/guide
TensorFlow Graphics - https://www.tensorflow.org/graphics Lattice-https://www.tensorflow.org/lattice
TensorFlow Probability-https://www.tensorflow.org/probability TensorFlow Privacy- tensorflow-privacy
63.Data Science in the Cloud-Amazon SageMaker,Amazon Lex,Amazon Rekognition,Azure Machine Learning (Azure ML) Services,Azure Service Bot framework,Google Cloud AutoML
64.platforms to build and deploy ML models -Uber has Michelangelo,Google has TFX,Databricks has MLFlow,Amazon Web Services (AWS) has Sagemaker
65.Time Complexity Of Machine Learning Models -https://www.thekerneltrip.com/machine/learning/computational-complexity-learning-algorithms/
66.ML from scratch-https://dafriedman97.github.io/mlbook/content/introduction.html
67.turn-on visual training for most popular ML algorithms https://github.com/lucko515/ml_tutor https://pypi.org/project/ml-tutor/
68.mlcourse.ai is a free online- https://mlcourse.ai/
69.using pretrained model provided by tfhub- https://tfhub.dev/
70.Deep-Learning-with-PyTorch- https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf
71.MIT 6.S191 Introduction to Deep Learning-http://introtodeeplearning.com/
72.R for Data Science-https://r4ds.had.co.nz/ ,Fundamentals of Data Visualization-https://clauswilke.com/dataviz/
74.machine learning in JavaScript-https://www.tensorflow.org/js https://www.tensorflow.org/js/models https://tensorflow-js-object-detection.glitch.me/
TensorFlow.jl Julia with TensorFlow https://malmaud.github.io/tfdocs/ https://malmaud.github.io/TensorFlow.jl/latest/tutorial.html
Sonnet is a library built on top of TensorFlow 2 https://github.com/deepmind/sonnet
TensorFlow Federated (TFF) ( facilitate open research and experimentation with Federated Learning)-https://www.tensorflow.org/federated
TFX is an end-to-end platform for deploying production ML pipelines https://www.tensorflow.org/tfx https://github.com/tensorflow/tfx
Federated Learning -https://www.tensorflow.org/federated/tutorials/federated_learning_for_image_classification
Neural Structured Learning-https://www.tensorflow.org/neural_structured_learning/tutorials/graph_keras_mlp_cora
Responsible AI-https://www.tensorflow.org/resources/responsible-ai
Multilingual Representations for Indian Languages https://tfhub.dev/google/MuRIL/1
75.free list of AI/ Machine Learning Resources/Courses-https://www.marktechpost.com/free-resources/
https://www.theinsaneapp.com/2020/11/free-machine-learning-data-science-and-python-books.html
65 Machine Learning and Data books for free- https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189
https://github.com/chaconnewu/free-data-science-books
http://introtodeeplearning.com/
https://www.youtube.com/playlist?app=desktop&list=PLypiXJdtIca5ElZMWHl4HMeyle2AzUgVB https://mit6874.github.io/
76.Code for Research Papers-https://chrome.google.com/webstore/detail/find-code-for-research-pa/aikkeehnlfpamidigaffhfmgbkdeheil
77.Natural Language Processing 365- https://ryanong.co.uk/natural-language-processing-365/
78.Top Computer Vision Google Colab Notebooks- https://www.qblocks.cloud/creators/computer-vision-google-colab-notebooks
79.For practice -https://www.confetti.ai/exams
81.Mathematics of Machine Learning,deep learning-https://towardsdatascience.com/the-mathematics-of-machine-learning-894f046c568
https://github.com/hrnbot/Basic-Mathematics-for-Machine-Learning
https://towardsdatascience.com/the-roadmap-of-mathematics-for-deep-learning-357b3db8569b
https://towardsai.net/p/data-science/how-much-math-do-i-need-in-data-science-d05d83f8cb19
https://www.mltut.com/how-to-learn-math-for-machine-learning-step-by-step-guide/
https://www.datasciencecentral.com/profiles/blogs/free-online-book-machine-learning-from-scratch
https://www.youtube.com/playlist?list=PLRDl2inPrWQW1QSWhBU0ki-jq_uElkh2a https://github.com/jonkrohn/ML-foundations
https://ocw.mit.edu/resources/res-18-001-calculus-online-textbook-spring-2005/textbook/
82.Googleai-https://ai.google/education
83.ELI5 is a Python package which helps to debug machine learning classifiers and explain their predictions
PyBrain is a modular Machine Learning Library for Python
84.Best Online Courses for Machine Learning and Data Science-https://www.mltut.com/best-online-courses-for-machine-learning-and-data-science/
AI Expert Roadmap-https://i.am.ai/roadmap/#data-science-roadmap
85.FastAPI-https://fastapi.tiangolo.com/deployment/deta/
86.Yann LeCun's Deep Learning Course at CDS-https://cds.nyu.edu/deep-learning/ https://atcold.github.io/pytorch-Deep-Learning/
https://www.cs.cmu.edu/~ninamf/courses/601sp15/lectures.shtml
87.Four Important Computer Vision Annotation Tools https://heartbeat.fritz.ai/4-important-computer-vision-annotation-tools-you-need-to-know-in-2020-9f964931ed7
88.Python Data Science Handbook https://jakevdp.github.io/PythonDataScienceHandbook/
89.for low code object detection (detecto)- https://github.com/alankbi/detecto
90.1 line for hundreds of NLP models and algorithms- https://github.com/JohnSnowLabs/nlu
91.AudioFeaturizer when deal with audio data- https://pypi.org/project/AudioFeaturizer/
librosa library https://librosa.org/doc/latest/index.html
MAGENTA-https://magenta.tensorflow.org/
92.Palladium-https://palladium.readthedocs.io/en/latest/
93.KNIME-https://www.knime.com/
94.Facebook Open Sourced New Frameworks to Advance Deep Learning Research https://www.kdnuggets.com/2020/11/facebook-open-source-frameworks-advance-deep-learning-research.html
95.PYTORCH - https://pytorch.org/ https://pytorch.org/ecosystem/ https://pytorch.org/tutorials/ https://pytorch.org/docs/stable/index.html https://github.com/pytorch/pytorch
PYTORCH Lightning https://pytorchlightning.ai/community#projects https://seannaren.medium.com/introducing-pytorch-lightning-sharded-train-sota-models-with-half-the-memory-7bcc8b4484f2
Opacus (Training PyTorch models with differential privacy)-https://opacus.ai/
96.Atlas web-based dashboard -https://www.atlas.dessa.com/
97.Pytest (test code) https://docs.pytest.org/en/latest/index.html
98.keras- https://keras.io/ https://keras.io/api/ https://keras.io/examples/
99.High-Performance Jupyter Notebook - BlazingSQL Notebooks https://blazingsql.com/notebooks
100.CV-pretrained-model- https://github.com/balavenkatesh3322/CV-pretrained-model
101.Kubeflow Machine Learning Toolkit for Kubernetes https://www.kubeflow.org/
102.Daily AI updates to your inbox- https://sago-ai.news/#/
103.Three API styles - Sequential Model,functional API,Model subclassing
104.Deep Learning Toolkit for Medical Image Analysis -https://github.com/DLTK/DLTK
106.Interpret The ML Model lime(explain black box models)- https://lime-ml.readthedocs.io/en/latest/
interpret https://github.com/interpretml/interpret
eli5 https://eli5.readthedocs.io/en/latest/
skater https://oracle.github.io/Skater/
what if tool https://pair-code.github.io/what-if-tool/ https://pair-code.github.io/what-if-tool/demos/uci.html
DeepLIFT https://github.com/kundajelab/deeplift
Responsible AI-https://www.tensorflow.org/resources/responsible-ai
https://christophm.github.io/interpretable-ml-book/
107.deep-learning-drizzle -https://deep-learning-drizzle.github.io/
108.Machine Learning University - https://aws.amazon.com/machine-learning/mlu/
109.mlflow https://mlflow.org/ An open source platform for the machine learning lifecycle
https://azure.microsoft.com/en-us/services/machine-learning/
https://github.com/VertaAI/modeldb
110.Data Preparation / ETL https://airflow.apache.org/ https://intake.readthedocs.io/en/latest/
111.fairlearn https://github.com/fairlearn/fairlearn/blob/master/README.md Evaluating fairness of AI/ML models and training data and for mitigating bias in models determined to be unfair.
AI Fairness 360 evaluating fairness of AI/ML models and training data and mitigating bias in current models https://aif360.mybluemix.net/
112.MONAI Framework For Medical Imaging Research https://analyticsindiamag.com/monai-datatsets-managers/
113.OpenVINO https://opencv.org/openvino-model-optimization/ https://opencv.org/how-to-speed-up-deep-learning-inference-using-openvino-toolkit-2/
115.https://github.com/khuyentran1401/Data-science
116.Pytest for Data Scientists https://towardsdatascience.com/4-lessor-known-yet-awesome-tips-for-pytest-2117d8a62d9c
117.mlflow https://mlflow.org/docs/latest/index.html
pipeline https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
118.algorithm to use by problem https://www.datasciencecentral.com/profiles/blogs/which-machine-learning-deep-learning-algorithm-to-use-by-problem
119.Connect the world to your data and fuel your ML.
OpenBlender Enrich ML Models with adding new Variables from Any Source to Boost Performance https://www.youtube.com/channel/UCCFN8DDrA6k7eHYLvZGdNVA https://openblender.io/
Follow leaders in the field to stay up to date
1.Linkedin
2.Twitter
CPU/GPU/TPU
1.Google Colab (FREE)
2.Kaggle kernel(read terms and conditions before use) (FREE)
3.Paperspace Gradient(read terms and conditions before use)
4.knime - https://www.knime.com/(read terms and conditions before use)
5.RapidMiner (read terms and conditions before use)
https://github.com/zszazi/Deep-learning-in-cloud
So what next ?
participate in online competitions, do projects, apply for internships and jobs, solve real-world problems, etc...
applications of data science in many industries
1.E-commerce- Identifying consumers,Recommending Products,Analyzing Reviews
2.Manufacturing- Predicting potential problems,Monitoring systems,Automating manufacturing units, Maintenance Scheduling,Anomaly Detection
3.Banking- Fraud detection,Credit risk modeling,Customer lifetime value
4.Healthcare- Medical image analysis, Drug discovery,Bioinformatics,Virtual Assistants,image segmentation
5.Transport- Self-driving cars,Enhanced driving experience,Car monitoring system,Enhancing the safety of passengers
6.Finance- Customer segmentation,Strategic decision making,Algorithmic trading,Risk analytics
7.Marketing (Added from comments Credits: Jawad Ali)- LTV predictions,Predictive analytics for customer behavior,Ad targeting
and many more fields - https://www.topbots.com/enterprise-ai-companies-2020/ , https://venturebeat.com/2020/10/21/the-2020-data-and-ai-landscape/
Research blogs
1.https://ai.facebook.com/ https://ai.facebook.com/blog/
3.https://deepmind.com/blog https://deepai.org/definitions
5.https://www.malongtech.com/en/research.html
6.https://blogs.nvidia.com/blog/tag/artificial-intelligence/
7.https://blog.tensorflow.org/
https://www.kdnuggets.com/2020/01/top-10-ai-ml-articles-to-know.html
RESEARCH LABS IN THE WORLD
https://ai.facebook.com/ https://ai.googleblog.com/ https://research.google/ https://ai.google/research/
1.The Alan Turing Institute:https://www.turing.ac.uk/
2.J.P. Morgan AI Research Lab:https://www.jpmorgan.com/insights/tec...
3.Oxford ML Research Group:http://www.robots.ox.ac.uk/~parg/proj...
4.Microsoft Research Lab- AI:https://www.microsoft.com/en-us/resea...
5.Berkeley AI Research:https://bair.berkeley.edu/
6.LIVIA:https://en.etsmtl.ca/Unites-de-recher...
7.MIT Computer Science and Artificial Intelligence Laboratory (CSAIL):https://www.csail.mit.edu/
online competitions:
1.Kaggle-https://www.kaggle.com/
2.hackerearth-https://www.hackerearth.com/challenges/
3.machinehack-https://www.machinehack.com/
4.analyticsvidhya-https://datahack.analyticsvidhya.com/contest/all/
5.zindi-https://zindi.africa/competitions
6.crowdai-https://www.crowdai.org/
7.driven data-https://www.drivendata.org/
8.dockship-https://dockship.io/
9.SIGNATE Competition- https://signate.jp/about?rf=competition_about
9.International Data Analysis Olympiad (IDAO)
10.Codalab
11.Iron Viz
12.Data Science Challenges
13.Tianchi Big Data Competition
Some useful content :
- H2O.ai automl, google automl,google ml kit(https://developers.google.com/ml-kit) ,Azure Cognitive Services,Azure Machine Learning Service,amazon ml,Azure Machine Learning Studio,Google Cloud Platform,Weka,Microsoft Cognitive Toolkit,Google Cloud AutoML,DataRobot AutoML,Databricks AutoML,Azure ML,azure machine learning studio,IBM Watson ml studio,AWS Sagemaker Studio,Google AI Platform,Databricks,Domino Data Lab
https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet
https://codegnan.com/blog/35-best-data-sciecne-tools-for-beginners-to-master/
- Tpot
- autopandas
- AutoGluon https://analyticsindiamag.com/how-to-automate-machine-learning-tasks-using-autogluon/
- autosklearn,autokeras,LightAutoML (https://github.com/sberbank-ai-lab/LightAutoML)
- autoviml
GML (automate most of the data science) https://github.com/Muhammad4hmed/GML
CodeLess https://pypi.org/project/codeless/ https://github.com/porky5191/codeless_demo_project
- autoViz
- hyperopt
- sweetviz (EDA purpose) - https://pypi.org/project/sweetviz/
- pandasprofiling(display whole EDA) - https://pypi.org/project/pandas-profiling/ https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/index.html
- autokeras,AutoSklearn,Neural Network Intelligence
FeatureTools automated feature engineering.
MLBox,Lightwood,mindsdb(machine learning models using SQL queries),mljar-supervised,Ludwig(deep learning models without the need to write code)
AdaNet is a lightweight TensorFlow-based framework
- pycaret- https://pycaret.org/
mindsdb Machine Learning in 5 Lines of Code https://mindsdb.com/
12.Auto_Timeseries by auto_ts
13.AutoNLP_Sentiment_Analysis by autoviml
14.automl lazypredict https://github.com/shankarpandala/lazypredict
AutoFeat-https://analyticsindiamag.com/guide-to-automatic-feature-engineering-using-autofeat/
15.bamboolib or pandas-ui or pandas-summary or pandas_visual_analysis or Dtale(get code also) (python package for easy data exploration & transformation)
Automating EDA using Pandas Profiling, Sweetviz and Autoviz,DataPrep,vaex,Datapane,Sweetviz,PandasGUI,Datatable,Dora,Pywedge,D-Tale,lux,Dabl,Pretty pandas,AWS Glue DataBrew,speedML,edaviz,Altair,voyager,Mito
https://github.com/mstaniak/autoEDA-resources
ExploriPy import EDA-https://analyticsindiamag.com/hands-on-tutorial-on-exploripy-effortless-target-based-eda-tool/
Lens- Statistical Analysis of Data https://analyticsindiamag.com/hands-on-tutorial-on-lens-python-tool-for-swift-statistical-analysis/
Datacleaner-https://analyticsindiamag.com/tutorial-on-datacleaner-python-tool-to-speed-up-data-cleaning-process/
Datacleaner :dora ,Voilà - turns Jupyter Notebooks quickly into standalone web applications , Plotly Dash - for more advanced and production level dashboards
featurewiz(Select the best features from your data set fast with a single line of code) - https://github.com/AutoViML/featurewiz
explainerdashboard https://medium.com/analytics-vidhya/explainer-dashboard-build-interactive-dashboards-for-machine-learning-models-fda63e0eab9
Panel - web apps
Datapane ( Build Interactive Reports) https://towardsdatascience.com/introduction-to-datapane-a-python-library-to-build-interactive-reports-4593fd3cb9c8
16.CuPy (NumPy-compatible array processing on the GPU) https://pypi.org/project/cupy/
17.Dabl-automate the known 80% of Data Science which is data preprocessing, data cleaning, and feature engineering https://pypi.org/project/dabl/
18.dask (parallel computation) https://docs.dask.org/en/latest/ https://medium.com/rapids-ai/reading-larger-than-memory-csvs-with-rapids-and-dask-e6e27dfa6c0f#cid=av01_so-nvsh_en-us
Modin , Vaex , Dask,cuDF
19.dataprep (Understand your data with a few lines of code in seconds)
data-preparation-tools - https://improvado.io/blog/data-preparation-tools
20.Dora library is another data analysis library designed to simplify exploratory data analysis. https://pypi.org/project/Dora/
21.FastAPI is a modern, fast (high-performance), web framework for building APIs. https://fastapi.tiangolo.com/
22.faster Hyper Parameter Tuning(sklearn-nature-inspired-algorithms) https://pypi.org/project/sklearn-nature-inspired-algorithms/
23.FlashText (A library faster than Regular Expressions for NLP tasks) https://pypi.org/project/flashtext/
24.Guietta (tool that makes simple GUIs simple) https://pypi.org/project/guietta/
pandas-visual-analysis -https://analyticsindiamag.com/hands-on-guide-to-pandas-visual-analysis-way-to-speed-up-data-visualization/
25.hummingbird (make code execute faster) https://pypi.org/project/Hummingbird/
CUML- increase the speed of training your machine learning model https://towardsdatascience.com/train-your-machine-learning-model-150x-faster-with-cuml-69d0768a047a
https://docs.rapids.ai/api/cuml/stable/
26.memory-profiler (tell memory consumption line by line) https://pypi.org/project/memory-profiler/
Cython A Speed-Up Tool for your Python Function https://towardsdatascience.com/cython-a-speed-up-tool-for-your-python-function-9bab64364bfd
Python Tricks for Keeping Track of Your Data https://towardsdatascience.com/python-tricks-for-keeping-track-of-your-data-aef3dc817a4e
27.numexpr (increase speed of execution of numpy) https://github.com/pydata/numexpr
pypolars instead of pandas (beating-pandas-performance)
Numba (optimise performance of numpy and high performance python compiler) http://numba.pydata.org/
28.pandarallel (simple and efficient tool to parallelize your pandas computation on all your CPUs) https://pypi.org/project/pandarallel/
29.PDFTableExtract(by PyPDF2) https://github.com/ashima/pdf-table-extract
Camelot-https://towardsdatascience.com/extracting-tabular-data-from-pdfs-made-easy-with-camelot-80c13967cc88
30.PyImpuyte(Python package that simplifies the task of imputing missing values in big datasets) https://pypi.org/project/PyImpuyte/
31.libra(Automates the end-to-end machine learning process in just one line of code) https://pypi.org/project/libra/
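32.debug code with python -m pdb -c continue your_script.py (your_script.py is a placeholder; pdb opens at the point of failure when an uncaught exception is raised)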
33.cURL (This is a useful tool for obtaining data from any server via a variety of protocols including HTTP.) https://stackabuse.com/using-curl-in-python-with-pycurl/
34.csvkit https://pypi.org/project/csvkit/
35.IPython IPython gives access to enhanced interactive python from the shell.
36.pip install faker (Create our own Dataset) https://pypi.org/project/Faker/
37.Python debugger %pdb
38.Voila-From notebooks to standalone web applications and dashboards https://voila.readthedocs.io/en/stable/ https://github.com/voila-dashboards/voila
39.tslearn for timeseries data https://github.com/tslearn-team/tslearn
40.texthero text-based dataset in Pandas Dataframe quickly and effortlessly https://github.com/jbesomi/texthero
41.Kaleido (static image export for web-based visualization libraries, e.g. in your Jupyter Notebook, with zero dependencies) https://pypi.org/project/kaleido/
42.Vaex- Reading And Processing Huge Datasets in seconds https://github.com/vaexio/vaex
43.Uber's Ludwig is an Open Source Framework for Low-Code Machine Learning https://eng.uber.com/introducing-ludwig/
44.Google's TAPAS, a BERT-Based Model for Querying Tables Using Natural Language https://github.com/google-research/tapas
45.RAPIDS open GPU Data Science https://rapids.ai/
RAPIDS cuML
46.pyforest Lazy-import of all popular Python Data Science libraries. Stop writing the same imports over and over again. https://pypi.org/project/pyforest/0.1.1/
47.Modin Get faster Pandas with Modin https://github.com/modin-project/modin
48.Text2Code for Jupyter notebook - https://github.com/deepklarity/jupyter-text2code , https://towardsdatascience.com/data-analysis-made-easy-text2code-for-jupyter-notebook-5380e89bb493
49.Openrefine Tool-For Data Preprocessing Without Code https://analyticsindiamag.com/openrefine-tutorial-a-tool-for-data-preprocessing-without-code/
50.DeepSpeed, Microsoft's deep learning optimisation library - https://github.com/microsoft/DeepSpeed
51.4-pandas-tricks-https://towardsdatascience.com/4-pandas-tricks-that-most-people-dont-know-86a70a007993
52.tkinter to deploy machine learning model-https://analyticsindiamag.com/complete-tutorial-on-tkinter-to-deploy-machine-learning-model/
53.autoplotter is a python package for GUI based exploratory data analysis-https://github.com/ersaurabhverma/autoplotter
54.3 NLP Interpretability Tools For Debugging Language Models-https://www.topbots.com/nlp-interpretability-tools/
55.New Algorithm For Training Sparse Neural Networks (RigL)-https://analyticsindiamag.com/rigl-google-algorithm-neural-networks/
56.Read Data from pdf and Word-PyPDF2,PDFMiner,PDFQuery,tabula-py,pdflib for Python,PDFTables,PyFPDF2
OpenCV to Extract Information From Table Images-https://analyticsindiamag.com/how-to-use-opencv-to-extract-information-from-table-images/
57.Text Annotation-https://towardsdatascience.com/tortus-e4002d95134b
58.GDMix, A Framework That Trains Efficient Personalisation Models - https://analyticsindiamag.com/linkedin-open-sources-gdmix-a-framework-that-trains-efficient-personalisation-models/
59.Learn Machine Learning Concepts Interactively-https://towardsdatascience.com/learn-machine-learning-concepts-interactively-6c3f64518da2
60.Folium, Python Library For Geographical Data Visualization-https://analyticsindiamag.com/hands-on-tutorial-on-folium-python-library-for-geographical-data-visualization/
61.GPU Technology Conference (GTC) Keynote Oct 2020-https://www.youtube.com/watch?v=Dw4oet5f0dI&list=PLZHnYvH1qtOYOfzAj7JZFwqtabM5XPku1
62.jiant nlp task-https://github.com/nyu-mll/jiant
63.human-learn (draw your machine learning model)-https://koaning.github.io/human-learn/
64.Vector AI-https://github.com/vector-ai/vectorai
65.NVIDIA NeMo(for Conversational AI)-https://github.com/NVIDIA/NeMo
66.Deep Learning Models Without Coding(DeepCognition)-https://analyticsindiamag.com/how-to-use-deepcognition-to-build-drag-and-drop-deep-learning-models-without-coding/
67.100 Machine Learning Projects-https://medium.com/@amankharwal/100-machine-learning-projects-aff22b22dd6e
68.Question generation using Natural Language Processing-https://github.com/ramsrigouthamg/Questgen.ai
69.PixelLib(image segmentation,Blur Background,Gray Background,Background Colour Change,Background Change)-https://github.com/ayoolaolafenwa/PixelLib
70.High-Resolution 3D Human Digitization-https://shunsukesaito.github.io/PIFuHD/
71.AI model that translates 100 languages without relying on English data - https://ai.facebook.com/blog/introducing-many-to-many-multilingual-machine-translation/
72.800 free textbooks - https://open.umn.edu/opentextbooks
73.TensorDash is an application that lets you remotely monitor your deep learning model's metrics and notifies you when your model training is completed or crashed.
https://github.com/CleanPegasus/TensorDash
74.YellowBrick -select features, tune hyperparameters, select the best models, and understand the performance metrics.
75.Freely Available Python Books-https://rajukumarmishrablog.com/freely-available-python-books/
Collection of Python Cheat Sheets- https://rajukumarmishrablog.com/collection-of-python-cheat-sheets/
76.Add External Data to Your Pandas Dataframe - https://towardsdatascience.com/add-external-data-to-your-pandas-dataframe-with-a-one-liner-f060f80daaa4
https://www.openblender.io/#/welcome
77.visualize the model architecture-https://github.com/PerceptiLabs/PerceptiLabs
78.Train Conversational AI in 3 lines of code with NeMo and Lightning-https://towardsdatascience.com/train-conversational-ai-in-3-lines-of-code-with-nemo-and-lightning-a6088988ae37
79.Machine Learning for Healthcare by mit-https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-s897-machine-learning-for-healthcare-spring-2019/
80.pydot is an interface to Graphviz ,AutoGraph-Easy control flow for graphs,Neo4j-Graph Data Science Library,pyRDF2Vec-Representations of Entities in a Knowledge Graph,igraph,NetworkX
https://www.kdnuggets.com/2019/05/60-useful-graph-visualization-libraries.html
81.HTML tables into Google Sheets -https://towardsdatascience.com/import-html-tables-into-google-sheets-effortlessly-f471eae58ac9
82.Gradio - take input from user https://gradio.app/getting_started
83.Mito, an editable spreadsheet inside your Jupyter Notebook - https://trymito.io/
84.Google Introduces Document AI (DocAI) https://www.marktechpost.com/2020/11/05/google-introduces-document-ai-docai-platform-for-automated-document-processing/
85.100 Machine Learning Projects-https://amankharwal.medium.com/100-machine-learning-projects-aff22b22dd6e
86.https://towardsdatascience.com/25-hot-new-data-tools-and-what-they-dont-do-31bf23bd8e56
87.Opacus: A high-speed library for training PyTorch models-https://ai.facebook.com/blog/introducing-opacus-a-high-speed-library-for-training-pytorch-models-with-differential-privacy
88.lazynlp https://github.com/chiphuyen/lazynlp
89.yfinance to get finance data
90.Pseudo-Labeling (deal with small datasets)https://towardsdatascience.com/pseudo-labeling-to-deal-with-small-datasets-what-why-how-fd6f903213af
91.Project List A - Comparatively Easy: Wine Quality Analysis,Boston Housing Prediction,Spam Email Classification,Survival Prediction - Titanic Disaster,Stock Market Prediction,Class of Flower Prediction,Bigmart Sales Prediction,Air Pollution Prediction,IMDB Prediction,Optimizing Product Price,Web Traffic Time Series Forecasting,Insurance Purchase Prediction,Tweet Classification
Project List B - Comparatively Difficult: Domain-Specific Chatbot,Fake News Detection,Human Action Recognition,Video Classification,Driver Drowsiness Detection,Medical Report Gen Using CT Scans,Sign Language Detection,Image Caption Generator,Celebrity Voice Prediction,Speech Emotion Recognition,Job Recommendation System,Interest Level in Rental Properties,Google Ads Keywords Generator
https://medium.com/the-innovation/130-machine-learning-projects-solved-and-explained-605d188fb392
https://thecleverprogrammer.com/machine-learning/
https://data-flair.training/blogs/machine-learning-datasets/# https://data-flair.training/blogs/cartoonify-image-opencv-python/
https://medium.com/coders-camp/20-deep-learning-projects-with-python-3c56f7e6a721 https://amankharwal.medium.com/12-machine-learning-projects-on-object-detection-46b32adc3c37
https://amankharwal.medium.com/7-python-gui-projects-for-beginners-87ae2c695d78
https://amankharwal.medium.com/20-machine-learning-projects-for-portfolio-81e3dbd167b1 https://amankharwal.medium.com/4-chatbot-projects-with-python-5b32fd84af37
https://amankharwal.medium.com/30-python-projects-solved-and-explained-563fd7473003
https://www.aiquotient.app/projects https://www.aiquotient.app/
https://medium.com/coders-camp/20-machine-learning-projects-on-nlp-582effe73b9c
92.Visual Programming (Orange) https://orange.biolab.si/
93.The Linux Command Handbook-https://www.freecodecamp.org/news/the-linux-commands-handbook/
94.130 Machine Learning Projects Solved and Explained-https://medium.com/the-innovation/130-machine-learning-projects-solved-and-explained-605d188fb392
95.DataBrew-drag-and-drop data cleansing
96.stratascratch- https://www.stratascratch.com/
97.5 ways to celebrate TensorFlow's 5th birthday-https://blog.google/technology/ai/5-ways-celebrate-tensorflows-5th-birthday/
98.TensorFlow.js: Machine Learning in Javascript https://blog.tensorflow.org/2018/03/introducing-tensorflowjs-machine-learning-javascript.html
99.Language Interpretability Tool open-source platform for visualization and understanding of NLP models - https://pair-code.github.io/lit/
100.Deep Learning Hardware Guide https://towardsdatascience.com/another-deep-learning-hardware-guide-73a4c35d3e86
101.johnsnowlabs- https://nlp.johnsnowlabs.com/ https://nlp.johnsnowlabs.com/docs/en/quickstart https://nlp.johnsnowlabs.com/docs/en/licensed_release_notes
103.Mito-Edit a spreadsheet, generate Python https://trymito.io/?source=twitter1
104.Clarifai-https://www.clarifai.com/ https://analyticsindiamag.com/clarifai/
105.rapidly build and deploy machine learning models https://analyticsindiamag.com/top-10-datarobot-alternatives-one-must-know/
106.Hive Data full-stack AI https://thehive.ai/hive-data
107.TensorGram-real-time remote service that sends Keras callback updates, including metric details, to Telegram https://github.com/ksdkamesh99/TensorGram
108.Language Interpretability Tool - https://pair-code.github.io/lit/demos/
109.Docly will handle the comments http://thedocly.io/
110.machine-learning-roadmap-2020 https://whimsical.com/machine-learning-roadmap-2020-CA7f3ykvXpnJ9Az32vYXva
111.Django models https://www.deploymachinelearning.com/#create-django-models https://www.deploymachinelearning.com/
112.freecodecamp - https://www.freecodecamp.org/learn
113.image_to_string (pytesseract)
Extract Tables in PDFs to pandas DataFrames - tabula-py
114.NLP Pipelines in a single line of code https://medium.com/analytics-vidhya/nlp-pipelines-in-a-single-line-of-code-500b3266ac7b
115.Best and Worst Cases of Machine-Learning Models https://medium.com/towards-artificial-intelligence/best-and-worst-cases-of-machine-learning-models-part-1-36cdb9296611
https://www.youtube.com/watch?v=mlumJPFvooQ&list=PLZoTAELRMXVM0zN0cgJrfT6TK2ypCpQdY
116.aitextgen #for ai text generation
117.http://introtodeeplearning.com/ http://cs231n.stanford.edu/ http://web.stanford.edu/class/cs224n/index.html#schedule https://www.youtube.com/playlist?list=PLkFD6_40KJIwhWJpGazJ9VSj9CFMkb79A https://www.youtube.com/playlist?list=PLwRJQ4m4UJjPiJP3691u-qWwPGVKzSlNP https://www.youtube.com/playlist?list=PLoROMvodv4rMC6zfYmnD7UG3LVvwaITY5
117.https://data-flair.training/blogs/data-science-tutorials-home
118.Integrating Tableau With Python https://analyticsindiamag.com/tabpy/
Qlib https://analyticsindiamag.com/qlib/
119.Pystiche - Create Your Artistic Image Using Pystiche https://analyticsindiamag.com/pystiche/ https://pystiche.readthedocs.io/en/latest/index.html
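Example for item 90 (Pseudo-Labeling): a minimal sketch of the general idea, not the code from the linked article. A model is fitted on the few labeled points, confident predictions on unlabeled points are added as if they were true labels, and the model is refitted. The nearest-centroid classifier, the toy data, the function names, and the min_margin threshold are all illustrative assumptions; a real project would more likely use a library classifier and its predicted probabilities.

```python
# Pseudo-labeling sketch in pure Python (standard library only).
# Classifier choice, data, and threshold are assumptions for illustration.
from math import dist


def centroids(points, labels):
    """Return {label: mean point} for the labeled points."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(cents, p):
    """Return (label, margin): nearest centroid and the distance gap to the runner-up."""
    ranked = sorted((dist(p, c), y) for y, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin


def pseudo_label(X_lab, y_lab, X_unlab, min_margin=1.0, rounds=3):
    """Grow the labeled set with confident predictions; return (model, leftover)."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        cents = centroids(X, y)
        keep = []
        for p in pool:
            label, margin = predict(cents, p)
            if margin >= min_margin:
                # Confident enough: treat the prediction as a label.
                X.append(p)
                y.append(label)
            else:
                keep.append(p)
        if len(keep) == len(pool):
            break  # nothing new was labeled, so further rounds change nothing
        pool = keep
    return centroids(X, y), pool


# Two labeled points, three unlabeled ones (made-up data).
X_lab = [(0.0, 0.0), (4.0, 4.0)]
y_lab = ["a", "b"]
X_unlab = [(0.5, 0.0), (3.5, 4.0), (2.0, 2.0)]

model, leftover = pseudo_label(X_lab, y_lab, X_unlab)
print(model)
print(leftover)
```

With this toy data the two points close to a class are absorbed into the labeled set, while the point exactly halfway between the classes stays unlabeled because its margin is below min_margin. Raising min_margin makes the procedure more cautious; lowering it adds more pseudo-labels at the risk of reinforcing wrong predictions.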
I will be very happy if this repository helps you. Thank you for reading.
HAPPY LEARNING