trajanov/ML-resources

ML Libraries and Resources

Data tools

Exploratory data analysis

Missingno: Missing data visualizations

https://github.com/ResidentMario/missingno

Missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allow you to get a quick visual summary of the completeness (or lack thereof) of your dataset.
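Missingno's own plots (for example `msno.matrix(df)` and `msno.bar(df)` on a pandas DataFrame) depend on pandas and matplotlib. The dependency-free sketch below is not missingno's code; it only illustrates the per-column completeness figure that the bar chart displays, treating `None` as missing and drawing a text bar. The sample rows are made up for illustration.

```python
from typing import Any, Dict, List, Optional


def completeness(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, float]:
    """Fraction of non-missing values per column; None (or an absent key) counts as missing."""
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    total = len(rows)
    summary: Dict[str, float] = {}
    for col in columns:
        present = sum(1 for row in rows if row.get(col) is not None)
        summary[col] = present / total if total else 0.0
    return summary


def text_bar(summary: Dict[str, float], width: int = 20) -> str:
    """Render each column's completeness as a '#'-filled bar, similar in spirit to msno.bar."""
    lines = []
    for col, frac in summary.items():
        filled = round(frac * width)
        lines.append(f"{col:<8} {'#' * filled}{'.' * (width - filled)} {frac:.0%}")
    return "\n".join(lines)


# Illustrative rows with a few gaps.
rows = [
    {"age": 34, "income": None, "city": "Skopje"},
    {"age": None, "income": 52000, "city": "Boston"},
    {"age": 29, "income": 61000, "city": None},
    {"age": 41, "income": None, "city": "Ohrid"},
]
summary = completeness(rows)
print(text_bar(summary))
```

With a real DataFrame, missingno draws this kind of summary directly, and its matrix view also shows where in the rows the gaps fall.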

NLP

Topic Modelling

ML models

Time Series

sktime

https://github.com/sktime/sktime

A scikit-learn compatible Python toolbox for learning with time series. sktime currently supports:

  • State-of-the-art time series classification and regression algorithms,
  • Classical forecasting including reduction strategies to regression,
  • Benchmarking and post-hoc evaluation methods based on mlaut.
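"Reduction to regression" turns forecasting into an ordinary supervised problem: slide a window over the series, use the last `window` values as features and the next value as the target, fit any regressor, then forecast recursively by feeding predictions back in. The sketch below shows that idea with a hand-rolled least-squares regressor (no intercept) and the standard library only; it is not sktime's API, and `window`, `horizon` and the helper names are hypothetical.

```python
from typing import List, Tuple


def make_windows(series: List[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Each row holds `window` lagged values; the target is the value that follows them."""
    X = [series[i:i + window] for i in range(len(series) - window)]
    y = [series[i + window] for i in range(len(series) - window)]
    return X, y


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares without intercept via the normal equations X^T X w = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def recursive_forecast(series: List[float], window: int, horizon: int) -> List[float]:
    """Fit on lagged windows, then predict one step at a time, appending each prediction."""
    X, y = make_windows(series, window)
    coef = fit_least_squares(X, y)
    history = list(series)
    preds: List[float] = []
    for _ in range(horizon):
        lags = history[-window:]
        nxt = sum(c * v for c, v in zip(coef, lags))
        preds.append(nxt)
        history.append(nxt)
    return preds


# Fibonacci numbers satisfy y[t] = y[t-1] + y[t-2], so the fitted weights are about [1, 1].
series = [1, 1, 2, 3, 5, 8, 13, 21]
preds = recursive_forecast(series, window=2, horizon=3)
print(preds)
```

Swapping `fit_least_squares` for any other regressor keeps the same reduction structure, which is the point of the strategy.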

sktime-dl

https://github.com/sktime/sktime-dl

An extension package for deep learning with Keras for sktime, a scikit-learn-compatible Python toolbox for learning with time series and panel data. Currently, classification models based on the networks in dl-4-tsc have been implemented, as well as an example of a tuned network for future development.

Datasets and data access API

Finance data

Financial Modeling Prep API

https://financialmodelingprep.com/developer/docs

Financial Modeling Prep API provides real-time stock prices, company financial statements, major index prices, historical stock data, real-time forex rates and cryptocurrency prices. Stock quotes are available in real time through a free API, company reports come in quarterly or annual format, and the API covers up to 10 years of history. The API offers access to the following data:

  • Company Valuation
  • Stock Time Series
  • Stock Market
  • Cryptocurrencies
  • Forex (FX)
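A minimal sketch of how a client might build request URLs for a REST API of this kind, using only the standard library. The base URL `https://financialmodelingprep.com/api/v3`, the `quote/...` and `historical-price-full/...` paths and the `apikey` query parameter are assumptions based on the v3 layout; check the documentation linked above before relying on them. `YOUR_API_KEY` is a placeholder, and `fetch_json` is only defined, not called, because it needs network access and a valid key.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: v3 REST base URL; paths and versions may change, so verify against the docs.
BASE_URL = "https://financialmodelingprep.com/api/v3"


def build_url(path: str, api_key: str, params: Optional[Dict[str, str]] = None) -> str:
    """Join an endpoint path to the base URL and append query parameters plus the API key."""
    query = urlencode({**(params or {}), "apikey": api_key})
    return f"{BASE_URL}/{path.strip('/')}?{query}"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Perform the HTTP GET and decode the JSON body (requires network access and a valid key)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical example paths: a real-time quote and daily price history for one ticker.
quote_url = build_url("quote/AAPL", api_key="YOUR_API_KEY")
history_url = build_url("historical-price-full/AAPL", api_key="YOUR_API_KEY")
print(quote_url)
```

Keeping the key in one place (`build_url`) makes it easy to load it from an environment variable instead of hard-coding it.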

Text

News

Wikinews

https://en.wikipedia.org/wiki/Wikinews

Wikinews is a free-content news wiki and a project of the Wikimedia Foundation. The site works through collaborative journalism. Articles are categorised by region and by topic. Included topics are:

  • Crime and law
  • Culture and entertainment
  • Disasters and accidents
  • Economy and business
  • Education
  • Environment
  • Health
  • Obituaries
  • Politics and conflicts
  • Science and technology
  • Sports
  • Wackynews

Dictionary

Stop-words list

https://github.com/terrier-org/terrier-desktop/blob/master/share/stopword-list.txt

A plain-text list of 733 stop words from the Terrier project.
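A minimal sketch of using a one-word-per-line stop-word file to filter tokens, assuming the list has been downloaded as plain text. The tokenizer regex and the function names are illustrative choices, and the inline `sample` stands in for the real file so the example runs on its own.

```python
import re
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a lowercase stop-word set from one-word-per-line text, ignoring blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(text: str, stopwords: Set[str]) -> List[str]:
    """Lowercase, split into simple word tokens, and drop any token found in the stop-word set."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in stopwords]


# Stand-in for the downloaded file's lines.
sample = ["the", "of", "and", "a", "is", ""]
stops = load_stopwords(sample)
print(remove_stopwords("The cost of a flight is rising", stops))
```

With the real list, `with open("stopword-list.txt", encoding="utf-8") as f: stops = load_stopwords(f)` would replace the sample (the local filename is an assumption).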

Sentiment

Relation extraction

Named Entity Extraction & Linking

Making Sense of Microposts

The dataset consists of tweets drawn from a collection of over 18 million tweets, including event-annotated tweets provided by the Redites project. The challenge task is to automatically recognise entities and their types in English microposts and link them to the corresponding English DBpedia 2014 resources.
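To make the task's input and output concrete, here is a toy dictionary-lookup baseline: it finds known surface forms in a tweet (longest match first) and returns each mention with an entity type and a DBpedia resource URI. The gazetteer entries and type labels are hypothetical, and this is not a method used in the challenge; real systems must also handle ambiguity, hashtags and unseen entities.

```python
import re
from typing import Dict, List, Tuple

DBPEDIA = "http://dbpedia.org/resource/"

# Hypothetical toy gazetteer: lowercase surface form -> (entity type, DBpedia resource name).
GAZETTEER: Dict[str, Tuple[str, str]] = {
    "new york": ("Location", "New_York_City"),
    "nasa": ("Organization", "NASA"),
    "obama": ("Person", "Barack_Obama"),
}


def link_entities(tweet: str, gazetteer: Dict[str, Tuple[str, str]] = GAZETTEER) -> List[Tuple[str, str, str]]:
    """Return (lowercased mention, type, DBpedia URI) for each greedy longest gazetteer match."""
    tokens = re.findall(r"\w+", tweet.lower())
    max_len = max(len(key.split()) for key in gazetteer)
    results: List[Tuple[str, str, str]] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # try longer spans first
            span = " ".join(tokens[i:i + n])
            if span in gazetteer:
                etype, resource = gazetteer[span]
                results.append((span, etype, DBPEDIA + resource))
                i += n
                break
        else:  # no span starting at token i matched
            i += 1
    return results


links = link_entities("Obama visits NASA in New York #space")
print(links)
```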

Datasets for Entity Recognition

https://github.com/juand-r/entity-recognition-datasets

This repository contains datasets from several domains annotated with a variety of entity types, useful for entity recognition and named entity recognition (NER) tasks.
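Many NER corpora are distributed as CoNLL-style text with one `token TAG` pair per line, BIO tags (`B-` begins an entity, `I-` continues it, `O` is outside) and blank lines between sentences. Whether a given dataset in this collection uses exactly this layout is an assumption to check per dataset. The sketch below groups such lines into `(entity text, type)` spans; a stray `I-` tag with a new type is treated as the start of a new entity. The sample lines are only illustrative.

```python
from typing import List, Optional, Tuple


def bio_to_spans(lines: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Group CoNLL-style 'token TAG' lines (BIO scheme) into (entity text, type) spans."""
    spans: List[Tuple[str, Optional[str]]] = []
    tokens: List[str] = []
    etype: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal tokens, etype
        if tokens:
            spans.append((" ".join(tokens), etype))
        tokens, etype = [], None

    for line in lines:
        line = line.strip()
        if not line:  # sentence boundary
            flush()
            continue
        parts = line.split()
        token, tag = parts[0], parts[-1]
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            flush()
            tokens, etype = [token], tag[2:]
        elif tag.startswith("I-"):
            tokens.append(token)
        else:  # 'O' tag
            flush()
    flush()
    return spans


sample = ["EU B-ORG", "rejects O", "German B-MISC", "call O", "to O", "boycott O",
          "British B-MISC", "lamb O", ". O", "", "Peter B-PER", "Blackburn I-PER"]
spans = bio_to_spans(sample)
print(spans)
```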

Biomedical

Medical Information Extraction

https://www.figure-eight.com/dataset/medical-sentence-summary-and-relation-extraction/

This dataset contains 3,984 medical sentences extracted from PubMed abstracts, annotated with relationships between discrete medical terms. It focuses primarily on “treat” and “cause” relations, with 1,043 sentences containing treatment relations and 1,787 containing causal ones.
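A useful first baseline on sentence-level relation data like this is a cue-word classifier, against which learned models can be compared. The sketch below flags "treat" and "cause" relations from hand-picked keywords; the cue lists are hypothetical, a sentence may receive both labels or neither, and a model trained on the annotations would replace this in practice.

```python
import re
from typing import Dict, List

# Hypothetical cue words per relation; order of keys sets the order of returned labels.
CUES: Dict[str, List[str]] = {
    "treat": ["treat", "treats", "treated", "treatment", "therapy", "relieve", "relieves"],
    "cause": ["cause", "causes", "caused", "induce", "induces", "induced", "leads to"],
}


def guess_relations(sentence: str) -> List[str]:
    """Return every relation whose cue appears as a whole word (or phrase) in the sentence."""
    text = sentence.lower()
    found: List[str] = []
    for relation, cues in CUES.items():
        if any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues):
            found.append(relation)
    return found


print(guess_relations("Aspirin is used to treat headaches."))
```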

About

Collection of ML libraries, datasets, courses ...