https://github.com/ResidentMario/missingno
Missingno provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset.
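To make "completeness" concrete, here is a minimal pure-Python sketch of the per-column figure that such a summary conveys. It does not use missingno or reproduce its API. The `completeness` helper, the sample rows, and the rule that `None` and NaN count as missing are all assumptions made for illustration.

```python
def completeness(rows, columns):
    """Return the fraction of non-missing values per column.

    A value counts as missing when it is None or a float NaN
    (an assumption made for this sketch).
    """
    def is_missing(value):
        # NaN is the only float that is not equal to itself.
        return value is None or (isinstance(value, float) and value != value)

    total = len(rows)
    result = {}
    for column in columns:
        present = sum(1 for row in rows if not is_missing(row.get(column)))
        result[column] = present / total if total else 0.0
    return result


# Hypothetical sample data with a few gaps.
rows = [
    {"ticker": "AAA", "price": 10.5, "sector": "tech"},
    {"ticker": "BBB", "price": None, "sector": "energy"},
    {"ticker": "CCC", "price": float("nan"), "sector": None},
    {"ticker": "DDD", "price": 7.25, "sector": "tech"},
]

summary = completeness(rows, ["ticker", "price", "sector"])

# Print a crude text bar per column as a stand-in for a chart.
for column, fraction in summary.items():
    bar = "#" * round(fraction * 20)
    print(f"{column:<8} {bar:<20} {fraction:.0%}")
```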
https://github.com/sktime/sktime
A scikit-learn compatible Python toolbox for learning with time series. sktime currently supports:
- State-of-the-art time series classification and regression algorithms,
- Classical forecasting including reduction strategies to regression,
- Benchmarking and post-hoc evaluation methods based on mlaut.
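One common way to reduce forecasting to regression is to slice the series into sliding windows, so that each window of past values becomes a feature row and the next value becomes the target. The sketch below illustrates that idea in plain Python. It is not sktime's implementation, and `sliding_windows` and `window_length` are hypothetical names chosen for this example.

```python
def sliding_windows(series, window_length):
    """Turn a series into (features, targets) pairs for a tabular regressor.

    Each feature row holds `window_length` consecutive values and the
    matching target is the value that immediately follows the window.
    """
    if window_length < 1 or window_length >= len(series):
        raise ValueError("window_length must be in [1, len(series) - 1]")

    features, targets = [], []
    for start in range(len(series) - window_length):
        features.append(series[start:start + window_length])
        targets.append(series[start + window_length])
    return features, targets


series = [1, 2, 3, 4, 5, 6]
X, y = sliding_windows(series, window_length=3)

for row, target in zip(X, y):
    print(row, "->", target)
```

Any regressor that accepts a feature matrix and a target vector can then be fitted on `X` and `y` and applied to the most recent window to predict the next value.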
https://github.com/sktime/sktime-dl
An extension package for deep learning with Keras for sktime, a scikit-learn compatible Python toolbox for learning with time series and panel data. Currently, classification models based on the networks in dl-4-tsc have been implemented, as well as an example of a tuned network for future development.
[https://financialmodelingprep.com/developer/docs]
Financial Modeling Prep API provides real-time stock prices, company financial statements, major index prices, historical stock data, real-time forex rates, and cryptocurrency data. This free stock quote API serves data in real time, returns company reports in quarterly or annual format, and goes back up to 10 years in history. The API offers access to the following data:
- Company Valuation
- Stock Time Series
- Stock Market
- Cryptocurrencies
- Forex (FX)
[https://en.wikipedia.org/wiki/Wikinews]
Wikinews is a free-content news source wiki and a project of the Wikimedia Foundation. The site works through collaborative journalism. Articles are categorised by region and by topic. Topics include:
- Crime and law
- Culture and entertainment
- Disasters and accidents
- Economy and business
- Education
- Environment
- Health
- Obituaries
- Politics and conflicts
- Science and technology
- Sports
- Wackynews
[https://github.com/terrier-org/terrier-desktop/blob/master/share/stopword-list.txt]
List of 733 stop words
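A stop-word list like this is typically applied by dropping matching tokens before indexing or analysis. The sketch below is one minimal way to do that, assuming the file holds one word per line. The helper names, the simple regex tokenizer, and the handful of sample words standing in for the real list are all illustrative assumptions.

```python
import re


def load_stop_words(lines):
    """Build a stop-word set from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(text, stop_words):
    """Lowercase and tokenize the text, then drop tokens found in stop_words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stop_words]


# Stand-in for the lines of the downloaded stopword-list.txt file
# (assumed format: one word per line, blank lines ignored).
sample_lines = ["a\n", "the\n", "of\n", "is\n", "\n"]

stop_words = load_stop_words(sample_lines)
tokens = remove_stop_words("The price of the stock is rising", stop_words)
print(tokens)
```

With the real file, `load_stop_words` can be given the open file object directly, since iterating over a text file yields its lines.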
The dataset consists of tweets extracted from a collection of over 18 million tweets. The dataset includes event-annotated tweets provided by the Redites project. The task of the challenge is to automatically recognise entities and their types from English microposts, and link them to the corresponding English DBpedia 2014 resources.
- 2014 http://ceur-ws.org/Vol-1141/
- 2015 http://ceur-ws.org/Vol-1395/
- 2016 http://microposts2016.seas.upenn.edu/challenge.html
https://github.com/juand-r/entity-recognition-datasets
This repository contains datasets from several domains annotated with a variety of entity types, useful for entity recognition and named entity recognition (NER) tasks.
https://www.figure-eight.com/dataset/medical-sentence-summary-and-relation-extraction/
This dataset contains 3,984 medical sentences extracted from PubMed abstracts, with the relationships between discrete medical terms annotated. It focuses primarily on “treat” and “cause” relationships: 1,043 sentences contain treatment relations and 1,787 contain causal ones.