Ten Thousand German News Articles Dataset for Topic Classification
Updated Nov 7, 2022 - Python
Download, parse, store, and load text datasets instead of storing them in packages.
NoiseMix - data generation for natural language
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of images, text, videos, metadata, and more. Ideal for machine learning and deep learning engineers. Download and extract data with just one line of code.
Complete web scraping of TED.com for metadata, transcripts, audio, video, and images using parallel programming.
A dataset that contains 30k+ so-called "self-help" tweets from 100+ authors.
This repo contains a LUO corpus for Named Entity Recognition. The text comes from the news domain and was scraped from Radio Ramogi.
A Bash script to scrape Shakespeare's works from shakespeare.mit.edu, plus already-scraped plays in .txt format.
The interface for text character analysis.
Data analysis project on a fake job postings dataset using machine learning and NLP basics.
An awesome list of open-source large language models and datasets.
Neural-network-aided diagnosis of schizophrenia via patient-centered text data.
Contains Adhola-English parallel sentences that can be used for machine translation.
Uses a Python class called JobScraper to retrieve job information from various APIs while avoiding network problems such as connection failures, timeouts, or blocking.
Compilation of texts from WoW alphas and betas. Used by https://github.com/The-Alpha-Project/Text-Crawler-Website
Biographies, quotes, and talk transcripts of John von Neumann.
DupliPy is a quick and easy-to-use package that can handle text formatting and data augmentation tasks for NLP in Python. It now offers support for image augmentation tasks as well.
This Python script generates n pages of text with bounding boxes (bbox) and their ground-truth labels. It supports various background colors, fonts, etc., and can export the dataset as TFRecord.