Unidata
unidata.pro › home › unidata blog › datasets › dataset collections › 20 best financial datasets for machine learning
20 Best Financial Datasets for Machine Learning — Unidata
March 4, 2026 - It’s clean, updated, and widely supported in tools like yfinance for Python. Just don’t expect ground-truth labels like “buy” or “sell”; this one’s raw prices only. ...
dataset_name: Yahoo Finance – S&P 500
type: Market Data
access: Free (yfinance is an unofficial wrapper intended for research and educational use; Yahoo's terms govern use of the data, so check them before any commercial use)
format: pandas DataFrame via yfinance (exportable to CSV), JSON via API wrappers
ideal_for: LSTM models, trend detection, basic backtesting
notes: No labeled targets (raw prices only)
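Because this dataset provides raw prices only, any supervised target has to be derived. Below is a minimal sketch. It assumes you already have a list of daily closing prices, for example the closes returned by `yfinance.download("^GSPC")`. The function name `direction_labels` and the next-day up/down rule are illustrative choices, not part of the dataset.

```python
def direction_labels(closes: list[float]) -> list[int]:
    """Label day t as 1 if the close on day t+1 is higher than on day t, else 0.

    Hypothetical labeling rule for illustration. The last day has no
    next-day close, so the result is one element shorter than `closes`.
    """
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


if __name__ == "__main__":
    # Illustrative prices only, not real S&P 500 data.
    closes = [100.0, 101.5, 101.0, 102.3]
    print(direction_labels(closes))  # [1, 0, 1]
```

Each label looks one day ahead, so the features for day t should use only data available up to day t. Otherwise the model trains on look-ahead information.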
Kaggle
kaggle.com › datasets
Find Open Datasets and Machine Learning Projects
[P] finance dataset
Hello everyone, I hope you are all doing well. I have been looking for hours but can’t find a dataset with historical stock information such as… (reddit.com)
What finance libraries/APIs do you use?
ccxt to connect to centralised crypto exchanges; FFN (financial functions in Python); backtesting.py; polars and parquet (instead of pandas and CSV); and yfinance, which I used in the past with a pretty decent experience. (reddit.com)
Best Python library to use to process large data sets
Well, you can use polars or pandas; both have great APIs. First you load the dataset into a DataFrame, and then it's like using Excel with code. I prefer polars because it's faster, but pandas has more documentation. (reddit.com)
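The load-then-work workflow from the answer above looks like this in pandas. Below is a minimal sketch. It assumes a small CSV with hypothetical `date` and `close` columns, held in a string so the example runs offline. With a real file, you would pass its path to `pd.read_csv` instead. polars offers a similar `pl.read_csv` entry point.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would be a file path.
CSV_TEXT = """date,close
2024-01-02,100.0
2024-01-03,102.0
2024-01-04,99.96
"""


def load_prices(source) -> pd.DataFrame:
    """Load a price CSV into a DataFrame indexed and sorted by date."""
    df = pd.read_csv(source, parse_dates=["date"])
    return df.set_index("date").sort_index()


if __name__ == "__main__":
    prices = load_prices(io.StringIO(CSV_TEXT))
    # Simple daily return: close_t / close_{t-1} - 1; the first row is NaN.
    prices["return"] = prices["close"].pct_change()
    print(prices)
```

From here, column operations such as `pct_change` play the role of spreadsheet formulas applied to a whole column at once.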
Python Datasets
Python has a few data types, far fewer than most languages. We have single values: strings, ints, and floats. We have the singletons True, False, and None. There is the list (which is not precisely an array), and there are the hash-based containers, set() and dict. We can add more matrix-style structures, and more numeric precision, by importing libraries like NumPy. What makes Python’s data structures powerful, yet suboptimal, is that every name is a reference to an object in memory. So we don’t declare arrays like int[], a list that must hold integers and can therefore be more memory efficient; that is what “type strict” languages do. For Python programmers, this means less work and code that is easier to write, maintain, and read, at the cost of some speed. What’s important is that we can nest types, such as list[dict[str, list[int]]], very easily and access everything directly. Beyond that we have classes, which give us objects with attributes set for us. This comes closer to a type, since we can attach methods that operate on that data. Everything in coding is built up from simple steps into complex logic. Really mastering dictionaries, and lists of dictionaries, will help you a lot. (reddit.com)
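Two ideas from the comment above, easy nesting and reference semantics, can be shown with built-in types only. Below is a minimal sketch. The portfolio structure and the ticker names `AAA`, `BBB`, and `CCC` are made up for illustration.

```python
import copy

# A list of dicts whose values are lists of ints: list[dict[str, list[int]]].
portfolios: list[dict[str, list[int]]] = [
    {"AAA": [10, 20], "BBB": [5]},
    {"CCC": [1, 2, 3]},
]

# Nested access needs no declarations: index the list, then the dict, then the list.
first_lot = portfolios[0]["AAA"][0]

# Names are references: `alias` points at the same list object, not a copy.
alias = portfolios[0]["AAA"]
alias.append(30)  # this also changes the list stored inside portfolios[0]

# An independent copy of a nested structure needs copy.deepcopy.
snapshot = copy.deepcopy(portfolios)
snapshot[0]["AAA"].append(40)  # the original portfolios are untouched

if __name__ == "__main__":
    print(first_lot)             # 10
    print(portfolios[0]["AAA"])  # [10, 20, 30]
    print(snapshot[0]["AAA"])    # [10, 20, 30, 40]
```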
What are the best free financial datasets for machine learning?
Some of the best free options include Yahoo Finance for market prices, FRED for macroeconomic indicators, World Bank Open Data for global metrics, and Kaggle’s cryptocurrency archives. These datasets are clean, well-documented, and suitable for a wide range of ML tasks — from forecasting to risk modeling.
unidata.pro
What are the best practices for handling missing data in financial datasets?
Start by identifying the pattern: are the values missing completely at random, at random, or not at random? For time-series data, use filling with caution. Forward filling carries only past values forward, while backward filling copies future values into the past and leaks information. In tabular datasets, consider statistical imputation (mean, median, mode) or model-based methods like KNN or regression imputation, and compute any imputation statistics on the training data only. Always document your imputation strategy and validate that it doesn’t distort downstream predictions, especially in regulated environments.
labelyourdata.com
labelyourdata.com › home › articles › financial datasets: top resources for ml engineers (free & paid)
Financial Datasets: Top Resources for ML Engineers in 2026 (Free ...
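The leakage advice in the answer above can be made concrete. Forward filling only looks backward in time, and a tabular imputation statistic (here, a mean) should come from the training split alone. Below is a minimal standard-library sketch. The function names are hypothetical, `None` stands for a missing value, and the train/test split is assumed to be done beforehand.

```python
from statistics import mean
from typing import Optional


def forward_fill(values: list[Optional[float]]) -> list[Optional[float]]:
    """Replace each None with the most recent earlier observed value.

    Leading Nones stay None because there is no past value to carry forward.
    Backward filling would instead copy future values into the past.
    """
    filled: list[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def mean_impute(
    train: list[Optional[float]], test: list[Optional[float]]
) -> tuple[list[float], list[float]]:
    """Fill None in both splits with the mean of the observed *training* values.

    Raises statistics.StatisticsError if the training split has no observed values.
    """
    fill_value = mean(v for v in train if v is not None)

    def fill(xs: list[Optional[float]]) -> list[float]:
        return [fill_value if v is None else v for v in xs]

    return fill(train), fill(test)


if __name__ == "__main__":
    print(forward_fill([None, 1.0, None, None, 4.0, None]))
    # [None, 1.0, 1.0, 1.0, 4.0, 4.0]
    print(mean_impute([10.0, None, 14.0], [None, 100.0]))
    # ([10.0, 12.0, 14.0], [12.0, 100.0])
```

The test value 100.0 does not influence the fill value, which is the point: the test split must not feed back into preprocessing.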
What is a financial dataset?
A financial dataset contains structured or unstructured data related to markets, transactions, assets, or economic indicators. It’s used to train machine learning models for tasks like forecasting, risk analysis, and fraud detection. For instance, a financial transaction dataset typically includes timestamped records of purchases, transfers, or payments, and is used to detect suspicious activity, predict spending behavior, or train fraud detection models.
labelyourdata.com
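A transaction dataset of the kind described above can be modeled as a small typed record. Below is a minimal sketch. The field names, the per-account median baseline, and the default factor of 5 are hypothetical choices for illustration. They are not a standard schema or a production fraud rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass(frozen=True)
class Transaction:
    """One timestamped record; these field names are hypothetical."""
    timestamp: datetime
    account_id: str
    kind: str  # e.g. "purchase", "transfer", "payment"
    amount: float


def flag_large(transactions: list[Transaction], factor: float = 5.0) -> list[Transaction]:
    """Return transactions larger than `factor` times their account's median amount.

    A toy baseline: the median is computed over the whole input, including the
    transaction being scored and later ones. A real detector would use only
    each account's past history.
    """
    amounts: dict[str, list[float]] = defaultdict(list)
    for t in transactions:
        amounts[t.account_id].append(t.amount)
    medians = {acct: median(vals) for acct, vals in amounts.items()}
    ordered = sorted(transactions, key=lambda t: t.timestamp)
    return [t for t in ordered if t.amount > factor * medians[t.account_id]]


if __name__ == "__main__":
    # Made-up records for illustration.
    txs = [
        Transaction(datetime(2024, 1, 1, 9), "acct-1", "purchase", 20.0),
        Transaction(datetime(2024, 1, 2, 9), "acct-1", "purchase", 25.0),
        Transaction(datetime(2024, 1, 3, 9), "acct-1", "transfer", 30.0),
        Transaction(datetime(2024, 1, 4, 3), "acct-1", "transfer", 900.0),
    ]
    for t in flag_large(txs):
        print(t.timestamp, t.account_id, t.kind, t.amount)
```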
GitHub
github.com › virattt › financial-datasets
GitHub - virattt/financial-datasets: Financial datasets for LLMs 🧪
Financial Datasets is an open-source Python library that lets you create question & answer financial datasets using Large Language Models (LLMs).
Starred by 419 users
Forked by 66 users
Languages Python
Medium
blog.cambridgespark.com › 50-free-machine-learning-datasets-part-two-financial-and-economic-datasets-6620274ee593
50 free Machine Learning datasets: finance and economics | by Cambridge Spark | Cambridge Spark
February 4, 2019 - Quandl is a great portal for finding economic and financial data, which is useful for building models to predict economic indicators or stock prices. To download data, you’ll need to register on the site. Exporting data in Python also requires installing the Quandl Python package. It’s also worth noting that you’ll need to filter to ‘free’ to list the free financial datasets; otherwise you’ll need to purchase a licence (costing $1,200+) to access premium datasets.
Weights & Biases
wandb.ai › byyoung3 › ml-news › reports › A-survey-of-financial-datasets-for-machine-learning--Vmlldzo2NzczMjc3
A survey of financial datasets for machine learning | ml-news
1 month ago - Weights & Biases, developer tools for machine learning
Reddit
reddit.com › r/machinelearning › [p] finance dataset
r/MachineLearning on Reddit: [P] finance dataset
March 15, 2025. Related threads: r/datasets; Anyone tried Financial Data API? (r/quant); Three Clojure libraries for financial data acquisition: clj-yfinance, ecbjure, edgarjure (r/Clojure); [D] Some concerns about the current state of machine learning research
iMerit
imerit.net › home › 20 best finance economic datasets for machine learning
20 Best Finance Economic Datasets for Machine Learning | iMerit
April 18, 2025 - Quandl: One of the premier sources for financial datasets, Quandl has been used by over 250,000 analysts, asset managers, and investment banks for years. The data has consistently proven to be reliable, accurate, and useful in prediction modeling.
EU Open Data Portal: This repository, launched in 2012, holds information published by EU institutions and agencies, ranging from the environment and employment to science and education.
World Bank Open Data: Among financial datasets, World Bank Open Data is unique.
GitHub
github.com › firmai › financial-machine-learning
GitHub - firmai/financial-machine-learning: A curated list of practical financial machine learning tools and applications. · GitHub
Collections of news/articles on various topics, including quant trading and machine learning. Some articles are from the ycombinator message board and the reddit algotrading forum ... open source library maintained by Hudson and Thames, though much of the content has moved to a subscription model. The idea is to implement academic research in Python code and aggregate it as a package. Sources include the Journal of Financial Data Science, the Journal of Portfolio Management, the Journal of Algorithmic Finance, and Cambridge University Press.
Starred by 8.5K users
Forked by 1.4K users
Languages Python
Kaggle
kaggle.com › questions-and-answers › 65198
Finance Projects in machine learning
Central Washington University
libguides.lib.cwu.edu › c.php
Economic and Financial DataSets - Data Sets - LibGuides at Central Washington University
March 11, 2026 - This is an open-access resource providing programmatic access to US, Asia Pacific, European, and other global market statistics (stock prices, company filings, economic indicators such as GDP and inflation, foreign exchange rates, etc.). CWU affiliates can refer to the following data guide for sample Python code scripts or reach out to support@alphavantage.co for 24/7 technical support.
Bureau of Economic Analysis / U.S. Department of Commerce: The Bureau of Economic Analysis (BEA) promotes a better understanding of the U.S. economy by providing the most timely, relevant, and accurate economic accounts data in an objective and cost-effective manner. ... The stock API industry, and financial data in general, have nuances that affect data quality and how the data evolves; knowing them can benefit your algorithms.
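For the Alpha Vantage resource above (support@alphavantage.co), a request is an HTTPS GET with query parameters. The sketch below builds a daily time-series URL and extracts closing prices from a response-shaped dict. The `TIME_SERIES_DAILY` function, the `demo` key, and the `"Time Series (Daily)"` / `"4. close"` field names are assumptions based on Alpha Vantage's public documentation; check the current docs before relying on them. The sample payload is made up so the example runs offline.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def daily_series_url(symbol: str, api_key: str) -> str:
    """Build the query URL for a daily time series (assumed parameter names)."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def closing_prices(payload: dict) -> dict[str, float]:
    """Map date -> close from a TIME_SERIES_DAILY-style payload, oldest first."""
    series = payload["Time Series (Daily)"]  # assumed top-level key
    return {day: float(bar["4. close"]) for day, bar in sorted(series.items())}


if __name__ == "__main__":
    print(daily_series_url("IBM", "demo"))
    # Made-up payload in the assumed response layout (no network call).
    sample = json.loads(
        '{"Time Series (Daily)": {'
        '"2024-01-03": {"4. close": "160.10"},'
        '"2024-01-02": {"4. close": "158.00"}}}'
    )
    print(closing_prices(sample))
```

To fetch live data, you would pass the URL to `urllib.request.urlopen` and decode the body with `json.load`. Parsing is kept separate from fetching so it can be tested without a network connection or API key.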
Training The Street
trainingthestreet.com › news & blog › python for finance
Python for the Finance Industry | Seven Common Use Cases
June 4, 2025 - Anaconda comes bundled with hundreds of essential packages, like NumPy, pandas, matplotlib, and scikit-learn, and includes powerful tools like Jupyter Notebook and the Spyder IDE. Anaconda simplifies environment setup, especially for professionals working with large datasets and statistical models. With a single install, you get everything needed to run Python code for financial analysis, machine learning, and visualization, without worrying about dependency conflicts.
arXiv
arxiv.org › abs › 2304.13174
[2304.13174] Dynamic Datasets and Market Environments for Financial Reinforcement Learning
April 25, 2023 - Paper by Xiao-Yang Liu and 8 other authors. The open-source code for the data ...
data.world
data.world › datasets › finance
Sign in | data.world
Twine
twine.net › home › the best finance datasets of 2022
The Best Finance Datasets of 2022 | Twine
September 29, 2022 - One of the premier sources for financial datasets, Quandl has been used by over 250,000 analysts, asset managers, and investment banks for years. The data has consistently proven to be reliable, accurate, and useful in prediction modeling. ... This dataset covers population demographics throughout the world, along with a wide variety of economic and development indicators that are useful for predictive modeling.