Assuming no header in the CSV file:

import pandas
import random

n = 1000000 #number of records in file
s = 10000 #desired sample size
filename = "data.txt"
skip = sorted(random.sample(range(n),n-s))
df = pandas.read_csv(filename, skiprows=skip)

It would be better if read_csv had a keeprows parameter, or if skiprows accepted a callback function instead of a list.
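A minimal sketch of that "keeprows" idea, assuming a pandas version in which skiprows accepts a callable (a later answer notes this has been available since v0.20.0). The in-memory `csv_text` and the small values of `n` and `s` are hypothetical stand-ins so the example runs without a real file:

```python
import io
import random

import pandas as pd

# Hypothetical stand-in for a headerless CSV file with 1,000 records.
csv_text = "".join(f"{i},{i * 2}\n" for i in range(1000))

n = 1000  # number of records in the file
s = 10    # desired sample size

# Row indices to keep; a set gives fast membership tests in the callback.
keep = set(random.sample(range(n), s))

# skiprows calls the function with each row index and skips the row
# whenever the function returns True.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    skiprows=lambda i: i not in keep,
)
```

Building the set of rows to keep, rather than the much longer sorted list of rows to skip, keeps memory use proportional to the sample size instead of the file size.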

With header and unknown file length:

import pandas
import random

filename = "data.txt"
with open(filename) as f:
    n = sum(1 for line in f) - 1 #number of records in file (excludes header)
s = 10000 #desired sample size
skip = sorted(random.sample(range(1,n+1),n-s)) #the 0-indexed header will not be included in the skip list
df = pandas.read_csv(filename, skiprows=skip)
Answer from dlm on Stack Overflow
2 of 13
84

@dlm's answer is great, but since v0.20.0, skiprows does accept a callable. The callable receives the row number as its argument.

Note also that their answer for unknown file length relies on iterating through the file twice -- once to get the length, and then another time to read the csv. I have three solutions here which only rely on iterating through the file once, though they all have tradeoffs.

Solution 1: Approximate Percentage

If you can specify what percent of lines you want, rather than how many lines, you don't even need to get the file size and you just need to read through the file once. Assuming a header on the first row:

import pandas as pd
import random

filename = "data.txt"
p = 0.01  # 1% of the lines
# keep the header (row 0), then keep each remaining row with probability p:
# a row is skipped when the random draw from [0, 1) is greater than p
df = pd.read_csv(
    filename,
    header=0,
    skiprows=lambda i: i > 0 and random.random() > p,
)

As pointed out in the comments, this only gives approximately the right number of lines, but I think it satisfies the desired use case.
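To get a feel for how approximate "approximately" is: each row is kept independently with probability p, so the number of kept rows follows a binomial distribution. The sketch below simulates only the row selection (no file is read); `n_rows`, the seed, and the helper name `simulate_kept_rows` are assumptions made for illustration:

```python
import math
import random


def simulate_kept_rows(n_rows, p, rng):
    """Count rows that pass an independent keep-with-probability-p test."""
    return sum(1 for _ in range(n_rows) if rng.random() <= p)


n_rows = 100_000  # hypothetical number of data rows in the file
p = 0.01          # keep about 1% of the rows

expected = n_rows * p                      # mean of Binomial(n_rows, p)
std_dev = math.sqrt(n_rows * p * (1 - p))  # its standard deviation

rng = random.Random(0)  # seeded only so the sketch is repeatable
kept = simulate_kept_rows(n_rows, p, rng)
print(f"expected about {expected:.0f} rows (std dev {std_dev:.1f}), got {kept}")
```

For large files the relative spread shrinks, since the standard deviation grows only with the square root of the expected sample size.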

Solution 2: Every Nth line

This isn't actually a random sample, but depending on how your input is sorted and what you're trying to achieve, this may meet your needs.

n = 100  # every 100th line = 1% of the lines
df = pd.read_csv(filename, header=0, skiprows=lambda i: i % n != 0)
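One possible variation, not part of the original answer, is to start at a random offset so that the same rows are not selected on every run. This is a sketch under assumptions: `csv_text` is a hypothetical in-memory file with one header row, and `offset` is a name introduced here:

```python
import io
import random

import pandas as pd

# Hypothetical stand-in for a CSV file: one header row plus 1,000 data rows.
csv_text = "value\n" + "".join(f"{i}\n" for i in range(1, 1001))

n = 100                              # keep every 100th data row
offset = random.randrange(1, n + 1)  # first data row to keep, in 1..n

# Row 0 is the header and is always kept; data row i is kept when it
# falls on offset, offset + n, offset + 2n, and so on.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    skiprows=lambda i: i > 0 and (i - offset) % n != 0,
)
```

This is still systematic rather than random sampling, so it carries the same caveat about how the input is sorted.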

Solution 3: Reservoir Sampling

(Added July 2021)

Reservoir sampling is an elegant algorithm for selecting k items randomly from a stream whose length is unknown, but that you only see once.

The big advantage is that you can use this without having the full dataset on disk, and that it gives you an exactly-sized sample without knowing the full dataset size. The disadvantage is that I don't see a way to implement it in pure pandas; I think you need to drop into Python to read the file and then construct the dataframe afterwards. So you may lose some functionality from read_csv or need to reimplement it, since we're not using pandas to actually read the file.
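For intuition, the simplest textbook form of reservoir sampling (often called Algorithm R) can be sketched as follows. This is not the optimized skip-based variant quoted next; the function name `reservoir_sample_simple` and the `rng` parameter are assumptions made for illustration:

```python
import random


def reservoir_sample_simple(iterable, k, rng=random):
    """Return k items chosen uniformly at random from an iterable.

    Returns every item if the iterable has k or fewer items.
    """
    reservoir = []
    for index, item in enumerate(iterable):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # The item at position index replaces a random slot
            # with probability k / (index + 1).
            j = rng.randrange(index + 1)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample_simple(range(10_000), k=5)
```

This version draws one random number per item, whereas the skip-based variant draws one only per replacement, which is why the latter tends to be faster on long streams.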

Taking an implementation of the algorithm from Oscar Benjamin here:

from math import exp, log, floor
from random import random, randrange
from itertools import islice
from io import StringIO

import pandas as pd

def reservoir_sample(iterable, k=1):
    """Select k items uniformly from iterable.

    Returns the whole population if there are k or fewer items

    from https://bugs.python.org/issue41311#msg373733
    """
    iterator = iter(iterable)
    # start with the first k items in the reservoir
    values = list(islice(iterator, k))

    W = exp(log(random())/k)
    while True:
        # skip is geometrically distributed
        skip = floor( log(random())/log(1-W) )
        selection = list(islice(iterator, skip, skip+1))
        if selection:
            values[randrange(k)] = selection[0]
            W *= exp(log(random())/k)
        else:
            return values

def sample_file(filepath, k):
    with open(filepath, 'r') as f:
        header = next(f)
        # keep the header line, then sample k of the remaining lines
        result = [header] + reservoir_sample(f, k)
    df = pd.read_csv(StringIO(''.join(result)))
    return df

The reservoir_sample function returns a list of strings, each of which is a single row, so we just need to turn it into a dataframe at the end. This assumes there is exactly one header row; I haven't thought about how to extend it to other situations.

I tested this locally and it is much faster than the other two solutions. Using a 550 MB csv (January 2020 "Yellow Taxi Trip Records" from the NYC TLC), solution 3 runs in about 1 second, while the other two take ~3-4 seconds.

In my test this is even slightly (~10-20%) faster than @Bar's answer using shuf, which surprises me.

Top answer
1 of 3
4

Just some explanation as an aside: before you can use pd.read_csv to import your data, you need to locate the file in your filesystem.

Assuming you use a Jupyter notebook or Python file and the CSV file is in the directory you are currently working in, you can just use:

import pandas as pd
SouthKoreaRoads_df = pd.read_csv('SouthKoreaRoads.csv')

If the file is located in another directory, you need to specify that directory. For example, if the CSV is in a subdirectory (relative to the Python file or Jupyter notebook you are working in), you need to add the directory's name. If it is in the folder "data", then add data in front of the file name, separated with a "/":

import pandas as pd
SouthKoreaRoads_df = pd.read_csv('data/SouthKoreaRoads.csv')

Pandas accepts any valid string path as well as URLs, so you could also give a full path. On Windows, use a raw string (or forward slashes) so that the backslashes are not treated as escape sequences:

import pandas as pd
SouthKoreaRoads_df = pd.read_csv(r'C:\Users\Ron\Desktop\Clients.csv')

So far, no os package is needed. Pandas read_csv also accepts os.PathLike objects, but the os module is only needed if you want to build a path in a variable before accessing it, or if you do complex path handling, for example because the code needs to run in another environment such as a web app, where the path is relative and could change depending on the deployment.
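As a rough illustration of the path-like case, here is a sketch using pathlib from the standard library. The temporary directory, the column names, and the file contents are hypothetical and exist only so the example is self-contained:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical layout: a "data" subfolder holding the CSV file, created
# inside a temporary directory so the sketch does not touch real files.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    csv_path = data_dir / "SouthKoreaRoads.csv"
    csv_path.write_text("road,length_km\nA,12\nB,7\n")

    # Path objects can be passed straight to read_csv.
    SouthKoreaRoads_df = pd.read_csv(csv_path)
```

Joining path parts with the `/` operator avoids hard-coding a separator, which may help when the same code has to run on both Windows and Unix-like systems.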

please see also:

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
https://docs.python.org/3/library/os.path.html

BR

2 of 3
0
SouthKoreaRoads = pd.read_csv("./SouthKoreaRoads.csv")

Try this and see whether it could help!
