Pandas: Merge All CSV in a Folder

Posted Under: Pandas, Python, Tutorials on Oct 4, 2023

In this tutorial, we'll use Pandas to merge all the CSV files in a folder into a single data frame. This is particularly useful when a large dataset is split across several files. First, we'll import the required libraries.

import pandas as pd
import os

Next, we get the list of all CSV files in the folder and read them. To find the files, we'll use `glob`.

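import glob
folder_name = '/your_folder_path'
file_type = 'csv'
separator = ','
# Collect the matching paths; sorted() makes the merge order reproducible.
csv_files = sorted(glob.glob(os.path.join(folder_name, '*.' + file_type)))
# Read each file into a data frame, then stack them with a fresh index.
dataframe = pd.concat([pd.read_csv(f, sep=separator) for f in csv_files], ignore_index=True)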

The `glob.glob` function returns a list of file paths. We call `pd.read_csv` on each path, which gives us a list of data frames. The last step is to concatenate them into a single data frame with `pd.concat`. Setting `ignore_index=True` discards each file's own row index and numbers the merged rows from 0. Note that `pd.concat` raises a `ValueError` if no files match the pattern, because there is nothing to concatenate. Next, let's write the merged data frame to a new CSV file.
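To make the steps above concrete, here is a minimal sketch that can be run without any existing data. It creates a temporary folder with two small CSV files and then merges them. The file names (`part_1.csv`, `part_2.csv`) and the sample values are made up for illustration, and the sketch assumes both files share the same columns.

```python
import glob
import os
import tempfile

import pandas as pd

# Build a throwaway folder with two small CSV files (hypothetical sample data).
folder_name = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Lima"]}).to_csv(
    os.path.join(folder_name, "part_1.csv"), index=False
)
pd.DataFrame({"id": [3], "city": ["Kyiv"]}).to_csv(
    os.path.join(folder_name, "part_2.csv"), index=False
)

# glob returns paths in arbitrary order, so sort them for a reproducible result.
paths = sorted(glob.glob(os.path.join(folder_name, "*.csv")))

# One data frame per file, then a single concat with a fresh 0..n-1 index.
frames = [pd.read_csv(path) for path in paths]
merged = pd.concat(frames, ignore_index=True)

print(merged)
```

The merged frame has three rows, and its index runs from 0 to 2 regardless of the index each file had on its own.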

dataframe.to_csv("/output_folder_path/merged.csv", index=False)

This saves the merged data frame to a new CSV file named `merged.csv`. Passing `index=False` keeps the row index from being written as an extra unnamed column.
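To see what the `index` argument of `to_csv` changes in practice, the following sketch writes a small, made-up data frame both ways. It relies on `to_csv` returning the CSV text as a string when no path is given, so nothing is written to disk.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Lima"]})

# Default behaviour: the row index is written as an unnamed first column.
with_index = df.to_csv()
# index=False: only the data columns are written.
without_index = df.to_csv(index=False)

header_with_index = with_index.splitlines()[0]
header_without_index = without_index.splitlines()[0]
print(header_with_index)
print(header_without_index)

# Reading the default output back produces an extra "Unnamed: 0" column.
round_trip = pd.read_csv(io.StringIO(with_index))
print(list(round_trip.columns))
```

If the merged file will be read back with `pd.read_csv` later, writing it without the index avoids having to drop that extra column.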

Handling Large Datasets

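When the files are large, we can read each one in chunks of `chunksize` rows rather than in a single call. Note that the code below still collects every chunk into one data frame, so the merged result has to fit in memory. Supposing we still want to use `ignore_index=True`: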

chunksize = 10 ** 6  # rows per chunk
frames = []
for f in glob.glob(folder_name + "/*." + file_type):
    # With chunksize set, read_csv returns an iterator of data frames.
    reader = pd.read_csv(f, chunksize=chunksize)
    frames.append(pd.concat(reader, ignore_index=True))

df = pd.concat(frames, ignore_index=True)

This way, each CSV file is read chunk by chunk, the chunks are concatenated into one data frame per file, and only then are the per-file results combined. Always check the merged data for consistency and correctness, since problems such as mismatched columns or data types are compounded when the data comes from multiple sources.
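If the combined data is too large to hold in memory at all, one option is to skip the final data frame and append each chunk straight to the output file. The sketch below is one way to do that, not the only one. The function name `merge_csv_streaming` and its parameters are hypothetical, and it assumes every input file has the same columns in the same order.

```python
import glob
import os

import pandas as pd


def merge_csv_streaming(folder_name, output_path, chunksize=10 ** 6):
    """Append every CSV in folder_name to output_path, one chunk at a time."""
    output_abs = os.path.abspath(output_path)
    first_chunk = True
    rows_written = 0
    for path in sorted(glob.glob(os.path.join(folder_name, "*.csv"))):
        # Skip the output file itself if it lives in the input folder.
        if os.path.abspath(path) == output_abs:
            continue
        for chunk in pd.read_csv(path, chunksize=chunksize):
            chunk.to_csv(
                output_path,
                mode="w" if first_chunk else "a",  # overwrite once, then append
                header=first_chunk,                # write the header only once
                index=False,
            )
            first_chunk = False
            rows_written += len(chunk)
    return rows_written
```

Only one chunk is held in memory at a time, at the cost of not having the merged data available as a data frame afterwards. A call might look like `merge_csv_streaming('/your_folder_path', '/output_folder_path/merged.csv')`.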
