🌐
GitHub
github.com › dask › fastparquet › issues › 229
.to_parquet() returns AttributeError: 'bool' object has no attribute 'writer' · Issue #229 · dask/fastparquet
File "/opt/anaconda3/lib/python3.6/site-packages/dask/dataframe/core.py", line 955, in to_parquet return to_parquet(path, self, *args, **kwargs)
🌐
Cloudera Community
community.cloudera.com › t5 › Support-Questions › Pyspark-issue-AttributeError-DataFrame-object-has-no › m-p › 78093
Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'
January 2, 2024 - So, if someone could help resolve this issue that would be most appreciated ... As the error message states, the object, either a DataFrame or a List, does not have the saveAsTextFile() method. result.write.save() or result.toJavaRDD.saveAsTextFile() should do the work, or you can refer to DataFrame ...
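A minimal sketch of the two fixes the snippet points at, assuming result is a PySpark DataFrame (output paths are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
result = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# saveAsTextFile() exists on RDDs, not DataFrames, so either write through
# the DataFrameWriter or drop down to the underlying RDD first.
result.write.mode("overwrite").format("parquet").save("/tmp/result_parquet")
result.rdd.map(lambda row: ",".join(str(v) for v in row)).saveAsTextFile("/tmp/result_text")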
🌐
GitHub
github.com › dask › fastparquet › issues › 100
Fastparquet write :AttributeError: 'Series' object has no attribute 'valid' · Issue #100 · dask/fastparquet
June 3, 2017 - Hi, I am using a Dask data-frame as follows: fastparquet.write(parquet_dir, X_df, file_scheme='hive', compression="SNAPPY"). Stack trace: AttributeError Traceback (most recent call last), pointing at the fastparquet.write(parquet_dir, X_df, ...) call; a commented-out alternative in the same cell reads # X_df.to_parquet(parquet_dir, compression='SNAPPY')
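As the commented-out alternative in the snippet hints, a likely cause is that fastparquet.write() operates on pandas DataFrames rather than Dask ones. A sketch of two ways around that, with hypothetical directory names:

import dask.dataframe as dd
import fastparquet
import pandas as pd

X_df = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3]}), npartitions=2)

# Materialize the Dask frame to pandas before handing it to fastparquet...
fastparquet.write("parquet_dir", X_df.compute(), file_scheme='hive', compression="SNAPPY")

# ...or let Dask drive the parquet write itself.
X_df.to_parquet("parquet_dir2", compression="snappy")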
🌐
Dataiku Community
community.dataiku.com › questions & discussions › using dataiku
AttributeError: DataFrame object has no attribute _session — Dataiku Community
August 3, 2022 - Use dataset.write_dataframe() instead.") ... df_connection_name = df._session.dss_connection_name if hasattr(df._session, "dss_connection_name") else None ... /data/dataiku/datadir/code-envs/python/snowpark/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py, in __getattr__(self, name): raise AttributeError(f"{self.__class__.__name__} object has no attribute {name}") AttributeError: DataFrame object has no attribute _session
🌐
GitHub
github.com › microsoft › FLAML › issues › 625
AttributeError: 'DataFrame' object has no attribute 'copy' · Issue #625 · microsoft/FLAML
July 2, 2022 - AttributeError: 'DataFrame' object has no attribute 'copy' #625 · Labels: enhancement (new feature or request), help wanted (extra attention is needed) · Shafi2016 opened on Jul 2, 2022 · I'm using autoML (FLAML) with Spark on large data. The error image is given below · train = spark.read.parquet("./train.parquet") test = spark.read.parquet("./test.parquet") input_cols = [c for c in train.columns if c != 'target'] vectorAssembler = VectorAssembler(inputCols = input_cols, outputCol = 'features') vectorAssembler
🌐
GitHub
github.com › apache › arrow › issues › 4030
`df.to_parquet('s3://...', partition_cols=...)` fails with `'NoneType' object has no attribute '_isfilestore'` · Issue #4030 · apache/arrow
March 26, 2019 - df.to_parquet('s3://...', partition_cols=...) fails with 'NoneType' object has no attribute '_isfilestore'#4030 ... According to https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#partitioning-parquet-files, writing a parquet to S3 with partition_cols should work, but it fails for me. Example script: import pandas as pd import sys print(sys.version) print(pd.__version__) df = pd.DataFrame([{'a': 1, 'b': 2}]) df.to_parquet('s3://my_s3_bucket/x.parquet', engine='pyarrow') print('OK 1') df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow') print('OK 2')
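One workaround commonly suggested for this class of failure is to hand pyarrow an explicit filesystem instead of relying on to_parquet's URL handling; a sketch assuming s3fs is installed (the bucket name is hypothetical):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

df = pd.DataFrame([{'a': 1, 'b': 2}])
table = pa.Table.from_pandas(df)

# Pass the S3 filesystem explicitly so the partitioned write does not
# depend on pandas/pyarrow resolving the s3:// URL on its own.
fs = s3fs.S3FileSystem()
pq.write_to_dataset(table, root_path='my_s3_bucket/x2.parquet',
                    partition_cols=['a'], filesystem=fs)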
🌐
Stack Overflow
stackoverflow.com › questions › 66037297 › pyspark-dataframe-write-attributeerror-nonetype-object-has-no-attribute
python - Pyspark - dataframe..write - AttributeError: 'NoneType' object has no attribute 'mode' - Stack Overflow
Traceback (most recent call last): ... 'NoneType' object has no attribute 'mode' ... The writing mode should be specified on the DataFrameWriter, not after save() as you did; save() returns nothing ("None"), hence the error ...
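A minimal sketch of the wrong and right call order (the path is hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "val"])

# Wrong: save() executes the write and returns None, so chaining
# .mode() after it raises AttributeError on NoneType.
# df.write.save("/tmp/out").mode("overwrite")

# Right: configure the DataFrameWriter first, then save.
df.write.mode("overwrite").parquet("/tmp/out")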
🌐
Databricks Community
community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute › td-p › 61132
AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 61132
February 19, 2024 - Hello, I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message: "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" Here is part of the code: @dl...
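For context, dropDuplicatesWithinWatermark was added in Spark 3.5 and is defined for streaming DataFrames that carry an event-time watermark, so older runtimes raise exactly this AttributeError. A minimal sketch using the built-in rate source:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Requires Spark 3.5+; the rate source provides 'timestamp' and 'value' columns.
events = (
    spark.readStream.format("rate").load()
    .withWatermark("timestamp", "10 minutes")
    .dropDuplicatesWithinWatermark(["value"])
)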
Top answer
1 of 1
1

Your code as it stands is trying to create a parquet file from a dataframe called df_name, which is passed in as an argument. What it receives instead of a dataframe is a string, so it fails. The ideal scenario here is that instead of passing in the string name of a dataframe, you pass in the object itself, like so:

def parquet_create(df, file_name):
    df.write.mode('overwrite').parquet(file_name + ".parquet")

df = ...        # define a dataframe
filename = ...  # some filename
parquet_create(df, filename)

It sounds like that's not an option for you for some reason, so there are a couple of workarounds. You can create a dictionary relating string dataframe names to dataframe objects like this:

df = ... # define a dataframe
df2 = ... # define another dataframe
filename = ... # some filename
name_map = {'df': df, 'df2': df2}
parquet_create('df', filename)

and define parquet_create like this:

def parquet_create(df_name, file_name):
    name_map[df_name].write.mode('overwrite').parquet(file_name+".parquet")

You will have to ensure that name_map is defined in the scope of parquet_create.

The only other option I can think of is using eval:

def parquet_create(df_name, file_name):
    # eval() resolves the string name to the dataframe object
    eval(df_name).write.mode('overwrite').parquet(file_name + ".parquet")

df = ...        # define a dataframe
filename = ...  # some filename
parquet_create('df', filename)

Note that you will also have to make sure that df is in the scope of parquet_create for this solution as well.

Both of these are really ugly solutions in my mind, and I honestly can't think of a reason not to just pass in the dataframe object itself, but there you go.

🌐
Polars
docs.pola.rs › py-polars › html › reference › api › polars.DataFrame.write_parquet.html
polars.DataFrame.write_parquet — Polars documentation
The following example will write the first row to ../watermark=1/.parquet and the other rows to ../watermark=2/.parquet. >>> df = pl.DataFrame({"a": [1, 2, 3], "watermark": [1, 2, 2]}) >>> path: pathlib.Path = dirpath / "partitioned_object" >>> df.write_parquet( ...
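A runnable sketch of the partitioned write from the excerpt, assuming a polars version whose write_parquet accepts partition_by (the output directory name is hypothetical):

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "watermark": [1, 2, 2]})

# Hive-style partitioned write: rows land under watermark=1/ and watermark=2/.
df.write_parquet("partitioned_object", partition_by=["watermark"])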
Top answer
1 of 8
81

Pandas has a core function to_parquet(). Just write the dataframe to parquet format like this:

df.to_parquet('myfile.parquet')

You still need to install a parquet library such as pyarrow or fastparquet. If you have more than one parquet library installed, you also need to specify which engine you want pandas to use, otherwise pandas takes the first one it finds (pyarrow first, then fastparquet, as in the documentation). For example:

df.to_parquet('myfile.parquet', engine='fastparquet')
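A minimal round-trip sketch of the above, assuming fastparquet is installed:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})
df.to_parquet('myfile.parquet', engine='fastparquet')

# Read the file back with the same engine to verify the round trip.
restored = pd.read_parquet('myfile.parquet', engine='fastparquet')
assert restored.equals(df)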
2 of 8
31

Yes, pandas supports saving the dataframe in parquet format.

Here is a simple method to write a pandas dataframe to parquet.

Assuming df is the pandas dataframe, we need to import the following libraries.

import pyarrow as pa
import pyarrow.parquet as pq

First, write the dataframe df into a pyarrow table.

# Convert the pandas DataFrame to an Apache Arrow Table
table = pa.Table.from_pandas(df)

Second, write the table into a parquet file, say file_name.parquet.

# Write the Arrow Table to a Parquet file (snappy compression by default)
pq.write_table(table, 'file_name.parquet')

NOTE: parquet files can be further compressed while writing. The following are the popular compression formats:

  • Snappy (default, requires no argument)
  • gzip
  • brotli

Parquet with Snappy compression

pq.write_table(table, 'file_name.parquet')

Parquet with GZIP compression

pq.write_table(table, 'file_name.parquet', compression='GZIP')

Parquet with Brotli compression

pq.write_table(table, 'file_name.parquet', compression='BROTLI')

(Image in the original answer: file-size comparison across the different parquet compression formats.)

Reference: https://tech.blueyonder.com/efficient-dataframe-storage-with-apache-parquet/
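To check any of the writes above, the file can be read back into an Arrow table and converted to pandas:

import pyarrow.parquet as pq

# Read the Parquet file back into an Arrow Table, then to pandas.
table2 = pq.read_table('file_name.parquet')
df2 = table2.to_pandas()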

🌐
GitHub
github.com › dask › dask › issues › 1868
module 'dask.dataframe' has no attribute 'to_parquet' · Issue #1868 · dask/dask
import dask.dataframe as dd df = dd.read_csv(...) to_parquet('/path/to/output/', df, compression='SNAPPY') · AttributeError: module 'dask.dataframe' has no attribute 'to_parquet'
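For context, the bare to_parquet(...) call fails because the function is not exported under that name; the write goes through the frame itself (or, in current Dask, dd.to_parquet(df, path)). A sketch using the path from the snippet:

import dask.dataframe as dd
import pandas as pd

df = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3]}), npartitions=2)

# Call to_parquet on the Dask DataFrame rather than as a bare module name.
df.to_parquet('/path/to/output/', compression='snappy')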
🌐
GitHub
github.com › horovod › horovod › issues › 2507
'pyarrow.lib.Schema' object has no attribute 'field' · Issue #2507 · horovod/horovod
December 7, 2020 - I use Pyspark to generate a DataFrame and save it in parquet. When I use petastorm to read the parquet file, it shows this error: 'pyarrow.lib.Schema' object has no attribute 'field'. This is my code: from petastorm import make_batch_reader...