🌐
GitHub
github.com › dask › fastparquet › issues › 229
.to_parquet() returns AttributeError: 'bool' object has no attribute 'writer' · Issue #229 · dask/fastparquet
File "/opt/anaconda3/lib/python3.6/site-packages/dask/dataframe/core.py", line 955, in to_parquet return to_parquet(path, self, *args, **kwargs)
🌐
Cloudera Community
community.cloudera.com › t5 › Support-Questions › Pyspark-issue-AttributeError-DataFrame-object-has-no › m-p › 78093
Pyspark issue AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'
January 2, 2024 - So, if someone could help resolve this issue that would be most appreciated ... As the error message states, the object, either a DataFrame or a List, does not have the saveAsTextFile() method. result.write.save() or result.toJavaRDD.saveAsTextFile() should do the work, or you can refer to DataFrame ...
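A minimal sketch of the two fixes the snippet points at, assuming result is a PySpark DataFrame (output paths are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
result = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# saveAsTextFile() exists on RDDs, not DataFrames, so either write through
# the DataFrameWriter or drop down to the underlying RDD first.
result.write.mode("overwrite").format("parquet").save("/tmp/result_parquet")
result.rdd.map(lambda row: ",".join(str(v) for v in row)).saveAsTextFile("/tmp/result_text")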
🌐
GitHub
github.com › dask › fastparquet › issues › 100
Fastparquet write :AttributeError: 'Series' object has no attribute 'valid' · Issue #100 · dask/fastparquet
June 3, 2017 - Hi, I am using a Dask data-frame as follows: fastparquet.write(parquet_dir, X_df, file_scheme='hive', compression="SNAPPY"). Stack trace: AttributeError Traceback (most recent call last), pointing at the fastparquet.write(parquet_dir, X_df, ...) call; a commented-out alternative in the same cell reads # X_df.to_parquet(parquet_dir, compression='SNAPPY')
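As the commented-out alternative in the snippet hints, a likely cause is that fastparquet.write() operates on pandas DataFrames rather than Dask ones. A sketch of two ways around that, with hypothetical directory names:

import dask.dataframe as dd
import fastparquet
import pandas as pd

X_df = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3]}), npartitions=2)

# Materialize the Dask frame to pandas before handing it to fastparquet...
fastparquet.write("parquet_dir", X_df.compute(), file_scheme='hive', compression="SNAPPY")

# ...or let Dask drive the parquet write itself.
X_df.to_parquet("parquet_dir2", compression="snappy")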
🌐
Dataiku Community
community.dataiku.com › questions & discussions › using dataiku
AttributeError: DataFrame object has no attribute _session — Dataiku Community
August 3, 2022 - Use dataset.write_dataframe() instead.") ... df_connection_name = df._session.dss_connection_name if hasattr(df._session, "dss_connection_name") else None ... /data/dataiku/datadir/code-envs/python/snowpark/lib/python3.8/site-packages/snowflake/snowpark/dataframe.py, in __getattr__(self, name): raise AttributeError(f"{self.__class__.__name__} object has no attribute {name}") AttributeError: DataFrame object has no attribute _session
🌐
GitHub
github.com › microsoft › FLAML › issues › 625
AttributeError: 'DataFrame' object has no attribute 'copy' · Issue #625 · microsoft/FLAML
July 2, 2022 - AttributeError: 'DataFrame' object has no attribute 'copy' #625 · Labels: enhancement (new feature or request), help wanted (extra attention is needed) · Shafi2016 opened on Jul 2, 2022 · I'm using autoML (FLAML) with Spark on large data. The error image is given below · train = spark.read.parquet("./train.parquet") test = spark.read.parquet("./test.parquet") input_cols = [c for c in train.columns if c != 'target'] vectorAssembler = VectorAssembler(inputCols = input_cols, outputCol = 'features') vectorAssembler
🌐
GitHub
github.com › apache › arrow › issues › 4030
`df.to_parquet('s3://...', partition_cols=...)` fails with `'NoneType' object has no attribute '_isfilestore'` · Issue #4030 · apache/arrow
March 26, 2019 - df.to_parquet('s3://...', partition_cols=...) fails with 'NoneType' object has no attribute '_isfilestore'#4030 ... According to https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#partitioning-parquet-files, writing a parquet to S3 with partition_cols should work, but it fails for me. Example script: import pandas as pd import sys print(sys.version) print(pd.__version__) df = pd.DataFrame([{'a': 1, 'b': 2}]) df.to_parquet('s3://my_s3_bucket/x.parquet', engine='pyarrow') print('OK 1') df.to_parquet('s3://my_s3_bucket/x2.parquet', partition_cols=['a'], engine='pyarrow') print('OK 2')
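One workaround commonly suggested for this class of failure is to hand pyarrow an explicit filesystem instead of relying on to_parquet's URL handling; a sketch assuming s3fs is installed (the bucket name is hypothetical):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

df = pd.DataFrame([{'a': 1, 'b': 2}])
table = pa.Table.from_pandas(df)

# Pass the S3 filesystem explicitly so the partitioned write does not
# depend on pandas/pyarrow resolving the s3:// URL on its own.
fs = s3fs.S3FileSystem()
pq.write_to_dataset(table, root_path='my_s3_bucket/x2.parquet',
                    partition_cols=['a'], filesystem=fs)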
🌐
Stack Overflow
stackoverflow.com › questions › 66037297 › pyspark-dataframe-write-attributeerror-nonetype-object-has-no-attribute
python - Pyspark - dataframe..write - AttributeError: 'NoneType' object has no attribute 'mode' - Stack Overflow
Traceback (most recent call last): ... 'NoneType' object has no attribute 'mode' ... The writing mode should be specified on the DataFrameWriter, not after save() as you did; save() returns nothing ("None"), hence the error ...
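A minimal sketch of the wrong and right call order (the path is hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "val"])

# Wrong: save() executes the write and returns None, so chaining
# .mode() after it raises AttributeError on NoneType.
# df.write.save("/tmp/out").mode("overwrite")

# Right: configure the DataFrameWriter first, then save.
df.write.mode("overwrite").parquet("/tmp/out")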
🌐
Databricks Community
community.databricks.com › t5 › data-engineering › attributeerror-dataframe-object-has-no-attribute › td-p › 61132
AttributeError: 'DataFrame' object has no attribut... - Databricks Community - 61132
February 19, 2024 - Hello, I have some trouble deduplicating rows on the "id" column, with the method "dropDuplicatesWithinWatermark" in a pipeline. When I run this pipeline, I get the error message: "AttributeError: 'DataFrame' object has no attribute 'dropDuplicatesWithinWatermark'" Here is part of the code: @dl...
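For context, dropDuplicatesWithinWatermark was added in Spark 3.5 and is defined for streaming DataFrames that carry an event-time watermark, so older runtimes raise exactly this AttributeError. A minimal sketch using the built-in rate source:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Requires Spark 3.5+; the rate source provides 'timestamp' and 'value' columns.
events = (
    spark.readStream.format("rate").load()
    .withWatermark("timestamp", "10 minutes")
    .dropDuplicatesWithinWatermark(["value"])
)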
Top answer
1 of 1
1

Your code as it stands is trying to create a parquet file from a dataframe called df_name, which is passed in as an argument. What it receives instead of a dataframe is a string, so it fails. The ideal scenario here is that instead of passing in the string name of a dataframe, you pass in the object itself, like so:

def parquet_create(df, file_name):
    df.write.mode('overwrite').parquet(file_name + ".parquet")

df = ...        # define a dataframe
filename = ...  # some filename
parquet_create(df, filename)

It sounds like that's not an option for you for some reason, so there are a couple of workarounds. You can create a dictionary relating string dataframe names to dataframe objects like this:

df = ... # define a dataframe
df2 = ... # define another dataframe
filename = ... # some filename
name_map = {'df': df, 'df2': df2}
parquet_create('df', filename)

and define parquet_create like this:

def parquet_create(df_name, file_name):
    name_map[df_name].write.mode('overwrite').parquet(file_name+".parquet")

You will have to ensure that name_map is defined in the scope of parquet_create.

The only other option I can think of is using eval:

def parquet_create(df_name, file_name):
    # eval() resolves the string name to the dataframe object
    eval(df_name).write.mode('overwrite').parquet(file_name + ".parquet")

df = ...        # define a dataframe
filename = ...  # some filename
parquet_create('df', filename)

Note that you will also have to make sure that df is in the scope of parquet_create for this solution as well.

Both of these are really ugly solutions in my mind, and I honestly can't think of a reason not to just pass in the dataframe object itself, but there you go.

🌐
Polars
docs.pola.rs › py-polars › html › reference › api › polars.DataFrame.write_parquet.html
polars.DataFrame.write_parquet — Polars documentation
The following example will write the first row to ../watermark=1/.parquet and the other rows to ../watermark=2/.parquet. >>> df = pl.DataFrame({"a": [1, 2, 3], "watermark": [1, 2, 2]}) >>> path: pathlib.Path = dirpath / "partitioned_object" >>> df.write_parquet( ...
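A runnable sketch of the partitioned write from the excerpt, assuming a polars version whose write_parquet accepts partition_by (the output directory name is hypothetical):

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "watermark": [1, 2, 2]})

# Hive-style partitioned write: rows land under watermark=1/ and watermark=2/.
df.write_parquet("partitioned_object", partition_by=["watermark"])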
Top answer
1 of 8
81

Pandas has a core function to_parquet(). Just write the dataframe to parquet format like this:

df.to_parquet('myfile.parquet')

You still need to install a parquet library such as pyarrow or fastparquet. If you have more than one parquet library installed, you also need to specify which engine you want pandas to use, otherwise pandas takes the first one it finds (pyarrow first, then fastparquet, as in the documentation). For example:

df.to_parquet('myfile.parquet', engine='fastparquet')
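A minimal round-trip sketch of the above, assuming fastparquet is installed:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3.0, 4.0]})
df.to_parquet('myfile.parquet', engine='fastparquet')

# Read the file back with the same engine to verify the round trip.
restored = pd.read_parquet('myfile.parquet', engine='fastparquet')
assert restored.equals(df)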
2 of 8
31

Yes, pandas supports saving the dataframe in parquet format.

Here is a simple method to write a pandas dataframe to parquet.

Assuming df is the pandas dataframe, we need to import the following libraries.

import pyarrow as pa
import pyarrow.parquet as pq

First, write the dataframe df into a pyarrow table.

# Convert the pandas DataFrame to an Apache Arrow Table
table = pa.Table.from_pandas(df)

Second, write the table into a parquet file, say file_name.parquet.

# Write the Arrow Table to a Parquet file (snappy compression by default)
pq.write_table(table, 'file_name.parquet')

NOTE: parquet files can be further compressed while writing. The following are the popular compression formats:

  • Snappy (default, requires no argument)
  • gzip
  • brotli

Parquet with Snappy compression

pq.write_table(table, 'file_name.parquet')

Parquet with GZIP compression

pq.write_table(table, 'file_name.parquet', compression='GZIP')

Parquet with Brotli compression

pq.write_table(table, 'file_name.parquet', compression='BROTLI')

(Image in the original answer: file-size comparison across the different parquet compression formats.)

Reference: https://tech.blueyonder.com/efficient-dataframe-storage-with-apache-parquet/
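To check any of the writes above, the file can be read back into an Arrow table and converted to pandas:

import pyarrow.parquet as pq

# Read the Parquet file back into an Arrow Table, then to pandas.
table2 = pq.read_table('file_name.parquet')
df2 = table2.to_pandas()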

🌐
GitHub
github.com › dask › dask › issues › 1868
module 'dask.dataframe' has no attribute 'to_parquet' · Issue #1868 · dask/dask
import dask.dataframe as dd df = dd.read_csv(...) to_parquet('/path/to/output/', df, compression='SNAPPY') · AttributeError: module 'dask.dataframe' has no attribute 'to_parquet'
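For context, the bare to_parquet(...) call fails because the function is not exported under that name; the write goes through the frame itself (or, in current Dask, dd.to_parquet(df, path)). A sketch using the path from the snippet:

import dask.dataframe as dd
import pandas as pd

df = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3]}), npartitions=2)

# Call to_parquet on the Dask DataFrame rather than as a bare module name.
df.to_parquet('/path/to/output/', compression='snappy')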
🌐
GitHub
github.com › horovod › horovod › issues › 2507
'pyarrow.lib.Schema' object has no attribute 'field' · Issue #2507 · horovod/horovod
December 7, 2020 - I use Pyspark to generate a DataFrame and save it in parquet. When I use petastorm to read the parquet file, it shows this error: 'pyarrow.lib.Schema' object has no attribute 'field'. This is my code: from petastorm import make_batch_reader...