Merging that way creates a pandas DataFrame. Try this:
import geopandas as gpd
df = gpd.read_file(r"/home/bera/Drives/data1/data/LMV/LMV-data_2021-10/ok_riks_Sweref_99_TM_shape/oversikt/riks/ak_riks.shp")
gdf1 = df.head(10)
gdf2 = df.tail(10)
print(type(gdf1.merge(gdf2, on="LANSKOD", how="outer"))) #This is creating a pandas df
#pandas.core.frame.DataFrame
merged = gpd.pd.merge(gdf1, gdf2, on="LANSKOD", how="outer")
print(type(merged))
#<class 'geopandas.geodataframe.GeoDataFrame'> #Now it is a geopandas df
merged["geometry"] = merged["geometry_x"].combine_first(merged["geometry_y"]) #Create a complete geometry column
merged = merged.drop(columns=["geometry_x", "geometry_y"])
merged.to_file(r"/home/bera/Desktop/GIStest/testmerge.shp")
Answer from Bera on Stack Exchange
If you don't select the geometry column from a GeoDataFrame, you get a plain DataFrame.
For example:
print(type(trialyield[['column1', 'column2']]))
# OUT:
# <class 'pandas.core.frame.DataFrame'>
print(type(trialyield[['column1', 'column2', 'geometry']]))
# OUT:
# <class 'geopandas.geodataframe.GeoDataFrame'>
Change the last line as follows:
trialyield[[title_col, 'yield', 'geometry']].to_file('trialyield_output.shp')
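For anyone who wants to reproduce this without the original file, here is a minimal sketch. The two-point layer and the column names "field" and "yield" are made up for illustration; only the column-selection behaviour is the point.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical two-point layer standing in for trialyield.
trialyield = gpd.GeoDataFrame(
    {"field": ["a", "b"], "yield": [3.2, 4.1]},
    geometry=gpd.points_from_xy([0.0, 1.0], [0.0, 1.0]),
)

# Selecting columns without "geometry" falls back to a plain DataFrame.
without_geom = trialyield[["field", "yield"]]
# Keeping "geometry" in the selection preserves the GeoDataFrame.
with_geom = trialyield[["field", "yield", "geometry"]]

print(type(without_geom).__name__)
print(type(with_geom).__name__)

# Only the GeoDataFrame carries a to_file method.
print(hasattr(without_geom, "to_file"), hasattr(with_geom, "to_file"))
```

So the missing to_file is just a side effect of dropping the geometry column in the selection.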
Although a previous answer is correct that geopandas.read_file will return a pandas.core.frame.DataFrame instead of a geopandas.geodataframe.GeoDataFrame if the data doesn't have a geometry column, that doesn't mean GeoDataFrame.to_file() cannot be used to write out the pandas DataFrame.
This is one aspect of the GeoDataFrame implementation I have always found odd and annoying. GeoDataFrames offer functionality beyond DataFrames, e.g. to_file(), that isn't related to having geometry, yet no geometry means no GeoDataFrame.
I use geopandas.geodataframe.GeoDataFrame to write pandas DataFrames to DBF files or geodatabase tables; it only requires changing the call syntax.
import geopandas
# df: a pandas DataFrame without a geometry column
# folder: path to the output folder
# fgdb: path to the file geodatabase
# writing df as DBF using geopandas
geopandas.GeoDataFrame.to_file(df, folder, "ESRI Shapefile", layer="TableA")
# writing df as FGDB table using geopandas
geopandas.GeoDataFrame.to_file(df, fgdb, "OpenFileGDB", layer="TableA")
Check your DataFrame with data.columns.
It should print something like this:
Index([u'regiment', u'company', u'name',u'postTestScore'], dtype='object')
Check for hidden whitespace. Then you can rename with:
data = data.rename(columns={'Number ': 'Number'})
I think the column name that contains "Number" is actually something like " Number" or "Number ", i.e. it has a residual space. Please run print("<{}>".format(data.columns[1])) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
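To see the effect on a small frame, here is a minimal sketch. The column names and values are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Hypothetical frame: the second column name carries a trailing space.
data = pd.DataFrame({"name": ["a", "b"], "Number ": [1, 2]})

# repr() puts quotes around each label, so stray whitespace becomes visible.
print([repr(col) for col in data.columns])

# Strip leading and trailing whitespace from every column name at once.
data.columns = data.columns.str.strip()
print(list(data.columns))
```

Printing the labels through repr() is a handy habit here, because a plain print of the Index can hide a trailing space.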
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, occurs because dot notation has been used to reference a column name or pandas method that does not exist.
pandas methods are accessed with a dot (.). Columns can also be accessed with a dot (e.g. data.col) or with brackets (e.g. data['col'] or data[['col1', 'col2']]).
data.columns = data.columns.str.strip() is a quick way to remove leading and trailing spaces from all column names. Otherwise, verify that the column or attribute is spelled correctly.
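As a rough illustration of how dot and bracket access differ, consider the sketch below. The frame is hypothetical; the "count" column is chosen only because it collides with the DataFrame.count method.

```python
import pandas as pd

# Hypothetical frame: "Number " has a trailing space, "count" shadows a method.
data = pd.DataFrame({"Number ": [1, 2], "count": [10, 20]})

# Dot access to a name that does not exist raises AttributeError.
try:
    data.Number
except AttributeError as err:
    dot_error = str(err)
    print(dot_error)

# Bracket access to the same missing name raises KeyError instead.
try:
    data["Number"]
except KeyError as err:
    print("KeyError:", err)

# Dot access to "count" returns the method, not the column.
print(callable(data.count))
# Brackets always return the column.
print(data["count"].tolist())
```

Bracket access is the safer default, since it never gets confused with a method of the same name.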
"sklearn.datasets" is a scikit-learn module that contains the function load_iris().
load_iris() by default returns an object that holds data, target and other members. To get the actual values you have to read the data and target members themselves.
'iris.csv', by contrast, holds features and target together in one table.
FYI: If you set return_X_y to True in load_iris(), you will get the features and target directly.
from sklearn import datasets
data,target = datasets.load_iris(return_X_y=True)
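If you would rather work with a DataFrame, as you do with the CSV, load_iris() also accepts as_frame=True in reasonably recent scikit-learn versions (0.23 and later, as far as I know). A minimal sketch:

```python
from sklearn.datasets import load_iris

# as_frame=True makes the Bunch hold pandas objects instead of NumPy arrays.
iris = load_iris(as_frame=True)
X = iris.data      # DataFrame with the four feature columns
y = iris.target    # Series of integer class labels

print(X.shape, y.shape)
print(list(X.columns))
```

The object returned is still a Bunch, so iris.data and iris["data"] refer to the same thing.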
The Iris Dataset from Sklearn is in Sklearn's Bunch format:
from sklearn.datasets import load_iris
iris = load_iris()
print(type(iris))
print(iris.keys())
output:
<class 'sklearn.utils.Bunch'>
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
So, that's why you can access it as:
x=iris.data
y=iris.target
But when you read the CSV file into a DataFrame as you describe:
iris = pd.read_csv('iris.csv',header=None).iloc[:,2:4]
iris.head()
output is:
              2            3
0  petal_length  petal_width
1           1.4          0.2
2           1.4          0.2
3           1.3          0.2
4           1.5          0.2
Here the column names are the integers 2 and 3, and the real header line has ended up as the first data row.
First of all you should read the CSV file as:
df = pd.read_csv('iris.csv')
You should not pass header=None, because your CSV file already includes the column names, i.e. the header row.
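The difference is easy to check on a tiny stand-in for the file. The two data rows below are an assumption for illustration; the real iris.csv is of course longer.

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.csv: one header line plus two data rows.
csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)

# header=None: the header line becomes data row 0 and columns are numbered.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.columns.tolist())
print(raw.iloc[0].tolist())

# Default behaviour: the first line supplies the column names.
df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())
print(df["petal_length"].dtype)
```

With header=None the numeric columns are also read as text, because the header strings sit in the same columns as the numbers.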
So, now what you can do is something like this:
X = df.iloc[:, [2, 3]] # Will give you columns 2 and 3 i.e 'petal_length' and 'petal_width'
y = df.iloc[:, 4] # Label column i.e 'species'
or if you want to use the column names then:
X = df[['petal_length', 'petal_width']]
y = df['species']
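A quick check that positional and name-based selection give the same result, again on a tiny made-up stand-in for iris.csv. Note that iloc takes integer positions only, so a column name has to go through plain brackets or loc.

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.csv: one header line plus two data rows.
csv_text = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Selection by integer position.
X_by_position = df.iloc[:, [2, 3]]
y_by_position = df.iloc[:, 4]

# Selection by column name.
X_by_name = df[["petal_length", "petal_width"]]
y_by_name = df["species"]

print(X_by_position.equals(X_by_name))
print(y_by_position.equals(y_by_name))
```

Selecting by name is usually more robust, since it keeps working if the column order in the file changes.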
Also, if you want to convert the labels from strings to numbers, use scikit-learn's LabelEncoder:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
y = le.fit_transform(y)
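To see what the encoder actually does, here is a small self-contained sketch. The four-element label list is made up for illustration.

```python
from sklearn import preprocessing

# Hypothetical label column with the three iris species names.
y = ["setosa", "versicolor", "virginica", "setosa"]

le = preprocessing.LabelEncoder()
y_encoded = le.fit_transform(y)

# Integer code assigned to each label.
print(y_encoded)
# classes_ lists the original labels; the index of each is its code.
print(le.classes_)
# inverse_transform maps the codes back to the original strings.
print(le.inverse_transform(y_encoded))
```

Keep the fitted encoder around if you need to turn predictions back into species names later.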