EDIT 2/7/2021: As people still seem to find this on Google, I'll say right at the top that current DataFrames (1.0+) supports both Not() selection (provided by InvertedIndices.jl) and strings as column names, including regex selection with the r"" string macro. Examples:
julia> df = DataFrame(a1 = rand(2), a2 = rand(2), x1 = rand(2), x2 = rand(2), y = rand(["a", "b"], 2))
2×5 DataFrame
 Row │ a1        a2        x1        x2        y
     │ Float64   Float64   Float64   Float64   String
─────┼────────────────────────────────────────────────
   1 │ 0.784704  0.963761  0.124937  0.37532   a
   2 │ 0.814647  0.986194  0.236149  0.468216  a
julia> df[!, r"2"]
2×2 DataFrame
 Row │ a2        x2
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.963761  0.37532
   2 │ 0.986194  0.468216
julia> df[!, Not(r"2")]
2×3 DataFrame
 Row │ a1        x1        y
     │ Float64   Float64   String
─────┼────────────────────────────
   1 │ 0.784704  0.124937  a
   2 │ 0.814647  0.236149  a
Finally, the names function has a method which takes a type as its second argument, which is handy for subsetting DataFrames by the element type of each column:
julia> df[!, names(df, String)]
2×1 DataFrame
 Row │ y
     │ String
─────┼────────
   1 │ a
   2 │ a
In addition to indexing with square brackets, there's also the select function (and its mutating equivalent select!), which basically takes the same input as the column index in []-indexing as its second argument:
julia> select(df, Not(r"a"))
2×3 DataFrame
 Row │ x1        x2        y
     │ Float64   Float64   String
─────┼────────────────────────────
   1 │ 0.124937  0.37532   a
   2 │ 0.236149  0.468216  a
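To make the difference between select and select! concrete, here is a minimal sketch on a small hand-made frame. The frame and the name demo are made up for illustration, and DataFrames 1.0+ is assumed:

```julia
using DataFrames

# Hypothetical data; column names chosen to mirror the example above.
demo = DataFrame(a1 = [1.0, 2.0], a2 = [3.0, 4.0], y = ["a", "b"])

# select returns a new DataFrame and leaves `demo` untouched.
without_a = select(demo, Not(r"a"))
names_after_select = names(demo)

# select! drops the matching columns from `demo` itself.
select!(demo, Not(r"a"))
names_after_select_bang = names(demo)
```

After the first call demo still has all three columns; only after select! is it reduced to y.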
Original answer below
As @Reza Afzalan said, what you're trying to do returns an array of strings, while column names in DataFrames are symbols.
Given that Julia doesn't have conditional list comprehension, the nicest thing you could do I guess would be
data[:, filter(x -> x != :column1, names(data))]
This will give you the data set with column1 removed (without mutating it). You could extend this to checking against lists of names as well:
data[:, filter(x -> !(x in [:column1,:column2]), names(data))]
UPDATE: As Ian says below, for this use case the Not syntax is now the best way to go.
More generally, conditional list comprehensions are also available by now, so you could do:
data[:, [x for x in names(data) if x != :column1]]
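One caveat for recent versions: as far as I know, since DataFrames 0.21 names returns a vector of strings rather than symbols, so comparing its elements against :column1 never matches and nothing gets dropped. Below is a sketch of the same comprehension that should work there, using propertynames (which returns symbols). The frame data and its columns are invented for the example:

```julia
using DataFrames

# Hypothetical frame standing in for `data`.
data = DataFrame(column1 = [1, 2], column2 = [3, 4], column3 = [5, 6])

# propertynames returns Symbols, so the comparison with :column1 works.
kept = data[:, [x for x in propertynames(data) if x != :column1]]

# Equivalent comparison against the String names.
kept_str = data[:, [x for x in names(data) if x != "column1"]]
```

Both variants return a copy with column2 and column3 only; data itself is not modified.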
As of DataFrames 0.19, it seems you can now do
select(data, Not(:column1))
to select all but the column column1. To select all except for multiple columns, use an array in the inverted index:
select(data, Not([:column1, :column2]))
df2 = select(df, Between(:A,:D), Between(:P,:Z))
or
df2 = df[:, All(Between(:A,:D), Between(:P,:Z))]
if you are sure your columns are only from :A to :Z you can also write:
df2 = select(df, Not(Between(:E, :O)))
or
df2 = df[:, Not(Between(:E, :O))]
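A small sketch to check these selectors on a frame whose columns are named A to Z. The frame is made up for the example, and Cols is used in place of All(...) because, as far as I know, passing arguments to All is deprecated in recent DataFrames releases:

```julia
using DataFrames

# Hypothetical frame with 26 columns named A, B, ..., Z.
df = DataFrame([Symbol(c) => [1, 2] for c in 'A':'Z'])

# Keep A..D and P..Z by listing the two ranges.
df2 = select(df, Between(:A, :D), Between(:P, :Z))

# Same selection with indexing; Cols takes the union of its arguments.
df3 = df[:, Cols(Between(:A, :D), Between(:P, :Z))]

# Same selection by excluding the middle range E..O.
df4 = select(df, Not(Between(:E, :O)))
```

All three results should contain the same 15 columns, in the original order.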
Finally, you can easily find the index of a column using the columnindex function, e.g.:
columnindex(df, :A)
and later use column numbers, if that is what you prefer.
In Julia you can also build ranges of Chars, so when your columns are named with single letters, yet another option is:
df[:, Symbol.(vcat('A':'D', 'P':'Z'))]
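The last two options can be tried out like this. It is only a sketch: the frame is hypothetical, with single-letter column names A to Z, and DataFrames 1.0+ is assumed:

```julia
using DataFrames

# Hypothetical frame with single-letter column names A..Z.
df = DataFrame([Symbol(c) => [1, 2] for c in 'A':'Z'])

# columnindex gives the position of a column (0 if it is absent).
first_col = columnindex(df, :A)
last_col = columnindex(df, :D)

# Positions can then be used as an ordinary integer range.
by_position = df[:, first_col:last_col]

# Character ranges converted to Symbols select the letter spans directly.
by_letters = df[:, Symbol.(vcat('A':'D', 'P':'Z'))]
```

Using positions is handy when the names are awkward to type, but it silently breaks if the column order changes, so the name-based selectors are usually the safer choice.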
This is a Julia thing, not so much a DataFrame thing: you want & instead of &&. For example:
julia> [true, true] && [false, true]
ERROR: TypeError: non-boolean (Array{Bool,1}) used in boolean context
julia> [true, true] & [false, true]
2-element Array{Bool,1}:
false
true
julia> df[(df[:A].<5)&(df[:B].=="c"),:]
2x2 DataFrames.DataFrame
| Row | A | B |
|-----|---|-----|
| 1 | 3 | "c" |
| 2 | 4 | "c" |
FWIW, this works the same way in pandas in Python:
>>> df[(df.A < 5) & (df.B == "c")]
A B
1 3 c
2 4 c
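For readers on current versions: in Julia 1.x the plain & is no longer defined for arrays, and as far as I know df[:A] column access has been removed from DataFrames, so the Julia filter above is written with the broadcasting .& instead. A minimal sketch, assuming DataFrames 1.0+ and a made-up frame with columns A and B:

```julia
using DataFrames

# Hypothetical data shaped like the example above.
df = DataFrame(A = [1, 3, 4, 7], B = ["a", "c", "c", "c"])

# Element-wise comparisons combined with element-wise AND give a Bool mask.
mask = (df.A .< 5) .& (df.B .== "c")
result = df[mask, :]

# Row-wise alternative: inside the predicate the values are scalars,
# so the short-circuiting && is fine here.
result2 = filter(row -> row.A < 5 && row.B == "c", df)
```

The filter form also stays readable when many conditions are chained, since each one is just another && inside the predicate.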
I have the same problem as https://stackoverflow.com/users/5526072/jwimberley , which appeared when I updated from Julia 0.5 to 0.6; I am now using DataFrames v0.10.1.
Update: I made the following change to fix:
r[(r[:l] .== l) & (r[:w] .== w), :] # julia 0.5
r[.&(r[:l] .== l, r[:w] .== w), :] # julia 0.6
but this gets very slow with long chains (time taken is proportional to 2^chains), so maybe Query is the better way now:
# r is a dataframe
using Query
q1 = @from i in r begin
    @where i.l == l && i.w == w && i.nl == nl && i.lt == lt &&
           i.vz == vz && i.vw == vw && i.vδ == vδ &&
           i.ζx == ζx && i.ζy == ζy && i.ζδx == ζδx
    @select {absu=i.absu, i.dBU}
    @collect DataFrame
end
for example. This is fast. It's in the DataFrames documentation.