Remember that you can look up the definition of Prelude functions!
http://www.haskell.org/onlinereport/standard-prelude.html
Looking there, the definition of words is,
words :: String -> [String]
words s = case dropWhile Char.isSpace s of
    "" -> []
    s' -> w : words s''
      where (w, s'') = break Char.isSpace s'
So, change it for a function that takes a predicate:
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
    "" -> []
    s' -> w : wordsWhen p s''
      where (w, s'') = break p s'
Then call it with whatever predicate you want!
main = print $ wordsWhen (==',') "break,this,string,at,commas"
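One thing worth knowing about this approach: because it mirrors words, runs of delimiters are collapsed and empty fields are dropped. A small sketch of that behaviour (the sample inputs are made up for illustration):

```haskell
import Data.Char (isSpace)

-- Same definition as above, repeated so the example is self-contained.
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
    "" -> []
    s' -> w : wordsWhen p s''
      where (w, s'') = break p s'

main :: IO ()
main = do
  print (wordsWhen (== ',') "a,,b,")       -- ["a","b"]: empty fields vanish
  print (wordsWhen (`elem` ",;") "a,b;c")  -- ["a","b","c"]: several delimiters
  print (wordsWhen isSpace "two  words")   -- ["two","words"]: behaves like words
```

If empty fields matter (as they usually do in CSV data), a break-based splitter that keeps them is probably the better fit.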
Answer from Steve on Stack Overflow
There is a package for this called split.
cabal install split
Use it like this:
ghci> import Data.List.Split
ghci> splitOn "," "my,comma,separated,list"
["my","comma","separated","list"]
It comes with a lot of other functions for splitting on matching delimiters or having several delimiters.
How to split a string into letters?
Enter the function that cuts a string into letters! The result should be a list in which individual letters of the original string appear as strings.
letterize :: String -> [String]
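A minimal sketch of one possible answer, assuming "letters" simply means every character of the input: wrap each character in a one-element list.

```haskell
-- Turn every character into a one-character string.
letterize :: String -> [String]
letterize = map (\c -> [c])

main :: IO ()
main = do
  print (letterize "abc")  -- ["a","b","c"]
  print (letterize "")     -- []
```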
Well, sorry, but it is really tempting to note that there is in fact a library routine that can do that, and it is even called the same as your function:
groupBy (\a b -> b /= '/') "/hejsan/asdas"
This code with groupBy from Data.List will give you ["/hejsan","/asdas"]. Even though it should probably be noted that this is taking advantage of how groupBy is implemented internally, which might not be the best idea.
But let's look at your implementation - two things jump out:
1. You are clearly using the first element in the rs list differently from the rest. Why not make it an additional parameter so you can skip de- and recomposing the list all the time?
2. Why do you even need an accumulation parameter for your groups? Once you append something to rs, you already know that it will be reversed in the end and end up at the start of the return value. So you can simply prepend the group to the result of the recursive function call.
With the two changes and a bit of refactoring (replacing your if by pattern matches), we get the following version:
groupBy :: String -> Char -> String -> [String]
groupBy "" _ "" = []
groupBy "" _ r = [r]
groupBy (x:xs) c ""
    | x == c = groupBy xs c ""
    | otherwise = groupBy xs c [x]
groupBy (x:xs) c r
    | x == c = r : groupBy xs c ""
    | otherwise = groupBy xs c (r ++ [x])
This can be improved further:
1. This clearly has two modes of operation depending on whether r == "". When calling, you always know which case applies, so you could easily split it into two functions. One of them can get rid of the r parameter completely, so you end up with a nice function to call from outside.
2. Building a list using ++ is inefficient, as it creates a copy of the list each time you append an element, leading to O(n²) complexity. It's better to use x:r and then reverse once at the end, which is a more efficient O(n). A slightly more involved refactoring lets you construct the string in the right order right away (see span).
That is really all that separates this from the "perfect" library-level implementation (see the library code). Note that accumulation parameters are generally initialized by defining a wrapper function, although here the function can easily be written without needing one.
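As a rough sketch of the two suggestions (a wrapper that hides the accumulator, and x:acc plus a single reverse instead of ++), something like the following should behave the same as the version above. The names splitOnChar and go are made up for this example.

```haskell
-- Hypothetical names; the wrapper hides the accumulator from callers.
splitOnChar :: Char -> String -> [String]
splitOnChar c = go ""
  where
    -- acc holds the current group in reverse order.
    go acc [] = [reverse acc | not (null acc)]
    go acc (x:xs)
      | x == c    = if null acc
                      then go "" xs                 -- skip empty groups
                      else reverse acc : go "" xs   -- emit the finished group
      | otherwise = go (x : acc) xs                 -- O(1) prepend

main :: IO ()
main = print (splitOnChar '/' "/hejsan/asdas")  -- ["hejsan","asdas"]
```

Each group is reversed exactly once, so the work per group is linear in its length.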
Hope this helps.
Another version, using the Prelude function break.
groupBy :: String -> Char -> [String]
groupBy str delim = let (start, end) = break (== delim) str
                    in start : if null end then [] else groupBy (tail end) delim
If you're trying to write this function "for real" instead of writing the character-by-character recursion for practice, I think a clearer method is to use the break function from Data.List. The following expression:
break (==',') str
breaks the string into a tuple (a,b) where the first part consists of the initial "comma-free" part, and the second part is either more string starting with the comma or else empty if there's no more string.
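To make the shape of that tuple concrete, here is a small sketch with made-up inputs:

```haskell
main :: IO ()
main = do
  print (break (== ',') "ab,cd,ef")  -- ("ab",",cd,ef"): rest starts with the comma
  print (break (== ',') "abcdef")    -- ("abcdef",""): no comma, rest is empty
  print (break (== ',') ",abc")      -- ("",",abc"): leading comma, empty first part
```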
This makes the definition of split clear and straightforward:
split str = case break (==',') str of
    (a, ',':b) -> a : split b
    (a, "") -> [a]
You can verify that this handles split "" (which returns [""]), so there's no need to treat that as a special case.
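A quick way to check the edge cases is to run a few inputs through the function. The sketch below repeats the definition so it can be compiled on its own; the inputs are arbitrary examples.

```haskell
split :: String -> [String]
split str = case break (== ',') str of
    (a, ',':b) -> a : split b
    (a, "")    -> [a]

main :: IO ()
main = do
  print (split "")       -- [""]
  print (split "a,,b")   -- ["a","","b"]: empty fields are preserved
  print (split ",a,")    -- ["","a",""]: leading and trailing commas too
```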
This version has the added benefit that the modification to include the delimiter is also easy to understand:
split2 str = case break (==',') str of
    (a, ',':b) -> a : "," : split2 b
    (a, "") -> [a]
Note that I've written the patterns in these functions in more detail than is necessary to make it absolutely clear what's going on, and this also means that Haskell does a duplicate check on each comma. For this reason, some people might prefer:
split str = case break (==',') str of
    (a, _:b) -> a : split b
    (a, _) -> [a]
or, if they still wanted to document exactly what they were expecting in each case branch:
split str = case break (==',') str of
    (a, _comma:b) -> a : split b
    (a, _empty) -> [a]
Instead of altering code in the hope that it matches the expectations, it is usually better to understand the code fragment first.
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : rest
             | otherwise = (c : head rest) : tail rest
  where rest = split cs
First of all, we should analyze what split does. The first clause simply says "The split of an empty string is a list with one element, the empty string". This seems reasonable. The first guard of the second clause states: "In case the head of the string is a comma, we produce a list where the first element is an empty string, followed by the split of the remainder of the string." The last guard says "In case the first character of the string is not a comma, we prepend that character to the first item of the split of the remaining string, followed by the remaining elements of the split of the remaining string". Mind that split returns a list of strings, so head rest is a string.
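To see the reading above in action, it can help to unfold a small call by hand. The trace in the comments below is a sketch for the input "a,b"; the definition is the one under discussion, repeated so the example compiles on its own.

```haskell
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ','  = "" : rest
             | otherwise = (c : head rest) : tail rest
  where rest = split cs

-- Unfolding split "a,b" from the innermost call outwards:
--   split ""    = [""]
--   split "b"   = ('b' : "") : []    = ["b"]
--   split ",b"  = "" : ["b"]         = ["","b"]
--   split "a,b" = ('a' : "") : ["b"] = ["a","b"]

main :: IO ()
main = do
  print (split "a,b")   -- ["a","b"]
  print (split "a,,b")  -- ["a","","b"]
```

The use of head and tail is safe here because split never returns an empty list.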
So if we want to add the delimiter to the output, then we need to add it as a separate string in the output of split. Where? In the first guard. We should not return "," : rest, since the recursion prepends the preceding characters to the head of the list; the delimiter has to follow an empty string as an element of its own. So the result is:
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : "," : rest
             | otherwise = (c : head rest) : tail rest
  where rest = split cs
Doesn't Data.List.Split.splitOn do this?
splitBy delimiter = foldr f [[]]
  where f c l@(x:xs) | c == delimiter = []:l
                     | otherwise = (c:x):xs
Edit: not by the original author, but below is a more (overly?) verbose, and less flexible version (specific to Char/String) to help clarify how this works. Use the above version because it works on any list of a type with an Eq instance.
splitBy :: Char -> String -> [String]
splitBy _ "" = []
splitBy delimiterChar inputString = foldr f [""] inputString
  where f :: Char -> [String] -> [String]
        f currentChar allStrings@(partialString:handledStrings)
          | currentChar == delimiterChar = "":allStrings -- start a new partial string at the head of the list of all strings
          | otherwise = (currentChar:partialString):handledStrings -- add the current char to the partial string
-- input: "a,b,c"
-- fold steps:
-- first step: 'c' -> [""] -> ["c"]
-- second step: ',' -> ["c"] -> ["","c"]
-- third step: 'b' -> ["","c"] -> ["b","c"]
-- fourth step: ',' -> ["b","c"] -> ["","b","c"]
-- fifth step: 'a' -> ["","b","c"] -> ["a","b","c"]
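To illustrate the point about the generic version working on any list whose elements have an Eq instance, here is a small sketch. The type signature is one I added for clarity; it is what GHC should infer for the short definition.

```haskell
-- Generic fold-based splitter; the signature is added here for illustration.
splitBy :: Eq a => a -> [a] -> [[a]]
splitBy delimiter = foldr f [[]]
  where
    -- The accumulator always has at least one element, so the
    -- (x:xs) pattern never fails in practice.
    f c l@(x:xs)
      | c == delimiter = [] : l
      | otherwise      = (c : x) : xs

main :: IO ()
main = do
  print (splitBy ',' "a,b,c")                 -- ["a","b","c"]
  print (splitBy 0 [1, 2, 0, 3, 0, 4 :: Int]) -- [[1,2],[3],[4]]
  print (splitBy ',' "")                      -- [""]
```

Note the last line: the generic version returns [""] for empty input, whereas the verbose String version returns [].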
I'm using Haskell to split up a csv to divvy up some data before processing. I'm still new to Haskell and have just recently started learning cabal. On my system, Cabal is acting up, leaving me unable to install split onto my system. I decided to just try to build a simple split function rather than go through what I would need to do to fix cabal for the moment, but I'm not really sure how that's supposed to work. I wanted to try to treat the IO String coming from the file as just a list of characters, but even that is beyond me, apparently.
What would the proper way of splitting an IO String on a delimiter (specifically ',') be?
You can use Hoogle and search for example by signature. Since you want to convert a String to a list of Strings, the signature is thus String -> [String]. The first matches are lines :: String -> [String] and words :: String -> [String]. Based on the name of the function, words is the right match.
As the documentation on words says:
words :: String -> [String]
words breaks a string up into a list of words, which were delimited by white space.
>>> words "Lorem ipsum\ndolor"
["Lorem","ipsum","dolor"]
This thus seems to be the function you are looking for. If we run this in ghci, we get the expected output:
Prelude> words "Hello world from haskell"
["Hello","world","from","haskell"]
Reference: https://hackage.haskell.org/package/base-4.12.0.0/docs/Data-String.html#v:words
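Since words only splits on white space, the comma case from the question needs a different splitter. The sketch below is one way it might look using only base: the IO String is bound with <- inside a do block, after which it is an ordinary String that pure functions can process. The names splitOnComma and parseCsv are made up for this example, and no quoting or escaping of CSV fields is handled.

```haskell
-- Hypothetical helper: split one line on commas, keeping empty fields.
splitOnComma :: String -> [String]
splitOnComma str = case break (== ',') str of
    (a, _ : b) -> a : splitOnComma b
    (a, _)     -> [a]

-- Hypothetical helper: split the whole input into rows of fields.
parseCsv :: String -> [[String]]
parseCsv = map splitOnComma . lines

main :: IO ()
main = do
  -- getContents reads standard input; readFile with a file path
  -- can be used the same way.
  contents <- getContents
  mapM_ print (parseCsv contents)
```

For real CSV data with quoted fields, a dedicated CSV library is likely a better choice than hand-rolled splitting.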
Hey guys,
Stuck on a problem, if I wanted to split a list of characters such as "qwertyzxc" (9 elements) (['q','w','e','r','t','y','z','x','c']) into a list of strings with the length 3. How would one do it?
Example:
Input : ['q','w','e','r','t','y','z','x','c']
Output : [['q','w','e'],['r','t','y'],['z','x','c']]
Thanks, I have been stuck on this for quite a while
Well, the convenient thing about strings is that they're just lists of chars. The simplest way to do this is to ask the list for exactly what you want. Want one element off the top of the list? Pattern match to see if it has one! Want three? Pattern match for three of them! Once you've done that, you'll be able to decide what to do with the rest of the string.
foo [] = []
foo (x:y:z:rest) = [x, y, z] : foo rest
foo rest = [rest]
https://hackage.haskell.org/package/split-0.2.3.2/docs/Data-List-Split.html#v:chunksOf
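If installing the split package is not an option, a general chunking function can be sketched with splitAt from the Prelude. The name chunks is made up here to avoid suggesting it is the library's chunksOf, and returning [] for non-positive sizes is my own choice.

```haskell
-- Hypothetical stand-in for chunksOf, using only the Prelude.
chunks :: Int -> [a] -> [[a]]
chunks n xs
  | n <= 0    = []   -- guard against looping forever on non-positive sizes
  | null xs   = []
  | otherwise = front : chunks n rest
  where (front, rest) = splitAt n xs

main :: IO ()
main = do
  print (chunks 3 "qwertyzxc")  -- ["qwe","rty","zxc"]
  print (chunks 3 "qwerty12")   -- ["qwe","rty","12"]: last chunk may be shorter
```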