Remember that you can look up the definition of Prelude functions!

http://www.haskell.org/onlinereport/standard-prelude.html

Looking there, the definition of words is,

words   :: String -> [String]
words s =  case dropWhile Char.isSpace s of
                      "" -> []
                      s' -> w : words s''
                            where (w, s'') = break Char.isSpace s'

So, change it for a function that takes a predicate:

wordsWhen     :: (Char -> Bool) -> String -> [String]
wordsWhen p s =  case dropWhile p s of
                      "" -> []
                      s' -> w : wordsWhen p s''
                            where (w, s'') = break p s'

Then call it with whatever predicate you want!

main = print $ wordsWhen (==',') "break,this,string,at,commas"
Answer from Steve on Stack Overflow
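Because wordsWhen is modelled on words, it skips whole runs of delimiters, so empty fields disappear from the result. The sketch below repeats the definition from the answer so that it stands alone; commaFields is a hypothetical helper name introduced only for illustration.

```haskell
-- Definition repeated from the answer above so the block is self-contained.
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
  "" -> []
  s' -> w : wordsWhen p s''
    where (w, s'') = break p s'

-- Hypothetical helper (not part of the answer): split on commas.
commaFields :: String -> [String]
commaFields = wordsWhen (== ',')
```

With these definitions, commaFields "a,,b," evaluates to ["a","b"]: the empty field between the two commas and the one after the trailing comma are both dropped. If empty fields matter (as in CSV data), a break-based split that consumes exactly one delimiter per step is probably a better fit.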
Hackage
hackage.haskell.org › package › split-0.2.5 › docs › Data-List-Split.html
Data.List.Split - Hackage - Haskell
The Data.List.Split module contains a wide range of strategies for splitting lists with respect to some sort of delimiter, mostly implemented through a unified combinator interface. The goal is to be flexible yet simple. See below for usage, examples, and detailed documentation of all exported functions.
Discussions

Split String by delimiter in Haskell - Code Review Stack Exchange
I wrote a split by delimiter function in Haskell and wanted some feedback on this piece of code. Since I come from an imperative programming background, I often write too complex functions in haske... More on codereview.stackexchange.com
codereview.stackexchange.com
December 12, 2020
How to split a string into letters?
letterize = map return The obscurity of the code is intentional More on reddit.com
r/haskell
October 4, 2023
Approach to string split by character in Haskell - Code Review Stack Exchange
I'm trying to learn Haskell and the mindset of programming functionally. I have started off by trying to understand the basics by writing code without any Monads in it. So far, I think I'm gettin... More on codereview.stackexchange.com
codereview.stackexchange.com
December 16, 2020
Haskell - Splitting a string by delimiter - Stack Overflow
I am trying to write a program in Haskell to split a string by delimiter. And I have studied different examples provided by other users. An example would the the code that is posted below. split :: More on stackoverflow.com
stackoverflow.com
Hoogle
hoogle.haskell.org
split - Hoogle - Haskell.org
split 10 "a\nb\nd\ne" == ["a","b","d","e"]   -- fromEnum '\n' == 10
split 97 "aXaXaXa" == ["","X","X","X",""]    -- fromEnum 'a' == 97
split 120 "x" == ["",""]                     -- fromEnum 'x' == 120
split undefined "" == []                     -- and not [""]
Julio Merino
jmmv.dev › 2006 › 08 › split-function-in-haskell.html
A split function in Haskell - Julio Merino (jmmv.dev)
August 24, 2006 - I'm sure there is some better and even cleaner way to write it because I'm still a Haskell newbie! Here it is:

split :: String -> Char -> [String]
split [] delim = [""]
split (c:cs) delim
    | c == delim = "" : rest
    | otherwise  = (c : head rest) : tail rest
    where rest = split cs delim
O'Reilly
oreilly.com › library › view › haskell-data-analysis › 9781783286331 › ch03s07.html
Splitting a string on lines, words, or arbitrary tokens - Haskell Data Analysis cookbook [Book]
June 25, 2014 - Splitting a string on lines, words, or arbitrary tokens. Useful data is often interspersed between delimiters, such as commas or spaces, making string splitting vital for most data... - Selection from Haskell Data Analysis cookbook [Book]
Author: Nishant Shukla
Published: 2014
Pages: 334
Top answer
1 of 1
4

Is it bad to have splitInternal function? I couldn't figure out a way without it.

Well, given your approach, I think it is necessary, but you can improve the readability by writing some small functions and then combining them. Besides, if there are consecutive delimiters, your split function doesn't work as expected. The code can be rewritten as follows:

splitInternal :: Char -> ([String], String) -> ([String], String)
splitInternal _ (result, "") = (result, "")
splitInternal c (result, remain) = splitInternal c (getBefore c remain, getAfter c remain)
  where
    getBefore delimiter rest = result ++ [takeWhile (/= delimiter) rest]
    getAfter delimiter rest = dropWhile (== delimiter) . dropWhile (/= delimiter) $ rest

Is there maybe a simpler way to write the function?

Yes, you can use the break and span function defined in Prelude:

split :: Char -> String -> [String]
split _ "" = []
split delimiter str = 
    let (start, rest) = break (== delimiter) str
        (_, remain) = span (== delimiter) rest
     in start : split delimiter remain

So in this case, your splitInternal is unnecessary.

Any other feedback is welcome as well

Well, if you are working with strings, a better choice is Text from Data.Text, which is more efficient than String. The module Data.Text has a predefined function splitOn that works almost as you expect:

ghci> :seti -XOverloadedStrings

ghci> import Data.Text (splitOn)

ghci> splitOn "," "123,456,789"
["123","456","789"]

ghci> splitOn "," "123,,,456,789"
["123","","","456","789"]           -- This is what I mean by "almost": splitOn does not collapse consecutive delimiters, so empty fields are kept. Maybe this is what you want.
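The GHCi session above can also be written as a small module. This is only a sketch: fieldsKeepEmpty and fieldsDropEmpty are hypothetical names, and it assumes the text package is available (it normally ships with GHC). The second function shows one way to get the "collapsing" behaviour by filtering out empty fields.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T

-- Hypothetical name: plain Data.Text.splitOn, which keeps the empty
-- fields that appear between consecutive delimiters.
fieldsKeepEmpty :: T.Text -> [T.Text]
fieldsKeepEmpty = T.splitOn ","

-- Hypothetical name: drop empty fields afterwards, so that a run of
-- delimiters behaves like a single one.
fieldsDropEmpty :: T.Text -> [T.Text]
fieldsDropEmpty = filter (not . T.null) . T.splitOn ","
```

For the input "123,,,456,789", fieldsKeepEmpty returns five fields (two of them empty), while fieldsDropEmpty returns only "123", "456" and "789".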
Lotz84
lotz84.github.io › haskellbyexample › ex › string-functions
Haskell by Example: String Functions

import Data.List
import Data.Char

include :: String -> String -> Bool
include xs ys = or . map (isPrefixOf ys) . tails $ xs

joinWith :: [String] -> String -> String
joinWith xs sep = concat . init . concat $ [[x, sep] | x <- xs]

split :: String -> Char -> [String]
split "" _ = []
split xs c = let (ys, zs) = break (== c) xs
             in  if null zs then [ys] else ys : split (tail zs) c

main = do
    putStrLn $ "Contains: " ++ show ("test" `include` "es")
    putStrLn $ "Count: " ++ show (length .
Reddit
reddit.com › r/haskell › how to split a string into letters?
r/haskell on Reddit: How to split a string into letters?
October 4, 2023 -

Enter the function that cuts a string into letters! The result should be a list in which individual letters of the original string appear as strings.

letterize :: String -> [String]
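A minimal sketch of two ways to fill in this signature. The first wraps each character in a singleton list explicitly; the second is the deliberately terse map return variant quoted in the discussion, which works because return in the list monad builds a one-element list. The primed name letterize' is only used here to keep both versions side by side.

```haskell
-- Explicit version: turn every character into a one-character string.
letterize :: String -> [String]
letterize = map (: [])

-- Terse version from the thread: for lists, return x == [x].
letterize' :: String -> [String]
letterize' = map return
```

Both give ["a","b","c"] for the input "abc" and the empty list for the empty string.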

GitHub
gist.github.com › da8fbd30b3e03300ce56
Haskell: split a string using a substring · GitHub
Haskell: split a string using a substring. GitHub Gist: instantly share code, notes, and snippets.
Narkive
haskell-cafe.haskell.narkive.com › 9eHuYE37 › can-i-split-a-string-by-its-element
[Haskell-cafe] Can i split a String by its element ?
Post by Luke Palmer See the http://hackage.haskell.org/package/split package. You should be able to do this by splitting the string on comma, and then splitting the result on 0, plus some plumbing.
Top answer
1 of 4
10

Well, sorry, but it is really tempting to note that there is in fact a library routine that can do that, and it is even called the same as your function:

groupBy (\a b -> b /= '/') "/hejsan/asdas"

This code with groupBy from Data.List will give you ["/hejsan","/asdas"]. Even though it should probably be noted that this is taking advantage of how groupBy is implemented internally, which might not be the best idea.

But let's look at your implementation - two things jump out:

  • You are clearly using the first element in the rs list differently from the rest. Why not make it an additional parameter so you can skip de- and recomposing the list all the time?

  • Why do you even need an accumulation parameter for your groups? Once you append something to rs, you already know that it will be reversed in the end and end up at the start of the return value. So you can simply prepend the group to the result of the recursive function call.

With the two changes and a bit of refactoring (replacing your if by pattern matches), we get the following version:

groupBy :: String -> Char -> String -> [String]
groupBy ""     _ "" = []
groupBy ""     _ r  = [r]
groupBy (x:xs) c ""
    | x == c        = groupBy xs c ""
    | otherwise     = groupBy xs c [x]
groupBy (x:xs) c r
    | x == c        = r : groupBy xs c ""
    | otherwise     = groupBy xs c (r ++ [x])

This can be improved further:

  • This has clearly two modes of operation depending on r == "". When calling you always know whether that is the case, so you could easily split it into two functions. One of them can get rid of the r parameter completely, so you end up with a nice function to call from outside.

  • Building a list using ++ is inefficient, as it creates a copy of the list each time you append an element, leading to O(n²) complexity. It's better to use x:r and then reverse once at the end, which is a more efficient O(n). A slightly more involved refactoring lets you construct the string in the right order right away (see span).
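The two suggestions in the last bullet can be sketched as follows. Both function names are hypothetical and the argument order (delimiter first) is a choice made for this sketch; the behaviour is meant to match the version above, which skips leading, trailing and repeated delimiters.

```haskell
-- Accumulate each group in reverse with (:) and reverse it once when
-- the group is complete, instead of appending with (++).
splitAcc :: Char -> String -> [String]
splitAcc c = go ""
  where
    go acc "" = [reverse acc | not (null acc)]
    go acc (x:xs)
      | x == c    = if null acc then go "" xs else reverse acc : go "" xs
      | otherwise = go (x : acc) xs

-- Use span to cut off each group in the right order straight away,
-- so no accumulator is needed at all.
splitRuns :: Char -> String -> [String]
splitRuns c s = case dropWhile (== c) s of
  "" -> []
  s' -> let (chunk, rest) = span (/= c) s'
        in chunk : splitRuns c rest
```

For example, both splitAcc '/' "/hejsan/asdas" and splitRuns '/' "/hejsan/asdas" give ["hejsan","asdas"].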

That's really all the difference to the "perfect" library-level implementation (see the library code). Note that generally, initialization of accumulation parameters is simply done by defining a wrapper function, even though we can easily put this function without even needing it.

Hope this helps.

2 of 4
6

Another version, using the Prelude function break.

groupBy :: String -> Char -> [String]
groupBy str delim = let (start, end) = break (== delim) str
                    in start : if null end then [] else groupBy (tail end) delim  
Top answer
1 of 3
8

If you're trying to write this function "for real" instead of writing the character-by-character recursion for practice, I think a clearer method is to use the break function from Data.List. The following expression:

break (==',') str

breaks the string into a tuple (a,b) where the first part consists of the initial "comma-free" part, and the second part is either more string starting with the comma or else empty if there's no more string.

This makes the definition of split clear and straightforward:

split str = case break (==',') str of
                (a, ',':b) -> a : split b
                (a, "")    -> [a]

You can verify that this handles split "" (which returns [""]), so there's no need to treat that as a special case.

This version has the added benefit that the modification to include the delimiter is also easy to understand:

split2 str = case break (==',') str of
                (a, ',':b) -> a : "," : split2 b
                (a, "")    -> [a]

Note that I've written the patterns in these functions in more detail than is necessary to make it absolutely clear what's going on, and this also means that Haskell does a duplicate check on each comma. For this reason, some people might prefer:

split str = case break (==',') str of
                (a, _:b) -> a : split b
                (a, _)   -> [a]

or, if they still wanted to document exactly what they were expecting in each case branch:

split str = case break (==',') str of
                (a, _comma:b) -> a : split b
                (a, _empty)   -> [a]
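The same shape generalizes to an arbitrary delimiter character. The sketch below is an assumption-laden variant of the answer's code, not the answer itself: splitOnChar and splitKeeping are hypothetical names, and the delimiter is passed as an extra first argument.

```haskell
-- Split on one delimiter character; empty fields are preserved, and the
-- empty string yields [""], as in the comma-only version above.
splitOnChar :: Char -> String -> [String]
splitOnChar d str = case break (== d) str of
  (a, _ : b) -> a : splitOnChar d b
  (a, _)     -> [a]

-- Same, but each delimiter is kept in the output as its own string.
splitKeeping :: Char -> String -> [String]
splitKeeping d str = case break (== d) str of
  (a, _ : b) -> a : [d] : splitKeeping d b
  (a, _)     -> [a]
```

For example, splitOnChar ',' "a,,b" gives ["a","","b"], and splitKeeping ',' "a,b" gives ["a",",","b"].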
2 of 3
5

Instead of altering code in the hope that it matches the expectations, it is usually better to understand the code fragment first.

split :: String -> [String]
split [] = [""]
split (c:cs) | c == ','  = "" : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs

First of all, we should analyze what split does. The first clause simply says "The split of an empty string is a list with one element, the empty string". This seems reasonable. The first guard of the second clause states: "In case the head of the string is a comma, we produce a list where the first element is an empty string, followed by the split of the remainder of the string." The last guard says "In case the first character of the string is not a comma, we prepend that character to the first item of the split of the remaining string, followed by the remaining elements of the split of the remaining string". Mind that split returns a list of strings, so head rest is a string.

So if we want to add the delimiter to the output, then we need to add that as a separate string in the output of split. Where? In the first guard. We should not return "," : rest, since the head is - by recursion - prepended, but as a separate string. So the result is:

split :: String -> [String]
split [] = [""]
split (c:cs) | c == ','  = "" : "," : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs
Hoogle
hoogle.haskell.org
splitOn - Hoogle - Haskell
intercalate s . splitOn s == id
splitOn (singleton c) == split (==c)
(Note: the string s to split on above cannot be empty.) In (unlikely) bad cases, this function's time complexity degrades towards O(n*m).
Reddit
reddit.com › r/haskell › splitting strings without importing split
r/haskell on Reddit: Splitting Strings Without Importing Split
May 4, 2022 -

I'm using Haskell to split up a csv to divvy up some data before processing. I'm still new to Haskell and have just recently started learning cabal. On my system, Cabal is acting up, leaving me unable to install split onto my system. I decided to just try to build a simple split function rather than go through what I would need to do to fix cabal for the moment, but I'm not really sure how that's supposed to work. I wanted to try to treat the IO String coming from the file as just a list of characters, but even that is beyond me, apparently.

What would the proper way of splitting an IO String on a delimiter (specifically ',') be?
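One way to approach this without the split package is to keep the splitting pure and only touch IO at the edge: an IO String is not split directly; instead a String -> [[String]] function is mapped over the result of readFile. The sketch below is a naive illustration under that assumption. All three names are hypothetical, and it does not handle quoted fields or embedded commas, which real CSV data may contain.

```haskell
-- Split one line on commas, keeping empty fields.
splitOnComma :: String -> [String]
splitOnComma str = case break (== ',') str of
  (field, _ : rest) -> field : splitOnComma rest
  (field, [])       -> [field]

-- Pure part: one list of fields per input line.
parseCsv :: String -> [[String]]
parseCsv = map splitOnComma . lines

-- IO part: read the file, then apply the pure parser to its contents.
loadCsv :: FilePath -> IO [[String]]
loadCsv path = parseCsv <$> readFile path
```

Inside a do block, rows <- loadCsv "data.csv" would then bind rows to an ordinary [[String]] (the file name here is just a placeholder).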

Top answer
1 of 1
2

Another variant of runs

Let us start with the finished product first, and then I will show you how to get there:

runs :: String -> [String]
runs "" = []
runs xs = munched : runs (drop (length munched) xs)
  where munched = munch xs

Use pattern matching

In your extract function, you've checked str's length. However, that is not necessary and is very expensive. If you want to check whether a list is empty either use null or pattern matching. With pattern matching, we end up with

extract []  xs = xs
extract str xs = extract (drop (length (munch str)) str) (xs ++ [(munch str)])

Use bindings to keep repetition low

However, we repeat ourselves here and use munch str twice. Let's get rid of that:

extract []  xs = xs
extract str xs = extract (drop (length munched) str) (xs ++ [munched])
  where
    munched = munch str

Our line got a little bit shorter. Great. Now let us have a look at your accumulator.

Avoid repeated ++

Your result looks like this:

munch str ++ munch str' ++ munch str'' ++ munch str''' …

However, with parentheses, we actually have

(…(((munch str ++ munch str') ++ munch str'') ++ munch str''') …)

Since ++ is linear in its first argument, that is going to be slow. However, we don't need to use ++ at all, since we're constructing a list element-wise from the first to the last. So instead of an accumulator, let us have extract return the list as soon as possible:

extract [] = []
extract str = munched : extract (drop (length munched) str)
  where munched = munch str

At that point extract has exactly the same type as runs, so we can get rid of it. We end up with

runs :: String -> [String]
runs []  = []
runs str = munched : runs (drop (length munched) str)
  where munched = munch str

Replace str with xs and the first [] with "", and you have my first variant. We're done.

Other functions

You could rewrite chomp so that it already returns groups of characters, e.g.

chomp "aaaaabbbbbccccc" = ["aaaaa","bbbbb","ccccc"]

Then, you could rewrite munch so that it splits a string after every 9 characters:

munch "aaaaaaaaaa" = ["aaaaaaaaa","a"]

If you do that, runs gets a lot simpler:

runs = concatMap munch . chomp

However, the types of chomp and munch would differ in that case. By the way, this variant of chomp is in the standard library, and munch is also easy to implement.
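A minimal sketch of that last variant, under two assumptions that the answer only hints at: chomp is taken to be group from Data.List, and munch is implemented here with splitAt, cutting after every 9 characters.

```haskell
import Data.List (group)

-- Assumed to be the standard-library function the answer alludes to:
-- group runs of equal characters.
chomp :: String -> [String]
chomp = group

-- Assumed implementation: cut a string into pieces of at most 9 characters.
munch :: String -> [String]
munch "" = []
munch xs = let (piece, rest) = splitAt 9 xs
           in piece : munch rest

-- Group equal characters first, then cap each group at 9 characters.
runs :: String -> [String]
runs = concatMap munch . chomp
```

A run of ten a's followed by "bb" is split into a piece of nine a's, a single "a", and "bb".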

Programming Idioms
programming-idioms.org › idiom › 49 › split-a-space-separated-string › 953 › haskell
Split a space-separated string, in Haskell
February 18, 2016 - (define (tokenize l) (let loop ((t '()) (l l)) (if (pair? l) (let ((c (car l))) (if (char=? c #\space) (cons (reverse t) (loop '() (cdr l))) (loop (cons (car l) t) (cdr l)))) (if (null? t) '() (list (reverse t)))))) (define (string-split s) (map list->string (tokenize (string->list s))))
Hoogle
hoogle.haskell.org
String -> [String] - Hoogle
Split lines in a string using newline as separation.