You need to use re.split if you want to split a string according to a regex pattern.
import re
tokens = re.split(r'[.:]', ip)
Inside a character class, | matches a literal | symbol: [.:] matches a dot or a colon, and | does not act as alternation there.
So you need to remove | from the character class; otherwise the string would also be split on the pipe character.
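A minimal sketch contrasting the character class with and without the pipe, plus alternation written outside a class. The input strings are illustrative.

```python
import re

ip = '192.168.0.1:8080'  # sample input

# [.:] is a character class: it matches either a dot or a colon
tokens = re.split(r'[.:]', ip)
print(tokens)  # ['192', '168', '0', '1', '8080']

# Inside [...], | is just a literal pipe, so [.|:] also splits on '|'
print(re.split(r'[.|:]', 'a.b|c:d'))  # ['a', 'b', 'c', 'd']

# Alternation has to be written outside a class, e.g. \.|:
print(re.split(r'\.|:', ip))  # ['192', '168', '0', '1', '8080']
```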
Or use str.split along with a list comprehension:
>>> ip = '192.168.0.1:8080'
>>> [j for i in ip.split(':') for j in i.split('.')]
['192', '168', '0', '1', '8080']
Answer from Avinash Raj on Stack Overflow
re.split is expected to be slower, as the usage of regular expressions incurs some overhead.
Of course, if you are splitting on a constant string, there is no point in using re.split().
When in doubt, check the source code. You can see that Python's s.split() is optimized for whitespace and inlined. But s.split() only handles fixed delimiters.
In exchange for that speed cost, a regular-expression-based re.split is far more flexible:
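For reference, a short illustration of the two modes of str.split: with no argument it splits on runs of any whitespace, and with an argument it splits on that exact string every time it occurs. The sample string is made up.

```python
text = '  One  two\tthree\n'

# With no argument, str.split() splits on runs of any whitespace
# and drops leading/trailing whitespace, so no empty strings appear
print(text.split())     # ['One', 'two', 'three']

# With an explicit separator, every single occurrence is a split point
print(text.split(' '))  # ['', '', 'One', '', 'two\tthree\n']
```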
>>> re.split(':+',"One:two::t h r e e:::fourth field")
['One', 'two', 't h r e e', 'fourth field']
>>> "One:two::t h r e e:::fourth field".split(':')
['One', 'two', '', 't h r e e', '', '', 'fourth field']
# would require an additional step to find the empty fields...
>>> re.split(r'[:\d]+',"One:two:2:t h r e e:3::fourth field")
['One', 'two', 't h r e e', 'fourth field']
# try that without a regex split in an understandable way...
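As a sketch of the "additional step" mentioned in the comments, here is one way to get the same fields with plain str methods. The digit handling via str.translate is an assumed workaround, not something from the original answer.

```python
import re

line = "One:two::t h r e e:::fourth field"

# str.split(':') leaves an empty string between adjacent colons,
# so a second pass filters them out
fields = [f for f in line.split(':') if f]
print(fields)  # ['One', 'two', 't h r e e', 'fourth field']

line2 = "One:two:2:t h r e e:3::fourth field"

# To also treat digits as separators without a regex, map each ASCII digit
# to ':' first (unlike \d, this ignores non-ASCII digits)
to_colon = str.maketrans('0123456789', ':' * 10)
fields2 = [f for f in line2.translate(to_colon).split(':') if f]
print(fields2)  # ['One', 'two', 't h r e e', 'fourth field']
```

Note that the filter also drops leading and trailing empty strings, whereas re.split(':+', ':a:') returns ['', 'a', ''], so the two approaches only agree when the line neither starts nor ends with a separator.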
That re.split() is only 29% slower (or that s.split() is only 40% faster) is what should be amazing.
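The benchmark behind those percentages isn't shown. A minimal sketch of how one might measure the gap with timeit follows; the iteration count is arbitrary, and the resulting ratio depends on the machine, the Python version and the input, so it is not expected to reproduce the figures above.

```python
import re
import timeit

line = "One:two::t h r e e:::fourth field"
pattern = re.compile(':')  # compiled once so the loop times only the split

# Same separator, so both approaches produce the same list
assert line.split(':') == pattern.split(line)

n = 100_000
t_str = timeit.timeit(lambda: line.split(':'), number=n)
t_re = timeit.timeit(lambda: pattern.split(line), number=n)
print(f"str.split: {t_str:.3f}s  re.split: {t_re:.3f}s  ratio: {t_re / t_str:.2f}")
```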
I have a fairly big regex matching various types of common mail headers which are used between replies. I'm trying to use re.split to separate each reply as follows:
r = re.compile(r'((?:^ *Original Message processed by david.+?$\\n{,7})(?:.*\\n){,3}(?:(?:^|\\n)[* ]*(?:Von|An|Cc)(?:\\s{,2}).*){2,})|^(?!Am.*Am\\s.+?schrieb.*:)(Am\\s(?:.+?\\s?)schrieb\\s(?:.+?\\s?.+?):)$|((?:(?:^|\\n)[* ]*(?:From|Sent|To|Subject|Date|Cc):[ *]*(?:\\s{,2}).*){2,}(?:\\n.*){,1})|^(?!On[.\\s]*On\\s(.+?\\s?.+?)\\swrote:)(On\\s(?:.+?\\s?.+?)\\swrote:)$|(?:(?:^|\\n)[* ]*(Von|Gesendet|An|Betreff|Datum):[ *]*(?:\\s{,2}).*){2,}|(^(> *))', flags=re.MULTILINE)
r.split(text)
However, I'm getting back a lot of None values and a mix of matches and reply body content. I'm not sure why; any ideas? This is how I imagined re.split would work:
[
'Latest reply.',
'Am So., 1. Jan. 2023 um 17:22 Uhr schrieb John Doe <\nnoreply@github.com>:\n\n\nSecond reply\n',
...
]
Sample data and regex: https://regex101.com/r/cC1FUo/1
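The None values are consistent with how re.split treats capturing groups: when the pattern contains capturing parentheses, the text of every group is inserted into the result, and a group that did not take part in a given match contributes None. The header regex above has many capturing groups spread across its alternatives. A minimal sketch on a toy two-alternative pattern (not the real header regex; the sample text is invented):

```python
import re

# Toy text: reply bodies separated by two kinds of header lines
text = "first reply\nFrom: a\nsecond reply\nOn Monday bob wrote:\nthird reply"

# Each alternative sits in its own capturing group: every match adds BOTH
# groups to the result, and the group that did not participate adds None
capturing = re.compile(r'(From: .*)|(On .* wrote:)')
print(capturing.split(text))
# ['first reply\n', 'From: a', None, '\nsecond reply\n', None, 'On Monday bob wrote:', '\nthird reply']

# Non-capturing groups (?:...) return only the text between the separators
non_capturing = re.compile(r'(?:From: .*)|(?:On .* wrote:)')
print(non_capturing.split(text))
# ['first reply\n', '\nsecond reply\n', '\nthird reply']
```

Applying the same idea to the real pattern (turning its inner groups into (?:...), or wrapping the whole alternation in a single outer group if the headers should be kept) is untested against the regex101 sample.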
The two only look similar because of your example.
A split on ' ' (a single space) does exactly that - it splits on a single space. Consecutive spaces will lead to empty "matches" when you split.
A split on '\s+' treats a run of several whitespace characters as a single separator, and \s matches other whitespace besides plain spaces:
import re
a = re.split(" ", "Why    is this  \t \t  wrong")
b = re.split(r"\s+", "Why    is this  \t \t  wrong")
print(a)
print(b)
Output:
# re.split(" ",data)
['Why', '', '', '', 'is', 'this', '', '\t', '\t', '', 'wrong']
# re.split("\s+",data)
['Why', 'is', 'this', 'wrong']
Documentation:
\s
Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v]. (https://docs.python.org/3/howto/regex.html#matching-characters)
\s stands for whitespace characters: '\s' splits on any whitespace character (space, \t, \n, \r, \f, \v), and '+' makes a run of consecutive whitespace, such as " \n \r \t \v", count as a single separator. In my opinion, if plain string operations are enough for the split, use standard methods such as my_string.split(); otherwise, use a regex. The regex engine has a cost, and the developer should be able to anticipate it.
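To make the comparison concrete, a small sketch (the sample string with leading and trailing whitespace is made up): re.split(r'\s+') and the no-argument str.split() agree in the middle of the string but differ at the edges.

```python
import re

s = "  Why    is this  \t \t  wrong\n"

# re.split(r'\s+') collapses whitespace runs but still yields empty strings
# when the text starts or ends with whitespace
print(re.split(r'\s+', s))  # ['', 'Why', 'is', 'this', 'wrong', '']

# str.split() with no argument also collapses runs and drops the edges,
# without involving the regex engine
print(s.split())            # ['Why', 'is', 'this', 'wrong']
```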
I want to split a string p (a paragraph) into sentences. If I do p.split("."), I get the sentences of p without their final dot, but I want to keep the dot. Is there a solution other than:
using a regular expression instead, or
re-adding the dot to every single sentence?
Thanks
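For comparison, a sketch of the two options the question lists, assuming every sentence ends with a dot followed by whitespace (abbreviations such as "e.g." or decimal numbers would break both). The sample paragraph is invented.

```python
import re

p = "This is one. This is two. And three."

# Split on whitespace that directly follows a dot; the lookbehind is
# zero-width, so the dot itself stays attached to its sentence
sentences = re.split(r'(?<=\.)\s+', p)
print(sentences)  # ['This is one.', 'This is two.', 'And three.']

# Without a regex: split on '.' and re-add the dot to every non-empty piece
readded = [s.strip() + '.' for s in p.split('.') if s.strip()]
print(readded)    # ['This is one.', 'This is two.', 'And three.']
```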