The function has several drawbacks.
For starters it should be declared like
std::string::size_type strStr( const std::string &haystack, const std::string &needle );
and if the second string is not found in the first string, the function should return std::string::npos, as all similar member functions of the class std::string do.
The function parameters shall be of constant reference types.
The condition in this if-statement
if (haystack.empty() && !needle.empty())
has a redundant operand. It could be rewritten like
if (haystack.empty())
This loop
for (i; i < haystack.length(); i++)
should stop its iterations when the size of the tail of the first string is less than the size of the second string.
In this if-statement
if (haystack[i++] != needle[j]) {
the variable i is incremented, so it ends up being incremented twice per iteration: once in this statement and a second time in the loop.
The second pair of these statements
if (j == needle.length()) {
return ans;
is redundant.
The function can be written as shown in the following demonstration program.
#include <iostream>
#include <string>
std::string::size_type strStr( const std::string &haystack, const std::string &needle )
{
if ( needle.empty() )
{
return 0;
}
else if ( haystack.empty() )
{
return std::string::npos;
}
else
{
std::string::size_type ans = std::string::npos;
auto n1 = haystack.length();
auto n2 = needle.length();
for ( std::string::size_type i = 0; ans == std::string::npos && i + n2 <= n1; i++ )
{
std::string::size_type j = 0;
while ( j < n2 && haystack[i+j] == needle[j] ) j++;
if ( j == n2 ) ans = i;
}
return ans;
}
}
int main()
{
std::string haystack( "mississippi" );
std::string needle( "issip" );
std::cout << strStr( haystack, needle ) << '\n';
return 0;
}
Its output is
4
Answer from Vlad from Moscow on Stack Overflow
The problem is that you modify i in
if (haystack[i++] != needle[j]) {
thus preventing a second potential match from being explored. Try
if (haystack[i + j] != needle[j]) {
and fix any knock-on issues. I expect it to work as-is, though.
The problem is you increment i inside the inner loop and again in the outer loop, potentially skipping the null terminator, hence accessing bytes in haystack beyond the end, which has undefined behavior.
You should only increment j in the inner loop and compare haystack[i + j] == needle[j].
Here is a modified version:
#include <stdio.h>
int strStr(const char *haystack, const char *needle) {
if (needle[0] == '\0')
return 0;
int i = 0;
while (haystack[i] != '\0') {
int j = 0;
while (needle[j] != '\0' && haystack[i + j] == needle[j]) {
j++;
}
if (needle[j] == '\0') {
return i;
}
i++;
}
return -1;
}
int main() {
printf("%d\n", strStr("aaaaaaaaab", "aaaab"));
printf("%d\n", strStr("aaaaaaaa", "aaaaaaaaa"));
printf("%d\n", strStr("Hello world\n", "or"));
return 0;
}
Note that you can remove some redundant comparisons by reorganising the code:
int strStr(const char *haystack, const char *needle) {
for (int i = 0;; i++) {
for (int j = 0;; j++) {
if (needle[j] == '\0')
return i;
if (haystack[i + j] != needle[j])
break;
}
if (haystack[i] == '\0')
return -1;
}
}
Note however that this method has a worst case time complexity of O(len(haystack)*len(needle)), with a pathological example of:
strStr("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaab")
In the function strStr(), when i and j are both equal to 3, the nested while loop exits because the condition haystack[i] != '\0' becomes false. The code below it then checks needle[j] == '\0', which is false because needle[j] is not equal to \0 for j = 3. Then i is incremented to 4, and the outer while loop iterates again and checks the condition haystack[i] != '\0'. That accesses haystack beyond the buffer it points to, because the valid indices for that buffer range from 0 to 3 (the string "aaa").
Since you have posted the AddressSanitizer output, here is how to interpret it and identify the problem:
The error reported by ASan is:
ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014
buffer-overflow means your program is trying to access memory beyond the size of an array, in this case the string literal beyond its size 1).
Check this statement in the ASan output:
0x602000000014 is located 0 bytes to the right of 4-byte region [0x602000000010,0x602000000014)
This means the program is attempting to access address 0x602000000014, which lies 0 bytes to the right of the 4-byte region [0x602000000010,0x602000000014).
A point to note about the region [0x602000000010,0x602000000014): the square bracket [ means the endpoint is included, and the round parenthesis ) means it is excluded.
The 4-byte region 0x602000000010 - 0x602000000013 contains the string literal "aaa", which is pointed to by the haystack pointer:
0x602000000014
0x602000000013 |
0x602000000012 | |
0x602000000011 | | |
0x602000000010 | | | |
| | | | |
+-----+-----+-----+------+-----+
| a | a | a | \0 | |
+-----+-----+-----+------+-----+
\ /
+--------------------+
|
4-byte region pointed
by haystack pointer
Note that ASan creates poisoned redzones at the edges of objects to detect overflows or underflows, and during compilation it instruments the code to verify the shadow memory state at each memory access 2). One byte of shadow memory keeps track of 8 bytes of memory used by the ASan-instrumented program.
Now, check this section of ASan output:
Shadow bytes around the buggy address:
in the output, a memory region is highlighted with => :
=>0x0c047fff8000: fa fa[04]fa fa fa 05 fa fa fa fa fa fa fa fa fa
^^^^
In [04]:
- 04 indicates that the first four bytes of the 8-byte region (which is mapped to this byte in shadow memory) are addressable. That means the memory from 0x602000000010 to 0x602000000013, which contains "aaa" (including the terminating null character), is addressable.
- The square brackets [] indicate that your program is trying to access the redzone (basically, memory it is not allowed to access) that is mapped to this byte in shadow memory.
Your program is trying to access the byte just after the terminating null character \0 of the string "aaa", which lands in the redzone, and hence ASan reports it.
The other post has already shown a better implementation of strStr(). I am leaving it up to you to fix the problem in your code and optimise the implementation of the strStr() function.
A suggestion:
When compiling a program with -fsanitize=address, enable debugging information as well (e.g. the -g option of the gcc compiler) and you will get line numbers and a proper stack trace in the ASan output.
1). Not sure why it is reporting heap-buffer-overflow for accessing a string literal beyond its size. On my system the ASan output for the same program gives the error global-buffer-overflow, which seems more appropriate, as string literals are usually allocated in the data segment, but where they are placed can vary with the underlying platform/architecture.
2). AddressSanitizer - How it works?