Brave Search

stackoverflow.com › questions › 396567 › is-there-a-standard-way-to-do-an-fopen-with-a-unicode-string-file-path

c - Is there a standard way to do an fopen with a Unicode string file path? - Stack Overflow

1 of 4

No, there's no standard way. There are some differences between operating systems. Here's how different OSs handle non-ASCII filenames.

Linux

OS X

Windows

2 of 4

In *nix, you simply use the standard fopen (see more information in reply from TokeMacGuy, or in this forum)
In Windows, you can use _wfopen, and then pass a Unicode string (for more information, see MSDN).

As there is no real common way, I would wrap this call in a macro, together with all other system-dependent functions.
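The wrapping idea above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it uses a function rather than a macro, the names `portable_fopen` and `utf8_to_wide` are hypothetical, and it assumes the caller always passes UTF-8 paths. On Windows it converts to UTF-16 with `MultiByteToWideChar` and calls `_wfopen`; elsewhere it passes the bytes straight to `fopen`.

```c
#include <stdio.h>

#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>

/* Hypothetical helper: convert a UTF-8 string to a newly allocated
   UTF-16 string. Returns NULL on invalid input or allocation failure. */
static wchar_t *utf8_to_wide(const char *s)
{
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0)
        return NULL;
    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical wrapper: open a file whose path is given as UTF-8.
   The mode string is plain ASCII, as with fopen. */
FILE *portable_fopen(const char *utf8_path, const char *mode)
{
#ifdef _WIN32
    wchar_t *wpath = utf8_to_wide(utf8_path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *fp = (wpath != NULL && wmode != NULL) ? _wfopen(wpath, wmode) : NULL;
    free(wpath);
    free(wmode);
    return fp;
#else
    /* On *nix the path bytes are handed to the filesystem unchanged. */
    return fopen(utf8_path, mode);
#endif
}
```

Keeping the platform switch in one place means the rest of the program can treat paths as UTF-8 byte strings everywhere.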

stackoverflow.com › questions › 53240475 › non-ascii-filename-for-fopen

c - Non-ascii filename for fopen() - Stack Overflow

On essentially every platform except Windows, the expectation is that you pass filenames to the standard functions as normal char[] strings represented in the character encoding of the locale that's being used, and on all modern systems that will be UTF-8. You can either:

honor this by ensuring that you call setlocale(LC_ALL,"") (or setlocale(LC_CTYPE,"") if you don't want to use other locale features) and treating all local text input and output as being in whatever that encoding is (making users happy, but possibly causing trouble when some external input in UTF-8, e.g. from the network, is not representable), or
just always work in UTF-8, and hope passing UTF-8 strings through to filesystem access functions works by virtue of them being abstract byte arrays.

Unfortunately none of this works on Windows, but it will work in the near future. It also works if you build your application with Cygwin or midipix. Short of that, you need shims to make things work on Windows, and it's a huge pain.
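For the first option, a program may want to know whether the locale it just adopted actually uses UTF-8. The following is a sketch under assumptions: it relies on the POSIX functions `nl_langinfo` and `strcasecmp` (so it does not apply to native Windows builds), and the function names `is_utf8_codeset` and `locale_uses_utf8` are made up for illustration.

```c
#include <langinfo.h>   /* nl_langinfo, CODESET (POSIX) */
#include <locale.h>
#include <strings.h>    /* strcasecmp (POSIX) */

/* Hypothetical helper: return 1 if a codeset name such as "UTF-8" or
   "utf8" denotes UTF-8, else 0. */
int is_utf8_codeset(const char *codeset)
{
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "UTF8") == 0;
}

/* Hypothetical helper: adopt the user's locale for character handling
   and report whether its character encoding is UTF-8. */
int locale_uses_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;   /* the environment names a locale that is unavailable */
    return is_utf8_codeset(nl_langinfo(CODESET));
}
```

A program following the second option (always UTF-8) could use such a check only to decide whether filenames need transcoding before being shown to the user.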

It is operating system specific and file system specific.

You might not know what encoding is used for the file path. The user of your program should know that.

However, in 2018, UTF-8 tends to be used everywhere. In practice, that is not always the case today (especially on Windows).

BTW, different OSes have different restrictions on the file path. On Linux, in principle, you could have a file name containing only a tab and a return character (of course that is very poor taste, and nobody does that in practice; for details read path_resolution(7)). On Windows, that is not allowed.

Can fopen handle such paths?

Yes. The C11 standard (read n1570 for details) does not speak of character encoding.

A different question is what your particular implementation does with such paths. The devil is in the details, and they could be ugly.

stackoverflow.com › questions › 2005570 › what-encoding-used-when-invoke-fopen-or-open

c - What encoding used when invoke fopen or open? - Stack Overflow

1 of 6

It's a byte string, the interpretation is up to the particular filesystem.

2 of 6

Filesystem calls on Linux are encoding-agnostic, i.e. they do not (need to) know about the particular encoding. As far as they are concerned, the byte-string pointed to by the filename argument is passed down to the filesystem as-is. The filesystem expects that filenames are in the correct encoding (usually UTF-8, as mentioned by Matthew Talbert).

This means that you often don't need to do anything (filenames are treated as opaque byte-strings), but it really depends on where you receive the filename from, and whether you need to manipulate the filename in any way.

stackoverflow.com › questions › 50790217 › what-character-encoding-is-used-by-fopen-or-open

c - What character encoding is used by fopen() or open()? - Stack Overflow

1 of 1

They're both correct in some ways.

The strings passed to the file system calls are a string of bytes, with a null byte marking the end of the string and '/' used to separate path components. Within the file name segments, the meaning of the bytes is immaterial to the file system — they're just a sequence of bytes.
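To make the "just a sequence of bytes" point concrete, here is a small sketch that walks a path the way the paragraph describes: the null byte ends the string, '/' separates components, and every other byte is treated as opaque. The function name `count_path_components` is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper: count the components of a path, treating '/' as
   the only separator and every other byte (including bytes >= 0x80 and
   control characters) as opaque name data. */
size_t count_path_components(const char *path)
{
    size_t count = 0;
    int in_component = 0;

    for (const unsigned char *p = (const unsigned char *)path; *p != '\0'; ++p) {
        if (*p == '/') {
            in_component = 0;      /* a separator ends the current name */
        } else if (!in_component) {
            in_component = 1;      /* first byte of a new name */
            ++count;
        }
    }
    return count;
}
```

Note that the loop never needs to know which encoding the name bytes are in; a Latin-1 byte such as 0xE9 is handled exactly like an ASCII letter.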

How the bytes that form the file name are displayed depends on the equipment used to display them. If the names use UTF-8 with non-ASCII characters, printing that data using ISO 8859-15 (or 8859-1 for intransigent residents of the USA) yields gibberish, often including C1 control bytes from the byte range 0x80 .. 0x9F. If the names use 8859-15 with non-ASCII characters, there will be sequences that are not valid UTF-8 and you will get illegible or meaningless data displayed (question marks, or other indications of invalid UTF-8 sequences).
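The claim that ISO 8859-15 names contain sequences that are not valid UTF-8 can be checked mechanically. Below is a minimal sketch of a UTF-8 well-formedness test; the function name `is_valid_utf8` is an assumption made for this example, not something from the quoted answer.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if the len bytes at s are well-formed
   UTF-8, else 0. Rejects overlong forms, surrogates (U+D800..U+DFFF)
   and values above U+10FFFF. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t extra;            /* continuation bytes expected */
        unsigned long cp, min;   /* decoded value and smallest legal value */

        if (b < 0x80)                { i++; continue; }
        else if ((b & 0xE0) == 0xC0) { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return 0;               /* stray continuation or invalid lead byte */

        if (len - i - 1 < extra)
            return 0;                /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;            /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += extra + 1;
    }
    return 1;
}
```

For instance, "café" stored as the 8859-15 bytes 63 61 66 E9 fails the check, because E9 announces a three-byte sequence that never arrives, while the UTF-8 bytes 63 61 66 C3 A9 pass.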

stackoverflow.com › questions › 11141422 › why-use-fopen-mode-b-stdio-h-when-output-can-be-non-ascii-regardless

c - Why use fopen() mode 'b' (stdio.h) when output can be non-ASCII regardless? - Stack Overflow

Some operating systems (mostly the ones named "Windows") don't guarantee that text is read from and written to files exactly the way you pass it in. In text mode, Windows maps \n to \r\n on output and \r\n back to \n on input. This is fine and transparent when reading and writing ASCII text, but it would trash a stream of binary data. Always give Windows the 'b' flag if you want data read and written exactly as you passed it in.
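A quick way to see what the 'b' flag buys you is a round trip. The sketch below writes a buffer with "wb", reads it back with "rb", and compares the bytes; the function name `binary_round_trip` and the 256-byte limit are arbitrary choices for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write len bytes to path in binary mode, read
   them back, and return 1 if the round trip is byte-for-byte identical.
   Handles at most 256 bytes to keep the example short. */
int binary_round_trip(const char *path, const unsigned char *buf, size_t len)
{
    unsigned char back[256];
    if (len > sizeof back)
        return 0;

    FILE *fp = fopen(path, "wb");      /* 'b': no newline translation */
    if (fp == NULL)
        return 0;
    size_t written = fwrite(buf, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return 0;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    size_t got = fread(back, 1, sizeof back, fp);
    fclose(fp);

    return got == len && memcmp(buf, back, len) == 0;
}
```

On POSIX systems the 'b' flag has no effect, so the same code behaves identically there; the flag only matters on platforms that distinguish text and binary streams.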

cygwin.cygwin.narkive.com › REqvcnal › fopen-with-utf-8-chars-in-filenames

There are certain transformations that can take place when outputting in ASCII (e.g. outputting newline + carriage return when the outputted character is a newline), depending on your platform. Such transformations will not take place when using binary format.

Narkive

fopen with UTF-8 chars in filenames

I'm not positive about this, but you may have to convert the UTF-8 to UTF-16 (Windows unicode) and call wfopen() instead of fopen(). But wfopen() is a Windows call, not a cygwin call :-( Which would strongly imply that calling wfopen was not the right solution for Cygwin. This is sort of like asking for an expert legal opinion on US law and quoting Canadian law... cgf ... Post by Paul J. Lucas Is this known to work (or not work)? Apparently, it doesn't. FYI: I'm writing JNI code. The strings passed from Java to C are UTF-8. A string containing a non-ASCII character, e.g., an 'e' with an accent, works fine with fopen() under Mac OS X.

stackoverflow.com › questions › 4676327 › how-to-open-a-file-with-wchar-t-containing-non-ascii-string-in-linux

c++ - How to open a file with wchar_t* containing non-Ascii string in Linux? - Stack Overflow

1 of 5

There are two possible answers:

If you want to make sure all Unicode filenames are representable, you can hard-code the assumption that the filesystem uses UTF-8 filenames. This is the "modern" Linux desktop-app approach. Just convert your strings from wchar_t (UTF-32) to UTF-8 with library functions (iconv would work well) or your own implementation (but look up the specs so you don't get it horribly wrong like Shelwien did), then use fopen.

If you want to do things the more standards-oriented way, you should use wcsrtombs to convert the wchar_t string to a multibyte char string in the locale's encoding (which hopefully is UTF-8 anyway on any modern system) and use fopen. Note that this requires that you previously set the locale with setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "").
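The standards-oriented route could look roughly like this. It is a sketch, assuming the caller has already called setlocale(LC_CTYPE, "") and will free the result; the function name `wide_to_multibyte` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a wide string to a newly allocated
   multibyte string in the current locale's encoding. Returns NULL if a
   character is not representable in that encoding or memory runs out.
   The caller frees the result. */
char *wide_to_multibyte(const wchar_t *wide)
{
    mbstate_t state;
    const wchar_t *src = wide;

    /* First pass with a NULL destination only measures the output. */
    memset(&state, 0, sizeof state);
    size_t needed = wcsrtombs(NULL, &src, 0, &state);
    if (needed == (size_t)-1)
        return NULL;

    char *out = malloc(needed + 1);
    if (out == NULL)
        return NULL;

    /* Second pass converts, including the terminating null byte. */
    src = wide;
    memset(&state, 0, sizeof state);
    if (wcsrtombs(out, &src, needed + 1, &state) == (size_t)-1) {
        free(out);
        return NULL;
    }
    return out;
}
```

The returned string can then be passed to fopen. In the default "C" locale only characters from the basic character set are guaranteed to convert, which is why the earlier setlocale call matters.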

And finally, not exactly an answer but a recommendation:

Storing filenames as wchar_t strings is probably a horrible mistake. You should instead store filenames as abstract byte strings, and only convert those to wchar_t just-in-time for displaying them in the user interface (if it's even necessary for that; many UI toolkits use plain byte strings themselves and do the interpretation as characters for you). This way you eliminate a lot of possible nasty corner cases, and you never encounter a situation where some files are inaccessible due to their names.

2 of 5

Linux is not UTF-8, but it's your only choice for filenames anyway

(Files can have anything you want inside them.)

With respect to filenames, Linux does not really have a string encoding to worry about. Filenames are byte strings that need to be null-terminated.

This doesn't precisely mean that Linux is UTF-8, but it does mean that it's not compatible with wide characters as they could have a zero in a byte that's not the end byte.

But UTF-8 preserves the no-nulls-except-at-the-end model, so I have to believe that the practical approach is "convert to UTF-8" for filenames.
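The "convert to UTF-8" step is small enough to sketch by hand. The encoder below turns one Unicode code point (such as a wchar_t value on Linux, where wchar_t holds UTF-32) into UTF-8 bytes; the function name `utf8_encode` is an assumption for this example.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 into out,
   which must have room for 4 bytes. Returns the number of bytes written,
   or 0 for surrogates and values above U+10FFFF, which have no UTF-8
   encoding. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogate halves are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Every byte of a multi-byte sequence has its high bit set, so the only code point that produces a zero byte is U+0000 itself. That is the property that keeps UTF-8 compatible with null-terminated filenames.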

The content of files is a matter for standards above the Linux kernel level, so here there isn't anything Linux-y that you can or want to do. The content of files will be solely the concern of the programs that read and write them. Linux just stores and returns the byte stream, and it can have all the embedded nuls you want.

stackoverflow.com › questions › 6821102 › fopen-for-everything-is-this-possible

c - fopen for everything - is this possible? - Stack Overflow

Question 1:

Yes, you can detect the byte order mark, which is the byte sequence you discovered - IF YOUR FILE HAS ONE.
A search on Google and stackoverflow will do the rest. As for the 'not so ugly': you can refactor/beautify your code, e.g. write a function for determining the BOM, and do it in the beginning, then call fopen or _tfopen as required. Then you can refactor that again, and write your own fopen function. But it will still be ugly.
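A function for determining the BOM, as suggested above, might look like the following sketch. The enum and function names are hypothetical, and it only distinguishes the UTF-8 and UTF-16 marks; a UTF-32 little-endian file also begins with FF FE and would be reported as UTF-16 LE here.

```c
#include <stddef.h>

/* Hypothetical result type for the sketch below. */
typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } bom_kind;

/* Hypothetical helper: inspect the first bytes of a buffer for a byte
   order mark. The caller reads the start of the file (in binary mode)
   and passes the bytes here. */
bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;   /* UTF-32 LE is not distinguished here */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;           /* no mark: the encoding must be guessed */
}
```

As the answer stresses, this only helps if the file has a BOM at all; plain ASCII and most UTF-8 files on Linux have none.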

Question 2:

Yes, but the unicode functions are not always called the same on Linux as they are on Windows.
Use defines. Maybe write your own TCHAR.H

Question 3:

#include <locale.h>
setlocale(LC_ALL, "en_US.UTF-8");

man 3 setlocale

Question 4:
Just use fwprintf.
The other is not a standard.

You can use the wxWidgets toolkit.
It uses unicode, and it uses classes that have implementations for the same thing on Windows and on Linux and Unix and Mac.

The better question for you is how do you convert ASCII to Unicode and vice-versa. That goes like this:

#include <cstdlib>   // wcstombs, mbstowcs, MB_CUR_MAX
#include <string>
#include <vector>

// Converts a wide string to a multibyte string in the current locale's
// encoding (call setlocale first). Returns "" if a character cannot be converted.
std::string Unicode2ASCII( const std::wstring& wstrStringToConvert )
{
    if(wstrStringToConvert.empty())
        return "" ;

    // One wide character may need up to MB_CUR_MAX bytes.
    std::vector<char> chrarry_Buffer( wstrStringToConvert.length() * MB_CUR_MAX + 1 ) ;
    size_t sze_Written = wcstombs( chrarry_Buffer.data(), wstrStringToConvert.c_str(), chrarry_Buffer.size() - 1 ) ;
    if(sze_Written == static_cast<size_t>(-1))
        return "" ; // unconvertible character

    return std::string( chrarry_Buffer.data(), sze_Written ) ;
}


// Converts a multibyte string in the current locale's encoding to a wide string.
// Returns L"" if the input contains an invalid multibyte sequence.
std::wstring ASCII2Unicode( const std::string& strStringToConvert )
{
    if(strStringToConvert.empty())
        return L"" ;

    // A multibyte string never decodes to more wide characters than it has bytes.
    std::vector<wchar_t> wchrarry_Buffer( strStringToConvert.length() + 1 ) ;
    size_t sze_Written = mbstowcs( wchrarry_Buffer.data(), strStringToConvert.c_str(), wchrarry_Buffer.size() - 1 ) ;
    if(sze_Written == static_cast<size_t>(-1))
        return L"" ; // invalid multibyte sequence

    return std::wstring( wchrarry_Buffer.data(), sze_Written ) ;
}

Edit: Here some insight into the available Unicode functions on Linux (wchar.h):

__BEGIN_NAMESPACE_STD
/* Copy SRC to DEST.  */
extern wchar_t *wcscpy (wchar_t *__restrict __dest,
            __const wchar_t *__restrict __src) __THROW;
/* Copy no more than N wide-characters of SRC to DEST.  */
extern wchar_t *wcsncpy (wchar_t *__restrict __dest,
             __const wchar_t *__restrict __src, size_t __n)
     __THROW;

/* Append SRC onto DEST.  */
extern wchar_t *wcscat (wchar_t *__restrict __dest,
            __const wchar_t *__restrict __src) __THROW;
/* Append no more than N wide-characters of SRC onto DEST.  */
extern wchar_t *wcsncat (wchar_t *__restrict __dest,
             __const wchar_t *__restrict __src, size_t __n)
     __THROW;

/* Compare S1 and S2.  */
extern int wcscmp (__const wchar_t *__s1, __const wchar_t *__s2)
     __THROW __attribute_pure__;
/* Compare N wide-characters of S1 and S2.  */
extern int wcsncmp (__const wchar_t *__s1, __const wchar_t *__s2, size_t __n)
     __THROW __attribute_pure__;
__END_NAMESPACE_STD

#ifdef __USE_XOPEN2K8
/* Compare S1 and S2, ignoring case.  */
extern int wcscasecmp (__const wchar_t *__s1, __const wchar_t *__s2) __THROW;

/* Compare no more than N chars of S1 and S2, ignoring case.  */
extern int wcsncasecmp (__const wchar_t *__s1, __const wchar_t *__s2,
            size_t __n) __THROW;

/* Similar to the two functions above but take the information from
   the provided locale and not the global locale.  */
# include <xlocale.h>

extern int wcscasecmp_l (__const wchar_t *__s1, __const wchar_t *__s2,
             __locale_t __loc) __THROW;

extern int wcsncasecmp_l (__const wchar_t *__s1, __const wchar_t *__s2,
              size_t __n, __locale_t __loc) __THROW;
#endif


/* Special versions of the functions above which take the locale to
   use as an additional parameter.  */
extern long int wcstol_l (__const wchar_t *__restrict __nptr,
              wchar_t **__restrict __endptr, int __base,
              __locale_t __loc) __THROW;

extern unsigned long int wcstoul_l (__const wchar_t *__restrict __nptr,
                    wchar_t **__restrict __endptr,
                    int __base, __locale_t __loc) __THROW;

__extension__
extern long long int wcstoll_l (__const wchar_t *__restrict __nptr,
                wchar_t **__restrict __endptr,
                int __base, __locale_t __loc) __THROW;

__extension__
extern unsigned long long int wcstoull_l (__const wchar_t *__restrict __nptr,
                      wchar_t **__restrict __endptr,
                      int __base, __locale_t __loc)
     __THROW;

extern double wcstod_l (__const wchar_t *__restrict __nptr,
            wchar_t **__restrict __endptr, __locale_t __loc)
     __THROW;

extern float wcstof_l (__const wchar_t *__restrict __nptr,
               wchar_t **__restrict __endptr, __locale_t __loc)
     __THROW;

extern long double wcstold_l (__const wchar_t *__restrict __nptr,
                  wchar_t **__restrict __endptr,
                  __locale_t __loc) __THROW;


/* Copy SRC to DEST, returning the address of the terminating L'\0' in
   DEST.  */
extern wchar_t *wcpcpy (wchar_t *__restrict __dest,
            __const wchar_t *__restrict __src) __THROW;

/* Copy no more than N characters of SRC to DEST, returning the address of
   the last character written into DEST.  */
extern wchar_t *wcpncpy (wchar_t *__restrict __dest,
             __const wchar_t *__restrict __src, size_t __n)
     __THROW;
#endif  /* use GNU */


/* Wide character I/O functions.  */

#ifdef  __USE_XOPEN2K8
/* Like OPEN_MEMSTREAM, but the stream is wide oriented and produces
   a wide character string.  */
extern __FILE *open_wmemstream (wchar_t **__bufloc, size_t *__sizeloc) __THROW;
#endif

#if defined __USE_ISOC95 || defined __USE_UNIX98
__BEGIN_NAMESPACE_STD

/* Select orientation for stream.  */
extern int fwide (__FILE *__fp, int __mode) __THROW;


/* Write formatted output to STREAM.

   This function is a possible cancellation point and therefore not
   marked with __THROW.  */
extern int fwprintf (__FILE *__restrict __stream,
             __const wchar_t *__restrict __format, ...)
     /* __attribute__ ((__format__ (__wprintf__, 2, 3))) */;
/* Write formatted output to stdout.

   This function is a possible cancellation point and therefore not
   marked with __THROW.  */
extern int wprintf (__const wchar_t *__restrict __format, ...)
     /* __attribute__ ((__format__ (__wprintf__, 1, 2))) */;
/* Write formatted output of at most N characters to S.  */
extern int swprintf (wchar_t *__restrict __s, size_t __n,
             __const wchar_t *__restrict __format, ...)
     __THROW /* __attribute__ ((__format__ (__wprintf__, 3, 4))) */;

/* Write formatted output to S from argument list ARG.

   This function is a possible cancellation point and therefore not
   marked with __THROW.  */
extern int vfwprintf (__FILE *__restrict __s,
              __const wchar_t *__restrict __format,
              __gnuc_va_list __arg)
     /* __attribute__ ((__format__ (__wprintf__, 2, 0))) */;
/* Write formatted output to stdout from argument list ARG.

   This function is a possible cancellation point and therefore not
   marked with __THROW.  */
extern int vwprintf (__const wchar_t *__restrict __format,
             __gnuc_va_list __arg)
     /* __attribute__ ((__format__ (__wprintf__, 1, 0))) */;
/* Write formatted output of at most N character to S from argument
   list ARG.  */
extern int vswprintf (wchar_t *__restrict __s, size_t __n,
              __const wchar_t *__restrict __format,
              __gnuc_va_list __arg)
     __THROW /* __attribute__ ((__format__ (__wprintf__, 3, 0))) */;


/* Read formatted input from STREAM.

   This function is a possible cancellation point and therefore not
   marked with __THROW.  */
extern int fwscanf (__FILE *__restrict __stream,
            __const wchar_t *__restrict __format, ...)
     /* __attribute__ ((__format__ (__wscanf__, 2, 3))) */;
/* Read formatted input from stdin.

   This function is a possible cancellation point and therefore not
   marked with __THROW.  */
extern int wscanf (__const wchar_t *__restrict __format, ...)
     /* __attribute__ ((__format__ (__wscanf__, 1, 2))) */;
/* Read formatted input from S.  */
extern int swscanf (__const wchar_t *__restrict __s,
            __const wchar_t *__restrict __format, ...)
     __THROW /* __attribute__ ((__format__ (__wscanf__, 2, 3))) */;

# if defined __USE_ISOC99 && !defined __USE_GNU \
     && (!defined __LDBL_COMPAT || !defined __REDIRECT) \
     && (defined __STRICT_ANSI__ || defined __USE_XOPEN2K)
#  ifdef __REDIRECT
/* For strict ISO C99 or POSIX compliance disallow %as, %aS and %a[
   GNU extension which conflicts with valid %a followed by letter
   s, S or [.  */
extern int __REDIRECT (fwscanf, (__FILE *__restrict __stream,
                 __const wchar_t *__restrict __format, ...),
               __isoc99_fwscanf)
     /* __attribute__ ((__format__ (__wscanf__, 2, 3))) */;
extern int __REDIRECT (wscanf, (__const wchar_t *__restrict __format, ...),
               __isoc99_wscanf)
     /* __attribute__ ((__format__ (__wscanf__, 1, 2))) */;
extern int __REDIRECT_NTH (swscanf, (__const wchar_t *__restrict __s,
                     __const wchar_t *__restrict __format,
                     ...), __isoc99_swscanf)
     /* __attribute__ ((__format__ (__wscanf__, 2, 3))) */;
#  else
extern int __isoc99_fwscanf (__FILE *__restrict __stream,
                 __const wchar_t *__restrict __format, ...);
extern int __isoc99_wscanf (__const wchar_t *__restrict __format, ...);
extern int __isoc99_swscanf (__const wchar_t *__restrict __s,
                 __const wchar_t *__restrict __format, ...)

linuxquestions.org › questions › programming-9 › how-to-use-fopen-or-something-to-open-unicode-filename-220111

As I suggested in a comment, you should take a look at ICU which is a cross platform C library for Unicode handling, created by IBM. It provides additional support for C++ and Java with a very powerful String class. It's used in a lot of places like Android and iOS so it's very stable and mature.

LinuxQuestions.org

How to use fopen or something to open unicode filename

I am new to Linux. Windows (NT, 2000, XP) will support two byte unicode strings and there are functions like _wfopen() to open a file having some

Experts Exchange

experts-exchange.com › questions › 27455203 › fopen-encoding.html

Solved: fopen encoding | Experts Exchange

November 18, 2011

<?php
$text = "\t\tThese are a few words :) ... ";
$binary = "\x09Example string\x0A";
$hello = "Hello World";
var_dump($text, $binary, $hello);

print "\n";

$trimmed = trim($text);
var_dump($trimmed);

$trimmed = trim($text, " \t.");
var_dump($trimmed);

$trimmed = trim($hello, "Hdle");
var_dump($trimmed);

// trim the ASCII control characters at the beginning and end of $binary
// (from 0 to 31 inclusive)
$clean = trim($binary, "\x00..\x1F");
var_dump($clean);
?>

Ray Paseur: I do not think this is an fopen() problem.

Microsoft Learn

learn.microsoft.com › en-us › cpp › c-runtime-library › reference › fopen-wfopen

fopen, _wfopen | Microsoft Learn

You can use the AreFileApisANSI function to determine whether filename is interpreted using the ANSI or the system default OEM codepage. _wfopen is a wide-character version of fopen; the _wfopen arguments are wide-character strings. Otherwise, _wfopen and fopen behave identically.

Cplusplus

cplusplus.com › reference › cstdio › fopen

fopen

The returned stream is fully buffered ... All opened files are automatically closed on normal program termination. The running environment supports at least FOPEN_MAX files open simultaneously....

LinuxVox

linuxvox.com › blog › what-encoding-used-when-invoke-fopen-or-open

What Encoding Is Used for Filenames in Linux's `fopen()` and `open()` Functions? — linuxvox.com

But have you ever wondered how Linux systems interpret the characters in filenames—especially when dealing with non-ASCII characters like accents, emojis, or non-Latin scripts? If you’ve encountered garbled filenames or struggled to open files with special characters, the root cause likely lies in how filenames are encoded when passed to system functions like open() and fopen().

GitHub

github.com › illsk1lls › ZipRipper › issues › 59

Brute-forcing fails on Windows 10 with `fopen: ASCII.chr: No such file or directory` · Issue #59 · illsk1lls/ZipRipper

January 18, 2025 - Firstly, thanks for your work on this amazing tool, I used it successfully with the default wordlist just a few months ago.

Author hashimaziz1

stackoverflow.com › questions › 936428 › fopen-non-ascii-character-error

c++ - fopen non-ascii character error - Stack Overflow

Convert your pathname to UTF-16 (probably using MultiByteToWideChar) and use GetShortPathNameW to get a path you can pass to fopen.

rakanalysis.wordpress.com › 2012 › 04 › 13 › fundamentals-of-ascii-file-inputoutput-in-c

Make sure that you have wchar_t (it's a compiler setting) and make sure you're passing in a wchar_t * and not a char * to _wfopen.

Wordpress

Fundamentals of ASCII File Input/Output in C | A Technophile's Indulgence

October 26, 2014 - If the file can be opened, the file pointer is updated and the program may continue. However, if a file cannot be opened, e.g. because the user lacks permissions for the file, the fopen() function returns the value NULL, which can be used to set up error reporting and functions.
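The NULL check described above is commonly paired with errno to report why the open failed. Here is a minimal sketch; the function name `open_or_report` is hypothetical, and it assumes a POSIX-like system where fopen sets errno on failure (ISO C itself does not require that).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path with mode; on failure, print the
   reason to stderr and return NULL so the caller can recover. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        int saved = errno;   /* fprintf may change errno */
        fprintf(stderr, "cannot open '%s': %s\n", path, strerror(saved));
        errno = saved;       /* let the caller inspect it too */
    }
    return fp;
}
```

When a non-ASCII filename fails to open, the reported reason is typically "No such file or directory", which usually means the bytes passed in do not match the bytes stored on disk.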

GitHub

github.com › ocornut › imgui › issues › 917

Windows's fopen() Assertion if path contains utf-8 · Issue #917 · ocornut/imgui

November 22, 2016 - Using imgui in a game, we pass the pathnames as utf-8 to it. Under windows, it seems ImLoadFileToMemory uses fopen, which does not deal with utf-8 correctly, but just uses the "default locale". Wou...

Author ecraven

Linux Man Pages

man7.org › linux › man-pages › man3 › fopen.3.html

fopen(3) - Linux manual page

The fopen() function opens the file whose name is the string pointed to by path and associates a stream with it. The argument mode points to a string beginning with one of the following sequences (possibly followed by additional characters, as described below): r Open text file for reading.

stackoverflow.com › questions › 66499608 › ascii-file-processing-in-c

ascii file processing in C - Stack Overflow