An ideal way to decode JSON documents in C?

softwareengineering.stackexchange.com › questions › 212110 › an-ideal-way-to-decode-json-documents-in-c

Since C is statically typed and JSON is not, and any JSON element can be a null, a number, a string, a boolean, an object, or an array, you basically have to do it as "a rip off the OOP way". Create a record type that represents a JSON value, and has a member that's a tag indicating which type of JSON value it is, and then create "subclasses" that build on this record type. To represent JSON well in C, you basically have to recreate OOP and polymorphism.

Anything that uses a JSON value will have to take a pointer to the base record type. Remember that objects are always reference types, poor C++ language design choices notwithstanding, because otherwise it screws up polymorphism, and you require polymorphism to do this right. When you find out what kind of "subclass" you're actually working with, (by checking the tag member,) you can cast your JSON Value pointer to the appropriate subclass type pointer to access the rest of the record.
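The tagged base record and "subclass" idea described above can be sketched roughly as follows. This is a minimal illustration covering only numbers and strings; every name in it (jv_base, jv_number, jv_get_number and so on) is made up for the example and does not come from any particular library.

```c
#include <stdlib.h>
#include <string.h>

/* Tag stored in every JSON value (illustrative names, not a real library). */
typedef enum { JV_NULL, JV_BOOL, JV_NUMBER, JV_STRING } jv_type;

/* "Base class": every concrete record starts with this member. */
typedef struct { jv_type type; } jv_base;

/* "Subclasses": the base record is the first member, so a pointer to the
   subclass can be converted to a pointer to the base and back again. */
typedef struct { jv_base base; double number; } jv_number;
typedef struct { jv_base base; char *text; } jv_string;

static jv_base *jv_new_number(double n) {
    jv_number *v = malloc(sizeof *v);
    if (!v) return NULL;
    v->base.type = JV_NUMBER;
    v->number = n;
    return &v->base;
}

static jv_base *jv_new_string(const char *s) {
    size_t size = strlen(s) + 1;
    jv_string *v = malloc(sizeof *v);
    if (!v) return NULL;
    v->text = malloc(size);
    if (!v->text) { free(v); return NULL; }
    memcpy(v->text, s, size);
    v->base.type = JV_STRING;
    return &v->base;
}

/* Check the tag first, then downcast. Returns 0 if v is not a number. */
static int jv_get_number(const jv_base *v, double *out) {
    if (!v || v->type != JV_NUMBER) return 0;
    *out = ((const jv_number *)v)->number;
    return 1;
}

/* Free a value, including the payload owned by string values. */
static void jv_free(jv_base *v) {
    if (v && v->type == JV_STRING) free(((jv_string *)v)->text);
    free(v);
}
```

Callers only ever hold a jv_base pointer and must check the tag (here via jv_get_number) before touching the subclass fields.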

Answer from Mason Wheeler on Stack Exchange

Stack Exchange

softwareengineering.stackexchange.com › questions › 212110 › an-ideal-way-to-decode-json-documents-in-c

programming languages - An ideal way to decode JSON documents in C? - Software Engineering Stack Exchange

Top answer

1 of 3

2 of 3

You're doing it right. This is an old problem - JSON's varying types and unordered presentation aspects are essentially the same as those presented by every data-markup language going back to at least SGML.

For C in particular, there are lots of options already available. Googling "C JSON Parser" turns up many, including jsmn, which looks like it learned many of the lessons that Java processing of XML had to teach. More directly to the point, this has been addressed on StackOverflow again and again. And, of course Crockford's JSON.org lists 16 different C implementations of JSON.

Stack Overflow

stackoverflow.com › questions › 6673936 › parsing-json-using-c

Parsing JSON using C - Stack Overflow

Top answer

1 of 7

Json isn't a huge language to start with, so libraries for it are likely to be small(er than Xml libraries, at least).

There are a whole ton of C libraries linked at Json.org. Maybe one of them will work well for you.

2 of 7

cJSON has a decent API and is small (2 files, ~700 lines). Many of the other JSON parsers I looked at first were huge... I just want to parse some JSON.

Edit: We've made some improvements to cJSON over the years.


GitHub

github.com › DaveGamble › cJSON

GitHub - DaveGamble/cJSON: Ultralightweight JSON parser in ANSI C · GitHub

Ultralightweight JSON parser in ANSI C. Contribute to DaveGamble/cJSON development by creating an account on GitHub.

Starred by 12.5K users

Forked by 3.5K users

Zserge

zserge.com › jsmn

The most simple JSON parser in C for small systems

Library sources are available at https://github.com/zserge/jsmn. Usually JSON parsers convert a JSON string to an internal object representation. But if you are using C it becomes tricky, as there are no hash tables, no reflection, etc.

RealTimeLogic

realtimelogic.com › products › json

JSON C Source Code Library for IoT Communication

Encode/decode messages sent on a WebSocket connection. Encode/decode messages sent via an MQTT topic. Encode/decode messages sent/received via RPC protocols such as HTTP. ... When the m2m example runs, navigate to: realtimelogic.info/IoT/led/json/ and click the link to your connected device.

JSON Formatter

jsonformatter.org › json-parser

JSON Parser Online to parse JSON

It's a wonderful tool crafted for JSON lovers who are looking to deserialize JSON online. This JSON decode online helps to decode unreadable JSON.

University of Alberta

sites.ualberta.ca › ~delliott › ece492 › appnotes › 2015w › G6_Parsing_JSON_in_C › microjson_tutorial.html

App Notes: Parsing JSON using C

This tutorial will provide a simple introduction to parsing JSON strings in the C programming language using the microjson library. More sophisticated examples can be found in the official documentation.

Stack Exchange

codereview.stackexchange.com › questions › 180266 › simple-json-parser-in-c

parsing - Simple JSON parser in C - Code Review Stack Exchange

Top answer

1 of 3

Header

In C, all enum names share the same namespace with each other (and with things like variable names). It's therefore a good idea to try to reduce the risk that they'll collide.

Your enum json_value_type names have the prefix TYPE_, which is pretty generic. Some other library might try to use the same name. I'd suggest changing that prefix to, say, JSON_.

Also, you don't seem to be using TYPE_KEY for anything. Just remove it.

Implementation

As Roland Illig notes, the arguments to iscntrl() and isspace() in your skip_whitespace() function should be cast to unsigned char to avoid sign extension.

Alternatively, and more closely following the JSON spec, you could rewrite this function simply as:

static void skip_whitespace(const char** cursor)
{
    while (**cursor == '\t' || **cursor == '\r' ||
           **cursor == '\n' || **cursor == ' ') ++(*cursor);
}
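If you would rather keep the ctype-based version, the unsigned char cast mentioned above might look like the sketch below. The function name skip_whitespace_ctype is made up for illustration. The explicit end-of-string test is my own addition: iscntrl() reports the null terminator as a control character, so without it the loop could run past the end of the input.

```c
#include <ctype.h>

/* Skip whitespace and control characters. The casts avoid passing a
   negative char value to the ctype functions, which is undefined
   behaviour. The '\0' test stops the loop at the end of the input. */
static void skip_whitespace_ctype(const char **cursor) {
    while (**cursor != '\0' &&
           (iscntrl((unsigned char)**cursor) ||
            isspace((unsigned char)**cursor))) {
        ++(*cursor);
    }
}
```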

A lot of your static helper functions do non-trivial combinations of things, and lack any comment explaining what they do. One or two comment lines before each function could help readability a lot.

In particular, your has_char() function does a bunch of different things:

1. It skips whitespace.
2. It checks for the presence of a certain character in the input.
3. If the character is found, it automatically skips it.

Only #2 is obviously implied by the function name; the others are unexpected side effects, and should at least be clearly documented.

Actually, it seems to me that it might be better to remove the call to skip_whitespace() from has_char(), and just let the caller explicitly skip whitespace before calling it if needed. In many cases your code already does that, making the duplicate skip redundant.

Also, to make effect #3 less surprising to the reader, it might be a good idea to rename that function to something a bit more active like, say, read_char().
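A renamed and documented version along those lines could look something like this. It is only a sketch of the suggestion; the signature is inferred from how has_char() is called in your code, and it deliberately does not skip whitespace.

```c
/* If the next input character is c, consume it and return 1.
   Otherwise leave the cursor where it is and return 0.
   Does NOT skip whitespace; the caller does that explicitly.
   Never advances past the terminating null byte. */
static int read_char(const char **cursor, char c) {
    if (**cursor == '\0' || **cursor != c) return 0;
    ++(*cursor);
    return 1;
}
```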

At the end of json_parse_object(), you have:

    return success;
    return 1;
}

Surely that's redundant. Just get rid of the return 1;.

Also, it looks like you're using the generic json_parse_value() function to parse object keys, and don't test to make sure that they're strings. This allows some invalid JSON to get through your parser. I'd suggest either adding an explicit type check or splitting your string parsing code into a separate function (as described below) and calling it directly from json_parse_object().

At the top of json_parse_array(), you have:

if (**cursor == ']') {
    ++(*cursor);
    return success;
}
while (success) {

You could rewrite that the same way as you do in json_parse_object():

while (success && !has_char(cursor, ']')) {

(Only, you know, I still think the name read_char() would be better.)

Also, for some reason, your json_parse_array() seems to expect the caller to initialize the parent struct, while json_parse_object() does it automatically. AFAICT there's no reason for the inconsistency, so you could and probably should just make both functions work the same way.

Your json_is_literal() function is not marked as static, even though it doesn't appear in the header. As with has_char(), I'd also prefer to rename it to something more active, like json_read_literal() or just read_literal(), to make it clearer that it automatically advances the cursor on a successful match.

(Also note that, as written, this function does not check that the literal in the input actually ends where it's supposed to. For example, it would successfully match the input nullnullnull against null. I don't think that's an actual bug, since the only valid literals in JSON are true, false and null, none of which are prefixes of each other, and since two literals cannot appear consecutively in valid JSON without some other token in between. But it's definitely at least worth noting in a comment.)
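If you did want the literal matcher to check where the literal ends, one possible shape is sketched below. The names read_literal and is_literal_end are hypothetical, and the set of accepted follow-up characters (end of input, JSON whitespace, comma, closing bracket or brace) is my reading of the JSON grammar rather than anything in your code.

```c
#include <string.h>

/* Return 1 if c may follow a literal in JSON text: end of input,
   JSON whitespace, or one of the structural characters , ] } */
static int is_literal_end(char c) {
    return c == '\0' || c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
           c == ',' || c == ']' || c == '}';
}

/* If the input starts with literal followed by a valid terminator,
   advance the cursor past the literal and return 1; otherwise leave
   the cursor untouched and return 0. */
static int read_literal(const char **cursor, const char *literal) {
    size_t len = strlen(literal);
    if (strncmp(*cursor, literal, len) != 0) return 0;
    if (!is_literal_end((*cursor)[len])) return 0;
    *cursor += len;
    return 1;
}
```

With this version the input nullnullnull no longer matches null, while null followed by a comma still does.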

You might also want to explicitly mark some of your static helper functions as inline to give the compiler a hint that it should try to merge them into the calling code. I'd suggest doing that at least for skip_whitespace(), has_char() and json_is_literal().

Since your json_value_to_X() accessor functions all consist of nothing but an assert() and a pointer dereference, you should also consider moving their implementations into json.h and marking them as static inline. This would allow the compiler to inline them into the calling code even in other .c files, and possibly to optimize away the assert() if the calling code already checks the type anyway.
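As a rough illustration of what such a header-level accessor might look like, here is a sketch. The json_value layout and the accessor name json_value_to_double are assumptions made for the example, since I'm only guessing at the exact declarations in your json.h.

```c
/* json.h (excerpt); the type layout below is assumed for illustration */
#include <assert.h>

typedef enum { TYPE_NULL, TYPE_BOOL, TYPE_NUMBER } json_value_type;

typedef struct {
    json_value_type type;
    union { int boolean; double number; } value;
} json_value;

/* Defined in the header so that every .c file including it can inline
   the call; the assert documents and checks the expected type. */
static inline double json_value_to_double(const json_value *v) {
    assert(v->type == TYPE_NUMBER);
    return v->value.number;
}
```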

In your main json_parse() function, you might want to explicitly check that there's nothing but whitespace left in the input after the root value has been parsed.
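That final check can be quite small. A sketch, using a hypothetical helper name and the four whitespace characters the JSON grammar allows:

```c
/* Return 1 if nothing but JSON whitespace remains before the end of
   the input, 0 if any other character is left over. */
static int only_whitespace_left(const char *cursor) {
    while (*cursor == ' ' || *cursor == '\t' ||
           *cursor == '\n' || *cursor == '\r') {
        ++cursor;
    }
    return *cursor == '\0';
}
```

json_parse() could call this with the cursor position after the root value and report failure if it returns 0.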

String parsing

Your string parsing code in json_parse_value() is broken, since it doesn't handle backslash escapes. For example, it fails on the following valid JSON input:

"I say: \"Hello, World!\""

You may want to add that as a test case.

You should also test that your code correctly handles other backslash escape sequences like \b, \f, \n, \r, \t, \/ and especially \\ and \unnnn. Here are a few more test cases for those:

"\"\b\f\n\r\t\/\\"
"void main(void) {\r\n\tprintf(\"I say: \\\"Hello, World!\\\"\\n\");\r\n}"
"\u0048\u0065\u006C\u006C\u006F\u002C\u0020\u0057\u006F\u0072\u006C\u0064\u0021"
"\u3053\u3093\u306B\u3061\u306F\u4E16\u754C"

Since JSON strings can contain arbitrary Unicode characters, you'll need to decide how to handle them. Probably the simplest choice would be to declare your input and output to be in UTF-8 (or perhaps WTF-8) and to convert \unnnn escapes into UTF-8 byte sequences (and, optionally, vice versa). Note that, since you're using null-terminated strings, you may prefer to decode \u0000 into the overlong encoding "\xC0\x80" instead of a null byte.
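To give an idea of the conversion involved, here is a sketch of the two building blocks: reading the four hex digits of a \unnnn escape and encoding the resulting code point as UTF-8. The names parse_hex4 and encode_utf8 are made up. The sketch takes the overlong "\xC0\x80" route for U+0000 and rejects lone surrogates (so it is plain UTF-8, not WTF-8); combining a high and low surrogate pair into one code point is left to the caller.

```c
#include <stddef.h>

/* Parse exactly four hex digits at s into *cp. Returns 1 on success,
   0 if any of the four characters is not a hex digit. */
static int parse_hex4(const char *s, unsigned long *cp) {
    unsigned long v = 0;
    for (int i = 0; i < 4; ++i) {
        char c = s[i];
        unsigned digit;
        if (c >= '0' && c <= '9') digit = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') digit = (unsigned)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F') digit = (unsigned)(c - 'A') + 10;
        else return 0;
        v = v * 16 + digit;
    }
    *cp = v;
    return 1;
}

/* Encode cp as UTF-8 into out, which must have room for 4 bytes.
   Returns the number of bytes written, or 0 if cp cannot be encoded.
   U+0000 becomes the overlong pair C0 80 so the output can stay inside
   a null-terminated string. */
static size_t encode_utf8(unsigned long cp, unsigned char *out) {
    if (cp == 0) {
        out[0] = 0xC0; out[1] = 0x80;
        return 2;
    }
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* lone surrogate */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;  /* beyond U+10FFFF */
}
```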

For the sake of keeping the main json_parse_value() function readable, I would strongly recommend splitting the string parsing code into a separate helper function. Especially since making it handle backslash escapes correctly will complicate it considerably.

One of the complications is that you won't actually know how long the string will be until you've parsed it. One way to deal with that would be to dynamically grow the allocated output string with realloc(), e.g. like this:

// resize output buffer *buffer to new_size bytes
// return 1 on success, 0 on failure
static int resize_buffer(char** buffer, size_t new_size) {
    char *new_buffer = realloc(*buffer, new_size);
    if (new_buffer) {
        *buffer = new_buffer;
        return 1;
    }
    else return 0;
}

// parse a JSON string value
// expects the cursor to point after the initial double quote
// return 1 on success, 0 on failure
static int json_parse_string(const char** cursor, json_value* parent) {
    int success = 1;

    size_t length = 0, allocated = 8;  // start with an 8-byte buffer 
    char *new_string = malloc(allocated);
    if (!new_string) return 0;

    while (success && **cursor != '"') {
        if (**cursor == '\0') {
            success = 0;  // unterminated string
        }
        // we're going to need at least one more byte of space
        while (success && length + 1 > allocated) {
             success = resize_buffer(&new_string, allocated *= 2);
        }
        if (!success) break;
        if (**cursor != '\\') {
             new_string[length++] = **cursor;  // just copy normal bytes to output
             ++(*cursor);
        }
        else switch ((*cursor)[1]) {
            case '\\':new_string[length++] = '\\'; *cursor += 2; break;
            case '/': new_string[length++] = '/';  *cursor += 2; break;
            case '"': new_string[length++] = '"';  *cursor += 2; break;
            case 'b': new_string[length++] = '\b'; *cursor += 2; break;
            case 'f': new_string[length++] = '\f'; *cursor += 2; break;
            case 'n': new_string[length++] = '\n'; *cursor += 2; break;
            case 'r': new_string[length++] = '\r'; *cursor += 2; break;
            case 't': new_string[length++] = '\t'; *cursor += 2; break;
            case 'u':
                // TODO: handle Unicode escapes! (decode to UTF-8?)
                // note that this may require extending the buffer further
            default:
                success = 0; break;  // invalid escape sequence
        }
    }
    success = success && resize_buffer(&new_string, length+1);
    if (!success) { 
        free(new_string);
        return 0;
    }
    new_string[length] = '\0';
    parent->type = TYPE_STRING;
    parent->value.string = new_string;
    ++(*cursor);  // move cursor after final double quote
    return 1;
}

An alternative solution would be to run two parsing passes over the input: one just to determine the length of the output string, and another to actually decode it. This would be most easily done something like this:

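static int json_parse_string(const char** cursor, json_value* parent) {
    const char *tmp_cursor = *cursor;

    // first pass: new_string == NULL, so only measure the decoded length
    size_t length = (size_t)-1;
    if (!json_string_helper(&tmp_cursor, &length, NULL)) return 0;

    // +1 for the terminator, assuming *length does not count it
    char *new_string = malloc(length + 1);
    if (!new_string) return 0;

    // second pass: rewind to the start of the string and decode for real
    tmp_cursor = *cursor;
    if (!json_string_helper(&tmp_cursor, &length, new_string)) {
        free(new_string);
        return 0;
    }
    new_string[length] = '\0';
    parent->type = TYPE_STRING;
    parent->value.string = new_string;
    *cursor = tmp_cursor;
    return 1;
}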

where the helper function:

static int json_string_helper(const char** cursor, size_t* length, char* new_string) {
    // ...
}

parses a JSON string of at most *length bytes into new_string and writes the actual length of the parsed string into *length, or, if new_string == NULL, just determines the length of the string without actually storing the decoded output anywhere.
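For illustration, one possible body for the helper described above is sketched here. It rests on a few assumptions of mine: the cursor points just after the opening quote, *length counts decoded bytes without any terminator, and \u escapes are simply rejected to keep the example short.

```c
#include <stddef.h>

/* Decode a JSON string body starting at *cursor (just after the opening
   quote). On entry *length is the maximum number of output bytes; on
   success it receives the decoded length and *cursor is moved past the
   closing quote. If new_string is NULL, nothing is stored and the
   function only measures. Returns 1 on success, 0 on failure. */
static int json_string_helper(const char **cursor, size_t *length,
                              char *new_string) {
    const char *p = *cursor;
    size_t n = 0;

    while (*p != '"') {
        char c;
        if (*p == '\0') return 0;          /* unterminated string */
        if (*p != '\\') {
            c = *p++;                      /* ordinary byte: copy as is */
        } else {
            switch (p[1]) {
                case '"':  c = '"';  break;
                case '\\': c = '\\'; break;
                case '/':  c = '/';  break;
                case 'b':  c = '\b'; break;
                case 'f':  c = '\f'; break;
                case 'n':  c = '\n'; break;
                case 'r':  c = '\r'; break;
                case 't':  c = '\t'; break;
                default:   return 0;       /* invalid or unsupported escape */
            }
            p += 2;
        }
        if (n >= *length) return 0;        /* output would exceed the limit */
        if (new_string) new_string[n] = c;
        ++n;
    }
    *length = n;
    *cursor = p + 1;                       /* step past the closing quote */
    return 1;
}
```

Because the function never writes a terminator itself, the caller has to allocate one extra byte and append the null byte after the second pass.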

Number parsing

Your current json_parse_value() implementation treats numbers as the default case, and simply feeds anything that doesn't begin with ", [, {, n, t or f into the C standard library function strtod().

Since strtod() accepts a superset of valid JSON number literals, this should work, but can make your code sometimes accept invalid JSON as valid. For example, your code will accept +nan, -nan, +inf and -inf as valid numbers, and will also accept hexadecimal notation like 0xABC123. Also, as the strtod() documentation linked above notes:

In a locale other than the standard "C" or "POSIX" locales, this function may recognize additional locale-dependent syntax.

If you want to be stricter, you might want to explicitly validate anything that looks like a number against the JSON grammar before passing it to strtod().

Also note that strtod() may set errno e.g. if the input number is outside the range of a double. You probably should be checking for this.
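Both points (validating against the JSON number grammar first, then checking errno after strtod()) could be combined along the following lines. This is a sketch with hypothetical function names; it treats overflow as an error but lets underflow through, and it assumes the numeric locale is "C", since in other locales strtod() may stop at a different place and the length comparison would then reject valid input.

```c
#include <errno.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Return the length of the JSON number starting at s, or 0 if there is
   none. Grammar: optional '-', then '0' or a nonzero digit followed by
   digits, an optional fraction, and an optional exponent. */
static size_t json_number_length(const char *s) {
    const char *p = s;
    if (*p == '-') ++p;
    if (*p == '0') {
        ++p;
    } else if (*p >= '1' && *p <= '9') {
        while (*p >= '0' && *p <= '9') ++p;
    } else {
        return 0;
    }
    if (*p == '.') {
        ++p;
        if (*p < '0' || *p > '9') return 0;  /* need a digit after '.' */
        while (*p >= '0' && *p <= '9') ++p;
    }
    if (*p == 'e' || *p == 'E') {
        ++p;
        if (*p == '+' || *p == '-') ++p;
        if (*p < '0' || *p > '9') return 0;  /* need a digit in exponent */
        while (*p >= '0' && *p <= '9') ++p;
    }
    return (size_t)(p - s);
}

/* Validate, convert and advance the cursor. Returns 1 on success. */
static int parse_json_number(const char **cursor, double *out) {
    size_t len = json_number_length(*cursor);
    char *end;
    double value;

    if (len == 0) return 0;
    errno = 0;
    value = strtod(*cursor, &end);
    if (errno == ERANGE && (value == HUGE_VAL || value == -HUGE_VAL)) {
        return 0;                            /* out of range for a double */
    }
    /* strtod() must have consumed exactly the validated text */
    if ((size_t)(end - *cursor) != len) return 0;
    *out = value;
    *cursor += len;
    return 1;
}
```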

Testing

I have not looked at your tests in detail, but it's great to see that you have them (even if, as noted above, their coverage could be improved).

Personally, though, I'd prefer to move the tests out of the implementation into a separate source file. This does have both advantages and disadvantages:

The main disadvantage is that you can no longer directly test static helper functions. However, given that your public API looks clean and comprehensive, and doesn't suffer from any "hidden state" issues that would complicate testing, you should be able to achieve good test coverage even just through the API.
The main advantage (besides a clean separation between implementation and testing code) is that your tests will automatically test the public API. In particular, any problems with the json.h header will show up in your tests. Also, doing your tests via the API helps you ensure that your API really is sufficiently complete and flexible for general use.

If you really still want to directly test your static functions, you could always add a preprocessor flag that optionally exposes them for testing, either via simple wrappers or just by removing the static keyword from their definitions.
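The "remove the static keyword" variant of that idea might look like this sketch. The macro names JSON_EXPOSE_INTERNALS and JSON_INTERNAL are invented for the example; a test build would define the first one on the compiler command line.

```c
/* In json.c. Both macro names are hypothetical. */
#ifdef JSON_EXPOSE_INTERNALS
#define JSON_INTERNAL            /* external linkage: visible to test files */
#else
#define JSON_INTERNAL static     /* normal builds keep helpers private */
#endif

/* A helper that tests can call directly when the flag is defined. */
JSON_INTERNAL void skip_whitespace(const char **cursor) {
    while (**cursor == '\t' || **cursor == '\r' ||
           **cursor == '\n' || **cursor == ' ') {
        ++(*cursor);
    }
}
```

The test file would then declare the exposed helpers itself, or include a small internal header that is only used for testing.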

Ps. I did notice that your json_test_value_number() test is failing for me (GCC 5.4.0, i386 arch), presumably because the number 23.4 is not exactly representable in floating point. Changing it to 23.5 makes the test pass.
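If you'd rather keep 23.4 in the test, comparing with a small relative tolerance instead of exact equality is one option. A sketch, with a hypothetical helper name and a tolerance you would need to choose yourself:

```c
/* Return 1 if a and b differ by at most rel_tol times the larger of
   their magnitudes, 0 otherwise. */
static int nearly_equal(double a, double b, double rel_tol) {
    double diff = a > b ? a - b : b - a;
    double mag_a = a < 0 ? -a : a;
    double mag_b = b < 0 ? -b : b;
    double scale = mag_a > mag_b ? mag_a : mag_b;
    return diff <= rel_tol * scale;
}
```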

2 of 3

This is in no way a complete review, but I'll share some things that caught my eye while reading your code.

Comments

While comments surely are nice, some of your inline comments add only noise to the code.

// Eat whitespace
int success = 0;
skip_whitespace(cursor);

First of all, the comment is one line too early. Second, one can read that the whitespace is consumed by looking at the function - the name describes it perfectly, there's no need for an additional comment.

case '\0':
    // If parse_value is called with the cursor at the end of the string
    // that's a failure
    success = 0;
    break;

Again, this comment just repeats what the code itself is saying.

enum json_value_type {
    TYPE_NULL,
    TYPE_BOOL,
    TYPE_NUMBER,
    TYPE_OBJECT, // Is a vector with pairwise entries, key, value
    TYPE_ARRAY, // Is a vector, all entries are plain 
    TYPE_STRING,
    TYPE_KEY
};

Now, these comments are not really useless since they document what each value represents. But why only for TYPE_OBJECT and TYPE_ARRAY - why not for all values? Personally, I'd just put a link to json.org just before that enum. Your types are analogous to the ones there, you need only document what TYPE_KEY is supposed to be. Which brings me to the next point...

`TYPE_KEY`

Taking a look at json.org, you can see an object consists of a list of members, which in turn are made of a string and a value. Which means that you don't really need TYPE_KEY! Just add a new struct for members consisting of a TYPE_STRING value and another json value of any type and you're good to go. Right now, you could have e.g. a number as key for a value, which is not allowed. Would make some of the object-related logic nicer too, like this for loop:

for (size_t i = 0; i < size; i += 2)

Ironically, the step of this for loop actually could use a comment (why += 2?) but lacks one.
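A member struct along those lines could be sketched like this. All the names are hypothetical, and the one-field json_value is only a stand-in so the example is self-contained. Because each slot now holds a complete key/value pair, the loop steps by one and the key can never be anything but a string.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the real value type, just for this example. */
typedef struct json_value { double number; } json_value;

/* One object member: a string key plus a value of any type. */
typedef struct {
    char *key;
    json_value *value;
} json_member;

typedef struct {
    json_member *members;
    size_t count;
} json_object;

/* Linear lookup by key; returns NULL if the key is absent. */
static json_value *json_object_get(const json_object *obj, const char *key) {
    for (size_t i = 0; i < obj->count; ++i) {
        if (strcmp(obj->members[i].key, key) == 0) {
            return obj->members[i].value;
        }
    }
    return NULL;
}
```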

Miscellaneous

case '\0':
    // If parse_value is called with the cursor at the end of the string
    // that's a failure
    success = 0;
    break;

Why not just return 0;?

while (iscntrl(**cursor) || isspace(**cursor)) ++(*cursor);

and

if (success) ++(*cursor);

and

if (has_char(cursor, '}')) break;
else if (has_char(cursor, ',')) continue;

and a few others of those. I'm not particularly fond of putting condition and statement on the same line, especially since you're not consistently doing this. I'm kinda okay with doing this for the sake of control flow, like if (!something) return;, but it's still "meh". Better do it right and put the statement on a new line.

Also, I find that your code could use some more empty lines to separate "regions" or whatever you'd like to call them. For example:

json_value key = { .type = TYPE_NULL };
json_value value = { .type = TYPE_NULL };
success = json_parse_value(cursor, &key);
success = success && has_char(cursor, ':');
success = success && json_parse_value(cursor, &value);

if (success) {
    vector_push_back(&result.value.object, &key);
    vector_push_back(&result.value.object, &value);
}
else {
    json_free_value(&key);
    break;
}
skip_whitespace(cursor);
if (has_char(cursor, '}')) break;
else if (has_char(cursor, ',')) continue;
else success = 0;

There is one empty line separating the setup-and-parse stuff from the check-and-return stuff, but you can do better.

json_value key = { .type = TYPE_NULL };
json_value value = { .type = TYPE_NULL };

success = json_parse_value(cursor, &key);
success = success && has_char(cursor, ':');
success = success && json_parse_value(cursor, &value);

if (success) {
    vector_push_back(&result.value.object, &key);
    vector_push_back(&result.value.object, &value);
}
else {
    json_free_value(&key);
    break;
}

skip_whitespace(cursor);

if (has_char(cursor, '}')) break;
else if (has_char(cursor, ',')) continue;
else success = 0;

I find this to be way cleaner. You have a block for setting up the values, a block for parsing them, a block for putting them into the vector, a block for skipping whitespace and a block for finalizing the current action. The last empty line between skip_whitespace(cursor); and if ... is debatable, but I prefer it this way.

Other than that, I found your code to be easily readable and understandable. You properly check for any errors and use sensible naming. As for the idiomaticity, apart from what I've mentioned, there's nothing I'd mark as unusual or unidiomatic.


reddit.com › r/c_programming › how to parse json in c ?

r/C_Programming on Reddit: How to parse JSON in C ?

August 14, 2021

So, I am planning to make a weather app that gets the weather from an API. I got the response (JSON) using the cURL library, but I'm stuck on parsing that response to get useful things out of it, like the weather report for the places the user enters.

Top answer

1 of 6

Pretty much any library you decide to use should do the trick. For a fully-featured but still simple to use library I'd recommend cJSON, or if you need something super lightweight, jsmn.

2 of 6

You might want to read the classic “Parsing JSON is a Minefield 💣” and especially the results graph in section 4.

GitHub

github.com › Jacajack › mkjson

GitHub - Jacajack/mkjson: A simple, yet flexible, JSON encoder for C

A simple, yet flexible, JSON encoder for C. Contribute to Jacajack/mkjson development by creating an account on GitHub.

Starred by 42 users

Forked by 3 users

Languages C 94.3% | Makefile 5.7%

GitHub

github.com › whyisitworking › C-Simple-JSON-Parser

GitHub - whyisitworking/C-Simple-JSON-Parser: Extremely lightweight, easy-to-use & blazing fast JSON parsing library written in pure C

An easy to use, very fast JSON parsing implementation written in pure C

Starred by 60 users

Forked by 17 users

Languages C 100.0%

Obj-sys

obj-sys.com › docs › xbv30 › CCppUsersGuide › ch13s02.html

JSON C Decode Functions

JSON C decode functions handle the decoding of simple XSD types. Calls to these functions are assembled in the C source code generated by the XBinder compiler to decode complex XML schema-based messages.

GeeksforGeeks

geeksforgeeks.org › c language › cjson-json-file-write-read-modify-in-c

cJSON - JSON File Write/Read/Modify in C - GeeksforGeeks

April 28, 2025 - The cJSON library is written in C and has no external dependencies, making it easy to integrate into C programs. To write JSON data in C, we need to create a cJSON object and convert it to a JSON string using the cJSON library.

JSON

json.org › JSON_checker › utf8_decode.c

utf8_decode.c

The decoder is not reentrant, */ void utf8_decode_init(char p[], int length) { the_index = 0; the_input = p; the_length = length; the_char = 0; the_byte = 0; } /* Get the current byte offset. This is generally used in error reporting. */ int utf8_decode_at_byte() { return the_byte; } /* Get the current character offset.

DEV Community

dev.to › uponthesky › c-making-a-simple-json-parser-from-scratch-250g

[C++] Making a Simple JSON Parser from Scratch - DEV Community

July 30, 2023 - Now, let’s dive into the code. The main function ParseJson() has its logical flow as follows, which has only two steps. Read the fixture JSON file given the path of the file.

GitHub

github.com › json-c › json-c

GitHub - json-c/json-c: https://github.com/json-c/json-c is the official code repository for json-c. See the wiki for release tarballs for download. API docs at http://json-c.github.io/json-c/ · GitHub

JSON-C implements a reference counting object model that allows you to easily construct JSON objects in C, output them as JSON formatted strings and parse JSON formatted strings back into the C representation of JSON objects.

Starred by 3.2K users

Forked by 1.1K users

Languages C 87.0% | CMake 6.0% | Shell 3.3% | Meson 2.5%

Json-c

json-c.github.io › json-c › json-c-current-release › doc › html › index.html

json-c: json-c

Stack Overflow

stackoverflow.com › questions › 72184082 › how-to-encode-json-buffer-in-c

How to encode JSON buffer in C? - Stack Overflow

Top answer

1 of 2

What you have is reasonable, although an alternative might be some sort of result builder:

char buff[256] = { 0 };

jsonObjectOpen(buff);
jsonObjectInteger(buff,"minHour", minHour);
jsonObjectInteger(buff,"maxHour", maxHour);
jsonObjectClose(buff);

Basically each function is appending the necessary JSON elements to the buffer, and you'd need to implement functions for each data type (string, int, float), and of course, make sure you use them in the correct order.

I don't think this is more succinct, but if you are doing it more than a few times, especially for more complex structures, you might find it more readable and maintainable.

It's entirely possible there is an existing library that will help with this type of approach, also being mindful of ensuring that the buffer space isn't exceeded during the building process.

In other languages that have type detection, this is a lot easier, and I suppose you could always have a single function that takes a void pointer and a 'type' enum, but that could be more error prone for the sake of a marginally simpler API.
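To make the builder idea and the buffer-space concern a bit more concrete, here is a rough sketch. Everything in it (the json_builder struct and the jb_ function names) is hypothetical, it only handles integers, and it does not escape keys, so treat it as a starting point and not a finished encoder.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Builder state: a caller-supplied buffer plus a sticky overflow flag. */
typedef struct {
    char *buf;
    size_t cap;
    size_t len;
    int first;     /* 1 until the first member has been written */
    int overflow;  /* set once something did not fit */
} json_builder;

static void jb_init(json_builder *b, char *buf, size_t cap) {
    b->buf = buf;
    b->cap = cap;
    b->len = 0;
    b->first = 1;
    b->overflow = (cap == 0);
    if (cap > 0) buf[0] = '\0';
}

/* Append formatted text; never writes past the end of the buffer. */
static void jb_appendf(json_builder *b, const char *fmt, ...) {
    va_list ap;
    int n;
    if (b->overflow) return;
    va_start(ap, fmt);
    n = vsnprintf(b->buf + b->len, b->cap - b->len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= b->cap - b->len) {
        b->overflow = 1;                 /* output was truncated */
        return;
    }
    b->len += (size_t)n;
}

static void jb_object_open(json_builder *b) {
    jb_appendf(b, "{");
    b->first = 1;
}

/* Note: key is written as is, without JSON escaping. */
static void jb_object_integer(json_builder *b, const char *key, long value) {
    jb_appendf(b, "%s\"%s\":%ld", b->first ? "" : ",", key, value);
    b->first = 0;
}

static void jb_object_close(json_builder *b) {
    jb_appendf(b, "}");
}
```

After the closing call, the caller checks the overflow flag once instead of checking every append individually.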

2 of 2

It might be a good idea to separate JSON object building from the encoding.

One of the existing C JSON libraries does it in the following way:

json_t *item = json_object();
json_object_set_new(item, "id", json_string("stat"));
json_object_set_new(item, "minHour", json_integer(minHour));
json_object_set_new(item, "maxHour", json_integer(maxHour));
...


// Dump to console
json_dumpf(item, stdout, JSON_INDENT(4) | JSON_SORT_KEYS);

// Dump to file 
json_dumpf(item, file, JSON_COMPACT);
    
// Free allocated resources
json_decref(item);

The separation gives some benefits. For example, encode formatting can be selected in one place.
And the same object can be easily encoded several ways (as in the example).

GitHub

github.com › douglascrockford › JSON-c › blob › master › utf8_decode.c

JSON-c/utf8_decode.c at master · douglascrockford/JSON-c

/* utf8_decode.c */ · /* 2016-04-05 */ · /* Copyright (c) 2005 JSON.org · · Permission is hereby granted, free of charge, to any person obtaining a copy · of this software and associated documentation files (the "Software"), to deal · in the Software without restriction, including without limitation the rights ·

Author douglascrockford

Readthedocs

simba-os.readthedocs.io › en › latest › library-reference › encode › json.html

11.2. json — JSON encoding and decoding — Simba master documentation

int json_parse(struct json_t *self_p, const char *js_p, size_t len): Parse given JSON data string into an array of tokens, each describing a single JSON object. Returns the number of decoded tokens or a negative error code. Parameters: self_p: JSON object.