After trying out a few options, I think the easiest way to do this at large scale is to use elinks.

On Ubuntu:

sudo apt-get install elinks
elinks -dump a.html > a.txt
Answer from formatjam on Stack Overflow
🌐
Jcbeck
jcbeck.com › features › tools › HTML2C.asp
Jason C Beck: Convert HTML into C# Response.Write
A personal web site with thousands of photos, a diverse discussion board, and a friendly community atmosphere.
🌐
CodingFleet
codingfleet.com › code-converter › html-css-js › c
HTML/CSS/JS to C Converter - CodingFleet
Convert your HTML/CSS/JS Code to C. This exceptional AI-powered tool converts your HTML/CSS/JS code into C code easily, eliminating the need for manual re-coding. Save your precious time and unlock cross-platform development like never before with our converter tool.
Discussions

Convert HTML to Plain Text using c++ - Stack Overflow
I am doing mail parsing application which required to convert the HTML file to Plain Text. regarding this i have found some scripts which does conversion. I want to do same thing in C++. So ple... More on stackoverflow.com
🌐 stackoverflow.com
How can I Convert HTML to Text in C#? - Stack Overflow
I'm looking for C# code to convert an HTML document to plain text. I'm not looking for simple tag stripping , but something that will output plain text with a reasonable preservation of the orig... More on stackoverflow.com
🌐 stackoverflow.com
In C, how can one convert HTML strings to C strings? - Stack Overflow
Is there a common routine or library available? e.g. &#39; has to become '. More on stackoverflow.com
🌐 stackoverflow.com
People also ask

Can I convert HTML to C automatically?
Yes. DocuWriter.ai's AI-powered converter analyzes your HTML code and generates equivalent C code. It handles syntax differences (Markup (SGML-based) to C-style), type system changes (Untyped to Static, weak), and common library mappings. Always review the output — automated conversion is a starting point, not a finished product.
🌐
docuwriter.ai
docuwriter.ai › html-to-c-code-converter
html to c - Code Converter
Should I rewrite from scratch or use a converter for HTML to C?
For small to medium codebases, an AI converter saves significant time by handling routine syntax translation. For large projects, use the converter as a starting point and then refactor to leverage C-specific patterns. HTML (declarative, markup) and C (procedural) have different paradigms, so some architectural changes may be needed.
What HTML features don't have a direct C equivalent?
Some HTML features like semantic elements (header, nav, main) may not map directly to C. The converter handles these by finding the closest C equivalent or generating helper code. Be aware that self-closing tags vary by html version — this can affect conversion results.
🌐
STMicroelectronics Community
community.st.com › t5 › stm32-mcus-embedded-software › http-server-convert-html-file-to-c-data-array-for-in-memory › td-p › 188494
HTTP server - convert html file to C data array for in memory storage ? Does anybody have a good example.
March 29, 2021 - Inside the STM32 repository, the folder C:\Users\user\STM32Cube\Repository\STM32Cube_FW_F4_V1.26.0\Middlewares\Third_Party\LwIP\src\apps\http\makefsdata contains an HTML-to-C converter, as C source code, with a lot of options. It can convert whole sites with subdirectories, generate HTTP/1.0 or 1.1 headers, and more.
🌐
Docuwriter
docuwriter.ai › html-to-c-code-converter
html to c - Code Converter
Select C as the target language and click Convert to start the AI translation. ... Review the generated code, make any adjustments, and copy it to your project. ... Yes. DocuWriter.ai's AI-powered converter analyzes your HTML code and generates equivalent C code.
🌐
Code Converter
codeconverter.fr › code converter › code conversion
HTML to C - Code Converter
Converts a program from one language to another while preserving logic. Keeps structure, names, and inputs/outputs when possible. Warns when the code does not match the selected language. Fast conversion for snippets or projects. Structured output ready to compile. Multi-language support with format checks...
Top answer
1 of 16
57

Just a note about the HtmlAgilityPack for posterity. The project contains an example of converting HTML to text which, as noted by the OP, does not handle whitespace the way anyone writing HTML would expect. There are full text-rendering solutions out there, noted by others on this question, which this is not (it cannot even handle tables in its current form), but it is lightweight and fast, which is all I wanted for creating a simple text version of HTML emails.

using System; // for StringComparison
using System.IO;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

//small but important modification to class https://github.com/zzzprojects/html-agility-pack/blob/master/src/Samples/Html2Txt/HtmlConvert.cs
public static class HtmlToText
{

    public static string Convert(string path)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.Load(path);
        return ConvertDoc(doc);
    }

    public static string ConvertHtml(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
        return ConvertDoc(doc);
    }

    public static string ConvertDoc (HtmlDocument doc)
    {
        using (StringWriter sw = new StringWriter())
        {
            ConvertTo(doc.DocumentNode, sw);
            sw.Flush();
            return sw.ToString();
        }
    }

    internal static void ConvertContentTo(HtmlNode node, TextWriter outText, PreceedingDomTextInfo textInfo)
    {
        foreach (HtmlNode subnode in node.ChildNodes)
        {
            ConvertTo(subnode, outText, textInfo);
        }
    }
    public static void ConvertTo(HtmlNode node, TextWriter outText)
    {
        ConvertTo(node, outText, new PreceedingDomTextInfo(false));
    }
    internal static void ConvertTo(HtmlNode node, TextWriter outText, PreceedingDomTextInfo textInfo)
    {
        string html;
        switch (node.NodeType)
        {
            case HtmlNodeType.Comment:
                // don't output comments
                break;
            case HtmlNodeType.Document:
                ConvertContentTo(node, outText, textInfo);
                break;
            case HtmlNodeType.Text:
                // script and style must not be output
                string parentName = node.ParentNode.Name;
                if ((parentName == "script") || (parentName == "style"))
                {
                    break;
                }
                // get text
                html = ((HtmlTextNode)node).Text;
                // is it in fact a special closing node output as text?
                if (HtmlNode.IsOverlappedClosingElement(html))
                {
                    break;
                }
                // check the text is meaningful and not a bunch of whitespaces
                if (html.Length == 0)
                {
                    break;
                }
                if (!textInfo.WritePrecedingWhiteSpace || textInfo.LastCharWasSpace)
                {
                    html= html.TrimStart();
                    if (html.Length == 0) { break; }
                    textInfo.IsFirstTextOfDocWritten.Value = textInfo.WritePrecedingWhiteSpace = true;
                }
                outText.Write(HtmlEntity.DeEntitize(Regex.Replace(html.TrimEnd(), @"\s{2,}", " ")));
                if (textInfo.LastCharWasSpace = char.IsWhiteSpace(html[html.Length - 1]))
                {
                    outText.Write(' ');
                }
                break;
            case HtmlNodeType.Element:
                string endElementString = null;
                bool isInline;
                bool skip = false;
                int listIndex = 0;
                switch (node.Name)
                {
                    case "nav":
                        skip = true;
                        isInline = false;
                        break;
                    case "body":
                    case "section":
                    case "article":
                    case "aside":
                    case "h1":
                    case "h2":
                    case "header":
                    case "footer":
                    case "address":
                    case "main":
                    case "div":
                    case "p": // stylistic - adjust as you tend to use
                        if (textInfo.IsFirstTextOfDocWritten)
                        {
                            outText.Write("\r\n");
                        }
                        endElementString = "\r\n";
                        isInline = false;
                        break;
                    case "br":
                        outText.Write("\r\n");
                        skip = true;
                        textInfo.WritePrecedingWhiteSpace = false;
                        isInline = true;
                        break;
                    case "a":
                        if (node.Attributes.Contains("href"))
                        {
                            string href = node.Attributes["href"].Value.Trim();
                            if (node.InnerText.IndexOf(href, StringComparison.InvariantCultureIgnoreCase)==-1)
                            {
                                endElementString =  "<" + href + ">";
                            }  
                        }
                        isInline = true;
                        break;
                    case "li": 
                        if(textInfo.ListIndex>0)
                        {
                            outText.Write("\r\n{0}.\t", textInfo.ListIndex++); 
                        }
                        else
                        {
                            outText.Write("\r\n*\t"); //using '*' as bullet char, with tab after, but whatever you want eg "\t->", if utf-8 0x2022
                        }
                        isInline = false;
                        break;
                    case "ol": 
                        listIndex = 1;
                        goto case "ul";
                    case "ul": //not handling nested lists any differently at this stage - that is getting close to rendering problems
                        endElementString = "\r\n";
                        isInline = false;
                        break;
                    case "img": //inline-block in reality
                        if (node.Attributes.Contains("alt"))
                        {
                            outText.Write('[' + node.Attributes["alt"].Value);
                            endElementString = "]";
                        }
                        if (node.Attributes.Contains("src"))
                        {
                            outText.Write('<' + node.Attributes["src"].Value + '>');
                        }
                        isInline = true;
                        break;
                    default:
                        isInline = true;
                        break;
                }
                if (!skip && node.HasChildNodes)
                {
                    ConvertContentTo(node, outText, isInline ? textInfo : new PreceedingDomTextInfo(textInfo.IsFirstTextOfDocWritten){ ListIndex = listIndex });
                }
                if (endElementString != null)
                {
                    outText.Write(endElementString);
                }
                break;
        }
    }
}
internal class PreceedingDomTextInfo
{
    public PreceedingDomTextInfo(BoolWrapper isFirstTextOfDocWritten)
    {
        IsFirstTextOfDocWritten = isFirstTextOfDocWritten;
    }
    public bool WritePrecedingWhiteSpace {get;set;}
    public bool LastCharWasSpace { get; set; }
    public readonly BoolWrapper IsFirstTextOfDocWritten;
    public int ListIndex { get; set; }
}
internal class BoolWrapper
{
    public BoolWrapper() { }
    public bool Value { get; set; }
    public static implicit operator bool(BoolWrapper boolWrapper)
    {
        return boolWrapper.Value;
    }
    public static implicit operator BoolWrapper(bool boolWrapper)
    {
        return new BoolWrapper{ Value = boolWrapper };
    }
}

As an example, the following HTML code...

<!DOCTYPE HTML>
<html>
    <head>
    </head>
    <body>
        <header>
            Whatever Inc.
        </header>
        <main>
            <p>
                Thanks for your enquiry. As this is the 1<sup>st</sup> time you have contacted us, we would like to clarify a few things:
            </p>
            <ol>
                <li>
                    Please confirm this is your email by replying.
                </li>
                <li>
                    Then perform this step.
                </li>
            </ol>
            <p>
                Please solve this <img alt="complex equation" src="http://upload.wikimedia.org/wikipedia/commons/8/8d/First_Equation_Ever.png"/>. Then, in any order, could you please:
            </p>
            <ul>
                <li>
                    a point.
                </li>
                <li>
                    another point, with a <a href="http://en.wikipedia.org/wiki/Hyperlink">hyperlink</a>.
                </li>
            </ul>
            <p>
                Sincerely,
            </p>
            <p>
                The whatever.com team
            </p>
        </main>
        <footer>
            Ph: 000 000 000<br/>
            mail: whatever st
        </footer>
    </body>
</html>

...will be transformed into:

Whatever Inc. 


Thanks for your enquiry. As this is the 1st time you have contacted us, we would like to clarify a few things: 

1.  Please confirm this is your email by replying. 
2.  Then perform this step. 

Please solve this [complex equation<http://upload.wikimedia.org/wikipedia/commons/8/8d/First_Equation_Ever.png>]. Then, in any order, could you please: 

*   a point. 
*   another point, with a hyperlink<http://en.wikipedia.org/wiki/Hyperlink>. 

Sincerely, 

The whatever.com team 


Ph: 000 000 000
mail: whatever st 

...as opposed to:

        Whatever Inc.


            Thanks for your enquiry. As this is the 1st time you have contacted us, we would like to clarify a few things:

                Please confirm this is your email by replying.

                Then perform this step.


            Please solve this . Then, in any order, could you please:

                a point.

                another point, with a hyperlink.


            Sincerely,


            The whatever.com team

        Ph: 000 000 000
        mail: whatever st
2 of 16
39

You could use this:

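// Requires: using System.Text.RegularExpressions; using System.Web;
public static string StripHTML(string HTMLText, bool decode = true)
{
    // Remove anything that looks like a tag, then optionally decode entities.
    Regex reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    var stripped = reg.Replace(HTMLText, "");
    return decode ? HttpUtility.HtmlDecode(stripped) : stripped;
}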

Updated

Thanks for the comments; I have updated this function to improve it.
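
For comparison, the same tag-stripping idea can be sketched in C without a regex. This is a minimal sketch, not a port of the C# answer: strip_tags is a hypothetical helper name, it assumes the caller supplies a destination buffer at least as large as the source, and its handling of edge cases such as "<>" or an unterminated "<" differs from the regex (it drops them). Like the regex, it does not decode entities and knows nothing about comments, scripts, or ">" inside attribute values.

```c
/* Copy src to dst, dropping everything between '<' and '>'.
 * Hypothetical helper: dst must have room for strlen(src) + 1 bytes. */
void strip_tags(char *dst, const char *src) {
    int in_tag = 0; /* 1 while we are between '<' and the next '>' */
    for (; *src; src++) {
        if (in_tag) {
            if (*src == '>') in_tag = 0;
        } else if (*src == '<') {
            in_tag = 1;
        } else {
            *dst++ = *src;
        }
    }
    *dst = '\0';
}
```

Calling strip_tags(out, "<p>Hello <b>world</b></p>") leaves "Hello world" in out.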

Top answer
1 of 3
1

This isn't particularly hard, assuming you only care about &#xx; style entities. The bare-bones, let-everyone-else-worry-about-the-memory-management, mechanical, what's-a-regex way:

int hex_to_value(char hex) {
    if (hex >= '0' && hex <= '9') { return hex - '0'; }
    if (hex >= 'A' && hex <= 'F') { return hex - 'A' + 10; }
    if (hex >= 'a' && hex <= 'f') { return hex - 'a' + 10; }
    return -1;
}

void unescape(char* dst, const char* src) {
    // Write the translated version of the text at 'src', to 'dst'.
    // All sequences of '&#xx;', where x is a hex digit, are replaced
    // with the corresponding single byte.
    enum { NONE, AND, AND_HASH, AND_HASH_EX, AND_HASH_EX_EX } m = NONE;
    char first_hex = 0, second_hex = 0;
    int translated = 0; // int, so the -1 failure value compares correctly even where char is unsigned
    while (*src) {
        char c = *src++;
        switch (m) {
            case NONE:
            if (c == '&') { m = AND; }
            else { *dst++ = c; m = NONE; }
            break;

            case AND:
            if (c == '#') { m = AND_HASH; }
            else { *dst++ = '&'; *dst++ = c; m = NONE; }
            break;

            case AND_HASH:
            translated = hex_to_value(c);
            if (translated != -1) { first_hex = c; m = AND_HASH_EX; }
            else { *dst++ = '&'; *dst++ = '#'; *dst++ = c; m = NONE; }
            break;

            case AND_HASH_EX:
            translated = hex_to_value(c);
            if (translated != -1) {
                second_hex = c;
                translated = hex_to_value(first_hex) << 4 | translated;
                m = AND_HASH_EX_EX;
            } else {
                *dst++ = '&'; *dst++ = '#'; *dst++ = first_hex; *dst++ = c;
                m = NONE;
            }
            break;

            case AND_HASH_EX_EX:
            if (c == ';') { *dst++ = translated; }
            else { 
                *dst++ = '&'; *dst++ = '#';
                *dst++ = first_hex; *dst++ = second_hex; *dst++ = c;
            }
            m = NONE;
            break;
        }
    }
    *dst = '\0'; // terminate the output string
}

Tedious, and way more code than seems reasonable, but not hard :)

2 of 3
1

I'd try to parse the number out from the string and then convert it to a number using atoi and then cast it to a character.

This is something I wrote in ~20 seconds so it's completely contrived:

  char html[] = "&#39;";
  char* pch = &html[2];
  int n = 0;
  char c = 0;

  pch[2] = '\0';
  n = atoi(pch);
  c = n;

now c is '. Also I don't really know about html strings... so I might be missing something
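
In HTML, a numeric character reference is decimal by default (&#39;) and hexadecimal only when written with an x (&#x27;). A shorter way to handle both forms is to let strtol parse the number, along the lines of the atoi suggestion above. This is a sketch under assumptions: decode_numeric_refs is a hypothetical name, only values 1 to 255 are decoded (each to a single byte, with no UTF-8 encoding), named entities such as &amp; are left untouched, and the caller provides a destination buffer at least as large as the source.

```c
#include <ctype.h>
#include <stdlib.h>

/* Decode &#NNN; (decimal) and &#xHH; (hex) references from src into dst.
 * Hypothetical helper: dst must have room for strlen(src) + 1 bytes.
 * Anything that is not a well-formed reference in 1..255 is copied as-is. */
void decode_numeric_refs(char *dst, const char *src) {
    while (*src) {
        if (src[0] == '&' && src[1] == '#') {
            const char *digits = src + 2;
            int base = 10;
            if (*digits == 'x' || *digits == 'X') { base = 16; digits++; }
            /* Require a real digit first so strtol cannot skip spaces or signs. */
            int ok = (base == 16) ? isxdigit((unsigned char)*digits)
                                  : isdigit((unsigned char)*digits);
            if (ok) {
                char *end;
                long value = strtol(digits, &end, base);
                if (*end == ';' && value > 0 && value <= 255) {
                    *dst++ = (char)(unsigned char)value;
                    src = end + 1; /* continue after the ';' */
                    continue;
                }
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

With this, both "it&#39;s" and "it&#x27;s" decode to "it's".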

🌐
W3Schools
w3schools.com › howto › howto_js_temperature_converter.asp
How To Create a Temperature Converter With HTML ...
The table below shows how to convert from Celsius to other temperature measurements: The table below shows how to convert from Kelvin to other temperature measurements: ...
🌐
SEGGER
segger.com › segger - the embedded experts › free utilities › bin2c
Bin2C—Binary to C Converter
Bin2C is a Windows command-line utility which takes HTML or text files as input and converts them to a C array that can be included in target application code.
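
The idea behind tools like this can be sketched in a few lines of C. This is only an illustration, not how Bin2C itself is implemented: write_c_array is a hypothetical function, the 12-bytes-per-line layout and the _len suffix are arbitrary choices, and it assumes len is greater than zero (an empty initializer is not valid in older C standards).

```c
#include <stdio.h>

/* Write `len` bytes of `data` to `out` as a C array definition named `name`,
 * followed by a length constant. Hypothetical helper; assumes len > 0. */
void write_c_array(FILE *out, const char *name,
                   const unsigned char *data, size_t len) {
    fprintf(out, "const unsigned char %s[] = {", name);
    for (size_t i = 0; i < len; i++) {
        /* Start a new indented line every 12 bytes. */
        fprintf(out, "%s0x%02x,", (i % 12 == 0) ? "\n    " : " ", data[i]);
    }
    fprintf(out, "\n};\nconst unsigned int %s_len = %lu;\n",
            name, (unsigned long)len);
}
```

Reading an HTML file into a buffer and passing it to write_c_array(stdout, "index_html", buf, n) would produce a definition that can be pasted into, or included from, firmware source.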
🌐
Web-code-converter
web-code-converter.com
Web-code-converter
Quickly convert HTML, CSS and Javascript into Javascript, Typescript, React, PHP, VBScript, ASP, Perl, Python, Ruby, Lisp and more with the Web Code Converter. Take the hard work out of coding HTML to Javascript. Web designers love the Web Code Converter.
🌐
Sololearn
sololearn.com › en › Discuss › 1551115 › how-to-combine-html-and-c-language
How to combine html and c language?? | Sololearn: Learn to code for FREE!
Maybe this is what you are looking for C: http://www.i-visionblog.com/2014/02/creating-website-using-c-programming.html C++: https://blog.sourcerer.io/building-a-website-with-c-db942c801aee But you cannot create html page like you are used to and just connect C/C++ in the same way as javascript.
🌐
Cplusplus
cplusplus.com › forum › windows › 152782
how to change this html code to c++ code - C++ Forum
However I want to emphasize that if you are using another library to display the image, this suggestion is probably NOT the correct solution. In the case that you don't know what I'm talking about yet, start here: http://msdn.microsoft.com/en-us/library/windows/desktop/aa380599(v=vs.85).aspx A lot of people insist that you should use a wizard to draw up your resource files and they aren't wrong.
🌐
Cprogramming
cboard.cprogramming.com › c-programming › 94727-how-we-can-convert-html-text.html
how we can convert html to text
October 18, 2007 - > plz tell me the function Please look through string.h yourself and read the manual pages. If you at least familiarise yourself with what the standard C library is capable of (no one expects you to remember every last detail), then you won't need so much spoon-feeding in future.
🌐
Aspose
products.aspose.com › aspose.words › c++ › conversion › html to doc in c++
Convert HTML to DOC in C++
This is a professional software solution to import and export HTML, DOC, and many other document formats using C++. ... For C++ developers seeking a seamless solution to convert HTML to DOC, Aspose.Words for C++ provides an intuitive and straightforward file conversion API.
🌐
Tomeko
tomeko.net › online_tools › cpp_text_escape.php
Text -> C/C++ string converter
Converting text into C-like literal, escaping newlines, tab, double quotes, backslash.
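
The escaping described here is simple enough to sketch in C. This is an assumption-laden illustration rather than the tool's actual code: write_c_literal is a hypothetical name, and it handles only the four cases listed (newline, tab, double quote, backslash), passing every other byte through unchanged.

```c
#include <stdio.h>

/* Write `text` to `out` as a double-quoted C string literal.
 * Hypothetical helper; escapes only \n, \t, '"' and '\\'. */
void write_c_literal(FILE *out, const char *text) {
    fputc('"', out);
    for (; *text; text++) {
        switch (*text) {
            case '\n': fputs("\\n", out);  break;
            case '\t': fputs("\\t", out);  break;
            case '"':  fputs("\\\"", out); break;
            case '\\': fputs("\\\\", out); break;
            default:   fputc(*text, out);  break;
        }
    }
    fputc('"', out);
}
```

This is enough for embedding a small HTML fragment, where attribute quotes and line breaks are the usual troublemakers.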
🌐
Hermetic
hermetic.ch › c2html › c2html.htm
C Code to HTML Converter
If you wish to compare corresponding lines in the C/C++ file and the HTML file then check the Confirm each line checkbox and click again on the Display HTML file button. Each line of the C/C++ file which has been changed will then be displayed, as in: You can test the trial version with any C or C++ source code file, but it will convert ...
🌐
CodingFleet
codingfleet.com › code-converter › c++
Convert Your Code to C++ - CodingFleet
C++ Code Converter - this online AI-powered tool can convert any code to C++. Enjoy seamless conversions and unlock cross-platform development like never before.
🌐
Reddit
reddit.com › r/c_programming › generate html in c
r/C_Programming on Reddit: Generate HTML in C
February 25, 2023 -

I was trying to find a way, both elegant and simple, to generate html pages in C when I finally came up with this solution, using open_memstream, curly braces and some macros...

EDIT: updated with Eternal_Weeb's comment.

#include <stdio.h>
#include <stdlib.h>

#include "html_tags.h"

typedef struct {
  char *user_name;
  int task_count;
  char **tasks;
} user_tasks;

void user_tasks_html(FILE *fp, user_tasks *data) {
  {
    DOCTYPE;
    HTML("en") {
      HEAD() {
        META("charset='utf-8'");
        META("name='viewport' "
             "content='width=device-width, initial-scale=1'");
        TITLE("Index page");
        META("name='description' content='Description'");
        META("name='author' content='Author'");
        META("property='og:title' content='Title'");
        LINK("rel='icon' href='/favicon.svg' type='image/svg+xml'");
        LINK("rel='stylesheet' href='css/styles.css'");
      }
      BODY("") {
        DIV("id='main'") {
          H1("id='title'") { _("Hello %s", data->user_name); }
          if (data->task_count > 0) {
            UL("class='default'") {
              for (int i = 0; i < data->task_count; i++) {
                LI("class='default'") {
                  _("Task %d: %s", i + 1, data->tasks[i]);
                }
              }
            }
          }
        }
      }
      SCRIPT("js/main.js");
    }
  }
}

int main(void) {
  user_tasks data;
  {
    data.user_name = "John";
    data.task_count = 3;
    data.tasks = calloc(data.task_count, sizeof(char *));
    {
      data.tasks[0] = "Feed the cat";
      data.tasks[1] = "Clean the room";
      data.tasks[2] = "Go to the gym";
    }
  }
  char *html;
  size_t html_size;
  FILE *fp;
  fp = open_memstream(&html, &html_size);
  if (fp == NULL) {
    return 1;
  }
  user_tasks_html(fp, &data);
  fclose(fp);
  printf("%s\n", html);
  printf("%zu bytes\n", html_size);
  free(html);
  free(data.tasks);
  return 0;
}

html_tags.h:

#ifndef HTML_TAGS_H_
#define HTML_TAGS_H_

#define SCOPE(atStart, atEnd) for (int _scope_break = ((atStart), 1); _scope_break; _scope_break = ((atEnd), 0))

#define DOCTYPE fputs("<!DOCTYPE html>", fp)
#define HTML(lang) SCOPE(fprintf(fp, "<html lang='%s'>", lang), fputs("</html>", fp))
#define HEAD() SCOPE(fputs("<head>", fp), fputs("</head>",fp))
#define TITLE(text) fprintf(fp, "<title>%s</title>", text)
#define META(attributes) fprintf(fp, "<meta %s>", attributes)
#define LINK(attributes) fprintf(fp, "<link %s>", attributes)
#define SCRIPT(src) fprintf(fp, "<script src='%s'></script>", src)
#define BODY(attributes) SCOPE(fprintf(fp, "<body %s>", attributes), fputs("</body>", fp))
#define DIV(attributes) SCOPE(fprintf(fp, "<div %s>", attributes), fputs("</div>", fp))
#define UL(attributes) SCOPE(fprintf(fp, "<ul %s>", attributes), fputs("</ul>", fp))
#define OL(attributes) SCOPE(fprintf(fp, "<ol %s>", attributes), fputs("</ol>", fp))
#define LI(attributes) SCOPE(fprintf(fp, "<li %s>", attributes), fputs("</li>", fp))
#define BR fputs("<br>", fp)
#define _(...) fprintf(fp, __VA_ARGS__)
#define H1(attributes) SCOPE(fprintf(fp, "<h1 %s>", attributes), fputs("</h1>", fp))
#define H2(attributes) SCOPE(fprintf(fp, "<h2 %s>", attributes), fputs("</h2>", fp))
#define H3(attributes) SCOPE(fprintf(fp, "<h3 %s>", attributes), fputs("</h3>", fp))
#define H4(attributes) SCOPE(fprintf(fp, "<h4 %s>", attributes), fputs("</h4>", fp))
#define H5(attributes) SCOPE(fprintf(fp, "<h5 %s>", attributes), fputs("</h5>", fp))
#define H6(attributes) SCOPE(fprintf(fp, "<h6 %s>", attributes), fputs("</h6>", fp))
#define P(content) fprintf(fp, "<p>%s</p>", content)
#define A(href, content) fprintf(fp, "<a href='%s'>%s</a>", href, content)
#define IMG(attributes) fprintf(fp, "<img %s>", attributes)
#define HR fputs("<hr/>", fp)
#define TABLE(attributes) SCOPE(fprintf(fp, "<table %s>", attributes), fputs("</table>", fp))
#define TR(attributes) SCOPE(fprintf(fp, "<tr %s>", attributes), fputs("</tr>", fp))
#define TD(attributes) SCOPE(fprintf(fp, "<td %s>", attributes), fputs("</td>", fp))
#define TH(attributes) SCOPE(fprintf(fp, "<th %s>", attributes), fputs("</th>", fp))
#define FORM(attributes) SCOPE(fprintf(fp, "<form %s>", attributes), fputs("</form>", fp))
#define INPUT(attributes) fprintf(fp, "<input %s>", attributes)
#define OPTION(attributes, content) fprintf(fp, "<option %s>%s</option>", attributes, content)

#endif
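
One thing the macros above leave to the caller is escaping: _("Hello %s", data->user_name) writes the string as-is, so a name containing < or & would end up as markup. A possible companion helper is sketched below. It is not part of the original post: fputs_html_escaped is a hypothetical name, and the choice to escape both quote characters is an assumption made so the same helper can be used inside attribute values.

```c
#include <stdio.h>

/* Write `text` to `fp`, replacing characters that are special in HTML
 * text and attribute values with entity references. Hypothetical helper. */
void fputs_html_escaped(FILE *fp, const char *text) {
    for (; *text; text++) {
        switch (*text) {
            case '&':  fputs("&amp;", fp);  break;
            case '<':  fputs("&lt;", fp);   break;
            case '>':  fputs("&gt;", fp);   break;
            case '"':  fputs("&quot;", fp); break;
            case '\'': fputs("&#39;", fp);  break;
            default:   fputc(*text, fp);    break;
        }
    }
}
```

Inside a tag block it could stand in for _ where the data is untrusted, for example: H1("id='title'") { fputs("Hello ", fp); fputs_html_escaped(fp, data->user_name); }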