This might be a possibility (I did not test it, but it might work): it uses chafa, a program for viewing pictures inside your terminal… I have no idea whether GPT-4 can read that, but it might be worth a shot. import os import subprocess im… Answer from jochenschultz on community.openai.com
🌐
OpenAI
platform.openai.com › docs › guides › images-vision
Images and vision | OpenAI API
The OpenAI API offers several endpoints to process images as input or generate them as output, enabling you to build powerful multimodal applications. To learn more about the input and output modalities supported by our models, refer to our ...
🌐
OpenAI Developer Community
community.openai.com › api
Image to text description in the API? - API - OpenAI Developer Community
November 7, 2023 - Hi I’ve been using some other image to text models out there. I have been really amazed by the image description feature of chatgpt. I understood in yesterday’s keynote that the feature would finally be available in t…
Discussions

How to extract text from images using API?
Hello, I am a beginner in OpenAI. I am creating a project, where I want to be able to extract data from invoices as images. Now, I am stuck at extracting text from a photo. In documentation for Vision, I see that the model used is 4o-mini, and the photo was uploaded as a base64. More on community.openai.com
January 27, 2025
How to Programmatically Extract Text from Images Using GPT-4
Hi All, I am trying to read a list of images from my local directory and want to extract the text from those images using GPT-4 in a Python script. However, I found that there is no direct endpoint for image input. Although I can upload images in the chat using GPT-4, my question is: how can ... More on community.openai.com
September 22, 2024
OCR using API for text extraction
I have a PDF. It’s all images. I want to use ChatGPT’s API to perform OCR on it and extract the text into a .json file and a .txt file. Is OCR only for customers who have taken the subscription? What model should I use? If anyone knows any blogs, YouTube videos or GitHub repos ... More on community.openai.com
August 4, 2024
API playground - image generation
I am using the playground to test features. I can easily use the chat or assistant part, but when I ask for image generation, I get the answer that it cannot generate images. (I tried different models including ChatGPT-4o) So I understand that, if I need to create an image using the API I ... More on community.openai.com
May 31, 2024
🌐
OpenAI
platform.openai.com › docs › guides › image-generation
Image generation | OpenAI API
1 week ago - Learn how to generate or edit images. ... The OpenAI API lets you generate and edit images from text prompts, using GPT Image or DALL·E models.
🌐
Microsoft Learn
learn.microsoft.com › en-us › azure › ai-foundry › openai › dall-e-quickstart
Quickstart: Generate images with Azure OpenAI in Microsoft Foundry Models - Azure OpenAI | Microsoft Learn
September 30, 2025 - Enter your endpoint URL and key in the appropriate fields. Change the value of prompt to your preferred text. # Azure OpenAI metadata variables $openai = @{ api_base = $Env:AZURE_OPENAI_ENDPOINT api_version = '2023-06-01-preview' # This can change in the future.
🌐
OpenAI Developer Community
community.openai.com › api
How to extract text from images using API? - API - OpenAI Developer Community
January 27, 2025 - Hello, I am a beginner in OpenAI. I am creating a project, where I want to be able to extract data from invoices as images. Now, I am stuck at extracting text from a photo. In documentation for Vision, I see that the mod…
🌐
OpenAI Developer Community
community.openai.com › api
OCR using API for text extraction - API - OpenAI Developer Community
August 4, 2024 - I have a PDF. It’s all images. I want to use ChatGPT’s API to perform OCR on it and extract the text into a .json file and a .txt file. Is OCR only for customers who have taken the subscription? What model shoul…
Find elsewhere
🌐
Medium
medium.com › @tejaswi_kashyap › from-image-to-data-automating-text-extraction-with-openai-api-83de9be585c7
From Image to Data: Automating Text Extraction with OpenAI Api | by Tejaswi kashyap | Medium
October 5, 2024 - In this project, we use OpenAI’s ... The idea is simple: feed in an image, extract its text, and classify the content according to specific questions, like identifying active ingredients or usage warnings....
🌐
OpenAI
platform.openai.com › docs › guides › text
Text generation | OpenAI API
The substitution values can either be strings, or other Response input message types like input_image or input_file. See the full API reference. ... import OpenAI from "openai"; const client = new OpenAI(); const response = await client.responses.create({ model: "gpt-5", prompt: { id: "pmpt_abc123", version: "2", variables: { customer_name: "Jane Doe", product: "40oz juice box" } } }); console.log(response.output_text);
🌐
OpenAI
openai.com › index › image-generation-api
Introducing our latest image generation model in the API | OpenAI
Today, we’re bringing the natively multimodal model that powers this experience in ChatGPT to the API via gpt-image-1, enabling developers and businesses to easily integrate high-quality, professional-grade image generation directly into their own tools and platforms. The model’s versatility allows it to create images across diverse styles, faithfully follow custom guidelines, leverage world knowledge, and accurately render text—unlocking countless practical applications across multiple domains.
🌐
OpenAI
platform.openai.com › docs › guides › images
OpenAI API Documentation - DALL-E Image Generation
Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.
🌐
OpenAI Developer Community
community.openai.com › api
OpenAI API for image text extraction - API - OpenAI Developer Community
November 13, 2023 - Hi guys, Can I use current OpenAI API to upload jpeg or PDF file and extract contextual data in JSON format. In our case we have scanned purchase bills which need to be parsed into our local database.
🌐
OpenAI Developer Community
community.openai.com › api
Image to text description - API - OpenAI Developer Community
July 19, 2023 - I'm trying to incorporate image detection into my Discord bot. I want it to function the way it does in the GPT-4 browser page: you give it an image and it tells you what the image is. Here is my current GPT-4 Discord bot, very simply. How do I let users give the bot an image to describe? import os import discord import openai from discord.ext import commands from dotenv import load_dotenv load_dotenv() TOKEN = os.getenv('DISCORD_BOT_TOKEN') intents = discord.Intents.d...
🌐
Reddit
reddit.com › r/openai › best text to image model with api at the moment?
r/OpenAI on Reddit: Best text to image model with API at the moment?
May 31, 2025 -

I just need good-quality blog-style images, but all the models I've tested very often render letters, numbers, and symbols incorrectly.

Is there any image model which handles these without issues? I'm currently using Flux, and even though it's quite good, it can't be automated due to these quality issues.

🌐
OpenAI
openai.com › index › introducing-4o-image-generation
Introducing 4o Image Generation | OpenAI
From logos to diagrams, images can convey precise meaning when augmented with symbols that refer to shared language and experience. GPT‑4o image generation excels at accurately rendering text, precisely following prompts, and leveraging 4o’s inherent knowledge base and chat context—including transforming uploaded images or using them as visual inspiration.
🌐
OpenAI
openai.com › index › dall-e-2
DALL·E 2 | OpenAI
DALL·E 2 is an AI system that can create realistic images and art from a description in natural language.
🌐
Reddit
reddit.com › r/openai › how to use the new image generation ai via api?
How to use the new image generation AI via API? : r/OpenAI
March 27, 2025 - I wonder how much they are going to charge for each image. ... I'm getting real tired of OpenAI moving new features and functions out to ChatGPT instead of people who are paying for what we use. ... this tbh, we've had to resort to APIs like piapi that seem to have figured out a wrapper around ChatGPT's API but their API is pretty unstable, sometimes consuming quota without generating images for our apps, but they do disclaim that the API is unstable.
🌐
Reddit
reddit.com › r/openai › ow to send image and text together to gpt-4 api for ocr and code diffing?
r/OpenAI on Reddit: How to Send Image and Text Together to GPT-4 API for OCR and Code Diffing?
November 12, 2024 -

I'm working on a Python project where I need to perform Optical Character Recognition (OCR) on screenshots and then compare the extracted text with a source code file to identify any differences. I'm leveraging OpenAI's GPT-4 model via the ChatCompletion API to handle both tasks.

Here's what I'm trying to achieve:

  1. Capture a Screenshot: Use pyautogui to capture a specific region of the screen and save it as an image file.

  2. Encode the Image: Convert the captured image to a base64-encoded string to send it via the API.

  3. Prepare and Send the Prompt: Send both the encoded image data and the source code text to GPT-4 in a single prompt, instructing it to perform OCR on the image and then compare the extracted text with the source code to identify any differences.
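As a point of reference for step 2, here is a minimal sketch of encoding an image file as a base64 data URL, which is the form the vision documentation describes for inline images. The helper name `encode_image_as_data_url` and the PNG fallback are assumptions for illustration only.

```python
import base64
import mimetypes
from pathlib import Path


def encode_image_as_data_url(image_path: str) -> str:
    """Read an image file and return it as a base64 data URL."""
    path = Path(image_path)
    # Guess the MIME type from the file extension; fall back to PNG (assumption).
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        mime_type = "image/png"
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

Keeping the MIME type in the prefix means the same helper works for PNG and JPEG screenshots without further changes.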

Here's a snippet of my current implementation:

import openai
import base64
from PIL import Image
import io

# Set your OpenAI API key
openai.api_key = 'YOUR_API_KEY'

def capture_and_compare(image_path, source_text):
    # Read and encode the image
    with open(image_path, 'rb') as image_file:
        image_bytes = image_file.read()
    encoded_image = base64.b64encode(image_bytes).decode('utf-8')
    
    # Prepare the prompt with image data and source code
    messages = [
        {
            "role": "system",
            "content": "You are an AI assistant that performs OCR on images and compares the extracted text with provided source code."
        },
        {
            "role": "user",
            "content": (
                f"Task: Perform OCR on the following image and compare the extracted text with the provided source code to identify differences.\n\n"
                f"Source Code:\n```python\n{source_text}\n```\n\n"
                f"Image Data (base64): {encoded_image}\n\n"
                f"Please extract the text from the image and provide a line-by-line diff with the source code."
            )
        }
    ]
    
    # Make the API call to GPT-4
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=messages,
        temperature=0
    )
    
    # Extract and return the response content using attribute access
    return response.choices[0].message.content.strip()

# Example source code for comparison
source_code = """
def add(a, b):
    return a + b

def subtract(a, b):
    return a - b
"""

# Perform OCR and comparison on a sample image
diff_result = capture_and_compare('screenshot.png', source_code)
print(diff_result)

Issue I'm Facing:

When I run this script, instead of receiving actual OCR results and a meaningful diff between the image and the source code, GPT-4 responds by simulating the OCR process without performing it. Here's an example of the response I receive:

---- Top Half Differences ----
To perform the task, I will first extract the text from the provided image data using Optical Character Recognition (OCR). Then, I will compare the extracted text with the provided source code line by line to identify any differences. Let's proceed with the OCR and comparison:

### OCR Text Extraction
(Note: The OCR process is simulated here as I cannot directly process images. The extracted text is assumed to be similar to the source code with potential OCR errors.)

Question:

What is the correct method to send both image data and text in the same prompt to GPT-4 via the OpenAI API to perform OCR and then a code diff?
I'm aiming to have GPT-4 extract text from an image and compare it with a given source code file to identify any discrepancies. Is there a specific format, additional parameters, or a different approach I should use to achieve this effectively? My current method doesn't seem to enable GPT-4 to perform actual OCR on the image data provided.
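One commonly documented approach, sketched here under assumptions, is to pass the image as a separate `image_url` content part next to the text instead of pasting the base64 string into the prompt, and to use a vision-capable model. The model name `gpt-4o`, the helper names, and the prompt wording are assumptions. The `chat.completions.create` call requires version 1.0 or later of the `openai` Python package and an `OPENAI_API_KEY` environment variable.

```python
from typing import Any, Dict, List


def build_ocr_diff_messages(image_data_url: str, source_text: str) -> List[Dict[str, Any]]:
    """Build a chat payload with the text and the image as separate content parts."""
    instructions = (
        "Extract the text from the attached image, then compare it with the "
        "source code below and report the differences line by line.\n\n"
        f"Source code:\n{source_text}"
    )
    return [
        {
            "role": "system",
            "content": "You transcribe text from images and compare it with source code.",
        },
        {
            "role": "user",
            # A list of parts lets the model receive the image as an image,
            # not as a long base64 string inside the prompt text.
            "content": [
                {"type": "text", "text": instructions},
                {"type": "image_url", "image_url": {"url": image_data_url}},
            ],
        },
    ]


def run_ocr_diff(image_data_url: str, source_text: str, model: str = "gpt-4o") -> str:
    """Send the payload to the API (assumes openai>=1.0 and OPENAI_API_KEY is set)."""
    from openai import OpenAI  # imported lazily so the builder works without the package

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,  # assumption: any vision-capable model
        messages=build_ocr_diff_messages(image_data_url, source_text),
        temperature=0,
    )
    return response.choices[0].message.content or ""
```

Here `image_data_url` is expected to be a string of the form `data:image/png;base64,...` built from the screenshot bytes.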

Accuracy is very important
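
Since accuracy matters, one option (an assumption about the design, not something the API requires) is to ask the model only for the transcription and compute the diff locally with Python's standard `difflib`, so the comparison itself is deterministic. The function name `line_diff` is hypothetical.

```python
import difflib


def line_diff(source_text: str, ocr_text: str) -> str:
    """Return a unified line-by-line diff from the source code to the OCR text."""
    diff = difflib.unified_diff(
        source_text.splitlines(),
        ocr_text.splitlines(),
        fromfile="source",
        tofile="ocr",
        lineterm="",  # the lines have no trailing newlines, so add none to headers
    )
    return "\n".join(diff)
```

An empty result means the transcription matches the source line for line. Any remaining differences come from the OCR step, not from the comparison.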