gemini-docs/latest/content · Jun 26, 14:03 UTC

pages/document-processing.txt

TXT18.7 KB250 lines

route: /gemini-api/docs/document-processing
title: Document understanding
description: Learn how to use the Gemini API to process documents like PDFs

Note: This version of the page covers the Interactions API. You can use the toggle on this page to switch to the generateContent API version of this page.
Gemini models can process documents in PDF format, using native
vision to understand entire document contexts. This goes beyond
just text extraction, allowing Gemini to:
Analyze and interpret content, including text, images, diagrams,
charts, and tables, even in long documents up to 1000 pages.
Extract information into structured output formats.
Summarize and answer questions based on both the visual and textual elements
in a document.
Transcribe document content (e.g. to HTML), preserving layouts and
formatting, for use in downstream applications.
You can also pass non-PDF documents in the same way, but Gemini treats them
as plain text, which discards context such as charts and formatting.
Passing PDF data inline
You can pass PDF data inline in the request. This is best
suited for smaller documents or temporary processing where you don't need to
reference the file in subsequent requests. We recommend using the
Files API
for larger documents that you need to refer to in multi-turn interactions to
improve request latency and reduce bandwidth usage.
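The choice between inline data and the Files API described above can be sketched as a small decision helper. This is only an illustration: `choose_delivery` and `ASSUMED_INLINE_LIMIT_BYTES` are hypothetical names, and the 10 MiB cutoff is an arbitrary assumption, not a documented API limit.

```python
from enum import Enum


class Delivery(Enum):
    INLINE = "inline"
    FILES_API = "files_api"


# Hypothetical cutoff chosen for illustration only; it is NOT an API limit.
ASSUMED_INLINE_LIMIT_BYTES = 10 * 1024 * 1024


def choose_delivery(
    size_bytes: int,
    reused_later: bool,
    inline_limit_bytes: int = ASSUMED_INLINE_LIMIT_BYTES,
) -> Delivery:
    """Pick inline data for small one-off documents, the Files API otherwise."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Reuse across requests or a large payload both favor a single upload.
    if reused_later or size_bytes > inline_limit_bytes:
        return Delivery.FILES_API
    return Delivery.INLINE


print(choose_delivery(200_000, reused_later=False).value)
print(choose_delivery(200_000, reused_later=True).value)
```

Tune the cutoff to your own latency and bandwidth measurements rather than treating the default as a recommendation.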
The following example shows you how to pass PDF data inline:
Python
from google import genai
import base64

client = genai.Client()

# Read the PDF as raw bytes so it can be base64-encoded for the request.
with open('path/to/document.pdf', 'rb') as f:
    pdf_bytes = f.read()

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {
            "type": "document",
            "data": base64.b64encode(pdf_bytes).decode('utf-8'),
            "mime_type": "application/pdf"
        },
        {"type": "text", "text": "Summarize this document"}
    ]
)
print(interaction.output_text)
JavaScript
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

const ai = new GoogleGenAI({});

async function main() {
  // Read the PDF directly as a base64 string.
  const pdfData = fs.readFileSync("path/to/document.pdf", {
    encoding: "base64"
  });

  const interaction = await ai.interactions.create({
    model: "gemini-3.5-flash",
    input: [
      { type: "text", text: "Summarize this document" },
      {
        type: "document",
        data: pdfData,
        mime_type: "application/pdf"
      }
    ]
  });
  console.log(interaction.output_text);
}

main();
REST
PDF_PATH="path/to/document.pdf"

# Pick the base64 flags for the local implementation (FreeBSD/macOS vs. GNU).
if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

curl -X POST "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gemini-3.5-flash",
    "input": [
      {
        "type": "document",
        "data": "'$(base64 $B64FLAGS $PDF_PATH)'",
        "mime_type": "application/pdf"
      },
      {"type": "text", "text": "Summarize this document"}
    ]
  }'
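The inline examples above all build the same two-part input: a base64-encoded document part followed by a text prompt. A minimal sketch of that step as reusable helpers might look like the following. The helper names are hypothetical, and the `%PDF-` check assumes a well-formed PDF whose header starts at byte 0.

```python
import base64

# Assumption: a well-formed PDF begins with this header at offset 0.
PDF_MAGIC = b"%PDF-"


def build_inline_document_part(pdf_bytes: bytes) -> dict:
    """Return a document part shaped like the ones in the inline examples."""
    if not pdf_bytes.startswith(PDF_MAGIC):
        raise ValueError("data does not look like a PDF (missing %PDF- header)")
    return {
        "type": "document",
        "data": base64.b64encode(pdf_bytes).decode("utf-8"),
        "mime_type": "application/pdf",
    }


def build_input(pdf_bytes: bytes, prompt: str) -> list:
    """Combine the document part with a text prompt, document first."""
    return [
        build_inline_document_part(pdf_bytes),
        {"type": "text", "text": prompt},
    ]


# Tiny stand-in payload; a real call would read the bytes from a PDF file.
parts = build_input(b"%PDF-1.7\n%placeholder", "Summarize this document")
print([part["type"] for part in parts])
```

The resulting list can be passed as the `input` argument shown in the Python example above. Rejecting non-PDF bytes early avoids sending a payload labeled `application/pdf` that is actually something else.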
You can also upload a local PDF file for processing:
Python
from google import genai

client = genai.Client()

# Upload the local file, then reference it by URI in the request.
uploaded_file = client.files.upload(file="file.pdf")

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {"type": "document", "uri": uploaded_file.uri, "mime_type": uploaded_file.mime_type},
        {"type": "text", "text": "Summarize this document"}
    ]
)
print(interaction.output_text)
JavaScript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

async function main() {
  // The JavaScript SDK uses camelCase for file fields (mimeType).
  const uploadedFile = await ai.files.upload({
    file: "file.pdf",
    config: { mimeType: "application/pdf" }
  });

  const interaction = await ai.interactions.create({
    model: "gemini-3.5-flash",
    input: [
      { type: "text", text: "Summarize this document" },
      {
        type: "document",
        uri: uploadedFile.uri,
        mime_type: uploadedFile.mimeType
      }
    ]
  });
  console.log(interaction.output_text);
}

main();
Uploading PDFs using the Files API
We recommend using the Files API for larger files or when you intend to reuse a
document across multiple requests. This improves request latency and reduces
bandwidth usage by decoupling the file upload from the model requests.
Note: The Files API is available at no cost in all regions where the Gemini API is
available. Uploaded files are stored for 48 hours.
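Because uploaded files are stored for 48 hours, a long-running application may want to track when a file reference stops being usable. The sketch below assumes you record the upload time yourself. `needs_reupload` and its five-minute safety margin are hypothetical choices, not part of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention window taken from the note above.
FILE_RETENTION = timedelta(hours=48)


def expires_at(uploaded_at: datetime) -> datetime:
    """Return the moment the stored file is expected to be removed."""
    return uploaded_at + FILE_RETENTION


def needs_reupload(
    uploaded_at: datetime,
    now: Optional[datetime] = None,
    safety_margin: timedelta = timedelta(minutes=5),  # assumption, tune freely
) -> bool:
    """True when the file has expired or will expire within the margin."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at(uploaded_at) - safety_margin


uploaded = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expires_at(uploaded).isoformat())
print(needs_reupload(uploaded, now=uploaded + timedelta(hours=47)))
```

Use timezone-aware timestamps consistently. Mixing naive and aware `datetime` values raises a `TypeError` on comparison.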
Large PDFs from URLs
Use the Files API to simplify uploading and processing large PDF files from URLs:
Python
from google import genai
import io
import httpx

client = genai.Client()

long_context_pdf_path = "https://arxiv.org/pdf/2312.11805"

# Download the PDF into memory and upload it through the Files API.
doc_io = io.BytesIO(httpx.get(long_context_pdf_path).content)

sample_doc = client.files.upload(
    file=doc_io,
    config=dict(
        mime_type='application/pdf')
)

prompt = "Summarize this document"

interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input=[
        {"type": "document", "uri": sample_doc.uri, "mime_type": sample_doc.mime_type},
        {"type": "text", "text": prompt}
    ]
)
print(interaction.output_text)
JavaScript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});

async function main() {
  const pdfBuffer = await fetch("https://arxiv.org/pdf/2312.11805")
    .then((response) => response.arrayBuffer());

  const fileBlob = new Blob([pdfBuffer], { type: 'application/pdf' });

  const file = await ai.files.upload({
    file: fileBlob,
    config: {
      displayName: 'A17_FlightPlan.pdf',
    },
  });

  // Poll until the file leaves the PROCESSING state.
  let getFile = await ai.files.get({ name: file.name });
  while (getFile.state === 'PROCESSING') {
    console.log(`current file status: ${getFile.state}`);
    console.log('File is still processing, retrying in 5 seconds');
    await new Promise((resolve) => {
      setTimeout(resolve, 5000);
    });
    getFile = await ai.files.get({ name: file.name });
  }
  // Check the refreshed status, not the stale object returned by upload().
  if (getFile.state === 'FAILED') {
    throw new Error('File processing failed.');
  }

  const interaction = await ai.interactions.create({
    model: 'gemini-3.5-flash',
    input: [
      { type: "document", uri: file.uri, mime_type: file.mimeType },
      { type: "text", text: "Summarize this document" }
    ],
  });
  console.log(interaction.output_text);
}

main();
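The JavaScript example above waits for the uploaded file to leave the `PROCESSING` state before using it. A Python sketch of the same polling pattern follows. To keep it runnable without network access, the status lookup is injected as a callable. `wait_until_processed`, its timeout, and the `ACTIVE` value in the demo are assumptions made for illustration.

```python
import time
from typing import Callable


def wait_until_processed(
    get_state: Callable[[], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,  # assumption: give up after five minutes
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it stops returning 'PROCESSING'."""
    waited = 0.0
    state = get_state()
    while state == "PROCESSING":
        if waited >= timeout_seconds:
            raise TimeoutError("file is still processing after the timeout")
        sleep(poll_seconds)
        waited += poll_seconds
        state = get_state()
    # Inspect the freshly fetched state, not the one seen at upload time.
    if state == "FAILED":
        raise RuntimeError("File processing failed.")
    return state


# Demo with a scripted sequence of states instead of real API calls.
scripted = iter(["PROCESSING", "PROCESSING", "ACTIVE"])
final_state = wait_until_processed(lambda: next(scripted), sleep=lambda _: None)
print(final_state)
```

In real use, `get_state` would wrap a status lookup for the uploaded file, assuming the value it returns compares equal to the plain strings used here.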
REST
PDF_PATH="https://arxiv.org/pdf/2312.11805"
DISPLAY_NAME="Gemini_paper"
PROMPT="Summarize this document"

# Download the PDF from the provided URL
wget -O "${DISPLAY_NAME}.pdf" "${PDF_PATH}"

MIME_TYPE=$(file -b --mime-type "${DISPLAY_NAME}.pdf")
NUM_BYTES=$(wc -c < "${DISPLAY_NAME}.pdf")

echo "MIME_TYPE: ${MIME_TYPE}"
echo "NUM_BYTES: ${NUM_BYTES}"

tmp_header_file=upload-header.tmp

# Initial resumable request defining metadata.
# The upload URL is in the response headers; dump them to a file.
curl "https://generativelanguage.googleapis.com/upload/v1beta/files?key=${GEMINI_API_KEY}" \
  -D "${tmp_header_file}" \
  -H "X-Goog-Upload-Protocol: resumable" \
  -H "X-Goog-Upload-Command: start" \
  -H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
  -H "Content-Type: application/json" \
  -d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null

upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
rm "${tmp_header_file}"

# Upload the actual bytes.
curl "${upload_url}" \
  -H "Content-Length: ${NUM_BYTES}" \
  -H "X-Goog-Upload-Offset: 0" \
  -H "X-Goog-Upload-Command: upload, finalize" \
  --data-binary "@${DISPLAY_NAME}.pdf" 2> /dev/null > file_info.json

file_uri=$(jq -r ".file.uri" file_info.json)
echo "file_uri: ${file_uri}"

# Write the request payload to a file to avoid shell-quoting problems
cat << EOF > payload.json
{
  "model": "gemini-3.5-flash",
  "input": [
    {"type": "text", "text": "${PROMPT}"},
    {"type": "document", "uri": "${file_uri}", "mime_type": "application/pdf"}
  ]
}
EOF

# Now create an interaction using that file
curl "https://generativelanguage.googleapis.com/v1beta/interactions" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d @payload.json 2> /dev/null > response.json

cat response.json
echo

# Print the text of the first content item in the last step
jq ".steps[-1].content[0].text" response.json

# Clean up
rm "${DISPLAY_NAME}.pdf"
rm payload.json
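The REST example above reads the model's text with the `jq` path `.steps[-1].content[0].text`. A Python sketch of the same lookup is shown below. The sample response is hypothetical and contains only the fields that path implies, so a real response may carry additional fields. `last_step_text` is an illustrative helper name.

```python
import json


def last_step_text(response: dict) -> str:
    """Return the text of the first content item in the last step."""
    steps = response.get("steps") or []
    if not steps:
        raise ValueError("response has no steps")
    content = steps[-1].get("content") or []
    if not content or "text" not in content[0]:
        raise ValueError("last step has no text content")
    return content[0]["text"]


# Hypothetical response body, shaped only by the jq path used above.
raw = json.dumps({
    "steps": [
        {"content": [{"type": "text", "text": "Intermediate note"}]},
        {"content": [{"type": "text", "text": "Final summary"}]},
    ]
})
print(last_step_text(json.loads(raw)))
```

Raising on a missing field, rather than returning an empty string, makes an unexpected response shape visible immediately.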
Large PDFs stored locally
Python
from google import genai
import pathlib

client = genai.Client()

file_path = pathlib.Path('large_file.pdf')

sample_file = client.files.upload(
    file=file_path,
)
interaction = client.interactions.create(
model="ge
…
