Key Capabilities
OCR Processing
Extract text from scanned documents and images.
Multiple Formats
Works with PDF, PNG, and JPEG (.jpeg or .jpg) files.
Page-by-Page Output
For multi-page documents, get text from each page separately.
Ready for AI
Output flows directly to AI tasks for analysis.
When to Use It
Use Opus Text Extraction when you need to convert documents or images into text your workflow can process:
- Pull text from scanned invoices or receipts
- Extract text from screenshots or photos
- Convert PDFs into text for AI analysis
- Digitize paper documents for processing
Opus Text Extraction outputs raw text. To extract specific fields (like amounts from invoices), follow it with an Opus Agent task to parse and structure the text.
Supported File Types
| Format | Extensions |
|---|---|
| PDF | .pdf |
| PNG | .png |
| JPEG | .jpeg, .jpg |
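As a rough illustration, a file could be checked against the table above before it reaches the task. The helper name `is_supported_file` is hypothetical, and treating extensions as case-insensitive is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extensions copied from the Supported File Types table.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpeg", ".jpg"}


def is_supported_file(filename: str) -> bool:
    """Hypothetical pre-check: True if the extension is in the supported set.

    Assumption: the comparison ignores case, so "scan.PDF" counts as a PDF.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

A check like this only looks at the file name; it does not confirm that the file contents match the extension.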
Input
The File input accepts the document to extract text from. Only the File (single) type is supported; multiple files are not currently supported.
Synchronous Extraction
The Synchronous Extraction toggle (off by default) controls how the extraction is processed:
- Off (Asynchronous): Supports multi-page PDFs. The extraction runs in the background and returns results when it completes. Use this for documents with multiple pages or for larger files.
- On (Synchronous): Returns results immediately, but supports only single-page documents. If you pass a multi-page PDF with synchronous extraction enabled, the extraction fails.
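The rule behind the toggle can be sketched as a small decision function. This is a minimal illustration, not part of the product: `can_use_synchronous` is a hypothetical name, and the page count is assumed to be known to the caller already.

```python
def can_use_synchronous(page_count: int) -> bool:
    """Hypothetical helper: is it safe to turn Synchronous Extraction on?

    Synchronous mode supports single-page documents only, so anything
    with more than one page should stay on the asynchronous default.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    return page_count == 1
```

When the page count is not known in advance, leaving the toggle off avoids the failure case for multi-page PDFs.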
How to Add Opus Text Extraction
Drop it into your workflow
Drag an Opus Text Extraction task into your workflow where you need to convert documents to text.
Connect the input file
Link the file input to a source—like a Workflow Input, Import Data task, or another task’s file output.
Name the output
Give the extracted text output a meaningful name so you can reference it in later tasks.
Connect to downstream tasks
Wire the text output to tasks that will process the content—like Opus Agent or Custom Agent.
Tips for Better Results
Use high-quality source documents
OCR accuracy depends on input quality:
- Use high-resolution scans (300 DPI or higher)
- Make sure documents are properly aligned
- Avoid blurry or distorted images
- Ensure good contrast between text and background
Validate extracted text
OCR isn’t perfect—plan for errors:
- Use Agentic Review to check extraction quality
- Add Human Review for critical documents
- Handle low-confidence extractions appropriately
Structure text with downstream agents
Opus Text Extraction outputs raw text:
- Use Opus Agent to parse into structured fields
- Use Custom Agent to extract specific data points
- Validate structure before further processing
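To make "validate structure" concrete, here is a minimal sketch of a check that could run on the fields a downstream agent produces. Everything in it is an assumption for illustration: the field names `invoice_number` and `total_amount`, the expected types, and the idea that the structured result arrives as a dictionary.

```python
from typing import Any, Dict, List

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"invoice_number": str, "total_amount": float}


def validate_fields(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to route a record to review with the full set of issues attached.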