Red Hat OpenShift
You can use the Unstructured Partition Endpoint on Red Hat OpenShift. The Unstructured Partition Endpoint is intended for rapid prototyping of Unstructured’s various partitioning strategies, with limited support for chunking. It is designed to process local files only, one file at a time. File processing happens in a containerized environment that runs within your Red Hat OpenShift deployment.
Unstructured on Red Hat OpenShift does not support the following:
- The Unstructured Workflow Endpoint. Use the Unstructured Workflow Endpoint instead of the Unstructured Partition Endpoint for production-level scenarios, processing files in batches, working with files and data in remote locations, generating embeddings, applying post-transform enrichments, using the latest and highest-performing models, and getting the highest-quality results at the lowest cost.
- The Unstructured Ingest CLI.
- The Unstructured Ingest Python library.
- The Unstructured open source library.
- The Unstructured API base URL for calling Unstructured-hosted services:
https://api.unstructuredapp.io
- Unstructured API keys.
- Partitioning of files by using a vision language model (VLM) without appropriate user-supplied API credentials for the target VLM provider.
To get started with Unstructured on Red Hat OpenShift, complete the following steps. This procedure uses the Red Hat Developer Sandbox. To use other methods, see the additional resources section for links to how-to documentation for your specific Red Hat edition.
- Create a new Red Hat login ID and account, if you do not already have one.
- In the sidebar, the OpenShift view should be visible. If not, to show it, at the top of the sidebar, in the view selector, click Red Hat Hybrid Cloud Console and then, under Platforms, click Red Hat OpenShift.
- In the sidebar, under Products, expand OpenShift AI, and then click Developer Sandbox | OpenShift AI.
- Under Available Services, in the OpenShift tile, click Launch.
- In the sidebar, the Developer view should be visible. If not, to show it, at the top of the sidebar, in the view selector, click Developer.
- Click +Add.
- Click the Container images tile.
- On the Deploy Image page, for Image name from external registry, enter the name for the Unstructured on Red Hat OpenShift image:
a. On a separate tab in your web browser, go to the unstructured-api-core container image page in the Red Hat Ecosystem Catalog.
b. Click the Get this image tab.
c. On the Using Red Hat login tab, click the copy icon next to Manifest List Digest.
d. Paste the copied value into the Image name from external registry field on the other tab in your web browser.
- Leave all of the other fields on the Deploy Image page set to their default values.
- Note the value of the Target port field (such as `8080`).
- Click Create.
- If the Topology view is not visible, to show it, at the top of the sidebar, in the view selector, click Topology.
- If the unstructured-api-plus Knative service (KSVC) is not already selected, select it.
- In the properties pane, on the Resources tab, note the value of the Routes field.
You can now use the route and target port that you noted previously to call the Unstructured Partition Endpoint that is running in the container within your Red Hat Developer Sandbox.
For example, to make a POST request to the Unstructured Partition Endpoint to process an individual file, you can use a curl command such as the example shown after this list, replacing the following values:
- Replace `<route>` with your route value.
- Replace `<target-port>` with your target port value.
- Replace `<path/to/local/file>` with the path to the local file that you want to process.
- Replace `<mime-type>` with the MIME type (for example, `application/pdf`) of the local file that you want to process.
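The following command is a minimal sketch of such a request. It assumes the Partition Endpoint's standard `/general/v0/general` path (the same path used with the SDKs later in this article) and the `files` and `strategy` multipart form fields; see Partition Endpoint parameters for the authoritative field names.

```bash
# Minimal sketch: send one local file to the Partition Endpoint running in your sandbox.
# curl sets the multipart/form-data Content-Type header automatically when --form is used.
curl -X POST "https://<route>:<target-port>/general/v0/general" \
  --header "Accept: application/json" \
  --form "files=@<path/to/local/file>;type=<mime-type>" \
  --form "strategy=hi_res"
```

The endpoint responds with a JSON array of the document's elements.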
For additional command options that you can use with the Unstructured Partition Endpoint, see Partition Endpoint parameters.
The preceding command example uses the Hi-Res partitioning strategy, which is best for PDFs with embedded images, tables, or varied layouts.
To use the VLM partitioning strategy, which uses a vision language model (VLM) and is best for PDFs with scanned images, handwritten layouts, highly complex layouts, or visually degraded pages, use a command similar to the following. This command example uses the OpenAI VLM provider and the gpt-4o vision language model provided by OpenAI:
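The command below is a sketch only: the `strategy` value and the `vlm_model_provider` and `vlm_model` form fields are assumptions based on the Partition Endpoint parameters reference, so confirm the exact field names there.

```bash
# Sketch of a VLM partitioning request; the VLM-related field names are assumptions,
# so verify them against the Partition Endpoint parameters reference.
curl -X POST "https://<route>:<target-port>/general/v0/general" \
  --header "Accept: application/json" \
  --form "files=@<path/to/local/file>;type=<mime-type>" \
  --form "strategy=vlm" \
  --form "vlm_model_provider=openai" \
  --form "vlm_model=gpt-4o"
```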
To use the VLM strategy, you must also provide your own API credentials for the target VLM provider. For the preceding command, you provide your OpenAI API key by adding an environment variable named `OPENAI_API_KEY`, set to your OpenAI API key, to your container deployment.
To add this environment variable, do the following:
- In your Red Hat Developer Sandbox, if the Topology view is not visible, to show it, at the top of the sidebar, in the view selector, click Topology.
- If the unstructured-api-plus Knative service (KSVC) is not already selected, select it.
- In the Actions drop-down list, select Edit unstructured-api-plus.
- At the bottom of the settings pane, click the Deployment link next to Click on the names to access advanced options.
- Under Environment variables (runtime only), for Name, enter `OPENAI_API_KEY`. For Value, enter your OpenAI API key.
- Click Save.
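Alternatively, if you have the Knative (kn) CLI installed and logged in to your Developer Sandbox project, you can set the same variable from the command line. This is a sketch under those assumptions; the unstructured-api-plus service name comes from the earlier steps, and `<your-openai-api-key>` is a placeholder for your own key.

```bash
# Alternative sketch: set the environment variable on the Knative service with the kn CLI.
# This rolls out a new revision of the unstructured-api-plus service with OPENAI_API_KEY set.
kn service update unstructured-api-plus --env OPENAI_API_KEY=<your-openai-api-key>
```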
To use other VLM providers, you must add the following environment variables:
- For Anthropic, add `ANTHROPIC_API_KEY`.
- For AWS Bedrock, set `AWS_BEDROCK_ACCESS_KEY`, `AWS_BEDROCK_SECRET_KEY`, and `AWS_BEDROCK_REGION`.
- For Vertex AI, set `GOOGLE_VERTEX_AI_API_KEY`.
To learn how to get the values for these environment variables, see the following:
- For Anthropic, sign in to your Anthropic account and then go to Account Settings to generate your Anthropic API key.
- For AWS Bedrock, see Getting started with the API in the Amazon Bedrock documentation to generate your IAM user’s access key, secret key, and region.
- For Vertex AI, go to your Google Cloud account’s APIs & Services > Credentials page, and then click Create credentials > API key to generate your Google Cloud API key.
You can also use Unstructured on Red Hat OpenShift with the Unstructured Python SDK or the Unstructured JavaScript/TypeScript SDK. To use these SDKs, note the following:
- When initializing an instance of `UnstructuredClient`, you must specify the Unstructured API URL as `https://<route>:<target-port>/general/v0/general`, replacing `<route>` with your route value and `<target-port>` with your target port value.
- When initializing an instance of `UnstructuredClient`, you do not specify an Unstructured API key.
- To use VLM providers, you must first set the appropriate environment variables for each target VLM provider in your container deployment, as described previously in this article.