Integration spotlight: Amazon Textract

With the Amazon Textract integration, Brightspot can extract the text of document files such as PDFs and apply it as metadata, making those files easier for users to find in search.

Amazon Textract is a machine-learning (ML) document analysis service that detects and extracts text, handwriting and data from scanned documents, with no manual configuration or templates required.

Amazon Textract and Brightspot: How it works

With the Amazon Textract integration on Brightspot, publishers can extract text from Brightspot objects like PDFs, JPGs and PNGs and apply it as metadata. Brightspot associates the extracted text with the source files, so users can easily search for those files and use them in their content.
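The extraction step described above can be sketched in Python. This is a minimal illustration, not Brightspot's implementation, which the source does not describe. It assumes the AWS SDK for Python (boto3) and Textract's synchronous DetectDocumentText operation; the `build_search_metadata` helper and its field names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def extract_lines(response: dict[str, Any]) -> list[str]:
    """Collect the text of every LINE block in a Textract response, in order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def build_search_metadata(file_name: str, response: dict[str, Any]) -> dict[str, str]:
    """Hypothetical metadata record; Brightspot's real schema is not shown here."""
    return {"file": file_name, "extractedText": "\n".join(extract_lines(response))}


def detect_text(document_bytes: bytes) -> dict[str, Any]:
    """Call Textract's DetectDocumentText API (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": document_bytes})
```

With credentials configured, `build_search_metadata("report.png", detect_text(data))` would produce a record that a search index could store alongside the file. The parsing helpers are kept separate from the network call so they can be exercised against a saved response.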

Amazon Textract and Brightspot: Use cases

A major media outlet relies on PDFs and scanned source documents whose tables hold important information that was previously inaccessible to search. Using Amazon Textract, the team extracts that information from tables in PDFs uploaded to Brightspot, enabling editors to search for it in the CMS.
An automotive e-retailer looking to modernize the car-buying and selling process leverages Amazon Textract to accelerate transactions by automatically capturing and validating data from documents and forms, such as loan applications or vehicle titles, so decisions can be made more quickly. The Brightspot integration enables site editors to quickly access and publish this valuable data.
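For the table use case, Textract's AnalyzeDocument operation (called with `FeatureTypes=["TABLES"]` in boto3) returns TABLE, CELL and WORD blocks linked by CHILD relationships. The sketch below rebuilds each table as rows of cell text from such a response's `Blocks` list. It is an illustration under those assumptions, not the integration's actual code; the function name `extract_tables` is hypothetical, and merged cells and selection elements are ignored.

```python
from __future__ import annotations

from typing import Any, Iterator


def extract_tables(blocks: list[dict[str, Any]]) -> list[list[list[str]]]:
    """Return one grid (rows of cell strings) per TABLE block."""
    by_id = {block["Id"]: block for block in blocks}

    def children(parent: dict[str, Any], block_type: str) -> Iterator[dict[str, Any]]:
        # Follow CHILD relationships and keep only blocks of the wanted type.
        for rel in parent.get("Relationships", []):
            if rel.get("Type") != "CHILD":
                continue
            for child_id in rel.get("Ids", []):
                child = by_id.get(child_id)
                if child and child.get("BlockType") == block_type:
                    yield child

    tables = []
    for table in blocks:
        if table.get("BlockType") != "TABLE":
            continue
        # Textract numbers rows and columns starting at 1.
        cells = {
            (cell["RowIndex"], cell["ColumnIndex"]): " ".join(
                word["Text"] for word in children(cell, "WORD")
            )
            for cell in children(table, "CELL")
        }
        n_rows = max((r for r, _ in cells), default=0)
        n_cols = max((c for _, c in cells), default=0)
        tables.append(
            [[cells.get((r, c), "") for c in range(1, n_cols + 1)]
             for r in range(1, n_rows + 1)]
        )
    return tables
```

Flattening each grid into text (for example, joining cells with tabs) is one way the table contents could then be stored as searchable metadata.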

How to integrate Amazon Textract with Brightspot

With Amazon Textract, you can extract text from PDFs, JPGs, and PNGs. Brightspot associates the extracted text with the files, so editors can then search for and use your files in their own content.

Read the documentation

Related integrations

Integration spotlight: Amazon Comprehend

Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. Amazon Comprehend develops insights by recognizing the entities, key phrases, language, sentiments and other common elements in a document.

Integration spotlight: Amazon Rekognition

Amazon Rekognition can automatically identify objects, people, text, scenes and activities, as well as detect any inappropriate content in images.

Integration spotlight: Amazon Transcribe

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy to add speech-to-text capability. Using the Amazon Transcribe API, you can analyze audio files stored in Amazon S3 and have the service return a text file of the transcribed speech.

Integration spotlight: CloudConvert

Brightspot’s CloudConvert integration extracts metadata from text and images inside assets, then uses that metadata to improve a user’s search experience.