05/07/2024 | News release | Distributed by Public on 05/07/2024 11:15
Organizations must deal with countless reports, contracts, research papers, and other documents, but managing, deciphering, and extracting pertinent information from these documents can be challenging and time-consuming. In such scenarios, an AI-powered document management system can offer a transformative solution.
Developing generative AI (GenAI) technologies with Docker opens up many possibilities: summarizing lengthy documents, categorizing them, generating detailed descriptions, and even surfacing insights you may have missed. This multi-faceted, AI-powered approach changes the way organizations interact with textual data, saving both time and effort.
In this article, we'll look at how to integrate Alfresco, a robust document management system, with the GenAI Stack to open up possibilities such as enhancing document analysis, automating content classification, transforming search capabilities, and more.
Alfresco is an open source content management platform designed to help organizations manage, share, and collaborate on digital content and documents. It provides a range of features for document management, workflow automation, collaboration, and records management.
You can find the Alfresco Community platform on Docker Hub. The Docker image for the UI, named alfresco-content-app, has more than 10 million pulls, while other core platform services have more than 1 million pulls.
Alfresco Community platform (Figure 1) provides various open source technologies to create a Content Service Platform, including:
For detailed instructions on deploying Alfresco Community with Docker Compose, refer to the official Alfresco documentation.
Figure 1: Basic diagram for Alfresco Community deployment with Docker.
Integrating Alfresco with the GenAI Stack unlocks a suite of GenAI services that enhance document management capabilities. The main benefits include:
Alfresco provides two main APIs for integration purposes: the Alfresco REST API and the Alfresco Messaging API (Figure 2).
The Alfresco Repository can be updated with the enrichment data provided by GenAI Service using both APIs:
Technically, Docker deployment includes both the Alfresco and GenAI Stack platforms running over the same Docker network (Figure 3).
The GenAI Stack works as a REST API service with endpoints available at genai:8506, whereas Alfresco uses a REST API client (named alfresco-ai-applier) and a Messaging API client (named alfresco-ai-listener) to integrate with AI services. Both clients can also be run as containers.
Figure 3: Deployment architecture for Alfresco integration with GenAI Stack services.
The GenAI Stack service provides the following endpoints:
The GenAI Stack services split the document text into chunks and load them into the Neo4j vector database, so that QA chains can use embeddings to ground their answers and reduce hallucinations in the response. Pictures are processed using an LLM with a visual encoder (LLaVA) to generate descriptions (Figure 4). Note that the Docker GenAI Stack allows the use of multiple LLMs for different goals.
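As a rough illustration of the chunking step described above, the following Python sketch splits text into overlapping word windows, which is the kind of unit that would then be embedded and stored. The function name, chunk size, and overlap are assumptions made for this example; the GenAI Stack may use a different splitter and different parameters.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most `chunk_size` words.

    Hypothetical helper for illustration only; not taken from the GenAI Stack.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some words twice.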
Figure 4: The GenAI Stack services are implemented using RAG and an LLM with visual encoder (LLaVA) for describing pictures.
To get started, check the following:
To check how much RAM is available to Docker Desktop, run the following command:
docker info --format '{{json .MemTotal}}'
The command prints the total memory in bytes. If the result is under 20 GiB (21,474,836,480 bytes), follow the instructions in the official Docker documentation for your operating system to raise the memory limit for Docker Desktop.
Use the following command to clone the repository:
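Because the command reports bytes rather than GiB, a small helper can make the comparison explicit. The sketch below is one possible way to do it in Python; the function names are made up for this example, and it assumes the docker CLI is on the PATH.

```python
import subprocess

REQUIRED_GIB = 20  # threshold mentioned in the text above


def bytes_to_gib(mem_total_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return mem_total_bytes / 2**30


def has_enough_memory(mem_total_bytes: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True when the reported memory meets the required threshold."""
    return bytes_to_gib(mem_total_bytes) >= required_gib


def docker_mem_total() -> int:
    """Run the same `docker info` command as above and parse its integer output."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .MemTotal}}"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

Calling has_enough_memory(docker_mem_total()) on the host then returns whether the memory limit needs to be raised before starting the stack.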
git clone https://github.com/aborroy/alfresco-genai.git
The project includes the following components:
The Docker GenAI Service for Alfresco, located in the genai-stack folder, is based on the Docker GenAI Stack project, and provides the summarization service as a REST endpoint to be consumed from Alfresco integration.
cd genai-stack
Before running the service, modify the .env file to adjust available preferences:
# Choose any of the on premise models supported by ollama
LLM=mistral
LLM_VISION=llava
# Any language name supported by chosen LLM
SUMMARY_LANGUAGE=English
# Number of words for the summary
SUMMARY_SIZE=120
# Number of tags to be identified with the summary
TAGS_NUMBER=3
Start the Docker Stack using the standard command:
docker compose up --build --force-recreate
After the service is up and ready, the summary REST endpoint becomes accessible. You can test its functionality using a curl command.
Use a local PDF file (file.pdf in the following sample) to obtain a summary and a number of tags.
curl --location 'http://localhost:8506/summary' \
--form 'file=@"./file.pdf"'

{
  "summary": " The text discusses...",
  "tags": " Golang, Merkle, Difficulty",
  "model": "mistral"
}
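The sample response above returns the tags as a single comma-separated string with leading spaces. A client that wants a clean list could normalize the fields along these lines; this is a minimal Python sketch, and parse_summary_response is a hypothetical helper name, not part of the project.

```python
import json


def parse_summary_response(body: str) -> dict:
    """Parse the JSON returned by the summary endpoint and tidy its fields."""
    data = json.loads(body)
    return {
        "summary": data["summary"].strip(),
        # "tags" arrives as one string, e.g. " Golang, Merkle, Difficulty"
        "tags": [tag.strip() for tag in data["tags"].split(",") if tag.strip()],
        "model": data["model"],
    }
```

Only the three keys shown in the sample response are assumed here; any other fields the service might return are ignored.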
Use a local PDF file (file.pdf in the following sample) and a list of terms (such as Japanese or Spanish) to obtain a classification of the document.
curl --location \
'http://localhost:8506/classify?termList=%22Japanese%2CSpanish%22' \
--form 'file=@"./file.pdf"'

{
  "term": " Japanese",
  "model": "mistral"
}
Use a local PDF file (file.pdf in the following sample) and a prompt (such as "What is the name of the son?") to obtain a response regarding the document.
curl --location \
'http://localhost:8506/prompt?prompt=%22What%20is%20the%20name%20of%20the%20son%3F%22' \
--form 'file=@"./file.pdf"'

{
  "answer": " The name of the son is Musuko.",
  "model": "mistral"
}
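The query strings in the classify and prompt requests above are percent-encoded by hand, with the value wrapped in double quotes (%22). The following Python sketch shows one way to build the same URLs with the standard library; BASE_URL and the two function names are assumptions for this example.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8506"  # host and port used in the curl samples


def classify_url(terms: list[str]) -> str:
    """Build the classify URL with a quoted, comma-separated term list."""
    term_list = '"' + ",".join(terms) + '"'
    return BASE_URL + "/classify?termList=" + quote(term_list, safe="")


def prompt_url(question: str) -> str:
    """Build the prompt URL with the question wrapped in double quotes."""
    quoted_question = '"' + question + '"'
    return BASE_URL + "/prompt?prompt=" + quote(quoted_question, safe="")
```

Passing safe="" makes quote encode every reserved character, including the comma and the question mark, which matches the encoding used in the samples.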
Use a local picture file (picture.jpg in the following sample) to obtain a text description of the image.
curl --location 'http://localhost:8506/describe' \
--form 'image=@"./picture.jpg"'

{
  "description": " The image features a man standing... ",
  "model": "llava"
}
Note that, in this case, the LLaVA LLM is used instead of Mistral.
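The curl samples above all upload a file as multipart form data, using the field name file for documents and image for pictures. For readers who prefer to call the endpoints from code, here is a minimal Python sketch that uses only the standard library. The helper names build_multipart and post_file are hypothetical, and error handling is omitted.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request, urlopen


def build_multipart(field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def post_file(url: str, field: str, path: str) -> dict:
    """POST a local file to a GenAI Stack endpoint and decode the JSON reply."""
    file_path = Path(path)
    body, content_type = build_multipart(field, file_path.name, file_path.read_bytes())
    request = Request(url, data=body, headers={"Content-Type": content_type}, method="POST")
    with urlopen(request) as response:
        return json.load(response)
```

With the service running, post_file("http://localhost:8506/describe", "image", "./picture.jpg") would mirror the last curl call, while the summary, classify, and prompt endpoints take the field name file instead.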
Make sure to stop Docker Compose before continuing to the next step.
The Alfresco Platform, located in the alfresco folder, provides a sample deployment of the Alfresco Repository including a customized content model to store results obtained from the integration with the GenAI Service.
Because we want to run both Alfresco and GenAI together, we'll use the compose.yaml file located in the project's main folder.
include:
  - genai-stack/compose.yaml
  - alfresco/compose.yaml
#  - alfresco/compose-ai.yaml
In this step, we're deploying only GenAI Stack and Alfresco, so make sure to leave the compose-ai.yaml line commented out.
Start the stack using the standard command:
docker compose up --build --force-recreate
After the service is up and ready, the Alfresco Repository becomes accessible. You can test the platform using default credentials (admin/admin) in the following URLs:
The AI Applier application, located in the alfresco-ai/alfresco-ai-applier folder, contains a Spring Boot application that retrieves documents stored in an Alfresco folder, obtains the response from the GenAI Service and updates the original document in Alfresco.
Before running the application for the first time, you'll need to build the source code using Maven.
cd alfresco-ai/alfresco-ai-applier
mvn clean package
With the GenAI Service and Alfresco Platform up and running from the previous steps, we can upload documents to the Alfresco Shared Files/summary folder and run the program to update the documents with the summary.
java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:summary \
--applier.action=SUMMARY
...
Processing 2 documents of a total of 2
END: All documents have been processed.
The app may need to be executed again for nodes without existing PDF rendition.
Once the process has been completed, every Alfresco document in the Shared Files/summary folder will include the information obtained by the GenAI Stack service: summary, tags, and LLM used (Figure 5).
Figure 5: The document has been updated in Alfresco Repository with summary, tags and model (LLM).
You can now upload documents to the Alfresco Shared Files/classify folder to prepare the repository for the next step.
The classify action can be applied to documents in the Alfresco Shared Files/classify folder using the following command. The GenAI Service will pick the term from the list (English, Spanish, Japanese) that best matches each document in the folder.
java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:classify \
--applier.action=CLASSIFY \
--applier.action.classify.term.list=English,Spanish,Japanese
...
Processing 2 documents of a total of 2
END: All documents have been processed.
The app may need to be executed again for nodes without existing PDF rendition.
Upon completion, every Alfresco document in the Shared Files/classify folder will include the information obtained by the GenAI Stack service: a term from the list of terms and the LLM used (Figure 6).
Figure 6: The document has been updated in Alfresco Repository with term and model (LLM).
You can upload pictures to the Alfresco Shared Files/picture folder to prepare the repository for the next step.
To obtain a text description from pictures, create a new folder named picture under the Shared Files folder. Upload any image file to this folder and run the following command:
java -jar target/alfresco-ai-applier-0.8.0.jar \
--applier.root.folder=/app:company_home/app:shared/cm:picture \
--applier.action=DESCRIBE
...
Processing 1 documents of a total of 1
END: All documents have been processed.
The app may need to be executed again for nodes without existing PDF rendition.
Following this process, every Alfresco image in the picture folder will include the information obtained by the GenAI Stack service: a text description and the LLM used (Figure 7).
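The three applier runs above differ only in the target folder, the action, and the optional term list. If you want to script them, the arguments can be assembled as in the following Python sketch. The applier_command helper is hypothetical; the JAR path and flags are copied from the commands shown above.

```python
from typing import Optional

JAR = "target/alfresco-ai-applier-0.8.0.jar"
SHARED = "/app:company_home/app:shared"  # Shared Files path used in the samples


def applier_command(folder: str, action: str, terms: Optional[list[str]] = None) -> list[str]:
    """Assemble the java command line for one AI Applier run."""
    command = [
        "java", "-jar", JAR,
        f"--applier.root.folder={SHARED}/cm:{folder}",
        f"--applier.action={action}",
    ]
    if terms:
        # Only the CLASSIFY samples above pass a term list
        command.append("--applier.action.classify.term.list=" + ",".join(terms))
    return command
```

The returned list can be passed to subprocess.run(..., check=True) from the alfresco-ai/alfresco-ai-applier folder, for example applier_command("classify", "CLASSIFY", ["English", "Spanish", "Japanese"]).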
Figure 7: The document has been updated in Alfresco Repository with text description and model (LLM).
The AI Listener application, located in the alfresco-ai/alfresco-ai-listener folder, contains a Spring Boot application that listens to Alfresco messages, obtains the response from the GenAI Service, and updates the original document in Alfresco.
Before running the application for the first time, you'll need to build the source code using Maven and then build the Docker image.
cd alfresco-ai/alfresco-ai-listener
mvn clean package
docker build . -t alfresco-ai-listener
As we are using the AI Listener application as a container, stop the Alfresco deployment and uncomment the alfresco/compose-ai.yaml line in the compose.yaml file so that the alfresco-ai-listener is included.
include:
  - genai-stack/compose.yaml
  - alfresco/compose.yaml
  - alfresco/compose-ai.yaml
Start the stack using the standard command:
docker compose up --build --force-recreate
After the service is again up and ready, the Alfresco Repository becomes accessible. You can verify that the platform is working by using default credentials (admin/admin) in the following URLs:
Summarization
Next, upload a new document and apply the "Summarizable with AI" aspect to the document. After a while, the document will include the information obtained by the GenAI Stack service: summary, tags, and LLM used.
Description
If you want to use AI enhancement, you might want to set up a folder that automatically applies the necessary aspect, instead of doing it manually.
Create a new folder named pictures in Alfresco Repository and create a rule with the following settings in it:
Upload a new picture to this folder. After a while, without the aspect being set manually, the document will include the information obtained by the GenAI Stack service: description and LLM used.
Classification
Create a new folder named classifiable in Alfresco Repository. Apply the "Classifiable with AI" aspect to this folder and add a comma-separated list of terms in the "Terms" property (such as English, Japanese, Spanish).
Create a new rule for classifiable folder with the following settings:
Upload a new document to this folder. After a while, the document will include the information obtained by the GenAI Stack service: term and LLM used.
A degree of automation can be achieved when using classification with AI. To do this, create a simple Alfresco Repository script named classify.js in the folder "Repository/Data Dictionary/Scripts" with the following content.
document.move(document.parent.childByNamePath(document.properties["genai:term"]));
Create a new rule for the classifiable folder to apply this script with the following settings:
Create a child folder of the classifiable folder with the name of every term defined in the "Terms" property.
When you set up this configuration, any documents uploaded to the folder will automatically be moved to a subfolder based on the identified term. This means that the documents are classified automatically.
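The routing performed by this setup depends on a child folder existing with exactly the same name as the identified term. The following Python sketch models that logic with plain in-memory data structures to make the dependency visible. It is an illustration only and does not use the Alfresco JavaScript API; the function name and the dict-based folder model are assumptions.

```python
def move_to_term_folder(children: dict[str, list[str]], document: str, term: str) -> bool:
    """Move `document` into the child folder named `term`, if that folder exists.

    `children` maps child folder names to the documents they contain.
    """
    target = children.get(term)
    if target is None:
        return False  # no child folder matches the term; the document stays put
    target.append(document)
    return True
```

For example, with child folders English and Japanese, a document classified as Japanese lands in the Japanese folder, while one classified as Spanish is left where it is until a Spanish folder is created.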
Prompting
Finally, to use the prompting GenAI feature, apply the "Promptable with AI" aspect to an existing document. Type your question in the "Question" property.
After a while, the document will include the information obtained by the GenAI Stack service: answer and LLM used.
By embracing this framework, you can not only unlock a new level of efficiency, productivity, and user experience but also lay the foundation for limitless innovation. With Alfresco and GenAI Stack, the possibilities are endless - from enhancing document analysis and automating content classification to revolutionizing search capabilities and beyond.
If you're unsure about any part of this process, check out the following video, which demonstrates all the steps live: