Data Quality Analysis in Salesforce files with MuleSoft Anypoint

GitHub repository: https://github.com/andrewwhitten/template-sfdc-file-check

I have created a solution that can be added to any record page (Case, Account, or any other) and reports the data quality, encryption and password-protection status of all the attachments.

It comprises a Salesforce LWC Lightning control and a MuleSoft Anypoint 4.3 runtime service. To set it up, you extend the ContentVersion standard object and then drag and drop the control onto the page in the page designer.

The interactions of my solution are detailed in the Sequence Diagram below. The data quality process is initiated by the Salesforce LWC control calling Anypoint with a list of Document IDs to check:

The MuleSoft Anypoint 4.3 Flow takes the list of Document IDs and starts processing them. Currently it uses a For loop to check each file:

You don’t have to call this process from Salesforce – you can call the service directly from a REST client or your browser: https://<YOUR HOSTING URL>/FileCheck?"GUID1","GUID2","GUID3"
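As a rough illustration of calling the service outside Salesforce, here is a minimal Java 11+ sketch. The class and method names are hypothetical, the response is returned as raw text because its format is not described here, and percent-encoding the quoted ID list is my assumption about what the service will accept.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical client; not part of the repository.
class FileCheckClient {

    // Builds <baseUrl>/FileCheck?"ID1","ID2",... with the quoted list percent-encoded.
    // Assumption: the service decodes the query string before parsing it.
    static URI buildUri(String baseUrl, List<String> documentIds) {
        String quoted = documentIds.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(","));
        String encoded = URLEncoder.encode(quoted, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/FileCheck?" + encoded);
    }

    // Sends a GET request and returns the response body as text.
    static String check(String baseUrl, List<String> documentIds) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(buildUri(baseUrl, documentIds)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling `FileCheckClient.check("https://<YOUR HOSTING URL>", List.of("GUID1", "GUID2"))` would then return whatever the service sends back for those two documents.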

What does this version detect?

  • PDF – Password protection
  • Microsoft Word (doc and docx) – Password protection
  • Zip file password encryption
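To give a flavour of what such a check can look like, here is a minimal sketch of ZIP encryption detection using only the standard Java library. It is not necessarily how the repository does it: the class name is hypothetical, and it only inspects the first entry. It relies on the ZIP format's local file header, where bit 0 of the general purpose flag marks an encrypted entry.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; not the repository's actual class.
class ZipEncryptionCheck {

    // Returns true when the first local file header has the "encrypted" flag set.
    // Returns false for streams that do not start with a ZIP local file header.
    static boolean firstEntryEncrypted(InputStream in) throws IOException {
        byte[] header = new byte[8];
        int read = in.readNBytes(header, 0, header.length);
        if (read < header.length) {
            return false;
        }
        // Local file header signature: 'P' 'K' 0x03 0x04.
        boolean isLocalHeader = header[0] == 0x50 && header[1] == 0x4B
                && header[2] == 0x03 && header[3] == 0x04;
        if (!isLocalHeader) {
            return false;
        }
        // The general purpose bit flag is a little-endian 16-bit value at offset 6.
        int flags = (header[6] & 0xFF) | ((header[7] & 0xFF) << 8);
        return (flags & 0x0001) != 0;
    }
}
```

Checking only the first entry keeps the sketch short. An archive that mixes encrypted and unencrypted entries would need a walk over every header.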

What may future versions include?

  • Microsoft Excel password protection
  • Microsoft PowerPoint password protection
  • Image validation
  • Corrupt files

This proof of concept introduces a framework for working with Salesforce files in MuleSoft. You can analyze your files with any Java code you like. Do you need to scan all word documents for ‘Copyright of Acme’? Just write another Java class.
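Here is a sketch of what such an extra class could look like. The `FileCheck` interface and `DocxPhraseCheck` class are hypothetical names, not the repository's actual structure. To stay dependency-free it only handles .docx files, which are ZIP archives with the main text in `word/document.xml`, and it does a plain substring search on the raw XML, so a phrase that Word has split across formatting runs would be missed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical extension point: one implementation per kind of check.
interface FileCheck {
    boolean matches(InputStream fileContent) throws IOException;
}

// Reports whether a .docx file contains a given phrase in its main document part.
class DocxPhraseCheck implements FileCheck {
    private final String phrase;

    DocxPhraseCheck(String phrase) {
        this.phrase = phrase;
    }

    @Override
    public boolean matches(InputStream fileContent) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(fileContent)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    return xml.contains(phrase);
                }
            }
        }
        return false; // No main document part found.
    }
}
```

A check like `new DocxPhraseCheck("Copyright of Acme").matches(stream)` could then be run for each file in the loop. A real version would more likely use Apache POI to extract the text, which would also cover legacy .doc files.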

Disclaimer

  • Question: Is this Production ready?
  • Answer: No. This is currently at a ‘working proof of concept’ stage, but needs a lot more in terms of performance management, error handling and testing. You should only use this in your Salesforce developer sandboxes.

Problem Background

The Salesforce CRM platform does not yet have a coding capability to read and analyze large files. For example, if a user were to password protect a large PDF file, it would require a lot of inventive Apex coding from scratch to determine that the file was password protected and unusable.

Java, on the other hand, can do this kind of file analysis and take advantage of strong, capable open source libraries that work with a variety of file formats. MuleSoft Anypoint is a popular integration product owned by Salesforce and used by many Salesforce orgs, and it can be extended with Java code.

There are other options, such as licensing Salesforce Heroku or maybe Serverless Functions when formally available. You can also create web services hosted on Microsoft Azure, AWS, or anything else. Many Salesforce customers have, however, invested in MuleSoft, which has strong Salesforce support out of the box. This design is not a compelling reason by itself to procure MuleSoft if you don’t have it already, but it is interesting if you already have MuleSoft and some spare capacity on it.

Notes:

  1. All this can be run on free trial services. Salesforce and MuleSoft have signup pages.
  2. You can run the MuleSoft Anypoint service locally on your machine, however Salesforce won’t be able to connect to it (or at least you will have a hard job making the connection). You will need to deploy to the Anypoint cloud with an SSL certificate for a full end-to-end test.
  3. I’m running this on a medium Anypoint vCore in the cloud (the highest available on the trial service). Performance seems fine, but there has been no real performance testing yet. There is probably a limit to how much you can throw at this service before it starts failing.
  4. You will need to add your MuleSoft service’s URL to CSP Trusted Sites in Salesforce Admin.
  5. The web service is called from an LWC component directly. This can be secured to the calling host, but in the next version I would probably move the call into Apex so that it is made from the Salesforce org rather than directly from the browser.
  6. The ability to detect whether a document is password protected was actually not as easy as I had imagined. Open source libraries are great, but they really lack a simple isDocumentEncrypted() function.
    • PDF password detection comes courtesy of Apache PDFBox.
    • ZIP password detection is with the standard Java libraries.
    • Microsoft document password detection uses Apache POI.
    • Other file types and other types of file quality detection can be added to the MuleSoft solution just by adding a new Java class and libraries.
  7. The next step is to extend this to catch issues when users upload a file through the UI. For example, rejecting an upload of an encrypted PDF.
  8. I’m not so experienced with Anypoint, so my Flow is rather long. I’ll also look to break that up in anticipation of other services that will come and reuse common parts
  9. I should also create a ‘how to set up’ page with detailed instructions. Please ping me if that is of interest.
