Parsing Salesforce attachments for import


Deleting and renaming files en masse from the command line, driven by a list, turned out to be super handy for a recent Salesforce data preparation task.

I was working on a Salesforce-to-Salesforce migration, and of some 56K attachments in the legacy instance I had to isolate fewer than 2K and prepare them for import into the new instance. I was working from a complete Salesforce data export, including all attachments.

When you extract the Salesforce data export zip files, you'll see an Attachments.csv file that lists all attachments and an Attachments directory that contains the file binaries. Depending on the volume of attachments, you may have to extract several zip files. I consolidated the attachments from all of the extracts into one directory.

cp -v -a /Source/Directory/Attachments/. /Destination/Directory/Consolidated
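If the export came as several zip files, the same copy can be repeated for each extract. Here is a minimal sketch, assuming each zip was extracted into its own folder that contains an Attachments subdirectory; the function name consolidate_attachments and the folder layout are hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: copy the Attachments folder of each extracted
# export into a single destination directory.
# Usage: consolidate_attachments DEST EXTRACT_DIR...
consolidate_attachments() {
  local dest=$1
  shift
  mkdir -p "$dest"
  local src
  for src in "$@"; do
    if [ ! -d "$src/Attachments" ]; then
      echo "skipping $src: no Attachments directory" >&2
      continue
    fi
    # -a preserves file attributes; the trailing /. copies the folder's
    # contents rather than the folder itself
    cp -a "$src/Attachments/." "$dest"
  done
}
```

For example: `consolidate_attachments /Destination/Directory/Consolidated /Source/Extract1 /Source/Extract2`. Because each binary is named after its unique Record Id, copies from different extracts should not overwrite each other.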


Each record in Attachments.csv has a Record Id like 00P80000003BdZLEA0, and if you search for that Id in your Attachments directory you'll find the actual file. The file is stored under the bare Id, so you might need to add the proper extension (docx, xlsx, pdf, etc.) to view it.
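Looking up one attachment by Id and giving it an extension can also be scripted. This is a minimal sketch: the function name view_copy is hypothetical, it copies rather than renames so the consolidated folder stays untouched (my choice, not the original workflow), and you still have to supply the right extension yourself.

```shell
#!/bin/bash

# Hypothetical helper: find an attachment binary by its Record Id and copy it
# to OUT_DIR with the given extension so a desktop app will open it.
# Prints the path of the copy.
# Usage: view_copy ATTACHMENTS_DIR RECORD_ID EXTENSION OUT_DIR
view_copy() {
  local dir=$1 id=$2 ext=$3 out=$4
  local match
  # The binary is stored under the bare Id; stop at the first match
  match=$(find "$dir" -type f -name "$id" -print -quit)
  if [ -z "$match" ]; then
    echo "no file named $id under $dir" >&2
    return 1
  fi
  mkdir -p "$out"
  cp "$match" "$out/$id.$ext"
  echo "$out/$id.$ext"
}
```

For example: `view_copy /Destination/Directory/Consolidated 00P80000003BdZLEA0 pdf ~/preview`.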

I used the Attachments.csv file to isolate the records (attachments) we were going to migrate. That also gave me the list of files I no longer needed.
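One way to turn that selection into a delete list, assuming you saved the Ids of the attachments to keep in a text file (one Id per line, no header; keep_ids.txt is a hypothetical name), is to list every file in the consolidated directory whose name is not a kept Id. The function build_delete_list is hypothetical too.

```shell
#!/bin/bash

# Hypothetical helper: print the path of every file directly inside DIR whose
# name is NOT listed in KEEP_FILE (one Record Id per line, no header).
# Output is one path per line with no header; pass an absolute DIR to get
# absolute paths.
# Usage: build_delete_list DIR KEEP_FILE > files.csv
build_delete_list() {
  local dir=$1 keep=$2
  # Refuse to run with a missing or empty keep list, which would
  # otherwise mark every file for deletion
  if [ ! -s "$keep" ]; then
    echo "keep list $keep is missing or empty" >&2
    return 1
  fi
  find "$dir" -maxdepth 1 -type f |
    awk -v keepfile="$keep" '
      BEGIN {
        # Load the Ids to keep, tolerating Windows (CRLF) line endings
        while ((getline line < keepfile) > 0) {
          sub(/\r$/, "", line)
          keep[line] = 1
        }
      }
      {
        name = $0
        sub(/.*\//, "", name)       # strip the directory part
        if (!(name in keep)) print  # not kept, so it goes on the delete list
      }'
}
```

For example: `build_delete_list /Destination/Directory/Consolidated keep_ids.txt > files.csv`. Using awk with an in-memory lookup table keeps this to a single pass, which matters with tens of thousands of files.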

I used a simple shell script to run the delete job based on a list in a CSV file.

My files.csv delete sheet had one full file path per line, with no header row, like this:

/Destination/Directory/Consolidated/00P80000003BdZLEA0


This is the script I used, courtesy of nos on Stack Exchange.

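#!/bin/bash

# Delete every file listed in files.csv (one path per line).
# Reading line by line keeps paths with spaces intact.
while IFS= read -r f || [ -n "$f" ]; do
  f=${f%$'\r'}          # strip a trailing carriage return from Windows-style CSVs
  [ -n "$f" ] && rm -- "$f"
done < files.csv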
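Because rm is unforgiving, a more cautious variant could preview the job first and refuse any path outside the consolidated folder. This is a minimal sketch; the function name safe_delete and the DRY_RUN switch are my own additions, not part of the original script.

```shell
#!/bin/bash

# Hypothetical helper: delete the files listed in LIST (one path per line),
# but only those that sit directly inside ALLOWED_DIR.
# With DRY_RUN=1 it only prints what it would delete.
# Usage: DRY_RUN=1 safe_delete files.csv /Destination/Directory/Consolidated
safe_delete() {
  local list=$1 allowed=${2%/}
  local f
  while IFS= read -r f || [ -n "$f" ]; do
    f=${f%$'\r'}                      # tolerate CRLF line endings
    [ -n "$f" ] || continue           # skip blank lines
    if [ "${f%/*}" != "$allowed" ]; then
      echo "refusing $f: not directly inside $allowed" >&2
      continue
    fi
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
    elif [ "${DRY_RUN:-0}" = 1 ]; then
      echo "would delete: $f"
    else
      rm -- "$f"
    fi
  done < "$list"
}
```

Run it once with `DRY_RUN=1`, check the output, then run it again without the variable to actually delete.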


After running the delete job, the files left in my Consolidated directory were exactly the ones I needed to migrate into the new Salesforce instance.
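That can be confirmed with a quick comparison, assuming the Ids you meant to keep are in a text file (one per line, no header; keep_ids.txt and check_remaining are hypothetical names). The sketch compares the sorted directory listing with the sorted keep list using comm.

```shell
#!/bin/bash

# Hypothetical check: compare the file names left in DIR with the Ids in
# KEEP_FILE. Unindented output lines are files not in the keep list;
# tab-indented lines are kept Ids with no file. Returns non-zero on any mismatch.
# Usage: check_remaining DIR KEEP_FILE
check_remaining() {
  local dir=$1 keep=$2 diffs
  # comm needs both inputs sorted in the same collation, hence LC_ALL=C
  diffs=$(LC_ALL=C comm -3 \
    <(find "$dir" -maxdepth 1 -type f | sed 's|.*/||' | LC_ALL=C sort) \
    <(tr -d '\r' < "$keep" | sed '/^$/d' | LC_ALL=C sort))
  if [ -n "$diffs" ]; then
    printf '%s\n' "$diffs"
    return 1
  fi
}
```

No output and a zero exit status mean the directory matches the keep list exactly.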

There was just one more step.

Salesforce recommends prepping the AttachmentsImport.csv sheet like this:

For Attachments, you will need to prepare the import file with the following columns:

  • ParentId: ID of the object associated with the attachment
  • Name: The name of the attachment file
  • Body: The absolute path to the attachment file’s location on your local drive.
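Building that file can be scripted as well. This is a minimal sketch under several assumptions: the input is a hypothetical headerless Id,ParentId,Name file (for example, trimmed from Attachments.csv) with no commas or quotes inside fields, ParentId already holds the parent's Id in the new instance, and each file will be renamed to its Name inside the consolidated directory. The function name build_import_csv is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: turn a headerless "Id,ParentId,Name" file into an
# import CSV with ParentId, Name and Body columns.
# Assumes no commas or quotes inside the input fields, and that each file
# will be renamed to its Name inside DIR (DIR should be an absolute path).
# Usage: build_import_csv DIR INPUT > AttachmentsImport.csv
build_import_csv() {
  local dir=${1%/} input=$2
  awk -F, -v dir="$dir" '
    BEGIN { print "ParentId,Name,Body" }
    {
      sub(/\r$/, "")                      # tolerate CRLF line endings
      if (NF < 3) next                    # skip blank or malformed lines
      # Double-quote the two text columns; input is assumed quote-free
      printf "%s,\"%s\",\"%s/%s\"\n", $2, $3, dir, $3
    }' "$input"
}
```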

The files in my Consolidated directory needed to be renamed for the import.

I used another simple shell script to run the rename job, again based on a list in a CSV file.

My filesrename.csv sheet had one comma-separated pair per line, the current path followed by the new path, with no header row:

/Destination/Directory/Consolidated/00P80000003BdZLEA0,/Destination/Directory/Consolidated/UniqueFileName.xlsx
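The rename list can be generated from Id and Name pairs. Because every file ends up in one directory, two attachments sharing a Name would overwrite each other, so this sketch refuses to produce a list when it finds duplicates. It assumes a hypothetical headerless Id,Name input with no commas inside names; build_rename_list is a hypothetical name.

```shell
#!/bin/bash

# Hypothetical helper: read headerless "Id,Name" lines and print
# "DIR/Id,DIR/Name" lines in the filesrename.csv format.
# Fails without printing the list if two Ids map to the same Name.
# Usage: build_rename_list DIR INPUT > filesrename.csv
build_rename_list() {
  local dir=${1%/} input=$2
  awk -F, -v dir="$dir" '
    {
      sub(/\r$/, "")                       # tolerate CRLF line endings
      if (NF < 2) next                     # skip blank or malformed lines
      if ($2 in seen) {
        printf "duplicate name %s for %s and %s\n", $2, seen[$2], $1 > "/dev/stderr"
        dup = 1
      }
      seen[$2] = $1
      lines[++n] = dir "/" $1 "," dir "/" $2
    }
    END {
      if (dup) exit 1
      for (i = 1; i <= n; i++) print lines[i]
    }' "$input"
}
```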


With some help, I adapted a script shared by terdon on Stack Exchange, which resulted in this:

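#!/bin/bash

# Rename files listed in filesrename.csv; each line is "old_path,new_path".
# Reading line by line (instead of looping over $(cat ...)) keeps paths
# with spaces intact, and quoting protects them when passed to mv.
while IFS=, read -r orig new || [ -n "$orig" ]; do
  new=${new%$'\r'}      # strip a trailing carriage return from Windows-style CSVs
  [ -n "$orig" ] && [ -n "$new" ] && mv -- "$orig" "$new"
done < filesrename.csv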
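Before loading, it can be worth confirming that every Body path in the import file points to a file that actually exists after the rename. This is a minimal sketch, assuming AttachmentsImport.csv has a header row, Body is the third column, and no field contains a comma (surrounding double quotes are stripped); check_bodies is a hypothetical name.

```shell
#!/bin/bash

# Hypothetical check: report every Body path (third column) in IMPORT_CSV
# that does not exist on disk. Skips the header row, strips double quotes,
# and assumes no field contains a comma. Returns non-zero if any file is missing.
# Usage: check_bodies AttachmentsImport.csv
check_bodies() {
  local csv=$1 missing=0 body
  while IFS= read -r body; do
    if [ ! -f "$body" ]; then
      echo "missing: $body" >&2
      missing=1
    fi
  done < <(awk -F, 'NR > 1 { sub(/\r$/, ""); b = $3; gsub(/"/, "", b); if (b != "") print b }' "$csv")
  return "$missing"
}
```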


Now I had my index and files ready for import.


Photo by Rich Tervet on Unsplash