Pulling documents out of nested folders
As part of some consulting work, I had to extract documents from a large zip file. The zip file contained thousands of documents, and each one sat in its own directory structure, nested 6 to 11 directories deep. Doing this by hand would be tedious and boring.
Also, the individual files were named with punctuation, spaces and other annoying bits. I wanted to clean those up as well.
I spent a few minutes hacking together a ColdFusion script to do the work for me.
Here is the code:
<cfdirectory directory="C:\Xcel\CandidateDocuments" name="dirQuery" action="list" recurse="true">
<!--- Filter the resulting directory query to keep only files (drop the directory rows). --->
<cfquery dbtype="query" name="FilesOnly">
SELECT * FROM dirQuery
WHERE TYPE<>'Dir'
</cfquery>
<!--- Whip through the filtered query --->
<cfoutput query="FilesOnly">
<!--- Skip the zip files, since we don't need them, and anything without a 3-character file extension --->
<cfif len(listlast( name, ".")) IS 3 AND NOT listlast( name, ".") IS 'zip'>
<!---
Pull out the name of the file minus the extension and remove all the annoying parts.
Note: I am not using listFirst() here because the document name might have a period in some inappropriate spot.
Chopping off the last 4 characters is safe because the check above guarantees a 3-character extension.
--->
<cfset filename = ReReplace(mid(name, 1, len(name) - 4 ), "[^[:alnum:]]", "", "all")>
<!--- Hold on to the extension of the file --->
<cfset ext = listlast(name, ".")>
<!--- Copy the file to another directory --->
<cffile action="copy" destination="C:\Xcel\cleandocuments\" source="#directory#\#name#">
<!--- And rename it --->
<cffile action="rename" source="C:\Xcel\cleandocuments\#name#" destination="C:\Xcel\cleandocuments\DataCurl-#filename#.#ext#">
</cfif>
</cfoutput>
<!--- Clap hands, and use the time saved to post nonsense on blog --->
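To see what the cleanup step does to a single name, here is a small sketch you can run on its own. The sample file name is made up for illustration (it is not from the actual archive), and the variable names sampleName, baseName and cleanName are mine. Unlike the script above, it strips the extension by its actual length rather than assuming 3 characters, which is just one possible variation.

```cfm
<cfscript>
// Hypothetical sample name, invented for this example
sampleName = "Jane Doe - Resume (v2.1).doc";

// Everything after the last period is treated as the extension
ext = listLast(sampleName, ".");

// Drop the extension and its leading period, whatever the extension length
baseName = left(sampleName, len(sampleName) - len(ext) - 1);

// Strip every character that is not a letter or a digit
cleanName = reReplace(baseName, "[^[:alnum:]]", "", "all");

// Same naming pattern as the rename step above
writeOutput("DataCurl-" & cleanName & "." & ext);
</cfscript>
```

Note that the earlier period in "v2.1" is removed along with the spaces and punctuation, which is exactly why listFirst() would have been the wrong tool here.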
There, wasn't that fun?






