SSIS的文件系统任务实例(zz)
一篇采用SSIS的文件系统任务实现文件拷贝、改名的的文章。简单,但能说明问题。来源于:http://sqljunkies.com/WebLog/knight_reign/archive/2005/02/12/7750.aspx
Expressions : Part II
This is a blog I've been wanting to do for some time. It addresses a problem users have a lot and one that I've kept in the back of my mind throughout the project. The problem is, how can I move files from one location to another and rename them in some consistent way as I do it. On its face, the problem doesn't seem that difficult, and in practice it's not. But there are a few gotchas that can get in the way. The way I've seen most folks address this is to drop a script task onto the designer and start coding away. That's the DTS way of doing things. There is a more subtle and simpler way to do this sort of thing in Integration Services. It also requires less code. A lot less code. This example, I think, points out some of the quintessential differences between DTS and IS. So, let's take a look.
So, the problem is, I have some files in the C:\SourceFlatFiles folder. I want to do some stuff with them and then I want to copy them to the ProcessedFlatFiles folder when I'm all done. I also want to rename them to make it clear that each file was processed. For this example, I'll prepend the word “Processed”. I could also prepend the date etc., but I want to keep it simple. Also, I'll not do any real work with the files. That's easy enough to work into your packages based on your requirements. The point is that this loop is basically self contained. You can copy and paste it into other packages and tweak the folder strings or configure them etc. and you've got a loop that copies and renames files.
There are a couple of problems with this. One, how to copy them. I could just use a command line call out to the shell and copy them all and then rename them. That would actually work for my scenario, but then this is a very simple scenario. What if I want to prepend information about the run, or the package or other such package or time specific information. That's tricky, but doable by using a variable and property expression on the execute process task etc. But, that's just another way of doing essentially the same thing. I could use a script task as described above, but then, to understand that the script task is moving files around I have to open it up and read the code. This isn't a bad solution, but it requires that I write code and some of the details are hidden. It may even be preferable in some cases, especially if the logic for moving the files is fairly complex.
However, if the scenario is fairly simple as described, there is a middle ground that wasn't available in DTS. Property expressions provide a way to smooth out the rough granularity of functionality like the foreach loops and enumerators by slightly tweaking settings on the fly. In this package, I used one strategically placed property expression to rename each file as it gets copied and eliminated the script task entirely.
The enumeration and looping story for IS is the foreach loop. I have a number of files in the source folder, so I drop a foreach loop onto the designer. Next, I put a file system task inside the loop. Then I created three variables: FileSourcePath, FileDestinationPath, and FileDestinationFolder. That's the structure.
Now, I need to set up the loop to point to the correct folder and enumerate the files in the source folder. The foreach loop UI looks like this:
I've selected the Fully qualified radio button because I want to get the whole string like this: “C:\SourceFlatFiles\FileName.txt”. In my property expression, I'll rip that apart a bit and put it back together with a different destination folder and the “Processed“ string prepended.
The Variable Mapping looks like this. Nothing special here so far. This just takes the current enumerated value, in this case, the fully qualified name of the file and places it in the variable specified on the right side. The index is always zero for the file enumerator because it does not enumerate collections. For an enumerator that enumerates over a collection of collection, the index is used to map from the item in the current enumerated collection to the variable.
At this point, I've got my loop enumerating over the files in the specified folder and putting the fully qualified name into the FileSourcePath variable. Now, I want to set up the file system task to copy the files. That task UI looks something like this:
Notice a few things here:
- IsDestinationPathVariable is set to true.
- IsSourcePathVariable is set to true.
- The DestinationVariable name is FileDestinationPath.
- The SourceVariable name is FileSourcePath
- The operation is “Copy“, I could move if I wanted, or delete. But I want to copy these files.
- I've actually have given a description to this task. Descriptions are important and should always be accurate. It's documentation!
- Overwrite destination is set to true. I didn't have a preference for this sample, but you really should think about this setting. It's can be destructive.
So, basically, the foreach loop iterates over the files in a folder, sets a variable to the path of the files and the file system task moves them. Pretty straight forward. No magic yet. The file system task can use connections to reference files, but referencing variables simplifies this somewhat.
Now, here's the trick. Variables have a property called Expression and you can enable that property by setting the EvaluateAsExpression property to true as shown in the bitmap below.
The expression is as follows:
@[User::FileDestinationFolder]+"\\"+"Processed"+RIGHT(@[User::FileSourcePath] , LEN(@[User::FileSourcePath])-FINDSTRING(@[User::FileSourcePath],"\\",2) )
This expression does the following:
- @[User::FileDestinationFolder] - It takes the value of FileDestinationFolder (“C:\ProcessedFlatFiles“)
- "\\"+"Processed" - It prepends the “\processed“ string to the file name
- RIGHT(@[User::FileSourcePath] , LEN(@[User::FileSourcePath])-FINDSTRING(@[User::FileSourcePath],"\\",2) - This part gets the file name by taking the number of letters from the right part of the string based on the length of the file name. In this case, I know that the second “\“ is the last one before the file name, so I use it as a marker. Taking the total length of the fully qualified name less the length of the name, gives me the starting point for the file name which is then returned to the “RIGHT“ method to take only the name.
With the above expression, the variable value returned is something like this:“C:\ProcessedFlatFiles\ProcessedFlatFile1.txt“. Again, if I wanted I could prepend some other string like the date or something similar.
This kind of scenario is very common. We get questions about how to do similar things all the time. I think this is a good example because it's so simple. By changing one or two variables, you can modify the entire function of the foreach loop.
The other point about this package I'd like you to notice. The variables are created on the foreachloop. There's no possibility that another task or component outside the loop will somehow accidentally reference or change those variables. This is why scoping variables is so important when building packages. In general, we find that packages should be kept as small and simple as possible, but occasionally they do get complex and it's easy to reference the wrong variable. This is true especially if you use a lot of variables. So, as a rule variables should be scoped as closely to where they are used as possible.
You can find the package here if you'd like to play around with it some.
I'd be interested in any questions or comments.
Thanks,
Universe.Earth.Software.Microsoft.SQLServer.IS.KirkHaselden