Data Management

Introduction

The final topic to be looked at is Data Management. A significant number of the programs on the grid today need to analyse large volumes of data, and to that end there are a number of services within the EGEE middleware dedicated to handling data.

One of the key issues to remember is that moving large volumes of data can be slow and financially expensive. The solution is to keep copies of the data at key locations on the grid and then choose a Computing Element (CE) that both matches the requirements of the job and is closest (in grid terms) to the data. Clearly this requires a specialised way of handling the data: you cannot be expected to know where each copy of the data is and which appropriate CE is closest to it.

In practice, the data is handled by using several different ways of addressing the same file. Each piece of data is allocated a Globally Unique Identifier (GUID), which is used to track the data. On top of this, there will typically be several Storage Elements (SEs) on the grid that keep a copy of the data. Each of these copies is accessed using a Storage URL (SURL), which contains the name of the SE and the path to where the data is found on that SE. Neither of these is very user friendly, so a third way of addressing the data is used, called a Logical File Name (LFN). The LFN can be whatever name makes sense for the data, provided it is unique; it is comparable to an alias for the data. This can be summarized by the following diagram:

The actual middleware components involved are the Replica Metadata Catalog (RMC), which maps LFNs to GUIDs, and the Replica Location Service (RLS), which maps each GUID to its SURLs.

Modify your previous JDL file to include the parameter InputData with the value lfn:TutorialLFN.mpg and the parameter DataAccessProtocol with the values file, gridftp and rfio. Run the job and again observe where the Resource Broker (RB) is able to match the job's requirements.
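The two-step lookup (LFN to GUID via the RMC, then GUID to SURLs via the RLS) and the matching it enables can be sketched in Python. This is a toy model under stated assumptions, not the Resource Broker's actual algorithm. The GUID, SE and CE hostnames, the `srm://` SURL form and the simplification that each CE has exactly one "close" SE are all hypothetical. The sketch keeps only CEs that are close to an SE holding a replica and that advertise at least one of the requested access protocols. A real RB also weighs the job's other requirements and its rank expression.

```python
from dataclasses import dataclass

# Hypothetical catalogue contents: the GUID, SE hostnames and paths are
# invented for illustration and do not describe the real gilda testbed.
RMC = {  # Replica Metadata Catalog: LFN -> GUID
    "lfn:TutorialLFN.mpg": "guid:example-0001",
}
RLS = {  # Replica Location Service: GUID -> SURLs, one per replica
    "guid:example-0001": [
        "srm://se1.example.org/gilda/TutorialLFN.mpg",
        "srm://se2.example.org/gilda/TutorialLFN.mpg",
    ],
}


@dataclass
class ComputingElement:
    name: str
    close_se: str               # hypothetical: the one SE this CE treats as "close"
    protocols: frozenset[str]   # access protocols the CE advertises


def resolve(lfn: str) -> list[str]:
    """Follow LFN -> GUID (RMC), then GUID -> SURLs (RLS)."""
    guid = RMC[lfn]
    return RLS[guid]


def se_host(surl: str) -> str:
    """Extract the SE name from a SURL of the assumed form scheme://host/path."""
    return surl.split("/")[2]


def match(lfn: str, wanted: list[str], ces: list[ComputingElement]) -> list[str]:
    """Keep CEs close to an SE holding a replica that support a wanted protocol."""
    replica_hosts = {se_host(s) for s in resolve(lfn)}
    return [
        ce.name
        for ce in ces
        if ce.close_se in replica_hosts and ce.protocols & set(wanted)
    ]


ces = [
    ComputingElement("ce1.example.org", "se1.example.org", frozenset({"gridftp", "rfio"})),
    ComputingElement("ce2.example.org", "se3.example.org", frozenset({"file", "gridftp"})),
    ComputingElement("ce3.example.org", "se2.example.org", frozenset({"dcap"})),
]
print(match("lfn:TutorialLFN.mpg", ["file", "gridftp", "rfio"], ces))
# prints ['ce1.example.org']
```

Here ce2 is rejected because its close SE holds no replica. ce3 is next to a replica but shares no protocol with the job's DataAccessProtocol list. The user only ever supplies the LFN; the GUID and SURLs stay internal to the lookup.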
These parameters specify that the job needs to run on a CE that is close to the data identified by the given LFN. The access protocols listed are the means by which you wish to access the data: when you submit your job, the RB only matches CEs that advertise (via the Information Service) support for at least one of these protocols. You may wish to experiment to find out whether all of these protocols are actually supported on gilda.

Modify "runshell.sh" and "runshell.jdl" so that you can see the contents of TutorialLFN.mpg. To copy the file, use the command

edg-rm --vo gilda copyFile <LFNNAME> <LOCALNAME>

where <LFNNAME> and <LOCALNAME> should be replaced appropriately. Make sure <LOCALNAME> contains the full path to the file as a file URL (e.g. "file:///home/genius01/myfile.mpg").

The immediate question is then what the full path to the file should be. When the job runs on the Worker Node (WN), it does not run under your username; instead it runs under a generic username for your VO. This means that when you write your shell script, there is no way of knowing which local directory you will be able to write to. The one thing that is known is that the shell script will run from a directory you can write to, so use the environment variable $PWD in your path.

This is the end of the tutorial.
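Building the copy destination from the working directory can be sketched in Python. The sketch only assembles the edg-rm argument list from the command given in the exercise and does not run it. The helper name `copy_command` is hypothetical. Passing the source as `lfn:TutorialLFN.mpg` is an assumption carried over from the InputData value.

```python
import shlex
from pathlib import Path


def copy_command(lfn: str, local_name: str, vo: str = "gilda") -> list[str]:
    """Assemble (but do not run) an edg-rm copyFile command line.

    Hypothetical helper. The destination is placed in the current working
    directory, the one directory the job is known to be able to write to.
    """
    # Turning an absolute path into a URI yields "file:///..." (three slashes).
    dest = Path.cwd().joinpath(local_name).as_uri()
    return ["edg-rm", "--vo", vo, "copyFile", lfn, dest]


cmd = copy_command("lfn:TutorialLFN.mpg", "TutorialLFN.mpg")
print(shlex.join(cmd))
```

An absolute path starts with "/", so the destination begins with "file:///". Inside runshell.sh, the equivalent destination is simply `file://$PWD/TutorialLFN.mpg`, which expands to the same form at run time on the WN.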
EGEE-II is a project funded by the European Union under contract number INFSO-RI-031688.