RESTful Access to Data Manager (version beta 4.3)
Introduction
Part of the functionality that we are including in the GenomeSpace CDK is a Java API that provides client applications authentication and authorization services for the GenomeSpace system, as well as access to data file services. However, if your application is not written in Java and you want to add GenomeSpace support to it, you can still access these services by calling them directly over the web.
HTTP Client Requirements
We highly recommend you utilize a fully-featured HTTP client library, such as Apache HttpClient for Java or Httplib2 for Python. It will make your life and your code a lot simpler.
Verb | URL |
GET | /datamanager/v1.0/file/users/test/dir1 |
Each user name will have a default directory assigned and created. To reach this default directory:
Verb | URL |
GET | /datamanager/v1.0/defaultdirectory |
This will redirect to the default directory URL. Currently, the default directory for all users is at the URL:
https://dm.genomespace.org/datamanager/v1.0/file/Home
To be redirected to the logged-in user's personal directory, where they have full read/write privileges:
Verb | URL |
GET | /datamanager/v1.0/personaldirectory |
Currently, this would redirect to a URL that would look like this:
https://dm.genomespace.org/datamanager/v1.0/users/theUserName
To create a directory:
Verb | URL | Body |
PUT | /datamanager/v1.0/file/users/test/newdirname | {"isDirectory":true} |
where newdirname is the new directory name. Note that Data Manager expects the parent directory to exist (in this case, /users/test). If directory creation is successful, the GSFileMetadata for the new directory will be returned.
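As a rough illustration of the directory-creation call, the sketch below builds the PUT request without sending it. The base URL is taken from the examples in this document; the JSON content type and the helper name build_mkdir_request are assumptions, and authentication is left out entirely.

```python
import json
import urllib.request

# Assumption: production host as it appears in the examples in this document.
BASE_URL = "https://dm.genomespace.org"


def build_mkdir_request(dir_path: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates a directory."""
    url = f"{BASE_URL}/datamanager/v1.0/file{dir_path}"
    body = json.dumps({"isDirectory": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # Assumption: the service accepts a JSON content type for this body.
        headers={"Content-Type": "application/json"},
    )


request = build_mkdir_request("/users/test/newdirname")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request (for example with urllib.request.urlopen) would additionally require whatever credentials your client normally presents to GenomeSpace.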
GenomeSpace offers two different methods to upload files. The simpler one we will refer to as "Single PUT upload". It is easier to implement on the client, but it is limited to files of up to 5GB. For larger files you will have to do a "Multipart file upload". In this second method, GenomeSpace provides the client with the information necessary for it to do an upload using Amazon S3's multipart upload protocol.
Uploading a file is a two-step process. You must first obtain a signed Amazon S3 URL from Data Manager. Then you will PUT your file to the generated URL.
To obtain the signed URL, you will do a GET with a URL that looks like:
/datamanager/v1.0/uploadurl/users/test/mydir/AnotherLittleFile.txt?Content-Length=6&Content-MD5=nwYkOry4nHDgwzHGHYcfpw%3D%3D&Content-Type=application/octet-stream
Note that we changed the path from /uploadurls to /uploadurl.
Note that the base URL (https://dmtest.genomespace.org/datamanager) is followed by /uploadurl and then by the destination path and file name (/users/test/mydir/AnotherLittleFile.txt).
The three query parameters included in the URL are:
Query Param | Description |
Content-Length | The size of the file in bytes |
Content-MD5 | The Base64-encoded MD5 hash of the file. Note that in the example the value is URL-encoded (%3D is the URL-encoded form of the character ‘=’); you should URL-encode the values as well. To check your code, on Mac OS X you can obtain the correct Content-MD5 value for a file by issuing the following command: openssl md5 -binary THEFILENAME | openssl base64 |
Content-Type | The content type you would like to assign to the file |
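The query string described in the table can be assembled with the Python standard library. This is a minimal sketch: the function names content_md5 and upload_url_path are illustrative, and the destination path is assumed to need no further encoding.

```python
import base64
import hashlib
from urllib.parse import quote, urlencode


def content_md5(data: bytes) -> str:
    """Return the Base64-encoded MD5 digest (not yet URL-encoded)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def upload_url_path(dest_path: str, data: bytes, content_type: str) -> str:
    """Build the /uploadurl path and query string for a single PUT upload."""
    query = urlencode(
        {
            "Content-Length": len(data),
            "Content-MD5": content_md5(data),
            "Content-Type": content_type,
        },
        quote_via=quote,  # percent-encodes '=', '+' and '/' in the values
    )
    return f"/datamanager/v1.0/uploadurl{dest_path}?{query}"


payload = b"hello\n"
print(upload_url_path("/users/test/mydir/AnotherLittleFile.txt", payload,
                      "application/octet-stream"))
```

An MD5 digest is always 16 bytes, so its Base64 form always ends in "==", which appears as %3D%3D once URL-encoded, as in the example URL above.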
In response, the Data Manager service will return an Amazon S3 URL that you will use to PUT the file.
The returned URL might look something like this (included here for illustration, but what it looks like is not important. Just use whatever was returned by the web service):
https://genomespace-input.s3.amazonaws.com/users/test/mydir/AnotherLittleFile.txt?AWSAccessKeyId=AKIAIDXKHSCMYX5BHNLA&Expires=1296076583&Signature=G4tZJCObhPsvcdZxKJkBY7%2Bq378%3D
With your file PUT you will need to include 3 HTTP headers:
Content-Length
Content-MD5 (should not be URL encoded)
Content-Type
Note: the x-amz-meta-md5-hash header is no longer needed or used. Do not include it, or the upload will fail.
Amazon S3 will return an HTTP status code of 200 on the successful completion of the upload.
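A small sketch of the header set for the file PUT follows. The helper name single_put_headers is hypothetical; the values are presumably expected to match the ones sent when the signed URL was requested, which is an assumption based on how signed URLs generally work rather than something this document states.

```python
import base64
import hashlib


def single_put_headers(data: bytes, content_type: str) -> dict:
    """Return the headers to send with the PUT to the signed S3 URL."""
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return {
        "Content-Length": str(len(data)),
        "Content-MD5": md5_b64,  # raw Base64 here, NOT URL-encoded
        "Content-Type": content_type,
        # x-amz-meta-md5-hash is deliberately omitted (see the note above).
    }


headers = single_put_headers(b"hello\n", "application/octet-stream")
for name, value in headers.items():
    print(f"{name}: {value}")
```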
If you have a file larger than 5GB, you will have to do a multipart upload. In addition to overcoming the size limit of the "Single PUT upload", the multipart upload allows for faster uploads: the S3 multipart protocol requires splitting the file into smaller chunks, which a multithreaded implementation can upload concurrently.
The S3 REST APIs are complicated. We highly recommend using an existing S3 library instead of dealing with the APIs directly. Amazon offers SDKs for Java, Ruby, PHP, and .NET.
The first step is to request the information needed for the upload from GenomeSpace.
Verb | URL |
GET | /datamanager/v1.0/uploadinfo/dir1/file1.txt |
The part that follows /uploadinfo/ is the destination path of the file you will be uploading. In response, the Data Manager will return a JSON S3FileUploadInfo object. This JSON response includes temporary Amazon credentials and S3 specific details you will be using during the upload.
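To make the chunking step of a multipart upload concrete, the sketch below plans the byte ranges of the parts. It only computes offsets; the actual upload would normally be delegated to an S3 library using the credentials from the S3FileUploadInfo response. The 5 MiB minimum part size and 10,000-part maximum are Amazon S3 limits; the 64 MiB default and the function name plan_parts are arbitrary choices for illustration.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(file_size: int, part_size: int = 64 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering the whole file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Grow the part size if needed so both S3 limits are respected.
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(file_size / MAX_PARTS))
    parts = []
    offset = 0
    number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


for part in plan_parts(150 * 1024 * 1024):
    print(part)
```

Each tuple can be handed to a worker thread that reads that byte range and uploads it as one part.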
To download a file, you will need either its absolute file path or its URL.
Both the absolute file path and the URL are available in the GSFileMetadata object (properties named path and url respectively).
The URL to GET will look something like the one below (minus protocol, server, and port number):
/datamanager/v1.0/file/users/test/mydir/AnotherLittleFile.txt
To avoid problems with special characters, you should URL encode each URL path element and the file name itself. (Do not just do a URL encode on the whole URL or you will end up with a URL that will not work).
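The per-element encoding described above might be done as follows; encode_path is a hypothetical helper name, and the sample path is made up to show a few special characters.

```python
from urllib.parse import quote


def encode_path(path: str) -> str:
    """URL-encode each path element separately, keeping the '/' separators."""
    return "/".join(quote(segment, safe="") for segment in path.split("/"))


encoded = encode_path("/users/test/my dir/a&b #1.txt")
print("/datamanager/v1.0/file" + encoded)
```

Encoding element by element keeps the slashes between elements intact while still escaping characters such as spaces, '&', and '#' inside each name.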
The Data Manager will respond with a redirect to the Amazon S3 location of the file. If your HTTP client library has redirects enabled, the redirection should happen automatically.
Ownership of files and directories is determined by the owner of the top-level directory. Any file or directory that is directly or indirectly under a user's home directory is owned by that user. As such, the owner is able to read, write, delete, and grant permissions on any object below it.
Example:
/users/fred is the home directory for a user named “fred”.
User fred is the owner of directory /users/fred as well as /users/fred/dir1 and /users/fred/dir1/myfile.txt because the directory and the file are below the home directory.
Data Manager supports sharing of files and directories through the use of access control lists (ACLs). The owner of a file or directory can grant read and write permissions to any other user or group (for more information on users and groups see blabha).
Each ACL points to the object that it is associated with and includes a list of access control entries (ACEs), each identifying the user or group (generically referred to as a Security Identity, or SID) and the permission that has been granted.
Read permission on a file means that its contents can be downloaded. Write permission means that the file can be updated, replaced, or deleted.
On a directory, read permission means that the grantee can list the contents of the directory. Write permission means that they can upload new files and delete or update existing ones.
ACL grants are inherited down the directory structure. So if you grant read permissions on a parent directory to another user, that user will be able to read every file and directory below it in the hierarchy.
Example:
If user fred grants “read” permissions on directory /users/fred/dir1 to user kathy, she will be able to get a listing of /users/fred/dir1 and every directory below dir1, like /users/fred/dir1/dir2 and /users/fred/dir1/dir2/dir3.
Kathy will also be able to download /users/fred/dir1/myfile.txt since the file inherited the read permission from its parent directory.
If fred goes ahead and also grants write permissions to kathy on /users/fred/dir1, then kathy will be able to create and delete directories and files below /users/fred/dir1.
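The ownership and inheritance rules above can be modelled in a few lines. This is a client-side toy model for reasoning about the rules, not the server's implementation; the grants dictionary and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def is_owner(user: str, path: str) -> bool:
    """A user owns everything at or below /users/<user>."""
    parts = PurePosixPath(path).parts  # e.g. ('/', 'users', 'fred', 'dir1')
    return len(parts) >= 3 and parts[1] == "users" and parts[2] == user


def has_permission(user: str, permission: str, path: str, grants: dict) -> bool:
    """Check ownership, then grants on the object and on every ancestor."""
    if is_owner(user, path):
        return True
    target = PurePosixPath(path)
    candidates = [target, *target.parents]
    return any((user, permission) in grants.get(str(p), set()) for p in candidates)


# fred grants read on /users/fred/dir1 to kathy.
grants = {"/users/fred/dir1": {("kathy", "R")}}

print(has_permission("kathy", "R", "/users/fred/dir1/dir2/dir3", grants))
print(has_permission("kathy", "W", "/users/fred/dir1/myfile.txt", grants))
```

With only the read grant in place, the first check succeeds through inheritance and the second fails until a "W" entry is added for kathy.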
There are two ways to create an ACL. The first is to PUT an ACL object:
Verb | URL | Body |
PUT | /datamanager/v1.0/acl/file/dir1/fileOrDir | An ACL JSON object |
The other method is to POST the ACEs associated with the ACL:
Verb | URL | Body |
POST | /datamanager/v1.0/acl/file/dir1/fileOrDir | A JSON array of ACE JSON objects |
Both methods will return the resulting ACL JSON object.
For our running examples, the URLs would look something like:
https://dm.genomespace.org/datamanager/v1.0/acl/file/users/fred/dir1
and
https://dm.genomespace.org/datamanager/v1.0/acl/file/users/fred/dir1/dir2
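As an illustration of the POST body, the sketch below builds a JSON array of ACE objects using the property names from the ACE and SID tables in the appendices. Which SID fields the server actually requires in a request is an assumption here; the id fields of the ACEs are left out because they are generated by the server. The helper name make_ace and the placeholder group id are hypothetical.

```python
import json

VALID_SID_TYPES = ("User", "Group")
VALID_PERMISSIONS = ("R", "W")


def make_ace(sid_id: str, sid_name: str, sid_type: str, permission: str) -> dict:
    """Build one access control entry as a plain dictionary."""
    if sid_type not in VALID_SID_TYPES:
        raise ValueError(f"unknown SID type: {sid_type}")
    if permission not in VALID_PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    return {
        "sid": {"id": sid_id, "name": sid_name, "type": sid_type},
        "permission": permission,
    }


# Body for: POST /datamanager/v1.0/acl/file/users/fred/dir1
body = json.dumps([
    make_ace("kathy", "kathy", "User", "R"),
    make_ace("some-group-id", "SomeGroup", "Group", "W"),  # placeholder group
])
print(body)
```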
Once in existence, as you would expect, you can GET the ACL:
Verb | URL |
GET | /datamanager/v1.0/acl/file/dir1/fileOrDir |
You can also get all the ACLs that affect a particular file object by adding the “hierarchy” query parameter to the URL:
Verb | URL |
GET | /datamanager/v1.0/acl/file/dir1/fileOrDir?hierarchy=true |
In this case, instead of getting a single JSON object you will get a JSON array of ACLs, one for each ACL found in the directory hierarchy for the file object.
To delete an ACL:
Verb | URL |
DELETE | /datamanager/v1.0/acl/file/dir1/fileOrDir |
Or, if you want to remove just a specific access control entry from an ACL:
Verb | URL |
DELETE | /datamanager/v1.0/ace/theAceId |
You will need the ACE id that is included in the ACL JSON serialization.
To get the effective permissions on any file object, you can examine the “effectiveAcl” property included in every GSFileMetadata object. It gives the consolidated permissions on that object: that is, who has read and write permissions on it, whether because of ownership, permissions granted directly on that object, or permissions that have been inherited through the directory structure.
To emphasize, the ACL object attached to a GSFileMetadata object is not a “real” ACL object, but a flattened view of the hierarchical permissions that is included as a convenience.
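Reading the consolidated permissions out of a GSFileMetadata object could look like the sketch below. The sample dictionary is a trimmed-down version of the GSFileMetadata example in the appendix; the function name permissions_by_sid is hypothetical.

```python
def permissions_by_sid(metadata: dict) -> dict:
    """Map each SID name to the set of permissions in effectiveAcl."""
    result = {}
    acl = metadata.get("effectiveAcl", {})
    for ace in acl.get("accessControlEntries", []):
        result.setdefault(ace["sid"]["name"], set()).add(ace["permission"])
    return result


# Trimmed from the GSFileMetadata example in the appendix.
sample_metadata = {
    "name": "twoRecords.gct",
    "path": "/users/test/twoRecords.gct",
    "effectiveAcl": {
        "object": {
            "objectId": "/users/test/twoRecords.gct",
            "objectType": "DataManagerFileObject",
        },
        "accessControlEntries": [
            {"sid": {"id": "test", "name": "test", "type": "User"}, "permission": "R"},
            {"sid": {"id": "test", "name": "test", "type": "User"}, "permission": "W"},
        ],
    },
}

print(permissions_by_sid(sample_metadata))
```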
Data Manager has the ability to convert some files into formats that can be consumed by other applications.
The data format that the Data Manager believes the original file to be in is identified in the GSFileMetadata property dataFormat (it could be empty if the format is not recognized).
GSFileMetadata also specifies the property availableDataFormats. This will be an array of GSDataFormat objects that identify the formats this file can be requested in.
To GET the file in a specific format you will build a URL that looks as follows:
Verb | URL |
GET | /datamanager/file/dir1/dir2/fileName.ext?dataformat=http://www.genomespace.org/datamanager/dataformat/lowercasetxt |
Remember to URL-encode the value for the dataformat parameter.
The value of the dataformat query parameter is the URL for the format. This URL can be obtained from the url property of a GSDataFormat object in the availableDataFormats array.
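A short sketch of building the conversion request follows, with the format URL percent-encoded as a query parameter value. The function name with_dataformat is hypothetical; the file path and format URL are the ones from the example above.

```python
from urllib.parse import quote


def with_dataformat(file_url: str, format_url: str) -> str:
    """Append a URL-encoded dataformat query parameter to a file URL."""
    return f"{file_url}?dataformat={quote(format_url, safe='')}"


url = with_dataformat(
    "/datamanager/file/dir1/dir2/fileName.ext",
    "http://www.genomespace.org/datamanager/dataformat/lowercasetxt",
)
print(url)
```

Passing safe='' makes quote encode the ':' and '/' characters of the format URL as well, so the whole value travels as a single query parameter.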
Verb | URL |
DELETE | /datamanager/v1.0/file/dir1/dir2/fileOrDirName |
Note: A directory must be empty before it can be deleted.
Verb | URL | Headers |
PUT | /datamanager/v1.0/file/dir1/destFileOrDirName | x-gs-copy-source |
Note: The URL identifies the new object that will be created by the copy.
The source file or directory is identified in the custom x-gs-copy-source header. The header value should look like /dir1/dir3/sourceFile (neither the base URL nor the URL path segment /datamanager/v1.0/file should be included).
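The header value for a copy can be derived from either a bare path or a full file URL, as in this sketch. The helper name copy_source_header is hypothetical, and whether the header value itself should be URL-encoded is not stated in this document, so the path is passed through unchanged.

```python
from urllib.parse import urlsplit

FILE_PREFIX = "/datamanager/v1.0/file"


def copy_source_header(source: str) -> dict:
    """Return the x-gs-copy-source header for a source path or file URL."""
    path = urlsplit(source).path  # drops scheme, host, and port if present
    if path.startswith(FILE_PREFIX + "/"):
        path = path[len(FILE_PREFIX):]  # keep only /dir1/dir3/sourceFile
    return {"x-gs-copy-source": path}


print(copy_source_header("/dir1/dir3/sourceFile"))
print(copy_source_header(
    "https://dm.genomespace.org/datamanager/v1.0/file/dir1/dir3/sourceFile"))
```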
Verb | URL | Headers |
PUT | /datamanager/v1.0/file/dir1/destFileOrDirName?dataformat=http://www.genomespace.org/datamanager/dataformat/dataformatname | x-gs-copy-source |
Note: The URL identifies the new object that will be created by the conversion. Note the dataformat query parameter. This is expected to be the URL for the destination format. The possible conversions for any file on GenomeSpace can be obtained by examining that file object's GSFileMetadata object (see Obtain Metadata on a File or Directory). Make sure you URL-encode the dataformat parameter.
Also, the destination URL file name is expected to be consistent with the requested data format; e.g., if your source file is a GCT file and you are converting it to a GXP file, then the URL for the file needs to end with .gxp.
The source file or directory is identified in the custom x-gs-copy-source header. The header value should look like /dir1/dir3/sourceFile (neither the base URL nor the URL path segment /datamanager/v1.0/file should be included).
Verb | URL |
GET | /datamanager/v1.0/filemetadata/dir1/dir2/destFileOrDirName |
This will return a JSON GSFileMetadata object. See Appendix A and B.
If you want to get the response HTTP headers for a file or directory without actually GETting the file object, you can use the HEAD verb. This can be useful, for example, to get the size of a file from the Content-Length header if for some reason you want to avoid looking at the GSFileMetadata object.
Verb | URL |
HEAD | /datamanager/v1.0/file/dir1/dir2/fileOrDirName |
Verb | URL |
GET | /datamanager/v1.0/dataformat |
This will return an array of GSDataFormat objects (see Appendix C).
Property Name | Type | Description |
name | string | The name of the file or directory |
path | string | The absolute file path for the file |
url | string | The url for the file |
parentUrl | string | The url for the parent directory |
size | number | The size of the file in bytes. Will be 0 if directory |
owner | SID object | The user id for the owner of the file. See appendix E |
isDirectory | boolean | Distinguish between file and directory. Will be true or false. |
isLink(new) | boolean | Indicate whether this is a link to another file object. |
targetPath(new) | string | If this file object is a link (i.e.,"isLink":true), targetPath will show the full file path of the linked file. |
lastModified | date | Last modified time stamp in xsd:dateTime format. Empty for directory |
dataFormat | object (GSDataFormat) | The data format of the file. Empty for directory and for files for which the format is unknown |
availableDataFormats | array(GSDataFormat) | The formats this file can be converted to. Will always include at least the same format as in dataFormat. Empty if format is unknown or if this is a directory |
effectiveAcl | ACL | Describes the permissions that are effective on this object. i.e., it includes permissions that have been set explicitly on this object and any others that have been inherited. See Appendix E. Note: because this is a “synthetic” and not a “real” ACL, the ACL does not have an id. |
Example:
{ "name":"twoRecords.gct", "url":"https:\/\/dev.broadinstitute.org\/datamanager\/file\/users\/test\/twoRecords.gct", "parentUrl":"https:\/\/dev.broadinstitute.org\/datamanager\/v1.0\/file\/users\/test", "path":"\/users\/test\/twoRecords.gct", "owner":{ "id":"test", "name":"test", "type":"User" }, "size":225, "lastModified":"2011-04-20T11:53:00-04:00", "isDirectory":false, "isLink":false, "dataFormat":{ "name":"gct", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/gct\/0.0.0", "fileExtension":"gct" }, "availableDataFormats":[ { "name":"gxp", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/gxp\/0.0.0", "fileExtension":"gxp" }, { "name":"genomicatab", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/genomicatab\/0.0.0", "fileExtension":"tab" }, { "name":"gct", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/gct\/0.0.0", "fileExtension":"gct" } ], "effectiveAcl":{ "object":{ "objectId":"\/users\/test\/twoRecords.gct", "objectType":"DataManagerFileObject" }, "accessControlEntries":[ { "sid":{ "id":"test", "name":"test", "type":"User" }, "permission":"R" }, { "sid":{ "id":"test", "name":"test", "type":"User" }, "permission":"W" } ] } }
Property name | Type | Description |
contents | array(GSFileMetadata) | One entry per child file or directory |
directory | object(GSFileMetadata) | The metadata for the directory |
{ "directory":{ "name":"testDir1", "url":"https:\/\/dev.broadinstitute.org\/datamanager\/file\/users\/test\/testDir1", "parentUrl":"https:\/\/dev.broadinstitute.org\/datamanager\/v1.0\/file\/users\/test", "path":"\/users\/test\/testDir1", "owner":{ "id":"test", "name":"test", "type":"User" }, "size":0, "isDirectory":true, "isLink":false, "effectiveAcl":{ "object":{ "objectId":"\/users\/test\/testDir1", "objectType":"DataManagerFileObject" }, "accessControlEntries":[ { "sid":{ "id":"test", "name":"test", "type":"User" }, "permission":"R" }, { "sid":{ "id":"test", "name":"test", "type":"User" }, "permission":"W" } ] } }, "contents":[ { "name":"ASillyLittleFile.txt", "url":"https:\/\/dev.broadinstitute.org\/datamanager\/file\/users\/test\/testDir1\/ASillyLittleFile.txt", "path":"\/users\/test\/testDir1\/ASillyLittleFile.txt", "owner":{ "id":"test", "name":"test", "type":"User" }, "size":6, "lastModified":"2011-10-18T12:37:31-04:00", "isDirectory":false, "isLink":false, "dataFormat":{ "name":"txt", "description":"Plain text format", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/txt\/0.0.0", "fileExtension":"txt" }, "availableDataFormats":[ { "name":"uppercasetxt", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/uppercasetxt\/0.0.0", "fileExtension":"uppertxt" }, { "name":"lowercasetxt", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/lowercasetxt\/0.0.0", "fileExtension":"lowertxt" }, { "name":"txt", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/txt\/0.0.0", "fileExtension":"txt" } ], "effectiveAcl":{ "object":{ "objectId":"\/users\/test\/testDir1\/ASillyLittleFile.txt", "objectType":"DataManagerFileObject" }, "accessControlEntries":[ { "sid":{ "id":"test", "name":"test", "type":"User" }, "permission":"R" }, { "sid":{ "id":"test", "name":"test", "type":"User" }, "permission":"W" } ] } }, { "name":"aLittleSubdir", "url":"https:\/\/dev.broadinstitute.org\/datamanager\/file\/users\/test\/testDir1\/aLittleSubdir", 
"path":"\/users\/test\/testDir1\/aLittleSubdir", "owner":{ "id":"test", "name":"test", "type":"User" }, "size":0, "isDirectory":true, "isLink":false, "effectiveAcl":{ "object":{ "objectId":"\/users\/test\/testDir1\/aLittleSubdir", "objectType":"DataManagerFileObject" }, "accessControlEntries":[ { "sid":{ "id":"test", "name":"test", "type":"User" }, "permission":"R" }, { "sid":{ "id":"test", "name":"test", "type":"User" }, "permission":"W" } ] } } ] }
Property | Type | Description |
name | string | The name of the format |
url | string | The URL for the format. This is the official format identifier |
fileExtension | string | The file extension associated with this format. This is optional. |
description | string | A human-readable description of the format. This is optional. |
Example:
{ "name":"txt", "description":"Plain text format", "url":"http:\/\/www.genomespace.org\/datamanager\/dataformat\/txt\/0.0.0", "fileExtension":"txt" }
/datamanager/v1.0/dataformat as described earlier in this document.
Name | URL | Description |
gct | http://www.genomespace.org/datamanager/dataformat/gct | GenePattern expression dataset file format. |
gmt | http://www.genomespace.org/datamanager/dataformat/gmt | Gene set file from MSigDB |
gxp | http://www.genomespace.org/datamanager/dataformat/gxp | Main file format for Genomica. |
Genomica tab | http://www.genomespace.org/datamanager/dataformat/genomicatab | Alternate Genomica format for loading gene expression data. |
txt | http://www.genomespace.org/datamanager/dataformat/txt | Plain text file |
lowercasetxt | http://www.genomespace.org/datamanager/dataformat/lowercasetxt | All lower case text. For demo purposes. Will go away in future. |
uppercasetxt | http://www.genomespace.org/datamanager/dataformat/uppercasetxt | All upper case text. For demo purposes. Will go away in future. |
nowhitespace | http://www.genomespace.org/datamanager/dataformat/nowhitespace | Text with no white space. For demo purposes. Will go away in future. |
/datamanager/v1.0/dataformatconverter
Input Format | Output Format |
gmt | Genomica tab |
gxp | gct |
Genomica tab | gct |
gct | Genomica tab |
gct | gxp |
{ "id":"03b00579-5356-414c-afc7-3432ece90029", "object":{ "objectId":"\/users\/test\/dir1\/AnotherLittleFile.txt", "objectType":"DataManagerFileObject" }, "accessControlEntries":[ { "id":"9b02e70a-f25b-4951-90ae-6759517be31b", "sid":{ "id":"test2", "name":"test2", "type":"User" }, "permission":"R" }, { "id":"df603747-8c5c-4e80-b187-c7e912bc39a6", "sid":{ "id":"86d3ff94-7ada-44c5-bec2-afdbd7be54b4", "name":"DmClientTestGroup", "type":"Group" }, "permission":"W" } ] }
Property | Type | Description |
id | string | The ACL id. Generated by the server. It is absent in some cases, for example within the “effectiveAcl” property in GSFileMetadata. |
object/objectId | string | The identifier for the object that the ACL targets. Currently, ACL only targets file objects, so this will be the full path of the file or directory. |
object/objectType | string | Identifies the type of object targeted by ACL. Currently, it is always DataManagerFileObject |
accessControlEntries | array of ACEs | The access control entries. |
Property | Type | Description |
id | string | The ACE id. Generated by the server. It is absent in some cases, for example within the “effectiveAcl” property in GSFileMetadata. |
sid | a SID object | The security identity being granted permissions. |
permission | string | Only valid values are “R” and “W” for read and write. |
Property | Type | Description |
id | string | The id for the SID |
name | string | The name |
type | string | The type of SID. At this time, only valid values are “User” and “Group” |
Note that version has been removed
Property | Type | Description |
path | string | The GenomeSpace path of the file to upload |
s3BucketName | string | The name of the Amazon S3 bucket that will contain the new file. |
s3ObjectKey | string | The Amazon S3 object key for the new file. |
genomeSpaceFileUrl | string | The GenomeSpace URL of the new file. |
s3FileUrl | string | The Amazon S3 URL of the new file. |
secretKey | string | Temporary S3 secret key. Part of Amazon credentials authorized to do the upload. |
accessKey | string | Temporary S3 access key. Part of Amazon credentials authorized to do the upload. |
sessionToken | string | Temporary S3 session token key. Part of Amazon credentials authorized to do the upload. |
Example:
{ "path":"/users/test/clientTest1/AFileNamedDummy.txt", "s3BucketName":"genomespace-dev", "s3ObjectKey":"users/test/clientTest1/AFileNamedDummy.txt", "genomeSpaceFileUrl":"https://dmdev.genomespace.org:8444/datamanager/file/users/test/clientTest1/AFileNamedDummy.txt", "s3FileUrl":"https://genomespace-dev.s3.amazonaws.com/users/test/clientTest1/AFileNamedDummy.txt", "amazonCredentials": { "secretKey":"1Gr3q/VAJRJFQTU3MzsTHnFH1Px1wz4gLjyF2ncQ", "accessKey":"ASIAJXISRHBODWMCUCXQ", "sessionToken":"AQoDYXdzEPNtiDYmMT6BA=="} }