rnahf's profile - activity

2014-08-22 12:29:45 -0500 answered a question How large a dataset can be uploaded to a Generic Member Node (GMN) via the DataONE API?

(This answer is summarized from an internal email thread on the DataONE developers list and the comment to the question by waltz)

If the dataset (data file) is represented as one of the 'DATA' file format types (see the complete list), you should be fine, as GMN prescribes no file size limit for any format type. However, if the dataset's file format is a 'METADATA' file format (NetCDF, for example), then after it reaches the specific GMN node it will also be uploaded to a DataONE Coordinating Node in order to build the search record in the central search index, and the Coordinating Node doesn't accept METADATA files larger than 1 GB.

So, any dataset represented as a METADATA file format type has an effective size limit of 1 GB: a larger file could be uploaded to GMN successfully, but it could not be registered.
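
As a rough illustration of that rule, a pre-upload check might look like the sketch below. The function name and the constant are hypothetical (they are not part of any DataONE library), and reading the limit as 10^9 bytes is an assumption, since the thread does not say whether the limit is decimal or binary.

```python
# Hypothetical pre-upload check; not part of GMN or any DataONE client library.
CN_METADATA_LIMIT_BYTES = 10**9  # assumption: the 1 GB limit read as 10^9 bytes


def can_be_registered(format_type: str, size_bytes: int) -> bool:
    """Return True if an object of this format type and size stays within the CN limit."""
    if format_type.upper() == "METADATA":
        # METADATA objects are forwarded to the Coordinating Node for indexing.
        return size_bytes <= CN_METADATA_LIMIT_BYTES
    # Per the answer above, GMN prescribes no size limit for other format types.
    return True
```

With these assumptions, a 1 TB object of type DATA passes the check, while a METADATA object just over the limit does not.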

Following are a couple of salient points from the discussion on testing, in response to the idea of trying to upload a 1 TB file into GMN:

Doing this through the API will be limited to the restrictions of transferring any large file over HTTP, and any fragilities that may exist in the specific target MN implementation (one hopes that there are none). That is to say, on a stable local network it should be possible, but across the public internet it may be flaky due to dropped connections. That said, we regularly transfer 0.5 TB across the internet globally with GBIF and we only have to drop into retry mechanisms occasionally.

Reply by the GMN developer:

It should work as long as the network connection is stable during the transfer...

One important concern when dealing with such large objects is to make sure that both client and MN just stream the data through to/from disk and don't attempt to buffer up the object. Both the client library for Python and GMN were designed to avoid any buffering and I performed tests with GB-sized files to make sure none was occurring.

Most systems will run into problems trying to buffer a 1 TB file in memory!
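
To make the streaming point concrete, here is a minimal sketch of a chunked copy that never holds more than one chunk in memory. It is not taken from GMN or the DataONE Python client; the function name, the 1 MiB chunk size, and the choice of SHA-1 for the running checksum are all assumptions for illustration.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def stream_copy(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Tuple[int, str]:
    """Copy src to dst one chunk at a time; return (bytes copied, SHA-1 hex digest)."""
    digest = hashlib.sha1()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        dst.write(chunk)
        digest.update(chunk)  # checksum is updated incrementally, not over the whole object
        total += len(chunk)
    return total, digest.hexdigest()


if __name__ == "__main__":
    # Small in-memory demo; real use would pass file objects opened in binary mode.
    source = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 17))
    target = io.BytesIO()
    copied, checksum = stream_copy(source, target)
    print(copied, checksum)
```

Memory use stays at roughly one chunk regardless of object size, which is the property the developer describes testing for.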

Hope that helps.

2014-07-19 12:40:11 -0500 received badge  Enlightened (source)
2014-05-16 07:39:56 -0500 received badge  Good Answer (source)
2014-04-04 14:57:31 -0500 received badge  Nice Answer (source)
2014-04-03 18:43:31 -0500 answered a question Engineering dataset

Currently, there are no engineering-specific repositories registered to DataONE.

It seems like you already found the list of current Member Nodes / repositories on the DataONE website. If you haven't already visited the contribute data page, it explains a bit about finding the appropriate Member Node (repository). For your engineering data, I think the Data-Up / ONEShare node is the one option that might work if you want to use an existing repository. Information about that node and how to contact the administrators can be found on their summary sheet. DataUp also has its own home page here.

Regarding setting up your own Member Node, it does require a longer-term commitment of resources to keep the server maintained and online, but if you are interested, please see the Member Node section of the DataONE site.

ADDITIONAL THOUGHTS: DataONE is looking to expand its range of domains, so if you think hosting a repository is possible, or can suggest a repository (such as NEES.org) that you think would be a good addition to DataONE, please contact DataONE at http://www.dataone.org/contact.

2013-10-25 12:40:00 -0500 received badge  Supporter (source)
2013-06-17 15:52:35 -0500 answered a question How can we access javadocs for the different released versions of the DataONE API?

Note: DataONE is currently working to make javadocs available online for all released versions, and to consolidate the javadocs from different projects into one site for each version. Until then, below is the current situation.

Currently (as of June 2013), DataONE only has the javadocs for the unreleased (trunk) development version online here. Since the API is only additive within a major version (and is still in version 1), these trunk javadocs, while far from ideal, provide the most complete and most thoroughly edited wording compared with what can be found in the earlier releases. Where new methods have been added, every attempt has been made to provide a @since annotation, so you can tell when a method was added to the package. Other than bug fixes, DataONE is vigilant about preserving the behavior of individual methods to ensure compatibility with other versions.

Javadocs are now included in the release packages of d1_libclient_java, found at: http://releases.dataone.org/online/d1_libclient_java/, starting at v1.2.3.

To get those, download the package for the version you want, uncompress it, and open the file docs/apidocs/index-all.html in your browser.

Hope that helps.

2013-06-12 18:06:48 -0500 commented answer Why are identifier encoding tests failing?

Hope you don't mind, I changed the correct answer to the original explanation, since it seemed to address the question asked (which led you to post the fix for your apache server). I think technically, your post should have been a comment to my solution, offering extra information.

2013-06-12 18:03:39 -0500 received badge  Scholar (source)
2013-06-07 12:02:59 -0500 answered a question What other external jar files need to be included when using DataONE libclient within environments like Matlab?

While we don't maintain a list of dependencies outside of the distributions, we have made distributions of d1_libclient_java bundled with its dependencies available at http://releases.dataone.org/online/d1_libclient_java, organized by version. There, you will find archives in .zip, .tar.gz, and .bz2 formats.

Just download an archive of the appropriate type for your machine, and look in the /lib directory, where you will find the full set of dependency jars for that d1_libclient_java release.
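
If it helps, the jar names can also be listed without unpacking the archive by hand. The sketch below assumes you downloaded the .zip variant and that the jars sit in a directory named lib somewhere inside it; the function name is hypothetical, and the exact layout inside a given release archive is not something I have verified here.

```python
import zipfile
from typing import List


def list_dependency_jars(archive) -> List[str]:
    """Return sorted member names of .jar files located under a lib/ directory.

    `archive` may be a path to a .zip file or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name
            for name in zf.namelist()
            # keep only jars with a directory component named exactly "lib"
            if name.endswith(".jar") and "lib" in name.split("/")[:-1]
        )
```

The resulting list can then be added, jar by jar, to the Java class path of your environment (in Matlab, for example, via javaaddpath after extracting the files).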

Note: the pom.xml file that Maven uses can also be downloaded from that site, but it lists only libclient's direct dependencies (not the dependencies of the dependencies). For those who use Maven for their development project, it is the file that Maven uses to download the dependencies dynamically.

2013-05-03 10:07:57 -0500 received badge  Student (source)
2013-05-01 11:24:37 -0500 received badge  Notable Question (source)
2013-04-03 13:57:47 -0500 received badge  Popular Question (source)
2013-04-02 16:04:31 -0500 received badge  Editor (source)
2013-04-02 16:02:22 -0500 received badge  Teacher (source)
2013-04-02 16:02:22 -0500 received badge  Self-Learner (source)
2013-04-02 15:40:59 -0500 answered a question Why are identifier encoding tests failing?

In general, the main culprits are either how your code handles the identifier (whether it can handle Unicode characters properly), or how the web server handles the URL.

Looking at the tests above, I see that all of the identifiers of those that fail have a "/" (forward slash) in them, so there is a problem with how these identifiers are interpreted by the web service. Many web servers for security reasons alter URLs such as foo/../../../somewhere/outside/web/context to maintain the context of the request, so that would be my leading guess of where the problem lies.
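
To see why only those identifiers stand out, here is a small sketch of what happens when an identifier is percent-encoded into a single URL path segment. It uses Python's standard urllib rather than the tester's actual encoding code, which I am assuming behaves similarly; the helper name is hypothetical.

```python
from urllib.parse import quote, unquote


def encode_identifier(pid: str) -> str:
    """Percent-encode an identifier so it fits in one URL path segment."""
    # safe="" forces "/" to become %2F instead of being left as a path separator.
    return quote(pid, safe="")


for pid in ("10.1000/182", "path-unicode-ascii-escaped-%2F"):
    encoded = encode_identifier(pid)
    print(pid, "->", encoded)
    assert unquote(encoded) == pid  # the encoding round-trips
```

Under that assumption, an identifier containing a literal "/" puts %2F into the request URL, while an identifier containing the literal text %2F is sent as %252F. That would be consistent with the results above, where the "/" identifiers fail and the "%2F" ones pass.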

Checking the exception text:

ServiceFailure: 0000: NON-D1-EXCEPTION: status: 404 response headers: 
          header = value
          Vary = Accept-Encoding
          Date = Tue, 02 Apr 2013 18:49:5...: path-ascii-doc-example-10.1000/182

it seems an unexpected response was received - so either the service returns non-DataONE exceptions in some cases, or the request never reached the DataONE service and the web server itself returned the 404.

Putting it all together, my hunch is that it is the latter - the web server itself is failing to pass the request on to the DataONE handler, and returning a standard response. I would look at how your web server is handling security for these types of requests.

If you are using Apache Tomcat, try adding the following lines to catalina.properties:

org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true
org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH=true

See http://tomcat.apache.org/security-6.html for more information about how Tomcat handles these characters in the URL.

2013-04-02 14:40:49 -0500 asked a question Why are identifier encoding tests failing?

I am deploying a member node, and the identifier encoding tests in the web tester are failing for some identifiers, but not others. For example:

AssertionError: http://127.0.0.1/mn/v1 Failed 1 or more identifier encoding tests Node Test Summary for node: http://127.0.0.1/mn/v1
          Test 0: OK : common-unicode-ascii-safe-ABCDEFGHIJKLMNOPQRSTUVWXYZ
          Test 1: OK : common-unicode-ascii-safe-abcdefghijklmnopqrstuvwxyz
          Test 2: OK : common-unicode-ascii-safe-0123456789
          Test 3: OK : common-unicode-ascii-safe-:@$-_.!*()',~
          Test 4: OK : common-unicode-ascii-safe-unreserved-._~
          Test 5: OK : common-unicode-ascii-safe-sub-delims-$!*()',
          Test 6: OK : common-unicode-ascii-safe-gen-delims-:@
          Test 7: OK : common-unicode-ascii-escaped-"#<>[]^`{}|
          Test 8: OK : common-unicode-ascii-escaped-tomcatBlocked-\
          Test 9: OK : common-unicode-ascii-escaped-tomcatBlocked-%5C
          Test 10: OK : common-unicode-ascii-semi-colon-test-%3B
          Test 11: OK : common-unicode-ascii-escaped-%
          Test 12: OK : common-unicode-ascii-escape-anyway-+
          Test 13: OK : path-unicode-ascii-safe-&=&=
          Test 14: OK : path-unicode-ascii-escaped-;
          Test 15: OK : path-unicode-ascii-escaped-?
          Test 16: Error:: ServiceFailure: 0000: NON-D1-EXCEPTION: status: 404 response headers: 
          header = value
          Vary = Accept-Encoding
          Date = Tue, 02 Apr 2013 18:49:5...: path-unicode-ascii-escaped-/
          Test 17: OK : path-unicode-ascii-escaped-%3F
          Test 18: OK : path-unicode-ascii-escaped-%2F
          Test 19: Error:: ServiceFailure: 0000: NON-D1-EXCEPTION: status: 404 response headers: 
          header = value
          Vary = Accept-Encoding
          Date = Tue, 02 Apr 2013 18:49:5...: path-unicode-ascii-escaped-double-//case
          Test 20: Error:: ServiceFailure: 0000: NON-D1-EXCEPTION: status: 404 response headers: 
          header = value
          Vary = Accept-Encoding
          Date = Tue, 02 Apr 2013 18:49:5...: path-unicode-ascii-escaped-double-trailing//
          Test 21: OK : path-unicode-ascii-escaped-double-%2F%2Fcase
          Test 22: OK : path-unicode-ascii-escaped-double-trailing%2F%2F
          Test 23: OK : common-unicode-bmp-1byte-escaped-¡¢£
          Test 24: OK : common-unicode-bmp-2byte-escaped-䦹䦺
          Test 25: OK : common-ascii-doc-example-urn:lsid:ubio.org:namebank:11815
          Test 26: Error:: ServiceFailure: 0000: NON-D1-EXCEPTION: status: 404 response headers: 
          header = value
          Vary = Accept-Encoding
          Date = Tue, 02 Apr 2013 18:49:5...: path-ascii-doc-example-10.1000/182
          Test 27: Error:: ServiceFailure: 0000: NON-D1-EXCEPTION: status: 404 response headers: 
          header = value
          Vary = Accept-Encoding
          Date = Tue, 02 Apr 2013 18:49:5...: path-ascii-doc-example-http://example.com/data/mydata?row=24
          Test 28: OK : common-bmp-doc-example-ฉันกินกระจกได้
          Test 29: OK : common-bmp-doc-example-Is_féidir_liom_ithe_gloine

I can't make sense of all of the test output. What's going on?