
jones's profile - activity

2017-11-21 15:19:55 -0500 answered a question How does DataONE deal with varying metadata schemas?

DataONE supports community metadata standards through a series of crosswalks that map each metadata standard to a common SOLR schema. For example, even though the title of a data set is found in different locations in FGDC, EML, and ISO19139, all are mapped to a common title field in SOLR. The SOLR schema that we map to is described in the DataONE Architecture documentation. The mappings from common metadata standards are also described for EML, FGDC, and Dryad as examples.

When an incoming metadata document is received, it is parsed and key fields from the metadata are extracted and indexed in SOLR according to the crosswalk, making them available in the metadata search service, and through https://search.dataone.org.

The SOLR metadata crosswalk among the various metadata standards that DataONE indexes does not contain all fields from all schemas. Each of the original metadata documents is available for download, so all metadata are preserved, but viewing specialized fields may require downloading the full metadata document for a data package.
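
To make the crosswalk concrete, here is a sketch of a query against those common fields; the field names follow the DataONE Solr schema, but the search term and row count are purely illustrative:

```shell
# Query the crosswalked Solr index. The common "title" field matches a
# record no matter whether its source document was FGDC, EML, or
# ISO19139. The search term "salmon" is just an illustration.
QUERY='q=title:salmon&fl=identifier,title,formatId&rows=5'
URL="https://cn.dataone.org/cn/v1/query/solr/${QUERY}"
echo "$URL"

# Run the search (requires network access):
# curl -s "$URL"
```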

2016-10-11 13:11:06 -0500 received badge  Famous Question (source)
2016-05-03 15:03:32 -0500 received badge  Notable Question (source)
2016-05-03 15:03:32 -0500 received badge  Famous Question (source)
2015-10-14 13:28:15 -0500 commented answer How do I remove content from my Member Node?

The API operations for these, in order of preference for use, are: MN.update(), MN.archive(), and MN.delete(). See the [MN API documentation](https://purl.dataone.org/architecture/apis/MN_APIs.html) for details, and note that the delete() operation can only be called by administrators.

2015-10-14 13:28:15 -0500 received badge  Commentator
2015-07-23 14:21:02 -0500 marked best answer Does a repository have to implement all of the DataONE Services to participate?

If I have a data repository that provides read-only access to our data, can I become a Member Node without allowing people to create content in my repository? Also, if I do want to allow people to create content in my repository, can I control who is able to do so as the repository administrator?

2014-11-13 23:57:32 -0500 commented answer DataONE OAI-PMH support?

Note that we do, however, provide the listObjects() REST service (http://releases.dataone.org/online/api-documentation-v1.2.0/apis/CN_APIs.html#CNRead.listObjects) to enumerate the objects on a node, and these can easily be harvested over REST.
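
For example, the identifiers in a listObjects response can be pulled out with standard text tools; the response below is an illustrative, trimmed sample, not real output:

```shell
# A trimmed, illustrative listObjects response; real responses include
# paging attributes (count, start, total) and more per-object fields.
RESPONSE='<objectList count="2" start="0" total="2">
  <objectInfo><identifier>doi:10.5063/AA/example.1.1</identifier></objectInfo>
  <objectInfo><identifier>doi:10.5063/AA/example.2.1</identifier></objectInfo>
</objectList>'

# Extract the identifiers for harvesting.
IDS=$(echo "$RESPONSE" | sed -n 's:.*<identifier>\(.*\)</identifier>.*:\1:p')
echo "$IDS"
```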

2014-07-19 12:44:02 -0500 commented answer How often (at what frequency) does DataONE harvest Member Node metadata?

The MN sets the schedule when they call CNRegister.register() for their node. The schedule can be updated using CNRegister.updateNodeCapabilities(). The schedule is visible for any given node in their node registry entry, which can be viewed from the CNCore.listNodes() service. You can see the current list of nodes and their sync schedules with a curl command like this:

curl https://cn.dataone.org/cn/v1/node

The schedule lines will be embedded in that output, such as:

<schedule hour="*" mday="*" min="0/3" mon="*" sec="10" wday="?" year="*"/>

which says to sync once every 3 minutes on the 10 second mark. Documentation of the schedule format is in the Quartz Scheduler documentation.
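
To pull just the schedule lines out of that output, something like the following works; the sample node document here is inlined rather than fetched live:

```shell
# A one-node sample of the node list; a real document has one <node>
# entry per Member Node.
NODES='<node><schedule hour="*" mday="*" min="0/3" mon="*" sec="10" wday="?" year="*"/></node>'

# Extract just the schedule elements (the same command works on a saved
# copy of https://cn.dataone.org/cn/v1/node).
SCHEDULES=$(echo "$NODES" | grep -o '<schedule[^>]*>')
echo "$SCHEDULES"
```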

2014-04-27 23:28:11 -0500 received badge  Popular Question (source)
2014-04-04 13:06:59 -0500 commented answer Engineering dataset

In addition to ONEShare, you could contribute your data set to the KNB repository, which is open to submissions.

2013-10-28 11:33:06 -0500 edited answer Does DataONE have services available for librarians outside the earth and environmental sciences?

The Best Practices (http://www.dataone.org/best-practices) are quite universal and can be promoted across all disciplines. The Data Management Plan Tool (DMPTool; https://dmptool.org/), to which we have contributed, is also applicable across disciplines. Particularly if your institution supports InCommon (https://incommon.org/), you can tailor the DMPTool to tell your researchers about the resources your institution has to offer. Other DataONE resources promoted through the Resources link on our website (http://www.dataone.org/resources) are also quite general.

From a cyberinfrastructure perspective, there are common needs for data management across all disciplines. For example, DataONE supports Dublin Core as a metadata standard, which is not discipline-specific. Other metadata formats (such as Ecological Metadata Language) have features targeted at the needs of earth and environmental scientists, such as the need to reference data based on place, time, and species. We are working with institutions that are hosting data from a broad range of disciplines, and we are always interested in ways of extending DataONE to meet broader science needs. We started with the Earth and environmental sciences, and we're building on that core strength. If you have specific needs or interests, please feel free to contact us.

2013-10-25 21:29:39 -0500 received badge  Nice Answer (source)
2013-10-25 12:40:14 -0500 received badge  Nice Answer (source)
2013-08-26 14:59:57 -0500 received badge  Necromancer (source)
2013-08-26 14:39:05 -0500 answered a question How can we access javadocs for the different released versions of the DataONE API?

DataONE makes Javadoc API documentation available for both released versions of the software and for the current development branch.

Release documentation

Development branch documentation

2013-08-26 13:47:59 -0500 answered a question How do I develop a Java-based Member Node software implementation?

DataONE Member Nodes must faithfully implement the Member Node REST interface in order to interoperate with the other members of DataONE. This common REST service interface allows a Member Node to communicate with other Member Nodes, with client tools like R, and with the Coordinating Nodes at DataONE. Member Node services are categorized into four tiers (Tiers 1-4), and nodes can choose to implement the services at Tier 1 or above.

The simplest way to implement the interfaces in Java is to utilize our DataONE Client Library, which provides Java methods for calling each of the DataONE REST services. We have an overview of How To use the DataONE Java Client Library in your application, as well as JavaDoc APIs for the libclient library.

As an implementation is developed, it can be tested using our Member Node API testing service. There are multiple versions of the tester listed there, each of which tests against different released versions of the software stack. Generally you'll want to use the most recent one.

The overall process of becoming a Member Node is described in our MN Checklist.

For development assistance, you can contact us on IRC at irc.ecoinformatics.org on channel #dataone, or via our email list at developers@dataone.org.

2013-07-22 10:12:41 -0500 received badge  Self-Learner (source)
2013-07-17 18:27:59 -0500 answered a question What causes the Coordinating Node to show a different number of data and metadata objects than a Member Node?

These discrepancies occur either when synchronization is not functioning properly (so the MN and CN have not synced their content since the MN made changes), or when the MN makes changes without properly notifying the CN. There are two cases to consider:

Case 1: Fewer objects on CN than MN: check <dateSysMetadataModified>

In this case, the likely problem is that the CN is not discovering objects on the MN during the synchronization process. This can happen if the MN inserts some objects but fails to set the <dateSysMetadataModified> field to the current time. Some MNs back-date these system metadata modification times, not realizing that the CN relies on that date to determine which objects might have changed and need to be synced. If <dateSysMetadataModified> is set to a date before the most recent synchronization time, that object will never be noticed by the CN and never harvested. To fix this, modify the system metadata to set <dateSysMetadataModified> to the current time; the next time synchronization runs, the changes will be noticed and picked up.
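
One way to spot this condition is to fetch an object's system metadata from the CN (CNRead.getSystemMetadata, i.e. GET /meta/{pid}) and compare its modification date to the node's last sync time. In the sketch below, the timestamps are placeholder values for illustration:

```shell
# System metadata for any object can be fetched from the CN with:
#   curl -s "https://cn.dataone.org/cn/v1/meta/<pid>"
# ISO 8601 timestamps in a uniform format compare correctly as plain
# strings, so sort can tell us which date comes first.
MODIFIED='2012-01-01T00:00:00.000+00:00'   # dateSysMetadataModified
LAST_SYNC='2013-07-01T00:00:00.000+00:00'  # node's last successful sync

EARLIER=$(printf '%s\n%s\n' "$MODIFIED" "$LAST_SYNC" | sort | head -n 1)
if [ "$EARLIER" = "$MODIFIED" ] && [ "$MODIFIED" != "$LAST_SYNC" ]; then
  echo "back-dated: the CN will never re-harvest this object"
fi
```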

Case 2: More objects on CN than MN: ensure archive and obsoletes/obsoleted are set

In this case, some objects are on the CN which are not present on the MN. The only way for this to happen is if the MN has removed the content that was previously associated with a particular identifier (often in the process of trying to change an identifier). The solution is simple as well: remember that identifiers (PIDs) are both persistent and non-reusable. If you want to remove an identifier from your system, first insert the content under the new identifier, being sure to reference the old identifier in the <obsoletes> system metadata field. Then delete the old identifier, but be sure to keep its system metadata, reference the new PID in the <obsoletedBy> field, and mark the object as <archived>true</archived>. This tells the CN that the object identified by the old PID is no longer active on the MN and has been replaced by the new PID, allowing the CN to properly understand and index the changes made on the MN.
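
As a rough illustration (both identifiers are placeholders), the retired object's system metadata carries the forward pointer and archived flag, while the replacement references it via <obsoletes>:

```xml
<!-- system metadata kept for the old, retired identifier -->
<systemMetadata>
  <identifier>old.pid.1</identifier>
  <obsoletedBy>new.pid.1</obsoletedBy>
  <archived>true</archived>
</systemMetadata>

<!-- system metadata for the replacement object -->
<systemMetadata>
  <identifier>new.pid.1</identifier>
  <obsoletes>old.pid.1</obsoletes>
</systemMetadata>
```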

See the documentation on SystemMetadata for more details: http://mule1.dataone.org/ArchitectureDocs-current/apis/Types.html#Types.SystemMetadata

2013-07-17 18:00:16 -0500 asked a question What causes the Coordinating Node to show a different number of data and metadata objects than a Member Node?

Sometimes, when looking for Member Node (MN) content on the Coordinating Node (CN), there can be a discrepancy in the count of objects that persists even after synchronization has run. For example, a Member Node might be reporting that it contains 1017 FGDC metadata objects, but the CN might only list 965. Or, the CN might be reporting more objects than the MN. What are the major causes, and how does a Member Node operator avoid these discrepancies?

2013-07-07 16:43:58 -0500 received badge  Necromancer (source)
2013-07-07 16:06:25 -0500 received badge  Critic (source)
2013-07-07 16:06:07 -0500 commented answer Can I store data with DataONE?

The answer starts with "Data cannot be stored", which is somewhat misleading. It would be better if it led with a positive statement about how data can be stored with DataONE.

2013-06-14 11:25:16 -0500 received badge  Notable Question (source)
2013-06-13 13:07:59 -0500 commented answer What other external jar files need to be included when using DataONE libclient within environments like Matlab?

Note also that the project is a Maven project, and so d1libclientjava can be built and the dependencies will be pulled in during a Maven build.

2013-05-31 16:49:38 -0500 answered a question Can I store data with DataONE?

Yes, data are stored in the Member Nodes that make up the DataONE federation. You can upload your data to a Member Node that suits your disciplinary focus and your desired suite of tools, and then that data will be accessible via DataONE tools and services. A full list of DataONE Member Nodes that might be able to store your data is available on the DataONE Current Member Nodes page. Different Member Nodes are designed to store different types and amounts of data. Member Nodes also have different policies about which users are allowed to store data and the specific processes for submitting data and handling quality assurance. From that list, the open repositories currently part of DataONE include:

The KNB Member Node is an open repository supporting ecological and environmental science data, and has been in existence since the late 1990s. Data can be uploaded over the web at the KNB site, as well as via data management tools such as Morpho and analysis tools such as R and Python. The KNB currently houses several tens of thousands of data packages.

The ONEShare Member Node is an open repository for capturing data that is managed in Microsoft Excel. The DataUp plugin for Excel provides the ability to directly interact with the ONEShare node and deposit your data.

A variety of other Member Nodes may also support upload of your data if you are part of their community. For example, the LTER Member Node supports archival of LTER data, the SANParks Member Node supports data from the South African National Parks, and the Merritt Member Node supports the University of California community. See the list of current Member Nodes for more details on each.

2013-05-09 15:27:07 -0500 edited answer How can we download a specific set of metadata fields for a large number of metadata records?

We've already parsed out all of those fields from all of the metadata documents and indexed them, so by far the easiest way to get the information you want is to use our CN query() service. To avoid overloading the server, please query for a maximum of 1000 records at a time. Each of these should only take a second or so to run, so it should only take a couple of minutes to run through all of the metadata documents that we have and get those fields. To do that, you would write a one- or two-line script that calls our query service, using curl each time through the loop to execute a query like this:

https://cn.dataone.org/cn/v1/query/solr/fl=id,title,abstract,keywords&q=formatType:METADATA&rows=1000&start=0

Note the last two parameters, rows and start: rows indicates how many records to retrieve, and start indicates where in the result set to begin. So in your first call, set start=0 to get the first block of 1000; in your second call, set start=1000 to get the second block; and so on. When you get fewer than 1000 records back from a call, you've retrieved them all (see the numFound field in the query response for how many records match your query).

Note that many of these metadata documents are newer versions of the same metadata, so if you only want the newest revision of any given metadata document, you would want to use a query that filters out all of the obsoleted revisions. To do that, you could use a query like this:

https://cn.dataone.org/cn/v1/query/solr/fl=id,title,abstract,keywords&q=formatType:METADATA+-obsoletedBy:*&rows=1000&start=0

At present (April 2013), there are about 127,000 metadata documents overall, and about 46,000 once obsolete revisions are filtered out. So, filtering out the obsoleted metadata records will make the run even faster.
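
The paging described above can be sketched as a small shell loop; the three offsets shown are illustrative, and in practice you would keep going until a page comes back with fewer than 1000 records:

```shell
# Page through the query results 1000 records at a time by advancing the
# start parameter. The field list and filter mirror the query above.
BASE='https://cn.dataone.org/cn/v1/query/solr/'
PARAMS='fl=id,title,abstract,keywords&q=formatType:METADATA+-obsoletedBy:*&rows=1000'

URLS=$(for START in 0 1000 2000; do
  echo "${BASE}${PARAMS}&start=${START}"
done)
echo "$URLS"

# Fetch each page (requires network access):
# echo "$URLS" | while read -r U; do curl -s "$U"; done
```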

In either case, the return document is a SOLR result set record with all of the fields encoded in an easily parseable XML format. Here's an example of what you will get back for one of the metadata documents:

<doc>
    <str name="abstract">
    To establish a long term data base on the nutrient dynamics of a salt marsh estuarine system.   
    This data can be used in correlation with a number of other estuarine data sets to obtain a 
    broader definition of the over all estuarine ecosystem.
    </str>
    <str name="id">doi:10.6073/AA/knb-lter-nin.2578.3</str>
    <arr name="keywords">
        <str>nutrient dynamics</str>
        <str>North Inlet Estuary</str>
        <str>Baruch Institute</str>
        <str>Georgetown, South Carolina</str>
    </arr>
    <str name="title">
    Daily Water Sample Chlorophyll a, and Phaeophytin a data for North Inlet Estuary system, Georgetown, SC.
    </str>
</doc>

I think this should do what you want.

2013-05-08 18:21:07 -0500 edited question How do I query and access data using the DataONE R client?

I am trying to play around with the dataone R package. Is it possible for me to get whatever keys/etc. are needed to at least search D1 metadata through the package? It's not clear how to get the proper permissions...

2013-05-08 18:20:41 -0500 answered a question How do I query and access data using the DataONE R client?

Technically you don't need a certificate, at least to use the client read-only to search and download publicly accessible data. Here's a simple R script showing a query that finds a metadata record and an associated data file, and then downloads the data and converts that to a data frame.

library(dataone)
cli <- D1Client()
# Search the DataONE index for metadata records matching the query
results <- d1SolrQuery(cli, list(q="Commercial Harvest", fl="identifier,title,author,documents"))
print(results)
# The identifiers of the data files are listed in the 'documents' element
# of each metadata record; download one and convert it to a data frame
d1object <- getD1Object(cli, "doi:10.5063/AA/mbauer.56.1")
mydf <- asDataFrame(d1object)
summary(mydf)

You will see a bunch of WARNings about not having a certificate -- we should probably turn down the volume there, as it's annoying. If you do want to log in to write data to a Member Node, you can do so by logging in to CILogon at this URL and downloading your certificate, which will be good for 18 hours:

https://cilogon.org/?skin=DataONE

You can log in with an institutional account if your university is part of InCommon, or you can use a Google account or ProtectNetwork account if not. That should put a certificate in the /tmp dir (or equivalent on your OS) which the R client should notice and use. See also: https://ask.dataone.org/question/40/can-i-use-my-university-account-to-login-to-dataone/

2013-05-01 10:36:17 -0500 received badge  Notable Question (source)
2013-05-01 09:41:53 -0500 marked best answer Where can I chat with DataONE developers to resolve issues when deploying a Member Node?

I am deploying a Member Node, and have questions about where I can chat with developers about technical issues. Is there an IRC server or other place to communicate with developers?

Also, are there other ways to communicate with the DataONE community about development issues?