Establishing an Orcid Participation Baseline Using the Orcid Public API

by Keith Komos

What is Orcid?

Orcid is a service that provides you with a numeric identifier that ties you to your scholarly output. For those of you lucky enough to be named “John Smith” or something else that’s quite common, Orcid will help folks find the real you. If your name changes because of marriage, or if it has been recorded inconsistently, or if (as in the example I just gave) you have a very common name, Orcid solves a lot of problems for you.

Research institutions have been pushing this technology because of how much easier it makes academic life for faculty and grad students. Your publications and other output can be added to your Orcid account, often automatically, and then propagated across all kinds of different systems.

What are we trying to do?

Research institutions are the early adopters of this technology, and outreach efforts are under way across hundreds of organizations right now, so a way to evaluate and track how many people in your organization have actually signed up for Orcid would be quite helpful. Orcid doesn’t provide a very robust searching or filtering interface for doing this, but luckily it does provide a public API that lets us ferret out this information.

What do we need before we start?

  • Your organization’s Ringgold number
  • A free Orcid account of your own in order to connect to the API
  • A bit of programming background
  • Access to server software and a database
  • Knowledge of how to connect through OAuth. That’s how the Orcid API lets you in.
  • I’m using an Apache server with PHP installed and a MySQL database.

How do we do it?

Create a database with a few fields like name, orcid_id, education, and employment.
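A minimal sketch of such a table, assuming PDO is available. The table name `orcid_people`, the column sizes, and both helper functions are my own placeholders for illustration; nothing here is required by Orcid.

```php
<?php
// Hypothetical helper: returns portable SQL for the four fields named above.
// Table name and column sizes are illustrative assumptions.
function orcidTableSql()
{
    return "CREATE TABLE IF NOT EXISTS orcid_people (
        orcid_id   VARCHAR(19) NOT NULL PRIMARY KEY,
        name       VARCHAR(255),
        education  TEXT,
        employment TEXT
    )";
}

// Hypothetical helper: creates the table on any PDO connection.
function createOrcidTable(PDO $db)
{
    $db->exec(orcidTableSql());
}

// Hypothetical helper: stores one person found by the search step.
// Education and employment stay empty until the profile step fills them in.
function savePerson(PDO $db, $orcidId, $name)
{
    $stmt = $db->prepare("INSERT INTO orcid_people (orcid_id, name) VALUES (?, ?)");
    return $stmt->execute(array($orcidId, $name));
}
```

With MySQL, the connection would be opened with something like `new PDO('mysql:host=localhost;dbname=orcid', $user, $password)`, where the database name and credentials are whatever you set up locally.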

Connect to the Orcid API using the OAuth protocol to get your unique token. Save this token to a variable because you’ll need it for every API request.
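Here is a rough sketch of the token step, assuming the OAuth client-credentials flow with the `/read-public` scope. The endpoint URL, the function names, and the `$clientId` / `$clientSecret` values (which come from registering for API credentials under your Orcid account) are assumptions on my part, so check them against Orcid's current documentation.

```php
<?php
// Build the form body for a client-credentials token request.
function buildTokenRequestBody($clientId, $clientSecret)
{
    return http_build_query(array(
        'client_id'     => $clientId,
        'client_secret' => $clientSecret,
        'scope'         => '/read-public',
        'grant_type'    => 'client_credentials',
    ), '', '&');
}

// Pull the access token out of the JSON response, or return null
// if the response is not JSON or carries no token.
function extractAccessToken($json)
{
    $data = json_decode($json, true);
    return (is_array($data) && isset($data['access_token'])) ? $data['access_token'] : null;
}

// Request a token over HTTPS. The default endpoint is an assumption.
function fetchToken($clientId, $clientSecret, $endpoint = 'https://orcid.org/oauth/token')
{
    $ch = curl_init($endpoint);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildTokenRequestBody($clientId, $clientSecret));
    curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: application/json'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    $response = curl_exec($ch);
    curl_close($ch);
    return $response === false ? null : extractAccessToken($response);
}
```

A call such as `$token = fetchToken($clientId, $clientSecret);` would then give you the value to send in the Authorization header of every later request.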

Formulate your initial query. You may need to make several passes to get everything, since Orcid returns results in batches of 100. My query for the University of Houston looked like this:

$query = "search/orcid-bio/?q=14743+AND+HOUSTON&start=0&rows=100";

Replace “14743” with YOUR organization’s Ringgold number. The other part (“AND HOUSTON”) is optional, but it helped me reduce false positives. Also, I specified in the HTTP header that I wanted JSON to come back, and used cURL to make the request. Here’s my header:

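 $header = array();
 // Ask for JSON and send the OAuth token with the request.
 $header[] = "Content-Type: application/json";
 $header[] = "Authorization: Bearer " . $token;
 $ch = curl_init("http://pub.orcid.org/v1.2/" . $query);
 curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
 // Return the response as a string instead of printing it, so it can be parsed.
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
 $response = curl_exec($ch);
 curl_close($ch);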
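Because results come back in batches of 100, the same request has to be repeated with an increasing start value. The sketch below shows one way to generate those queries. Both function names are hypothetical, and `$total` is the number of matches reported by the first response (I believe v1.2 exposes it as `num-found`, but treat that as an assumption).

```php
<?php
// Build the search path for one batch. $start is the zero-based offset.
// An empty $keyword leaves out the optional "AND" filter.
function buildSearchQuery($ringgold, $keyword, $start, $rows = 100)
{
    $q = rawurlencode($ringgold);
    if ($keyword !== '') {
        $q .= '+AND+' . rawurlencode($keyword);
    }
    return 'search/orcid-bio/?q=' . $q . '&start=' . (int) $start . '&rows=' . (int) $rows;
}

// List the start offsets needed to cover $total results.
function batchOffsets($total, $rows = 100)
{
    $offsets = array();
    for ($start = 0; $start < $total; $start += $rows) {
        $offsets[] = $start;
    }
    return $offsets;
}
```

Looping over `batchOffsets($total)` and passing each offset to `buildSearchQuery('14743', 'HOUSTON', $offset)` yields one query per pass.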

So with these results, I simply parse the JSON and save the name and Orcid ID to the database. Nothing with Employment or Education yet, since this API call doesn’t return that info anyway.
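The parsing step might look something like the sketch below. The key names reflect my understanding of the v1.2 search response layout and should be checked against a real response; the function name is hypothetical.

```php
<?php
// Extract orcid_id/name pairs from a decoded v1.2 search response.
// The key names are an assumption about the v1.2 JSON layout.
function extractPeople(array $data)
{
    $people = array();
    $results = isset($data['orcid-search-results']['orcid-search-result'])
        ? $data['orcid-search-results']['orcid-search-result']
        : array();

    foreach ($results as $r) {
        $profile = isset($r['orcid-profile']) ? $r['orcid-profile'] : array();
        if (!isset($profile['orcid-identifier']['path'])) {
            continue; // no Orcid ID, so nothing useful to store
        }
        $details = isset($profile['orcid-bio']['personal-details'])
            ? $profile['orcid-bio']['personal-details']
            : array();
        // Either name part may be missing or null in a sparse profile.
        $given  = isset($details['given-names']['value']) ? $details['given-names']['value'] : '';
        $family = isset($details['family-name']['value']) ? $details['family-name']['value'] : '';

        $people[] = array(
            'orcid_id' => $profile['orcid-identifier']['path'],
            'name'     => trim($given . ' ' . $family),
        );
    }
    return $people;
}
```

You would call it as `extractPeople(json_decode($response, true))` and write each returned pair to the database.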

You could actually stop here. But I want to know if their education was at the University of Houston or if their employment is/was. To do that, I need to traverse this list, using the Orcid ID numbers of everyone I just pulled in, and get that additional information. This query is very simple:

$query = $orcidID . "/orcid-profile";

Again, I specify JSON as my return. This gets me the education and employment information I wanted to add to my stats. But here’s where it gets a little weird. I want to save these arrays as strings, so I need to convert them. I make a special point to capture the beginning and end date, so that I can determine if they are a current student or employee. Here’s my code for that:

 $employment = '';
 // Flatten each employment affiliation into "organization---department---role---start-end;;"
 foreach ($data['orcid-profile']['orcid-activities']['affiliations']['affiliation'] as $w):
     if ($w['type'] == 'EMPLOYMENT') {
         // An affiliation with no end year is treated as ongoing.
         $end = empty($w['end-date']['year']['value']) ? 'Present' : $w['end-date']['year']['value'];
         $employment .= $w['organization']['name'] . '---' . $w['department-name'] . '---' . $w['role-title'] . '---' . $w['start-date']['year']['value'] . '-' . $end . ';;';
     }
 endforeach;

We’re done querying the Orcid API, but even with the cleaning up and reformatting we’ve done as we went along, there’s still a bit more to do. We now write a simple script that goes through each record and looks for the phrase “University of Houston” and the word “Present” in the date part of the string. If they both occur, we flag the person as a current student or employee. We can also reduce false positives at this point by deleting records that don’t contain “University of Houston” anywhere in these fields.
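A sketch of that flagging script, assuming the flattened “organization---department---role---start-end;;” format built earlier. The function names are hypothetical. This version requires the organization name and “Present” to appear in the same entry, which is slightly stricter than checking the whole string for both.

```php
<?php
// Return true when some entry names $org and has "Present" as its end date.
// Expected format: organization---department---role---start-end;;
function isCurrentAt($flattened, $org)
{
    foreach (explode(';;', $flattened) as $entry) {
        if ($entry === '') {
            continue; // trailing ";;" leaves an empty piece
        }
        $parts   = explode('---', $entry);
        $orgName = $parts[0];
        $dates   = end($parts); // last piece holds "start-end"
        if (stripos($orgName, $org) !== false && substr($dates, -8) === '-Present') {
            return true;
        }
    }
    return false;
}

// Return true when $org appears anywhere in the flattened string.
// Records failing this for both fields are candidates for deletion.
function mentionsOrg($flattened, $org)
{
    return stripos($flattened, $org) !== false;
}
```

Running `isCurrentAt($employment, 'University of Houston')` and the same check on the education string gives the two flags to store alongside each record.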

Now what?

Tally it all up and display it in a table if you want to. Now you have a membership baseline for your institution. How many employees? How many students? You can go through the process again on a quarterly basis or so to see how membership changes over time, and use these metrics to measure the effectiveness of an Orcid outreach campaign.
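The tally itself can be very small. In the sketch below, the `current_employee` and `current_student` flags are hypothetical field names standing in for however you stored the results of the flagging step.

```php
<?php
// Count total records, current employees, and current students.
// The flag names are illustrative assumptions, not fields Orcid provides.
function tallyMembership(array $rows)
{
    $tally = array('total' => 0, 'employees' => 0, 'students' => 0);
    foreach ($rows as $row) {
        $tally['total']++;
        if (!empty($row['current_employee'])) {
            $tally['employees']++;
        }
        if (!empty($row['current_student'])) {
            $tally['students']++;
        }
    }
    return $tally;
}
```

Saving each quarter's tally with the date it was taken gives you the series to compare over time.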

Drop me a line if you have any questions!