Saturday, 24 August 2013

On Users, Projects, Visibility and Permissions

This week, I've been thinking a lot about the way the DataPortal will be used in the future - specifically the usage by future project users.

Currently every survey stored by WISERD, or described within the WISERD metadata and stored elsewhere (such as the UK Data Archive), has it's metadata globally searchable and viewable by anyone who has registered and logged in.

This is because there is currently no concept within the DataPortal of any users being different from any other users, so all users have the same viewing permissions and download abilities.

While this is fine for any information which has been cleared as "Open" - and obviously this is the ideal situation - it is easy to conceive of situations where information could be stored within the DataPortal which should not be visible to the entire world.

I've been thinking of 3 main use-cases to try to provide a solution for.

  • Data Entry team members may add part of a surveys metadata, but don't want the survey to be "live" until all the data has been entered
  • The WISERD hub team want to moderate the data being entered into the DataPortal by external partners, and can control what text is shown "live" until it has been sanity checked.
  • A Surveys entered during the lifetime of a project is not due to be made public until the end of the project, but can be viewed by project members during.
To implement this, 5 new Database tables have been added, to define 3 different "concepts".

The rest of this post uses Capitalised Words to denote the concepts which now exist. Their definitions follow, as they are currently envisioned, or where implementation has begun:

There is now the idea that a user has a "Role", Users and Surveys belong to a "Project" a survey has a "Visibility".

The tables map as follows:

Project ID - Project Name
Standard, every project has an id.

User ID - Project ID
User ID taken from the regular user details table, any number of users can be matched with any number of projects.

Survey ID - Project ID
Each Survey is owned by one Project. The survey can be transferred between projects if and when visibility ownership or responsibilities shift between projects.

Visibility ID - Visibility State Name
These visibilities denote what type of users can view a survey. So far there is Global (anyone can see), Project (visible by users listed as being in the Project which owns that Survey), and it is anticipated that a "UserOnly" visibility will be added in future.

Survey ID - Visibility ID
Assigning a Survey a Visibility. A Survey has one-and-only-one visibility, so ideally there should be as many entries in this table as there are entries in the Survey table. Some defaults may be assumed if there is no visibility set: if the Survey has a Project owning it, assign it as project only visibility until set otherwise; if the Survey has no Project set, set it as global ownership, and assume it is owned by the "wiserd_global" project.

The "wiserd_global" project is a catch-all Project for unclaimed Surveys, and has all its Survey's as globally visible, on the assumption that they were added prior to this Survey-owning paradigm being set in place.

I have also coded up a GUI to go along with the data entry tools, which allows a logged in user to assign the entered Survey to a Project of which that User is a member. Users with Roles which have certain Privileges can create Surveys in the Projects they have access to, and higher level Roles allow assigning Users to such Projects. These hierarchies will be drawn out in further detail after communication with project heads and the WISERD Hub data entry team to ensure that the right number of Roles will exist, with the right permissions and functions available to them.

Obviously the aim here is for people to continue doing their job as they have done before, without adding undue layers of complexity. Hopefully, once a user has been assigned the appropriate Role and has been added to the relevant Projects, they should notice no difference to their everyday tasks. Additionally, changing a Users list of accessible Projects or Roles should not cause any adverse effects to a User, other than changing their ability to interact with a given Survey.

Roles listed here are currently not implemented, but will be designed with the following ideas in mind.They will be defined better once I have the Data Entry Tools properly adding new Surveys to completion. The Roles will be something along the lines of :

Any standard logged in user; can search only Global Visibility Surveys

Project User
A User assigned as a member of a Project, can view all Global Surveys, as well as View and Create abilities on Surveys within Projects they are members of.

WISERD Hub Admin
Able to View, Create and change Visibilities of Surveys and Create Projects. Project Users may tag Surveys for Global Visibility, but the Survey will first be moderated by a Hub Admin before the Global Visibility can be assigned.

Can Create, View, Modify, Moderate, change Visibility of Surveys, elevate Users to different Roles, create new Projects and assign Users to those Projects. Any and all other tasks can be completed by a SuperAdmin, and it is assumed that only a few users will be required with this Role. It is also likely that the Role will involve direct access to the database, in situations where a GUI does not exist to provide the required functionality.

Obviously any often repeated tasks by any User with any Role should be replaced by an automated task, or a GUI which aids this tasks completion.

So far, the implementation in no way affects the functionality of the DataPortal, and the new database tables and GUI additions sit on top of the existing service. In this way, backward compatibility should be assured, with the defaults labelled above as the clarification required where new functions require additional information not provided by the previous database.

I'll post again soon on how the Roles are defined, specifically how Yii's inbuilt RBAC (Role Based Access Control) abilities are used to provide permissions and access control to specific users.

Friday, 9 August 2013

On Data Entry Tools and Services

This week, having returned from my Summer break, I've been looking at the data entry tools originally used to add Survey and Qualitative Metadata to the DataPortal.

These originally (GeoPortal v. alpha) took the form of a VB.NET based web front-end for adding Qualitative metadata and a stand-alone VB application "wizard" tool for adding Survey Metadata.

As hinted at in a previous post here, I'm not a fan of VB, and as the main DataPortal code has now been moved over to more cross-platform friendly languages, I decided to convert the code from these tools too.

I have started rewriting the Survey Metadata entry tool in Java, although I am yet to decide if the eventual tool will be in this form. I'll come back to this later.

The Qualitative tool GUI was relatively trivial to move over the Javascript (ExtJS). The metadata stored is a list of most commonly used words, output from Edina Unlock, and output from OpenCalais.

Most common words are easily found when iterating through the text in PHP, we store words used more than a set threshold averaged over the number of pages.

The OpenCalais output is retrieved from the Enlighten webservice, a simple HTTP POST with an API key, the text to scan, and a small piece of XML to define search parameters. The result returns RDF, in theory...

As it turns out, the RDF returned from OpenCalais is embedded in a "String" tag, inside some XML. This means that a standard XML parser will fail, as it finds the RDF and bugs out. An RDF parser will fail due to the funky XML in the way. After several hours of frustration, I googled the correct magic words to return this API which, bizarrely, modifies the default PHP xml_parser with some new methods to clear out the junk wrapper, returns the RDF proper, and hands it off to "RAP - RDF API for PHP" (website, SourceForge), to create a PHP model of the document. This can then be iterated through to find the relevant tags we're interested in. The RAP API is actually really well designed, and clearly can do a lot more than what I require here, and it's one to keep in mind for later. I'm sure future RDF style metadata objects will be created as we go along with this project.

Edina Unlock is the next challenge. It seems that at sometime in the last 2 years, (since GeoPortal alpha was first created), the service has changed it's API, and no longer works. This will be one of the next challenges going forward, but the documentation seems sound enough.

There are two API's, "text" and "places". At first glance, it appears that "text" jobs are now submitted, remotely executed, the user polls for progress, then retrieves results once they are produced. This is wrapped in a user auth model to poll for a specific users job. It's a bit more technical for implementation, and will involve our data entry team to watch a progress bar till the results are ready, but I guess it makes sense on their end if they're receiving a ton of requests. I'm not sure yet if this service is required, as we have Edina Unlock working already, we'll see.

The "places" API seems far easier to use, with simple jsonp or HTTP GET's returning JSON formatted data.

I'll update this post with my experiences of this once I've figured it out.

As always, the DataPortal code is on GitHub, so you can check out my solutions should you desire more in depth demonstrations of how I've managed to get it all working.