[BiO BB] System for Crowdsourcing Data Analysis [Was: position in cancer informatics]

Fri Jul 20 15:34:03 EDT 2012

Hi All,

Stefan Decker wrote:
> The discussion seem to point to a deeper question: how to enable crowd
> sourcing of the analysis of these kind of data sets? This may involve
> running of analysis code or maybe even manual work.
> What kind of computational infrastructure would we need to enable
> this? And how do we validate and aggregate results?

There is a system online [1] for crowdsourcing data analysis knowledge in
Executable English, with examples such as [2]. The knowledge is used to
answer questions over web databases, with English explanations of the
results for validation. In some cases, the explanations can be used as
plans.

[3] is a short overview paper, and besides the live system [1], there are
several presentations, movies, etc. on the site.

Apologies if you have seen this before, and thanks for comments.

                                                    -- Adrian

[1]  Internet Business Logic
A Wiki and SOA Endpoint for Executable Open Vocabulary English Q/A over SQL
and RDF
Online at www.reengineeringllc.com
Shared use is free, and there are no advertisements

[2]  www.reengineeringllc.com/demo_agents/MedMine2.agent

[3]
www.reengineeringllc.com/A_Wiki_for_Business_Rules_in_Open_Vocabulary_Executable_English.pdf

On Fri, Jul 20, 2012 at 10:00 AM, David Booth <david at dbooth.org> wrote:

> On Fri, 2012-07-20 at 10:22 +0100, Stefan Decker wrote:
> > The discussion seem to point to a deeper question: how to enable crowd
> > sourcing of the analysis of these kind of data sets? This may involve
> > running of analysis code or maybe even manual work.
> > What kind of computational infrastructure would we need to enable
> > this? And how do we validate and aggregate results?
>
> Unfortunately, in the USA at least, the biggest barriers are not
> technical, but social, because: (a) health information privacy laws such
> as HIPAA
> http://www.hhs.gov/ocr/privacy/
> make it difficult or impossible to publish the raw data that would be
> most useful for research; and (b) researchers do not have the incentive
> to publish their data that might allow other researchers to make
> discoveries.
>
> There is a tension between privacy and the usefulness of data for
> research, because full de-identification removes information that can be
> critical to determining cause and effect, such as dates, times and
> locations.
>
> We need better ways -- both bottom-up, such as http://weconsent.us/, and
> top-down, such as legal changes -- to both encourage the availability of
> research data and to facilitate appropriate access to it, such as
> establishing well-defined tiers of access for different purposes.
>
> We need technical solutions that will help us work through and around
> these social barriers.
>
>
> --
> David Booth, Ph.D.
> http://dbooth.org/
>
> Opinions expressed herein are those of the author and do not necessarily
> reflect those of his employer.