1. 程式人生 > >How to Share Data (Hint: “Thoughtfully”)

How to Share Data (Hint: “Thoughtfully”)

This blog post is part of a blog series on “Open Data for Public Good,” a collaboration between the AWS Institute and AWS Open Data aimed at identifying emerging issues around open data and offering best practices for data practitioners. Read the first post here.

This is a false color image of Wellington New Zealand, produced by data captured by the Landsat 8 satellite on May 29 2015. The red color indicates vegetation. Landsat is data made available from the U.S. Geological Survey.

Sharing data requires more than just making it available for download or creating an API to access it. In many ways, sharing data is similar to shipping a software product. Just like software; data is made up of digital information; it requires documentation; it will be used by groups of users who may require support; and it may become vital to those users’ work. Another common characteristic of software is that it often gets updated over time as software developers learn from their users and adapt to new technologies.

So how should you share data? Thoughtfully. There are technical and non-technical considerations. Below, we explore the non-technical considerations, which require a focus on data governance and community engagement.

Trust

For data to be useful, it must come from a reliable source that users trust. If users do not believe that data has been produced or documented with sufficient rigor, they will be less likely to rely on it. The USGS Landsat program provides an important lesson in trust.

Launched in 1972, the Landsat program has been a reliable source of data and imagery of the Earth for decades. This continues to be the case despite an incident in 2003 when one of the instruments on Landsat 7 failed, which caused gaps in data produced by the sensor from that point on. The Landsat team worked to document how users could still use data from Landsat 7 despite the limitations imposed by the instrument failure. This transparency in operations helped maintain the trust of users in data generated by the program. If users understand the value and limitations of data, they will be more likely to use it, even if it’s not perfect. This requires open and clear communication with users.

Jupiter: Hubble’s decades of observations of the planets in the outer Solar System allow astronomers to study their seasonal variations and provide support for NASA’s dedicated suite of spacecraft that visit these celestial bodies. Credit: NASA, ESA, and A. Simon (NASA/GSFC).

Jupiter: Hubble’s decades of observations of the planets in the outer Solar System allow astronomers to study their seasonal variations and provide support for NASA’s dedicated suite of spacecraft that visit these celestial bodies. Credit: NASA, ESA, and A. Simon (NASA/GSFC).

Documentation

If data is not documented, the audience will be limited. There are times when users will do the detective work required to interpret poorly documented data, but a lack of documentation will usually frustrate users to the point that they will not trust the data. Users should be able to understand when data was created, the methodology used to create it, how to interpret the values contained in it, and if there are any licenses that may limit how the data can be used. Documentation should also include a method to contact someone who can answer questions about the data. Ideally, documentation should include tutorials that users can follow to get hands-on experience with data.

Reliability

Developers will not put in the effort to create tools or applications based on data if they have no assurance that the data will be available in the future. Assuring that data will be available on an ongoing basis is important when sharing large volumes of data.

The cloud provides infrastructure for storage, low-latency access, and transfer of data. This becomes more important as data volumes grow. When data is shared in the cloud, users can access data quickly and directly from the source, which assures them that they can reliably access a trustworthy copy of the data without the need to duplicate storage in their own account.

Conclusion

In an era where governments and organizations are opening their data up to the public, making sure that data is shared in a deliberate and transparent way is key to establishing trust with users and increasing the utility of data. For information on technical considerations behind sharing data, visit opendata.aws.

A post by Jed Sundwall, Manager, Open Data Program, AWS

相關推薦

How to Share Data (Hint: “Thoughtfully”)

This blog post is part of a blog series on “Open Data for Public Good,” a collaboration between the AWS Institute and AWS Open Data aimed at ident

Data Wrangling文摘:How to share data with a statistician

原文地址:GitHub - jtleek/datasharing: The Leek group guide to data sharing  https://github.com/jtleek/datasharing This is a guide for anyone who needs to

How to Access Data in a Property Tree

clas 代碼 3.0 float itl compute iter () find 在屬性樹裏怎麽訪問數據?屬性樹類似於(幾乎是)一個標準容器,其值類型為pair。它具有通常的成員函數,如insert、push_back、find、erase等,當然可以使用這些函數來填充

high-speed New Algorithm Enables Wi-Fi Connected Vehicles to Share Data

www.inhandnetworks.com Graphic: Christine Daniloff Next month at the ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing, a tea

How To Learn Data Science If You're Broke

Over the last year, I taught myself data science. I learned from hundreds of online resources and studied 6–8 hours every day. My goal was to start a caree

How To Learn Data Science If You’re Broke

Advice for executing your curriculum.1. Concepts will come at you faster than you can learn them.There are literally thousands of web pages and forums expl

Ask HN: How to manage data for local microservice dev envs?

Microservice development accommodates several kinds of dev env setups. For cross-service integration testing during development (rather than test automatio

How to Prepare Data For Machine Learning

Tweet Share Share Google Plus Machine learning algorithms learn from data. It is critical that y

How to Load Data in Python with Scikit

Tweet Share Share Google Plus Before you can build machine learning models, you need to load you

How to read *.data in Matlab and Python

See P2 of "HW2_545_2018_"In Matlab:z = dlmread('spambase.data',',');In Python:import numpy as npz = np.genfromtxt('spambase.data', dtype=float, delimi

How to Change Default Location for Outlook Data File (PST & OST)

note right folder dialog https error data locate http Is there a way to change the default location of new .pst file when create a new e-

How to convert BigDecimal to Double in spring-data-mongodb framework

public 行存儲 沒有 err 自己 dbr tom odbc sim 問題描述:我們都知道對於涉及錢的數據必須使用BigDecimal類型進行存儲,今天在查詢mongo時仍然有精度問題,雖然我在代碼中使用了Big Decimal類型,但mongo中使用的是double

How to SUM and GROUP BY of JSON data?

How to SUM and GROUP BY of JSON data? Source: StackOverflow.com Question Some server-side code actually generates a JSON formatted stri

[轉]How to display the data read in DataReceived event handler of serialport

本文轉自:https://stackoverflow.com/questions/11590945/how-to-display-the-data-read-in-datareceived-event-handler-of-serialport   問: I have the followin

How To Load CSV Machine Learning Data in Weka (如何在Weka中載入CSV機器學習資料)

How To Load CSV Machine Learning Data in Weka 原文作者:Jason Brownlee 原文地址:https://machinelearningmastery.com/load-csv-machine-learning-data-weka/

How To Build A Money Data Type In JavaScript

Last time I wrote a step-by-step example of how to apply Inside Out Test-Driven Development to a problem using JavaScript. That post used the Number type t

Ask HN: How to best teach algorithms and data structures?

Here's what I try to do:1. Start with the intuition. (The big concepts i.e. for BigO trying to figure out the best, worst and average number of steps it ta

Simpson’s Paradox: How to Prove Opposite Arguments with the Same Data

Simpson’s Paradox occurs when trends that appear when a dataset is separated into groups reverse when the data are aggregated. In the restaurant recommenda

Ask HN: How to model numerical energy data in Wolfram Alpha

I'm working on a dataset that contains energy supply and consumption data. This is just a hobby and idea is to do visualizations and simple moodelling base

Ask HN: How to implement caching for dynamic user data in sites like HN, Reddit?

Why would you start by caching it?What are you storing the data in currently? If relational, I'd advise starting with simple relational tables (post_commen