generation/crowdsourcing platform, for building open datasets | Hacker News

-- Background --

Metro allows data science projects to be powered by a crowd of people who self-generate the data for it. One of its primary uses is to create open datasets collaboratively, where every contributor is able to access all of the data.

I've been building it for the past few months and would like to gather some feedback! We can build a useful dataset of translations, and then we can start making new DataSources to power new datasets (labeled images, named entity recognition).

-- How it works --

Data generation happens on your computer, using "DataSources". A DataSource is a community-made, open-source plugin for Metro, which generates data for you.

You simply install the Metro browser extension and activate the DataSources which power the project. You'll also need to sign up, which doesn't require email verification right now, so it takes about 10 seconds.

-- Sentence Translation Project --

I made an Open Data project for gathering sentence-level translations in 7 languages, and I would like you to try it out!

It's powered by a DataSource (https://metro.exchange/datasources/text-translation/) which allows you to highlight any text, right-click, press a "translate" button, and enter your translation.
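
To make that flow concrete, here is a rough Python sketch of the kind of record a single contribution could produce. Everything in it is an assumption made for illustration: the `TranslationRecord` fields, the `build_translation_record` helper, and the language codes are hypothetical, not the DataSource's actual schema or code.

```python
from dataclasses import dataclass, asdict
import json


# Hypothetical record shape; the field names are illustrative only,
# not Metro's real schema.
@dataclass
class TranslationRecord:
    source_text: str
    source_lang: str
    target_text: str
    target_lang: str
    page_url: str


def build_translation_record(highlighted, translation, source_lang, target_lang, page_url):
    """Package one highlighted sentence with the translation the user typed in."""
    source = highlighted.strip()
    target = translation.strip()
    # Reject empty submissions so the dataset only holds real sentence pairs.
    if not source or not target:
        raise ValueError("both the highlighted text and the translation must be non-empty")
    if source_lang == target_lang:
        raise ValueError("source and target languages must differ")
    return TranslationRecord(source, source_lang, target, target_lang, page_url)


# Example: the user highlights a sentence on a page and enters a Spanish translation.
record = build_translation_record(
    "  Good morning.  ", "Buenos días.", "en", "es", "https://example.com/article"
)
print(json.dumps(asdict(record), ensure_ascii=False))
```

Validating on the contributor's machine before anything is submitted fits the idea that data generation happens on your computer, but where the real DataSource does its checks is not something this sketch claims to know.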

-- Future --

I want Metro to support open-data generation at any scale and eventually to be the backbone for startups powered by ethical, self-generated data. It provides access to data from any platform on the internet while giving users true autonomy over their data.

Any feedback, help, or just usage of the system is really useful for finding and fixing the problems that I just can't see yet.

Thank you!