NELL learns by reading the web, the Curse of Dimensionality Visualised and Google's Physical Web (#wic)
This week there’s NELL, a computer system that learns by reading the web and asking for feedback on Twitter, the Curse of Dimensionality visualised, Google’s ‘Physical Web’ project and the Floodwatch research project on personal data and advertising.
This Week in Context
Your Weekly Update on All Things Context, October 10 2014
Further on personal data, tomorrow our founder will be speaking about “Privacy in the era of Artificial Intelligence: An oxymoron?” at Bavo Van Den Heuvel’s 10 Years Data Protection Officer celebration (congratulations!!), and I have joined the Privacy vs UX debate with a post on the Argus Labs blog: For ‘significantly better products’ UX design should balance privacy & personalisation.
“When a machine learns from experience, there are few guarantees about whether or not it will learn what you want. And it might learn something that you didn’t want it to learn, and yet it can’t forget. This is just the beginning.”
Diane Ackerman, excerpted from ‘The Human Age: The World Shaped by Us’
1. NELL, a computer system that learns as it reads the web, and… solicits feedback on Twitter
Never-Ending Language Learning (NELL) is a machine learning research project at Carnegie Mellon that learns by reading the web. It “reads”, or extracts facts from, text found in hundreds of millions of web pages. So far, NELL has accumulated over 50 million candidate beliefs, which it holds at different levels of confidence; it has high confidence in over 2 million of them. Follow NELL on Twitter, where it solicits feedback on its beliefs.
(Quartz has an overview of some more awesome bots on Twitter.)
2. The Curse of Dimensionality: Visualised
Feature engineering is what makes machine learning more of an art than a science. The more meaningful features you come up with, the easier it becomes for a classifier to learn a function that separates datapoints of class A from datapoints of class B. Surprisingly, classification rates only tend to increase up to a certain point as we keep adding features, after which unexpected errors start to emerge. This is due to the so-called curse of dimensionality, nicely visualised by Christopher Olah on his GitHub blog. (Nominated by V42.)
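One driver of this effect is easy to reproduce yourself: in high-dimensional spaces, distances between points concentrate, so the “nearest” and “farthest” neighbours become nearly indistinguishable and distance-based classifiers lose their grip. Below is a minimal NumPy sketch of that distance-concentration phenomenon (my own illustration, not code from Olah’s post); the function name and parameters are hypothetical.

```python
import numpy as np

def distance_contrast(n_points=500, dim=2, seed=0):
    """Relative gap between the farthest and nearest point from the origin,
    for points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, dim))
    dists = np.linalg.norm(points, axis=1)
    # A large ratio means neighbours are easy to tell apart;
    # a ratio near zero means all points look equally far away.
    return (dists.max() - dists.min()) / dists.min()

for d in (2, 10, 100, 1000):
    print(f"dim={d:4d}  contrast={distance_contrast(dim=d):.2f}")
```

Running this, the contrast collapses as the dimension grows: with only a handful of features the nearest point is dramatically closer than the farthest, but with a thousand features every point sits at roughly the same distance, which is exactly why piling on uninformative features can hurt a classifier.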
3. Google’s open source blueprint lets connected devices commune without specialized apps
A new Google project, the Physical Web, wants to address the problem of scaling the ‘Internet of Things’. It aims to create an open standard for the Internet of Things that would let users interact with vending machines, posters and other everyday objects in a location-aware, organic way through Bluetooth and web technology, without the need for a specialised app.
4. Data Mining Reveals The Secret To Matching Crowdfunding Projects To Investors
From Joren’s reading list comes a machine learning algorithm that should be able to accurately match crowdfunding project proposals with potential investors, like first dates on Valentine’s Day. The best strategy achieves, on average, 84% accuracy in predicting a list of potential investors’ Twitter accounts for any given project.
5. You are not your browser history
Floodwatch, a ‘collective ad monitoring tool for social good’, is a Chrome extension that tracks the ads you see as you browse the internet. The project wants to fight against ‘surveillance advertising’ by developing a detailed picture of what information the ad industry is using to decide exactly which ad to show to a particular person:
“As we grow, our next steps include improving our ad detection algorithm, automatic ad classification, and developing further analytical and visualization tools to help you understand how your ad data compares to others’, and further work on reverse-engineering ad profiles to gain insight into advertisers’ methodologies.”
Enjoy the reads, and have a great weekend!
And if you haven’t done so yet, kindly consider subscribing to the Week in Context here.