[Theory-annc] Theory Seminar: Liu Yang to speak about "Active Property Testing"

Umesh Vazirani vazirani at eecs.berkeley.edu
Mon Sep 24 15:52:20 PDT 2012

Title: Active Property Testing
Speaker: Liu Yang, CMU
4 pm, Sept 24, Wozniak Lounge.


One motivation for property testing of boolean functions is the idea that
testing can provide a fast preprocessing step before
learning.  However, in most machine learning applications, it is not possible
to request for labels of arbitrary examples constructed by an algorithm.
Instead, the dominant query paradigm
in applied machine learning, called {\em active learning}, is one where the algorithm
may query for labels, but {\em only on points in a given (polynomial-sized)
unlabeled sample}, drawn from some underlying distribution D.
In this work, we bring this well-studied model to the domain of {\em testing}.

We develop both general results for this {\em active testing} model as
well as efficient testing algorithms for several important properties for learning,
demonstrating that testing can still yield substantial benefits in
this restricted setting. For example, we show that testing unions of
d intervals can be done with O(1) label requests in our setting,
whereas it is known to require \Omega(d) labeled examples for
learning (and \Omega(\sqrt{d}) for passive testing [KR00] where
the algorithm must pay for {\em every} example drawn from D).  In
fact, our results for testing unions of intervals also yield
improvements on prior work in both the classic query model (where any
point in the domain can be queried) and the passive testing model as
well. For the problem of testing linear separators in R^n over the
Gaussian distribution, we show that both active and passive testing
can be done with O(\sqrt{n}) queries, substantially less than the
\Omega(n) needed for learning, with near-matching lower bounds.  We
also present a general combination result in this model for building
testable properties out of others, which we then use to provide
testers for a number of assumptions used in semi-supervised learning.

In addition to the above results, we also develop a general
notion of the {\em testing dimension} of a given property with respect
to a given distribution, that we show characterizes (up to
constant factors) the intrinsic number of label requests needed to
test that property.  We develop such notions for both the active
and passive testing models.  We then use these dimensions to prove a
number of lower bounds, including for linear separators and the class
of dictator functions.

More information about the Theory-annc mailing list