"Man, I love science!" – Beakman's World
I work on large-scale machine learning at Google. I am also an Adjunct Professor at New York University. Previously, I was a portfolio manager and research director at Cubist Systematic Strategies, applying machine learning to quantitative trading. Before that, I was variously a hedge fund cofounder, CTO, quantitative portfolio manager, machine learning researcher and software engineer at Ophir Partners, Trexquant, WorldQuant, Merrill Lynch, Microsoft Research, IBM Research, Google and Bloomberg. You can learn more about my professional history here, along with my academic publications and lectures.
I graduated with a Ph.D. in machine learning from the Machine Learning Department
within Carnegie Mellon University's
School of Computer Science, under the supervision of
William W. Cohen.
My research is generally concerned with machine learning and data mining, with an underlying interest in
producing features and models that are robust to changes in the distribution of their underlying data (Thesis proposal, ICDM 2007). To this end, I am particularly interested in transfer learning with an emphasis on domain adaptation.
I was working on The Querendipity Project, whose goal
is to more accurately integrate and exploit the many heterogeneous sources of information available to a modern scientist.
Taking advantage of, among other sources, citation networks (such as CiteSeer),
full text archives (such as PubMed Central), and curated databases (such as
the Saccharomyces Genome Database), we are able to help users discover both
relevant and novel research related to their interests (ICWSM 2009).
I was also a member of the SLIF team working on mining text
and images together for bioinformatics applications. Our team was one of four finalists in the $50,000 Elsevier Grand Challenge. Specifically, my work deals with using the text of biological journal articles (e.g. captions, abstracts and main text) along with their associated images (depicting cells, proteins, graphs, etc.) in order to better identify entities in both media. The combination of these two expressions (text and images) of the same underlying concept (the experiment being
performed) into new features, jointly describing both the text and images, is a closer representation of the actual
object a user would be interested in, rather than disjoint features of text and images alone. A related problem is that of
transfer learning. In this case, we use models and named entity extractors trained on one type of data (abstract
text, for instance) and adapt them to be applied to a related, but distinct type of data (caption text) (CIKM 2008, ACL 2008). The intuition is
that it is easier to learn a certain concept once a related concept has already been mastered.
I have also been lucky to pursue related work outside of school during summer internships. While working with Hang Li and Tie-Yan Liu in the Web
Search and Mining group at Microsoft Research Asia we developed novel semi-supervised and transfer learning based methods for improving internet search through
query-dependent ranking (SIGIR 2008). The idea behind this work is that, regardless of the specific topic users are interested
in, there are common features linking certain types of queries together. For instance, users searching for either
a person or a company name might both be most interested in the corresponding home page (a navigational query), while
users searching for a disease or country name might be more interested in authoritative sources of information about these
(informational queries). By modeling and leveraging these distributions of query types, we can better decide
what, exactly, users want and deliver that to them.
Relatedly, while in the Data
Analytics group at IBM Research Watson, I worked with Naoki Abe and Yan Liu on methods
for learning causal models from temporally ordered data (KDD 2007). We felt that the interpretability offered by a causal model was quite
valuable for the end user in understanding the process being studied. This type of understanding is an essential
component of the scientific process since it leads the researcher to an idea of what experiment to perform next.
An accurate predictive model, without interpretation, provides little insight as to what direction is best to pursue.
This was also the motivation behind my work with Richard
Scheines and Joseph E. Beck on discovering predictive,
semantically and scientifically interpretable high-level features as functions of raw, event-level data (AAAI 2006, 2005).
I did my undergraduate work
in the Intrusion
Detection System Group within the Computer
Science Department of Columbia
University, under the supervision of Professor
Salvatore J. Stolfo and Eleazar
Eskin. My work there dealt with applying kernel
methods and support vector machines to the problem of
clustering data (binaries, system calls,
packets, etc.) in order to identify possible attacks (DMSA 2002).
In my spare time I work on applying machine learning
techniques towards opponent modeling for
Texas Hold 'em poker and Tic-Tac-Toe.
- Amr Ahmed, Andrew Arnold, Luis Pedro Coelho, Joshua
Kangas, Abdul-Saboor Sheikh, Eric Xing, William Cohen and Robert F. Murphy (2009).
"Structured Literature Image Finder."
In proceedings of the Annual Meeting of The ISMB BioLINK Special
(BioLINK), June 28-29, 2009, Stockholm, Sweden.
- Andrew Arnold and William W. Cohen (2009).
"Information Extraction as Link Prediction: Using Curated Citation Networks to
Improve Gene Detection."
In proceedings of the AAAI Conference on Weblogs and Social Media
(ICWSM), May 17-20, 2009, San Jose, CA. (Extended version) (Poster).
- Andrew Arnold and William W. Cohen (2008).
"Intra-document Structural Frequency Features for Semi-supervised Domain Adaptation."
In proceedings of the Association
for Computing Machinery Conference on Information and
Knowledge Management (CIKM), October 26-30, 2008, Napa Valley, CA. (Slides)
- Andrew Arnold, Ramesh Nallapati and William W. Cohen (2008).
"Exploiting Feature Hierarchy for Transfer Learning in Named
Entity Recognition."
In proceedings of the 46th Annual Meeting of the Association
for Computational Linguistics: Human Language Technologies (ACL:HLT), June 15-20, 2008, Columbus, OH. (Slides)
- Xiubo Geng, Tie-Yan Liu, Tao Qin, Andrew Arnold, Hang Li and Harry Shum (2008).
"Query Dependent Ranking Using K-Nearest Neighbor."
In proceedings of the 31st Annual International ACM
SIGIR Conference, July 20-24, 2008, Singapore.
- Andrew Arnold, Ramesh Nallapati and William W. Cohen (2007).
"A Comparative Study of Methods for Transductive Transfer Learning." In proceedings of the IEEE International Conference on Data Mining (ICDM)
2007 Workshop on Mining and Management of Biological Data, October 28, 2007, Omaha, NE. (Extended version) (Slides)
- Andrew Arnold, Yan Liu and Naoki Abe (2007).
"Temporal Causal Modeling with Graphical Granger Methods." In proceedings
of the Thirteenth ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, Aug 12-15, 2007, San Jose, CA. (Slides) (Video)
- Andrew Arnold, Joseph E. Beck and Richard Scheines (2006).
"Feature Discovery in the Context of Educational Data Mining: An Inductive
Approach."
In proceedings of the AAAI2006
Workshop on Educational Data Mining,
Boston, MA, 7-13.
- Andrew Arnold, Richard Scheines, Joseph E. Beck and Bill Jerome (2005).
"Time and Attention: Students, Sessions, and Tasks."
In proceedings of the AAAI2005
Workshop on Educational Data Mining,
Pittsburgh, PA, 62-66.
- Eleazar Eskin, Andrew Arnold, Michael Prerau, Leonid Portnoy and
Salvatore Stolfo (2002). "A Geometric Framework for Unsupervised Anomaly
Detection: Detecting Intrusions in Unlabeled Data."
In Daniel Barbara and Sushil Jajodia (editors), Applications of Data Mining
in Computer Security, Kluwer.
- "Intra-document Structural Frequency Features for Semi-supervised Domain Adaptation." Association for Computing
Machinery Conference on Information and
Knowledge Management (CIKM), Napa, CA (October 29, 2008).
- "Exploiting Document Structure and Feature
Hierarchy for Semi-supervised Domain Adaptation." Machine Learning
Lunch. Carnegie Mellon University, Pittsburgh, PA (September 29, 2008). (Video)
- "Exploiting Feature Hierarchy for Transfer Learning in Named Entity
Recognition." 46th Annual Meeting of the Association
for Computational Linguistics: Human Language Technologies (ACL:HLT), Columbus, OH (June 16, 2008).
- "A Comparative Study of Methods for Transductive Transfer Learning." IEEE International Conference on Data Mining (ICDM)
2007 Workshop on Mining and Management of Biological Data, Omaha, NE (October 28, 2007).
- "Temporal Causal Modeling with Graphical Granger Methods." Thirteenth ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, San Jose, CA (August 13, 2007).
- "A Comparison of Methods for Transductive Transfer Learning." Information Retrieval and Mining Seminar. Microsoft
Research Asia, Beijing, China (May 30, 2007).
- "Feature Discovery in the
Context of Educational Data Mining: An Inductive
Approach." IBM Mathematical Sciences Department Seminar. IBM Watson Research, Yorktown Heights, NY (July 6,
- "Causal Modeling for Anomaly
Detection." IBM Mathematical Sciences
Department 2006 Summer Student Seminar Series. IBM Watson Research, Yorktown Heights, NY (June 23, 2006).
andrew . arnold @ gmail . com