
Grading:
- Attendance and participation: 25%
- Homework: 10%
- Midterm exam: 15%
- Course project: 50% total
  - Project proposal: 15%
  - Midterm presentation: 35%
  - Final presentation: 50%
Collaboration policy: As in the real world, collaboration is encouraged, but plagiarism is not. Transparency is the difference. If you collaborate (with other members of this class, other classes, colleagues, friends, random people on the internet), that is fine; just state so. If the contributions of the authors to a particular work are uneven, give a rough estimate of each author's contribution (e.g., A did most of the math, B did most of the programming, and C did the literature review). Feel free to use all publicly available resources on the internet, but please cite them if they are used as more than basic background research (both to give proper credit to the original author and to help your peers discover new resources). Since this is a special topics course, I tend to assume students are interested in learning the material and thus give the benefit of the doubt. If proper credit is not given, however, or if bad faith or dishonesty is shown, consequences can be severe, including failing the class and referral to the administration.
The fast-growing field of news analytics requires large databases, fast computation, and robust statistics. This course introduces the tools and techniques for analyzing news, including how to quantify textual items based on, for example, positive or negative sentiment, relevance to each stock, and the amount of novelty in the content. Applications to trading strategies are discussed, including both absolute and relative return strategies, as well as risk management strategies. Students will be exposed to leading software in this space. Students will benefit from some familiarity with basic probability, statistics and programming (Python), and an interest in natural language processing (NLP) or computational linguistics. While the course will introduce a few trading strategies, it will also focus on NLP as a tool in its own right, applicable to domains outside of quantitative trading. There will be readings, discussion, homework, a midterm exam and a final project.
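To make the idea of quantifying a textual item concrete, here is a toy Python sketch of a lexicon-based sentiment score. The word lists, the company name "Acme", and the function name `sentiment_score` are all hypothetical and chosen only for illustration; this is one simple approach under those assumptions, not the method used in the course or by any particular vendor.

```python
import re

# Tiny hypothetical word lists for illustration; real lexicons are far larger.
POSITIVE = {"beat", "beats", "growth", "profit", "upgrade", "strong"}
NEGATIVE = {"miss", "misses", "loss", "lawsuit", "downgrade", "weak"}


def sentiment_score(text):
    """Return (pos - neg) / (pos + neg), a value in [-1, 1]; 0.0 if no lexicon word occurs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


print(sentiment_score("Acme beats estimates on strong growth"))  # 1.0
print(sentiment_score("Acme misses estimates, faces lawsuit"))   # -1.0
print(sentiment_score("Acme schedules annual meeting"))          # 0.0
```

A word-count score like this ignores negation, context, and relevance to a specific stock, which is part of why more robust statistical and machine learning techniques are needed.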
After this course you should be able to:
- Build a basic trading strategy based on natural language signals:
  - Identify, locate and clean appropriate data sources.
  - Formulate a trading hypothesis based on natural language signals.
  - Investigate this hypothesis qualitatively and quantitatively, using statistical, programming, NLP and trading best practices.
  - Present the results of your investigation to your peers for feedback and analysis.
- Read an academic paper / industry whitepaper about natural language techniques applied to trading and have a basic understanding of it.
- Have a sense of where the state of the art is currently and where it might head in the near future. Know the difference between science fiction and reality.
- Decide if you would like to pursue further research in this area.
- Foundations of Financial Technology (FRE-GY 6153) or equivalent:
  - Basic knowledge of financial markets (What is a stock? How does it trade?)
  - Basic statistics (What is variance?)
- Big Data in Finance (FRE-GY 7221) or equivalent:
  - Basic programming ability (Parse a CSV file and calculate the variance of the values. Python/R/Matlab)
- Test: Given enough time and access to the internet, could you:
  - Determine the 10 largest US stocks by market capitalization as of 12/31/2017?
  - Download the closing prices for these stocks for the last 5 Tuesdays of 2017?
  - Calculate the variance of each stock's closing prices during that period?
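As a rough illustration of the programming level the self-test has in mind, here is a minimal Python sketch of its last two steps using only the standard library. It assumes a hypothetical CSV layout with `date` and `close` columns; the helper names `last_tuesdays` and `close_variance` are made up for this example, and the sample prices are invented placeholders, not real market data. Finding the stocks and downloading real prices is left out.

```python
import csv
import io
from datetime import date, timedelta
from statistics import variance


def last_tuesdays(year, count):
    """Return the last `count` calendar Tuesdays of `year`, oldest first."""
    day = date(year, 12, 31)
    # Step back to the final Tuesday of the year (Monday is weekday 0, Tuesday is 1).
    while day.weekday() != 1:
        day -= timedelta(days=1)
    return sorted(day - timedelta(weeks=i) for i in range(count))


def close_variance(csv_text, wanted_dates):
    """Sample variance of the `close` column over rows whose `date` is in wanted_dates."""
    wanted = {d.isoformat() for d in wanted_dates}
    reader = csv.DictReader(io.StringIO(csv_text))
    closes = [float(row["close"]) for row in reader if row["date"] in wanted]
    return variance(closes)  # sample variance (divides by n - 1)


# Invented prices for a hypothetical ticker; NOT real market data.
SAMPLE_CSV = """date,close
2017-11-27,99.0
2017-11-28,100.0
2017-12-05,102.0
2017-12-12,101.0
2017-12-19,103.0
2017-12-26,104.0
"""

tuesdays = last_tuesdays(2017, 5)
print([d.isoformat() for d in tuesdays])
print(close_variance(SAMPLE_CSV, tuesdays))  # 2.5 for the invented prices above
```

The sketch works with calendar Tuesdays and does not check for market holidays; with real data, a Tuesday on which the market was closed would simply be missing from the file and would need to be handled explicitly.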
**Tuesday, September 4, 2018**:
- Course overview
- Introduction to natural language processing (NLP) and machine learning (ML).
- HW 1 assigned (HW 1 data), due 6:00 pm (beginning of class) on Tuesday, September 11, 2018 via e-mail to the instructor.
Slides:
- NLP (Stanford)
- Text Processing (Stanford)
- Naive Bayes (Stanford)
- Sentiment (slides) (Stanford)
- Sentiment (tutorial) (Stanford)
- word2vec (Stanford)
Supplemental:
- Maximum Entropy Classifiers (Stanford)
- Information Extraction and Named Entity Recognition (Stanford)
- Summarization (Stanford)
**Tuesday, September 11, 2018**:
- Machine learning follow-up:
  - Overfitting / Bias-variance tradeoff (Berkeley)
- Introduction to quantitative trading (GA Tech)
- Project introduction, discussion.
- HW 1 is due.
**Tuesday, September 18, 2018**:
- Natural language processing for quantitative trading.
- Project proposals are due.
- HW1 grades returned.
- Midterm exam review.
**Tuesday, September 25, 2018**:
- Midterm exam
- Machine learning for quantitative trading.
**Tuesday, October 2, 2018**:
- Project midterm presentations and discussion.
- Advanced topics in natural language processing and machine learning.
- Project midterm reports are due.
**Tuesday, October 9, 2018**:
- NO CLASS (NYU Legislative Day; classes will meet according to a Monday schedule).
**Tuesday, October 16, 2018**:
- NO CLASS.
- Work on projects, prepare for presentations.
**Tuesday, October 23, 2018**:
- Project presentations and discussion.
- Natural Language Processing, Dan Jurafsky and Christopher Manning, Stanford Coursera.
- Natural Language Processing, Jason Eisner, Johns Hopkins (JHU).
- Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze, MIT Press, Cambridge, MA, May 1999.
- Overfitting / Bias-variance tradeoff, Daniel Geng and Shannon Shih, UC Berkeley.
- Language and Statistics, Roni Rosenfeld, CMU.
- Sentiment Analysis and Opinion Mining (tutorial), Bing Liu, UIC.
- Sentiment Analysis and Opinion Mining (book), Bing Liu, UIC.
- Introduction to Natural Language Processing, David Smith, UMass.
- Natural Language Processing with Deep Learning, Richard Socher, Stanford.
- Machine Learning for Trading, Tucker Balch, GA Tech.
- Lecture notes, Octavian Blaga.
- NLP and Sentiment Driven Automated Trading, Atish Davda, Parshant Mittal, Michael Kearns, UPenn.
- Max Dama on Automated Trading, Max Dama
- Quantopian
- QuantStart
- Quant StackExchange
- Quora
- IEX (please attribute appropriately):
  - Stock charting data (many other types of data are available via a similar API).
- HW 1 assigned (HW 1 data), due 6:00 pm (beginning of class) on Tuesday, September 11, 2018 via e-mail to the instructor.
TBD