COVID-19 modifications are marked in red
Course: News Analytics and Machine Learning (NYU FRE GY 7871 - I2)
Term: Fall 2020, first half
Instructor: Andrew Arnold (firstname.lastname@example.org)
Disclaimer: All views and opinions expressed by the instructor in this course are his own and do not reflect the views, opinions, or confidential information of any of his current or former employers.
Office hours: by appointment
TA: Litai Ren (email@example.com). Office hours: 9 - 11 am (EST) every Friday, or by appointment 2 - 11 pm (EST) every Friday. E-mail for Zoom ID.
Time: Tuesdays, 6:00 PM - 8:41 PM
Course style: Due to COVID-19, this class will be taught virtually. There will be no in-person meetings. Classes will consist of:
- Live on-line lectures (via Zoom), including:
  - Instructor presentation with virtual whiteboard and slides
  - Live student questions/interaction
- Pre-recorded lectures / slide presentations
- Off-line resources such as slides, textbooks, videos, etc.
- Virtual classroom discussion / presentations
- Virtual project group discussions / collaboration
For those not able to attend lectures live, the sessions will be recorded, and questions/discussion can take place via NYU Classes forum or e-mail.
Given the small class size, the course will be taught as a colloquium. New topics will be introduced in interactive lecture format, and then discussed and expanded by the group. These topics will then be built upon in the team projects, which will be further discussed and presented to the class. Active class attendance and participation are required (whether via live Zoom or virtual forum participation). Since this will be a challenging mode of instruction/learning for all, we will try to be as accommodating and adaptive as possible. Your active participation and feedback (whether live or virtual) will be crucial for making the class as successful as possible!
- Attendance and participation*: 25% (non-live attendance consists of viewing recorded lectures and answering discussion questions)
- Homework*: 10%
- Midterm exam: 15% (this will be a live, proctored exam)
- Course project: 50% total (when necessary, presentations can be pre-recorded, taking questions off-line)
  - Project proposal: 15%
  - Midterm presentation: 35%
  - Final presentation: 50%
* Note about late registration: Since the class only meets seven times and the first homework is assigned on the first day of class, it may be difficult to make up for missed homework and attendance if you miss even the first day of class. Please let me know if you are considering joining the class late so we can discuss the implications.
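As a rough illustration of how the grading weights above combine, here is a minimal Python sketch. It assumes that the three project percentages (15/35/50) are shares of the project grade, which in turn counts for 50% of the course grade. That reading, the function names, and the example scores are all assumptions made for illustration, not an official grading formula.

```python
# Weights copied from the grading breakdown (percent of the course grade).
COURSE_WEIGHTS = {
    "attendance_participation": 25,
    "homework": 10,
    "midterm_exam": 15,
    "course_project": 50,
}

# Assumption: these are shares of the project grade, not of the course grade.
PROJECT_WEIGHTS = {
    "proposal": 15,
    "midterm_presentation": 35,
    "final_presentation": 50,
}


def weighted_average(scores, weights):
    """Weighted average of 0-100 scores, using percent weights."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def final_grade(component_scores, project_scores):
    """Combine the project sub-scores, then all course components."""
    project = weighted_average(project_scores, PROJECT_WEIGHTS)
    scores = dict(component_scores, course_project=project)
    return weighted_average(scores, COURSE_WEIGHTS)


# Hypothetical scores, for illustration only.
example_components = {
    "attendance_participation": 90,
    "homework": 80,
    "midterm_exam": 70,
}
example_project = {
    "proposal": 100,
    "midterm_presentation": 80,
    "final_presentation": 90,
}
example_grade = final_grade(example_components, example_project)
print(example_grade)
```

With these made-up scores the project component works out to 88 and the overall grade to 85.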
Collaboration policy: As in the real world, collaboration is encouraged, but plagiarism is not. Transparency is the difference. If you collaborate (with other members of this class, other classes, colleagues, friends, random people on the internet) that is fine, just state so. If the contributions of authors for a particular work are uneven, just give a rough estimate of each author's contribution (e.g., A did most of the math, B did most of the programming, and C did the literature review). Feel free to use all publicly available resources on the internet, but please cite them if they are used as more than basic background research (both to give proper credit to the original author and to help your peers discover new resources). Since this is a special topics course, I tend to assume students are interested in learning the material and thus give the benefit of the doubt. If proper credit is not given, however, or if bad faith / dishonesty is shown, consequences can be severe, including failing the class and referral to the administration.
Diversity, equity and inclusion: The NYU Tandon School values an inclusive and equitable environment for all our students. I hope to foster a sense of community in this class and consider it a place where individuals of all backgrounds, beliefs, ethnicities, national origins, gender identities, sexual orientations, religious and political affiliations, and abilities will be treated with respect. It is my intent that all students’ learning needs be addressed both in and out of class, and that the diversity that students bring to this class be viewed as a resource, strength and benefit. If this standard is not being upheld, please feel free to speak with me.
The fast-growing field of news analytics requires large databases, fast computation, and robust statistics. This course introduces the tools and techniques for analyzing news, including how to quantify textual items based on, for example, positive or negative sentiment, relevance to each stock, and the amount of novelty in the content. Applications to trading strategies are discussed, including both absolute and relative return strategies, and risk management strategies. Students will be exposed to leading software in this space.
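To make the idea of quantifying a news item concrete, here is a minimal Python sketch of the three quantities mentioned above: sentiment, relevance to a stock, and novelty. It is not the method used in the course or by any commercial product. The word lists, the company name "Acme", and the scoring rules (a word-count ratio, an alias frequency, and Jaccard similarity) are hypothetical choices for illustration.

```python
import re

# Hypothetical toy lexicons; real systems use much larger lexicons or trained models.
POSITIVE = {"beats", "surges", "upgrade", "record", "growth"}
NEGATIVE = {"misses", "plunges", "downgrade", "lawsuit", "loss"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment(text):
    """(positive - negative) / (positive + negative), in [-1, 1]; 0.0 if no lexicon word appears."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def relevance(text, aliases):
    """Fraction of tokens that match one of the company's (lowercase) aliases."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in aliases for t in tokens) / len(tokens)


def novelty(text, earlier_texts):
    """1 minus the highest Jaccard word overlap with any earlier item; 1.0 means no overlap."""
    words = set(tokenize(text))
    best = 0.0
    for earlier in earlier_texts:
        other = set(tokenize(earlier))
        union = words | other
        if union:
            best = max(best, len(words & other) / len(union))
    return 1.0 - best


# Hypothetical headline about a made-up company.
headline = "Acme beats estimates as revenue surges"
print(sentiment(headline))
print(relevance(headline, {"acme"}))
print(novelty(headline, ["Acme misses estimates", "Markets open flat"]))
```

Each function returns a single number per news item, which is the form a downstream trading model would typically consume.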
Students will benefit from some familiarity with basic probability, statistics and programming (Python), and an interest in natural language processing (NLP) or computational linguistics. While the course will introduce a few trading strategies, it will also focus on NLP as a tool in its own right, applicable to domains outside of quantitative trading strategies.
There will be readings, discussion, homework, a midterm exam and a final project.
After this course you should be able to:
- Build a basic trading strategy based on natural language signals:
- Identify, locate and clean appropriate data sources.
- Formulate a trading hypothesis based on natural language signals.
- Investigate this hypothesis qualitatively and quantitatively, using statistical, programming, NLP and trading best practices.
- Present the results of your investigation to your peers for feedback and analysis.
- Read an academic paper / industry whitepaper about natural language techniques applied to trading and have a basic understanding of it.
- Have a sense of where the state of the art is currently and where it might head in the near future. Know the difference between science fiction and reality.
- Decide if you would like to pursue further research in this area.
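As one possible shape for the "trading hypothesis based on natural language signals" objective above, here is a minimal Python sketch. The hypothesis it tests (positive news sentiment today predicts a positive return tomorrow), the function names, and the numbers are hypothetical. It ignores transaction costs, position sizing, and statistical significance, all of which a real investigation would need.

```python
def positions_from_signal(signals, threshold=0.0):
    """Map each signal to +1 (long), -1 (short) or 0 (no position)."""
    positions = []
    for s in signals:
        if s > threshold:
            positions.append(1)
        elif s < -threshold:
            positions.append(-1)
        else:
            positions.append(0)
    return positions


def strategy_returns(signals, next_day_returns, threshold=0.0):
    """Per-day strategy return: position taken on day t times the return on day t+1.

    Pairing each signal with the FOLLOWING day's return avoids look-ahead bias.
    """
    if len(signals) != len(next_day_returns):
        raise ValueError("signals and next_day_returns must have the same length")
    positions = positions_from_signal(signals, threshold)
    return [p * r for p, r in zip(positions, next_day_returns)]


def hit_rate(signals, next_day_returns, threshold=0.0):
    """Fraction of traded days (non-zero position) with a positive strategy return."""
    positions = positions_from_signal(signals, threshold)
    traded = [p * r for p, r in zip(positions, next_day_returns) if p != 0]
    if not traded:
        return 0.0
    return sum(r > 0 for r in traded) / len(traded)


# Hypothetical daily sentiment signals and next-day stock returns.
example_signals = [0.5, -0.2, 0.0, 0.8]
example_returns = [0.01, 0.02, -0.03, -0.01]
print(strategy_returns(example_signals, example_returns))
print(hit_rate(example_signals, example_returns))
```

In this made-up sample the signal is right on one of three traded days, which is the kind of result that should prompt a closer look at the hypothesis rather than a trade.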
- Foundations of Financial Technology (FRE-GY 6153) or equivalent:
- Basic knowledge of financial markets (What is a stock? How does it trade?)
- Basic statistics (What is variance?)
- Big Data in Finance (FRE-GY 7221) or equivalent:
- Basic programming ability (Parse a csv file and calculate the variance of the values. Python/R/Matlab)
- Test: Given enough time and access to the internet, could you:
  - Determine the 10 largest US stocks by market capitalization as of 12/31/2018
  - Download the closing prices for these stocks for the last 5 Tuesdays of 2018
  - Calculate the variance of each stock during that period
If so, you are qualified to take this course.
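The date and variance parts of the self-test above can be sketched in a few lines of Python using only the standard library. The ticker names and prices below are placeholders, not real market data, and the helper name `last_weekdays` is made up for this sketch. Finding the actual top-10 stocks and downloading their closing prices is left out. Note that calendar Tuesdays are not always trading days: December 25, 2018 was a Tuesday and a US market holiday, so a real solution has to decide how to handle it.

```python
from datetime import date, timedelta
from statistics import variance


def last_weekdays(year, weekday, count):
    """Last `count` dates in `year` that fall on `weekday` (Monday=0 ... Sunday=6), oldest first."""
    day = date(year, 12, 31)
    while day.weekday() != weekday:
        day -= timedelta(days=1)
    return sorted(day - timedelta(weeks=i) for i in range(count))


TUESDAY = 1
tuesdays = last_weekdays(2018, TUESDAY, 5)

# Placeholder closing prices, one per Tuesday above; NOT real data.
closing_prices = {
    "TICKER_A": [100.0, 102.0, 101.0, 99.0, 103.0],
    "TICKER_B": [50.0, 50.0, 50.0, 50.0, 50.0],
}

# statistics.variance computes the sample variance (divides by n - 1).
variances = {ticker: variance(prices) for ticker, prices in closing_prices.items()}

for d in tuesdays:
    print(d.isoformat())
for ticker, var in variances.items():
    print(ticker, var)
```

Whether to use the sample variance (as here) or the population variance (`statistics.pvariance`), and whether to compute it on prices or on returns, are choices the exercise leaves open.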
- Tuesday, September 8, 2020:
- Course overview
- Introduction to natural language processing (NLP) and machine learning (ML).
- Tuesday, September 15, 2020:
- Tuesday, September 22, 2020:
- Machine learning follow-up (cont.):
- Natural language processing for quantitative trading.
- Tuesday, September 29, 2020:
- Midterm exam
- Machine learning for quantitative trading.
- Tuesday, October 6, 2020:
- Project midterm reports are due.
- Project midterm presentations and discussion.
- Advanced topics in natural language processing and machine learning.
- Tuesday, October 13, 2020:
- Advanced topics in quantitative trading.
- Tuesday, October 20, 2020:
- Project presentations and discussion.
There are many excellent NLP courses taught around the world each year, most with lectures freely available on the internet. If there is a particular topic you would like more background on, or further topics we did not have time to explore in class, I encourage you to take advantage of these resources. As always, if you do reference this material in your work, please cite it.
- Natural Language Processing, Dan Jurafsky and Christopher Manning, Stanford Coursera.
- Natural Language Processing, Jason Eisner, Johns Hopkins (JHU).
- Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze, MIT Press, Cambridge, MA, May 1999.
- Overfitting / Bias-variance tradeoff, Daniel Geng and Shannon Shih, UC Berkeley.
- Language and Statistics, Roni Rosenfeld, CMU.
- Sentiment Analysis and Opinion Mining (tutorial), Bing Liu, UIC.
- Sentiment Analysis and Opinion Mining (book), Bing Liu, UIC.
- Introduction to Natural Language Processing, David Smith, UMass.
- Natural Language Processing with Deep Learning, Richard Socher, Stanford.
Unfortunately, there are not as many publicly available resources on developing quantitative trading strategies. Nevertheless, there are still a (growing) number of excellent resources, including:
Here are some publicly available datasets:
- HW 1 assigned (HW 1 data), due 6:00 pm (beginning of class) on Tuesday, September 22, 2020 via e-mail to the TA.