HOW TO BUILD YOUR OWN ALGORITHMIC TRADING BUSINESS
ERNEST P. CHAN
Quantitative Trading
Founded in 1807, John Wiley & Sons is the oldest independent publishing company in the United States. With offices in North America, Europe, Australia, and Asia, Wiley is globally committed to developing and marketing print and electronic products and services for our customers’ professional and personal knowledge and understanding.
The Wiley Trading series features books by traders who have survived the market’s ever-changing temperament and have prospered, some by reinventing systems, others by getting back to basics. Whether you are a novice trader, a professional, or somewhere in between, these books will provide the advice and strategies needed to prosper today and well into the future. For a list of available titles, visit our Web site at www.WileyFinance.com.
Library of Congress Cataloging-in-Publication Data is available.
To my parents, Hung Yip and Ching, and to Ben and Sarah Ming.
Contents
Preface to the 2nd Edition
Preface
Acknowledgments
CHAPTER 1: The Whats, Whos, and Whys of Quantitative Trading
Who Can Become a Quantitative Trader?
The Business Case for Quantitative Trading
Scalability
Demand on Time
The Nonnecessity of Marketing
The Way Forward
CHAPTER 2: Fishing for Ideas
How to Identify a Strategy that Suits You
Your Working Hours
Your Programming Skills
Your Trading Capital
Your Goal
A Taste for Plausible Strategies and Their Pitfalls
How Does It Compare with a Benchmark, and How Consistent Are Its Returns?
How Deep and Long Is the Drawdown?
How Will Transaction Costs Affect the Strategy?
Does the Data Suffer from Survivorship Bias?
How Did the Performance of the Strategy Change over the Years?
Does the Strategy Suffer from Data-Snooping Bias?
Does the Strategy “Fly under the Radar” of Institutional Money Managers?
Summary
References
CHAPTER 3: Backtesting
Common Backtesting Platforms
Excel
MATLAB
Python
R
QuantConnect
Blueshift
Finding and Using Historical Databases
Are the Data Split and Dividend Adjusted?
Are the Data Survivorship-Bias Free?
Does Your Strategy Use High and Low Data?
Performance Measurement
Common Backtesting Pitfalls to Avoid
Look-Ahead Bias
Data-Snooping Bias
Transaction Costs
Strategy Refinement
Summary
References
CHAPTER 4: Setting Up Your Business
Business Structure: Retail or Proprietary?
Choosing a Brokerage or Proprietary Trading Firm
Physical Infrastructure
Summary
References
CHAPTER 5: Execution Systems
What an Automated Trading System Can Do for You
Building a Semiautomated Trading System
Building a Fully Automated Trading System
Minimizing Transaction Costs
Testing Your System by Paper Trading
Why Does Actual Performance Diverge from Expectations?
Summary
CHAPTER 6: Money and Risk Management
Optimal Capital Allocation and Leverage
Risk Management
Model Risk
Software Risk
Natural Disaster Risk
Psychological Preparedness
Summary
Appendix: A Simple Derivation of the Kelly Formula when Return Distribution Is Gaussian
References
CHAPTER 7: Special Topics in Quantitative Trading
Mean-Reverting versus Momentum Strategies
Regime Change and Conditional Parameter Optimization
Stationarity and Cointegration
Factor Models
What Is Your Exit Strategy?
Seasonal Trading Strategies
High-Frequency Trading Strategies
Is It Better to Have a High-Leverage versus a High-Beta Portfolio?
Summary
References
CHAPTER 8: Conclusion
Next Steps
References
Appendix: A Quick Survey of MATLAB
Bibliography
About the Author
Index
Preface to the 2nd Edition
When I first started thinking about writing the 2nd edition, I had a measure of dread. What could I have added that would be new and interesting? After writing the first draft, I was relieved, and incredibly excited, at the prospect of sharing with you my latest knowledge, techniques, and insights, ranging from the addition of some new functions that make our PCA example run more than 10x faster, to a novel application of machine learning.
In the 1st edition of this book, published more than a decade ago, I maintained that independent quantitative traders can beat institutional managers at their own game. Many of you have taken that advice to heart, and many retail quantitative trading communities and platforms have been built to serve just such an ambition. But does the premise still hold?
Over the years, many readers reached out and told me how successful they have been in improving and trading the strategies I discussed in my books, and others told me how they have simply been inspired by my books to become successful traders. Our fund is invested in some of these readers, some of whom have been managing many millions more dollars than we are. So, the answer to the above question is a resounding “YES!”
I also exhorted retail traders new to quantitative trading to start with the simplest strategies (examples of which are described in this and my previous books). Do simple strategies still work? Or do we all have to become mathematicians or machine learning experts?
My colleagues and I have traded some of the strategies described in this book live since it was first published in 2009, and ran true out-of-sample backtests on others, and I was as surprised as they were that many still work after all these years. But the issues of “alpha decay,” and the even-more-dreaded “regime change,” are ever threatening. I will talk more about that below.
Speaking of machine learning and artificial intelligence, I didn’t really think much of those techniques in my first book. In fact, the only artificial intelligence platform that I described there has gone out of business. But you may hear that AI is everywhere nowadays, and many fundamental advances in AI have been made since then. For example, the dropout technique that gave birth to deep learning achieved fame in 2012 (Gershgorn, 2017). Should retail traders still avoid AI/ML?
It is as difficult to apply AI/ML to finance in 2021 as it was in 2009, but you may be surprised to hear that we have finally succeeded (Chan, 2020). We have benefited from other giants in the industry who graciously share their insights and knowledge with everyone (López de Prado, 2018). We, in turn, tried to make it easier for every retail trader (even those who are not programmers) to benefit from this technology by launching predictnow.ai. Here is the spoiler: The key to successfully applying AI/ML to finance is to focus on metalabeling, i.e., finding the probability of profit of your own simple basic trading strategy, and not to use it to predict the market directly. Why? Your own trading strategy’s past track record is private; no one else is trying to predict its success. Meanwhile, millions of people around the world are watching the same public market, and everyone is trying to predict where it will go. Competition and arbitrage naturally mean that the signal-to-noise ratio is very low and predictive successes are few and far between.
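To make the idea of metalabeling concrete, here is a toy sketch in Python. It is not the method used at predictnow.ai or in López de Prado (2018), which rely on trained classifiers with many features; it simply swaps in a smoothed frequency table. The trade log, the single feature (`low_vol` / `high_vol`), and the function names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical trade log of a simple base strategy:
# (feature known at trade entry, realized profit or loss in dollars).
past_trades = [
    ("low_vol", 120.0), ("low_vol", 80.0), ("low_vol", -40.0), ("low_vol", 60.0),
    ("high_vol", -150.0), ("high_vol", 30.0), ("high_vol", -90.0), ("high_vol", -20.0),
]


def fit_profit_probability(trades):
    """Estimate P(profit | feature) from the strategy's own track record.

    The metalabel of each trade is 1 if it made money and 0 otherwise.
    Adding one pseudo-win and one pseudo-loss (Laplace smoothing) keeps the
    estimates away from 0 and 1 when the sample is small.
    """
    wins = defaultdict(int)
    counts = defaultdict(int)
    for feature, pnl in trades:
        counts[feature] += 1
        if pnl > 0:
            wins[feature] += 1
    return {f: (wins[f] + 1) / (counts[f] + 2) for f in counts}


def should_take_trade(prob_by_feature, feature, threshold=0.5):
    """Take the base strategy's next signal only if its estimated
    probability of profit exceeds the threshold. A feature value never
    seen before is treated as a coin flip (0.5)."""
    return prob_by_feature.get(feature, 0.5) > threshold


probs = fit_profit_probability(past_trades)
for feature in ("low_vol", "high_vol"):
    print(feature, round(probs[feature], 3), should_take_trade(probs, feature))
```

Note that the model never tries to forecast the market itself. Its only input is the private record of how the base strategy fared under different conditions, which is the point made above.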
But that’s not all. There is another novel use of machine learning that I will discuss in a completely revised Example 7.1 of this book.
Despite our luck with the longevity of some of the strategies I described, most arbitrage opportunities eventually fade away: the notorious alpha decay that professionals like to lament. Alpha decay can be due to competition (too many people trading the same strategy), but equally often it is due to regime shift caused by market structure or macroeconomic changes. Adapt and evolve your strategies, or watch them die (Lo, 2019). The market is not stationary; why should your strategies be? The most agonizing decision a quantitative trader needs to make is when to abandon a strategy during a prolonged drawdown, despite repeated efforts to evolve it. It is ultimately a discretionary decision: you have to judge based on your market knowledge whether there is a fundamental reason your strategy stopped working. To gain this market knowledge, you have to constantly absorb public knowledge disseminated on social media. That is the reason I set aside an hour each day to go through my Twitter feed (@chanep). I have highlighted some of the Twitterers I follow in Chapter 2. More than providing specific strategy examples, I hope my books will improve your market intuition in making these discretionary decisions.
One major addition to this edition is the inclusion of Python and R codes in every example. Even though MATLAB is still my favorite backtesting language, there is no reason to exclude the other two most popular languages. Other things that I added and changed in the 2nd edition:
Chapter 1: A bit more about fully automated trading and marketing your strategies to investors. Also, a scare episode during Covid-19.
Chapter 2: Updated the educational and trading resources for budding quant traders, including the new URL for my own blog. Also, a good word for Millennium Partners’ founder (not that he needs it).
Chapter 3: Extensive changes on MATLAB code that remove a major bug, and new commentary and codes for Python and R. Description of some new quant trading platforms. One item of particular interest: I discuss a mathematically rigorous way to decide how much backtest data and how long a paper trading period is needed. Another mathematical technique was referenced that determines how data snooping will affect your live Sharpe ratio.
Chapter 4: Much has changed in the world of brokers and infrastructure providers for algorithmic traders since the first edition. Even the name of the US regulator for brokers has changed. You will find them all updated.
Chapter 5: It is now much easier than before to build a fully automated trading system. The new ways are described in this chapter.
Chapter 6: New insights on the Kelly formula and its practical impact. Python and R codes for demonstrating capital allocation using the Kelly formula are added. Also included is a discussion on why loss aversion is not a behavioral bias, which is opposite to what I previously believed. It stems from a profound mathematical insight that threatens to upend the economics profession.
Chapter 7: This chapter is extensively updated. I describe a novel machine learning technique we invented called Conditional Parameter Optimization that can be used to optimize the trading parameters of a strategy based on market regimes. Also added are new high-performance MATLAB/Python/R codes on using PCA, new Python/R codes on checking for stationarity and cointegration, and some surprising out-of-sample results on seasonal trading strategies. I also clarified the difference between time-series and cross-sectional factors.
Chapter 8: Conclusions remain largely the same. Yes, a retail trader can beat the professionals. But a retail trader can also hire a professional to help generate alpha and diversify.
Gershgorn. 2017. “The data that transformed AI research, and possibly the world.” Qz. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
Lo, Andrew. 2019. Adaptive Markets: Financial Evolution at the Speed of Thought. Princeton University Press.
López de Prado, Marcos. 2018. Advances in Financial Machine Learning. Wiley.
Preface
By some estimates, quantitative or algorithmic trading now accounts for over 80 percent of the equity trading volume (Economist, 2019). There are, of course, innumerable books on the advanced mathematics and strategies utilized by institutional traders in this arena. However, can an independent, retail trader benefit from these algorithms? Can an individual with limited resources and computing power backtest and execute strategies over thousands of stocks, and come to challenge the powerful industry participants at their own game?
I will show you how this can, in fact, be achieved.
WHO IS THIS BOOK FOR?
I wrote this book with two types of readers in mind:
Aspiring independent (“retail”) traders who are looking to start a quantitative trading business.
Students of finance or other technical disciplines (at the undergraduate or MBA level) who aspire to become quantitative traders and portfolio managers at major institutions.
Can these two very different groups of readers benefit from the same set of knowledge and skills? Is there anything common between managing a $100 million portfolio and managing a $100,000 portfolio? My contention is that it is much more logical and sensible for someone to become a profitable $100,000 trader before becoming a profitable $100 million trader. This can be shown to be true on many fronts.
Many legendary quantitative hedge fund managers such as Dr. Edward Thorp of the former Princeton-Newport Partners (Poundstone, 2005) and Dr. Jim Simons of Renaissance Technologies Corp. (Lux, 2000) started their careers trading their own money. They did not begin as portfolio managers for investment banks and hedge funds before starting their own fund management business. Of course, there are also plenty of counterexamples, but clearly this is a possible route to riches as well as intellectual accomplishment, and for someone with an entrepreneurial bent, a preferred route.
Even if your goal is to become an institutional trader, it is still worthwhile to start your own trading business as a first step. Physicists and mathematicians are now swarming Wall Street. Few people on the Street are impressed by a mere PhD from a prestigious university anymore. What is the surest way to get through the door of the top banks and funds? To show that you have a systematic way to profits: in other words, a track record. Quite apart from serving as a stepping stone to a lucrative career in big institutions, having a profitable track record as an independent trader is an invaluable experience in itself. The experience forces you to focus on simple but profitable strategies, and not get sidetracked by overly theoretical or sophisticated theories. It also forces you to focus on the nitty-gritty of quantitative trading that you won’t learn from most books: things such as how to build an order entry system that doesn’t cost $10,000 of programming resources. Most importantly, it forces you to focus on risk management; after all, your own personal bankruptcy is a possibility here. Finally, having been an institutional as well as a retail quantitative trader and strategist at different times, I only wish that I had read a similar book before I started my career at a bank; I would have achieved profitability many years sooner.
Given these preambles, I won’t make any further apologies in the rest of the book in focusing on the entrepreneurial, independent traders and how they can build a quantitative trading business on their own, while hoping that many of the lessons would be useful on their way to institutional money management as well.
WHAT KIND OF BACKGROUND DO YOU NEED?
Despite the scary-sounding title, you don’t need to be a math or computer whiz in order to use this book as a guide to start trading quantitatively. Yes, you do need to possess some basic knowledge of statistics, such as how to calculate averages, standard deviations, or how to fit a straight line through a set of data points. Yes, you also need to have some basic familiarity with Excel. But what you don’t need is any advanced knowledge of stochastic calculus, neural networks, or other impressive-sounding techniques.
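As a rough gauge of the level of statistics meant here, the three calculations just mentioned (an average, a standard deviation, and a straight-line fit) can be sketched in a few lines of plain Python. The numbers below are made up for illustration, and the function names are my own rather than anything used later in the book.

```python
import math


def mean(xs):
    """Arithmetic average of a list of numbers."""
    return sum(xs) / len(xs)


def sample_std(xs):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical daily returns, in percent.
returns = [0.5, -0.2, 0.3, 0.1, -0.4]
print("average return:", mean(returns))
print("standard deviation:", sample_std(returns))

# Hypothetical prices that rise by exactly 2 per day, starting from 10 on day 1.
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
slope, intercept = fit_line(days, prices)
print("slope:", slope, "intercept:", intercept)  # slope 2.0, intercept 8.0
```

Excel offers the same calculations through its built-in AVERAGE, STDEV, SLOPE, and INTERCEPT functions, so nothing here requires programming.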
Though it is true that you can make millions with nothing more than Excel, it is also true that there are tools that, if you are proficient with them, will enable you to backtest trading strategies much more efficiently, and may also allow you to retrieve and process data much more easily than you otherwise can. Best among these tools are MATLAB, Python, and R, and they are the most common research platforms that many institutional quantitative strategists and portfolio managers use. Therefore, I will demonstrate how to backtest the majority of strategies using all three languages. In fact, I have included a brief tutorial in the appendix on how to do some basic programming in MATLAB, which is my favorite among the three. For a tutorial on how to use R for finance, I recommend Regenstein (2018). For Python, I like the book by the inventor of its Pandas package, McKinney (2017). MATLAB for home use costs about as much as Microsoft Office, while Python and R are free.
WHAT WILL YOU FIND IN THIS BOOK?
This book is definitely not designed as an encyclopedia of quantitative trading techniques or terminologies. It will not even be about specific profitable strategies (although you can refine the few example strategies embedded here to make them quite profitable). Instead, this is a book that teaches you how to find a profitable strategy yourself. It teaches you the characteristics of a good strategy, how to refine and backtest a strategy to ensure that it has good historical performance, and, more importantly, to ensure that it will remain profitable in the future. It teaches you a systematic way to scale up or wind down your strategies depending on their real-life profitability. It teaches you some of the nuts and bolts of implementing an automated execution system in your own home. Finally, it teaches you some basics of risk management, which is critical if you want to survive over the long term, and also some psychological pitfalls to avoid if you want an enjoyable (and not just profitable) life as a trader.
Even though the basic techniques for finding a good strategy should work for any tradable securities, I have focused my examples on an area of trading I personally know best: statistical arbitrage trading in stocks. While I discuss sources of historical data on stocks, futures, and foreign currencies in the chapter on backtesting, I did not include options because those are quite complicated to backtest for someone new to algorithmic trading. If you are really keen on learning that, please read Chan (2017).
The book is organized roughly in the order of the steps that traders need to undertake to set up their quantitative trading business. These steps begin at finding a viable trading strategy (Chapter 2), then backtesting the strategy to ensure that it at least has good historical performance (Chapter 3), setting up the business and technological infrastructure (Chapter 4), building an automated trading system to execute your strategy (Chapter 5), and managing the money and risks involved in holding positions generated by this strategy (Chapter 6). I will then describe in Chapter 7 a number of important advanced concepts with which most professional quantitative traders are familiar, and finally conclude in Chapter 8 with reflections on how independent traders can find their niche and how they can grow their business. I have also included an appendix that contains a tutorial on using MATLAB.
You’ll see two different types of boxed material in this book:
Sidebars containing an elaboration or illustration of a concept
Examples, accompanied by Excel (for some), MATLAB, Python, and R codes (for all)
Readers who want to learn more and keep up to date with the latest news, ideas, and trends in quantitative trading are welcome to visit my blog predictnow.ai/blog, where I will do my best to answer their questions. You will find that the website contains articles and presentations of many aspects of quantitative trading. Readers of this book will have free access to the software codes contained in this book located at epchan.com/book and will find the password in a later chapter to enter that website.
Ernest P. Chan
October 2020
REFERENCES
Economist. 2019. “March of the Machines. The stockmarket is now run by computers, algorithms, and passive managers.” October 5. www.economist.com/briefing/2019/10/05/the-stockmarket-is-now-run-by-computers-algorithms-and-passive-managers.
Lux, Hal. 2000. “The Secret World of Jim Simons.” Institutional Investor Magazine, November 1.
Poundstone, William. 2005. Fortune’s Formula. New York: Hill and Wang.
Acknowledgments
For the second edition, I would like to thank Ben Xie, Long Le, Roger Hunter, Tho Du, and Zachary David for their help with Python and R. A big thank you also to my production editor, Purvi Patel, for shepherding this project to its fruition.
I thank Dr. Sergei Belov and Dr. Radu Ciobanu for demonstrating a novel machine-learning technique that we called Conditional Parameter Optimization in Example 7.1, and updating Example 7.4 with his high-performance PCA codes. Radu was the VP of Engineering at PredictNow.ai, our financial machine-learning SaaS, and Sergei is a senior researcher there.
Last but not least, I would like to thank all the readers who wrote me over the years since the publication of the first edition with their questions and doubts, about bugs in the book, and on how they finally achieved success in this ultracompetitive world of quant trading.
CHAPTER 1
The Whats, Whos, and Whys of Quantitative Trading
If you are curious enough to pick up this book, you probably have already heard of quantitative trading. But even for readers who learned about this kind of trading from the mainstream media before, it is worth clearing up some common misconceptions. 如果你有足够的好奇心拿起这本书,你可能已经听说过量化交易。但即使是那些从主流媒体了解过这种交易方式的读者,也值得澄清一些常见的误解。
Quantitative trading, also known as algorithmic trading, is the trading of securities based strictly on the buy/sell decisions of computer algorithms. The computer algorithms are designed and perhaps programmed by the traders themselves, based on the historical performance of the encoded strategy tested against historical financial data. 量化交易,也称为算法交易,是基于计算机算法的买卖决策进行的证券交易。计算机算法由交易者自己设计并可能编程,基于编码策略在历史金融数据上的表现进行测试。
Is quantitative trading just a fancy name for technical analysis, then? Granted, a strategy based on technical analysis can be part of a quantitative trading system if it can be fully encoded as computer programs. However, not all technical analysis can be regarded as quantitative trading. For example, certain chartist techniques such as “look for the formation of a head and shoulders pattern” might not be included in a quantitative trader’s arsenal because they are quite subjective and may not be quantifiable. 那么,量化交易只是技术分析的一个花哨名称吗?确实,如果一个基于技术分析的策略能够完全编码成计算机程序,它可以成为量化交易系统的一部分。然而,并非所有技术分析都能被视为量化交易。例如,某些图表分析技术如“寻找头肩顶形态的形成”可能不会被量化交易者采用,因为它们相当主观,且可能无法量化。
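To make the distinction concrete, here is a minimal Python sketch of a technical rule that can be fully encoded, unlike a subjective chart pattern. The rule itself (buy when the latest close is above its simple moving average), the function names, the window length, and the sample prices are all illustrative assumptions, not a strategy taken from this book.

```python
def simple_moving_average(prices, window):
    """Average of the last `window` prices."""
    if window <= 0 or len(prices) < window:
        raise ValueError("need at least `window` prices and a positive window")
    return sum(prices[-window:]) / window


def moving_average_signal(prices, window=3):
    """Return "BUY" if the latest close is above its moving average, else "HOLD".

    Every input and every decision here is a number or a comparison,
    so nothing is left to the trader's subjective judgment.
    """
    average = simple_moving_average(prices, window)
    return "BUY" if prices[-1] > average else "HOLD"


# Hypothetical closing prices, for illustration only.
closes = [10.0, 10.5, 10.2, 11.0]
signal = moving_average_signal(closes, window=3)
print(signal)
```

A rule such as "look for a head and shoulders pattern" would only qualify once someone wrote down an equally unambiguous numerical definition of the pattern.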
Yet quantitative trading includes more than just technical analysis. Many quantitative trading systems incorporate fundamental data in their inputs: numbers such as revenue, cash flow, debt-to-equity ratio, and others. After all, fundamental data are nothing but numbers, and computers can certainly crunch any numbers that are fed into them! When it comes to judging the current financial performance of a company compared to its peers or compared to its historical performance, the computer is often just as good as human financial analysts, and the computer can watch thousands of such companies all at once. Some advanced quantitative systems can even incorporate news events as inputs: Nowadays, it is possible to use a computer to parse and understand news reports. (After all, I used to be a researcher in this very field at IBM, working on computer systems that can understand approximately what a document is about.) 然而,量化交易不仅仅包括技术分析。许多量化交易系统在其输入中还包含基本面数据:如收入、现金流、债务与股本比率等数字。毕竟,基本面数据不过是数字,计算机当然可以处理输入的任何数字!在判断一家公司当前的财务表现相较于同行或其历史表现时,计算机往往和人类金融分析师一样出色,而且计算机可以同时监控成千上万家公司。一些先进的量化系统甚至可以将新闻事件作为输入:如今,可以使用计算机来解析和理解新闻报道。(毕竟,我曾在 IBM 从事这一领域的研究,开发能够大致理解文档内容的计算机系统。)
So you get the picture: As long as you can convert information into bits and bytes that the computer can understand, it can be regarded as part of quantitative trading. 所以你明白了:只要你能将信息转换成计算机能理解的二进制数据,它就可以被视为量化交易的一部分。
WHO CAN BECOME A QUANTITATIVE TRADER? 谁能成为量化交易员?
It is true that most institutional quantitative traders received their advanced degrees as physicists, mathematicians, engineers, or computer scientists. This kind of training in the hard sciences is often necessary when you want to analyze or trade complex derivative instruments. But those instruments are not the focus in this book. There is no law stating that one can become wealthy only by working with complicated financial instruments. (In fact, one can become quite poor trading complex mortgage-backed securities, as the financial crisis of 2007-08 and the demise of Bear Stearns have shown.) The kind of quantitative trading I focus on is called statistical arbitrage trading. Statistical arbitrage deals with the simplest financial instruments: stocks, futures, and sometimes currencies. One does not need an advanced degree to become a statistical arbitrage trader. If you have taken a few high school-level courses in math, statistics, computer programming, or economics, you are probably as qualified as anyone to tackle some of the basic statistical arbitrage strategies. 确实,大多数机构量化交易员都是以物理学家、数学家、工程师或计算机科学家的身份获得高级学位的。当你想分析或交易复杂的衍生品时,这种硬科学的训练通常是必要的。但本书并不关注这些工具。没有法律规定只有通过复杂的金融工具才能致富。(事实上,正如 2007-08 年金融危机和贝尔斯登的倒闭所显示的,交易复杂的抵押贷款支持证券反而可能导致严重亏损。)我关注的量化交易类型被称为统计套利交易。统计套利处理的是最简单的金融工具:股票、期货,有时还有货币。成为统计套利交易员并不需要高级学位。如果你曾修过几门高中水平的数学、统计学、计算机编程或经济学课程,你很可能已经具备了应对一些基本统计套利策略的资格。
Okay, you say, you don’t need an advanced degree, but surely it gives you an edge in statistical arbitrage trading? Not necessarily. 好吧,你说你不需要高级学位,但它肯定能让你在统计套利交易中占据优势?不一定。
I received a PhD from one of the top physics departments of the world (Cornell University). I worked as a successful researcher in one of the top computer science research groups in the world (at that temple of high-techdom: IBM’s T. J. Watson Research Center). Then I worked in a string of top investment banks and hedge funds as a researcher and finally trader, including Morgan Stanley, Credit Suisse, and so on. As a researcher and trader in these august institutions, I had always strived to use some of the advanced mathematical techniques and training that I possessed and applied them to statistical arbitrage trading. Hundreds of millions of dollars of trades later, what was the result? Losses, more losses, and losses as far as the eye can see, for my employers and their investors. Finally, I quit the financial industry in frustration, set up a spare bedroom in my home as my trading office, and started to trade the simplest but still quantitative strategies I know. These are strategies that any smart high school student can easily research and execute. For the first time in my life, my trading strategies became profitable (one of which is described in Example 3.6), and that has been the case ever since. The lesson I learned? A famous quote, often attributed to Albert Einstein, sums it up: “Make everything as simple as possible. But not simpler.” 我获得了世界顶尖物理系之一(康奈尔大学)的博士学位。我曾在世界顶尖的计算机科学研究团队之一工作(在那个高科技圣地:IBM 的 T. J. Watson 研究中心),并取得了成功。随后,我在一系列顶级投资银行和对冲基金担任研究员,最终成为交易员,工作过的机构包括摩根士丹利、瑞士信贷等。作为这些权威机构的研究员和交易员,我一直努力运用自己掌握的一些先进数学技术和训练,将其应用于统计套利交易。经过数亿美元的交易,结果如何?对我的雇主及其投资者来说,只有亏损,亏损,亏损,眼前尽是亏损。最终,我因挫败感离开了金融行业,在家中腾出一间备用卧室作为交易办公室,开始交易我所知道的最简单但仍具量化性质的策略。这些策略任何聪明的高中生都能轻松研究和执行。我人生中第一次,交易策略开始盈利(其中一个策略在示例 3.6 中有所描述),此后一直如此。我学到的教训?一句著名的名言,常被归于阿尔伯特·爱因斯坦,概括了这一点:“把一切尽可能简化,但不要过度简化。”
(Stay tuned: I will detail more reasons why independent traders can beat institutional money managers at their own game in Chapter 8.) (敬请关注:我将在第 8 章详细说明独立交易者为何能在自己的领域击败机构资金经理的更多原因。)
Though I became a quantitative trader through a fairly traditional path, many others didn’t. Who are the typical independent quantitative traders? Among people I know, they include a former trader at a hedge fund that has gone out of business, a computer programmer who used to work for a brokerage, a former trader at one of the exchanges, a former investment banker, a former biochemist, and an architect. Some of them have received advanced technical training, but others have only basic familiarity of high school-level statistics. Most of them backtest their strategies using basic tools like Excel, though others hire programming contractors to help. Most of them have at some point in their career been professionally involved with the financial world but have now decided that being independent suits their needs better. As far as I know, most of them are doing quite well on their own, while enjoying the enormous freedom that independence brings. 虽然我通过相对传统的路径成为了一名量化交易员,但许多其他人并非如此。我认识的典型独立量化交易员包括:一家已倒闭对冲基金的前交易员、一名前券商的计算机程序员、一家交易所的前交易员、一名前投资银行家、一名前生物化学家和一名建筑师。他们中有些接受过高级技术培训,但也有些仅具备高中水平的统计学基础。他们大多数使用像 Excel 这样的基础工具进行策略回测,当然也有一些会雇佣程序开发承包商来协助。大多数人在职业生涯的某个阶段都曾专业地涉足金融领域,但现在他们认为独立更适合自己的需求。据我所知,他们中的大多数人在独立经营中表现相当不错,同时享受着独立带来的巨大自由。
Besides having gained some knowledge of finance through their former jobs, the fact that these traders have saved up a nest egg for their independent venture is obviously important, too. When one plunges into independent trading, fear of losses and of being isolated from the rest of the world is natural, and so it helps to have both a prior appreciation of risks and some savings to lean on. It is important not to have a need for immediate profits to sustain your daily living, as strategies have intrinsic rates of returns that cannot be hurried (see Chapter 6). 除了通过之前的工作获得了一些金融知识外,这些交易者为他们的独立创业积攒了一笔启动资金,这一点显然也很重要。当一个人投身于独立交易时,对亏损的恐惧以及被世界孤立的感觉是很自然的,因此,事先对风险有所了解并且有一些储蓄作为依靠是有帮助的。重要的是不要急需通过交易获得即时利润来维持日常生活,因为策略有其固有的收益率,无法被催促(参见第 6 章)。
Instead of fear, some of you are planning to trade because of the love of thrill and danger, or an incredible self-confidence that instant wealth is imminent. This is also a dangerous emotion to bring to independent quantitative trading. As I hope to persuade you in this chapter and in the rest of the book, instant wealth is not the objective of quantitative trading. 有些人计划交易并非出于恐惧,而是因为对刺激和危险的热爱,或者对瞬间致富抱有极大的自信。这种情绪带入独立量化交易同样是危险的。正如我希望在本章及全书中说服你的那样,量化交易的目标并非瞬间致富。
The ideal independent quantitative trader is therefore someone who has some prior experience with finance or computer programming, who has enough savings to withstand the inevitable losses and periods without income, and whose emotion has found the right balance between fear and greed. 理想的独立量化交易员因此应当是具备一定金融或计算机编程经验的人,拥有足够的储蓄以承受不可避免的亏损和无收入时期,并且情绪在恐惧与贪婪之间找到了合适的平衡。
THE BUSINESS CASE FOR QUANTITATIVE TRADING 量化交易的商业理由
A lot of us are in the business of quantitative trading because it is exciting, intellectually stimulating, financially rewarding, or perhaps it is the only thing we are good at doing. But for others who may have alternative skills and opportunities, it is worth pondering whether quantitative trading is the best business for you. 我们中许多人从事量化交易业务,是因为它令人兴奋、智力上具有挑战性、财务上有回报,或者这可能是我们唯一擅长的事情。但对于那些拥有其他技能和机会的人来说,值得思考量化交易是否是最适合你的业务。
Despite all the talk about untold hedge fund riches and dollars that are measured in units of billions, in many ways starting a quantitative trading business is very similar to starting any small business. We need to start small, with limited investment (perhaps only a $50,000 initial investment), and gradually scale up the business as we gain know-how and become profitable. 尽管关于数不清的对冲基金财富和以数十亿美元计的资金有很多讨论,但在许多方面,创办量化交易业务与创办任何小型企业非常相似。我们需要从小做起,投资有限(也许只有 50,000 美元的初始投资),并随着我们积累经验和实现盈利逐步扩大业务规模。
In other ways, however, a quantitative trading business is very different from other small businesses. Here are some of the most important. 然而,在其他方面,量化交易业务与其他小型企业有很大不同。以下是一些最重要的区别。
Scalability 可扩展性
Compared to most small businesses (other than certain dot-coms), quantitative trading is very scalable (up to a point). It is easy to find yourself trading millions of dollars in the comfort of your own home, as long as your strategy is consistently profitable. This is because scaling up often just means changing a number in your program. This number is called leverage. You do not need to negotiate with a banker or a venture capitalist to borrow more capital for your business. The brokerages stand ready and willing to do that. If you are a member of a proprietary trading firm (more on this later in Chapter 4 on setting up a business), you may even be able to obtain leverage far exceeding that allowed by Securities and Exchange Commission (SEC) Regulation T. It is not unheard of for a proprietary trading firm to let you trade a portfolio worth $2 million intraday even if you have only $50,000 equity in your account (40× leverage). If you trade futures, options, or currencies, you can obtain leverage often exceeding 10× from a regular brokerage, sparing yourself the trouble of joining a prop trading firm. (For example, at this writing, you only need about $12,000 in margin cash to trade one contract of the E-mini S&P 500 future, which has a notional market value of about $167,500.) At the same time, quantitative trading is definitely not a get-rich-quick scheme. You should hope to have steadily increasing profits, but most likely it won’t be 200 percent a year, unlike starting a dot-com or a software firm. In fact, as I will explain in Chapter 6 on money and risk management, it is dangerous to overleverage in pursuit of overnight riches.
与大多数小型企业(某些互联网公司除外)相比,量化交易具有很强的可扩展性(在一定程度上)。只要你的策略持续盈利,你很容易就在自己家中交易数百万美元。这是因为扩展规模通常只需在程序中更改一个数字。这个数字被称为杠杆。你不需要与银行家或风险投资家谈判来为你的业务借入更多资金。券商随时准备并愿意提供这项服务。如果你是专有交易公司的成员(关于设立公司,第四章会详细介绍),你甚至可能获得远超美国证券交易委员会(SEC)T 条例允许的杠杆。专有交易公司允许你在日内交易价值 200 万美元的投资组合,而你的账户中仅有 50,000 美元的权益(即 40 倍杠杆),这并非罕见。如果你交易期货、期权或货币,通常可以从普通券商那里获得超过 10 倍的杠杆,免去了加入专有交易公司的麻烦。(例如,在撰写本文时,交易一份 E-mini 标准普尔 500 期货合约只需大约 12,000 美元的保证金现金,而该合约的名义市场价值约为 167,500 美元。)同时,量化交易绝对不是一夜暴富的计划。你应该期望利润稳步增长,但很可能不会达到每年 200%的水平,这与创办一家互联网公司或软件公司不同。事实上,正如我将在第 6 章关于资金和风险管理中解释的那样,追求一夜暴富而过度使用杠杆是非常危险的。
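The leverage figures quoted above follow from a single division. The short Python sketch below reproduces them, assuming the simple definition of leverage as market value controlled divided by account equity (or margin posted); the function name is an illustrative choice, and the dollar amounts are the ones given in the text.

```python
def leverage(position_value: float, equity: float) -> float:
    """Leverage as the ratio of market value controlled to equity (or margin) posted."""
    if equity <= 0:
        raise ValueError("equity must be positive")
    return position_value / equity


# Proprietary trading firm example: a $2 million intraday portfolio on $50,000 equity.
prop_firm_leverage = leverage(2_000_000, 50_000)

# Futures example: one E-mini S&P 500 contract with about $167,500 notional value
# traded on about $12,000 of margin cash.
emini_leverage = leverage(167_500, 12_000)

print(f"Prop firm intraday leverage: {prop_firm_leverage:.0f}x")
print(f"E-mini futures leverage: {emini_leverage:.1f}x")
```

The futures example works out to roughly 14 times, consistent with the statement that a regular brokerage often allows leverage exceeding 10 times.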
Demand on Time 时间需求
Running most small businesses takes a lot of your time, at least initially. Quantitative trading takes relatively little of your time. By its very nature, quantitative trading is a highly automated business. 经营大多数小型企业需要花费大量时间,至少在初期是如此。量化交易则相对花费较少时间。量化交易本质上是一个高度自动化的业务。
Sometimes, the more you manually interfere with the system and override its decisions, the worse it will perform. (Again, more on this in Chapter 6.) 有时,你越是手动干预系统并覆盖其决策,系统的表现反而越差。(关于这一点,我将在第 6 章中详细说明。)
How much time you need to spend on day-to-day quantitative trading depends very much on the degree of automation you have achieved. For example, at a well-known hedge fund I used to work for, some colleagues come into the office only once a month. The rest of the time, they just sit at home and occasionally remotely monitor their office computer servers, which are trading for them. 你在日常量化交易中需要花费多少时间,很大程度上取决于你实现了多大程度的自动化。例如,在我曾经工作的一个知名对冲基金里,有些同事一个月只来办公室一次。其余时间,他们就在家里,偶尔远程监控办公室的计算机服务器,这些服务器替他们进行交易。
When I started my independent quantitative trading career, I was in the middle of the pack in terms of automation. The largest block of time I needed to spend was in the morning before the market opened: I typically needed to run various programs to download and process the latest historical data, read company news that came up on my alert screen, run programs to generate the orders for the day, and then launch a few baskets of orders before the market opened and start a program that will launch orders automatically throughout the day. I would also update my spreadsheet to record the previous day’s profit and loss (P&L) of the different strategies I ran based on the brokerages’ statements. All of these took about two hours. 当我开始独立的量化交易生涯时,我的自动化程度处于中等水平。我需要花费最多时间的是在市场开盘前的早晨:我通常需要运行各种程序来下载和处理最新的历史数据,阅读警报屏幕上出现的公司新闻,运行程序生成当天的订单,然后在市场开盘前启动几篮子订单,并启动一个程序,全天自动发出订单。我还会更新电子表格,记录基于券商对账单的不同策略前一天的盈亏(P&L)。所有这些大约花费两个小时。
After that, I spent another half hour near the market close to direct the programs to exit various positions, manually check that those exit orders were correctly transmitted, and close down various automated programs properly. 之后,我又花了半个小时左右,在市场收盘前指挥程序退出各种仓位,手动检查这些退出订单是否正确传达,并妥善关闭各种自动化程序。
In between market open and close, everything is supposed to be on autopilot. Alas, the spirit is willing but the flesh is weak: I often cannot resist the urge to take a look (sometimes many looks) at the intraday P&L of the various strategies on my trading screens. In extreme situations, I might even be transfixed by the huge swings in P&L and be tempted to intervene by manually exiting positions. Fortunately, I have learned to better resist the temptation as time goes on. 在市场开盘和收盘之间,一切都应该处于自动驾驶状态。可惜,精神是愿意的,但身体却很脆弱:我经常忍不住想看一眼(有时是多次)交易屏幕上各种策略的盘中盈亏。在极端情况下,我甚至可能被盈亏的大幅波动吸引,诱使我手动干预退出仓位。幸运的是,随着时间的推移,我学会了更好地抵制这种诱惑。
The urge to intervene manually is also strong when I have too much time on my hands. Hence, instead of just staring at your trading screen, it is actually important to engage yourself in some other, more healthful and enjoyable activities, such as going to the gym during the trading day. 当我有太多空闲时间时,手动干预的冲动也很强烈。因此,与其只是盯着交易屏幕,不如让自己参与一些其他更健康、更愉快的活动,比如在交易日去健身房。
As my automation improved and the assets under management grew (both my own and my investors’), fewer and fewer of these steps were taken manually, until they reached zero some years ago. Aside from monitoring and intervening when software and connectivity broke down (and they occasionally did), my colleagues and I routinely do absolutely nothing every day in terms of actually trading our strategies. We are in a fully autonomous vehicle, so to speak, except our eyes are still on the road and ready to apply the brakes when the system breaks down. Just like an autonomous vehicle, our automated trading system will send out all sorts of alarms to whoever is on duty when that happens. 随着我的自动化水平提高以及管理资产规模的增长(包括我自己的和投资者的),这些步骤中手动操作的越来越少,直到几年前完全为零。除了在软件和连接出现故障时进行监控和干预(偶尔会发生),我和我的同事们每天在实际交易策略方面几乎什么都不做。可以说,我们处于一辆完全自动驾驶的车辆中,只不过我们的眼睛仍然盯着路面,准备在系统出现故障时踩刹车。就像自动驾驶车辆一样,当系统出现问题时,我们的自动交易系统会向当班人员发出各种警报。
(Lest you think my trader life is too idyllic to be true, the breakdown did happen when I was on a Caribbean beach during the Covid-19 selloff in February 2020. Fortunately, after frantically texting my office colleagues, we ended the day with a nice profit. What happened back at the cruise ship as it sailed back to Florida was a bit more troubling.) (如果你认为我的交易员生活过于理想化而不真实,系统故障确实发生在 2020 年 2 月新冠疫情抛售期间,当时我正身处加勒比海滩。幸运的是,在疯狂地给办公室同事发短信后,我们当天还是取得了不错的利润。至于那艘驶回佛罗里达的游轮上发生的事情,则要麻烦得多。)
When I said quantitative trading takes little of your time, I was referring to the operational side of the business. If you want to grow your business, or keep your current profits from declining due to increasing competition, you will need to spend time doing research and backtesting on new strategies. But research and development of new strategies is the creative part of any business, and it can be done whenever you want to. So, between the market’s open and close, I do my research; answer emails; chat with other traders, collaborators, or clients; take a hike; and so on. I do some of that work in the evening and on weekends, too, but only when I feel like it, not because I am obligated to. 当我说量化交易占用你很少时间时,我指的是业务的操作层面。如果你想发展你的业务,或者想防止因竞争加剧而导致当前利润下降,你就需要花时间研究和回测新策略。但新策略的研究和开发是任何业务中最具创造性的部分,而且你可以随时进行。因此,在市场开盘和收盘之间,我会做研究;回复邮件;与其他交易员、合作者或客户聊天;去远足;等等。我也会在晚上和周末做一些这类工作,但只有当我想做的时候才做,而不是因为有义务。
The Nonnecessity of Marketing 营销的非必要性
Here is the biggest and most obvious difference between quantitative trading and other small businesses. Marketing is crucial to most small businesses: after all, you generate your revenue from other people, who base their purchase decisions on things other than price alone. In trading, your counterparties in the financial marketplace base their purchase decisions on nothing but the price. Unless you are managing money for other people (which is beyond the scope of this book), there is absolutely no marketing to do in a quantitative trading business. This may seem obvious and trivial, but it is actually an important difference, since the business of quantitative trading allows you to focus exclusively on your product (the strategy and the software), and not on anything that has to do with influencing other people’s perception of yourself. To many people, this may be the ultimate beauty of starting your own quantitative trading business. Of course, if you plan to manage other people’s money, marketing will be more important. But even then, I have learned that a good investment product (a.k.a. a consistently profitable strategy) practically markets itself. Conversely, even if a superstar salesperson managed to market a mediocre product to an unwitting customer, retention of that customer is going to be an uphill battle. 这是量化交易与其他小型企业之间最大且最明显的区别。营销对大多数小型企业至关重要:毕竟,你的收入来自其他人,而他们的购买决策不仅仅基于价格。在交易中,金融市场中的交易对手的购买决策仅仅基于价格。除非你为别人管理资金(这超出了本书的范围),否则在量化交易业务中根本不需要做任何营销。这看起来似乎显而易见且微不足道,但实际上这是一个重要的区别,因为量化交易业务允许你专注于你的产品(策略和软件),而不必去影响别人对你的看法。对许多人来说,这可能是创办自己的量化交易业务的最大魅力。当然,如果你计划管理别人的资金,营销就会变得更重要。但即便如此,我也发现一个好的投资产品(即持续盈利的策略)几乎可以自我营销。相反,即使一个超级销售员成功地将一个平庸的产品推销给了一个毫无戒心的客户,留住这个客户也将是一场艰难的战斗。
THE WAY FORWARD 前进的道路
If you are convinced that you want to become a quantitative trader, a number of questions immediately follow: How do you find the right strategy to trade? How do you recognize a good versus a bad strategy even before devoting any time to backtesting them? How do you rigorously backtest them? If the backtest performance is good, what steps do you need to take to implement the strategy, in terms of both the business structure and the technological infrastructure? If the strategy is profitable in initial real-life trading, how does one scale up the capital to make it into a growing income stream while managing the inevitable (but, hopefully, only occasional) losses that come with trading? These nuts and bolts of quantitative trading will be tackled in Chapters 2 through 6. 如果你确信自己想成为一名量化交易员,紧接着会有一系列问题:如何找到合适的交易策略?如何在投入时间进行回测之前就识别出好策略和坏策略?如何严谨地进行回测?如果回测表现良好,实施策略时在业务结构和技术基础设施方面需要采取哪些步骤?如果策略在初期的实盘交易中盈利,如何扩大资金规模,将其转变为不断增长的收入流,同时管理交易中不可避免(但希望只是偶尔发生)的亏损?这些量化交易的基本要点将在第 2 至第 6 章中讨论。
Though the list of processes to go through in order to get to the final destination of sustained and growing profitability may seem long and daunting, in reality it may be faster and easier than many other businesses. When I first started as an independent trader, it took me only three months to find and backtest my first new strategy, set up a new brokerage account with $100,000 capital, implement the execution system, and start trading the strategy. The strategy immediately became profitable in the first month. Back in the dot-com era, I started an internet software firm. It took about 3 times more investment, 5 times more human power, and 24 times longer to find out that the business model didn’t work, whereupon all investors including myself lost 100 percent of their investments. Compared to that experience, it really has been a breeze trading quantitatively and profitably. 虽然为了达到持续且增长的盈利这一最终目标,需要经历的流程看起来既长又令人生畏,但实际上这可能比许多其他业务更快、更容易。当我刚开始作为独立交易员时,我只用了三个月时间就找到并回测了我的第一个新策略,开设了一个新的经纪账户并投入 100,000 美元资金,实施了执行系统,并开始交易该策略。该策略在第一个月就立即实现了盈利。回到互联网泡沫时代,我创办了一家互联网软件公司。那时大约投入了三倍的资金,五倍的人力,花费了二十四倍的时间才发现商业模式行不通,随后包括我自己在内的所有投资者都损失了 100%的投资。相比之下,量化交易并盈利真的是轻而易举。
CHAPTER 2 第二章
Fishing for Ideas 寻找灵感
Where Can We Find Good Strategies? 我们在哪里能找到好的策略?
This is the surprise: Finding a trading idea is actually not the hardest part of building a quantitative trading business. There are hundreds, if not thousands, of trading ideas that are in the public sphere at any time, accessible to anyone at little or no cost. Many authors of these trading ideas will tell you their complete methodologies in addition to their backtest results. There are finance and investment books, newspapers and magazines, mainstream media websites, academic papers available online or in the nearest public library, trader forums, blogs, and on and on. Some of the ones I find valuable are listed in Table 2.1, but this is just a small fraction of what is available out there. 这才是令人惊讶的地方:找到一个交易思路实际上并不是建立量化交易业务中最难的部分。任何时候,公开领域中都有成百上千的交易思路,任何人都可以以极低甚至零成本获取。许多交易思路的作者不仅会告诉你他们的回测结果,还会完整地分享他们的方法论。有金融和投资书籍、报纸和杂志、主流媒体网站、在线或最近公共图书馆的学术论文、交易者论坛、博客,等等。我认为有价值的一些资源列在表 2.1 中,但这只是众多资源中的一小部分。
In the past, because of my own academic bent, I regularly perused the various preprints published by business school professors or downloaded the latest online finance journal articles to scan for good prospective strategies. In fact, the first strategy I traded when I became independent was based on such academic research. (It was a version of the PEAD strategy referenced in Chapter 7.) Increasingly, however, I have found that many strategies described by academics are either too complicated, out of date (perhaps the once-profitable strategies have already lost their power due to competition), or require expensive data to backtest (such as historical fundamental data). Furthermore, many of these academic strategies work only on small-cap stocks, whose illiquidity may render actual trading profits far less impressive than their backtests would suggest. 过去,由于我自身的学术倾向,我经常浏览商学院教授发布的各种预印本,或下载最新的在线金融期刊文章,以寻找有潜力的策略。事实上,我独立交易的第一个策略就是基于这样的学术研究。(它是第 7 章中提到的 PEAD 策略的一个版本。)然而,随着时间推移,我越来越发现许多学者描述的策略要么过于复杂,要么已经过时(可能曾经盈利的策略由于竞争已经失去效力),要么需要昂贵的数据来进行回测(例如历史基本面数据)。此外,许多这些学术策略只适用于小盘股,这些股票的流动性不足可能导致实际交易利润远不如回测结果所显示的那么可观。
TABLE 2.1 Sources of Trading Ideas 表 2.1 交易想法的来源
| Type | URL |
| :--- | :--- |
| Academic | |
| Business schools' finance professors' websites | www.hbs.edu/research/research.html |
| Social Science Research Network | www.ssrn.com |
| National Bureau of Economic Research | www.nber.org |
| Business schools' quantitative finance seminars | www.ieor.columbia.edu/seminars/financialengineering |
| Quantpedia (aggregator of all academic papers on quantitative trading strategies!) | quantpedia.com |
| Financial blogs and podcasts | |
| Flirting with Models | www.thinknewfound.com |
| Mutiny Fund | mutinyfund.com/podcast/ |
| Chat with Traders | chatwithtraders.com |
| Eran Raviv | eranraviv.com |
| Sibyl/Godot Finance | godotfinance.com |
| Party at the Moontower | moontowermeta.com |
| My own! | epchan.blogspot.com |
| Trader forums | |
| Elite Trader | www.Elitetrader.com |
| Wealth-Lab | www.wealth-lab.com |
| Twitter | |
| Benn Eifert | @bennpeifert |
| Corey Hoffstein | @choffstein |
| Quantocracy (retweet of new articles) | @Quantocracy |
| Mike Harris | @mikeharrisNY |
| Euan Sinclair | @sinclaireuan |
| My own! | @chanep |
| Newspapers and magazines | |
| Stocks, Futures and Options magazine | www.sfomag.com |
This is not to say that you will not find some gems if you are persistent enough, but I have found that many traders’ forums or blogs may suggest simpler strategies that are equally profitable. You might be skeptical that people would actually post truly profitable strategies in the public space for all to see. After all, doesn’t this disclosure increase the competition and decrease the profitability of the strategy? And you would be right: Most ready-made strategies that you may find in these places actually do not withstand careful backtesting. Just like the academic studies, the strategies from traders’ forums may have worked only for a little while, or they work for only a certain class of stocks, or they work only if you don’t factor in transaction costs. However, the trick is that you can often modify the basic strategy and make it profitable. (Many of these caveats as well as a few common variations on a basic strategy will be examined in detail in Chapter 3.) 这并不是说如果你足够坚持,就找不到一些宝石,但我发现许多交易者论坛或博客可能会建议一些同样有利可图的更简单策略。你可能会怀疑人们是否真的会在公共空间发布真正有利可图的策略供所有人查看。毕竟,这种披露难道不会增加竞争并降低策略的盈利能力吗?你的怀疑是有道理的:你在这些地方找到的大多数现成策略实际上经不起仔细的回测。就像学术研究一样,交易者论坛上的策略可能只在短时间内有效,或者只适用于某一类股票,或者只有在不考虑交易成本的情况下才有效。然而,诀窍是你通常可以修改基本策略使其变得有利可图。(许多这些警告以及一些基本策略的常见变体将在第 3 章中详细探讨。)
For example, someone once suggested a strategy to me that was described in Wealth-Lab (see Table 2.1), where it was claimed that it had a high Sharpe ratio. When I backtested the strategy, it turned out not to work as well as advertised. I then tried a few simple modifications, such as decreasing the holding period and entering and exiting at different times than suggested, and was able to turn this strategy into one of my main profit centers. If you are diligent and creative enough to try the multiple variations of a basic strategy, chances are you will find one of those variations that is highly profitable. 例如,有人曾向我推荐过一个在 Wealth-Lab 中描述的策略(见表 2.1),声称该策略具有很高的夏普比率。当我对该策略进行回测时,结果并不像宣传的那样有效。随后我尝试了一些简单的修改,比如缩短持有期,以及在不同时间点进出场,最终将该策略转变为我的主要盈利来源之一。如果你足够勤奋且富有创造力,尝试基本策略的多种变体,很可能会找到其中一个非常盈利的变体。
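As a rough illustration of why an advertised Sharpe ratio may not survive transaction costs, the Python sketch below computes an annualized Sharpe ratio before and after costs. Everything in it is an assumption made for illustration: the ten daily returns are made-up numbers rather than results from any real strategy, the returns are treated as already being in excess of the risk-free rate, a full round trip is assumed every day, and the cost of 10 basis points per round trip is hypothetical.

```python
import math
import statistics


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio: sqrt(periods) * mean / sample standard deviation.

    Assumes the inputs are already excess returns (risk-free rate subtracted).
    """
    mean = statistics.mean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    if stdev == 0:
        raise ValueError("returns have zero volatility")
    return math.sqrt(periods_per_year) * mean / stdev


def net_of_costs(daily_returns, cost_per_day):
    """Subtract a fixed cost from each day's return (assumes one round trip per day)."""
    return [r - cost_per_day for r in daily_returns]


# Made-up daily returns of a hypothetical strategy, for illustration only.
gross_returns = [0.004, -0.002, 0.003, -0.001, 0.002,
                 -0.003, 0.005, -0.002, 0.001, 0.003]
round_trip_cost = 0.001  # hypothetical 10 basis points per round trip

net_returns = net_of_costs(gross_returns, round_trip_cost)

print(f"Sharpe before costs: {annualized_sharpe(gross_returns):.2f}")
print(f"Sharpe after costs:  {annualized_sharpe(net_returns):.2f}")
```

In this made-up sample the average daily return equals the assumed cost, so the edge disappears entirely once costs are included. Shortening the holding period, as in the modification described above, changes both the return stream and how often costs are paid, so each variation has to be backtested with costs included.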
When I left the institutional money management industry to trade on my own, I worried that I would be cut off from the flow of trading ideas from my colleagues and mentors. But then I found out that one of the best ways to gather and share trading ideas is to start your own trading blog: for every trading “secret” that you divulge to the world, you will be rewarded with multiple ones from your readers. (The person who suggested the Wealth-Lab strategy to me was a reader who works 12 time zones away. If it weren’t for my blog, there was little chance that I would have met him and benefited from his suggestion.) In fact, what you thought of as secrets are more often than not well-known ideas to many others! What truly makes a strategy proprietary and its secrets worth protecting are the tricks and variations that you have come up with, not the plain-vanilla version. 当我离开机构资金管理行业开始自己交易时,我担心会与同事和导师之间的交易思路断了联系。但后来我发现,收集和分享交易思路的最佳方式之一就是开设自己的交易博客:你向世界透露的每一个交易“秘密”,都会从读者那里获得多个回馈。(向我推荐 Wealth-Lab 策略的人是一位相隔 12 个时区的读者。如果不是因为我的博客,我几乎不可能遇到他并从他的建议中受益。)事实上,你认为的秘密往往是许多人都熟知的想法!真正使策略具有专有性和值得保护的秘密,是你自己想出的技巧和变体,而不是普通的基础版本。
Furthermore, your bad ideas will quickly get shot down by your online commentators, thus potentially saving you from major losses. After I glowingly described a seasonal stock-trading strategy on my blog that was developed by some finance professors, a reader promptly went ahead and backtested that strategy and reported that it didn’t work. (See my blog entry, “Seasonal Trades in Stocks,” at epchan.blogspot.com/2007/11/seasonal-trades-in-stocks.html and the reader’s comment therein. This strategy is described in more detail in Example 7.6.) Of course, I would not have traded this strategy without backtesting it on my own anyway, and indeed, my subsequent backtest confirmed his findings. But the fact that my reader found significant flaws with the strategy is important confirmation that my own backtest is not erroneous. 此外,你的糟糕想法会很快被网络评论者驳回,从而可能帮你避免重大损失。在我在博客上热情描述了一种由一些金融教授开发的季节性股票交易策略后,一位读者立即进行了该策略的回测,并报告说该策略不起作用。(参见我的博客文章“股票的季节性交易”,网址为 epchan.blogspot.com/2007/11/seasonal-trades-in-stocks.html,以及该文中的读者评论。该策略在示例 7.6 中有更详细的描述。)当然,我自己也不会在没有回测的情况下交易该策略,事实上,我后续的回测也证实了他的发现。但这位读者发现该策略存在重大缺陷这一事实,是对我自己回测结果非错误的重要确认。
All in all, I have found that it is actually easier to gather and exchange trading ideas as an independent trader than when I was working in the secretive hedge fund world in New York. When I worked at Millennium Partners, a 40-billion-dollar hedge fund on Fifth Avenue, one trader ripped a published paper out of the hands of his programmer, who happened to have picked it up from the trader’s desk. He was afraid the programmer might learn his “secrets.” (Lest you think that Millennium Partners is a bad place to work, I should add that its founder, Izzy Englander, personally spoke with my next employer to vouch for me.) That may be because people are less wary of letting you know their secrets when they think you won’t be obliterating their profits by allocating $100 million to that strategy. 总的来说,我发现作为一个独立交易者,收集和交流交易想法实际上比我在纽约那个神秘的对冲基金世界工作时要容易得多。当我在 Millennium Partners(第五大道上一家管理着 400 亿美元资金的对冲基金)工作时,有一次一位交易员从他的程序员手中抢过一篇发表的论文,这个程序员碰巧是从交易员的桌子上拿到的。他害怕程序员会学到他的“秘密”。(如果你认为 Millennium Partners 是个糟糕的工作地方,我还得补充一句,它的创始人 Izzy Englander 亲自和我的下一任雇主交谈,为我作了担保。)这可能是因为当人们认为你不会通过分配 1 亿美元资金到那个策略来摧毁他们的利润时,他们对告诉你他们的秘密会少一些戒备。
No, the difficulty is not the lack of ideas. The difficulty is to develop a taste for which strategy is suitable for your personal circumstances and goals, and which ones look viable even before you devote the time to diligently backtest them. This taste for prospective strategies is what I will try to convey in this chapter.
HOW TO IDENTIFY A STRATEGY THAT SUITS YOU
Whether a strategy is viable often does not have anything to do with the strategy itself; it has to do with YOU. Here are some considerations.
Your Working Hours
Do you trade only part time? If so, you would probably want to consider only strategies that hold overnight and not the intraday strategies. Otherwise, you may have to fully automate your strategies (see Chapter 5 on execution) so that they can run on autopilot most of the time and alert you only when problems occur.
When I was working full time for others and trading part time for myself, I traded a simple strategy in my personal account that required entering or adjusting limit orders on a few exchange-traded funds (ETFs) once a day, before the market opened. Then, when I first became independent, my level of automation was still relatively low, so I considered only strategies that require entering orders once before the market opens and once before the close. Later on, I added a program that can automatically scan real-time market data and transmit orders to my brokerage account throughout the trading day when certain conditions are met. So trading can be a “part-time” pursuit for you, even if you derive more income from it than your day job, as long as you trade quantitatively.
Your Programming Skills
Are you good at programming? If you know some programming languages such as Visual Basic or even Java, C#, or C++, you can explore high-frequency strategies, and you can also trade a large number of securities. Otherwise, settle for strategies that trade only once a day, or trade just a few stocks, futures, or currencies. These can often be traded using Excel loaded with your broker’s macros. (This constraint may be overcome if you don’t mind the expense of hiring a software contractor. Again, see Chapter 5 for more details.)
Your Trading Capital
Do you have a lot of capital for trading as well as expenditure on infrastructure and operation? In general, I would not recommend quantitative trading for an account with less than $50,000 capital. Let’s say the dividing line between a high- versus low-capital account is $100,000. Capital availability affects many choices; the first is what financial instruments you should trade and what strategies you should apply to them. The second is whether you should open a retail brokerage account or a proprietary trading account (more on this in Chapter 4 on setting up your business). For now, I will consider instrument and strategy choices with capital constraint in mind.
With a low-capital account, we need to find strategies that can utilize the maximum leverage available. (Of course, getting a higher leverage is beneficial only if you have a consistently profitable strategy.) Trading futures, currencies, and options can offer you higher leverage than stocks; for stocks, intraday positions allow a Regulation T leverage of 4, while interday (overnight) positions allow only a leverage of 2, requiring double the amount of capital for a portfolio of the same size. Finally, capital (or leverage) availability determines whether you should focus on directional trades (long or short only) or dollar-neutral trades (hedged or pair trades). A dollar-neutral portfolio (meaning the market value of the long positions equals the market value of the short positions) or market-neutral portfolio (meaning the beta of the portfolio with respect to a market index is close to zero, where beta measures the ratio between the expected returns of the portfolio and the expected returns of the market index) requires twice the capital or leverage of a long- or short-only portfolio. So even though a hedged position is less risky than an unhedged position, the returns generated are correspondingly smaller and may not meet your personal requirements.
Certain brokers (such as Interactive Brokers) offer portfolio margin, which depends on the estimated risk of your portfolio. For example, if your portfolio holds only long positions of risky small-cap stocks, they may require a minimum 50 percent overnight margin (equivalent to a maximum leverage of 2). But if your portfolio holds a dollar-neutral portfolio of large-cap stocks, they may require only 20 percent or less of overnight margin. To sign up for portfolio margin, your broker may require that your account’s net asset value (NAV) meets a minimum, often about $100K. But for that $100K of cash, you may be able to hold a portfolio that consists of $250K of long stock positions and $250K of short stock positions.
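The leverage arithmetic above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the margin rates are the round numbers quoted in the text rather than any broker's actual schedule, the account size is hypothetical, and the function names are mine.

```python
def max_gross_exposure(cash: float, margin_rate: float) -> float:
    """Largest combined long + short market value that `cash` can support
    when the broker asks for `margin_rate` (a fraction) of that value."""
    if not 0 < margin_rate <= 1:
        raise ValueError("margin_rate must be in (0, 1]")
    return cash / margin_rate


def dollar_neutral_legs(cash: float, margin_rate: float) -> tuple[float, float]:
    """Split the maximum gross exposure equally into a long and a short leg."""
    gross = max_gross_exposure(cash, margin_rate)
    return gross / 2, gross / 2


cash = 100_000  # hypothetical account size in dollars

# Margin rates are the round numbers quoted in the text, not broker quotes.
for label, rate in [("Reg T overnight", 0.50),
                    ("Reg T intraday", 0.25),
                    ("portfolio margin, hedged large caps", 0.20)]:
    long_leg, short_leg = dollar_neutral_legs(cash, rate)
    print(f"{label}: leverage {1 / rate:.0f}, "
          f"long ${long_leg:,.0f}, short ${short_leg:,.0f}")
```

At a 20 percent margin rate the same $100K supports the $250K long and $250K short portfolio mentioned above, whereas at the 50 percent overnight Regulation T rate it supports only $100K on each side.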
Capital availability also imposes a number of indirect constraints. It affects how much you can spend on various infrastructure, data, and software. For example, if you have low trading capital, your online brokerage will not be likely to supply you with real-time market data for too many stocks, so you can’t really have a strategy that requires real-time market data over a large universe of stocks. (You can, of course, subscribe to a third-party market data vendor, but then the extra cost may not be justifiable if your trading capital is low.) Similarly, clean historical stock data with high frequency costs more than historical daily stock data, so a high-frequency stock-trading strategy may not be feasible with small capital expenditure. For historical stock data, there is another quality that may be even more important than their frequencies: whether the data are free of survivorship bias. I will define survivorship bias in the following section. Here, we just need to know that historical stock data without survivorship bias are much more expensive than those that have such a bias. Yet if your data have survivorship bias, the backtest result can be unreliable.
The same consideration applies to news: whether you can afford a high-coverage, real-time news source such as Bloomberg determines whether a news-driven strategy is a viable one. The same goes for fundamental (i.e., companies’ financial) data: whether you can afford a good historical database with fundamental data on companies determines whether you can build a strategy that relies on such data.
Table 2.2 lists how capital constraints (whether on trading capital or on expenditure) can influence your many choices.
This table is, of course, not a set of hard-and-fast rules, just some issues to consider. For example, if you have low capital but opened an account at a proprietary trading firm, then you will be free of many of the considerations above (though not expenditure on infrastructure). I started my life as an independent quantitative trader with $100,000 at a retail brokerage account (I chose Interactive Brokers), and I traded only directional, intraday stock strategies at first. But when I developed a strategy that sometimes requires much more leverage in order to be profitable, I signed up as a member of a proprietary trading firm as well. (Yes, you can have both, or more, accounts simultaneously. In fact, there are good reasons to do so if only for the sake of comparing their execution speeds and access to liquidity. See “Choosing a Brokerage or Proprietary Trading Firm” in Chapter 4.)
TABLE 2.2 How Capital Availability Affects Your Many Choices

| Low Capital | High Capital |
| :--- | :--- |
| Proprietary trading firm's membership | Retail brokerage account |
| Futures, currencies, options | Everything, including stocks |
| Intraday | Both intra- and interday (overnight) |
| Directional | Directional or market neutral |
| Small stock universe for intraday trading | Large stock universe for intraday trading |
| Daily historical data with survivorship bias | High-frequency historical data, survivorship bias-free |
| Low-coverage or delayed news source | High-coverage, real-time news source |
| No historical news database | Survivorship bias-free historical news database |
| No historical fundamental data on stocks | Survivorship bias-free historical fundamental data on stocks |

Despite my frequent admonitions here and elsewhere to beware of historical data with survivorship bias, when I first started I downloaded only the split-and-dividend-adjusted Yahoo! Finance data using a now defunct program. But now you have easy and free access to that via many third-party APIs (more on the different databases and tools in Chapter 3). This database is not survivorship bias-free, but I was still using it for most of my backtesting for more than two years! In fact, a trader I know, who traded a million-dollar account, typically used such biased data for his backtesting, and yet his strategies were still profitable. How could this be possible? Probably because these were intraday strategies. So, you see, as long as you are aware of the limitations of your tools and data, you can cut many corners and still succeed. (There are now affordable survivorship-bias-free stock databases such as Sharadar, so I recommend you pay a small fee to use them.)
Though futures afford you high leverage, some futures contracts have such a large size that it would still be impossible for a small account to trade. For instance, though the E-mini S&P 500 future (ES) on the Chicago Mercantile Exchange has a margin requirement of only $12,000, it has a market value of about $167,500, and a 10 percent or larger daily move will wipe out an account that holds only the minimum margin cash. In case you think that a 10 percent or larger move for the S&P 500 index is extremely rare, check out how many times it happened from February to April 2020. Instead, you can trade the micro E-mini contracts (MES), which have only one-tenth of the margin requirement and market value of the regular E-mini.
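To make the margin arithmetic concrete, here is a small sketch using the ES figures quoted above. The $50,000 account size and the function names are hypothetical, chosen only for illustration; actual margin requirements and contract values change over time.

```python
def wipeout_move(margin: float, notional: float) -> float:
    """Adverse fractional price move that loses the entire posted margin."""
    return margin / notional


def loss_fraction_of_account(account: float, notional: float, move: float) -> float:
    """Fraction of the account lost when one contract moves against you by `move`."""
    return notional * move / account


ES_MARGIN, ES_NOTIONAL = 12_000, 167_500        # figures quoted in the text
MES_MARGIN, MES_NOTIONAL = ES_MARGIN / 10, ES_NOTIONAL / 10

print(f"ES leverage at minimum margin: {ES_NOTIONAL / ES_MARGIN:.1f}")
print(f"Move that wipes out minimum margin: {wipeout_move(ES_MARGIN, ES_NOTIONAL):.1%}")

account = 50_000  # hypothetical small account
for name, notional in [("ES", ES_NOTIONAL), ("MES", MES_NOTIONAL)]:
    loss = loss_fraction_of_account(account, notional, 0.10)
    print(f"One {name} contract, 10% adverse move: lose {loss:.1%} of account")
```

With only the minimum margin posted, an adverse move of roughly 7 percent is already enough to exhaust it, so a 10 percent move more than wipes it out. The micro contract has the same leverage per contract, but its smaller size lets a small account hold one contract while keeping most of its capital as a cushion.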
Your Goal
Most people who choose to become traders want to earn a steady (hopefully increasing) monthly, or at least quarterly, income. But you may be independently wealthy, and long-term capital gain is all that matters to you. The strategies to pursue for short-term income versus long-term capital gain are distinguished mainly by their holding periods. Obviously, if you hold a stock for an average of one year, you won’t be generating much monthly income (unless you started trading a while ago and have launched a new subportfolio every month, which you proceed to hold for a year; that is, you stagger your portfolios). More subtly, even if your strategy holds a stock only for a month on average, your month-to-month profit fluctuation is likely to be fairly large (unless you hold hundreds of different stocks in your portfolio, which can be a result of staggering your portfolios), and therefore you cannot count on generating income on a monthly basis. This relationship between holding period (or, conversely, the trading frequency) and consistency of returns (that is, the Sharpe ratio or, conversely, the drawdown) will be discussed further in the following section. The upshot here is that the more regularly you want to realize profits and generate income, the shorter your holding period should be.
There is a misconception aired by some investment advisers, though, that if your goal is to achieve maximum long-term capital growth, then the best strategy is a buy-and-hold one. This notion has been shown to be mathematically false. In reality, maximum long-term growth is achieved by finding a strategy with the maximum Sharpe ratio (defined in the next section), provided that you have access to sufficiently high leverage. Therefore, when comparing a short-term strategy with a very short holding period, small annual return, but very high Sharpe ratio to a long-term strategy with a long holding period, high annual return, but lower Sharpe ratio, it is still preferable to choose the short-term strategy even if your goal is long-term growth, barring tax considerations and the limitation on your margin borrowing (more on this surprising fact later in Chapter 6 on money and risk management).
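A rough numerical sketch may make this claim less surprising. It relies on a standard textbook approximation that is an assumption here, not something derived in this chapter: with Gaussian returns and continuous rebalancing, a strategy with mean excess return m and volatility s, held at constant leverage f, compounds at about r + f·m − (f·s)²/2, which is maximized at f = m/s² and then equals r + S²/2, where S is the Sharpe ratio. The two strategies below are hypothetical.

```python
def growth_rate(leverage: float, mean_excess: float, vol: float,
                risk_free: float = 0.0) -> float:
    """Approximate long-run compounded growth rate at constant leverage,
    assuming Gaussian returns and continuous rebalancing."""
    return risk_free + leverage * mean_excess - 0.5 * (leverage * vol) ** 2


def optimal_leverage(mean_excess: float, vol: float) -> float:
    """Leverage that maximizes growth_rate under the same assumptions."""
    return mean_excess / vol ** 2


def best_growth(mean_excess: float, vol: float, max_leverage: float,
                risk_free: float = 0.0) -> float:
    """Best achievable growth when leverage is capped at `max_leverage`."""
    f = min(optimal_leverage(mean_excess, vol), max_leverage)
    return growth_rate(f, mean_excess, vol, risk_free)


# Hypothetical strategies: (annual mean excess return, annual volatility)
short_term = (0.06, 0.03)   # Sharpe ratio 2.0, small unleveraged return
long_term = (0.15, 0.30)    # Sharpe ratio 0.5, larger unleveraged return

for cap in (1, 4):
    print(f"leverage cap {cap}: "
          f"short-term {best_growth(*short_term, cap):.1%}, "
          f"long-term {best_growth(*long_term, cap):.1%}")
```

Without leverage the long-term strategy grows faster, but once a leverage of 4 is available the high-Sharpe strategy pulls ahead, which is the sense in which the proviso about sufficiently high leverage matters.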
A TASTE FOR PLAUSIBLE STRATEGIES AND THEIR PITFALLS
Now, let’s suppose that you have read about several potential strategies that fit your personal requirements. Presumably, someone else has done backtests on these strategies and reported that they have great historical returns. Before proceeding to devote your time to performing a comprehensive backtest on any of these strategies (not to mention devoting your capital to actually trading it), there are a number of quick checks you can do to make sure you won’t be wasting your time or money.
How Does It Compare with a Benchmark, and How Consistent Are Its Returns?
This point seems obvious when the strategy in question is a stock-trading strategy that buys (but does not short) stocks. Everybody seems to know that if a long-only strategy returns 10 percent a year, it is not too fantastic because investing in an index fund will generate as good, if not better, returns on average. However, if the strategy is a long-short dollar-neutral strategy (i.e., the portfolio holds long and short positions with equal capital), then 10 percent is quite a good return, because then the benchmark of comparison is not the market index, but a riskless asset such as the yield of the three-month US Treasury bill (which at the time of this writing is just about zero percent).
Another issue to consider is the consistency of the returns generated by a strategy. Though a strategy may have the same average return as the benchmark, perhaps it delivered positive returns every month while the benchmark occasionally suffered some very bad months. In this case, we would still deem the strategy superior.
This leads us to consider the information ratio or Sharpe ratio (Sharpe, 1994), rather than returns, as the proper performance measurement of a quantitative trading strategy.
Information ratio is the measure to use when you want to assess a long-only strategy. It is defined as
$$\text{Information Ratio} = \frac{\text{Average of Excess Returns}}{\text{Standard Deviation of Excess Returns}}$$
where excess returns are the portfolio returns minus the benchmark returns.
Now the benchmark is usually the market index to which the securities you are trading belong. For example, if you trade only small-cap stocks, the market index should be the Standard & Poor’s small-cap index or the Russell 2000 index, rather than the S&P 500. If you are trading just gold futures, then the market index should be gold spot price, rather than a stock index.
The Sharpe ratio is actually a special case of the information ratio, suitable when we have a dollar-neutral strategy, so that the benchmark to use is always the risk-free rate. In practice, most traders use the Sharpe ratio even when they are trading a directional (long or short only) strategy, simply because it facilitates comparison across different strategies. Everyone agrees on what the risk-free rate is, but each trader can use a different market index to come up with their own favorite information ratio, rendering comparison difficult.
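As a minimal sketch of these two definitions, the following Python computes an information ratio against a benchmark series and a Sharpe ratio against a constant risk-free rate. The return numbers are made up, the use of the sample standard deviation is my choice, and the square-root-of-time annualization assumes independent, identically distributed returns.

```python
import math
import statistics


def information_ratio(portfolio_returns, benchmark_returns):
    """Mean of excess returns divided by their sample standard deviation."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    excess = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
    return statistics.mean(excess) / statistics.stdev(excess)


def sharpe_ratio(returns, risk_free_per_period=0.0):
    """Information ratio with a constant risk-free rate as the benchmark."""
    return information_ratio(returns, [risk_free_per_period] * len(returns))


def annualize(ratio, periods_per_year):
    """Scale a per-period ratio by sqrt(periods), assuming i.i.d. returns."""
    return ratio * math.sqrt(periods_per_year)


# Made-up monthly returns, for illustration only.
strategy = [0.021, -0.004, 0.013, 0.008, -0.011, 0.017]
index = [0.015, -0.020, 0.010, 0.012, -0.030, 0.009]

print(f"annualized information ratio: "
      f"{annualize(information_ratio(strategy, index), 12):.2f}")
print(f"annualized Sharpe ratio:      "
      f"{annualize(sharpe_ratio(strategy), 12):.2f}")
```

Writing the Sharpe ratio as a call to `information_ratio` with a constant benchmark mirrors the statement that one is a special case of the other.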
(Actually, there are some subtleties in calculating the Sharpe ratio related to whether and how to subtract the risk-free rate, how to annualize your Sharpe ratio for ease of comparison, and so on. I will cover these subtleties in the next chapter, which will also contain an example on how to compute the Sharpe ratio for a dollar-neutral and a long-only strategy.)
If the Sharpe ratio is such a nice performance measure across different strategies, you may wonder why it is not quoted more often instead of returns. In fact, when a colleague and I went to SAC Capital Advisors (assets under management then: $14 billion) to pitch a strategy, their then-head of risk management said to us: “Well, a high Sharpe ratio is certainly nice, but if you can get a higher return instead, we can all go buy bigger houses with our bonuses!” This reasoning is quite wrong: A higher Sharpe ratio will actually allow you to make more profits in the end, since it allows you to trade at a higher leverage. It is the leveraged return that matters in the end, not the nominal return of a trading strategy. For more on this, see Chapter 6 on money and risk management.
(And no, our pitching to SAC was not successful, but for reasons quite unrelated to the returns of the strategy. In any case, at that time neither my colleague nor I were familiar enough with the mathematical connection between the Sharpe ratio and leveraged returns to make a proper counterargument to that head of risk management. SAC pleaded guilty to insider trading charges and ceased to be a hedge fund in 2013.)
Now that you know what a Sharpe ratio is, you may want to find out what kind of Sharpe ratio your candidate strategies have. Often, they are not reported by the authors of that strategy, and you will have to email them in private for this detail. And often, they will oblige, especially if the authors are finance professors; but if they refuse, you have no choice but to perform the backtest yourself. Sometimes, however, you can still make an educated guess based on the flimsiest of information:
If a strategy trades only a few times a year, chances are its Sharpe ratio won’t be high. This does not prevent it from being part of your multistrategy trading business, but it does disqualify the strategy from being your main profit center.
If a strategy has deep (e.g., more than 10 percent) or lengthy (e.g., four or more months) drawdowns, it is unlikely that it will have a high Sharpe ratio. I will explain the concept of drawdown in the next section, but you can just visually inspect the equity curve (which is also the cumulative profit-and-loss curve, assuming no redemption or cash infusion) to see if it is very bumpy. Any peak-to-trough of that curve is a drawdown. (See Figure 2.1 for an example.)
As a rule of thumb, any strategy that has a Sharpe ratio of less than 1 is not suitable as a stand-alone strategy. For a strategy that achieves profitability almost every month, its (annualized) Sharpe ratio is typically greater than 2. For a strategy that is profitable almost every day, its Sharpe ratio is usually greater than 3. I will show you how to calculate Sharpe ratios for various strategies in Examples 3.4, 3.6, and 3.7 in the next chapter.
How Deep and Long Is the Drawdown?
A strategy suffers a drawdown whenever it has lost money recently. A drawdown at a given time $t$ is defined as the difference between the current equity value (assuming no redemption or cash infusion) of the portfolio and the global maximum of the equity curve occurring on or before time $t$. The maximum drawdown is the difference between the global maximum of the equity curve and the global minimum of the curve after the occurrence of the global maximum (time order matters here: The global minimum must occur later than the global maximum). The global maximum is called the high watermark. The maximum drawdown duration is the longest it has taken for the equity curve to recover losses.
More often, drawdowns are measured in percentage terms, with the denominator being the equity at the high watermark and the numerator being the loss of equity since reaching the high watermark.
Figure 2.1 illustrates a typical drawdown, the maximum drawdown, and the maximum drawdown duration of an equity curve. I will include a tutorial in Example 3.5 on how to compute these quantities from a table of daily profits and losses using either Excel, MATLAB, Python, or R. One thing to keep in mind: The maximum drawdown and the maximum drawdown duration do not typically overlap over the same period.
Defined mathematically, drawdown seems abstract and remote. However, in real life there is nothing more gut-wrenching and emotionally disturbing to suffer than a drawdown if you’re a trader. (This is as true for independent traders as for institutional ones. When an institutional trading group is suffering a drawdown, everybody seems to feel that life has lost meaning and spends their days dreading the eventual shutdown of the strategy or maybe even the group as a whole.) It is therefore something we would want to minimize. You have to ask yourself, realistically, how deep and how long a drawdown will you be able to tolerate and not liquidate your portfolio and shut down your strategy? Would it be 20 percent and three months, or 10 percent and one month? Comparing your tolerance with the numbers obtained from the backtest of a candidate strategy determines whether that strategy is for you.
Even if the author of the strategy you read about did not publish the precise numbers for drawdowns, you should still be able to make an estimate from a graph of its equity curve. For example, in Figure 2.1, you can see that the longest drawdown goes from around February 2001 to around October 2002. So the maximum drawdown duration is about 20 months. Also, at the beginning of the maximum drawdown, the equity was about $\$2.3 \times 10^{4}$, and at the end, about $\$0.5 \times 10^{4}$. So the maximum drawdown is about $\$1.8 \times 10^{4}$.
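The definitions in this section translate into a short loop over the equity curve. The following is only a sketch with a made-up equity series and a function name of my own; it measures drawdown in percentage terms against the running high watermark and counts duration in periods from a high watermark until the curve recovers (or until the data end, if it has not yet recovered).

```python
def drawdown_stats(equity):
    """Return (maximum drawdown as a fraction of the high watermark,
    maximum drawdown duration in periods) for an equity curve."""
    peak_value, peak_index = equity[0], 0
    max_drawdown = 0.0
    max_duration = 0
    underwater = False
    for i, value in enumerate(equity):
        if value >= peak_value:
            # New high watermark: if we were in a drawdown, it ends here.
            if underwater:
                max_duration = max(max_duration, i - peak_index)
            peak_value, peak_index, underwater = value, i, False
        else:
            underwater = True
            max_drawdown = max(max_drawdown, (peak_value - value) / peak_value)
            max_duration = max(max_duration, i - peak_index)
    return max_drawdown, max_duration


# Made-up equity curve, for illustration only.
equity = [100, 120, 90, 110, 130, 125, 128, 126, 127, 131]
depth, duration = drawdown_stats(equity)
print(f"maximum drawdown: {depth:.0%}, maximum duration: {duration} periods")
```

In this toy series the deepest drawdown (from 120 to 90) and the longest one (the slow recovery after 130) occur in different periods, which is the non-overlap noted above. Applying the same percentage definition to the rough figures read off Figure 2.1 gives a maximum drawdown of about 1.8/2.3, or roughly 78 percent of the high watermark.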
How Will Transaction Costs Affect the Strategy?
Every time a strategy buys and sells a security, it incurs a transaction cost. The more frequently it trades, the larger the impact of transaction costs will be on the profitability of the strategy. These transaction costs are not just due to commission fees charged by the broker. There will also be the cost of liquidity: when you buy and sell securities at their market prices, you are paying the bid-ask spread. If you buy and sell securities using limit orders, however, you avoid the liquidity costs but incur opportunity costs. This is because your limit orders may not be executed, and therefore you may miss out on the potential profits of your trade. Also, when you buy or sell a large chunk of securities, you will not be able to complete the transaction without impacting the prices at which this transaction is done. (Sometimes just displaying a bid to buy a large number of shares for a stock can move the prices higher without your having bought a single share yet!) This effect on the market prices due to your own order is called market impact, and it can contribute to a large part of the total transaction cost when the security is not very liquid.
Finally, there can be a delay between the time your program transmits an order to your brokerage and the time it is executed at the exchange, due to delays on the internet or various software-related issues. This delay can cause a slippage, the difference between the price that triggers the order and the execution price. Of course, this slippage can be of either sign, but on average it will be a cost rather than a gain to the trader. (If you find that it is a gain on average, you should change your program to deliberately delay the transmission of the order by a few seconds!)
Transaction costs vary widely for different kinds of securities. If your order size is not much bigger than the average sizes of the best bid and offer, you can typically estimate them by taking half the average bid-ask spread of a security and then adding the commission. If you are trading S&P 500 stocks, for example, the average transaction cost (excluding commissions, which depend on your brokerage) would be about 5 basis points (that is, five-hundredths of a percent). Note that I count a round-trip transaction of a buy and then a sell as two transactions; hence, a round trip will cost 10 basis points in this example. If you are trading ES, the E-mini S&P 500 futures, the transaction cost will be about 1 basis point. Sometimes the authors whose strategies you read about will disclose that they have included transaction costs in their backtest performance, but more often they will not. If they haven’t, then you just have to assume that the results are before transaction costs, and apply your own judgment to their validity.
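The rule of thumb above (half the bid-ask spread plus commission, counted once per transaction) can be written down in a few lines. This is only a sketch: the function names and the quote used in the example are hypothetical, and market impact and slippage are ignored.

```python
def one_way_cost_bps(bid, ask, commission_bps=0.0):
    """Half the quoted bid-ask spread, relative to the midprice,
    plus commission. Everything is expressed in basis points."""
    mid = (bid + ask) / 2.0
    half_spread_bps = (ask - bid) / 2.0 / mid * 1e4
    return half_spread_bps + commission_bps


def round_trip_cost_bps(bid, ask, commission_bps=0.0):
    """A buy followed by a sell counts as two transactions."""
    return 2.0 * one_way_cost_bps(bid, ask, commission_bps)


# Hypothetical quote: the spread is 0.10 on a midprice of 100.
bid, ask = 99.95, 100.05
print(f"one-way:    {one_way_cost_bps(bid, ask):.2f} bps")
print(f"round trip: {round_trip_cost_bps(bid, ask):.2f} bps")
```

With a 10-basis-point quoted spread, the one-way estimate is about 5 basis points and the round trip about 10, which matches the S&P 500 stock example in the text.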
As an example of the impact of transaction costs on a strategy, consider this simple mean-reverting strategy on ES. It is based on Bollinger Bands: every time the price rises more than 2 moving standard deviations above its moving average, short; every time it falls more than 2 moving standard deviations below, buy. Exit the position when the price reverts back to within 1 moving standard deviation of the moving average. If you allow yourself to enter and exit every five minutes, you will find that the Sharpe ratio is about 3 without transaction costs, which is excellent indeed! Unfortunately, the Sharpe ratio is reduced to -3 if we subtract 1 basis point as transaction costs, making it a very unprofitable strategy.
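To make the mechanics concrete, here is a minimal sketch of such a Bollinger-band rule with a per-transaction cost. It does not reproduce the ES backtest quoted above: the price series is a synthetic mean-reverting toy process, and the lookback of 20 bars, the 78 five-minute bars per day, and all function names are assumptions made for illustration.

```python
import math
import random
import statistics


def bollinger_backtest(prices, lookback=20, entry_z=2.0, exit_z=1.0, cost_bps=1.0):
    """Return (gross, net) per-bar returns of a Bollinger-band
    mean-reversion rule; cost_bps is charged per one-way transaction."""
    position = 0  # -1 short, 0 flat, +1 long
    gross, net = [], []
    for t in range(lookback, len(prices) - 1):
        window = prices[t - lookback + 1 : t + 1]
        ma = statistics.fmean(window)
        sd = statistics.pstdev(window)
        z = (prices[t] - ma) / sd if sd > 0 else 0.0
        new_position = position
        if z > entry_z:
            new_position = -1      # price too high: short
        elif z < -entry_z:
            new_position = 1       # price too low: buy
        elif abs(z) < exit_z:
            new_position = 0       # back inside 1 standard deviation: exit
        traded = abs(new_position - position)  # units transacted at bar t
        position = new_position
        # The position decided at bar t earns the return from t to t+1.
        bar_return = position * (prices[t + 1] / prices[t] - 1.0)
        gross.append(bar_return)
        net.append(bar_return - traded * cost_bps / 1e4)
    return gross, net


def sharpe(returns, periods_per_year):
    sd = statistics.pstdev(returns)
    if sd == 0:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.fmean(returns) / sd


random.seed(0)
prices = [100.0]
for _ in range(5000):
    # Assumed toy process that is pulled back toward 100 on every bar.
    prices.append(prices[-1] + 0.1 * (100.0 - prices[-1]) + random.gauss(0.0, 0.05))

BARS_PER_YEAR = 252 * 78  # assumption: 78 five-minute bars per trading day
gross, net = bollinger_backtest(prices, cost_bps=1.0)
print(f"Sharpe before costs: {sharpe(gross, BARS_PER_YEAR):.2f}")
print(f"Sharpe after costs:  {sharpe(net, BARS_PER_YEAR):.2f}")
```

The only difference between the two return series is the cost term, so rerunning with different values of cost_bps shows how sensitive a frequently trading rule is to even a fraction of a basis point.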
For another example of the impact of transaction costs, see Example 3.7.
Does the Data Suffer from Survivorship Bias?
A historical database of stock prices that does not include stocks that have disappeared due to bankruptcies, delistings, mergers, or acquisitions suffers from survivorship bias, because only “survivors” of those often unpleasant events remain in the database. (The same term can be applied to mutual fund or hedge fund databases that do not include funds that went out of business.) Backtesting a strategy using data with survivorship bias can be dangerous because it may inflate the historical performance of the strategy. This is especially true if the strategy has a “value” bent; that is, it tends to buy stocks that are cheap. Some stocks were cheap because the companies were about to go bankrupt. So if your backtest includes only those cases where the stocks were very cheap but eventually survived (and maybe prospered) and neglects those cases where the stocks finally did get delisted, the backtest performance will, of course, be much better than what a trader would actually have experienced at that time.
So when you read about a “buy on the cheap” strategy that has great performance, ask the author of that strategy whether it was tested on survivorship bias-free (sometimes called “point-in-time”) data. If not, be skeptical of its results. (A toy strategy that illustrates this can be found in Example 3.3.)
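The size of the distortion is easy to see in a simulation. The sketch below is not Example 3.3; it is a made-up universe of cheap stocks in which an assumed 10 percent are delisted at a total loss, and the return distribution of the survivors is likewise an assumption.

```python
import random
import statistics

random.seed(42)

N_STOCKS = 1000
P_BANKRUPT = 0.10  # assumed chance that a cheap stock is delisted at a total loss

full_universe = []   # every cheap stock the strategy would actually have bought
survivors_only = []  # what remains in a database that dropped the dead stocks
for _ in range(N_STOCKS):
    if random.random() < P_BANKRUPT:
        full_universe.append(-1.0)  # delisted: the position is wiped out
    else:
        r = max(-0.9, random.gauss(0.10, 0.30))  # assumed return of a survivor
        full_universe.append(r)
        survivors_only.append(r)

print(f"mean return, survivors only: {statistics.fmean(survivors_only):.1%}")
print(f"mean return, full universe:  {statistics.fmean(full_universe):.1%}")
```

The survivors-only average is necessarily higher, because the only difference between the two lists is the set of total losses that the biased database never shows.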
How Did the Performance of the Strategy Change over the Years?
Most strategies performed much better 10 years ago than now, at least in a backtest. There weren’t as many hedge funds running quantitative strategies then. Also, bid-ask spreads were much wider then, so if you assumed that today’s transaction cost was applicable throughout the backtest, the earlier period would have unrealistically high returns.
Survivorship bias in the data might also contribute to the good performance in the early period. The reason that survivorship bias mainly inflates the performance of an earlier period is that the further back we go in our backtest, the more missing stocks we will have. Since some of those stocks are missing because they went out of business, a long-only strategy would have looked better in the early period of the backtest than what the actual profit and loss (P&L) would have been at that time. Therefore, when judging the suitability of a strategy, one must pay particular attention to its performance in the most recent few years, and not be fooled by the overall performance, which inevitably includes some rosy numbers back in the old days.
Finally, regime shifts in the financial markets can mean that financial data from an earlier period simply cannot be fitted to the same model that is applicable today. Major regime shifts can occur because of changes in securities market regulation (such as decimalization of stock prices or the elimination of the short-sale rule, which I allude to in Chapter 5) or other macroeconomic events (such as the subprime mortgage meltdown).
This point may be hard to swallow for many statistically minded readers. Many of them may think that the more data there is, the more statistically robust the backtest should be. This is true only when the financial time series is generated by a stationary process. Unfortunately, financial time series are famously nonstationary, due to all of the reasons given earlier.
It is possible to incorporate such regime shifts into a sophisticated “super”-model (as I will discuss in Example 7.1), but it is much simpler if we just demand that our model deliver good performance on recent data.
Does the Strategy Suffer from Data-Snooping Bias?
If you build a trading strategy that has 100 parameters, it is very likely that you can optimize those parameters in such a way that the historical performance will look fantastic. It is also very likely that the future performance of this strategy will look nothing like its historical performance and will turn out to be very poor. By having so many parameters, you are probably fitting the model to historical accidents that will not repeat themselves in the future. Actually, this so-called data-snooping bias is very hard to avoid even if you have just one or two parameters (such as entry and exit thresholds), and I will leave the discussion on how to minimize its impact to Chapter 3. But, in general, the more rules the strategy has, and the more parameters the model has, the more likely it is to suffer from data-snooping bias. Simple models are often the ones that will stand the test of time. (See the sidebar on my views on artificial intelligence and stock picking.)
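Even a single parameter can be over-optimized. In the hedged sketch below, a toy momentum rule (an assumption chosen for illustration, not a strategy from this book) is tuned over 100 lookback values on returns that are pure random noise, and the winning lookback is then applied to a second batch of noise it has never seen.

```python
import math
import random
import statistics


def sharpe(returns, periods_per_year=252):
    sd = statistics.pstdev(returns)
    if sd == 0:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.fmean(returns) / sd


def momentum_returns(returns, lookback):
    """Toy rule: hold the sign of the trailing `lookback`-day return sum
    over the following day."""
    out = []
    for t in range(lookback, len(returns)):
        signal = sum(returns[t - lookback:t])  # uses past data only
        out.append(math.copysign(1.0, signal) * returns[t])
    return out


random.seed(1)
noise = [random.gauss(0.0, 0.01) for _ in range(2000)]  # nothing real to find
train, holdout = noise[:1000], noise[1000:]

# "Optimize" the lookback on the first half of the data.
in_sample = {lb: sharpe(momentum_returns(train, lb)) for lb in range(1, 101)}
best_lookback = max(in_sample, key=in_sample.get)
out_of_sample = sharpe(momentum_returns(holdout, best_lookback))

print(f"best lookback in sample: {best_lookback}")
print(f"in-sample Sharpe:        {in_sample[best_lookback]:.2f}")
print(f"out-of-sample Sharpe:    {out_of_sample:.2f}")
```

The in-sample figure is the best of 100 tries on data with no structure, so whatever it shows is a historical accident; there is no reason for it to carry over to the holdout period.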
ARTIFICIAL INTELLIGENCE AND STOCK PICKING¹
There was an article in the New York Times a short while ago about a new hedge fund launched by Mr. Ray Kurzweil, a pioneer in the field of artificial intelligence. (Thanks to my fellow blogger, Yaser Anwar, who pointed it out to me.) According to Kurzweil, the stock-picking decisions in this fund are supposed to be made by machines that “. . . can observe billions of market transactions to see patterns we could never see” (quoted in Duhigg, 2006).
While I am certainly a believer in algorithmic trading, it is a lot more difficult to successfully apply artificial intelligence to trading.
At the risk of oversimplification, we can characterize artificial intelligence (AI) as trying to fit past data points into a function with many, many parameters. This is the case for some of the favorite tools of AI: neural networks, decision trees, and genetic algorithms. With many parameters, we can for sure capture small patterns that no human can see. But do these patterns persist? Or are they random noise that will never be repeated? Experts in AI assure us that they have many safeguards against fitting the function to transient noise. And indeed, such tools have been very effective in consumer marketing and credit card fraud detection. Apparently, the patterns of consumers and thefts are quite consistent over time, allowing such AI algorithms to work even with a large number of parameters. However, from my experience, these safeguards work far less well in financial markets prediction, and overfitting to the noise in historical data remains a rampant problem. As a matter of fact, I have built financial predictive models based on many of these AI algorithms in the past. Every time a carefully constructed model came up that seemed to work marvels in backtest, it inevitably performed miserably going forward. The main reason for this seems to be that the amount of statistically independent financial data is far more limited compared to the billions of independent consumer and credit transactions available. (You may think that there is a lot of tick-by-tick financial data to mine, but such data is serially correlated and far from independent.)
This is not to say that no methods based on AI will work in prediction. The ones that work for me are usually characterized by these properties:
The targets are nonreflexive, that is, targets that will not change their values in response to too many people successfully predicting them. If returns can be predicted, returns will change in response to the prediction. On the other hand, if weather can be predicted, weather will not change in response. Yet accurate weather prediction can benefit agricultural futures traders. Examples of financial targets that are nonreflexive include earnings surprises and nonfarm payroll surprises, both of which my research team has been successful in predicting (see predictnow.ai/blog/us-nonfarm-employment-prediction-using-riwi-corp-alternative-data/ for the latter).
The features (predictors) that are used as input for predictions are meaningful, numerous, and carefully scrubbed and engineered. For example, many fundamental stock databases have embedded look-ahead bias because they report “restated” financials, not “point-in-time” financials. This look-ahead bias will make the backtest look great, but will cause live trading performance to be much worse.
The prediction is applied to private instead of public targets. For example, instead of predicting the returns of SPY, AI should be used to predict whether your proprietary trading signals will be profitable. This way, you can avoid competing with many of the world’s best financial machine learners in predicting the exact same target. This application of AI is called metalabeling. See how we applied metalabeling successfully at predictnow.ai/blog/what-is-the-probability-of-profit-of-your-next-trade-introducing-predictnow-ai/.
Does the Strategy “Fly under the Radar” of Institutional Money Managers?
Since this book is about starting a quantitative trading business from scratch, and not about starting a hedge fund that manages multiple millions of dollars, we should not be concerned whether a strategy is one that can absorb multiple millions of dollars. (Capacity is the technical term for how much capital a strategy can absorb without negatively impacting its returns.) In fact, quite the opposite: you should look for those strategies that fly under the radar of most institutional investors, for example, strategies that have very low capacities because they trade too often, strategies that trade very few stocks every day, or strategies that have very infrequent positions (such as some seasonal trades in commodity futures described in Chapter 7). Those niches are the ones that are likely to still be profitable because they have not yet been completely arbitraged away by the gigantic hedge funds.
SUMMARY
Finding prospective quantitative trading strategies is not difficult. There are:
Business school and other economic research websites.
Financial websites and blogs focusing on the retail investors.
Trader forums where you can exchange ideas with fellow traders.
Twitter!
After you have done a sufficient amount of Net surfing or scrolling through your Twitter feed, you will find a number of promising trading strategies. Whittle them down to just a handful, based on your personal circumstances and requirements, and by applying the screening criteria (more accurately described as healthy skepticism) that I listed earlier:
How much time do you have for babysitting your trading programs?
How good a programmer are you?
How much capital do you have?
Is your goal to earn steady monthly income or to strive for a large, long-term capital gain?
Even before doing an in-depth backtest of the strategy, you can quickly filter out some unsuitable strategies if they fail one or more of these tests:
Does it outperform a benchmark?
Does it have a high enough Sharpe ratio?
Does it have a small enough drawdown and short enough drawdown duration?
Does the backtest suffer from survivorship bias?
Has the strategy lost steam in recent years compared to its earlier years?
Does the strategy have its own “niche” that protects it from intense competition from large institutional money managers?
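Two of these tests are purely numerical and can be run on any published return series. The following is a minimal sketch, assuming daily returns, 252 trading days per year, and a drawdown duration measured in bars since the last equity high; the function names and the five-day example series are made up.

```python
import math
import statistics


def annualized_sharpe(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns; risk_free_rate is annual."""
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    sd = statistics.pstdev(excess)
    if sd == 0:
        return 0.0
    return math.sqrt(periods_per_year) * statistics.fmean(excess) / sd


def max_drawdown_and_duration(returns):
    """Largest peak-to-trough loss (as a fraction of the peak) and the
    longest run of periods spent below a previous equity high."""
    equity, high_water = 1.0, 1.0
    max_dd, duration, max_duration = 0.0, 0, 0
    for r in returns:
        equity *= 1.0 + r
        if equity >= high_water:
            high_water = equity
            duration = 0
        else:
            duration += 1
            max_dd = max(max_dd, 1.0 - equity / high_water)
            max_duration = max(max_duration, duration)
    return max_dd, max_duration


# Hypothetical daily returns: a gain, three days under water, then a new high.
daily = [0.10, -0.05, -0.05, 0.02, 0.12]
dd, dur = max_drawdown_and_duration(daily)
print(f"Sharpe ratio:      {annualized_sharpe(daily):.2f}")
print(f"maximum drawdown:  {dd:.2%}")
print(f"drawdown duration: {dur} days")
```

In this example the equity curve falls 9.75 percent from its first peak and stays below that peak for three days before setting a new high.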
After making all these quick judgments, you are now ready to proceed to the next chapter, in which you rigorously backtest the strategy yourself to make sure that it does what it is advertised to do.
REFERENCES
Duhigg, Charles. 2006. “Street Scene; A Smarter Computer to Pick Stock.” New York Times, November 24.
Sharpe, William. 1994. “The Sharpe Ratio.” The Journal of Portfolio Management, Fall. Available at: www.stanford.edu/~wfsharpe/art/sr/sr.htm.
CHAPTER 3
Backtesting
A key difference between a traditional investment management process and a quantitative investment process is the possibility of backtesting a quantitative investment strategy to see how it would have performed in the past. Even if you found a strategy described in complete detail with all the historical performance data available, you would still need to backtest it yourself. This exercise serves several purposes. If nothing else, this replication of the research will ensure that you have understood the strategy completely and have reproduced it exactly for implementation as a trading system. Just as in any medical or scientific research, replicating others’ results also ensures that the original research did not commit any of the common errors plaguing this process. But more than just performing due diligence, doing the backtest yourself allows you to experiment with variations of the original strategy, thereby refining and improving the strategy.
In this chapter, I will describe the common platforms that can be used for backtesting, various sources of historical data useful for backtesting, a minimal set of standard performance measures that a backtest should provide, common pitfalls to avoid, and simple refinements and improvements to strategies. A few fully developed backtesting examples will also be presented to illustrate the principles and techniques described.
COMMON BACKTESTING PLATFORMS
Numerous commercial platforms are designed for backtesting, some of them costing tens of thousands of dollars. In keeping with the focus on startups in this book, I start with those with which I am familiar and that are free or can be purchased economically and are widely used.
Excel
This is the most basic and most common tool for traders, whether retail or institutional. You can enhance its power further if you can write Visual Basic macros. The beauty of Excel is “What you see is what you get,” or WYSIWYG (wi-zē-wig) in computing parlance. Data and program are all in one place so that nothing is hidden. Also, a common backtesting pitfall called look-ahead bias, which will be explained later, is unlikely to occur in Excel (unless you use macros, which renders it no longer WYSIWYG) because you can easily align the dates with the various data columns and signals on a spreadsheet. Another advantage of Excel is that backtesting and live trade generation can often be done from the same spreadsheet, eliminating any duplication of programming efforts. The major disadvantage of Excel is that it can be used to backtest only fairly simple models. But, as I explained in the previous chapter, simple models are often the best!
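The alignment that a spreadsheet makes visible is easy to get wrong in code. As a small illustration (with hypothetical closing prices and a toy momentum signal, neither taken from this book), compare a P&L that multiplies each day’s signal by the same day’s return with one that uses the previous day’s signal:

```python
import math

closes = [100.0, 101.0, 100.5, 102.0, 101.0, 103.0]  # hypothetical daily closes
returns = [closes[t] / closes[t - 1] - 1.0 for t in range(1, len(closes))]

# Toy momentum signal: +1 after an up day, -1 after a down day.
# signals[t] is known only once the close of that day is known.
signals = [math.copysign(1.0, r) for r in returns]

# Misaligned: signal and return come from the same row, so the "strategy"
# trades on a close it could not have seen yet (look-ahead bias).
biased = [s * r for s, r in zip(signals, returns)]

# Aligned: the previous day's signal is applied to the current day's return.
aligned = [signals[t - 1] * returns[t] for t in range(1, len(returns))]

print("same-row P&L:", [f"{p:+.2%}" for p in biased])
print("lagged P&L:  ", [f"{p:+.2%}" for p in aligned])
```

The same-row version can never lose, since each return is multiplied by its own sign. On a spreadsheet, the equivalent mistake would be a P&L cell that refers to the signal in its own row instead of the row above.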
MATLAB
MATLAB (www.mathworks.com) used to be one of the most common backtesting platforms used by quantitative analysts and traders in large institutions. It has been overtaken by Python (which I describe below), but I still find it to be the most productive language for quants (as opposed to professional software developers). It is easier to use than Python, it is faster, and it has full customer support from the vendor. It is ideal for testing strategies that involve a large portfolio of stocks. (Imagine backtesting a strategy involving 1,500 symbols on Excel: it is possible, but quite painful.) It has numerous advanced statistical and mathematical modules built in, so traders do not have to reinvent the wheel if their trading algorithms involve some sophisticated but common mathematical concepts. (A good example is principal component analysis, often used in factor models in statistical arbitrage trading, and a hassle to implement in other programming languages. See Example 7.4.) Supplementary toolboxes (which cost about $50 each) such as the Statistics and Machine Learning, Econometrics, and Financial toolboxes are particularly useful to quant traders.
There is also a large amount of third-party freeware available for download from the internet, much of it very useful for quantitative trading purposes (an example is the cointegration package used in Example 7.2). Finally, MATLAB is very useful in retrieving financial information from various websites. Example 3.1 shows how this can be done.
Despite the seeming sophistication of the platform, it is actually very easy to learn (at least for basic usage), and it is very quick to write a complete backtest program using this language. It is also inexpensive: the home version costs about the same as Microsoft Office.
You can use MATLAB for live trading, too, if you purchase its Trading Toolbox, or if you purchase a third-party toolkit, such as that from undocumentedmatlab.com.
I will include MATLAB code for all the backtesting examples in this book as well as provide a quick survey of the MATLAB language itself in the appendix.
Example 3.1: Using MATLAB to Retrieve Yahoo! Finance Data
MATLAB is not only useful for numerical computations but also for text parsing. Following is an example of using MATLAB to retrieve a stock’s historical price information from Yahoo! Finance. First, copy the file getMarketDataViaYahoo.m from github.com/Lenskiy/market-data-functions and save it to your local folder. Then save the following code as example3_1.m to your local folder, too:
% Example 3.1: Download data from Yahoo
initDate = '1-Sep-2020';
symbol = 'AAPL';
aaplusd_yahoo_raw = getMarketDataViaYahoo(symbol, initDate);
% Collect close, high, and low into one time series indexed by date
aaplusd_yahoo = timeseries([aaplusd_yahoo_raw.Close, ...
    aaplusd_yahoo_raw.High, aaplusd_yahoo_raw.Low], ...
    datestr(aaplusd_yahoo_raw(:,1).Date));
aaplusd_yahoo.DataInfo.Units = 'USD';
aaplusd_yahoo.Name = symbol;
aaplusd_yahoo.TimeInfo.Format = "dd-mm-yyyy";
figure,
plot(aaplusd_yahoo);
legend({'Close', 'High', 'Low'}, 'Location', 'northwest');
disp(aaplusd_yahoo_raw.Close)
This program file example3_1.m is available for download from epchan.com/book, with “sharperatio” as both username and password. You can easily modify this code to download as many tickers as you like. This program requires the getMarketDataViaYahoo function to work. That .m file can also be downloaded from www.mathworks.com/matlabcentral/fileexchange/68361-yahoo-finance-and-quandl-data-downloader. (You should save it in the same folder as example3_1.m.)
Python
Python has now overtaken MATLAB to become the de facto backtesting language, especially after the numpy and pandas packages became available. With these packages, you can manipulate arrays and time series data just like you do in MATLAB. Python benefits from a large number of third-party packages for specific applications. There is Scikit-learn for machine learning, plotly for interactive data visualization, and seaborn for plotting, to name just a few of the most commonly used ones. While Python has almost any package that you need for finance and trading, it is not without flaws:
Version conflicts abound. Even with my team of professional software developers (one of them has sold his software firm for many millions), we spent endless hours integrating different Python packages into a production system. An addition of any new Python package threatens to tear down the whole edifice. Even the following sample program example3_1.py took more than an hour to get running on my computer, as opposed to my colleague’s computer.
It is slow, quite slow, compared to MATLAB. Don’t take my word for it: read this academic study by Aruoba et al. (2018): “Python is too slow. . . MATLAB and R have considerably improved their performance, in the case of MATLAB to make it competitive, for example, with Rcpp.”
There is no customer support, as it is free. You will have to wait for the kindness of strangers on stackoverflow.com to answer your questions. Meanwhile, MATLAB has professional programmers and PhDs on frontline support.
The Integrated Development Environments (IDEs) of Python are inferior to MATLAB’s. This is still the case, despite the proliferation of free platforms such as Microsoft’s Visual Studio Code. You get what you pay for.
Despite the large number of free packages, there is still a lack of good statistics and econometrics packages, unlike R. Python’s statsmodels is no match for R packages such as mnormt, copula, fGarch, rugarch, or MASS. Python is also no match for MATLAB’s Statistics and Machine Learning and Econometrics Toolboxes.
But if you are a Python fan or if you want to jump on the Python bandwagon because everyone else is using it, don’t let me stop you. To learn more about using Python in finance, start with Wes McKinney’s Python for Data Analysis (McKinney, 2017). McKinney is, of course, the inventor of pandas, a most useful package for quant traders, and a veteran of the giant quant fund Two Sigma.
Here is Example 3.1 coded in Python:
Example 3.1: Using Python to Retrieve Yahoo! Finance Data
Following is an example of using Python to retrieve a stock’s historical price information from Yahoo! Finance. First, run pip install pandas_datareader on your Anaconda prompt. Then save the following code as example3_1.py to your local folder:
# Example 3.1: Download data from Yahoo
from pandas_datareader import data as pdr

def test_yfinance():
    # Download and print one week of daily data for each symbol
    for symbol in ['AAPL', 'MSFT', 'VFINX', 'BTC-USD']:
        print(">>", symbol, end=' ... ')
        data = pdr.get_data_yahoo(symbol, start='2020-09-25', end='2020-10-02')
        print(data)

if __name__ == "__main__":
    test_yfinance()
This program file example3_1.py is available for download from epchan.com/book, with “sharperatio” as both username and password. You can easily modify this code to download as many tickers as you like.
R
I have used R as the language for teaching my Financial Risk Analytics course at Northwestern University’s Master’s in Data Science program, which covered everything from time series analysis to copulas. It is a great language if you want to use classical statistical and econometric analyses for your trading (and there is nothing wrong with that!). That is because many academic statisticians and econometricians have implemented their algorithms in R. There aren’t as many implementations of machine-learning algorithms in R as in Python or MATLAB.
However, I recommend that you do not use machine learning if you are creating a new trading strategy, and use ML only for improving your strategy (for reasons explained in Chapter 2 and in more depth at predictnow.ai/finml). Hence, R is a good language for this initial step of strategy exploration, though not as good as MATLAB since its IDE (RStudio) is also free, and, dare I say, primitive. Naturally, like Python, it also comes with zero support. If you want to learn R, there is an elegant and very readable little book by Jonathan Regenstein (Regenstein, 2018) called Reproducible Finance with R.
Example 3.1: Using R to Retrieve Yahoo! Finance Data
Following is an example of using R to retrieve a stock’s historical price information from Yahoo! Finance. This program file example3_1.R is available for download from epchan.com/book, with “sharperatio” as both username and password. Just save it to your local folder and run in RStudio. (Installation of required packages is included. Once they are installed, you can comment out the install.packages lines.)
# tidyverse contains the packages tidyr, ggplot2, dplyr,
# readr, purrr and tibble
install.packages("tidyverse")
install.packages("lubridate")
install.packages("readxl")
install.packages("highcharter")
install.packages("tidyquant")
install.packages("timetk")
install.packages("tibbletime")
install.packages("quantmod")
install.packages("PerformanceAnalytics")
install.packages("scales")
library(tidyverse)
library(lubridate)
library(readxl)
library(highcharter)
library(tidyquant)
library(timetk)
library(tibbletime)
library(quantmod)
library(PerformanceAnalytics)
library(scales)
symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")
prices <-
  getSymbols(symbols,
             src = 'yahoo',
             from = "2012-12-31",
             to = "2017-12-31",
             auto.assign = TRUE,
             warnings = FALSE)
print(SPY)
You can easily modify this code to download as many tickers as you like.
QuantConnect
QuantConnect is a web-based algorithmic trading platform providing research, backtesting, and live trading tools to support strategy creation in C# or Python. It also provides some 400 TB of financial and alternative data. The open-source engine (called LEAN) takes into account trade fills, slippage, margin, transaction costs, and bid-ask spread to provide realistic backtest results. As of this writing, the platform supports seven asset classes: equities, equity-options, forex, CFD, crypto, futures, and future-options. QuantConnect allows users to seamlessly transition from backtesting to live trading with no code changes. This is important to ensure that what you backtested is exactly what you will be trading.
Blueshift
Blueshift is an integrated platform for research, backtest, and trading offered by QuantInsti. As of this writing, it offers live trading in the United States, India, and FX markets, and it includes minute-level data free of cost across these markets. (I am sure that by the time you are reading this, it will be offering many more markets.) You can develop your investment or trading strategy in either a Python programming environment or a visual (nonprogramming interface) builder. Moving a strategy from backtesting to live trading is a turnkey operation, ensuring that what you backtested is exactly what you will be trading.
FINDING AND USING HISTORICAL DATABASES
If you have a strategy in mind that requires a specific type of historical data, the first thing to do is to search for that type of data. You will be surprised how many free or low-cost historical databases are available on the internet for many types of data. (For example, try the search phrase "free historical intraday futures data.") Table 3.1 includes a number of the databases that I have found useful over the years, most of them either free or very low cost. I have deliberately left out the expensive databases from Bloomberg, Dow Jones, FactSet, Thomson Reuters, or Tick Data. Though they have almost every type of data imaginable for purchase, these data vendors cater mostly to more established institutions and are typically not in the price range of individuals or startup institutions.
TABLE 3.1 Historical Databases for Backtesting
| Source | Pros | Cons |
| :--- | :--- | :--- |
| finance.yahoo.com | Free. Split/dividend adjusted. | Has survivorship bias. Can download only one symbol at a time. |
| Sharadar.com | Rent, don't buy, tick data! Enriched with identifiers and tags. | Moderately priced. |
| CSIdata.com | Low cost. Source of Yahoo! and Google's historical data. Software enables download of multiple symbols. | Has survivorship bias, though delisted stocks' history can be purchased. |
| CRSP.com | Survivorship bias free. | Expensive. Updated only once a month. |
| Daily Futures Data | | |
| Algoseek.com | (See above.) | (See above.) |
| CSIdata.com | | |
| Intraday Stock / Futures Data | | |
| Algoseek.com | (See above.) | |
| Tickdata.com | Institutional quality. | Expensive. |
| Interactive Brokers | Free if you have an account. | |
While finding sources of data on the internet is even easier than finding prospective strategies, there are a number of issues and pitfalls with many of these databases that I will discuss later in this section. These issues apply mostly to stock and exchange-traded fund (ETF) data only. Here are the most important ones.
Are the Data Split and Dividend Adjusted?
When a company had its stock split $N$ to 1 with an ex-date of $T$, all the prices before $T$ need to be multiplied by $1/N$. $N$ is usually 2, but can be a fraction like 0.5 as well. When $N$ is smaller than 1, it is called a reverse split. Similarly, when a company issued a dividend $\$d$ per share with an ex-date of $T$, all the prices before $T$ need to be multiplied by the number $(\operatorname{Close}(T-1)-d) / \operatorname{Close}(T-1)$, where $\operatorname{Close}(T-1)$ is the closing price of the trading day before $T$. Notice that I adjust the historical prices by a multiplier instead of subtracting $\$d$ so that the historical daily returns will remain the same pre- and post-adjustment. This is the way Yahoo! Finance adjusts its historical data, and is the most common way. (If you adjust by subtracting $\$d$ instead, the historical daily changes in prices will be the same pre- and post-adjustment, but not the daily returns.) If the historical data are not adjusted, you will find a drop in price at the ex-date's market open from the previous day's close (apart from normal market fluctuation), which may trigger an erroneous trading signal.
I recommend getting historical data that are already split and dividend adjusted, because otherwise you would have to find a separate historical database of splits and dividends and apply the adjustments yourself, a somewhat tedious and error-prone task, which I will describe in the following example.
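Before the worked IGE example, here is a minimal Python sketch of the multiplicative adjustment rules stated above. The function name `back_adjust`, its arguments, and the five prices are all made up for illustration; only the two multipliers, $1/N$ for a split and $(\operatorname{Close}(T-1)-d)/\operatorname{Close}(T-1)$ for a dividend, come from the text.

```python
import pandas as pd


def back_adjust(close, splits, dividends):
    """Back-adjust unadjusted closes (hypothetical helper, not the author's code).

    close     : unadjusted closing prices indexed by ascending dates
    splits    : {ex_date: N} for an N-to-1 split
    dividends : {ex_date: d} for a cash dividend of d per share
    """
    multiplier = pd.Series(1.0, index=close.index)
    for ex_date, n in splits.items():
        before = close.index < ex_date
        multiplier[before] *= 1.0 / n
    for ex_date, d in dividends.items():
        before = close.index < ex_date
        prev_close = close[before].iloc[-1]  # unadjusted Close(T-1)
        multiplier[before] *= (prev_close - d) / prev_close
    return close * multiplier


# Made-up prices: a 2-to-1 split with ex-date 2005-06-09 and a
# 0.50 dividend with ex-date 2005-06-13.
dates = pd.to_datetime(
    ["2005-06-07", "2005-06-08", "2005-06-09", "2005-06-10", "2005-06-13"]
)
close = pd.Series([100.0, 102.0, 51.0, 52.0, 51.5], index=dates)
adjusted = back_adjust(
    close,
    splits={pd.Timestamp("2005-06-09"): 2},
    dividends={pd.Timestamp("2005-06-13"): 0.5},
)
print(adjusted.pct_change())
```

In this toy series the adjusted returns across both ex-dates are zero: the apparent 50 percent drop on the split date and the dividend-sized drop both disappear, while returns between ordinary days (for example 100 to 102) are unchanged.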
Example 3.2: Adjusting for Splits and Dividends
Here we look at IGE, an ETF that has had both splits and dividends in its history. It had a 2:1 split on June 9, 2005 (the ex-date). Let's look at the unadjusted prices around that date (you can download the historical prices of IGE from Yahoo! Finance into an Excel spreadsheet):
We need to adjust the prices prior to 6/9/2005 due to this split. This is easy: $N=2$ here, and all we need to do is to multiply those prices by $1/2$. The following table shows the adjusted prices:
Now, the astute reader will notice that the adjusted close prices here do not match the adjusted close prices displayed in the Yahoo! Finance table. The reason for this is that there have been dividends distributed after 6/9/2005, so the Yahoo! prices have been adjusted for all those as well. Since each adjustment is a multiplier, the aggregate adjustment is just the product of all the individual multipliers. Here are the dividends from 6/9/2005 to November 2007, together with the unadjusted closing prices of the previous trading days and the resulting individual multipliers:
(Check out the multipliers yourself in Excel using the formula I gave above to see if they agree with my values here.) So the aggregate multiplier for the dividends is simply $0.998618 \times 0.997488 \times \ldots \times 0.997214 = 0.976773$. This multiplier should be applied to all the unadjusted prices on or after 6/9/2005. The aggregate multiplier for the dividends and the split is $0.976773 \times 0.5 = 0.488386$, which should be applied to all the unadjusted prices before 6/9/2005. So let's look at the resulting adjusted prices after applying these multipliers:
You can see that the adjusted closing prices from our calculations and from Yahoo! are the same (after rounding to two decimal places). But, of course, when you are reading this, IGE will likely have distributed more dividends and may have even split further, so your Yahoo! table won't look like the one above. It is a good exercise to check that you can make further adjustments based on those dividends and splits that result in the same adjusted prices as your current Yahoo! table.
Are the Data Survivorship-Bias Free?
We already covered this issue in Chapter 2. Unfortunately, databases that are free from survivorship bias are quite expensive and may not be affordable for a startup business. One way to overcome this problem is to start collecting point-in-time data yourself for the benefit of your future backtest. If you save the prices each day of all the stocks in your universe to a file, then you will have a point-in-time or survivorship-bias-free database to use in the future. Another way to lessen the impact of survivorship bias is to backtest your strategies on more recent data so that the results are not distorted by too many missing stocks.
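The idea of saving each day's prices to a file could be sketched roughly as follows. The file layout (one row per date and symbol) and the helper names `append_snapshot` and `load_snapshots` are assumptions made for illustration. In practice the `closes` dictionary would come from whatever data feed you already use.

```python
import csv
import tempfile
from pathlib import Path


def append_snapshot(path, date, closes):
    """Append one row per symbol (date, symbol, close) to a CSV file.

    Rows are only ever appended, so a stock that is later delisted
    keeps its history in the file.
    """
    path = Path(path)
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "symbol", "close"])
        for symbol, close in sorted(closes.items()):
            writer.writerow([date, symbol, close])


def load_snapshots(path):
    """Read the accumulated snapshots back as a list of dicts."""
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))


# Demo in a temporary folder: ETYS is in the universe on the first
# date but has no price on the second, yet its row survives.
with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "point_in_time.csv"
    append_snapshot(db, "2001-01-02", {"ETYS": 0.2188, "MDM": 0.3125})
    append_snapshot(db, "2002-01-02", {"MDM": 0.49})
    rows = load_snapshots(db)

for row in rows:
    print(row)
```

Run once per trading day after the close, a script like this slowly builds a record of the universe as it actually existed on each date.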
Example 3.3: An Example of How Survivorship Bias Can Artificially Inflate a Strategy's Performance
Here is a toy "buy low-priced stocks" strategy. (Warning: This toy strategy is hazardous to your financial health!) Let's say from a universe of the 1,000 largest stocks (based on market capitalization), we pick 10 that have the lowest closing prices at the beginning of the year and hold them (with equal initial capital) for one year. Let's look at what we would have picked if we had a good, survivorship-bias-free database:
| SYMBOL | Closing Price on 1/2/2001 | Closing Price on 1/2/2002 | Terminal Price |
| :--- | :--- | :--- | :--- |
| ETYS | 0.2188 | NaN | 0.125 |
| MDM | 0.3125 | 0.49 | 0.49 |
| INTW | 0.4063 | NaN | 0.11 |
| FDHG | 0.5 | NaN | 0.33 |
| OGNC | 0.6875 | NaN | 0.2 |
| MPLX | 0.7188 | NaN | 0.8 |
| GTS | 0.75 | NaN | 0.35 |
| BUYX | 0.75 | NaN | 0.17 |
| PSIX | 0.75 | NaN | 0.2188 |
| RTHM | 0.8125 | NaN | 0.3000 |
All but MDM were delisted sometime between 1/2/2001 and 1/2/2002 (after all, the dot-com bubble was seriously bursting then!). The NaNs indicate those with nonexistent closing prices on 1/2/2002. The Terminal Price column indicates the last prices at which the stocks were traded on or before 1/2/2002. The total return on this portfolio in that year was -42 percent.
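The -42 percent figure can be checked with a few lines of Python. This is a sketch that assumes, as the text states, equal initial capital in each of the 10 stocks and an exit at the terminal price; the variable names are mine.

```python
# (closing price on 1/2/2001, terminal price) for each pick in the table
picks = {
    "ETYS": (0.2188, 0.125),
    "MDM": (0.3125, 0.49),
    "INTW": (0.4063, 0.11),
    "FDHG": (0.5, 0.33),
    "OGNC": (0.6875, 0.2),
    "MPLX": (0.7188, 0.8),
    "GTS": (0.75, 0.35),
    "BUYX": (0.75, 0.17),
    "PSIX": (0.75, 0.2188),
    "RTHM": (0.8125, 0.3000),
}

# With equal initial capital, the portfolio return is the simple
# average of the individual stock returns.
stock_returns = [terminal / start - 1 for start, terminal in picks.values()]
portfolio_return = sum(stock_returns) / len(stock_returns)
print(f"{portfolio_return:.1%}")  # -41.7%
```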
Now, let's look at what we would have picked if our database had survivorship bias and actually missed all those stocks that were delisted that year. We would then have picked the following list instead:
Notice that since we select only those stocks that "survived" until at least 1/2/2002, they all have closing prices on that day. The total return on this portfolio was 388 percent!
In this example, -42 percent was the actual return a trader would experience following this strategy, whereas 388 percent is a fictitious return that was due to survivorship bias in our database.
Does Your Strategy Use High and Low Data?
For almost all daily stock data, the high and low prices are far noisier than the open and close prices. What this means is that even when you had placed a buy limit order below the recorded high of a day, it might not have been filled, and vice versa for a sell limit order. (This could be due to the fact that a very small order was transacted at the high, or the execution could have occurred on a market to which your order was not routed. Sometimes, the high or low is simply due to an incorrectly reported tick that was not filtered out.) Hence, a backtest that relies on high and low data is less reliable than one that relies on the open and close.
Actually, sometimes even a market on open (MOO) or market on close (MOC) order might not be filled at the historical open and close prices shown in your data. This is due to the fact that the historical prices shown may be due to the primary exchange (e.g., New York Stock Exchange [NYSE]), or it may be a composite price including all the regional exchanges. Depending on where your order was routed, it may be filled at a different price from the historical opening or closing price shown in your dataset. Nevertheless, the discrepancies of the open and close prices usually have less impact on backtest performance than the errors in the high and low prices, since the latter almost always inflate your backtest returns.
After retrieving the data from a database, it is often advisable to do a quick error check. The simplest way to do this is to calculate the daily returns based on the data. If you have open, high, low, and close prices, you can calculate the various combinations of daily returns such as from the previous high to today's close as well. You can then examine closely those days with returns that are, say, four standard deviations away from the average. Typically, an extreme return should be accompanied by a news announcement, or should occur on a day when the market index also experienced extreme returns. If not, then your data are suspect.
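A quick error check of this kind might look like the following sketch. The helper name `flag_suspect_days`, the simulated price series, and the injected bad tick are all hypothetical; the four-standard-deviation threshold is the one suggested above.

```python
import numpy as np
import pandas as pd


def flag_suspect_days(close, n_std=4.0):
    """Return the daily returns lying more than n_std standard
    deviations away from the average daily return."""
    ret = close.pct_change().dropna()
    z = (ret - ret.mean()) / ret.std()
    return ret[z.abs() > n_std]


# Simulated closes with one bad tick: on one day the price is
# recorded ten times too high (a misplaced decimal point).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=500)
prices = 100 * np.cumprod(1 + rng.normal(0.0, 0.01, size=500))
prices[250] *= 10
close = pd.Series(prices, index=dates)

suspects = flag_suspect_days(close)
print(suspects)
```

Note that one large error inflates the sample standard deviation itself, so the reversal on the following day may fall inside the threshold. Each flagged date still has to be checked by hand against news and the market index, as described above.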
PERFORMANCE MEASUREMENT
Quantitative traders use a good variety of performance measures. Which set of numbers to use is sometimes a matter of personal preference, but with ease of comparisons across different strategies and traders in mind, I would argue that the Sharpe ratio, maximum drawdown, and MAR ratio are the most important. Notice that I did not include compound annualized growth rate (CAGR), the measure most commonly quoted by investors, because if you use this measure, you have to tell people a number of things about what denominator you use to calculate returns. For example, in a long-short strategy, did you use just one side of capital or both sides in the denominator? Is the return a leveraged one (the denominator is based on account equity), or is it unleveraged (the denominator is based on market value of the portfolio)? If the equity or market value changes daily, do you use a moving average as the denominator, or just the value at the end of each day or each month? Most (but not all) of these problems associated with comparing returns can be avoided by quoting Sharpe ratio, maximum drawdown, and MAR ratio instead as the standard performance measures. MAR ratio is just the ratio of the CAGR and the maximum drawdown, and it is somewhat independent of leverage. The reason why it isn't completely independent is that doubling the leverage does not exactly double the CAGR in the presence of volatility (Chan, 2017).
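To make the leverage remark concrete, here is a small sketch with a made-up return stream that alternates between +1.1 percent and -1.0 percent each day. The function names are mine, and the drawdown is measured as a positive fraction of the running peak of the equity curve; this is one reasonable convention, not necessarily the one used elsewhere in this book.

```python
import numpy as np


def cagr(returns, periods_per_year=252):
    """Compound annualized growth rate of a series of periodic returns."""
    growth = np.prod(1 + returns)
    return growth ** (periods_per_year / len(returns)) - 1


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve (positive fraction)."""
    equity = np.concatenate(([1.0], np.cumprod(1 + returns)))
    high_watermark = np.maximum.accumulate(equity)
    return np.max(1 - equity / high_watermark)


def mar_ratio(returns, periods_per_year=252):
    return cagr(returns, periods_per_year) / max_drawdown(returns)


# One year of made-up daily returns at 1x and at 2x leverage
unlevered = np.tile([0.011, -0.010], 126)
levered = 2 * unlevered

cagr_1x, cagr_2x = cagr(unlevered), cagr(levered)
print(f"CAGR 1x: {cagr_1x:.4f}, CAGR 2x: {cagr_2x:.4f}")
print(f"MAR  1x: {mar_ratio(unlevered):.2f}, MAR  2x: {mar_ratio(levered):.2f}")
```

In this toy case the maximum drawdown doubles exactly (from 1 percent to 2 percent), but the CAGR less than doubles, so the MAR ratio drifts down slightly as leverage goes up.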
I introduced the concepts of the Sharpe ratio, maximum drawdown, and maximum drawdown duration in Chapter 2. Here, I will just note a number of subtleties associated with calculating the Sharpe ratio, and give some computational examples in both Excel and MATLAB.
There is one subtlety that often confounds even seasoned portfolio managers when they calculate Sharpe ratios: should we or shouldn't we subtract the risk-free rate from the returns of a dollar-neutral portfolio? The answer is no. A dollar-neutral portfolio is self-financing, meaning the cash you get from selling short pays for the purchase of the long securities, so the financing cost (due to the spread between the credit and debit interest rates) is small and can be neglected for many backtesting purposes. Meanwhile, the margin balance you have to maintain earns a credit interest close to the risk-free rate, $r_{F}$. So let's say the strategy return (the portfolio return minus the contribution from the credit interest) is $R$, and the risk-free rate is $r_{F}$. Then the excess return used in calculating the Sharpe ratio is $R+r_{F}-r_{F}=R$. So, essentially, you can ignore the risk-free rate in the whole calculation and just focus on the returns due to your stock positions.
Similarly, if you have a long-only day-trading strategy that does not hold positions overnight, you again have no need to subtract the risk-free rate from the strategy return in order to obtain the excess returns, since you do not have financing costs in this case, either. In general, you need to subtract the risk-free rate from your strategy returns in calculating the Sharpe ratio only if your strategy incurs financing cost.
To further facilitate comparison across strategies, most traders annualize the Sharpe ratio. Most people know how to annualize the average returns. For example, if you have been using monthly returns, then the average annual return is just 12 times the average monthly return.
However, annualizing the standard deviation of returns is a bit trickier. Here, based on the assumption that the monthly returns are serially uncorrelated (Sharpe, 1994), the annual standard deviation of returns is $\sqrt{12}$ times the monthly standard deviation. Hence, overall, the annualized Sharpe ratio would be $\sqrt{12}$ times the monthly Sharpe ratio.
In general, if you calculate your average and standard deviation of returns based on a certain trading period $T$, whether $T$ is a month, a day, or an hour, and you want to annualize these quantities, you have to first find out how many such trading periods there are in a year (call it $N_{T}$). Then
$$\text{Annualized Sharpe Ratio} = \sqrt{N_{T}} \times \text{Sharpe Ratio Based on } T$$
For example, if your strategy holds positions only during the NYSE market hours (9:30-16:00 ET), and the average hourly return is $R$, and the standard deviation of the hourly returns is $s$, then the annualized Sharpe ratio is $\sqrt{1638} \times R / s$. This is because $N_{T} = (252 \text{ trading days}) \times (6.5 \text{ trading hours per trading day}) = 1{,}638$. (A common mistake is to compute $N_{T}$ as $252 \times 24 = 6{,}048$.)
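The annualization rule can be wrapped in a small helper. This is a sketch only: the function name `annualized_sharpe` and the four sample monthly returns are made up, and the sample standard deviation (`ddof=1`) is my choice, picked because it is what Excel's STDEV and MATLAB's std compute. NumPy's `np.std` defaults to the population standard deviation (`ddof=0`), which gives a slightly higher ratio on the same data.

```python
import numpy as np


def annualized_sharpe(excess_returns, periods_per_year):
    """sqrt(N_T) times the per-period Sharpe ratio.

    excess_returns   : returns measured over one trading period T
    periods_per_year : N_T, the number of such periods in a year
    """
    r = np.asarray(excess_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)


# N_T for common choices of the trading period T
MONTHLY = 12
DAILY = 252
HOURLY_NYSE = 252 * 6.5  # 1,638 trading hours, not 252 * 24

# Hypothetical monthly excess returns
monthly = [0.01, 0.02, -0.01, 0.03]
monthly_sharpe = np.mean(monthly) / np.std(monthly, ddof=1)
annual_sharpe = annualized_sharpe(monthly, MONTHLY)
print(annual_sharpe / monthly_sharpe)  # the annualization factor, sqrt(12)
```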
Example 3.4: Calculating Sharpe Ratio for Long-Only Versus Market-Neutral Strategies
Let's calculate the Sharpe ratio of a trivial long-only strategy for IGE: buying and holding a share since the close of November 26, 2001, and selling it at the close of November 14, 2007. Assume the average risk-free rate during this period is 4 percent per annum in this example. You can download the daily prices from Yahoo! Finance, specifying the date range desired, and store them as an Excel file (or as a comma-separated file for use in the R code), which you can call IGE.xls. The next steps can be done in either Excel or MATLAB:
Using Excel
The file should have columns A-G already from the download.
Sort all the columns in ascending order of Date (use the Data-Sort function, choose the "Expand the selection" radio button, and choose the "Ascending" as well as the "My data has Header row" radio buttons).
In cell H3, type "=(G3-G2)/G2". This is the daily return.
Double-clicking the little black dot at the lower right corner of cell H3 will populate the entire column H with daily returns of IGE.
For clarity, you can type "Dailyret" in the header cell H1.
In cell I3, type "=H3-0.04/252", which is the excess daily return, assuming a 4 percent per annum risk-free rate and 252 trading days in a year.
Double-clicking the little black dot at the lower right corner of cell I3 will populate the entire column I with excess daily returns.
For clarity, type "Excess Dailyret" in the header cell I1.
In cell I1506 (the last row in the next column), type "=SQRT(252)*AVERAGE(I3:I1505)/STDEV(I3:I1505)".
The number displayed in cell I1506, which should be "0.789317538," is the Sharpe ratio of this buy-and-hold strategy.
The finished spreadsheet is available at my website at epchan.com/book/example3_4.xls.
Using MATLAB
% make sure previously defined variables are erased.
clear;
% read a spreadsheet named "IGE.xls" into MATLAB.
[num, txt]=xlsread('IGE');
% the first column (starting from the second row)
% contains the trading days in format mm/dd/yyyy.
tday=txt(2:end, 1);
% convert the format into yyyymmdd.
tday=datestr(datenum(tday, 'mm/dd/yyyy'), 'yyyymmdd');
% convert the date strings first into cell arrays and
% then into numeric format.
tday=str2double(cellstr(tday));
% the last column contains the adjusted close prices.
cls=num(:, end);
% sort tday into ascending order.
[tday sortIndex]=sort(tday, 'ascend');
% sort cls into ascending order of dates.
cls=cls(sortIndex);
% daily returns
dailyret=(cls(2:end)-cls(1:end-1))./cls(1:end-1);
% excess daily returns assuming risk-free rate of 4%
% per annum and 252 trading days in a year
excessRet=dailyret - 0.04/252;
% the output should be 0.7893
sharpeRatio=sqrt(252)*mean(excessRet)/std(excessRet)
This MATLAB code is also available for download at my website (epchan.com/book/example3_4.m).
Using Python
This is the code that can be run in a Jupyter notebook. The code is available for download at epchan.com/book/example3_4.ipynb
# Calculating Sharpe Ratio for Long-Only Vs Market Neutral Strategies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# First part of example
df=pd.read_excel('IGE.xls')
df.sort_values(by='Date', inplace=True)
dailyret=df.loc[:, 'Adj Close'].pct_change() # daily returns
# excess daily returns = strategy returns - financing cost,
# assuming risk-free rate of 4% per annum
excessRet=dailyret-0.04/252
sharpeRatio=np.sqrt(252)*np.mean(excessRet)/np.std(excessRet)
sharpeRatio
0.789580250130583
Using R
The code is available for download at epchan.com/book/example3_4.R
library('zoo')
data1 <- read.csv("IGE.csv") # Comma delimited
data_sort <- data1[order(as.Date(data1[,1], '%m/%d/%Y')),] # sort in ascending order of dates (1st column of data)
adjcls <- data_sort[,'Adj.Close'] # read.csv renames the "Adj Close" column to "Adj.Close"
adjcls[ is.nan(adjcls) ] <- NA
mycls <- na.fill(adjcls, type="locf", nan=NA, fill=NA)
dailyret <- diff(mycls)/mycls[1:(length(mycls)-1)]
excessRet <- dailyret - 0.04/252
sharpeRatio <- sqrt(252)*mean(excessRet, na.rm = TRUE)/sd(excessRet, na.rm = TRUE)
sharpeRatio
Now let's calculate the Sharpe ratio of a long-short market-neutral strategy. In fact, it is a very trivial twist of the buy-and-hold strategy above: at the time we bought IGE, let's suppose we just shorted an equal dollar amount of Standard & Poor's depositary receipts (SPY) as a hedge, and closed both positions at the same time in November 2007. You can also download SPY from Yahoo! Finance and store it in a file SPY.xls. You can go through very similar steps as previously covered in both Excel and MATLAB, and I will leave it as an exercise for the reader to perform the exact steps:
Using Excel
Sort the columns in SPY.xls in ascending order of date.
Copy column G (Adj Close) in SPY.xls and paste it onto column J of IGE.xls.
Check that column J has the same number of rows as columns A-I. If not, you have a different set of dates; make sure you download the matching date range from Yahoo!.
Perform the same steps as previously covered to calculate the daily returns in column K.
For clarity, type "dailyretSPY" as the header in column K.
In column L, compute the net returns for the hedged strategy as the difference between columns H and K divided by 2. (Divide by 2 because we now have twice the capital.)
In cell L1506, compute the Sharpe ratio of this hedged strategy. You should get "0.783681."
Using MATLAB
% Assume this is a continuation of the above MATLAB
% code.
% Insert your own code here to retrieve data from
% SPY.xls just as done previously.
% Name the array that contains the daily returns of
% SPY "dailyretSPY".
% net daily returns
% (divide by 2 because we now have twice as much capital.)
netRet=(dailyret - dailyretSPY)/2;
% the output should be 0.7837.
sharpeRatio=sqrt(252)*mean(netRet)/std(netRet)
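Using Python
# Second part of example
df2=pd.read_excel('SPY.xls')
df=pd.merge(df, df2, on='Date', suffixes=('_IGE', '_SPY'))
df['Date']=pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)
dailyret=df[['Adj Close_IGE', 'Adj Close_SPY']].pct_change() # daily returns
# net daily returns of the hedged position, mirroring the MATLAB step above
# (divide by 2 because we now have twice as much capital)
netRet=(dailyret['Adj Close_IGE']-dailyret['Adj Close_SPY'])/2
sharpeRatio=np.sqrt(252)*netRet.mean()/netRet.std()
sharpeRatio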
Using R
This code can be downloaded as example3_4.R:
# 2nd part of example 3.4
data2 <- read.delim("SPY.txt") # Tab-delimited
# sort in ascending order of dates (1st column of data)
data_sort <- data2[order(as.Date(data2[,1], '%m/%d/%Y')),]
# the last column holds the adjusted close prices
adjcls <- data_sort[,ncol(data_sort)]
adjcls[ is.nan(adjcls) ] <- NA
mycls <- na.fill(adjcls, type="locf", nan=NA, fill=NA)
dailyretSPY <- diff(mycls)/mycls[1:(length(mycls)-1)]
excessRet <- (dailyret - dailyretSPY)/2
informationRatio <- sqrt(252)*mean(excessRet, na.rm = TRUE)/sd(excessRet, na.rm = TRUE)
informationRatio
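The hedged calculation can also be condensed into one small function. What follows is only a minimal sketch: the function name hedged_sharpe and the short price lists are made up for illustration and are not part of the downloadable example files. As in the MATLAB code above, no risk-free rate is subtracted from the net returns.

```python
import numpy as np

def hedged_sharpe(long_prices, short_prices, periods_per_year=252):
    """Annualized Sharpe ratio of an equal-dollar long-short position."""
    long_prices = np.asarray(long_prices, dtype=float)
    short_prices = np.asarray(short_prices, dtype=float)
    # simple daily returns of each leg
    long_ret = long_prices[1:] / long_prices[:-1] - 1
    short_ret = short_prices[1:] / short_prices[:-1] - 1
    # divide by 2 because capital is committed to both legs
    net_ret = (long_ret - short_ret) / 2
    return np.sqrt(periods_per_year) * net_ret.mean() / net_ret.std(ddof=1)

# hypothetical prices, for illustration only
ige = [100.0, 101.0, 100.5, 102.0, 103.0]
spy = [200.0, 201.0, 201.5, 202.0, 203.5]
sharpe_long_ige = hedged_sharpe(ige, spy)
sharpe_long_spy = hedged_sharpe(spy, ige)
```

Swapping the two legs flips the sign of every net return, so the two ratios are equal in magnitude and opposite in sign.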
Example 3.5: Calculating Maximum Drawdown and Maximum Drawdown Duration
We shall continue the preceding long-short market-neutral example in order to illustrate the calculation of maximum drawdown and maximum drawdown duration. The first step in this calculation is to calculate the “high watermark” at the close of each day, which is the maximum cumulative return of the strategy up to that time. (Using the cumulative return curve to calculate high watermark and drawdown is equivalent to using the equity curve, since equity is nothing more than the initial investment times 1 plus the cumulative return.) From the high watermark, we can calculate the drawdown, the maximum drawdown, and maximum drawdown duration:
Using Excel
In cell M3, type “=L3”.
In cell M4, type “=(1+M3)*(1+L4)-1”. This is the cumulative compounded return of the strategy up to that day. Populate the entire column M with the cumulative compounded returns of the strategy and erase the last cell of the column. Name this column Cumret.
In cell N3, type “=M3”.
In cell N4, type “=MAX(N3, M4)”. This is the high watermark up to that day. Populate the entire column N with the running high watermark of the strategy and erase the last cell of the column. Name this column High watermark.
In cell O3, type “=(1+M3)/(1+N3)-1”. This is the drawdown at that day’s close. Populate the entire column O with the drawdowns of the strategy.
In cell O1506, type “=MIN(O3:O1505)”. This is the maximum drawdown of the strategy. (Drawdowns computed this way are zero or negative, so the maximum drawdown is the most negative value.) It should have a value of about -0.0953, that is, a maximum drawdown of 9.53 percent.
In cell P3, type “=IF(O3=0,0,P2+1)”. This is the duration of the current drawdown. Populate the entire column P with the drawdown durations of the strategy and erase the last cell of the column.
In cell P1506, type “=MAX(P3:P1505)”. This is the maximum drawdown duration of the strategy. It should have a value of 497, that is, a maximum drawdown duration of 497 trading days.
Using MATLAB
% Assume this is a continuation of the above MATLAB
% code.
% cumulative compounded returns
cumret=cumprod(1+netRet)-1;plot(cumret);
[maxDrawdown maxDrawdownDuration]=...
calculateMaxDD (cumret);
% maximum drawdown. Output should be -0.0953
maxDrawdown
% maximum drawdown duration. Output should be 497.
maxDrawdownDuration
Notice the code fragment above calls a function “calculateMaxDD,” which I display below.
function [maxDD maxDDD]=calculateMaxDD(cumret)
% [maxDD maxDDD]=calculateMaxDD(cumret)
% calculation of maximum drawdown and maximum drawdown
% duration based on cumulative compounded returns.
% initialize high watermarks to zero.
highwatermark=zeros (size(cumret));
% initialize drawdowns to zero.
drawdown=zeros(size(cumret));
% initialize drawdown duration to zero.
drawdownduration=zeros(size(cumret));
for t=2:length(cumret)
highwatermark(t) =...
max(highwatermark(t-1), cumret(t));
% drawdown on each day
drawdown(t)=(1+ cumret(t))/(1+ highwatermark(t))-1;
if (drawdown(t)==0)
drawdownduration(t)=0;
else
drawdownduration(t)=drawdownduration(t-1)+1;
end
end
maxDD=min(drawdown); % maximum drawdown (drawdowns are <= 0, so take the most negative)
% maximum drawdown duration
maxDDD=max(drawdownduration);
The file that contains this function is available as epchan.com/book/calculateMaxDD.m. You can see in Figure 3.1 where the maximum drawdown and maximum drawdown duration occurred in this plot of the cumulative returns.
FIGURE 3.1 Maximum drawdown and maximum drawdown duration for Example 3.4.
Using Python
You will need to download the following calculateMaxDD.py code to your folder first:
import numpy as np
def calculateMaxDD(cumret):
    # =====================================================
    # calculation of maximum drawdown and maximum drawdown
    # duration based on cumulative COMPOUNDED returns.
    # cumret must be a compounded cumulative return.
    # i is the index of the day with maxDD.
    # =====================================================
    highwatermark=np.zeros(cumret.shape)
    drawdown=np.zeros(cumret.shape)
    drawdownduration=np.zeros(cumret.shape)
    for t in np.arange(1, cumret.shape[0]):
        highwatermark[t]=np.maximum(highwatermark[t-1], cumret[t])
        drawdown[t]=(1+cumret[t])/(1+highwatermark[t])-1
        if drawdown[t]==0:
            drawdownduration[t]=0
        else:
            drawdownduration[t]=drawdownduration[t-1]+1
    maxDD, i=np.min(drawdown), np.argmin(drawdown) # drawdown <= 0 always
    maxDDD=np.max(drawdownduration)
    return maxDD, maxDDD, i
Then you can run the rest of the Jupyter notebook:
cumret=np.cumprod(1+netRet)-1
plt.plot(cumret)
from calculateMaxDD import calculateMaxDD
maxDrawdown, maxDrawdownDuration, startDrawdownDay=calculateMaxDD(cumret.values)
maxDrawdown
# -0.09529268047208683
maxDrawdownDuration
# 497.0
startDrawdownDay
# 1223
Using R
You first need to download the function calculateMaxDD.R to your folder.
# calculateMaxDD.R
calculateMaxDD <- function(cumret) {
# Assume compounded cumulative return as input
highwatermark <- rep(0, length(cumret))
highwatermark
drawdown <- rep(0, length(cumret))
drawdownduration <- rep(0, length(cumret))
for (t in 2:length(cumret)) {
highwatermark[t] <- max(highwatermark[t-1],
cumret[t])
drawdown[t] <- (1+cumret[t])/(1+highwatermark[t])-1
if (drawdown[t]==0) {
drawdownduration[t]=0
} else {
drawdownduration[t]=drawdownduration[t-1]+1
}
}
maxDD <- min(drawdown)
maxDDD <- max(drawdownduration)
return(c(maxDD, maxDDD))
}
Then you can run the following code, which is the third part of example3_4.R:
# 3rd part of example 3.4
source('calculateMaxDD.R')
cumret <- cumprod(1+excessRet[!is.nan(excessRet)])-1
plot(cumret)
output <- calculateMaxDD(cumret)
maxDD <- output[1]
maxDD
maxDDD <- output[2]
maxDDD
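The high-watermark logic can also be written without an explicit loop over the watermark. The following is a sketch rather than a replacement for calculateMaxDD: the name max_drawdown and the sample cumulative returns are made up for illustration. It also differs from the loop versions above in one detail, namely that the first observation is included in the running high watermark (which is still floored at zero).

```python
import numpy as np

def max_drawdown(cumret):
    """Return (maximum drawdown, maximum drawdown duration in periods)."""
    cumret = np.asarray(cumret, dtype=float)
    # running maximum of the cumulative return, floored at zero
    hwm = np.maximum(np.maximum.accumulate(cumret), 0.0)
    # drawdown is zero at a new high and negative otherwise
    drawdown = (1 + cumret) / (1 + hwm) - 1
    # count consecutive periods spent below the high watermark
    duration = np.zeros(len(cumret), dtype=int)
    run = 0
    for t, dd in enumerate(drawdown):
        run = 0 if dd == 0 else run + 1
        duration[t] = run
    return drawdown.min(), int(duration.max())

# hypothetical cumulative compounded returns, for illustration only
sample_cumret = [0.10, 0.05, 0.20, 0.08, 0.14, 0.25]
sample_max_dd, sample_max_ddd = max_drawdown(sample_cumret)
```

In this sample the deepest drawdown is the fall from a cumulative return of 0.20 to 0.08, and the longest stretch below a high watermark lasts two periods.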
COMMON BACKTESTING PITFALLS TO AVOID
Backtesting is the process of creating the historical trades given the historical information available at that time, and then finding out what the subsequent performance of those trades is. This process seems easy, given that the trades were made using a computer algorithm in our case, but there are numerous ways in which it can go wrong. Usually, an erroneous backtest would produce a historical performance that is better than what we would have obtained in actual trading. We have already seen how survivorship bias in the data used for backtesting can result in inflated performance. There are, however, other common pitfalls related to how the backtest program is written, or more fundamentally, to how you construct your trading strategy. I will describe two of the most common ones here, with tips on how to avoid them.
Look-Ahead Bias
This error refers to the situation where you use information that became available only after the instant the trade was made. For example, if your trade entry rule reads: “Buy when the stock is within 1 percent of the day’s low,” you have introduced a look-ahead bias in your strategy, because you could not possibly have known what the day’s low was until the market closed that day. Another example: Suppose a model involves a linear regression fit of two price series. If you use the regression coefficients obtained from the entire data set to determine your daily trading signals, you have again introduced look-ahead bias.
How do we avoid look-ahead bias? Use lagged historical data for calculating signals at every opportunity. Lagging a series of data means that you calculate all the quantities like moving averages, highs and lows, or even volume, based on data up to the close of the previous trading period only. (Of course, you needn’t lag the data if your strategy enters only at the close of the period.)
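To make the idea of lagging concrete, here is a minimal pandas sketch. The prices and the three-day moving-average rule are invented for illustration. The point is only that a signal acted on during day t is built from data through the close of day t-1.

```python
import pandas as pd

# hypothetical closing prices, for illustration only
close = pd.Series([10.0, 11.0, 12.0, 11.0, 13.0, 14.0])

# 3-day moving average that includes each day's own close
ma_today = close.rolling(3).mean()

# 1.0 when the close is above its moving average, else 0.0;
# this is known only once that day's close is known
raw_signal = (close > ma_today).astype(float)

# lag by one period: the signal usable during day t depends
# only on data up to the close of day t-1
tradable_signal = raw_signal.shift(1)
```

Using raw_signal instead of tradable_signal to enter before the close would be exactly the look-ahead error described above.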
Look-ahead bias is easier to avoid using Excel or other WYSIWYG programs than using MATLAB, Python, or R. This is because it is easy to align all the different columns of data in Excel and ensure that the formula in each cell is computed based on the rows above the current row. It would be visually obvious when one is using the current day’s data in generating signals, given the cell-highlighting functionality in Excel. (Double-clicking a cell with a formula will highlight the cells of data this formula utilizes.) With MATLAB, Python, or R, you have to be more careful and remember to run a lag function on certain series used for signal generation.
Even with all the care and caution that goes into creating a backtest program without look-ahead bias, sometimes we may still let some of it slip in. Some look-ahead bias is quite subtle in nature and not easy to avoid, especially if you are using MATLAB, Python, or R. It is best to do a final checkup of your backtest program using this method: Run the program using all your historical data; generate and save the resulting position file to file A (a position file is the file that contains all the recommended positions generated by the program on each day). Now truncate your historical data so that the most recent portion (say N days) is removed. So if the last day in the original data is T, then the last day in the truncated data should be T - N. N could be 10 days to 100 days. Now run the backtest program again using the truncated data and save the resulting positions into a new file B. Truncate the most recent N rows of the positions file A so that both A and B have the same number of rows (days) in them, and the last day in both file A and B should be T - N. Finally, check if A and B are identical in their positions. If not, you have a look-ahead bias in your backtest program that you must find and correct, because the discrepancies in positions mean that you are inadvertently using the truncated part of the historical data (the part that lies ahead of day T - N) in determining the positions in file A. I will illustrate this somewhat convoluted procedure at the end of Example 3.6.
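The truncation check can be sketched in a few lines of Python. Everything here is illustrative: the function has_look_ahead_bias, the two toy strategies, and the price series are assumptions made for the sake of the example, and in-memory arrays stand in for files A and B.

```python
import numpy as np

def has_look_ahead_bias(strategy, prices, cutoff=60):
    """Compare positions from the full history with positions from a
    history whose last `cutoff` days have been removed."""
    prices = np.asarray(prices, dtype=float)
    positions_a = np.asarray(strategy(prices), dtype=float)            # "file A"
    positions_b = np.asarray(strategy(prices[:-cutoff]), dtype=float)  # "file B"
    positions_a = positions_a[:-cutoff]  # drop the last `cutoff` rows of A
    return not np.array_equal(positions_a, positions_b, equal_nan=True)

def biased_strategy(prices):
    # uses the mean of the WHOLE series, including future prices
    return np.where(prices < prices.mean(), 1.0, -1.0)

def clean_strategy(prices):
    # uses only prices up to the previous day
    pos = np.zeros(len(prices))
    for t in range(1, len(prices)):
        pos[t] = 1.0 if prices[t - 1] < prices[:t].mean() else -1.0
    return pos

# hypothetical steadily rising prices, for illustration only
toy_prices = np.arange(1.0, 301.0)
biased_flag = has_look_ahead_bias(biased_strategy, toy_prices)
clean_flag = has_look_ahead_bias(clean_strategy, toy_prices)
```

The biased strategy is flagged because removing the last 60 days changes the full-sample mean, and with it some earlier positions. The clean strategy produces the same positions either way.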
Data-Snooping Bias
In Chapter 2, I mentioned data-snooping bias: the danger that backtest performance is inflated relative to the future performance of the strategy because we have overoptimized the parameters of the model based on transient noise in the historical data. Data-snooping bias is pervasive in the business of predictive statistical models of historical data, but is especially serious in finance because of the limited amount of independent data we have. High-frequency data, while in abundant supply, is useful only for high-frequency models. And while we have stock market data stretching back to the early parts of the twentieth century, only data within the past 10 years are really suitable for building predictive models. Furthermore, as discussed in Chapter 2, regime shifts may render even data that are just a few years old obsolete for backtesting purposes. The less independent data you have, the fewer adjustable parameters you should employ in your trading model.
As a rule of thumb, I would not employ more than five parameters, including quantities such as entry and exit thresholds, holding period, or the lookback period used in computing moving averages. Furthermore, not all data-snooping bias is due to the optimization of parameters. Numerous choices one makes in creating a trading model can be affected by repeated backtesting on the same data set: decisions such as whether to enter at the open or close, whether to hold the positions overnight, whether to trade large-cap or mid-cap stocks. Often, these qualitative decisions are made to optimize the backtest performance, but they may not be optimal going forward. Bailey et al. developed a metric called “Deflated Sharpe Ratio” to take into account how many times you have tweaked a backtest to obtain that amazing Sharpe ratio. The more tweaks you have done, the more deflated your true (i.e., expected live trading) Sharpe ratio will be relative to your backtest Sharpe ratio. The formula is a bit complicated to display here, so see Bailey (2014).
It is almost impossible to completely eliminate data-snooping bias as long as we are building data-driven models. However, there are ways to mitigate the bias.
Sample Size
The most basic safeguard against data-snooping bias is to ensure that you have a sufficient amount of backtest data relative to the number of free parameters you want to optimize. There are some rigorous mathematical results by Bailey et al. (Bailey, 2012) on this minimum backtest length. The idea is that any backtest’s Sharpe ratio is only an estimate of the strategy’s true (i.e., expected live trading) Sharpe ratio if the amount of data used is finite. If you want to be confident (statistically speaking) that your strategy’s “true” Sharpe ratio will be equal to or greater than some desired number, you will need to ensure that the backtest length is equal to or greater than some minimum, and you will also need to ensure that the backtest Sharpe ratio is higher than the desired true Sharpe ratio. There are a few useful estimates:
If you want to be statistically confident (at the 95 percent level) that your true Sharpe ratio is equal to or greater than 0, you need a backtest Sharpe ratio of 1 and a sample size of 681 data points (e.g., 2.71 years of daily data).
The higher the backtest Sharpe ratio, the smaller the sample size needed. If your backtest Sharpe ratio is 2 or more, then you need only 174 data points (0.69 years of daily data) to be confident that your true Sharpe ratio is equal to or greater than 0.
If you want to be confident that your true Sharpe ratio is equal to or greater than 1, then you need a backtest Sharpe ratio of at least 1.5 and a sample size of 2,739 (10.87 years of daily data).
Note that these results apply not only to the backtest length, but also to the out-of-sample (paper trading) length. I will discuss out-of-sample testing next.
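Estimates of this kind can be approximated in code. The sketch below uses the minimum track record length formula as I understand it from the work of Bailey and López de Prado. The attribution, the function name, and the default assumption of normally distributed returns (zero skewness, kurtosis of 3) are mine. The results land within about one percent of the figures quoted above, not exactly on them.

```python
from statistics import NormalDist

def min_track_record_length(sr_backtest, sr_target, confidence=0.95,
                            periods_per_year=252, skew=0.0, kurt=3.0):
    """Approximate number of observations needed to be `confidence` sure
    that the true Sharpe ratio is at least `sr_target`.
    Both Sharpe ratios are annualized."""
    if sr_backtest <= sr_target:
        raise ValueError("backtest Sharpe ratio must exceed the target")
    # convert annualized Sharpe ratios to per-period values
    sr_hat = sr_backtest / periods_per_year ** 0.5
    sr_star = sr_target / periods_per_year ** 0.5
    z = NormalDist().inv_cdf(confidence)
    # variance adjustment for skewness and kurtosis of returns
    variance_term = 1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat ** 2
    return 1 + variance_term * (z / (sr_hat - sr_star)) ** 2

n_sharpe1_vs_0 = min_track_record_length(1.0, 0.0)
n_sharpe2_vs_0 = min_track_record_length(2.0, 0.0)
n_sharpe15_vs_1 = min_track_record_length(1.5, 1.0)
```

Note how quickly the required length grows as the gap between the backtest Sharpe ratio and the desired true Sharpe ratio shrinks: the gap enters the formula as an inverse square.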
Out-of-Sample Testing
Divide your historical data into two parts. Save the second (more recent) part of the data for out-of-sample testing. When you build the model, optimize the parameters as well as other qualitative decisions on the first portion (called the training set), but test the resulting model on the second portion (called the test set). The minimum size of the out-of-sample test set is determined by the same mathematical result in the previous section. It is clearly advantageous to have a strategy with a high backtest Sharpe ratio, because you will need to paper trade for a shorter time to determine whether that great backtest result is real!
Ideally, the set of optimal parameters and decisions for the first part of the backtest period is also the optimal set for the second period, but things are rarely this perfect. The performance on the second part of the data should at least be reasonable. Otherwise, the model has data-snooping bias built into it, and one way to cure it is to simplify the model and eliminate some parameters.
A more rigorous (albeit more computationally intensive) method of out-of-sample testing is to use moving optimization of the parameters. In this case, the parameters themselves are constantly adapting to the changing historical data, and data-snooping bias with respect to parameters is eliminated. (See the sidebar on parameterless trading models.)
A portfolio manager whom I used to work for liked to proudly proclaim that his trading models have “no free parameters.” In keeping with the tradition of secrecy in our industry, he would not divulge his technique further.
Lately, I have begun to understand what a trading model with no free parameters means. It doesn’t mean that it does not contain, for example, any lookback period for calculating trends, or thresholds for entry or exit. I think that would be impossible. It just means that all such parameters are dynamically optimized in a moving lookback window. This way, if you ask, “Does the model have a fixed profit cap?” the trader can honestly reply: “No, profit cap is not an input parameter. It is determined by the model itself.”
The advantage of a parameterless trading model is that it minimizes the danger of overfitting the model to multiple input parameters (the so-called “data-snooping bias”). So the backtest performance should be much closer to the actual forward performance.
(Note that parameter optimization does not necessarily mean picking one best set of parameters that give the best backtest performance. Often, it is better to make a trading decision based on some kind of averages over different sets of parameters.)
A much more intelligent way to optimize parameters than just using a moving lookback period parameter optimization or averaging over different parameter values is a novel technique we developed called Conditional Parameter Optimization (CPO). It makes use of machine learning to determine the optimal parameters to use for each trade or each day. We will discuss that in Example 7.1.
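A bare-bones version of moving parameter optimization can be sketched as follows. This is not the CPO technique and not any particular trader’s method. The function walk_forward_select, the trailing-mean selection rule, and the toy P&L matrix are all assumptions chosen to keep the illustration short. The input is the daily P&L that each candidate parameter value would have produced, computed without look-ahead.

```python
import numpy as np

def walk_forward_select(pnl_by_param, window):
    """For each day t >= window, pick the parameter with the best mean
    P&L over the previous `window` days and book its P&L on day t."""
    pnl_by_param = np.asarray(pnl_by_param, dtype=float)
    n_params, n_days = pnl_by_param.shape
    chosen = np.full(n_days, -1)         # -1 marks days with no choice yet
    realized = np.full(n_days, np.nan)
    for t in range(window, n_days):
        trailing = pnl_by_param[:, t - window:t]   # excludes day t itself
        best = int(np.argmax(trailing.mean(axis=1)))
        chosen[t] = best
        realized[t] = pnl_by_param[best, t]
    return chosen, realized

# toy example: parameter 0 works in the first half, parameter 1 in the second
toy_pnl = np.array([[1, 1, 1, 1, -1, -1, -1, -1],
                    [-1, -1, -1, -1, 1, 1, 1, 1]], dtype=float)
toy_chosen, toy_realized = walk_forward_select(toy_pnl, window=2)
```

In the toy example the selection switches to the second parameter only after its trailing P&L has overtaken the first, so the realized P&L includes the losses suffered during the switch. That lag is the price of never using future data.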
The ultimate out-of-sample testing is familiar to many traders, and it is called paper trading. Running the model on actual unseen data is the most reliable way to test it (short of actually trading it). Paper trading not only allows you to perform a truly honest out-of-sample test; it often allows you to discover look-ahead errors in your programs, as well as making you aware of various operational issues. I will discuss paper trading in Chapter 5.
If the strategy that you are testing comes from a published source, and you are just conducting a backtest to verify that the results are correct, then the entire period between the time of publication and the time that you tested the strategy is a genuine out-of-sample period. As long as you do not optimize the parameters of the published model on the out-of-sample period, this period is as good as paper trading the strategy.
Example 3.6: Pair Trading of GLD and GDX
This example will illustrate how to separate the data into a training set and a test set. We will backtest a pair-trading strategy and optimize its parameters on the training set and look at the effect on the test set.
GLD versus GDX is a good candidate for pair trading because GLD reflects the spot price of gold, and GDX is a basket of gold-mining stocks. It makes intuitive sense that their prices should move in tandem. I have discussed this pair of ETFs extensively on my blog in connection with cointegration analysis (see, e.g., Chan, 2006b). Here, however, I will defer until Chapter 7 the cointegration analysis on the training set, which demonstrates that the spread formed by long GLD and short GDX is mean reverting. Instead, we will perform a regression analysis on the training set to determine the hedge ratio between GLD and GDX, and then define entry and exit thresholds for a pair-trading strategy. We will see how optimizing these thresholds on the training set changes the performance on the test set. (This program is available at epchan.com/book/example3_6.m. The data files are available as GDX.xls and GLD.xls. This program uses a lag1 function, which will lag the time series by one time period. It is included at epchan.com/book as well. It also uses a function “ols” for linear regression, which is part of a free package downloaded from spatialeconometrics.com.)
Using MATLAB
% make sure previously defined variables are erased.
clear;
% read a spreadsheet named "GLD.xls" into MATLAB.
[num, txt]=xlsread('GLD');
% the first column (starting from the second row) is
% the trading days in format mm/dd/yyyy.
tday1=txt(2:end, 1);
% convert the format into yyyymmdd 20041118-20071130.
tday1=datestr(datenum(tday1, 'mm/dd/yyyy'), 'yyyymmdd');
% convert the date strings first into cell arrays and
% then into numeric format.
tday1=str2double(cellstr(tday1));
% the last column contains the adjusted close prices.
adjcls1=num(:, end);
% read a spreadsheet named "GDX.xls" into MATLAB.
[num, txt]=xlsread('GDX');
% the first column (starting from the second row) is
% the trading days in format mm/dd/yyyy.
tday2=txt(2:end, 1);
% convert the format into yyyymmdd 20060523-20071130.
tday2=datestr(datenum(tday2, 'mm/dd/yyyy'), 'yyyymmdd');
% convert the date strings first into cell arrays and
% then into numeric format.
tday2=str2double(cellstr(tday2));
% the last column contains the adjusted close prices.
adjcls2=num(:, end);
% find the intersection of the two data sets, and sort
% them in ascending order
[tday, idx1, idx2]=intersect(tday1, tday2);
cl1=adjcls1(idx1);
cl2=adjcls2(idx2);
% define indices for training set
trainset=1:252;
% define indices for test set
testset=trainset(end)+1:length(tday);
% determines the hedge ratio on the trainset
% use regression function
results=ols(cl1(trainset), cl2(trainset));
hedgeRatio=results.beta; % 1.6368
% spread = GLD - hedgeRatio*GDX
spread=cl1-hedgeRatio*cl2;
plot(spread(trainset));
figure;
plot(spread(testset));
figure;
% mean of spread on trainset
spreadMean=mean(spread(trainset));
% standard deviation of spread on trainset
spreadStd=std(spread(trainset));
% z-score of spread
zscore=(spread - spreadMean)./spreadStd;
% buy spread when its value drops below 2 standard
% deviations.
longs=zscore<=-2;
% short spread when its value rises above 2 standard
% deviations.
shorts=zscore>=2;
% exit any spread position when its value is within 1
% standard deviation of its mean.
exitLongs=zscore>=-1;
exitShorts=zscore<=1;
% initialize long positions array
positionsL=zeros(length(tday), 2);
% initialize short positions array
positionsS=zeros(length(tday), 2);
% short entries
positionsS(shorts, :)=repmat([-1 1], [length(find(shorts)) 1]);
% long entries
positionsL(longs, :)=repmat([1 -1], [length(find(longs)) 1]);
% exit positions
positionsL(exitLongs, :)=zeros(length(find(exitLongs)), 2);
positionsS(exitShorts, :)=zeros(length(find(exitShorts)), 2);
positions=positionsL+positionsS;
% ensure existing positions are carried forward unless
% there is an exit signal
positions=fillMissingData(positions);
% combine the 2 price series
cl=[cl1 cl2];
dailyret=(cl - lag1(cl))./lag1(cl);
pnl=sum(lag1(positions).*dailyret, 2);
% the Sharpe ratio on the training set should be about 2.3
sharpeTrainset=sqrt(252)*mean(pnl(trainset(2:end)))./std(pnl(trainset(2:end)))
% the Sharpe ratio on the test set should be about 1.5
sharpeTestset=sqrt(252)*mean(pnl(testset))./std(pnl(testset))
% sharpeTrainset =
%
% 2.0822
%
%
% sharpeTestset =
%
% 1.4887
plot(cumsum(pnl(testset)));
% save positions file for checking look-ahead bias.
save example3_6_positions positions;
In file lag1.m:
function y=lag1(x)
% y=lag1(x)
if (isnumeric(x))
    % populate the first entry with NaN
    y=[NaN(1,size(x,2)); x(1:end-1, :)];
elseif (ischar(x))
    % populate the first entry with ''
    y=[repmat('',[1 size(x,2)]); x(1:end-1, :)];
else
    error('Can only be numeric or char array');
end
So this pair-trading strategy has excellent Sharpe ratios on both the training set and the test set. Therefore, this strategy can be considered free of data-snooping bias. However, there may be room for improvement. Let's see what happens if we change the entry thresholds to 1 standard deviation and the exit threshold to 0.5 standard deviation. In this case, the Sharpe ratio on the training set increases to 2.9 and the Sharpe ratio on the test set increases to 3.0. So, clearly, this set of thresholds is better.
Often, however, optimizing the parameters on the training set may decrease the performance on the test set. In this situation, you should choose a set of parameters that results in good (but perhaps not the best) performance on both training and test sets.
I have not incorporated transaction costs (which I discuss in the next section) into this analysis. You can try to add that as an exercise. Since this strategy doesn’t trade very frequently, transaction costs do not have a big impact on the resulting Sharpe ratio.
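One possible way to approach the transaction-cost exercise is sketched below in Python. The function name, the one-way cost of 5 basis points, and the tiny position and return arrays are all assumptions for illustration. The sketch also assumes that the first row of daily returns is finite (for example, zero) and that the strategy starts flat.

```python
import numpy as np

def pnl_after_costs(positions, dailyret, one_way_cost=0.0005):
    """Daily P&L of dollar positions, net of a proportional cost
    charged on every dollar traded."""
    positions = np.asarray(positions, dtype=float)
    dailyret = np.asarray(dailyret, dtype=float)
    # yesterday's positions earn today's returns; start from flat
    lagged = np.vstack([np.zeros((1, positions.shape[1])), positions[:-1]])
    gross = np.sum(lagged * dailyret, axis=1)
    # dollars traded each day = absolute change in each leg's position
    turnover = np.abs(positions - lagged).sum(axis=1)
    return gross - one_way_cost * turnover

# hypothetical two-leg example: enter on day 1, exit on day 3
toy_positions = [[0, 0], [1, -1], [1, -1], [0, 0]]
toy_dailyret = [[0.0, 0.0], [0.01, 0.0], [0.02, -0.01], [0.0, 0.01]]
toy_net_pnl = pnl_after_costs(toy_positions, toy_dailyret, one_way_cost=0.001)
```

Because costs are charged only on days when positions change, a strategy that trades infrequently gives up little of its gross P&L.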
To see why this strategy works, just take a look at Figure 7.4 of the spread, which I will discuss in connection with stationarity and cointegration in Chapter 7. You can see that the spread behaves in a highly mean-reverting manner. Hence, buying low and selling high over and over again works well here.
One last check, though, that we should perform before calling this a success: We need to check for any look-ahead bias in the backtest program. Add the following code fragment to the previous MATLAB code after the line “cl2=adjcls2(idx2);”
% number of most recent trading days to cut off
cutoff=60;
% remove the last cutoff number of days.
tday(end-cutoff+1:end, :)=[];
cl1(end-cutoff+1:end, :)=[];
cl2(end-cutoff+1:end, :)=[];
Add the following code fragment to the very end of the previous MATLAB program, replacing the line “save example3_6_positions positions;”
% step two of look-forward-bias check
oldoutput=load('example3_6_positions');
oldoutput.positions(end-cutoff+1:end, :)=[];
% any(any(...)) reduces the 2-column comparison to a single true/false
if (any(any(positions~=oldoutput.positions)))
    fprintf(1, 'Program has look-forward-bias!\n');
end
Save this new program into file “example3_6_1.m” and run it. You will find that the sentence “Program has look-forward-bias” is not printed outthis indicates that our algorithm passed our test. 将此新程序保存到文件“example3_6_1.m”并运行它。你会发现“Program has look-forward-bias”这句话没有被打印出来——这表明我们的算法通过了测试。
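The same truncation test can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's notebook: `compute_positions` is a hypothetical, simplified stand-in for a backtest (it uses one combined exit rule), `peeking_positions` is a deliberately flawed rule added for contrast, and the spread is synthetic.

```python
import numpy as np
import pandas as pd


def compute_positions(spread: pd.Series, train_len: int = 252) -> pd.Series:
    """Hypothetical backtest stand-in: +1 long spread, -1 short, 0 flat.

    The z-score uses only the first train_len observations, so no row
    depends on data that arrives later.
    """
    mean = spread.iloc[:train_len].mean()
    std = spread.iloc[:train_len].std()
    zscore = (spread - mean) / std
    pos = pd.Series(np.nan, index=spread.index)
    pos[zscore <= -2] = 1.0        # buy the spread
    pos[zscore >= 2] = -1.0        # short the spread
    pos[zscore.abs() <= 1] = 0.0   # exit near the mean
    return pos.ffill().fillna(0.0)


def peeking_positions(spread: pd.Series) -> pd.Series:
    """Deliberately flawed rule: trades on tomorrow's change in the spread."""
    return np.sign(spread.shift(-1) - spread).fillna(0.0)


def has_look_ahead_bias(position_fn, spread: pd.Series, cutoff: int = 60) -> bool:
    """Rerun on data with the last `cutoff` days removed and compare."""
    full = position_fn(spread)
    truncated = position_fn(spread.iloc[:-cutoff])
    return not full.iloc[:-cutoff].equals(truncated)


# Synthetic spread, for illustration only.
rng = np.random.default_rng(0)
spread = pd.Series(rng.normal(size=400)).cumsum() * 0.1

print(has_look_ahead_bias(compute_positions, spread))
print(has_look_ahead_bias(peeking_positions, spread))
```

The idea is the same as in the MATLAB fragment: positions computed for a given day should not change when later days are removed from the input.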
Using Python
You can download the Python code in the Jupyter notebook example3_6.ipynb.
Pair Trading of GLD and GDX
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
df1=pd.read_excel('GLD.xls')
df2=pd.read_excel('GDX.xls')
df=pd.merge(df1, df2, on='Date', suffixes=('_GLD', '_GDX'))
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)
trainset=np.arange(0, 252)
testset=np.arange(trainset.shape[0], df.shape[0])
# Determine hedge ratio on trainset
model=sm.OLS(df.loc[:, 'Adj Close_GLD'].iloc[trainset], df.loc[:, 'Adj Close_GDX'].iloc[trainset])
results=model.fit()
hedgeRatio=results.params
hedgeRatio
Adj Close_GDX 1.631009
dtype: float64
# spread=GLD - hedgeRatio*GDX
spread=df.loc[:, 'Adj Close_GLD']-hedgeRatio.iloc[0]*df.loc[:, 'Adj Close_GDX']
plt.plot(spread.iloc[trainset])
plt.plot(spread.iloc[testset])
spreadMean=np.mean(spread.iloc[trainset])
spreadMean
0.05219623850035999
spreadStd=np.std(spread.iloc[trainset])
spreadStd
1.944860873496509
df['zscore']=(spread-spreadMean)/spreadStd
df['positions_GLD_Long']=0
df['positions_GDX_Long']=0
df['positions_GLD_Short']=0
df['positions_GDX_Short']=0
df.loc[df.zscore>=2, ['positions_GLD_Short', 'positions_GDX_Short']]=[-1, 1] # Short spread
df.loc[df.zscore<=-2, ['positions_GLD_Long', 'positions_GDX_Long']]=[1, -1] # Buy spread
df.loc[df.zscore<=1, ['positions_GLD_Short', 'positions_GDX_Short']]=0 # Exit short spread
df.loc[df.zscore>=-1, ['positions_GLD_Long', 'positions_GDX_Long']]=0 # Exit long spread
df.ffill(inplace=True) # ensure existing positions are carried forward unless there is an exit signal
positions_Long=df.loc[:, ['positions_GLD_Long', 'positions_GDX_Long']]
positions_Short=df.loc[:, ['positions_GLD_Short', 'positions_GDX_Short']]
positions=np.array(positions_Long)+np.array(positions_Short)
positions=pd.DataFrame(positions)
dailyret=df.loc[:, ['Adj Close_GLD', 'Adj Close_GDX']].pct_change()
pnl=(np.array(positions.shift())*np.array(dailyret)).sum(axis=1)
sharpeTrainset=np.sqrt(252)*np.mean(pnl[trainset[1:]])/np.std(pnl[trainset[1:]])
sharpeTrainset
1.9182982282569077
sharpeTestset=np.sqrt(252)*np.mean(pnl[testset])/np.std(pnl[testset])
sharpeTestset
1.494313761833427
plt.plot(np.cumsum(pnl[testset]))
positions.to_pickle('example3_6_positions')
Using R
You can download the R code as example3_6.R.
library('zoo')
source('calculateReturns.R')
source('calculateMaxDD.R')
source('backshift.R')
data1 <- read.delim("GLD.txt") # Tab-delimited
data_sort1 <- data1[order(as.Date(data1[,1], '%m/%d/%Y')),] # sort in ascending order of dates (1st column of data)
tday1 <- as.integer(format(as.Date(data_sort1[,1], '%m/%d/%Y'), '%Y%m%d'))
adjcls1 <- data_sort1[,ncol(data_sort1)]
data2 <- read.delim("GDX.txt") # Tab-delimited
data_sort2 <- data2[order(as.Date(data2[,1], '%m/%d/%Y')),] # sort in ascending order of dates (1st column of data)
tday2 <- as.integer(format(as.Date(data_sort2[,1], '%m/%d/%Y'), '%Y%m%d'))
adjcls2 <- data_sort2[,ncol(data_sort2)]
# find the intersection of the two data sets
tday <- intersect(tday1, tday2)
adjcls1 <- adjcls1[tday1 %in% tday]
adjcls2 <- adjcls2[tday2 %in% tday]
# define indices for training and test sets
trainset <- 1:252
# parentheses are needed because ":" binds more tightly than "+"
testset <- (length(trainset)+1):length(tday)
# determines the hedge ratio on the trainset
result <- lm(adjcls1 ~ 0 + adjcls2, subset=trainset )
hedgeRatio <- coef(result) # 1.631
spread <- adjcls1-hedgeRatio*adjcls2 # spread = GLD - hedgeRatio*GDX
plot(spread)
dev.new()
plot(spread[trainset])
dev.new()
plot(spread[testset])
# mean of spread on trainset
spreadMean <- mean(spread[trainset]) # 0.05219624
# standard deviation of spread on trainset
spreadStd <- sd(spread[trainset]) # 1.948731
zscore <- (spread-spreadMean)/spreadStd
longs <- zscore <= -2 # buy spread when its value drops below 2 standard deviations.
shorts <- zscore >= 2 # short spread when its value rises above 2 standard deviations.
# exit any spread position when its value is within 1 standard deviation of its mean.
longExits <- zscore >= -1
shortExits <- zscore <= 1
posL <- matrix(NaN, length(tday), 2) # long positions
posS <- matrix(NaN, length(tday), 2) # short positions
# initialize to 0
posL[1,] <- 0
posS[1,] <- 0
posL[longs, 1] <- 1
posL[longs, 2] <- -1
posS[shorts, 1] <- -1
posS[shorts, 2] <- 1
posL[longExits, 1] <- 0
posL[longExits, 2] <- 0
posS[shortExits, 1] <- 0
posS[shortExits, 2] <- 0
# ensure existing positions are carried forward unless there is an exit signal
posL <- zoo::na.locf(posL)
posS <- zoo::na.locf(posS)
positions <- posL + posS
cl <- cbind(adjcls1, adjcls2) # last row is [385,] 77.32 46.36
# daily returns of price series
dailyret <- calculateReturns(cl, 1) # last row is [385,] -0.0122636689 -0.0140365802
pnl <- rowSums(backshift(1, positions)*dailyret)
sharpeRatioTrainset <- sqrt(252)*mean(pnl[trainset], na.rm = TRUE)/sd(pnl[trainset], na.rm = TRUE)
sharpeRatioTrainset # 2.327844
sharpeRatioTestset <- sqrt(252)*mean(pnl[testset], na.rm = TRUE)/sd(pnl[testset], na.rm = TRUE)
sharpeRatioTestset # 1.508212
This code makes use of the function backshift, which you can download as backshift.R.
backshift <- function(mylag, x) {
rbind(matrix(NaN, mylag, ncol(x)),
as.matrix(x[1:(nrow(x)-mylag),]))
}
Sensitivity Analysis
Once you have optimized your parameters as well as various features of your model and have verified that its performance on a test set is still reasonable, vary these parameters or make some small qualitative changes in the features of the model and see how the performance changes on both the training and the test sets. If the drop is so drastic that any parameter set other than the optimal one is unacceptable, the model most likely suffers from data-snooping bias.
There are some variations on your model that are particularly important to try: the various ways to simplify the model. Do you really need, say, five different conditions to determine whether to make that trade? What if you eliminate the conditions one by one: at what point does the performance on the training set deteriorate to an unacceptable level? And more important: Is there a corresponding decrease in performance on the test set as you eliminate the conditions? In general, you should eliminate as many conditions, constraints, and parameters as possible as long as there is no significant decrease in performance on the test set, even though doing so may decrease performance on the training set. (But you should not add conditions and parameters, or adjust the parameter values, so as to improve performance on the test set: If you do, you have effectively used the test set as your training set and possibly reintroduced data-snooping bias to your model.)
When one has reduced the set of parameters and conditions that trigger a trade to the minimum, and after one has ascertained that small variations in these parameters and conditions do not drastically alter the out-of-sample performance, one should consider dividing the trading capital across the different parameter values and sets of conditions. This averaging over parameters will further help ensure that the actual trading performance of the model will not deviate too much from the backtest result.
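A parameter sweep followed by averaging over parameters might look like the following Python sketch. Everything here is an assumption made for illustration: the spread is a synthetic AR(1) series rather than the GLD-GDX data, the 500-day training window and the entry thresholds 1.5, 2.0, and 2.5 are arbitrary, and `backtest` and `annualized_sharpe` are hypothetical helper names.

```python
import numpy as np


def annualized_sharpe(pnl: np.ndarray) -> float:
    """Annualized Sharpe ratio of daily P&L, assuming 252 trading days."""
    sd = pnl.std()
    return float(np.sqrt(252) * pnl.mean() / sd) if sd > 0 else 0.0


def backtest(spread: np.ndarray, entry: float, exit_: float, train_len: int) -> np.ndarray:
    """Daily P&L (in spread units) of a z-score entry/exit rule."""
    z = (spread - spread[:train_len].mean()) / spread[:train_len].std()
    pos = np.zeros(len(spread))
    for t in range(1, len(spread)):
        pos[t] = pos[t - 1]
        if pos[t] == 1 and z[t] >= -exit_:
            pos[t] = 0       # exit long spread
        if pos[t] == -1 and z[t] <= exit_:
            pos[t] = 0       # exit short spread
        if z[t] <= -entry:
            pos[t] = 1       # buy spread
        elif z[t] >= entry:
            pos[t] = -1      # short spread
    pnl = np.zeros(len(spread))
    pnl[1:] = pos[:-1] * np.diff(spread)  # yesterday's position earns today's change
    return pnl


# Synthetic mean-reverting spread (AR(1)); not real market data.
rng = np.random.default_rng(1)
spread = np.zeros(1000)
for t in range(1, len(spread)):
    spread[t] = 0.9 * spread[t - 1] + rng.normal()

train_len = 500
entries = [1.5, 2.0, 2.5]
pnls = {e: backtest(spread, entry=e, exit_=1.0, train_len=train_len) for e in entries}
for e, pnl in pnls.items():
    print(e, annualized_sharpe(pnl[:train_len]), annualized_sharpe(pnl[train_len:]))

# Equal capital on each parameter value: the blended P&L is the simple average.
blended = np.mean(list(pnls.values()), axis=0)
print("blended test Sharpe:", annualized_sharpe(blended[train_len:]))
```

If the Sharpe ratio collapses for every threshold except one, that is the warning sign described above. The blended series is what you would trade when capital is split equally across the parameter values.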
TRANSACTION COSTS
No backtest performance is realistic without incorporating transaction costs. I discussed the various types of transaction costs (commission, liquidity cost, opportunity cost, market impact, and slippage) in Chapter 7 and have given examples of how to incorporate transaction costs into the backtest of a strategy. It should not surprise you to find that a strategy with a high Sharpe ratio before adding transaction costs can become very unprofitable after adding such costs. I will illustrate this in Example 3.7.
Example 3.7: A Simple Mean-Reverting Model with and without Transaction Costs
Here is a simple mean-reverting model that is attributable to Amir Khandani and Andrew Lo at MIT (available at web.mit.edu/alo/www/Papers/august07.pdf). This strategy is very simple: Buy the stocks with the worst previous one-day returns, and short the ones with the best previous one-day returns. Despite its utter simplicity, this strategy has had great performance since 1995, ignoring transaction costs (it had a Sharpe ratio of 4.47 in 2006).
Our objective here is to find out what would happen to its performance in 2006 if we assume a standard 5-basis-point-per-trade transaction cost. (A trade is defined as a buy or a short, not a round-trip transaction.) This example strategy not only allows us to illustrate the impact of transaction costs, it also illustrates the power of MATLAB, Python, or R in backtesting a model that trades multiple securities: in other words, a typical statistical arbitrage model. Backtesting a model with a large number of symbols over multiple years is often too cumbersome to perform in Excel. But even assuming that you have MATLAB, Python, or R at your disposal, there is still the question of how to retrieve historical data for hundreds of symbols, especially survivorship-bias-free data. Here, we will put aside the question of survivorship bias because of the expensive nature of such data and just bear in mind that whatever performance estimates we obtain are upper bounds on the actual performance of the strategy.
Whenever one wants to backtest a stock selection strategy, the first question is always: Which universe of stocks? The typical starting point is the S&P 500 stock universe, which is the most liquid set of stocks available. The current list of stocks in the S&P 500 is available for download at the Standard & Poor's website (www.standardandpoors.com). Since the constituents of this universe change constantly, the list that you download will be different from mine. For ease of comparison, you can find my list saved as epchan.com/book/SP500_20071121.xls. The easiest way to download historical data for all these stocks is to modify example3_1.m, example3_1.py, or example3_1.R, and save only the Date and Close fields for each stock symbol, into a file SPX_20071123.txt (downloadable from epchan.com/book).
Next, we can use this historical data set to backtest the mean-reverting strategy without transaction cost:
Using MATLAB
clear;
startDate=20060101;
endDate=20061231;
T=readtable('SPX_20071123.txt');
tday=T{:, 1};
cl=T{:, 2:end};
% daily returns
dailyret=(cl-lag1(cl))./lag1(cl);
% equal weighted market index return
marketDailyret=smartmean(dailyret, 2);
% weight of a stock is proportional to the negative
% distance to the market index.
weights=...
-(dailyret-repmat(marketDailyret,[1 size(dailyret,2)]))./ ...
repmat(smartsum(isfinite(cl), 2), [1 size(dailyret, 2)]);
% those stocks that do not have valid prices or
% daily returns are excluded.
weights(~isfinite(cl) | ~isfinite(lag1(cl)))=0;
dailypnl=smartsum(lag1(weights).*dailyret, 2);
% remove pnl outside of our dates of interest
dailypnl(tday < startDate | tday > endDate) = [];
% Sharpe ratio should be about 0.25
sharpe=...
sqrt(252)*smartmean(dailypnl, 1)/smartstd(dailypnl, 1)
This file was saved as epchan.com/book/example3_7.m on my website. Notice that the Sharpe ratio in 2006 is only 0.25, not 4.47 as stated by the original authors. This drastically lower performance is due to the use of the large-market-capitalization universe of the S&P 500 in our backtest. If you read the original paper by the authors, you will find that most of the returns are generated by small-cap and microcap stocks.
In this MATLAB program, I have used three new functions: "smartsum," "smartmean," and "smartstd." They are very similar to the usual "sum," "mean," and "std" functions, except they skip all the NaN entries in the data. These functions are very useful in backtesting because a price series for stocks often starts and stops. These files are all available at epchan.com/book.
"smartsum.m"
function y = smartsum(x, dim)
%y = smartsum(x, dim)
%Sum along dimension dim, ignoring NaN.
hasData=isfinite(x);
x(~hasData)=0;
y=sum(x,dim);
y(all(~hasData, dim))=NaN;
"smartmean.m"
function y = smartmean(x, dim)
% y = smartmean(x, dim)
% Mean value along dimension dim, ignoring NaN.
hasData=isfinite(x);
x(~hasData)=0;
y=sum(x,dim)./sum(hasData, dim);
y(all(~hasData, dim))=NaN; % set y to NaN if all entries are NaNs.
"smartstd.m"
function y = smartstd(x, dim)
%y = smartstd(x, dim)
% std along dimension dim, ignoring NaN and Inf
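For readers following along in Python, the NaN-skipping behavior of smartsum and smartmean can be approximated with NumPy as in the sketch below. These are hypothetical ports written for illustration, not files from the author's website. Note that `np.nansum` alone would return 0 rather than NaN for an all-NaN row, which is why the helpers handle that case explicitly. For the standard deviation, `np.nanstd` is the closest NumPy analogue. Whether it should normalize by N or N-1 to match the author's smartstd cannot be recovered from the listing above.

```python
import numpy as np


def smartsum(x, axis):
    """Sum along axis, ignoring non-finite entries; NaN if a slice has no valid data."""
    x = np.asarray(x, dtype=float)
    has_data = np.isfinite(x)
    total = np.where(has_data, x, 0.0).sum(axis=axis)
    return np.where(has_data.any(axis=axis), total, np.nan)


def smartmean(x, axis):
    """Mean along axis, ignoring non-finite entries; NaN if a slice has no valid data."""
    x = np.asarray(x, dtype=float)
    has_data = np.isfinite(x)
    count = has_data.sum(axis=axis)
    total = np.where(has_data, x, 0.0).sum(axis=axis)
    # np.maximum avoids a divide-by-zero warning; those slots are replaced by NaN.
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)


# Row 0 has two valid prices; row 1 has none.
prices = np.array([[1.0, np.nan, 3.0],
                   [np.nan, np.nan, np.nan]])
row_sums = smartsum(prices, axis=1)    # row 0 sums to 4.0, row 1 is NaN
row_means = smartmean(prices, axis=1)  # row 0 averages to 2.0, row 1 is NaN
print(row_sums, row_means)
```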
Now, continuing with our backtest, let's see what happens if we deduct a 5-basis-point transaction cost for every trade.
% daily pnl with transaction costs deducted
onewaytcost=0.0005; % assume 5 basis points
% remove weights outside of our dates of interest
weights(tday < startDate | tday > endDate, :) = [];
% transaction costs are only incurred when
% the weights change
dailypnlminustcost=...
dailypnl - smartsum(abs(weights-lag1(weights)), 2).*onewaytcost;
% Sharpe ratio should be about -3.19
sharpeminustcost=...
sqrt(252)*smartmean(dailypnlminustcost, 1)/...
smartstd(dailypnlminustcost, 1)
The strategy is now very unprofitable!
Using Python
This file was saved as epchan.com/book/example3_7.ipynb
Simple Mean-Reverting Model with and without Transaction Costs
import numpy as np
import pandas as pd
startDate=20060101
endDate=20061231
df=pd.read_table('SPX_20071123.txt')
df['Date']=df['Date'].astype('int')
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)
dailyret=df.pct_change()
marketDailyret=dailyret.mean(axis=1)
weights=-(np.array(dailyret)-np.array(marketDailyret).reshape((dailyret.shape[0], 1)))
wtsum=np.nansum(abs(weights), axis=1)
weights[wtsum==0,]=0
wtsum[wtsum==0]=1
weights=weights/wtsum.reshape((dailyret.shape[0],1))
dailypnl=np.nansum(np.array(pd.DataFrame(weights).shift())*np.array(dailyret), axis=1)
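The Python listing above stops before the transaction-cost step. As a rough, self-contained sketch of how that deduction could be done in pandas (this is not the author's notebook code), the cost on each day is the one-way cost times the total absolute change in weights, mirroring the MATLAB fragment. The prices are synthetic, the 20 symbols are hypothetical, and `mean_reversion_pnl` and `sharpe` are made-up helper names.

```python
import numpy as np
import pandas as pd


def mean_reversion_pnl(prices: pd.DataFrame, oneway_tcost: float = 0.0005):
    """Gross and net daily P&L of the buy-losers/short-winners rule."""
    dailyret = prices.pct_change()
    market = dailyret.mean(axis=1)                    # equal-weighted market return
    weights = -dailyret.sub(market, axis=0)           # negative distance to the market
    gross_exposure = weights.abs().sum(axis=1).replace(0, np.nan)
    weights = weights.div(gross_exposure, axis=0).fillna(0.0)  # |weights| sum to 1 per day
    gross = (weights.shift() * dailyret).sum(axis=1)  # yesterday's weights, today's returns
    turnover = weights.diff().abs().sum(axis=1)       # total weight traded each day
    net = gross - oneway_tcost * turnover
    return gross, net


def sharpe(pnl: pd.Series) -> float:
    """Annualized Sharpe ratio, assuming 252 trading days and zero risk-free rate."""
    return float(np.sqrt(252) * pnl.mean() / pnl.std())


# Synthetic prices for 20 hypothetical symbols; not S&P 500 data.
rng = np.random.default_rng(0)
rets = rng.normal(0, 0.01, size=(252, 20))
prices = pd.DataFrame(100 * np.exp(np.cumsum(rets, axis=0)))

gross, net = mean_reversion_pnl(prices)
print(sharpe(gross), sharpe(net))
```

Because this strategy rebalances every position every day, the turnover term is large. That is why a cost of only 5 basis points per trade can dominate the gross return.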
STRATEGY REFINEMENT
If a strategy does not deliver superb backtest performance on first trial, there are some common ways to improve it. How to refine a strategy without introducing data-snooping bias, while keeping it simple with few parameters, is more of an art than a science. The guiding principle is the same as that of parameter optimization: Whatever changes you make to the strategy to improve its performance on the training set must also improve the performance on the test set.
Often, there are some very simple strategies that are fairly well known in traders' circles and are still somewhat profitable, though their returns seem to be diminishing. An example is the pair trading of stocks. The reason they are diminishing in returns is that too many traders are taking advantage of this arbitrage opportunity and gradually erasing the profit margin. However, it is often possible to introduce minor variations in the basic strategy, which will boost its returns.
These minor variations are often far less well known than the basic strategy, and therefore far less well exploited by traders. Sometimes they involve excluding certain stocks or groups of stocks from the universe. For example, traders often prefer to exclude pharmaceutical stocks from their technical trading program because of the dramatic impact of news on their prices, or else they may exclude stocks that have pending merger or acquisition deals. Other traders change the entry and exit timing or frequency of the trades. Yet another variation concerns the selection of the stock universe: We saw in Example 3.7 that a strategy that has a very good Sharpe ratio when it is applied to small-cap stocks becomes very unprofitable when applied to large-cap stocks.
When introducing these refinements to your strategy, it is preferable that the refinement has some basis in fundamental economics or a well-studied market phenomenon, rather than some arbitrary rule based on trial and error. Otherwise, data-snooping bias looms.
Example 3.8: A Small Variation on an Existing Strategy
Let's refine the mean-reverting strategy described in Example 3.7. Recall that the strategy has a mediocre Sharpe ratio of 0.25 before transaction costs and a very unprofitable Sharpe ratio of -3.19 after transaction costs in 2006. The only change we will make here is to update the positions at the market open instead of the close. In the MATLAB code, simply replace "cl" with "op" everywhere. You can download the R code as example3_8.R and Python code as example3_8.ipynb.
Lo and behold, the Sharpe ratios before and after costs are both very positive! I will leave it as an exercise for the reader to improve the Sharpe ratio further by testing the strategy on the S&P 400 mid-cap and S&P 600 small-cap universes.
SUMMARY
Backtesting is about conducting a realistic historical simulation of the performance of a strategy. The hope is that the future performance of the strategy will resemble its past performance, though as your investment manager will never tire of telling you, this is by no means guaranteed!
There are many nuts and bolts involved in creating a realistic historical backtest and in reducing the divergence of the future performance of the strategy from its backtest performance. Issues discussed here include:
Data: Split/dividend adjustments, noise in daily high/low, and survivorship bias.
Performance measurement: Annualized Sharpe ratio and maximum drawdown.
Look-ahead bias: Using unobtainable future information for past trading decisions.
Data-snooping bias: Using too many parameters to fit historical data, tweaking a strategy too many times in backtest, and avoiding it using a large enough sample, out-of-sample testing, and sensitivity analysis.
Transaction cost: Impact of transaction costs on performance.
Strategy refinement: Common ways to make small variations on the strategy to optimize performance.
After going through this chapter and working through some of the examples and exercises, you should have gained some hands-on experience in how to retrieve historical data and backtest a strategy with Excel, MATLAB, Python, or R.
When one starts testing a strategy, it may not be possible to avoid all these pitfalls due to constraints of time and other resources. In this case, it is okay to skip a few precautions to achieve a quick sense of whether the strategy has potential and is worthy of closer examination. Sometimes, even the most thorough and careful backtest cannot reveal problems that would be obvious after a few months of paper or real trading. One can always revisit each of these issues after the model has gone live.
Once you have backtested a strategy with reasonable performance, you are now ready to take the next step in setting up your trading business.
REFERENCES
Bailey, David, and Marcos López de Prado. 2012. "The Sharpe Ratio Efficient Frontier." Journal of Risk 15 (2 Winter 2012/13). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1821643.
Bailey, David, J. Borwein, Marcos López de Prado, and J. Zhu. 2014. "Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance." Notices of the American Mathematical Society 61 (5 May): 458-471. https://ssrn.com/abstract=2308659.
Chan, Ernest. 2006. "Reader Suggests a Possible Trading Strategy with the GLD-GDX Spread." Quantitative Trading, November 17. http://epchan.blogspot.com/2006/11/reader-suggested-possible-trading.html.
Chan, Ernest. 2017. "Paradox Resolved: Why Risk Decreased Expected Log Return but Not Expected Wealth." Quantitative Trading, May 4. http://epchan.blogspot.com/2017/05/paradox-resolved-why-riskdecreases.html.
Sharpe, William. 1994. "The Sharpe Ratio." Journal of Portfolio Management, Fall. https://jpm.pm-research.com/content/21/1/49.
CHAPTER 4
Setting Up Your Business
In this chapter, we will be taking a break from the technical aspect of trading to focus on the business side of it. Assuming that your goal is to remain an independent trader and not work for a money management institution, the choice of the business structure for trading is important. The main choice you have to make is whether to open a retail brokerage account or to join a proprietary trading firm. The next step is to determine what features of the brokerage or trading firm are important to you. Finally, you have to decide what kind of physical trading infrastructure you need in order to execute your quantitative strategy.
BUSINESS STRUCTURE: RETAIL OR PROPRIETARY?
As a trader, you can choose to be completely independent or semi-independent. To be completely independent, you can simply open a retail brokerage account, deposit some cash, and start trading. No one will question your strategy, and no one will guide you in your trading. Furthermore, your leverage is limited by Securities and Exchange Commission (SEC) Regulation T: roughly two times your equity if you hold overnight positions. Naturally, all the profits and losses will accrue to you.
However, you can choose to join what is called a proprietary trading firm such as Bright Trading and become a member of their firm. In order to become a member of such firms, you have to pass the FINRA Series 7 examination to qualify as a registered representative of a brokerage. You will still need to invest your own capital to start an account, but you will obtain much higher leverage (or buying power) than is available through a retail account. Depending on how much capital you invested, you may get to keep all your profits, or some percentage of them. In terms of liability, however, your loss is limited to your initial investment. (Actually, liability is also limited if you form an S corporation or limited liability company (LLC) and open an account through this entity at a retail brokerage. More on this in Box 4.1.) Often, you can also receive training from the firm, perhaps at an extra cost. You will also be subject to the various rules and regulations that the proprietary trading firm chooses to impose on its members, in addition to rules imposed by the SEC or FINRA.
BOX 4.1 SHOULD YOU INCORPORATE BEFORE YOU TRADE?
If you open a brokerage account as an individual, your liability is not limited to what you deposit in that account. Retail brokerages are known to ask their customers to "make whole" the negative equity in their account when a highly levered position loses more than 100 percent of the margin deposit. This happened in the aftermath of the Swiss Franc unpegging in January 2015. Do not be surprised to get a stern legal letter requesting a large payment from your broker when that happens, and perhaps endless phone calls from a debt collector if you don't pay up.
To avoid this personal liability, I would recommend you set up a corporation as your trading vehicle and open an account through this entity at a retail brokerage. Your loss will then be limited to your initial investment, though you may have to file corporate bankruptcy when things go wrong. In the United States, a limited liability company or S corporation may work best, as tax gains or losses can be passed directly to you personally. Even if you are not a US resident, you can still easily incorporate a US LLC for trading in a US brokerage account. I used bizfilings.com for that, but there are many other similar services, such as Stripe Atlas. No legal help is required if you are the sole shareholder. Of course, you will still be subject to the various rules and regulations imposed by the SEC, such as Regulation T.
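To make the arithmetic behind negative equity concrete, here is a small Python sketch. The deposit, leverage, and price move are hypothetical numbers chosen for illustration, and `equity_after_move` is a made-up helper, not part of any brokerage API.

```python
def equity_after_move(deposit: float, leverage: float, price_move: float) -> float:
    """Account equity after a fractional price move on a levered long position.

    Ignores financing, commissions, and any forced liquidation by the broker.
    """
    position_value = deposit * leverage
    return deposit + position_value * price_move


# Hypothetical numbers: $10,000 deposit, 50x leverage, 15 percent adverse move.
equity = equity_after_move(deposit=10_000, leverage=50, price_move=-0.15)
print(f"Equity after the move: {equity:,.0f}")
```

With these inputs the loss on the $500,000 position is about $75,000, so the account ends at roughly -$65,000. That negative balance is the amount an individual account holder could be asked to make whole.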
I made them sound like bad things when I spoke of rules and regulations imposed by proprietary trading firms. But some of these rules (such as the prohibition from trading penny stocks or the prohibition from carrying short positions overnight) are actually risk management measures for your own protection. Often, when the going is good, traders will bemoan these constraints limiting their flexibility and profitability. They may even decide to start their own retail trading accounts and start trading on their own. However, when they suffer the (almost inevitable) big drawdown, they will wish that someone were there to restrain their risk appetites and come to regret this unfettered freedom. (The teenager in us has never left, after all.)
The decision whether to go retail or to join a proprietary trading firm is generally based on your need of capital, the style of your strategy, and your skill level. For example, if you run a low-risk, market-neutral strategy that nevertheless requires a much higher leverage than allowed by Regulation T in order to generate good returns, a proprietary account may be better for you. However, if you engage in high-frequency futures trading that does not require too much capital, a retail account may save you a lot of costs and hassles. Similarly, a very experienced trader with strong risk management practices and emotional stability probably doesn't need the guidance given by a proprietary firm, but less experienced traders may benefit from the imposed restraints.
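The leverage consideration can be checked with simple arithmetic, sketched below in Python. The 4 percent unlevered return and 20 percent target are hypothetical, financing costs are ignored, and the two caps are the approximate Regulation T figures quoted in this chapter (2x overnight, 4x intraday). The helper names are made up for illustration.

```python
REG_T_OVERNIGHT = 2.0  # approximate overnight leverage cap quoted in the text
REG_T_INTRADAY = 4.0   # approximate intraday cap quoted in Table 4.1


def required_leverage(target_return: float, unlevered_return: float) -> float:
    """Leverage needed to scale an unlevered return up to a target (no financing costs)."""
    if unlevered_return <= 0:
        raise ValueError("unlevered_return must be positive")
    return target_return / unlevered_return


def fits_retail_account(leverage: float, holds_overnight: bool) -> bool:
    """True if the leverage is within the approximate Regulation T cap."""
    cap = REG_T_OVERNIGHT if holds_overnight else REG_T_INTRADAY
    return leverage <= cap


# Hypothetical market-neutral strategy: 4% unlevered return, 20% target.
lev = required_leverage(target_return=0.20, unlevered_return=0.04)
print(round(lev, 2), fits_retail_account(lev, holds_overnight=True))
```

A strategy like this needs about five times leverage, which exceeds both retail caps, so under these assumptions it would point toward a proprietary account.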
There is another consideration that applies to those of you who have discovered some unique, highly profitable strategies. In this situation, you may prefer to open a retail trading account, because if you trade through a proprietary account, your proprietary trading firm is going to find out about your highly profitable strategy and may "piggyback" on your strategy with a lot of its own capital. In this case, your strategy will suffer more market impact trading cost as time goes on.
Table 4.1 summarizes the pros and cons for each choice.
TABLE 4.1 Retail versus Proprietary Trading
| Issue | Retail Trading | Proprietary Trading |
|---|---|---|
| Legal requirement to open account | None. | Need to pass FINRA Series 7 examination and satisfy other FINRA-imposed restrictions. |
| Initial capital requirement | Substantial. | Small. |
| Available leverage or buying power | Determined by SEC Regulation T. Generally 2x leverage for overnight positions, and 4x for intraday positions. | Based on firm's discretion. Can be as high as 20x or more for intraday or hedged positions. |
| Liability to losses | Unlimited, unless account is opened through an S corporation or LLC. | Limited to initial investment. |
| Commissions and fees | Low commissions (perhaps less than 0.5 cent a share) and minimal monthly fees for data. | Higher commissions and significant monthly fees. |
| Bankruptcy risk of brokerage | No risk. Account insured by Securities Investor Protection Corporation (SIPC). | Has risk. Account is not insured. |
| Training, mentoring, guidance | None. | May provide such services, sometimes at a fee. |
| Disclosure of trade secrets | Little or no risk, especially if retail brokerage does not have proprietary trading unit. | Has risk. Managers can easily "piggyback" on profitable strategies. |
| Restrictions on trading style | No restrictions, as long as it is allowed by SEC. | May have restrictions, such as prohibitions on holding overnight short positions. |
| Risk management | Mostly self-imposed. | More comprehensive and imposed by managers. |
Issue Retail Trading Proprietary Trading
Legal requirement to open account None. Need to pass FINRA Series 7 examination and satisfy other FINRA-imposed restrictions.
Initial capital requirement Substantial. Small.
Available leverage or buying power Determined by SEC Regulation T. Generally 2x leverage for overnight positions, and 4 x for intraday positions. Based on firm's discretion. Can be as high as 20x or more for intraday or hedged positions.
Liability to losses Unlimited, unless account is opened through an S corporation or LLC. Limited to initial investment.
Commissions and fees Low commissions (perhaps less than 0.5 cent a share) and minimal monthly fees for data. Higher commissions and significant monthly fees.
Bankruptcy risk of brokerage No risk. Account insured by Securities Investor Protection Corporation (SIPC). Has risk. Account is not insured.
Training, mentoring, guidance None. May provide such services, sometimes at a fee.
Disclosure of trade secrets Little or no risk, especially if retail brokerage does not have proprietary trading unit. Has risk. Managers can easily "piggyback" on profitable strategies.
Restrictions on trading style No restrictions, as long as it is allowed by SEC. May have restrictions, such as prohibitions on holding overnight short positions.
Risk management Mostly self-imposed. More comprehensive and imposed by managers.| Issue | Retail Trading | Proprietary Trading |
| :--- | :--- | :--- |
| Legal requirement to open account | None. | Need to pass FINRA Series 7 examination and satisfy other FINRA-imposed restrictions. |
| Initial capital requirement | Substantial. | Small. |
| Available leverage or buying power | Determined by SEC Regulation T. Generally $2 x$ leverage for overnight positions, and 4 x for intraday positions. | Based on firm's discretion. Can be as high as 20x or more for intraday or hedged positions. |
| Liability to losses | Unlimited, unless account is opened through an S corporation or LLC. | Limited to initial investment. |
| Commissions and fees | Low commissions (perhaps less than 0.5 cent a share) and minimal monthly fees for data. | Higher commissions and significant monthly fees. |
| Bankruptcy risk of brokerage | No risk. Account insured by Securities Investor Protection Corporation (SIPC). | Has risk. Account is not insured. |
| Training, mentoring, guidance | None. | May provide such services, sometimes at a fee. |
| Disclosure of trade secrets | Little or no risk, especially if retail brokerage does not have proprietary trading unit. | Has risk. Managers can easily "piggyback" on profitable strategies. |
| Restrictions on trading style | No restrictions, as long as it is allowed by SEC. | May have restrictions, such as prohibitions on holding overnight short positions. |
| Risk management | Mostly self-imposed. | More comprehensive and imposed by managers. |
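
To make the leverage row of Table 4.1 concrete, the short sketch below computes the buying power implied by each leverage multiple. The 2x, 4x, and 20x multiples come from the table; the $100,000 account equity and the function name are illustrative assumptions.

```python
def buying_power(equity: float, leverage: float) -> float:
    """Maximum gross position size supported by the given equity and leverage multiple."""
    if equity < 0 or leverage <= 0:
        raise ValueError("equity must be non-negative and leverage must be positive")
    return equity * leverage


equity = 100_000.0  # hypothetical account equity, in dollars

# Leverage multiples taken from Table 4.1 (20x is the table's "as high as" figure).
leverage_by_account = {
    "retail overnight": 2.0,
    "retail intraday": 4.0,
    "proprietary intraday or hedged": 20.0,
}

powers = {name: buying_power(equity, lev) for name, lev in leverage_by_account.items()}
for name, bp in powers.items():
    print(f"{name}: ${bp:,.0f}")
```

Under these assumptions the same equity supports a position ten times larger in the proprietary account than in a retail account held overnight, which is why a low-risk strategy with thin returns per dollar may need the proprietary route.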
One final note: Some may think that there is a tax advantage in joining a proprietary trading firm because any trading loss can be deducted from current income instead of being treated as a capital loss. Actually, you can choose to apply for trader tax status even if you have a retail brokerage account, so that your trading loss can offset other income, and not just other capital gains. For details on the tax considerations of a trading business, you can visit, for example, www.greencompany.com.
CHOOSING A BROKERAGE OR PROPRIETARY TRADING FIRM
Many traders use only one criterion to choose their brokerage or a proprietary trading firm to join: the commission rate. This is clearly an important criterion because if a trading strategy has a small return, high commissions may render it unprofitable. However, there are other important considerations.
Commissions actually form only part of your total transaction costs, sometimes even a small part. The speed of execution of your brokerage, as well as its access to so-called dark-pool liquidity, also figures into your transaction costs. Dark-pool liquidity is formed by institutional orders facilitated away from the exchanges, or by the crossing of internal brokerage customer orders. These orders are not displayed as bid-and-ask quotes. Some of the “alternative trading systems” that provide dark-pool liquidity are Liquidnet and ITG’s Posit. Your brokerage may use one or more of these providers, it may use only its internal crossing network, or it may use no alternative trading systems at all.
Sometimes, a better execution price at a large brokerage, due to its state-of-the-art execution system and high-speed access to deeper dark pools of liquidity, will more than compensate for its higher commissions. This kind of cost/benefit analysis cannot easily be carried out unless you actually trade through multiple brokerages simultaneously and compare the actual execution costs.
For example, I traded through Goldman Sachs’s REDIPlus trading platform, whose Sigma X execution engine routes orders both to its internal crossing network and to external liquidity providers. I found that it often improved my execution price by more than a few cents per share over market order executions on Interactive Brokers (IBKR): more than enough to offset its higher commissions. However, to be fair to IBKR, it has since introduced many routing options, including those offered by some specialty algorithmic execution firms (such as Quantitative Brokers for futures orders), in order to reduce your orders’ market impact and allow you to route to dark pools as necessary.
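
The comparison above boils down to simple per-share arithmetic: what matters is the commission net of any price improvement. The sketch below illustrates this; every number in it (commission rates, price improvement, annual share volume) is a made-up assumption for illustration, not a quote from any brokerage.

```python
def net_cost_per_share(commission: float, price_improvement: float) -> float:
    """Commission paid minus price improvement received, in dollars per share."""
    return commission - price_improvement


# Hypothetical figures, in dollars per share.
low_commission_broker = net_cost_per_share(commission=0.005, price_improvement=0.0)
better_execution_broker = net_cost_per_share(commission=0.02, price_improvement=0.03)

shares_per_year = 1_000_000  # assumed annual trading volume

# Positive value: the higher-commission broker is cheaper overall.
annual_saving = (low_commission_broker - better_execution_broker) * shares_per_year

print(f"net cost, low-commission broker:    {low_commission_broker:+.4f} $/share")
print(f"net cost, better-execution broker:  {better_execution_broker:+.4f} $/share")
print(f"annual saving from better execution: ${annual_saving:,.0f}")
```

In practice the price improvement is the hard number to obtain, which is why it can only be estimated by routing comparable orders through both brokerages and measuring the fills.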
Another consideration is the range of products you can trade. Many retail brokerages or proprietary trading firms do not allow you to trade futures or foreign currencies. This would be a serious limitation to your trading business’s growth.
Following these two fairly generic criteria, the next important one for a quantitative trader is: Does the trading platform offer an application programming interface (API) so that your trading software can receive a live data feed, generate orders, and automatically transmit the orders for execution in your account? I will discuss APIs further in Chapter 5. The only point to note here is that without an API, high-frequency quantitative trading is impossible.
Closely related to the availability of an API is the availability of paper trading accounts. If a brokerage does not offer you a paper trading account, it is very hard to test an API without risking real losses. Among the brokerages I know that offer paper trading accounts are Alpaca, Interactive Brokers, and Oanda (for currency trading).
In addition to paper trading accounts, some brokerages provide a “simulator” account (an example is the demo account from Interactive Brokers), where quotes from the past are replayed as if they were real-time quotes, and an automated trading program can trade against these quotes at any time of the day in order to debug the program.
Finally, the reputation and financial strength of the proprietary trading firm you are considering are also important. This does not matter to the choice of a retail brokerage because, as noted in Table 4.1, retail accounts are insured by the SIPC, whereas proprietary accounts are not. Hence, it is important that a proprietary trading firm has a strong balance sheet and good risk management practices to prevent the firm from collapsing because of bad trades made by its fellow member traders (WorldCom’s and Refco’s collapses are good examples). You should also make sure the firm is a broker-dealer registered with an exchange, so that it is regularly audited by the exchange and the SEC. Non-broker-dealer proprietary trading firms were supposed to get shut down by the SEC, starting with Tuco Trading in March 2008, but you never know if some unscrupulous operators are still out there. Furthermore, even if times are good for the firm, does it have a good reputation for easy redemption of your capital should you choose to withdraw it? It is, of course, difficult for an outsider to assess whether a proprietary trading firm has such good attributes, but you can read about the firm’s reputation based on current or ex-members’ opinions at the online forum www.elitetrader.com. You should also definitely check the website of its regulator, FINRA, at brokercheck.finra.org and see if there are “Disclosures,” which include customer complaints, arbitrations, or regulatory actions against the broker you are going to use. If there are, look up the details of each Disclosure to see if it is serious. (If your prop trading firm isn’t even listed as a broker, beware!)
If you are undecided whether to open a retail brokerage account or a proprietary account, or which retail brokerage or proprietary firm to use, you can in fact do both, or open multiple accounts. Unlike taking full-time employment at a proprietary trading firm, just joining one as a member, especially a remote-access member, does not usually compel you to sign a noncompete agreement. You are free to be a member of more than one proprietary firm, or to have both proprietary and retail trading accounts, as long as this fact is fully disclosed to the proprietary trading firms involved and to FINRA as “outside business activities” and prior permissions are obtained. With multiple accounts, it should be easier for you to decide which cost structure is more beneficial to you and which account has the better infrastructure and tools for your automated trading system. And, in fact, sometimes each account has its own pros and cons, and you may want to have the flexibility of keeping all of them open for trading different strategies!
PHYSICAL INFRASTRUCTURE
Now that you have set up the legal and administrative structure of your trading business, it is time to consider the physical infrastructure. This applies to both retail and proprietary traders: Many proprietary trading firms allow their members to trade remotely from their homes. If you are a proprietary trader who requires minimal coaching from your account manager and are confident in your ability to set up the physical trading infrastructure yourself, there is no reason not to trade remotely.
In the start-up phase of your business, the physical infrastructure can be light and simple. You probably have all the components you need in your home office already: a good personal computer (practically any new computer will do), a high-speed internet connection, and an uninterruptible power supply (UPS) so that your computer doesn’t get accidentally shut down in the middle of a trade because of electricity fluctuations. The total initial investment should not exceed a couple of thousand dollars at the maximum, and the monthly cost should not be more than $100 or so if you don’t already subscribe to cable TV.
Some traders wonder if having a TV tuned to CNBC or CNN is a good idea. Although it certainly won’t hurt, many professional quantitative traders have found that it is not necessary, as long as they also subscribe to another professional real-time newsfeed, such as Thomson Reuters, Dow Jones, or Bloomberg. While Bloomberg can cost more than $2,000 a month, various plans offered by Thomson Reuters and Dow Jones can cost as little as $100 to $200 (though some will require an annual contract). Bloomberg also has a free internet radio stream at www.bloomberg.com/tvradio/radio that announces breaking business news and commentaries. Also, instead of installing a TV in your office, you can subscribe to CNBC PRO, which will provide live streaming video to your computer. Of course, too much real-time information may not necessarily lead to more profitable trades. For example, Michael Mauboussin of Legg Mason cites a study that finds horse-racing handicappers less successful in their predictions when they are given more information when ranking horses (see Economist, 2007, or Oldfield, 2007).
As your trading business grows, you may upgrade your infrastructure gradually. Perhaps you will purchase faster computers, or move your automated trading system to a virtual private server (VPS) that has direct (not public internet) connections to your broker’s trading server. As discussed in Chapter 2, any delay in the transmission of your order to your brokerage results in slippage, which is quite real in terms of lost profits. In fast-moving markets, every millisecond counts. Besides offering a high-speed connection, a VPS (e.g., speedytradingservers.com) will also ensure your trading strategy is resilient to common household disasters such as internet outages, electricity outages, flooding, and so on. It can cost only a few hundred dollars a month. You can monitor your trading programs on the VPS using free software such as Windows Remote Desktop.
You will certainly want to purchase multiple monitors to hook up to the same computer so that you have extended screen space to monitor all the different trading applications and portfolios.
SUMMARY
This chapter focused on the decisions and steps that you need to take to bridge the research phase and the execution phase of your trading business. I have covered the pros and cons of retail trading versus proprietary trading and the issues to consider in choosing a brokerage or proprietary trading firm.
In a nutshell, retail brokerages give you complete freedom and better capital protection but smaller leverage, while proprietary trading firms give you less freedom and less capital protection but much higher leverage. Finding a suitable retail brokerage is relatively easy. It took me less than a month to research and settle on one, and I have not found a reason to switch yet. Finding a suitable proprietary trading firm is much more involved, since there are contracts to sign and an exam (Series 7) to pass. It took me several months to get my account set up at one.
Of course, you can choose to have both retail and proprietary accounts, each tailored to the specific needs of your strategies. This way, you can also easily compare their speed of execution and depth of liquidity.
Regardless of whether you have chosen to trade through a retail brokerage or to join a proprietary trading firm, you need to make sure their trading account and systems have these features:
Relatively low commissions
Ability to trade a good variety of financial instruments
Access to a deep pool of liquidity
Most importantly, an API for real-time data retrieval and order transmission
I also described the progressive buildup of the physical infrastructure you need in order to run a trading business. Some of the components of a trader’s operating environment mentioned are:
Personal computer
High-speed internet connection
Uninterruptible power supply (UPS)
Real-time data and news feed and subscription to financial TV news channels
VPS
Building out the physical trading infrastructure is actually quite easy, since in the beginning you probably have all the components ready in your home office already. I have found that it is easy to trade a million-dollar portfolio with nothing more than a few thousand dollars’ initial investment in your physical infrastructure and a few hundred dollars a month in operating cost. But if you want to increase your trading capacity or improve your returns, additional incremental investments will be needed.
Once you have considered and taken these steps, you are positioned to build an automated trading environment to execute your strategy, which will be covered in the next chapter.
REFERENCES
Economist. 2007. “Too Much Information.” July 12. www.economist.com/finance/displaystory.cfm?story_id=9482952.
Markoff, John. 2007. “Faster Chips Are Leaving Programmers in Their Dust.” New York Times, December 17. www.nytimes.com/2007/12/17/technology/17chip.html?ex=1355634000&en=a81769355deb7953&ei=5124&partner=permalink&exprod=permalink.
Oldfield, Richard. 2007. Simple but Not Easy. Doddington Publishing.
CHAPTER 5
Execution Systems
At this point, you should have backtested a good strategy (maybe something like the pair-trading strategy in Example 3.6), picked a brokerage (e.g., Alpaca or Interactive Brokers), and set up a good operating environment (at first, nothing more than a good computer and a high-speed internet connection). You are almost ready to execute your trading strategy, once you have implemented an automated trading system (ATS) to generate and transmit your orders to your brokerage for execution. This chapter is about building such an automated trading system and about ways to minimize trading costs and divergence from the performance you expect based on your backtests.
WHAT AN AUTOMATED TRADING SYSTEM CAN DO FOR YOU
An automated trading system will retrieve up-to-date market data from your brokerage or other data vendors, run a trading algorithm to generate orders, and submit those orders to your brokerage for execution. Sometimes, all these steps are fully automated and implemented as one desktop application installed on your computer.
Other times, only part of this process is automated, and you would have to take some manual steps to complete the whole procedure.
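
The three steps just described (retrieve data, run the algorithm, submit orders) can be sketched as one pass of a loop. This is a minimal illustration under stated assumptions, not any brokerage’s actual API: the `Order` class, `run_once`, and the stand-in data feed and strategy are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Order:
    symbol: str
    side: str  # "BUY" or "SELL"
    size: int


def run_once(
    get_prices: Callable[[], Dict[str, float]],
    strategy: Callable[[Dict[str, float]], List[Order]],
    submit: Callable[[Order], None],
) -> List[Order]:
    """One pass of the system: retrieve data, generate orders, submit them."""
    prices = get_prices()
    orders = strategy(prices)
    for order in orders:
        submit(order)
    return orders


# Stand-ins for a real data feed, trading algorithm, and brokerage connection.
def fake_prices() -> Dict[str, float]:
    return {"IBM": 99.0, "MSFT": 101.0}


def buy_below_100(prices: Dict[str, float]) -> List[Order]:
    return [Order(sym, "BUY", 100) for sym, px in sorted(prices.items()) if px < 100.0]


sent: List[Order] = []
orders = run_once(fake_prices, buy_below_100, sent.append)
print(orders)
```

Keeping the three steps behind separate callables means the same strategy function can be driven by a backtest data source or a live feed, and a semiautomated setup is simply the case where `submit` writes to a file for manual upload instead of calling the brokerage.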
A fully automated system has the advantage that it minimizes human errors and delays. For certain high-frequency systems, a fully automated system is indispensable, because any human intervention will cause enough delay to seriously derail the performance. A fully automated system used to be complicated and costly to build, often requiring professional programmers with knowledge of high-performance programming languages such as Java, C#, or C++ in order to connect to your brokerage’s application programming interface (API). But now, with the easy availability of platforms such as QuantConnect and Blueshift, or various automated trading software such as MATLAB’s Trading Toolbox, Python’s Backtrader, or R’s IBroker, you only need to be an amateur programmer (or not a programmer at all, in the case of Blueshift) to build a fully automated trading system.
For lower-frequency quantitative trading strategies, there is also a semiautomated alternative: One can generate the orders using programs such as Excel or MATLAB, then submit those orders using built-in tools such as a basket trader or spread trader offered by your brokerage. If your brokerage provides a dynamic data exchange (DDE) link to Excel (described below), you can also write a macro attached to your Excel spreadsheet that allows you to submit orders to the brokerage simply by running the macro. This way, there is no need to build an application in a complicated programming language. However, it does mean that you would have to perform quite a few manual steps in order to submit your orders.
Whether you have built a semiautomated or a fully automated trading system, there is often a need for input data beyond the prices that your brokerage or data vendor can readily provide. For example, earnings estimates or dividends data are often not provided as part of the real-time data stream. These nonprice data are typically available free of charge from a broker. For example, Interactive Brokers provides free dividends and earnings estimates data (via Zacks) to its customers. Expected earnings announcement dates and times are available too (via Wall Street Horizon), for a small fee.
I will discuss some details of the two kinds of systems in the following sections. I will also discuss how to hire a programming consultant in case you would like someone to help automate the execution of your trading strategy.
Building a Semiautomated Trading System
In a semiautomated trading system (shown in Figure 5.1), a user typically generates a list of orders using familiar and easy-to-use software such as Excel, MATLAB, Python, or R. Often, the program that generates this order list is the same as the backtest program: After all, you are implementing the same quantitative strategy that you have backtested. Of course, you must remember to update the input data file to reflect the most recent data. This is usually done with a program that retrieves the appropriate data directly through the API of a broker (such as Alpaca or Interactive Brokers), a quantitative trading platform (such as QuantConnect or Blueshift), or a data vendor (such as Algoseek or Quandl).
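
Because forgetting to refresh the input data is an easy mistake to make in a semiautomated workflow, a small guard that runs before order generation can help. The sketch below assumes a CSV input with a `date` column in ISO format; the column names, sample rows, and function names are illustrative assumptions rather than any vendor’s format.

```python
import csv
import io
from datetime import date


def last_date_in_file(csv_text: str) -> date:
    """Return the most recent date found in the 'date' column (ISO format)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("input data file has no rows")
    return max(date.fromisoformat(row["date"]) for row in rows)


def is_up_to_date(csv_text: str, expected: date) -> bool:
    """True if the file already contains data for the expected date or later."""
    return last_date_in_file(csv_text) >= expected


# Made-up sample contents standing in for the real input data file.
sample = "date,symbol,close\n2024-01-02,IBM,161.50\n2024-01-03,IBM,160.10\n"

latest = last_date_in_file(sample)
print(latest, is_up_to_date(sample, date(2024, 1, 3)))
```

A program could refuse to write an order file when this check fails, so that stale prices never reach the basket trader.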
Sometimes, the API is as simple as a DDE link that can update an Excel spreadsheet. Many brokerages that cater to semi-quantitative traders provide such DDE links. Interactive Brokers and Goldman Sachs’s REDIPlus are examples. Many proprietary trading firms use one of these brokerages for execution; hence, you would have access to the full menu of these brokerages’ real-time data and order entry technologies as well.
A DDE link is just an expression inserted in an Excel spreadsheet that automatically loads the appropriate data into a cell. The expression differs across brokerages, but it generally looks like this:
=accountid|LAST!IBM
where LAST indicates that the last price is requested, and IBM is the symbol in question.
To generate the orders, you can run an Excel macro (a Visual Basic program attached to the spreadsheet) or a MATLAB program, which scans through the information and prices on the spreadsheet, runs the trading algorithm, and writes out the orders to a text file where each line contains the triplet (symbol, side, size). For example,
("IBM", "BUY", "100")
might be a line in the output order file. Sometimes, your brokerage requires other information for order submission, such as whether the order is Day Only, or Good Till Cancel. All this auxiliary information is written out to each line of the order file.
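
As a rough illustration of writing such an order file, the sketch below emits one comma-separated line per order, containing the (symbol, side, size) triplet plus a time-in-force field. The column order, the `DAY`/`GTC` codes, and the function name are assumptions made for the example; the file layout a given brokerage’s basket trader accepts will differ and should be taken from its documentation.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple

OrderRow = Tuple[str, str, int, str]  # (symbol, side, size, time in force)


def write_order_file(orders: Iterable[OrderRow], stream: TextIO) -> int:
    """Write one line per order and return the number of lines written."""
    writer = csv.writer(stream, lineterminator="\n")
    count = 0
    for symbol, side, size, tif in orders:
        if side not in ("BUY", "SELL"):
            raise ValueError(f"unknown side: {side}")
        if size <= 0:
            raise ValueError(f"size must be positive: {size}")
        if tif not in ("DAY", "GTC"):  # Day Only or Good Till Cancel
            raise ValueError(f"unknown time in force: {tif}")
        writer.writerow([symbol, side, size, tif])
        count += 1
    return count


# An in-memory buffer stands in for the order file on disk.
buffer = io.StringIO()
n_written = write_order_file(
    [("IBM", "BUY", 100, "DAY"), ("MSFT", "SELL", 200, "GTC")], buffer
)
order_text = buffer.getvalue()
print(order_text, end="")
```

Validating each row before it is written is cheap insurance: a malformed line is much easier to catch here than after a thousand-line basket has been uploaded.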
After the text file containing the order list is generated, you can then upload this order file to your brokerage’s basket trader or spread trader for submission.
A basket trader is an application that allows you to upload multiple orders for multiple symbols and submit them to the brokerage in one keystroke. A spread trader is an application with which you can specify the symbols of multiple pairs of stocks or other securities, and the conditions under which orders for each of these pairs should be entered. The spread trader can monitor real-time prices and check whether these conditions are satisfied throughout the trading day. If the DDE links of your brokerage allow you to submit orders, you can also run an Excel macro to sweep through the order file and submit all the orders to your account with one press of a button.
The brokerage that I use, Interactive Brokers, has a BasketTrader and three different spread traders, as well as DDE links for data update and order submission as part of their execution platform. Interactive Brokers’ spread orders can be used for futures, options, stock, and stock-vs.-option spreads on the same or multiple underlyings. Its ComboTrader can handle all these spreads, while its SpreadTrader is specifically for futures and options spreads, and its OptionTrader is specifically for options spreads. (Confusing? Yes, because functionalities proliferate over time and they must be backward compatible.)
Here is what I did with BasketTrader from Interactive Brokers. Every day before the market opened, I ran a MATLAB program (though you can just as well run a Python or R program) that retrieved market data, ran the trading algorithm, and wrote out a list of orders into an order file that could be over 1,000 lines long (corresponding to over 1,000 symbols). I then brought up the BasketTrader from my trading screen, used it to upload the order file to my account, and in one keystroke submitted all the orders. Some of these orders might get executed at the open; others might get executed later or not at all. Before the market closed, I canceled all the unexecuted orders by pressing a button. Finally, if I wanted to exit all the existing positions, I simply pressed another button in the BasketTrader to generate the appropriate exit orders.
I used to use REDIPlus’s spread trader for pair-trading strategies such as Example 3.6 because I could have the spread trader enter orders at all times of the day, not just at the market close. Again, before the market opened I used MATLAB (again, Python or R would do this just as well) to retrieve market data, run the pair-trading algorithm, and write out limit prices for all the pairs in my universe. (Note that the limit prices are limits on the spread, not on the individual stocks. If they were on the individual stocks, ordinary limit orders would have done the trick and the spread trader would have been redundant.) I then went to the spread trader, which already contained all these pairs that I had previously specified, and manually adjusted the limit prices based on the MATLAB output. (Actually, this step could be automated, too: all the spread order information could be written out to an Excel file by MATLAB and uploaded to the spread trader.) Pressing another button would initiate automatic monitoring of prices and entering of orders throughout the trading day. 我过去常用 REDIPlus 的价差交易工具来执行配对交易策略,比如示例 3.6,因为它可以在全天任何时间下单,而不仅仅是在收盘时。每天开盘前,我会用 MATLAB(当然,Python 或 R 也同样适用)获取市场数据,运行配对交易算法,并写出我关注的所有配对的限价。(注意,这里的限价是针对价差的,而不是单个股票的。如果是针对单个股票的限价,普通的限价单就足够了,价差交易工具就显得多余了。)然后我会打开价差交易工具,它已经包含了我之前指定的所有配对,根据 MATLAB 的输出手动调整限价。(实际上,这一步也可以自动化:所有价差订单信息都可以由 MATLAB 写入 Excel 文件,再上传到价差交易工具。)按下另一个按钮后,系统会自动监控价格并在整个交易日内下单。
I also used REDIPlus’s DDE link for submitting orders for another basket trading strategy. I used MATLAB to generate the appropriate DDE link formula in each Excel cell so that it could automatically update the appropriate data for the particular symbol on that row. After the market opened, I ran a macro attached to that spreadsheet, which scanned through each symbol and submitted it (together with other order information contained in the spreadsheet) to my account at REDIPlus. 我还使用了 REDIPlus 的 DDE 链接来提交另一个篮子交易策略的订单。我用 MATLAB 生成了每个 Excel 单元格中相应的 DDE 链接公式,以便它能够自动更新该行中特定标的的相关数据。市场开盘后,我运行了附加在该电子表格上的宏,该宏扫描每个标的,并将其(连同电子表格中包含的其他订单信息)提交到我在 REDIPlus 的账户。
Typically, a semiautomated trading system is suitable if you need to run this step only a few times a day in order to generate one or a few waves of orders. Even if your brokerage’s API provides an order submission function for your use in an Excel Visual Basic macro, its speed is usually too slow if you have to run this program frequently in order to capture the latest data and generate wave after wave of orders. In this case, one must build a fully automated trading system. But there is one reason why you may want a semiautomated instead of a fully automated system: the ability to sanity-check your orders before they go on their merry way to the broker. (Just do a search for Knight Capital Group, now part of the HFT firm Virtu Financial, about their $440 million software error on August 1, 2012.) 通常,如果您只需要每天运行此步骤几次以生成一波或几波订单,那么半自动交易系统是合适的。即使您的经纪商的 API 提供了一个用于 Excel Visual Basic 宏中的订单提交功能,如果您必须频繁运行此程序以捕捉最新数据并生成一波又一波的订单,其速度通常也太慢。在这种情况下,必须构建一个全自动交易系统。但有一个原因可能让您选择半自动系统而非全自动系统:能够在订单发送给经纪商之前对其进行合理性检查。(只需搜索 Knight Capital Group,现为高频交易公司 Virtu Financial 的一部分,了解他们在 2012 年 8 月 1 日发生的 4.4 亿美元软件错误。)
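A sanity check of this kind need not be elaborate. The sketch below is one possible version, not a prescribed one: the function name `sanity_check`, the order fields, and every threshold are assumptions made for illustration, and the limits should be set to whatever is normal for your own strategy.

```python
def sanity_check(orders, max_orders=2000, max_order_value=50_000.0,
                 max_total_value=1_000_000.0):
    """Return a list of problems found in a batch of orders.

    orders: list of dicts with keys "symbol", "side", "quantity", "price".
    An empty result means nothing suspicious was found by these simple tests.
    All thresholds are illustrative defaults, not recommendations.
    """
    problems = []
    if len(orders) > max_orders:
        problems.append(f"too many orders: {len(orders)}")
    seen = set()
    total_value = 0.0
    for order in orders:
        symbol, side = order["symbol"], order["side"]
        value = order["quantity"] * order["price"]
        total_value += value
        if order["quantity"] <= 0 or order["price"] <= 0:
            problems.append(f"{symbol}: nonpositive quantity or price")
        if value > max_order_value:
            problems.append(f"{symbol}: order value {value:,.0f} exceeds limit")
        if (symbol, side) in seen:
            problems.append(f"{symbol}: duplicate {side} order")
        seen.add((symbol, side))
    if total_value > max_total_value:
        problems.append(f"total value {total_value:,.0f} exceeds limit")
    return problems
```

A semiautomated workflow would print the returned list and refuse to upload the order file unless it is empty.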
Building a Fully Automated Trading System 构建全自动交易系统
FIGURE 5.2 Fully automated trading system. 图 5.2 全自动交易系统。
A fully automated trading system (see Figure 5.2) can run the trading algorithm in a loop again and again, constantly scanning the latest prices and generating new waves of orders throughout the trading day. The submission of orders through an API to your brokerage account is automatic, so you would not need to load the trades to a basket trader or spread trader, or even manually run a macro on your Excel spreadsheet. All you need to do is press a “start” button in the morning, and then a “close” button at the end of the day, and your program will do all the trading for you. 全自动交易系统(见图 5.2)可以循环运行交易算法,不断扫描最新价格,并在整个交易日内生成新的订单波。通过 API 向您的经纪商账户提交订单是自动的,因此您无需将交易加载到篮子交易工具或价差交易工具,甚至无需在 Excel 电子表格上手动运行宏。您只需在早上按下“开始”按钮,到了当天结束时按下“关闭”按钮,程序就会为您完成所有交易。
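The loop at the heart of such a system can be sketched in a few lines of Python. This is a skeleton only: `get_prices`, `generate_orders`, `submit_order`, and `should_stop` are hypothetical callables that you would implement on top of your own brokerage API and strategy, and a real system would also need error handling, logging, and position tracking.

```python
import time


def run_trading_loop(get_prices, generate_orders, submit_order, should_stop,
                     poll_seconds=1.0, sleep=time.sleep):
    """Fetch prices, run the strategy, and submit orders until told to stop.

    get_prices():            returns the latest market data.
    generate_orders(prices): returns an iterable of orders (possibly empty).
    submit_order(order):     sends one order to the brokerage.
    should_stop():           returns True when the "close" button is pressed.
    Returns the number of orders submitted.
    """
    submitted = 0
    while not should_stop():
        prices = get_prices()
        for order in generate_orders(prices):
            submit_order(order)
            submitted += 1
        sleep(poll_seconds)  # wait before scanning prices again
    return submitted
```

Passing the brokerage-specific pieces in as functions keeps the loop itself testable with stand-ins, which is also convenient when switching between a paper account and a live one.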
Implementing a fully automated system requires that your brokerage provides an API for data retrieval and order submission. Your brokerage will usually provide an API for some popular programming languages such as Visual Basic, Java, C#, or C++, so your fully automated system must also be written in one of these languages. Alternatively, you can use a quant trading platform such as QuantConnect or Blueshift to execute your strategy with your favorite broker. (After all, you have already backtested your strategy on one of these platforms, so why not use them for execution too?) If you are a MATLAB fan like myself, you can use its Trading Toolbox for that (or get a third-party toolbox through undocumentedmatlab.com). For those brokers that provide a RESTful API (and that includes Alpaca, Interactive Brokers, or even Robinhood!), you can use any programming language you like, as any language can send HTTP GET and POST requests. However, making these HTTP requests is slower than using an API designed for a specific programming language like C# to get market data and submit orders, so that might not work well if your trading strategy is latency-sensitive. 实现一个全自动系统需要你的券商提供用于数据检索和订单提交的 API。你的券商通常会为一些流行的编程语言提供 API,比如 Visual Basic、Java、C#或 C++,因此你的全自动系统也必须用这些语言之一编写。或者,你可以使用量化交易平台,如 QuantConnect 或 Blueshift,通过你喜欢的券商执行策略。(毕竟,你已经在这些平台上回测过策略,为什么不直接用它们来执行呢?)如果你像我一样是 MATLAB 爱好者,可以使用其 Trading Toolbox(或通过 undocumentedmatlab.com 获取第三方工具箱)来实现。对于那些提供 RESTful API 的券商(包括 Alpaca、Interactive Brokers,甚至 Robinhood!),你可以使用任何你喜欢的编程语言,因为任何语言都能发送 HTTP GET 和 POST 请求。然而,发送这些 HTTP 请求比使用为特定编程语言(如 C#)设计的 API 获取市场数据和提交订单要慢,因此如果你的交易策略对延迟敏感,这种方式可能效果不佳。
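To make the RESTful route concrete, here is roughly what building an order request looks like with nothing but Python's standard library. Everything broker-specific in it is an assumption: the `/orders` path, the JSON field names, and the bearer-token header are placeholders, not the API of any broker named above, so consult your broker's API reference for the real endpoint and payload.

```python
import json
import urllib.request


def build_order_request(base_url, api_key, symbol, side, quantity):
    """Build (but do not send) an HTTP POST request for a market order.

    The URL path, JSON fields, and authorization scheme are hypothetical
    placeholders; a real broker defines its own.
    """
    payload = json.dumps(
        {"symbol": symbol, "side": side, "qty": quantity}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/orders",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen`. Each such call pays for an HTTP round trip, which is the latency cost mentioned above.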
Theoretically, a fully automated system can be constructed out of an Excel spreadsheet and an attached macro: All you have to do is to create a loop in your macro so that it continuously updates the cells using the DDE links and submits orders when appropriate throughout the day. Unfortunately, data updates through DDE links are slow, and generally your brokerage limits the number of symbols that you can update all at once. (Unless you have generated a large amount of commissions in the previous trading month, Interactive Brokers allows you to update only 100 symbols by default.) Similarly, order submissions through DDE links are also slow. Hence, for trading strategies that react to real-time market data changes intraday, this setup using a spreadsheet is not feasible. 理论上,一个完全自动化的系统可以由一个 Excel 电子表格和一个附加的宏构建而成:你所要做的就是在宏中创建一个循环,使其通过 DDE 链接不断更新单元格,并在适当的时候持续提交订单。遗憾的是,通过 DDE 链接进行数据更新速度较慢,而且通常你的券商会限制你一次性更新的股票代码数量。(除非你在上一个交易月产生了大量佣金,否则 Interactive Brokers 默认只允许你更新 100 个股票代码。)同样,通过 DDE 链接提交订单的速度也很慢。因此,对于那些需要对盘中实时市场数据变化做出反应的交易策略来说,使用电子表格的这种设置是不可行的。
Some brokerages, such as TradeStation, offer a complete backtesting and order submission platform. If you backtested on such a platform, then it is trivial to configure it so that the program will submit real orders to your account. This dispenses with the need to write your own software, whether for backtesting or for automated execution. However, as I mentioned in Chapter 3, the drawback of such proprietary systems is that they are seldom as flexible as a general-purpose programming language like MATLAB, Python, or R for the construction of your strategy. For instance, if you want to pursue a rather mathematically complex strategy based on principal component analysis (such as the one in Example 7.4), it would be quite difficult to backtest in TradeStation. 一些券商,比如 TradeStation,提供了完整的回测和订单提交平台。如果你在这样的平台上进行了回测,那么配置程序以向你的账户提交真实订单就变得非常简单。这就免去了你自己编写软件的需求,无论是用于回测还是自动执行。然而,正如我在第三章提到的,这类专有系统的缺点是,它们通常不如像 MATLAB、Python 或 R 这样通用的编程语言灵活,无法很好地构建你的策略。例如,如果你想执行一个基于主成分分析的相当复杂的数学策略(如示例 7.4 中的策略),在 TradeStation 中进行回测将会相当困难。
HIRING A PROGRAMMING CONSULTANT 聘请编程顾问
Building an ATS generally requires more professional programming skills than backtesting a strategy. This is especially true for high-frequency strategies where the speed of execution is of the essence. Instead of implementing an execution system yourself, you may find that hiring a programming consultant will result in much less headache. 构建自动交易系统通常比回测策略需要更专业的编程技能。对于高频策略尤其如此,因为执行速度至关重要。与其自己实现执行系统,你可能会发现聘请编程顾问会让你少很多麻烦。
Hiring a programming consultant does not have to be expensive. For an experienced programmer, the hourly fees may range from $50 to $100. Sometimes, you can negotiate a fixed fee for the entire project ahead of time, and I find that most projects for independent traders can be done with $1,000 to $5,000. If you have an account at one of the brokerages that supply you with an API, the brokerage can often refer you to some programmers who have experience with their API. (Interactive Brokers, for example, has a special web page that allows programming consultants to offer their services.) You can also look around (or post a request) on elitetrader.com for such programmers. 雇佣编程顾问不一定很昂贵。对于有经验的程序员,小时费用可能在 50 美元到 100 美元之间。有时,你可以提前协商整个项目的固定费用,我发现大多数独立交易者的项目可以用 1,000 美元到 5,000 美元完成。如果你在提供 API 的券商开设了账户,券商通常可以推荐一些有其 API 经验的程序员。(例如,Interactive Brokers 有一个专门的网页,允许编程顾问提供他们的服务。)你也可以在 elitetrader.com 上寻找(或发布请求)这类程序员。
As a last resort, you can find hundreds if not thousands of freelance programmers advertising themselves on Upwork.com. However, the freelance programmers on Upwork may lack in-depth knowledge of financial markets and trading technology, which can be crucial to successfully implementing an automated trading system. 作为最后的选择,你可以在 Upwork.com 上找到数百甚至数千名自由职业程序员自我宣传。然而,Upwork 上的自由职业程序员可能缺乏对金融市场和交易技术的深入了解,而这些知识对于成功实现自动交易系统至关重要。
There is one issue that may worry you as you consider hiring programmers: How do you keep your trading strategy confidential? Of course, you can have them sign nondisclosure agreements (NDAs, downloadable for free at many legal document websites), but it is almost impossible to find out if the programmers are in fact running your strategies in their personal accounts once the programs are implemented. There are several ways to address this concern. 当你考虑雇佣程序员时,可能会担心一个问题:如何保持你的交易策略的机密性?当然,你可以让他们签署保密协议(NDA,许多法律文档网站都可以免费下载),但一旦程序被实现,几乎不可能知道程序员是否真的在他们的个人账户中运行你的策略。对此,有几种方法可以解决这个问题。
First, as I mentioned before, most strategies that you may think are your unique creations are actually quite well known to experienced traders. So, whether you like it or not, other people are already trading very similar strategies and impacting your returns. Adding an extra trader or two, unless the trader works for an institutional money manager, is not likely to cause much more impact. 首先,正如我之前提到的,你可能认为是你独创的大多数策略,实际上对有经验的交易者来说是相当熟知的。所以,无论你喜欢与否,其他人已经在交易非常相似的策略,并且影响着你的收益。增加一两个交易者,除非该交易者为机构资金管理者工作,否则不太可能造成更多影响。
Second, if you are trading a strategy that has a large capacity (e.g., most futures trading strategies), then the extra market impact from your rogue programmer consultant will be minimal. 其次,如果你交易的是容量较大的策略(例如,大多数期货交易策略),那么你那位擅自操作的程序员顾问带来的额外市场影响将是微乎其微的。
Finally, you can choose to compartmentalize your information and implementation; that is, you can hire different programmers to build different parts of the automated trading strategy. Often, one programmer can build an automated trading infrastructure program that can be used for different strategies, and another one can implement the actual strategy, which will read in input parameters. So in this case, the first programmer does not know your strategy, and the second programmer does not have the infrastructure to execute the strategy. Furthermore, neither programmer knows the actual parameter values to use for your strategy. 最后,你可以选择将你的信息和实现进行分隔,也就是说,你可以雇佣不同的程序员来构建自动交易策略的不同部分。通常,一个程序员可以构建一个可用于不同策略的自动交易基础设施程序,而另一个程序员则可以实现实际的策略,该策略会读取输入参数。因此,在这种情况下,第一个程序员不了解你的策略,第二个程序员也没有执行策略的基础设施。此外,两位程序员都不知道你的策略所使用的实际参数值。
MINIMIZING TRANSACTION COSTS 最小化交易成本
We saw in Chapter 3 how transaction costs can impact a strategy’s actual return. Besides changing your brokerage or proprietary trading firm to one that charges a lower commission, there are a few things you can do in your execution method to minimize the transaction costs. 我们在第三章中看到交易成本如何影响策略的实际收益。除了更换为收取更低佣金的经纪公司或自营交易公司外,你还可以在执行方法中采取一些措施来最小化交易成本。
To cut down on commissions, you can refrain from trading low-priced stocks. Typically, institutional traders do not trade any stocks with prices lower than $5. Not only do low-priced stocks increase your total commission costs (since you need to buy or sell more shares for a fixed amount of capital), percentage-wise they also have a wider bid-ask spread and therefore increase your total liquidity costs. 为了减少佣金,你可以避免交易低价股。通常,机构交易者不会交易价格低于 5 美元的股票。低价股不仅会增加你的总佣金成本(因为你需要买卖更多的股票以固定的资金量进行交易),从百分比上看,它们的买卖价差也更大,因此会增加你的总流动性成本。
In order to minimize market impact cost, you should limit the size (number of shares) of your orders based on the liquidity of the stock. One common measure of liquidity is the average daily volume (it is your choice what lookback period you want to average over). As a rule of thumb, each order should not exceed 1 percent of the average daily volume. As an independent trader, you may think that it is not easy to reach this 1 percent threshold, and you would be right when the stock in question is a large-cap stock belonging to the S&P 500. However, you may be surprised by the low liquidity of some small-cap stocks out there. 为了最小化市场冲击成本,你应根据股票的流动性限制订单的规模(股票数量)。流动性的一个常用衡量标准是平均日交易量(你可以自行选择回溯期来计算平均值)。经验法则是,每笔订单不应超过平均日交易量的 1%。作为独立交易者,你可能会认为达到这个 1%的门槛并不容易;当所交易的股票是属于标普 500 的大盘股时,你是正确的。然而,你可能会对一些小盘股的低流动性感到惊讶。
For example, at the time of this writing, Bel Fuse Inc. is a stock in the S&P 600 SmallCap Index. It has a three-month average daily volume of about 30,000 shares, and it closed recently at about $10. So 1 percent of this average volume is just 300 shares, which are worth only $3,000. And this is a stock that is included in an index. Imagine those that aren’t! 例如,在撰写本文时,Bel Fuse Inc. 是标普 600 小型股指数中的一只股票。它的三个月平均日成交量约为 30,000 股,最近的收盘价约为 10 美元。因此,平均成交量的 1%仅为 300 股,价值仅为 3,000 美元。而这还是一只被纳入指数的股票。试想那些未被纳入指数的股票情况!
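The 1 percent rule is easy to enforce in code. The following sketch caps a desired order at a fraction of average daily volume; the function names are hypothetical, and the lookback window is simply however many days of volume you pass in.

```python
import math


def max_order_shares(daily_volumes, fraction=0.01):
    """Largest order size allowed: a fraction of average daily volume."""
    adv = sum(daily_volumes) / len(daily_volumes)
    # Round first so floating-point noise cannot turn 300.0 into 299.
    return math.floor(round(adv * fraction, 6))


def cap_order(desired_shares, daily_volumes, fraction=0.01):
    """Shrink a signed order (negative means sell) to respect the cap."""
    cap = max_order_shares(daily_volumes, fraction)
    sign = 1 if desired_shares >= 0 else -1
    return sign * min(abs(desired_shares), cap)
```

With the numbers from the example above (an average daily volume of 30,000 shares), `max_order_shares` returns 300, so a desired order of 1,000 shares would be cut to 300, or about $3,000 at a $10 price.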
Another way to reduce market impact is to scale the size of your orders based on the market capitalization of a stock. The way to scale the size is not an exact science, but most practitioners would not recommend a linear scale because the market capitalization of companies varies over several orders of magnitude, from tens of millions to hundreds of billions. A linear scale (i.e., scaling the capital of a stock to be linearly proportional to its market capitalization) would result in practically zero weights for most small- and microcap stocks in your portfolio, and this will take away any benefits of diversification. If we were to use a linear scale, the capital weight of the largest large-cap stock would be about 10,000 times that of the smallest small-cap stock. To reap the benefits of diversification, we should not allow that ratio to be more than 10 or so, provided that the liquidity (volume) constraint described previously is also satisfied. If the capital weight of a stock is proportional to the fourth root of its market cap, it would do the trick, since the fourth root of 10,000 is 10. 另一种减少市场冲击的方法是根据股票的市值来调整订单的规模。调整规模的方法并非精确的科学,但大多数从业者不建议采用线性比例,因为公司的市值差异巨大,从数千万到数千亿不等。线性比例(即将股票的资金规模与其市值线性挂钩)会导致大多数小盘股和微盘股在你的投资组合中权重几乎为零,这会丧失多样化带来的任何好处。如果采用线性比例,最大的大盘股的资金权重将是最小的小盘股的约 1 万倍。为了获得多样化的好处,我们不应让这个比例超过 10 左右,前提是之前提到的流动性(成交量)限制也得到满足。如果股票的资金权重与其市值的四次方根成正比,就能达到这个效果,因为 1 万的四次方根正好是 10。
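The effect of fourth-root scaling can be checked numerically. In this sketch the function name and the two market caps are made up for illustration; the point is only to compare the weight ratio under a linear scale with the ratio under a fourth-root scale.

```python
def capital_weights(market_caps, exponent=0.25):
    """Capital weights proportional to market cap raised to `exponent`.

    exponent=1.0 gives linear (cap-weighted) scaling;
    exponent=0.25 gives the fourth-root scaling discussed in the text.
    The weights are normalized to sum to 1.
    """
    raw = {symbol: cap ** exponent for symbol, cap in market_caps.items()}
    total = sum(raw.values())
    return {symbol: value / total for symbol, value in raw.items()}


# Illustrative caps: $50 million versus $500 billion, a 10,000-fold gap.
caps = {"SMALL": 5e7, "BIG": 5e11}
fourth_root = capital_weights(caps)
linear = capital_weights(caps, exponent=1.0)
ratio_fourth_root = fourth_root["BIG"] / fourth_root["SMALL"]  # about 10
ratio_linear = linear["BIG"] / linear["SMALL"]                 # about 10,000
```

Any exponent between 0 and 1 compresses the range; the fourth root is simply the one that maps a 10,000-fold spread in market cap onto a 10-fold spread in capital.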
There is one other way to reduce market impact. Many institutional traders who desire to execute a large order will break it down into many smaller orders and execute them over time. This method of trading will certainly reduce market impact; however, it engenders another kind of transaction cost, namely, slippage. As discussed in Chapter 2, slippage is the difference between the price that triggers the trading signal and the average execution price of the entire order. Because the order is executed over a period of time, slippage can be quite large. Since reducing market impact in this way may increase slippage, it is not really suitable for retail traders whose order size is usually not big enough to require this remedy. 还有另一种减少市场冲击的方法。许多希望执行大额订单的机构交易者会将其拆分成许多较小的订单,并分批执行。这种交易方式确实能减少市场冲击;然而,它会产生另一种交易成本,即滑点。如第二章所述,滑点是触发交易信号的价格与整个订单的平均执行价格之间的差异。由于订单是在一段时间内执行的,滑点可能相当大。由于通过这种方式减少市场冲击可能会增加滑点,因此对于订单规模通常不足以需要此类措施的散户交易者来说,这种方法并不适用。
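Using the definition above, slippage for an order filled in several pieces can be measured as follows. The function name and the sign convention (a positive number is a cost to the trader) are choices made for this sketch.

```python
def slippage_per_share(signal_price, fills, side):
    """Difference between the average execution price and the signal price.

    fills: list of (shares, price) tuples for the pieces of one order.
    side:  "BUY" or "SELL".
    A positive result means the order was filled at a worse price than the
    one that triggered the signal.
    """
    total_shares = sum(shares for shares, _ in fills)
    average_price = sum(shares * price for shares, price in fills) / total_shares
    if side == "BUY":
        return average_price - signal_price
    return signal_price - average_price
```

For example, a buy signal at $10.00 filled as 100 shares at $10.02 and 100 shares at $10.06 has an average execution price of $10.04, so the slippage is about 4 cents per share.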
Sometimes, however, slippage is outside of your control: Perhaps your brokerage’s execution speed is simply too slow, due to either software issues (their software processes your orders too slowly), risk-control issues (your order has to be checked against your account’s buying power and pass various risk control criteria before it can be routed to the exchange), or pipeline issues (the brokerage’s speed of access to the exchanges). Or perhaps your brokerage does not have access to deep enough “dark-pool” liquidity. These execution costs and issues should affect your choice of brokerages, as I pointed out in Chapter 4. 然而,有时滑点是你无法控制的:可能是你的经纪商执行速度太慢,原因可能是软件问题(他们的软件处理你的订单太慢)、风险控制问题(你的订单必须经过账户购买力检查并通过各种风险控制标准后才能发送到交易所),或者是管道问题(经纪商访问交易所的速度)。或者你的经纪商可能无法访问足够深度的“暗池”流动性。正如我在第四章中指出的,这些执行成本和问题应影响你对经纪商的选择。
TESTING YOUR SYSTEM BY PAPER TRADING 通过模拟交易测试你的系统
After you have built your automated trading system, it is a good idea to test it in a paper trading account, if your brokerage provides one. Paper trading has a number of benefits; chief among them is that this is practically the only way to see if your ATS software has bugs without losing a lot of real money. 在你构建好自动交易系统后,如果你的经纪商提供模拟交易账户,最好在该账户中进行测试。模拟交易有许多好处;其中最重要的是,这是几乎唯一一种可以在不损失大量真实资金的情况下,检测你的 ATS 软件是否存在漏洞的方法。
Often, the moment you start paper trading, you will realize that there is a glaring look-ahead bias in your strategy-there may just be no way you could have obtained some crucial piece of data before you enter an order! If this happens, it is “back to the drawing board.” 通常,当你开始模拟交易时,你会意识到你的策略中存在明显的前瞻性偏差——你根本不可能在下单前获得某些关键数据!如果发生这种情况,就得“回到起点重新设计”。
You should be able to run your ATS, execute paper trades, and then compare the paper trades and profit and loss (P&L) with the theoretical ones generated by your backtest program using the latest data. If the difference is not due to transaction costs (including an expected delay in execution for the paper trades), then your software likely has bugs. (I mentioned the names of some of the brokerages that offer paper trading accounts in Chapter 4.) 你应该能够运行你的自动交易系统(ATS),执行模拟交易,然后将模拟交易的盈亏与使用最新数据通过回测程序生成的理论盈亏进行比较。如果差异不是由于交易成本(包括模拟交易执行时预期的延迟)引起的,那么你的软件很可能存在漏洞。(我在第 4 章提到了提供模拟交易账户的一些券商名称。)
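The comparison between paper trades and backtest trades can be automated. The sketch below assumes, purely for illustration, that each set of trades has been reduced to a dictionary keyed by (date, symbol) with a signed share count and a price; the function name and the 1 percent price tolerance are likewise assumptions.

```python
def compare_trades(paper, backtest, price_tolerance=0.01):
    """Report where paper trades and backtest trades disagree.

    paper, backtest: dicts mapping (date, symbol) -> (signed_shares, price).
    price_tolerance: allowed relative price difference, meant to absorb
                     ordinary transaction costs and execution delay.
    """
    report = {"missing_in_paper": [], "missing_in_backtest": [],
              "size_mismatch": [], "price_mismatch": []}
    for key in sorted(set(paper) | set(backtest)):
        if key not in paper:
            report["missing_in_paper"].append(key)
        elif key not in backtest:
            report["missing_in_backtest"].append(key)
        else:
            paper_shares, paper_price = paper[key]
            test_shares, test_price = backtest[key]
            if paper_shares != test_shares:
                report["size_mismatch"].append(key)
            elif abs(paper_price - test_price) > price_tolerance * test_price:
                report["price_mismatch"].append(key)
    return report
```

Trades that are missing or differ in size point to a bug (or to look-ahead bias) rather than to transaction costs, so those categories deserve attention first.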
Another benefit of paper trading is that it gives you better intuitive understanding of your strategy, including the volatility of its P&L, the typical amount of capital utilized, the number of trades per day, and the various operational difficulties including data issues. Even though you can theoretically check out most of these features of your strategy in a backtest, one will usually gain intuition only if one faces them on a daily, ongoing basis. Backtesting also won’t reveal the operational difficulties, such as how fast you can download all the needed data before the market opens each day and how you can optimize your operational procedures in actual execution. (Do not underestimate the time required for preparing your orders before the market opens. It took me some 20 minutes to download and parse all my historical data each morning, and it took another 15 minutes or so to transmit all the orders to my account. If your trading strategy depends on data or news prior to the market open that cannot be more than 35 minutes old, then you need to either figure out a different execution environment or modify your strategy. It is hard to figure out all these timing issues until paper trading is conducted.) 模拟交易的另一个好处是,它能让你更直观地理解你的策略,包括盈亏的波动性、通常使用的资金量、每日交易次数以及各种操作难题,包括数据问题。尽管理论上你可以在回测中检查策略的大多数这些特征,但通常只有在每天持续面对它们时,才能真正获得直觉。回测也无法揭示操作上的困难,比如每天开盘前你能多快下载所有需要的数据,以及如何在实际执行中优化操作流程。(不要低估开盘前准备订单所需的时间。我每天早上下载并解析所有历史数据大约花了 20 分钟,传送所有订单到我的账户又花了大约 15 分钟。如果你的交易策略依赖于开盘前不超过 35 分钟的数据或新闻,那么你要么需要找到不同的执行环境,要么修改你的策略。在进行模拟交易之前,很难弄清楚所有这些时间安排问题。)
If you are able to run a paper trading system for a month or longer, you may even be able to discover data-snooping bias, since paper trading is a true out-of-sample test. However, traders usually pay less and less attention to the performance of a paper trading system as time goes on, since there are always more pressing issues (such as the real trading programs that are being run). This inattention causes the paper trading system to perform poorly because of neglect and errors in operation. So data-snooping bias can usually be discovered only when you have actually started trading the system with a small amount of capital. 如果你能够运行一个模拟交易系统一个月或更长时间,你甚至可能发现数据窥探偏差,因为模拟交易是真正的样本外测试。然而,随着时间的推移,交易者通常会越来越少关注模拟交易系统的表现,因为总有更紧迫的问题(比如正在运行的真实交易程序)。这种忽视导致模拟交易系统因操作疏忽和错误而表现不佳。因此,数据窥探偏差通常只有在你实际用少量资金开始交易该系统时才能被发现。
WHY DOES ACTUAL PERFORMANCE DIVERGE FROM EXPECTATIONS? 实际表现为何会与预期不符?
Finally, after much hard work testing and preparing, you have entered your first order and it got executed! Whether you win or lose, you understand that it will take a while to find out if the strategy's performance meets your expectations. But what if after one month, then two months, and then finally a quarter has passed, the strategy still delivers meager or maybe even negative returns? This disappointing experience is common to freshly minted quantitative traders. 最后,经过大量的测试和准备,你下了第一笔订单并成功执行!无论是赢是输,你都明白需要一段时间才能判断策略的表现是否符合预期。但如果一个月、两个月,甚至一个季度过去了,策略仍然只带来微薄甚至负收益呢?这种令人失望的经历对于新晋量化交易员来说很常见。
This would be the time to review the list of what might have caused this divergence from expectation. Start with the simplest diagnosis: 这将是回顾可能导致这种偏离预期的原因清单的时候。先从最简单的诊断开始:
Do you have bugs in your ATS software? 你的 ATS 软件中有错误吗?
Do the trades generated by your ATS match the ones generated by your backtest program? 你的 ATS 生成的交易是否与回测程序生成的交易相匹配?
Are the execution costs much higher than what you expected? 执行成本是否远高于你的预期?
Are you trading illiquid stocks that caused a lot of market impact? 你是否在交易流动性差的股票,导致了较大的市场冲击?
If the execution costs are much higher than what you expected, it may be worthwhile to reread the section on how to minimize transaction costs. 如果执行成本远高于你的预期,可能值得重新阅读如何最小化交易成本的章节。
After these easy diagnoses have been eliminated, one is then faced with the two most dreaded causes of divergence: data-snooping bias and regime shifts. 在排除了这些简单的诊断后,接下来要面对的是两种最令人畏惧的偏差原因:数据窥探偏差和市场状态转变。
To see if data-snooping bias is causing the underperformance of your live trading, try to eliminate as many rules and as many parameters in your strategy as possible. If the backtest performance completely falls apart after this exercise, chances are you do have this bias and it is time to look for a new strategy. If the backtest performance is still reasonable, your poor live trading performance may just be due to bad luck. 为了判断数据窥探偏差是否导致了你的实盘交易表现不佳,尝试尽可能减少策略中的规则和参数。如果经过此操作后回测表现完全崩溃,很可能你确实存在这种偏差,是时候寻找新的策略了。如果回测表现仍然合理,那么你实盘表现不佳可能只是运气不好。
Regime shifts refer to the situation when the financial market structure or the macroeconomic environment undergoes a drastic change so much so that trading strategies that were profitable before may not be profitable now. 制度转变是指金融市场结构或宏观经济环境发生剧烈变化的情况,以至于之前盈利的交易策略现在可能不再盈利。
There are two noteworthy regime shifts in recent years related to market (or regulatory) structure that may affect certain strategies. 近年来,有两个与市场(或监管)结构相关的显著制度转变,可能会影响某些策略。
The first one is the decimalization of stock prices. Prior to early 2001, stock prices in the United States were quoted in multiples of one-eighth or one-sixteenth of a dollar. Since April 9, 2001, all US stocks have been quoted in decimals. This seemingly innocuous change has had a dramatic impact on the market structure, which is particularly negative for the profitability of statistical arbitrage strategies. 第一个是股票价格的小数化。在 2001 年初之前,美国股票价格以八分之一或十六分之一美元的倍数报价。自 2001 年 4 月 9 日起,所有美国股票均以小数报价。这一看似无害的变化对市场结构产生了巨大影响,特别是对统计套利策略的盈利能力产生了负面影响。
The reason for this may be worthy of a book unto itself. In a nutshell, decimalization reduces frictions in the price discovery process, while statistical arbitrageurs mostly act as market makers and derive their profits from frictions and inefficiencies in this process. (This is the explanation given by Dr. Andrew Sterge in a Columbia University financial engineering seminar titled “Where Have All the Stat Arb Profits Gone?” in January 2008. Other industry practitioners have made the same point to me in private conversations.) Hence, we can expect backtest performance of statistical arbitrage strategies prior to 2001 to be far superior to their present-day performance. 其原因本身可能值得写一本书。简而言之,小数化减少了价格发现过程中的摩擦,而统计套利者大多充当做市商,从这一过程中存在的摩擦和低效中获取利润。(这是安德鲁·斯特格博士在 2008 年 1 月哥伦比亚大学金融工程研讨会“统计套利利润都去哪儿了?”中给出的解释。其他业内从业者在私下交流中也表达了同样的观点。)因此,我们可以预期 2001 年之前统计套利策略的回测表现远优于现今的表现。
The other regime shift is relevant if your strategy shorts stocks. 另一个制度转变则与您的策略做空股票有关。
Prior to 2007, Securities and Exchange Commission (SEC) rules stated that one could not short a stock unless it was on a “plus tick” or “zero-plus tick.” Hence, if your backtest data include those earlier days, it is possible that a very profitable short position could not actually have been entered into due to a lack of plus ticks, or it could have been entered into only with a large slippage. This plus-tick rule was eliminated by the SEC in June 2007, and it was replaced by an alternative uptick rule (Rule 201) in February 2010. Therefore, your backtest results for a strategy that shorts stocks may show an artificially inflated performance prior to 2007 and after 2009 relative to its actual realizable performance. June 2007-February 2010 might provide the only realistic backtest period if you haven’t incorporated this rule! 在 2007 年之前,证券交易委员会(SEC)的规定是,除非股票处于“正向报价”或“零正向报价”状态,否则不能做空股票。因此,如果你的回测数据包含那些早期的日期,可能存在一种非常盈利的空头仓位实际上无法建立,因为缺乏正向报价,或者只能以较大的滑点建立。SEC 在 2007 年 6 月取消了这一正向报价规则,并在 2010 年 2 月用另一种上升报价规则(规则 201)取代了它。因此,对于做空股票的策略,你的回测结果在 2007 年之前和 2009 年之后可能会显示出相对于实际可实现表现的虚高收益。如果你没有考虑这条规则,2007 年 6 月至 2010 年 2 月可能是唯一现实的回测时间段!
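If you have tick data for the pre-2007 period, one way to approximate the old rule in a backtest is to label each trade by its tick direction and allow short entries only on plus or zero-plus ticks. The sketch below is a simplification and not a faithful model of the regulation: it uses a single stream of trade prices, ignores exchange-specific variations, and the function names are hypothetical.

```python
def classify_ticks(prices):
    """Label each trade price by its tick direction.

    Returns a list with one label per price: "plus" (higher than the previous
    trade), "minus" (lower), "zero-plus" or "zero-minus" (unchanged, with the
    last price change having been up or down), or None when the direction is
    not yet known.
    """
    labels = []
    last_direction = None  # direction of the most recent price change
    previous = None
    for price in prices:
        if previous is None:
            labels.append(None)
        elif price > previous:
            last_direction = "plus"
            labels.append("plus")
        elif price < previous:
            last_direction = "minus"
            labels.append("minus")
        elif last_direction is None:
            labels.append(None)
        else:
            labels.append("zero-" + last_direction)
        previous = price
    return labels


def short_allowed(label):
    """Under the simplified rule, shorting is allowed only on these ticks."""
    return label in ("plus", "zero-plus")
```

In a backtest, a short entry signal that arrives on a disallowed tick would have to wait for the next allowed one, and the price difference in between is the extra slippage mentioned above.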
Actually, there is another problem with realizing the backtest performance of a strategy that shorts stocks apart from this regulatory regime shift. Even without the plus-tick rule, many stocks, especially the small-cap ones or the ones with low liquidity, are “hard to borrow.” For you to be able to short a stock, your broker has to be able to borrow it from someone else (usually a large mutual fund or other brokerage clients) and lend it to you for selling. If no one is able or willing to lend you their stock, it is deemed hard to borrow and you would not be able to short it. Hence, again, a very profitable historical short position may not actually have been possible due to the difficulty of borrowing the stock. 实际上,除了监管制度的变化之外,实现做空股票策略的回测表现还有另一个问题。即使没有涨价规则,许多股票,尤其是小盘股或流动性较低的股票,也很“难以借入”。要做空一只股票,你的经纪商必须能够从其他人那里借到这只股票(通常是大型共同基金或其他经纪客户),然后借给你卖出。如果没有人能够或愿意借出他们的股票,这只股票就被认为是难以借入的,你也就无法做空它。因此,同样地,一个历史上非常盈利的空头头寸实际上可能由于借不到股票而无法实现。
The two regime shifts described here are the obvious and well-publicized ones. However, there may be other, more subtle regime shifts that apply to your category of stocks that few people know about, but are no less disruptive to the profitability of your strategy. I will discuss how one might come up with a model that detects regime shifts automatically as one of the special topics of Chapter 7. 这里描述的两个制度变化是显而易见且广为宣传的。然而,可能还有其他更微妙的制度变化适用于你所关注的股票类别,鲜有人知晓,但这些变化对策略的盈利能力同样具有破坏性。我将在第 7 章的专题之一中讨论如何构建一个模型,自动检测制度变化。
SUMMARY 总结
An automated trading system is a piece of software that automatically generates and transmits orders to your brokerage account based on your trading strategy. There are three advantages to having this software: 自动交易系统是一种软件,能够根据你的交易策略自动生成并发送订单到你的经纪账户。拥有这款软件有三个优势:
It ensures the faithful adherence to your backtested strategy. 它确保严格遵循你经过回测的策略。
It eliminates manual operation so that you can simultaneously run multiple strategies. 它消除了手动操作,使你能够同时运行多个策略。
Most importantly, it allows speedy transmissions of orders, which is essential to high-frequency trading strategies. 最重要的是,它允许快速传输订单,这对于高频交易策略至关重要。
Regarding the difference between a semiautomated trading system and a fully automated trading system: 关于半自动交易系统和全自动交易系统的区别:
In a semiautomated trading system, the trader still needs to manually upload a text file containing order details to a basket trader or spread trader, and manually press a button to transmit the orders at the appropriate time. However, the order text file can be automatically generated by a program such as MATLAB, Python, or R. 在半自动交易系统中,交易者仍需手动上传包含订单详情的文本文件到篮子交易器或价差交易器,并在适当的时间手动按下按钮以传输订单。然而,订单文本文件可以由 MATLAB、Python 或 R 等程序自动生成。
In a fully automated trading system, the program will be able to automatically upload data and transmit orders throughout the trading day or even over many days. 在全自动交易系统中,程序能够在整个交易日甚至多天内自动上传数据并传输订单。
After the creation of an ATS, you can then focus on the various issues that are important in execution: minimizing transaction costs and paper trading. Minimizing transaction costs is mainly a matter of not allowing your order size to be too big relative to the stock's average trading volume and relative to its market capitalization. Paper trading allows you to: 创建 ATS 之后,你就可以专注于执行中重要的各种问题:最小化交易成本和模拟交易。最小化交易成本主要是避免你的订单规模相对于股票的平均交易量和市值过大。模拟交易可以让你:
Discover software bugs in your trading strategy and execution programs. 发现交易策略和执行程序中的软件漏洞。
Discover look-ahead or even data-snooping bias. 发现前瞻性偏差甚至数据窥探偏差。
Discover operating difficulties and plan for operating schedules. 发现操作上的困难并规划操作时间表。
Estimate transaction costs more realistically. 更现实地估算交易成本。
Gain important intuition about P&L volatility, capital usage, portfolio size, and trade frequency. 获得关于盈亏波动、资本使用、投资组合规模和交易频率的重要直觉。
Finally, what do you do in the situation where your live trading underperforms your backtest? You can start by addressing the usual problems: Eliminate bugs in the strategy or execution software; reduce transaction costs; and simplify the strategy by eliminating parameters. But, fundamentally, your strategy still may have suffered from data-snooping bias or regime shift. 最后,当你的实盘交易表现不如回测时,你该怎么办?你可以从解决常见问题开始:消除策略或执行软件中的错误;降低交易成本;通过去除参数简化策略。但从根本上讲,你的策略仍可能受到数据挖掘偏差或市场状态转变的影响。
If you believe (and you can only believe, as you can never prove this) that your poor live-trading performance is due to bad luck and not to data-snooping bias in your backtest nor to a regime shift, how should you proceed when the competing demands of perseverance and capital preservation seem to suggest opposite actions? This critical issue will be addressed in the next chapter, which discusses systematic ways to preserve capital in the face of losses and yet still be in a position to recover once the tide turns. 如果你相信(你只能相信,因为永远无法证明)你的实盘表现不佳是由于运气不好,而不是回测中的数据挖掘偏差或市场状态转变,那么当坚持和资本保全这两种相互矛盾的需求出现时,你应如何应对?这个关键问题将在下一章中讨论,内容涉及在亏损面前系统性地保护资本,同时在市场回暖时仍能恢复元气的方法。
CHAPTER 6 第 6 章
Money and Risk Management 资金与风险管理
All trading strategies suffer occasional losses, technically known as drawdowns. The drawdowns may last a few minutes or a few years. To profit from a quantitative trading business, it is essential to manage your risks in a way that limits your drawdowns to a tolerable level and yet be positioned to use optimal leverage of your equity to achieve maximum possible growth of your wealth. Furthermore, if you have more than one strategy, you will also need to find a way to optimally allocate capital among them so as to maximize overall risk-adjusted return. 所有交易策略都会遭遇偶尔的亏损,技术上称为回撤。回撤可能持续几分钟,也可能持续几年。要从量化交易业务中获利,关键是以一种限制回撤在可承受范围内的方式管理风险,同时利用股本的最佳杠杆,实现财富的最大可能增长。此外,如果你拥有多个策略,还需要找到一种方法,在它们之间进行资本的最佳分配,以最大化整体的风险调整回报。
The optimal allocation of capital and the optimal leverage to use so as to strike the right balance between risk management and maximum growth is the focus of this chapter, and the central tool we use is called the Kelly formula. 本章的重点是资本的最佳分配和最佳杠杆使用,以在风险管理和最大增长之间取得恰当的平衡,我们使用的核心工具称为凯利公式。
OPTIMAL CAPITAL ALLOCATION AND LEVERAGE 最优资本配置与杠杆
Suppose you plan to trade several strategies, each with its own expected return and standard deviation. How should you allocate capital among them in an optimal way? Furthermore, what should be the overall leverage (the ratio of the size of your portfolio to your account equity)? Dr. Edward Thorp, whom I mentioned in the preface, has written an excellent expository article on this subject (Thorp, 1997), and I shall follow his discussion closely in this chapter. (Dr. Thorp’s discussion is centered on a portfolio of securities, and mine is constructed around a portfolio of strategies. However, the mathematics is almost identical.)
Every optimization problem begins with an objective. Our objective here is to maximize our long-term wealth, an objective that I believe is not controversial for the individual investor. Maximizing long-term wealth is equivalent to maximizing the long-term compounded growth rate $g$ of your portfolio. Note that this objective implicitly means that ruin (i.e., equity going to zero or less because of a loss) must be avoided. This is because if ruin can be reached with nonzero probability at some point, the long-term wealth is surely zero, as is the long-term growth rate.
(In all of the discussions, I assume that we reinvest all trading profits, and therefore it is the levered, compounded growth rate that is of ultimate importance.) (在所有讨论中,我假设我们将所有交易利润再投资,因此最终重要的是杠杆化的复合增长率。)
One approximation that I will make is that the probability distribution of the returns of each trading strategy $i$ is Gaussian, with a fixed mean $m_{i}$ and standard deviation $s_{i}$. (The returns should be net of all financing costs; that is, they should be excess returns.) This is a common approximation in finance, but it can be quite inaccurate. Certain big losses in the financial markets occur with far higher frequencies (or, viewed alternatively, at far higher magnitudes) than Gaussian probability distributions will allow. However, every scientific or engineering endeavor starts with the simplest model with the crudest approximation, and finance is no exception. I will discuss the remedies to such inaccuracies later in this chapter.
Let’s denote the optimal fractions of your equity that you should allocate to each of your $n$ strategies by a column vector $F^{*}=\left(f_{1}^{*}, f_{2}^{*}, \ldots, f_{n}^{*}\right)^{T}$. Here, $T$ means transpose.
Given our optimization objective and the Gaussian assumption, Dr. Thorp has shown that the optimal allocation is given by 基于我们的优化目标和高斯假设,Thorp 博士已经证明最优分配为
$$F^{*}=C^{-1} M$$
Here, $C$ is the covariance matrix such that matrix element $C_{ij}$ is the covariance of the returns of the $i^{\text{th}}$ and $j^{\text{th}}$ strategies, $-1$ indicates matrix inverse, and $M=\left(m_{1}, m_{2}, \ldots, m_{n}\right)^{T}$ is the column vector of mean returns of the strategies. Note that these returns are one-period, simple (uncompounded), unlevered returns. For example, if the strategy is long \$1 of stock A and short \$1 of stock B and made \$0.10 profit in a period, $m$ is 0.05, no matter what the equity in the account is.
If we assume that the strategies are all statistically independent, the covariance matrix becomes a diagonal matrix, with the diagonal elements equal to the variance of the individual strategies. This leads to an especially simple formula: 如果我们假设所有策略都是统计独立的,协方差矩阵将变成对角矩阵,对角线元素等于各个策略的方差。这会得到一个特别简单的公式:
$$f_{i}=m_{i} / s_{i}^{2}$$
This is the famous Kelly formula (for the many interesting stories surrounding this formula, see, for example, Poundstone, 2005) as applied to continuous finance as opposed to gambling with discrete outcomes, and it gives the optimal leverage one should employ for a particular trading strategy. 这就是著名的凯利公式(关于该公式的许多有趣故事,参见例如 Poundstone,2005),它应用于连续金融领域,而非具有离散结果的赌博,给出了某一特定交易策略应采用的最优杠杆率。
Interested readers can look up a simple derivation of the Kelly formula at the end of this chapter in the simple one-strategy case. 感兴趣的读者可以在本章末尾查看凯利公式的简单推导,针对单一策略的情况。
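As a minimal sketch of the two formulas above, the following Python snippet computes $F^{*}=C^{-1}M$ with NumPy and checks that, for a diagonal covariance matrix, it reduces to $f_i = m_i/s_i^2$. The function name `kelly_allocation` and the input numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def kelly_allocation(mean_excess, cov):
    """Return F* = C^{-1} M for mean excess returns M and covariance matrix C."""
    mean_excess = np.asarray(mean_excess, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Solving C F = M avoids forming the explicit inverse of C.
    return np.linalg.solve(cov, mean_excess)

# Hypothetical annualized inputs for two statistically independent strategies.
m = np.array([0.10, 0.06])   # mean excess returns
s = np.array([0.20, 0.15])   # standard deviations
C = np.diag(s ** 2)          # independence implies a diagonal covariance matrix

F = kelly_allocation(m, C)
print(F)                     # same values as m / s**2, element by element
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical design choice; the result is the same allocation.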
Example 6.1: An Interesting Puzzle (or Why Risk Is Bad for You)$^{1}$
Here is a little puzzle that may stymie many a professional trader. Suppose a certain stock exhibits a true (geometric) random walk, by which I mean there is a 50-50 chance that the stock goes up 1 percent or down 1 percent every minute. If you buy this stock, are you most likely (in the long run and ignoring financing costs) to make money, lose money, or be flat?
Most traders will blurt out the answer “Flat!” and that is wrong. The correct answer is that you will lose money, at the rate of 0.005 percent (or 0.5 basis point) every minute! This is because for a geometric random walk, the average compounded rate of return is not the short-term (or one-period) return $m$ (0 here), but is $g=m-s^{2}/2$. With $s=0.01$ per minute, this gives $g=-0.01^{2}/2=-0.00005$. The expression follows from the general formula for compounded growth $g(f)$ given in the appendix to this chapter, with the leverage $f$ set to 1 and risk-free rate $r$ set to 0. This is also consistent with the fact that the geometric mean of a set of numbers is always smaller than the arithmetic mean (unless the numbers are identical, in which case the two means are the same). When we assume, as I did, that the arithmetic mean of the returns is zero, the geometric mean, which gives the average compounded rate of return, must be negative.
The take-away lesson here is that risk always decreases long-term growth rate; hence the importance of risk management! (See also Box 6.1 on “Loss Aversion Is Not a Behavioral Bias.”)
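The puzzle can be checked numerically. The sketch below computes the exact expected log growth per minute for the up-1-percent/down-1-percent walk and compares it with the approximation $g=m-s^{2}/2$; the variable names are illustrative only.

```python
import math

up, down = 0.01, -0.01                     # equally likely one-minute returns

# Arithmetic mean and variance of the one-period return.
m = 0.5 * up + 0.5 * down
s2 = 0.5 * (up - m) ** 2 + 0.5 * (down - m) ** 2

# Exact average compounded (log) growth per period.
g_exact = 0.5 * math.log(1 + up) + 0.5 * math.log(1 + down)

# Gaussian approximation used in the text.
g_approx = m - s2 / 2

print(f"arithmetic mean m        = {m:.6f}")
print(f"exact log growth/minute  = {g_exact:.8f}")
print(f"approximation m - s^2/2  = {g_approx:.8f}")
```

Both growth figures come out at roughly minus half a basis point per minute, even though the arithmetic mean return is zero.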
Often, because of uncertainties in parameter estimations, and also because return distributions are not really Gaussian, traders prefer to cut this recommended leverage in half for safety. This is called half-Kelly betting. 通常,由于参数估计存在不确定性,且收益分布实际上并非高斯分布,交易者倾向于将推荐的杠杆减半以保证安全。这被称为半凯利投注。
If you have a retail trading account, your maximum overall leverage $l$ will be restricted to either 2 or 4, depending on whether you hold the positions overnight or just intraday. In this situation, you would have to scale down each $f_{i}$ by the same factor $l /\left(\left|f_{1}\right|+\left|f_{2}\right|+\ldots+\left|f_{n}\right|\right)$, where $\left|f_{1}\right|+\left|f_{2}\right|+\ldots+\left|f_{n}\right|$ is the total unrestricted leverage of the portfolio. Here, we ignore the possibility that some of your individual strategies may hold positions that offset each other (such as a long and a short position each balanced with short and long T-bills, respectively), which may allow you to hold a higher leverage than this formula suggests.
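A minimal sketch of this rescaling, assuming a hypothetical helper `cap_leverage` and made-up Kelly fractions: when the sum of absolute fractions exceeds the broker limit $l$, every fraction is multiplied by $l/(|f_1|+\ldots+|f_n|)$, so the relative weights and signs are preserved.

```python
import numpy as np

def cap_leverage(f, max_leverage):
    """Scale Kelly fractions so that total gross leverage does not exceed max_leverage."""
    f = np.asarray(f, dtype=float)
    gross = np.abs(f).sum()            # total unrestricted leverage
    if gross <= max_leverage:
        return f.copy()                # already within the limit
    return f * (max_leverage / gross)  # same factor applied to every strategy

# Hypothetical unrestricted Kelly fractions for three strategies.
f_unrestricted = np.array([1.5, 2.0, -1.0])
f_capped = cap_leverage(f_unrestricted, max_leverage=2.0)
print(f_capped, np.abs(f_capped).sum())
```

The offsetting-position caveat mentioned above is not modeled here; the function simply treats gross leverage as the sum of absolute fractions.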
I stated that adopting this capital allocation and leverage will allow us to maximize the long-term compounded growth rate of your equity. So what is this maximum compounded growth rate? It turns out to be 我曾说过,采用这种资金分配和杠杆将使我们最大化您权益的长期复合增长率。那么,这个最大复合增长率是多少呢?事实证明是
$$g=r+S^{2} / 2$$
where $S$ is none other than the Sharpe ratio of your portfolio! As I mentioned in Chapter 2, the higher the Sharpe ratio of your portfolio (or strategy), the higher the maximum growth rate of your equity (or wealth), provided you use the optimal leverage recommended by the Kelly formula. Here is the simple mathematical embodiment of this fact.
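For the single-strategy case, the step from the Kelly leverage to this maximum growth rate can be written out explicitly. The sketch below assumes the compounded growth formula $g(f)=r+fm-f^{2}s^{2}/2$, which is the form implied by Example 6.1 (where $f=1$ and $r=0$ give $g=m-s^{2}/2$); the full derivation is the one referred to in the appendix.

```latex
\begin{aligned}
g(f) &= r + f m - \frac{f^{2} s^{2}}{2}, \\
\frac{dg}{df} &= m - f s^{2} = 0 \;\Longrightarrow\; f^{*} = \frac{m}{s^{2}}, \\
g(f^{*}) &= r + \frac{m^{2}}{s^{2}} - \frac{m^{2}}{2 s^{2}}
          = r + \frac{m^{2}}{2 s^{2}}
          = r + \frac{S^{2}}{2}, \qquad S = \frac{m}{s}.
\end{aligned}
```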
Example 6.2: Calculating the Optimal Leverage Based on the Kelly Formula 示例 6.2:基于凯利公式计算最优杠杆
Let’s see an example of the Kelly formula at work. Suppose our portfolio consists of just a long position in SPY, the exchange-traded fund (ETF) tracking the S&P 500 index. Let’s suppose that the mean annual return of SPY is 11.23 percent, with an annualized standard deviation of 16.91 percent, and that the risk-free rate is 4 percent. Hence, the portfolio has an annual mean excess return of 7.231 percent and an annual standard deviation of 16.91 percent, giving it a Sharpe ratio of 0.4275. The optimal leverage according to the Kelly formula is $f=0.07231 / 0.1691^{2}=2.528$. (Notice one interesting tidbit: The Kelly $f$ is independent of time scale, so it actually does not matter whether you annualize your return and standard deviation, as opposed to the Sharpe ratio, which is time scale dependent.) Finally, the annualized compounded, levered growth rate is 13.14 percent, which includes the financing costs.
You can verify these numbers yourselves by downloading the SPY daily prices from Yahoo! Finance and computing the various quantities on a spreadsheet. I did that on December 29, 2007, and my spreadsheet is available at epchan.com/book/example6_2.xls. In column H, I have computed the daily returns of the (adjusted) closing prices of SPY, while in row 3760 starting at column H, I have computed the (annualized) mean return of SPY, the standard deviation of SPY, the mean excess return of the portfolio, the Sharpe ratio of the portfolio, the Kelly leverage, and, finally, the compounded growth rate.
The Kelly leverage of 2.528 that we computed is saying that, for this strategy, if you have \$100,000 in cash to invest, and if you really believe the expected values of your returns and standard deviations, you should borrow money to buy \$252,800 worth of SPY. Furthermore, expect an annual compounded return on your \$100,000 investment to be 13.14 percent.
For comparison, let’s see what compounded growth rate we will get if we did not leverage our investment (see the formula in the appendix to this chapter): $g=r+m-s^{2}/2=0.1123-(0.1691)^{2}/2=9.8$ percent, where $m$ is the annualized mean excess return (so that $r+m=0.1123$ is the annualized mean return) and $s$ is the annualized standard deviation of returns. This, and not the mean annual return of 11.23 percent, is the long-term growth rate of buying SPY with cash only.
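The arithmetic of this example can be reproduced in a few lines of Python. This is only a sketch using the rounded inputs quoted above, and it assumes the growth formula $g(f)=r+fm-f^{2}s^{2}/2$ (with $m$ the mean excess return); the half-Kelly figure is an extra illustration, not a number from the spreadsheet.

```python
def growth_rate(f, m, s, r):
    """Compounded growth rate g(f) = r + f*m - f^2 * s^2 / 2 at leverage f."""
    return r + f * m - (f ** 2) * (s ** 2) / 2

r = 0.04       # risk-free rate assumed in the example
m = 0.07231    # annualized mean excess return of SPY
s = 0.1691     # annualized standard deviation of SPY

kelly_f = m / s ** 2                         # optimal leverage
sharpe = m / s                               # Sharpe ratio
g_kelly = growth_rate(kelly_f, m, s, r)      # equals r + sharpe**2 / 2
g_half = growth_rate(kelly_f / 2, m, s, r)   # half-Kelly leverage
g_unlevered = growth_rate(1.0, m, s, r)      # cash-only purchase of SPY

print(f"Kelly leverage        : {kelly_f:.3f}")
print(f"Sharpe ratio          : {sharpe:.4f}")
print(f"growth at full Kelly  : {g_kelly:.4f}")
print(f"growth at half Kelly  : {g_half:.4f}")
print(f"growth with no leverage: {g_unlevered:.4f}")
```

The printed values agree with the figures in the text to within rounding of the inputs.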
Example 6.3: Calculating the Optimal Allocation Using the Kelly Formula 示例 6.3:使用凯利公式计算最优配置比例
We pick three sector-specific ETFs and see how we should allocate capital among them to achieve the maximum growth rate for the portfolio. The three ETFs are: OIH (oil service), RKH (regional bank), and RTH (retail). The daily prices are downloaded from Yahoo! Finance and saved in epchan.com/book as OIH.xls, RKH.xls, and RTH.xls.
Using MATLAB 使用 MATLAB
Here is the MATLAB program (epchan.com/book/example6_3.m) to retrieve these files and calculate $M$, $C$, and $F$.
% make sure previously defined variables are erased.
clear;
% read a spreadsheet named "OIH.xls" into MATLAB.
[num1, txt1]=xlsread('OIH');
% the first column (starting from the second row) is
% the trading days in format mm/dd/yyyy.
tday1=txt1(2:end, 1);
% convert the format into yyyymmdd.
tday1=datestr(datenum(tday1, 'mm/dd/yyyy'), 'yyyymmdd');
% convert the date strings first into cell arrays
% and then into numeric format.
tday1=str2double(cellstr(tday1));
% the last column contains the adjusted close prices.
adjcls1=num1(:, end);
% read a spreadsheet named "RKH.xls" into MATLAB.
[num2, txt2]=xlsread('RKH');
% the first column (starting from the second row) is
% the trading days in format mm/dd/yyyy.
tday2=txt2(2:end, 1);
% convert the format into yyyymmdd.
tday2=...
datestr(datenum(tday2, 'mm/dd/yyyy'), 'yyyymmdd');
% convert the date strings first into cell arrays and
% then into numeric format.
tday2=str2double(cellstr(tday2));
adjcls2=num2(:, end);
% read a spreadsheet named "RTH.xls" into MATLAB.
[num3, txt3]=xlsread('RTH');
% the first column (starting from the second row) is
% the trading days in format mm/dd/yyyy.
tday3=txt3(2:end, 1);
% convert the format into yyyymmdd.
tday3=...
datestr(datenum(tday3, 'mm/dd/yyyy'), 'yyyymmdd');
% convert the date strings first into cell arrays and
% then into numeric format.
tday3=str2double(cellstr(tday3));
adjcls3=num3(:, end);
% merge these data
tday=union(tday1, tday2);
tday=union(tday, tday3);
adjcls=NaN(length(tday), 3);
[foo idx1 idx]=intersect(tday1, tday);
adjcls(idx, 1)=adjcls1(idx1);
[foo idx2 idx]=intersect(tday2, tday);
adjcls(idx, 2)=adjcls2(idx2);
[foo idx3 idx]=intersect(tday3, tday);
adjcls(idx, 3)=adjcls3(idx3);
ret=(adjcls-lag1(adjcls))./lag1(adjcls); % returns
% days where any one return is missing
baddata=find(any(~isfinite(ret), 2));
% eliminate days where any one return is missing
ret (baddata,:)=[];
% excess returns: assume annualized risk free rate is 4%
excessRet=ret-repmat(0.04/252, size(ret));
% annualized mean excess returns
M=252*mean(excessRet, 1)'% M =
%
% 0.1396
% 0.0294
% -0.0073
C=252*cov(excessRet) % annualized covariance matrix
% C =
%
% 0.1109 0.0200 0.0183
% 0.0200 0.0372 0.0269
% 0.0183 0.0269 0.0420
F=inv(C)*M % Kelly optimal leverages
% F =
%
% 1.2919
% 1.1723
% -1.4882
Notice that the mean excess return of RTH is negative. Given this, it is not surprising that the Kelly formula recommends we short RTH. 注意,RTH 的平均超额收益为负。鉴于此,凯利公式建议我们做空 RTH 也就不足为奇了。
You might wonder what the Sharpe ratio and the maximum compounded growth rate generated using this optimal allocation are. It turns out that the maximum growth rate of a multistrategy Gaussian process is 你可能想知道使用这个最优配置所产生的夏普比率和最大复合增长率是多少。事实证明,多策略高斯过程的最大增长率是
$$g\left(F^{*}\right)=r+F^{* T} C F^{*} / 2$$
Here is the MATLAB code snippet that calculates these two quantities: 以下是计算这两个量的 MATLAB 代码片段:
% Maximum annualized compounded growth rate
g=0.04+F'*C*F/2 % g =
%
% 0.1529
S=sqrt(F'*C*F) % Sharpe ratio of portfolio
% S =
%
% 0.4751
Notice that the compounded growth rate of the portfolio is 15.29 percent, higher than that of the maximum growth rate achievable by any of the individual stocks. (As an exercise, you can verify that the compounded growth rate of OIH, which has the highest one-period return among the three stocks, is 12.78 percent.) 注意,投资组合的复合增长率为 15.29%,高于任何单只股票所能达到的最大增长率。(作为练习,你可以验证三只股票中单期回报最高的 OIH 的复合增长率为 12.78%。)
Using Python 使用 Python
Here is the equivalent Python Jupyter Notebook code, downloadable as example6_3.ipynb. 以下是等效的 Python Jupyter Notebook 代码,可下载为 example6_3.ipynb。
Calculating the Optimal Allocation Using Kelly formula
import numpy as np
import pandas as pd
from numpy.linalg import inv
df1=pd.read_excel('OIH.xls')
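As a rough Python cross-check of the remaining computation, the sketch below starts from the rounded $M$ and $C$ printed by the MATLAB program above instead of re-reading the spreadsheets. It is not the author's notebook code, and because the inputs are rounded to four decimals the results agree with the MATLAB output only approximately.

```python
import numpy as np

r = 0.04  # annualized risk-free rate assumed in the example

# Rounded annualized mean excess returns (OIH, RKH, RTH) from the MATLAB output.
M = np.array([0.1396, 0.0294, -0.0073])

# Rounded annualized covariance matrix from the MATLAB output.
C = np.array([[0.1109, 0.0200, 0.0183],
              [0.0200, 0.0372, 0.0269],
              [0.0183, 0.0269, 0.0420]])

F = np.linalg.solve(C, M)              # Kelly optimal leverages, F = C^{-1} M
g = r + F @ C @ F / 2                  # maximum compounded growth rate
S = np.sqrt(F @ C @ F)                 # Sharpe ratio of the portfolio

# Maximum growth rate of each ETF traded alone at its own Kelly leverage.
g_single = r + M ** 2 / (2 * np.diag(C))

print("F        =", np.round(F, 4))
print("g        =", round(float(g), 4))
print("S        =", round(float(S), 4))
print("g_single =", np.round(g_single, 4))
```

The first entry of `g_single` corresponds to the OIH exercise mentioned earlier, and the portfolio growth rate exceeds every single-ETF growth rate.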
Here is the R code, downloadable as example6_3.R. 以下是 R 代码,可下载为 example6_3.R。
library('zoo')
source('calculateReturns.R')
source('calculateMaxDD.R')
source('backshift.R')
data1 <- read.delim("OIH.txt") # Tab-delimited
# sort in ascending order of dates (1st column of data)
data_sort1 <- data1[order(as.Date(data1[,1], '%m/%d/%Y')),]
tday1 <- as.integer(format(as.Date(data_sort1[,1],
'%m/%d/%Y'), '%Y%m%d'))
adjcls1 <- data_sort1[,ncol(data_sort1)]
data2 <- read.delim("RKH.txt") # Tab-delimited
# sort in ascending order of dates (1st column of data)
data_sort2 <- data2[order(as.Date(data2[,1], '%m/%d/%Y')),]
tday2 <- as.integer(format(as.Date(data_sort2[,1],
'%m/%d/%Y'), '%Y%m%d'))
adjcls2 <- data_sort2[,ncol(data_sort2)]
data3 <- read.delim("RTH.txt") # Tab-delimited
# sort in ascending order of dates (1st column of data)
data_sort3 <- data3[order(as.Date(data3[,1], '%m/%d/%Y')),]
tday3 <- as.integer(format(as.Date(data_sort3[,1],
'%m/%d/%Y'), '%Y%m%d'))
adjcls3 <- data_sort3[,ncol(data_sort3)]
# merge these data
tday <- union(tday1, tday2)
tday <- union(tday, tday3)
tday <- tday[order(tday)]
adjcls <- matrix(NaN, length(tday), 3)
adjcls[tday %in% tday1, 1] <- adjcls1
adjcls[tday %in% tday2, 2] <- adjcls2
adjcls[tday %in% tday3, 3] <- adjcls3
ret <- calculateReturns(adjcls, 1) # daily returns
# excess returns: assume annualized risk free rate is 4%
excessRet <- ret - 0.04/252
# annualized mean excess returns
M <- 252*colMeans(excessRet, na.rm = TRUE)
# c(0.143479780597111, 0.0560305170502084, -0.0073464585163155)
# annualized covariance matrix
C<- 252*cov(excessRet, use = "pairwise.complete.obs")
# 0.11305722 0.01986547 0.01825486
# 0.01986547 0.04263365 0.02689284
# 0.01825486 0.02689284 0.04196684
# Kelly optimal leverage
F <- solve(C) %*% M
# 1.240554
# 1.992328
# -1.991381
# Maximum annualized compounded growth rate
g <- 0.04+t(F) %*% C %*% F/2 # 0.1921276
# Sharpe ratio of portfolio
S <- sqrt(t(F) %*% C %*% F) # 0.5515933
Note that following the Kelly formula requires you to continuously adjust your capital allocation as your equity changes so that it remains optimal. Based on the SPY example (Example 6.2), let’s say you followed the Kelly formula and bought a portfolio worth \$252,800. The next day, disaster struck, and you lost 10 percent on SPY. So now your portfolio is worth only \$227,520, and your equity is now only \$74,720. What should you do now? Kelly’s criterion will dictate that you immediately reduce the size of your portfolio to \$188,892. Why? Because the optimal leverage of 2.528 times the current equity of \$74,720 is \$188,892.
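The rebalancing arithmetic in this paragraph can be sketched as follows. The function name `kelly_rebalance` is hypothetical, and the sketch ignores financing costs and transaction costs for the single period.

```python
def kelly_rebalance(equity, position_value, period_return, kelly_leverage):
    """Return (new_equity, new_position, target_position, trade) after one period."""
    pnl = position_value * period_return         # profit or loss on the levered position
    new_equity = equity + pnl                    # losses come entirely out of equity
    new_position = position_value + pnl          # market value of the holding after the move
    target_position = kelly_leverage * new_equity
    trade = target_position - new_position       # negative means sell
    return new_equity, new_position, target_position, trade

new_equity, new_position, target, trade = kelly_rebalance(
    equity=100_000, position_value=252_800, period_return=-0.10, kelly_leverage=2.528
)
print(f"equity after loss : {new_equity:,.0f}")
print(f"position after loss: {new_position:,.0f}")
print(f"Kelly target       : {target:,.0f}")
print(f"required trade     : {trade:,.0f}")
```

The negative trade value is the amount of SPY that must be sold at a loss to restore the Kelly leverage.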
As a practical procedure, this continuous updating of the capital allocation should occur at least once at the end of each trading day. In addition to updating the capital allocation, one should also periodically update $F^{*}$ itself by recalculating the most recent trailing mean return and standard deviation. What should the lookback period be, and how often do you need to update these inputs to the Kelly formula? These depend on the average holding period of your strategy. If you hold your positions for only one day or so, then as a rule of thumb, I would advise using a lookback period of six months. Using a relatively short lookback period has the advantage of allowing you to gradually reduce your exposure to strategies that have been losing their performance. As for the frequency of update, it should not be a burden to update $F^{*}$ daily once you have written a program to do so.
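A minimal sketch of such a daily update for a single strategy, assuming daily excess returns are already available as an array and treating six months as roughly 126 trading days (that day count is an assumption, not a figure from the text). Because the Kelly leverage is independent of time scale, daily means and variances can be used directly.

```python
import numpy as np

def trailing_kelly(excess_returns, lookback=126):
    """Kelly leverage m / s^2 estimated from the most recent `lookback` excess returns."""
    x = np.asarray(excess_returns, dtype=float)
    if x.size < lookback:
        raise ValueError("not enough history for the lookback window")
    window = x[-lookback:]
    return window.mean() / window.var(ddof=1)   # sample mean over sample variance

# Synthetic daily excess returns, for illustration only.
rng = np.random.default_rng(0)
daily = rng.normal(0.0005, 0.01, size=500)

f_today = trailing_kelly(daily)
print(f"trailing Kelly leverage: {f_today:.2f}")
```

Rerunning the function each day on the extended return series gives the updated leverage; for several strategies the same idea applies with a trailing covariance matrix in place of the variance.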
One last point: Some strategies generate a variable number of trading signals each day, which may result in a variable number of positions and thus total capital each day. How should the Kelly formula be used to determine the capital in this case when we don’t know what it will be beforehand? One can still use the Kelly formula to determine the maximum number of positions and thus the maximum capital allowed. It is always safer to have a leverage below what the Kelly formula recommends. 最后一点:有些策略每天会产生不定数量的交易信号,这可能导致每天持仓数量和总资金量的变化。在事先不知道具体数值的情况下,如何使用凯利公式来确定资金规模?仍然可以使用凯利公式来确定最大持仓数量,从而确定允许的最大资金规模。保持杠杆低于凯利公式推荐的水平总是更安全的。
RISK MANAGEMENT 风险管理
We saw in the previous section that the Kelly formula is not only useful for the optimal allocation of capital and for the determination of the optimal leverage, but also for risk management. In fact, the SPY example (Example 6.2) illustrated that the Kelly formula would advise you to reduce the portfolio size in the face of trading losses. This selling at a loss is the frequent result of risk management, whether or not the risk management scheme is based on the Kelly formula. 我们在上一节中看到,凯利公式不仅对资本的最优分配和最优杠杆的确定有用,还对风险管理有帮助。事实上,SPY 的例子(例 6.2)说明了凯利公式会建议你在交易亏损时减少投资组合规模。无论风险管理方案是否基于凯利公式,亏损时卖出通常是风险管理的常见结果。
Risk management always dictates that you should reduce your position size whenever there is a loss, even when it means realizing those losses. (The other face of the coin is that optimal leverage dictates that you should increase your position size when your strategy generates profits.) This kind of selling is believed by some analysts to be the cause of “financial contagion” affecting many large hedge funds simultaneously when one faces a large loss. 风险管理总是要求你在出现亏损时减少持仓规模,即使这意味着要实现这些亏损。(另一方面,最优杠杆要求你在策略产生利润时增加持仓规模。)一些分析师认为,这种卖出行为是导致“金融传染”的原因之一,当一个大型对冲基金遭遇重大亏损时,许多大型对冲基金会同时受到影响。
An example of this is the summer 2007 meltdown, described in the previously cited article “What Happened to the Quants in August 2007?” by Amir Khandani and Andrew Lo (Khandani and Lo, 2007). During August 2007, under the ominous cloud of a housing and mortgage default crisis, a number of well-known hedge funds experienced unprecedented losses, with Goldman Sachs’s Global Alpha fund falling 22.5 percent. Several billion dollars evaporated within one week. Even Renaissance Technologies Corporation, arguably the most successful quantitative hedge fund of all time, lost 8.7 percent in the first half of August, though it later recovered most of it. Not only was the magnitude of the loss astounding, but its widespread nature also caused great concern in the financial community. Strangest of all, few of these funds held any mortgage-backed securities at all, ostensibly the root cause of the panic. It therefore became a classic study of financial contagion as propagated by hedge funds.
Another example is the January 2021 GameStop short squeeze (Wang, 2021). As this stock was surging due to traders’ promotion on Reddit’s r/wallstreetbets forum and the subsequent coordinated buying of the stock and especially its call options, large hedge funds suffered enormous losses as they bought to cover their short positions. Renaissance Institutional Equities Fund fell 9.5 percent, while Melvin Capital lost a whopping 53 percent.
This kind of contagion occurs because a large loss by one hedge fund causes it to sell off some large positions that it holds (whether or not these are the positions that caused the loss in the first place). This selling causes the prices of the securities to drop (or rise in the case of short positions). If other hedge funds are holding similar positions, they will then suffer large losses also, causing their own risk management systems to sell off their own positions, and on and on. For example, in the summer of 2007, one large hedge fund might have been holding subprime mortgage-backed securities and suffered a large loss in that sector. Risk management then required that it sell off liquid stock positions in its portfolio that might, up to that point, have been unaffected by the subprime debacle. Because of the selling of such stock positions, other statistical arbitrage hedge funds that held no mortgage-backed securities might now have suffered big losses, and have proceeded to sell their stocks as well. Hence, a sell-off in the mortgage-backed securities market suddenly became a sell-off in the stock market: a nice demonstration of the meaning of contagion. Similarly, in January 2021, the losses in short GameStop positions in some long-short funds caused them to buy to cover other unrelated short positions while simultaneously selling unrelated long positions due to overall portfolio deleveraging, leading to losses in other funds that held similar short and long positions. Those funds were in turn forced to buy and sell the same stocks due to their own deleveraging, leading to a contagion.
Given the necessity of realizing losses as well as the scale and frequency of trading required to constantly rebalance the portfolio in order to closely follow the Kelly formula, it is understandable that most traders prefer to trade at half-Kelly leverage. A lower leverage implies a smaller size of the selling required for risk management. 鉴于实现亏损的必要性以及为了紧密遵循凯利公式而不断调整投资组合所需的交易规模和频率,大多数交易者更倾向于以半凯利杠杆进行交易是可以理解的。较低的杠杆意味着为风险管理所需的卖出规模较小。
Sometimes, even taking the conservative half-Kelly formula may be too aggressive, and traders may want to limit their portfolio size further by additional constraints. This is because, as I pointed out previously, the application of the Kelly formula to continuous finance is premised on the assumption that return distribution is Gaussian. (Finance is continuous in the sense that the outcomes of making bets in the financial market fall on a continuum of profits or losses, as opposed to a game of cards where the outcomes fall into discrete cases.) But, of course, the returns are not really Gaussian: large losses occur at far higher frequencies than would be predicted by a nice bell-shaped curve. Some people refer to the true distributions of returns as having fat tails. What this means is that the probability of an event far, far away from the mean is much higher than allowed by the Gaussian bell curve. These highly improbable events have been called black swan events by the author Nassim Taleb (see Taleb, 2007). 有时,即使采用保守的半凯利公式也可能过于激进,交易者可能希望通过额外的限制进一步限制其投资组合规模。这是因为,正如我之前指出的,凯利公式在连续金融中的应用是基于收益分布为高斯分布的假设。(金融是连续的,意思是金融市场下注的结果落在利润或亏损的连续区间上,而不是像纸牌游戏那样结果分为离散的几种情况。)但当然,收益实际上并非高斯分布:大额亏损发生的频率远高于漂亮的钟形曲线所预测的。有些人称真实的收益分布为厚尾分布。这意味着远离均值的事件发生的概率远高于高斯钟形曲线所允许的概率。这些极不可能发生的事件被作者纳西姆·塔勒布称为黑天鹅事件(参见 Taleb,2007)。
To handle extreme events that fall outside the Gaussian distribution, we can use our simple backtest technique to roughly estimate what the maximum one-period loss was historically. (The period may be one week, one day, or one hour. The only criterion to use is that you should be ready to rebalance your portfolio according to the Kelly formula at the end of every period.) You should also have in mind the maximum one-period drawdown on your equity that you are willing to suffer. Dividing the maximum tolerable one-period drawdown on equity by the maximum historical loss gives the maximum leverage you can apply, and it will tell you whether even half-Kelly leverage is too large for your comfort.
The leverage to use is always the smaller of the half-Kelly leverage and the maximum leverage obtained using the worst historical loss. In the S&P 500 index example in the previous section, the maximum historical one-day loss is about 20.47 percent, which occurred on October 19, 1987 (“Black Monday”). If you can tolerate only a 20 percent one-day drawdown on equity, then the maximum leverage you can apply is about 1. Meanwhile, the leverage recommended by half-Kelly is 1.26. Hence, in this case, even half-Kelly leverage would not be conservative enough to survive Black Monday.
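The rule in this paragraph amounts to taking a minimum of two numbers. A small sketch, with the hypothetical function name `conservative_leverage` and the inputs taken from the SPY example:

```python
def conservative_leverage(kelly_leverage, max_tolerable_drawdown, worst_historical_loss):
    """Smaller of half-Kelly leverage and the leverage that survives the worst historical loss."""
    half_kelly = kelly_leverage / 2
    drawdown_limited = max_tolerable_drawdown / worst_historical_loss
    return min(half_kelly, drawdown_limited)

leverage = conservative_leverage(
    kelly_leverage=2.528,          # full Kelly leverage from Example 6.2
    max_tolerable_drawdown=0.20,   # largest one-day equity drawdown you accept
    worst_historical_loss=0.2047,  # Black Monday one-day loss
)
print(f"leverage to use: {leverage:.3f}")
```

Here the drawdown constraint binds, so the result falls just below 1 rather than at the half-Kelly value of about 1.26.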
The truly scary scenario in risk management is the one that has not occurred in history before. Echoing the philosopher Ludwig Wittgenstein, “Whereof one cannot speak, thereof one must be silent”: on such unknowables, theoretical models are appropriately silent.
IS THE USE OF STOP LOSS A GOOD RISK MANAGEMENT PRACTICE?
Some traders believe that good risk management means imposing stop loss on every trade; that is, if a position incurs a certain percent loss, the trader will exit the position. It is a common fallacy to believe that imposing stop loss will prevent the portfolio from suffering catastrophic losses. When a catastrophic event occurs, securities prices will drop discontinuously, so the stop-loss orders to exit the positions will only be filled at prices much worse than those before the event. So, by exiting the positions, we are actually realizing the catastrophic loss and not avoiding it. For stop loss to be beneficial, we must believe that we are in a momentum, or trending, regime. In other words, we must believe that the prices will get worse within the expected lifetime of our trade. Otherwise, if the market is mean reverting within that lifetime, we will eventually recoup our losses if we didn’t exit the position too quickly.
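The point about discontinuous prices can be illustrated with a deliberately simplified fill model for a long position: the stop order is filled at the stop price if the market trades down to it continuously, but at the next traded price if the market gaps below it. The function name and the prices are hypothetical, and real fills also depend on liquidity and order type.

```python
def stop_loss_exit_price(stop_price, next_traded_price):
    """Simplified sell-stop fill for a long position: never better than the stop price."""
    return min(stop_price, next_traded_price)

entry = 100.0
stop = entry * 0.95   # stop loss set 5 percent below entry

# Orderly decline: the market trades at the stop level, loss is held to 5 percent.
loss_orderly = stop_loss_exit_price(stop, 95.0) / entry - 1

# Catastrophic gap: the next trade is at 80, so the stop is filled there.
loss_gap = stop_loss_exit_price(stop, 80.0) / entry - 1

print(f"loss in orderly market: {loss_orderly:.1%}")
print(f"loss after price gap  : {loss_gap:.1%}")
```

In the gap case the stop order does not cap the loss at 5 percent; it realizes the full 20 percent drop.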
Of course, it is not easy to tell whether one is in a momentum regime (when stop loss is beneficial) or in a mean-reverting regime (when stop loss is harmful). My own observation is that when the movement of prices is due to news or other fundamental reasons (such as a company’s deteriorating revenue), one is likely to be in a momentum regime, and one should not “stand in front of a freight train,” in traders’ vernacular. For example, if a fundamental analysis of a company reveals that it is currently overvalued, its stock price will likely gradually decrease (at least in relation to the market index) in order to reach a new, lower equilibrium price. This movement to the lower equilibrium price is irreversible as long as the fundamental economics of the company does not change. However, when securities prices move drastically without any 当然,很难判断当前是处于趋势行情(此时止损有利)还是均值回归行情(此时止损有害)。我个人的观察是,当价格的变动是由于新闻或其他基本面原因(例如公司收入恶化)引起时,往往处于趋势行情,正如交易员的行话所说,不应“站在货运列车前面”。例如,如果对一家公司进行基本面分析发现其当前被高估,其股价很可能会逐渐下跌(至少相对于市场指数而言),以达到一个新的、更低的均衡价格。只要公司的基本经济状况没有改变,这种向更低均衡价的运动就是不可逆的。然而,当证券价格剧烈波动却没有任何
apparent news or reasons, it is likely that the move is the result of a liquidity event: for example, major holders of the securities suddenly need to liquidate large positions for their own idiosyncratic reasons, or major speculators suddenly decide to cover their short positions. These liquidity events are of relatively short duration, and mean reversion to the previous price levels is likely.
I will discuss in more detail the appropriate exit strategies for mean-reverting versus momentum strategies in Chapter 7.
Beyond position risk (which is comprised of both market risk and specific risk), there are other forms of risks to consider: model risk, software risk, and natural disaster risk, in decreasing order of likelihood. 除了头寸风险(包括市场风险和特定风险)之外,还需考虑其他形式的风险:模型风险、软件风险和自然灾害风险,按可能性递减排序。
Model Risk 模型风险
Model risk simply refers to the possibility that trading losses are not due to the statistical vagaries of the market but to the fact that the trading model is wrong. It could be wrong for a large number of reasons, some of which were detailed in Chapter 3: data-snooping bias, survivorship bias, and so on. To eliminate all these different biases and errors in the backtest programs, it is extremely helpful to have a collaborator or consultant duplicate your backtest results independently to ensure their validity. Such independent duplication of results is routine in scientific research and is no less essential in financial research.
Model risk can also come not from any bias or error in your model or backtesting procedure, but from increased competition from other institutional traders all running the same strategy as you; or it could be a result of some fundamental change in market structure that eliminated the edge of your trading model. This is the regime shift that I talked about in Chapter 3. 模型风险也可能不是来自模型或回测程序中的任何偏差或错误,而是来自其他机构交易者竞争加剧,他们都在运行与你相同的策略;或者可能是市场结构发生了某种根本性变化,导致你的交易模型失去了优势。这就是我在第三章中提到的制度转变。
There is not much you can do to alleviate these sources of model risk, except to gradually lower the leverage of the model as it racks up losses, up to the point where the leverage is zero. This can be accomplished in a systematic way if you constantly update the leverage according to the Kelly formula based on the trailing mean return and standard deviation. (As the mean return decreases to zero in the 对于这些模型风险的来源,你几乎无能为力,唯一能做的就是随着模型亏损的累积,逐步降低模型的杠杆,直到杠杆降为零。如果你根据滚动的平均收益率和标准差,持续使用凯利公式来更新杠杆,这一过程可以系统化地实现。(当平均收益率降至零时,
lookback period, your Kelly leverage will be driven to zero.) This is preferable to abruptly shutting down a model because of a large drawdown (see my discussion of the psychological pressure to shut down models prematurely in the following section on psychological preparedness). 回溯期越长,你的凯利杠杆将被压缩到零。)这比因为大幅回撤而突然关闭模型更可取(关于因心理压力而过早关闭模型的讨论,请参见下一节关于心理准备的内容)。
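As a rough sketch of this systematic de-leveraging, the function below recomputes a fractional Kelly leverage from the trailing window of per-period excess returns. The function name, the `fraction` and `max_leverage` parameters, and the choice to floor the leverage at zero when the trailing mean is not positive are all assumptions made for illustration, not a prescription from the text.

```python
import statistics


def trailing_kelly_leverage(excess_returns, lookback, fraction=0.5, max_leverage=None):
    """Fractional Kelly leverage f = fraction * m / s^2, where m and s^2 are the
    mean and sample variance of the last `lookback` excess returns.
    Returns 0.0 when the trailing mean is zero or negative (assumed floor)."""
    window = excess_returns[-lookback:]
    if len(window) < 2:
        return 0.0  # not enough data to estimate a variance
    m = statistics.fmean(window)
    variance = statistics.variance(window)
    if m <= 0 or variance == 0:
        return 0.0
    f = fraction * m / variance
    if max_leverage is not None:
        f = min(f, max_leverage)  # optional cap, e.g. from the worst historical loss
    return f


# Hypothetical daily excess returns: a model that is working, then one that is not
working = [0.0105, -0.0095] * 30   # trailing mean of about +0.05% per day
failing = [0.0095, -0.0105] * 30   # trailing mean of about -0.05% per day

print(trailing_kelly_leverage(working, lookback=60))  # a positive leverage
print(trailing_kelly_leverage(failing, lookback=60))  # 0.0
```

Calling this at each rebalance with the latest return history lowers the leverage gradually as the trailing mean decays, rather than switching the model off in one step.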
Software Risk 软件风险
Software risk refers to the case where the automated trading system that generates trades every day actually does not faithfully reflect your backtest model. This happens because of the omnipresent software bugs. I discussed the way to eliminate such software errors in Chapter 5: you should compare the trades generated by your automated trading system with the theoretical trades generated by your backtest system to ensure that they are the same. 软件风险是指每天生成交易的自动化交易系统实际上并未忠实反映你的回测模型的情况。这种情况的发生是由于无处不在的软件漏洞。我在第 5 章中讨论了消除此类软件错误的方法:你应该将自动化交易系统生成的交易与回测系统生成的理论交易进行比较,以确保它们是一致的。
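One way to picture this comparison is sketched below. It assumes, purely for illustration, that each system's trades can be reduced to a mapping from (date, symbol) to signed share quantity; the function name and the sample data are hypothetical.

```python
def diff_trades(live_trades, backtest_trades, tolerance=0):
    """Return {(date, symbol): (live_qty, backtest_qty)} for every trade whose
    quantities differ by more than `tolerance`. A trade missing from one side
    is treated as quantity 0 on that side."""
    mismatches = {}
    for key in sorted(set(live_trades) | set(backtest_trades)):
        live_qty = live_trades.get(key, 0)
        backtest_qty = backtest_trades.get(key, 0)
        if abs(live_qty - backtest_qty) > tolerance:
            mismatches[key] = (live_qty, backtest_qty)
    return mismatches


# Hypothetical trades: signed quantities, positive = buy, negative = sell
live = {("2024-01-02", "XLE"): 100, ("2024-01-02", "SPY"): -50}
backtest = {("2024-01-02", "XLE"): 100, ("2024-01-02", "SPY"): -60,
            ("2024-01-03", "XLE"): -100}

for key, (live_qty, backtest_qty) in diff_trades(live, backtest).items():
    print(key, "live:", live_qty, "backtest:", backtest_qty)
```

An empty result means the two systems agree on every trade; anything else points to a place where the live code and the backtest code diverge.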
Natural Disaster Risk 自然灾害风险
Finally, physical or natural disasters can happen, which can cause big losses, and they don’t have to be anything dramatic like earthquakes or tsunami. What if your internet connection went down before you could enter a hedging position? What if your power went down in the middle of transmitting a trade? The different methods of preventing physical disasters from causing major disruptions to your trading can be found in the section on physical infrastructure in Chapter 5. 最后,物理或自然灾害可能发生,造成重大损失,而且不一定非得是地震或海啸这样戏剧性的事件。如果在你建立对冲头寸之前,网络连接中断了怎么办?如果在传输交易的过程中断电了怎么办?防止物理灾害对交易造成重大干扰的不同方法,可以在第 5 章关于物理基础设施的部分找到。
PSYCHOLOGICAL PREPAREDNESS 心理准备
It may seem strange that a book on quantitative trading would include a section on psychological preparedness. After all, isn’t quantitative trading supposed to liberate us from our emotions and let the computer make all the trading decisions in a disciplined manner? If only it were this easy: human traders who are not psychologically prepared will often override their automated trading 一本关于量化交易的书中包含心理准备的章节,可能看起来有些奇怪。毕竟,量化交易不就是要让我们摆脱情绪,让计算机以纪律化的方式做出所有交易决策吗?如果事情真这么简单就好了:没有做好心理准备的人类交易者,往往会覆盖他们的自动交易系统。
systems’ decisions, especially when there is a position or day with abnormal profit or loss. Hence, it is critical even if we trade using quantitative strategies to understand some of our own psychological weaknesses. 系统的决策,尤其是在出现异常盈利或亏损的仓位或交易日时。因此,即使我们使用量化策略进行交易,理解自身的一些心理弱点也是至关重要的。
Fortunately, there is a field of financial research called behavioral finance (Thaler, 1994) that studies irrational financial decision making. I will try to highlight a few of the common irrational behaviors that affect trading.
The first behavioral bias is known variously as the endowment effect, status quo bias, or loss aversion. The first two effects cause some traders to hold on to a losing position for too long, because traders (and people in general) give too much preference to the status quo (the status quo bias), or because they demand much more to give up the stock than what they would pay to acquire it (the endowment effect). As I argued in the risk management section, there are rational reasons to hold on to a losing position (e.g., when you expect mean-reverting behavior); however, these behavioral biases cause traders to hold on to losing positions even when there is no rational reason (e.g., when you expect trending behavior, and the trend is such that your positions will lose even more). At the same time, the loss aversion bias causes some traders to exit their profitable positions too soon, even if holding longer will lead to a larger profit on average. Why do they exit the profitable positions so soon? Because the pain from possibly losing some of the current profits outweighs the pleasure from gaining higher profits. 第一个行为偏差被称为禀赋效应、现状偏见或损失厌恶。前两种效应导致一些交易者过久地持有亏损头寸,因为交易者(以及一般人)过于偏好现状(现状偏见),或者因为他们要求放弃股票的代价远高于购买时愿意支付的价格(禀赋效应)。正如我在风险管理部分所论述的,持有亏损头寸有其理性理由(例如,当你预期均值回归行为时);然而,这些行为偏差使交易者即使在没有理性理由的情况下也会持有亏损头寸(例如,当你预期趋势行为,且趋势会导致你的头寸亏损加剧时)。与此同时,损失厌恶偏差使一些交易者过早退出盈利头寸,即使持有更久平均会带来更大利润。他们为何如此早退出盈利头寸?因为可能失去部分当前利润的痛苦,超过了获得更高利润的愉悦。
This behavioral bias manifests itself most clearly and most disastrously when one has entered a position by mistake (because of either a software bug, an operational error, or a data problem) and has incurred a big loss. The rational step to take is to exit the position immediately upon discovery of the error. However, traders are often tempted to wait for mean reversion such that the loss is smaller before they exit. Unless you have a model for mean reversion that suggests now is a good time to enter into this position, this wait for mean reversion may very well lead to bigger losses instead. 这种行为偏差最明显且最灾难性地表现是在错误建立头寸时(无论是由于软件漏洞、操作失误还是数据问题)并且已经产生了巨大损失。理性的做法是在发现错误后立即退出头寸。然而,交易者常常会被诱惑去等待均值回归,以期在退出时损失更小。除非你有一个均值回归模型表明现在是进入该头寸的好时机,否则这种等待均值回归的行为很可能会导致更大的损失。
While loss aversion leads to suboptimal trading in such a situation, there are other times when loss aversion is wise and is not a behavioral bias. Most economic arguments against loss aversion assume that we have a large number of “gamblers” (read: traders) playing a risky game, and as long as the average return is positive, the economist suggests that this risky game is worthwhile, even if some of these gamblers may be ruined. However, when you are the trader, it is rational to avoid ruin at all cost, no matter what return the “average” trader enjoys. This contrast between the ensemble average (across different traders) and the time series average (over a long time horizon for a single trader) is a profound mathematical observation by the physicist Ole Peters and Nobel laureate Murray Gell-Mann. See Box 6.1 for more details.
BOX 6.1 LOSS AVERSION IS NOT A BEHAVIORAL BIAS* 6.1 框 亏损厌恶不是行为偏差*
In his famous book Thinking, Fast and Slow, the Nobel laureate Daniel Kahneman (2011) described one common example of a behavioral finance bias: 在他著名的著作《思考,快与慢》中,诺贝尔奖得主丹尼尔·卡尼曼(2011)描述了一个行为金融偏差的常见例子:
“You are offered a gamble on the toss of a [fair] coin. “你被提供一个基于[公平]抛硬币的赌博机会。
If the coin shows tails, you lose $100.
If the coin shows heads, you win $110.
Is this gamble attractive? Would you accept it?” 这个赌博有吸引力吗?你会接受吗?
(I have modified the numbers to be more realistic in a financial market setting, but otherwise it is a direct quote.) (我修改了数字,使其在金融市场环境中更为现实,但除此之外这是直接引用。)
Experiments show that most people would not accept this gamble, even though the expected gain is $5. This is the so-called “loss aversion” behavioral bias, and is considered irrational. Kahneman went on to write that “professional risk takers” (read “traders”) are more willing to act rationally and accept this gamble. 实验表明,大多数人不会接受这种赌博,尽管期望收益是 5 美元。这就是所谓的“损失厌恶”行为偏差,被认为是不理性的。卡尼曼进一步写道,“专业风险承担者”(即“交易员”)更愿意理性行事,接受这种赌博。
It turns out that the loss-averse “layman” is the one acting rationally here. 事实证明,损失厌恶的“外行人”才是在这里表现得理性的人。
It is true that if we have infinite capital, and can play infinitely many rounds of this game simultaneously, we should expect a $5 gain per round. But trading isn’t like that. We are dealt one coin at a time, and if we suffer a string of losses, our capital will be depleted and we will be in debtors’ prison if we keep playing. The proper way to evaluate whether this game is attractive is to evaluate the expected compound rate of growth of our capital.
Let’s say we are starting with a capital of $1,000. The expected return of playing this game once is initially 0.005. The standard deviation of the return is 0.105. To simplify matters, let’s say we are allowed to adjust the payoff of each round so we have the same expected return and standard deviation of return each round. For example, if at some point we earned so much that we doubled our capital to $2,000, we are allowed to win $220 or lose $200 per round. What is the expected growth rate of our capital? As Example 6.1 shows, in the continuous approximation it is -0.0005125 per round: we are losing, not gaining! The layman is right to refuse this gamble.
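The arithmetic behind these numbers can be checked with a short Python sketch. It assumes the continuous approximation is the growth rate g = m - s^2/2 (mean return minus half the variance, with unit leverage and no risk-free return), and it also computes the exact expected log growth per round as a cross-check; the variable names are illustrative only.

```python
import math

capital = 1000.0
win, loss = 110.0, 100.0

r_up = win / capital       # +0.11 return on heads
r_down = -loss / capital   # -0.10 return on tails

# Mean and standard deviation of the one-round return for a fair coin
m = 0.5 * r_up + 0.5 * r_down
s = math.sqrt(0.5 * (r_up - m) ** 2 + 0.5 * (r_down - m) ** 2)

# Continuous (Gaussian) approximation of the compound growth rate per round
g_approx = m - s ** 2 / 2

# Exact expected log growth per round, since payoffs scale with capital
g_exact = 0.5 * math.log(1 + r_up) + 0.5 * math.log(1 + r_down)

print(round(m, 6))         # 0.005
print(round(s, 6))         # 0.105
print(round(g_approx, 7))  # -0.0005125
print(g_exact < 0)         # True
```

Both the approximate and the exact growth rates come out negative (the exact one is roughly -0.0005 per round), so the conclusion does not hinge on the approximation.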
Loss aversion, in the context of a risky game played repeatedly, is rational, and not a behavioral bias. Our primitive, primate instinct grasped a truth that behavioral economists cannot. It only seems like a behavioral bias if we take an “ensemble view” (i.e., we are allowed infinite capital to play many rounds of this game simultaneously), instead of a “time series view” (i.e., we are allowed only finite capital to play many rounds of this game in sequence, provided we don’t go broke at some point). The time series view is the one relevant to all traders. In other words, take the time average, not the ensemble average, when evaluating real-world risks.
The important difference between ensemble average and time average has been raised in a paper by physicist Ole Peters and Nobel laureate Murray Gell-Mann (Peters, et al., 2016). It deserves to be much more widely read in the behavioral economics community. But beyond academic interest, there is a practical importance in emphasizing that loss aversion is rational. As traders, we should not only focus on average returns: risks can depress compound returns severely. 物理学家 Ole Peters 和诺贝尔奖得主 Murray Gell-Mann 在一篇论文中提出了集合平均与时间平均之间的重要区别(Peters 等,2016)。这在行为经济学界值得被更广泛地关注。但除了学术兴趣之外,强调损失厌恶是理性的这一点具有实际意义。作为交易者,我们不应只关注平均收益:风险可能会严重压低复合收益。
Another common bias that I have personally experienced is the representativeness bias: people tend to put too much weight on recent experience and to underweight the long-term average (Ritter, 2003). (This reference has a good introduction to various biases studied by behavioral finance.) After a big loss, traders, even quantitative traders, tend to immediately modify certain parameters of their strategies so that they would have avoided the big loss had they traded the modified system. But, of course, this is unwise because the modification may invite some other big loss that is yet to happen, or it may eliminate many profit opportunities that existed. We must remember that we are operating in a probabilistic regime: No system can avoid all the market vagaries that can result in losses.
If you feel that your system really is deficient and want to tweak it, you should always backtest the modified version to make sure 如果你觉得你的系统确实存在缺陷并想进行调整,你应该始终对修改后的版本进行回测,以确保
that it does outperform the old system over a sufficiently long backtest period, not just over the last few weeks. 它在足够长的回测期内确实优于旧系统,而不仅仅是在最近几周表现更好。
There are two major psychological weaknesses that are better known to traders than to economists: despair and greed.
Despair occurs when a trading model is in a major, prolonged drawdown. Many traders (and their managers, investors, etc.) will be under great pressure under this circumstance to shut down the model completely. Other overly self-confident traders with a reckless bent will do the opposite: They will double their bets on their losing models, hoping to recoup their losses eventually, if and when the models rebound. Neither behavior is rational: if you have been managing your capital allocation and leverage by the Kelly formula, you would lower the capital allocation for the losing model gradually. 当交易模型处于重大且长期的回撤期时,绝望便会产生。许多交易者(以及他们的管理者、投资者等)在这种情况下会面临巨大的压力,倾向于完全关闭该模型。另一些过于自信且鲁莽的交易者则会采取相反的做法:他们会加倍下注于亏损的模型,希望在模型反弹时最终收回损失。这两种行为都不理性:如果你一直按照凯利公式管理你的资金分配和杠杆,你会逐渐降低对亏损模型的资金分配。
Greed is the more usual emotion when the model is having a good run and is generating a lot of profits. The temptation now is to increase its leverage quickly in order to get rich quickly. Once again, a well-disciplined quantitative trader will keep the leverage below the dictates of the Kelly formula as well as the caution imposed by the possibility of fat-tail events. 当模型表现良好并产生大量利润时,贪婪是更常见的情绪。此时的诱惑是迅速增加杠杆,以期快速致富。再次强调,纪律严明的量化交易者会将杠杆控制在凯利公式的要求之下,同时考虑到可能出现的厚尾事件所带来的风险,保持谨慎。
Both despair and greed can lead to overleveraging (i.e., trading an overly large portfolio): In despair, one tries to recoup the losses by adding fresh capital; in greed, one adds capital too quickly after initial successes with a strategy. Therefore, the one golden rule in risk management is to keep the size of your portfolio under control at all times. This is, however, easier said than done. Large, well-known funds have succumbed to the temptation to overleverage and failed: Long-Term Capital Management in 1998 (Lowenstein, 2000) and Amaranth Advisors in 2006 (Chan, 2006a). In the Amaranth Advisors case, the leverage employed on one single strategy (a natural gas calendar spread trade) by one single trader (Brian Hunter) was so large that a $6 billion loss was incurred, comfortably wiping out the fund’s equity. It is a textbook case of risk mismanagement.
I have experienced this pressure myself both in an institutional setting and in a personal setting, and the unfortunate result both times was to succumb prematurely. When I was with a money management firm, I lost over $1 million for the fund’s investors because, in a fit of greed, I added over $100 million to a portfolio based on a strategy that had been traded for barely six months. (That was before I learned of the Kelly criterion and other stress testing methodologies.) As if this were not lesson enough, I repeated the same mistake when I started trading independently. It concerned a mean-reverting spread strategy involving XLE, an energy exchange-traded fund (ETF), and the crude oil future (CL). When the spread refused to mean revert over time, I stubbornly increased the size of the spread to almost $500,000. Finally, despair set in, and I exited the spread with close to a six-figure loss. Naturally, the spread started to revert afterward when I wasn’t around to benefit. (Fortunately, several of my other strategies performed well in that first year of my independent trading, so the fiscal year ended with only a small overall loss.)
How should we train ourselves to overcome these psychological weaknesses and learn not to override the models manually and to remedy trading errors correctly and expeditiously? As with most human endeavors, the way to do this is to start with a small portfolio and gradually gain psychological preparedness, discipline, and confidence in your models. As you become emotionally more able to handle the daily swings in profit and loss (P&L) and rein in the primordial urges of the psyche, your portfolio’s actual performance will hew to the theoretically expected performance of your strategy. 我们应该如何训练自己克服这些心理弱点,学会不手动干预模型,并正确且迅速地纠正交易错误呢?和大多数人类活动一样,方法是从一个小型投资组合开始,逐步培养心理准备、纪律性以及对模型的信心。当你在情绪上能够更好地应对每日盈亏的波动,并抑制内心原始的冲动时,你的投资组合的实际表现将会贴近策略的理论预期表现。
I have certainly found that to be the case after getting over those aforementioned disastrous trades. My newfound discipline and faith in the Kelly formula has so far prevented similar disasters from happening again. 在经历了前面提到的那些灾难性交易之后,我确实发现情况就是如此。我新获得的纪律性和对凯利公式的信任,到目前为止已经防止了类似灾难的再次发生。
SUMMARY 总结
Risk management is a crucial discipline in trading. The trading world is littered with numerous examples of giant hedge funds and investment banks laid low by enormous losses due to a single trade or in a very short period of time. Most of these losses are due to overleveraging positions and not to an inherently erroneous model. Typically, traders will not overleverage a model that has not worked very well. It is a hitherto superbly performing model that is at the greatest risk of huge loss due to overconfidence and overleverage. This 风险管理是交易中的一门关键学科。交易界充斥着许多因单笔交易或极短时间内的巨大亏损而导致大型对冲基金和投资银行倒闭的例子。这些亏损大多源于仓位过度杠杆化,而非模型本身存在根本性错误。通常,交易者不会对表现不佳的模型进行过度杠杆操作。恰恰是那些迄今表现极佳的模型,因过度自信和过度杠杆而面临巨大亏损的最大风险。
chapter therefore provides an important tool for risk management: the determination of the optimal leverage using the Kelly formula. 因此,本章提供了风险管理的重要工具:利用凯利公式确定最优杠杆率。
Besides the determination of the optimal leverage, the Kelly formula has a very useful side benefit: It also determines the optimal allocation of capital among different strategies, based on the covariance of their returns. 除了确定最优杠杆率外,凯利公式还有一个非常有用的附加功能:它还能基于不同策略收益的协方差,确定资本在各策略间的最优分配。
But no risk management formula or system will prevent disasters if you are not psychologically prepared for the ups and downs of trading and therefore deviate from the prescriptions of rational decision making (i.e., your models). The ultimate risk management mind-set is very simple: Do not succumb to either despair or greed. To gain practice in this psychological discipline, one must proceed slowly with small position size, and thoroughly test various aspects of the trading business (model, software, operational procedure, money and risk management) before scaling up according to the Kelly formula.
I have found that in order to proceed slowly and cautiously, it is helpful to have other sources of income or other businesses to help sustain yourself either financially or emotionally (to avoid the boredom associated with slow progress). It is indeed possible that finding a diversion, whether income producing or not, may actually help improve the long-term growth of your wealth. 我发现,为了能够缓慢而谨慎地前进,拥有其他收入来源或其他业务来帮助自己在经济上或情感上维持生活是很有帮助的(以避免因进展缓慢而产生的无聊)。实际上,找到一种转移注意力的方式,无论是否产生收入,可能都会有助于改善你财富的长期增长。
APPENDIX: A SIMPLE DERIVATION OF THE KELLY FORMULA WHEN RETURN DISTRIBUTION IS GAUSSIAN 附录:当收益分布为高斯分布时凯利公式的简单推导
If we assume that the return distribution of a strategy (or security) is Gaussian, then the Kelly formula can be derived very easily. We start with the formula for a compounded, levered growth rate applicable to a Gaussian process: 如果我们假设一个策略(或证券)的收益分布是高斯分布,那么凯利公式可以非常容易地推导出来。我们从适用于高斯过程的复利杠杆增长率公式开始:
$$g(f) = r + f m - s^{2} f^{2} / 2$$
where $f$ is the leverage; $r$ is the risk-free rate; $m$ is the average simple, uncompounded one-period excess return; and $s$ is the standard deviation of those uncompounded returns. This formula for compounded growth rate can itself be derived quite simply, but not as simply as the Kelly formula, so I leave its derivation for the reader to look up in the Thorp article referenced earlier.
To find the optimal $f$, which maximizes $g$, simply take its first derivative with respect to $f$ and set the derivative to zero:
$$dg/df = m - s^{2} f = 0$$
Solving this equation for $f$ gives us $f = m / s^{2}$, the Kelly formula for one strategy or security under the Gaussian assumption.
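As a quick numerical sanity check of this result, the sketch below evaluates the growth rate on a grid of leverages and confirms that the maximum sits at m divided by s squared. The input values for r, m, and s are hypothetical annualized figures chosen only for illustration, and the function names are not from the text.

```python
def growth_rate(f, r, m, s):
    """Compounded, levered growth rate g(f) = r + f*m - s^2 * f^2 / 2."""
    return r + f * m - (s ** 2) * (f ** 2) / 2


def kelly_leverage(m, s):
    """Optimal leverage f = m / s^2 under the Gaussian assumption."""
    return m / s ** 2


# Hypothetical inputs: 4% risk-free rate, 6% mean excess return, 20% volatility
r, m, s = 0.04, 0.06, 0.20

f_star = kelly_leverage(m, s)

# Brute-force search over leverages 0.00, 0.01, ..., 4.00
grid = [i / 100 for i in range(0, 401)]
f_best = max(grid, key=lambda f: growth_rate(f, r, m, s))

print(round(f_star, 4))                        # 1.5
print(f_best)                                  # 1.5
print(round(growth_rate(f_star, r, m, s), 4))  # 0.085
```

Because g(f) is a downward-opening parabola in f, leverage above the optimum lowers the compounded growth rate just as surely as leverage below it.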
REFERENCES 参考文献
Chan, Ernest. 2006a. “A ‘Highly Improbable’ Event? A Historical Analysis of the Natural Gas Spread Trade That Brought Down Amaranth.” Quantitative Trading blog, October 2, http://epchan.blogspot.com/2006/10/highly-improbable-event.html.
Kahneman, Daniel. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux. 丹尼尔·卡尼曼。2011 年。《思考,快与慢》。法拉尔、斯特劳斯与吉鲁出版社。
Khandani, Amir E., and Andrew Lo. 2007. “What Happened to the Quants in August 2007?” MIT. https://web.mit.edu/Alo/www/Papers/august07.pdf. 阿米尔·E·汗达尼,安德鲁·洛。2007 年。“2007 年 8 月量化交易者发生了什么?”麻省理工学院。https://web.mit.edu/Alo/www/Papers/august07.pdf。
Lowenstein, Roger. 2000. When Genius Failed: The Rise and Fall of Long-Term Capital Management. Random House. 罗杰·洛文斯坦。2000 年。《天才的失败:长期资本管理公司的兴衰》。兰登书屋。
Peters, O., and M. Gell-Mann. 2016. “Evaluating Gambles Using Dynamics.” Chaos 26, 023103. https://doi.org/10.1063/1.4940236.
Poundstone, William. 2005. Fortune’s Formula. New York: Hill and Wang.
Ritter, Jay. 2003. “Behavioral Finance.” Pacific-Basin Finance Journal 11(4, September): 429-437.
Taleb, Nassim. 2007. The Black Swan: The Impact of the Highly Improbable. Random House. Taleb, Nassim。2007 年。《黑天鹅:高度不可能事件的影响》。Random House。
Thaler, Richard. 1994. The Winner’s Curse. Princeton, NJ: Princeton University Press. Thaler, Richard。1994 年。《赢家的诅咒》。新泽西普林斯顿:普林斯顿大学出版社。
Thorp, Edward. 1997. “The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market.” Handbook of Asset and Liability Management, Volume I, Zenios and Ziemba (eds.). Elsevier 2006. www.EdwardOThorp.com. Thorp, Edward. 1997 年。《凯利准则在二十一点、体育博彩和股票市场中的应用》。资产负债管理手册,第一卷,Zenios 和 Ziemba(编)。Elsevier 2006 年。www.EdwardOThorp.com。
Special Topics in Quantitative Trading 量化交易专题
The first six chapters of this book covered most of the basic knowledge needed to research, develop, and execute your own quantitative strategy. This chapter explains important themes in quantitative trading in more detail. These themes form the bases of statistical arbitrage trading, and most quantitative traders are conversant in some if not most of these topics. They are also very helpful in informing our intuition about trading. 本书前六章涵盖了研究、开发和执行您自己的量化策略所需的大部分基础知识。本章将更详细地解释量化交易中的重要主题。这些主题构成了统计套利交易的基础,大多数量化交易者对其中一些甚至大部分主题都很熟悉。它们对于指导我们对交易的直觉也非常有帮助。
I will describe the two basic categories of trading strategies: mean-reverting versus momentum strategies. Periods of mean-reverting and trending behaviors are examples of what some traders call regimes, and the different regimes require different strategies, or at least different parameters of the same strategy. Mean-reverting strategies derive their mathematical justification from the concepts of stationarity and cointegration of time series, which I will cover next. Following that, I will discuss a novel application of machine learning to adapt the parameters of a trading strategy to different regimes that we call Conditional Parameter Optimization (CPO). Then I will describe a theory that many hedge funds use to manage large portfolios and one that has caused much turmoil in their performances: namely, factor models. Other categories of strategies that traders frequently discuss are seasonal trading and high-frequency
strategies. All trading strategies require a way to exit their positions; I will describe the different logical ways to do this. Finally, I ponder the question of how to best enhance the returns of a strategy: through higher leverage or trading higher-beta stocks? 所有交易策略都需要一种退出持仓的方法;我将描述实现这一点的不同逻辑方式。最后,我思考了如何最好地提升策略收益:是通过更高的杠杆,还是交易更高贝塔值的股票?
MEAN-REVERTING VERSUS MOMENTUM STRATEGIES 均值回归策略与动量策略
Trading strategies can be profitable only if securities prices are either mean-reverting or trending. Otherwise, they are random-walking, and trading will be futile. If you believe that prices are mean reverting and that they are currently low relative to some reference price, you should buy now and plan to sell higher later. However, if you believe the prices are trending and that they are currently low, you should (short) sell now and plan to buy at an even lower price later. The opposite is true if you believe prices are high. 交易策略只有在证券价格呈现均值回归或趋势性时才可能获利。否则,价格是随机游走的,交易将毫无意义。如果你认为价格是均值回归的,并且当前价格相对于某个参考价较低,那么你应该现在买入,计划以后以更高的价格卖出。然而,如果你认为价格呈现趋势性,并且当前价格较低,那么你应该现在(做空)卖出,计划以后以更低的价格买回。如果你认为价格较高,则情况正好相反。
Academic research has indicated that stock prices are on average very close to random walking. However, this does not mean that under certain special conditions, they cannot exhibit some degree of mean reversion or trending behavior. Furthermore, at any given time, stock prices can be both mean reverting and trending, depending on the time horizon you are interested in. Constructing a trading strategy is essentially a matter of determining if the prices under certain conditions and for a certain time horizon will be mean reverting or trending, and what the initial reference price should be at any given time. (When the prices are trending, they are also said to have “momentum,” and thus the corresponding trading strategy is often called a momentum strategy.) 学术研究表明,股票价格平均来看非常接近随机游走。然而,这并不意味着在某些特殊条件下,股票价格不能表现出一定程度的均值回归或趋势行为。此外,在任何特定时间,股票价格既可以是均值回归的,也可以是趋势性的,这取决于你关注的时间范围。构建交易策略本质上就是确定在特定条件和特定时间范围内,价格是会均值回归还是呈现趋势,以及在任何给定时间点初始参考价格应是多少。(当价格呈现趋势时,也称其具有“动量”,因此相应的交易策略通常被称为动量策略。)
Reversion of the price of a single stock from a temporary deviation from its mean price level back to its mean is called time-series mean reversion, which doesn’t happen often. (See, however, the strategy example in the next section on regime change and parameter optimization, which describes an apparently successful attempt to adapt a mean reversion strategy to the changing daily regimes for an ETF.) Mean reversion of the spread of a pair of stocks, or a 单只股票价格从其均价水平的暂时偏离回归到均价的现象称为时间序列均值回归,这种情况并不常见。(不过,请参见下一节关于状态变化和参数优化的策略示例,其中描述了一个显然成功的尝试,将均值回归策略适应于 ETF 的日常变化状态。)一对股票价差的均值回归,或者说…
portfolio of stocks, back to its mean level is called cross-sectional mean reversion, and it happens much more often. 股票组合回归到其均值水平被称为横截面均值回归,这种情况发生得更频繁。
I have already described a trading strategy based on the mean reversion of a pair of stocks (ETFs, to be precise) in Example 3.6. As for the mean reversion of a long-short portfolio of stocks, financial researchers (Khandani and Lo, 2007) have constructed a very simple short-term mean reversal model that is profitable (before transaction costs) over many years. Of course, whether the mean reversion is strong enough and consistent enough that we can trade profitably after factoring in transaction costs is another matter, and it is up to you, the trader, to find those special circumstances when it is strong and consistent.
Though cross-sectional mean reversion is quite prevalent, backtesting a profitable mean-reverting strategy can be quite perilous.
Many historical financial databases contain errors in price quotes. Any such error tends to artificially inflate the performance of mean-reverting strategies. It is easy to see why: a mean-reverting strategy will buy on a fictitious quote that is much lower than some moving average and sell on the next correct quote that is in line with the moving average, and thus appear to make a profit. You must make sure the data is thoroughly cleansed of such fictitious quotes before you can completely trust your backtesting performance on a mean-reverting strategy.
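The effect of a single bad quote can be seen in a toy backtest. The sketch below is illustrative only: the price series, the 5-bar moving average, the 2% entry band, and the helper name `mean_reversion_pnl` are all assumptions made for this example, not part of any strategy described in this book.

```python
def mean_reversion_pnl(prices, lookback=5, band=0.02):
    """Buy one share whenever the price is more than `band` below the moving
    average of the previous `lookback` bars, and sell it on the next bar."""
    pnl = 0.0
    for t in range(lookback, len(prices) - 1):
        moving_avg = sum(prices[t - lookback:t]) / lookback
        if prices[t] < moving_avg * (1 - band):
            pnl += prices[t + 1] - prices[t]
    return pnl

# A quiet, hypothetical price series with no real trading opportunity.
clean = [100.0, 100.2, 99.9, 100.1, 100.0, 100.1, 99.8, 100.0, 100.2, 100.1]

# The same series with one fictitious quote, e.g., a typo in the database.
dirty = list(clean)
dirty[6] = 90.0

pnl_clean = mean_reversion_pnl(clean)
pnl_dirty = mean_reversion_pnl(dirty)
print("P&L on clean data:", pnl_clean)
print("P&L on data with one bad quote:", pnl_dirty)
```

On the clean series the rule never trades, while on the corrupted series it "buys" at the bad quote and "sells" at the next correct one, booking a profit that could never have been realized.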
Survivorship bias also affects the backtesting of mean-reverting strategies disproportionately, as I discussed in Chapter 3. Stocks that went through extreme price actions are likely to have been either acquired (the prices went very high) or bankrupted (the prices went to zero). A mean-reverting strategy will short the former and buy the latter, losing money in both cases. However, these stocks may not appear at all in your historical database if it has survivorship bias, thus artificially inflating your backtest performance. You can look up Table 3.1 to find out which databases have survivorship bias.
Momentum can be generated by the slow diffusion of information: as more people become aware of certain news, more people decide to buy or sell a stock, thereby driving the price in the same direction. I suggested earlier that stock prices may exhibit momentum when the expected earnings have changed. This can happen when a company announces its quarterly earnings, and investors either gradually become aware of this announcement or react to this change by incrementally executing a large order (so as to minimize market impact). Indeed, this leads to a momentum strategy called post-earnings announcement drift, or PEAD. (For a particularly useful article with lots of references on this strategy, look up quantlogic.blogspot.com/2006/03/pocket-phd-post-earningannouncment.html.) Essentially, this strategy recommends that you buy a stock when its earnings exceed expectations and short a stock when its earnings fall short. More generally, many news announcements have the potential to alter expectations of a stock’s future earnings, and therefore have the potential to trigger a trending period. As to what kind of news will trigger this, and how long the trending period will last, it is again up to you to find out.
Besides the slow diffusion of information, momentum can be caused by the incremental execution of a large order due to the liquidity needs or private investment decisions of a large investor. This cause probably accounts for more instances of short-term momentum than any other cause. With the advent of increasingly sophisticated execution algorithms adopted by the large brokerages, it is, however, increasingly difficult to ascertain whether a large order is behind the observed momentum.
Momentum can also be generated by the herdlike behavior of investors: investors interpret the (possibly random and meaningless) buying or selling decisions of others as the sole justification for their own trading decisions. As Yale economist Robert Shiller said in the New York Times (Schiller, 2008), nobody has all the information they need in order to make a fully informed financial decision. One has to rely on the judgment of others. There is, however, no sure way to discern the quality of the judgment of others. More problematically, people make their financial decisions at different times, rather than meeting at a town hall and reaching a consensus once and for all. The first person who paid a high price for a house is “informing” the others that houses are good investments, which leads another person to make the same decision, and so on. Thus, a possibly erroneous decision by the first buyer is propagated as “information” to a herd of others.
Unfortunately, momentum regimes generated by these two causes (private liquidity needs and herdlike behavior) have highly unpredictable time horizons. How could you know how big an order an institution needs to execute incrementally? How do you predict when the “herd” is large enough to form a stampede? Where is the infamous tipping point? If we do not have a reliable way to estimate these time horizons, we cannot execute a momentum trade profitably based on these phenomena. In a later section on regime switch, I will examine some attempts to predict these tipping or “turning” points.
There are other causes of momentum that are more predictable: the persistence of roll returns in futures markets, and forced sales and purchases of securities due to risk management or portfolio rebalancing. Both of these causes are explored in detail in my second book (Chan, 2013).
There is one last contrast between mean-reverting and momentum strategies that is worth pondering. What are the effects of increasing competition from traders with the same strategies? For mean-reverting strategies, the effect typically is the gradual elimination of any arbitrage opportunity, and thus returns that gradually diminish to zero. When the number of arbitrage opportunities has been reduced to almost zero, the mean-reverting strategy is subject to the risk that an increasing percentage of trading signals are actually due to fundamental changes in stocks’ valuations, in which case the prices are not going to mean-revert. For momentum strategies, the effect of competition is often a shortening of the time horizon over which the trend will continue. As news disseminates at a faster rate and as more traders take advantage of this trend earlier on, the equilibrium price will be reached sooner. Any trade entered after this equilibrium price is reached will be unprofitable.
REGIME CHANGE AND CONDITIONAL PARAMETER OPTIMIZATION
The concept of regimes is most basic to financial markets. What else are “bull” and “bear” markets if not regimes? The desire to predict regime changes is also as old as financial markets themselves.
If our attempts to predict the switching from a bull to a bear market were even slightly successful, we could focus our discussion on this one type of switching and call it a day. If only it were that easy. The difficulty of predicting this type of switching encourages researchers to look more broadly at other types of regime switching in the financial markets, hoping to find some that may be more amenable to existing statistical tools.
I have already described two regime changes that are due to changes in market and regulatory structures: decimalization of stock prices in 2001 and the elimination of the short-sale plus-tick rule in 2007. (See Chapter 5 for details.) These regime changes were preannounced by the government, so no predictions of the shifts were necessary, though few people can predict the exact consequences of such regulatory changes.
Some of the other most commonly studied financial or economic regimes are inflationary vs. recessionary regimes, high- vs. low-volatility regimes, and mean-reverting vs. trending regimes. A more recent regime change may be the rise of retail call option buyers who drove “meme” stock prices into the stratosphere starting in 2020, spurred by promotion on the r/WallStreetBets forum at Reddit.com (Kochkodin, 2021). (Those of us who witnessed the dot-com bubble in 1999 have seen this movie before.) Many well-respected hedge funds (e.g., Melvin Capital) have been brought to their knees by such regime changes.
Regime changes sometimes necessitate a complete change of trading strategy (e.g., trading a mean-reverting instead of a momentum strategy). Other times, traders just need to change the parameters of their existing trading strategy. Traders typically adapt their parameters by optimizing them on a moving (or ever-expanding) lookback period, but this conventional method is usually too slow in reacting to a rapidly changing market environment. I have come up with a novel way of adapting the parameters of a trading strategy based on machine learning that I call Conditional Parameter Optimization (CPO). This allows traders to adopt new parameters as frequently as they like, perhaps for every single trade.
CPO uses machine learning to place orders optimally based on changing market conditions in any market. Traders in these markets typically already possess a basic trading strategy that decides the timing, pricing, type, and/or size of such orders. This trading strategy will usually have a small number of adjustable parameters (trading parameters) that are often optimized based on a fixed historical data set (train set). Alternatively, they may be periodically reoptimized using an expanding or continuously updated train set. (The latter is often called Walk Forward Optimization.) In either case, this conventional optimization procedure can be called Unconditional Parameter Optimization, as the trading parameters do not respond to rapidly changing market conditions. Even though they may be optimal on average (where the average is taken over the historical train set), they may not be optimal under every market condition. Even though we may update the train set to update the parameters, the changes in parameter values are typically small, since the changes to the train set from one day to the next are necessarily small. Ideally, we would like trading parameters that are much more sensitive to the market conditions and yet are trained on a large enough amount of data.
To address this adaptability problem, we apply a supervised machine learning algorithm (specifically, random forest with boosting) to learn from a large predictor (feature) set that captures various aspects of the prevailing market conditions, together with specific values of the trading parameters, to predict the outcome of the trading strategy. (An example outcome is the strategy’s future one-day return.) Once such a machine learning model is trained to predict the outcome, we can apply it to live trading by feeding in the features that represent the latest market conditions as well as various combinations of the trading parameters. The set of parameters that results in the optimal predicted outcome (e.g., the highest future one-day return) will be selected as optimal and adopted for the trading strategy for the next period. The trader can make such predictions and adjust the trading strategy as frequently as needed to respond to rapidly changing market conditions. The frequency and magnitude of such adjustments are no longer constrained by the large amount of historical data required for robust optimization using conventional unconditional optimization.
In Example 7.1, I illustrate how we apply CPO using PredictNow.ai’s financial machine learning API to adapt the parameters of a Bollinger Band-based mean reversion strategy on GLD (the gold ETF) and obtain superior results.
Example 7.1: Conditional Parameter Optimization applied to an ETF trading strategy
(This example is reproduced from a blog post on predictnow.ai/blog.)
To illustrate the CPO technique, we describe below an example trading strategy on an ETF.
This strategy uses the lead-lag relationship between the GLD and GDX ETFs. We use 1-minute bars from January 1, 2006, until December 31, 2020, split 80%/20% between the train and test periods. The trading strategy has three trading parameters: the hedge ratio (GDX_weight), the entry threshold (entry_threshold), and a moving lookback window (lookback). The spread is defined as
\[ \text{Spread}(t) = \text{GLD\_close}(t) - \text{GDX\_close}(t) \times \text{GDX\_weight} \tag{1} \]
We may enter a trade for GLD at time \(t\), and exit it at time \(t + 1\) minute, hopefully realizing a profit. We want to optimize the three trading parameters on a \(5 \times 10 \times 8\) grid. The grid is defined as follows:
To be clear, even though we are using GLD and GDX prices and functions of these prices to make trading decisions, we only trade GLD, unlike the typical long-short pair trading setup.
Every minute we compute Spread(t) in equation (1), and compute its “Bollinger Bands,” conventionally defined as
\[ \text{Z\_score}(t) = \frac{\text{Spread}(t) - \text{Spread\_EMA}(t)}{\sqrt{\text{Spread\_VAR}(t)}} \]
where Spread_EMA is the exponential moving average of the Spread, and Spread_VAR is its exponential moving variance (see the endnote for their conventional definitions).
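As a rough illustration of the Z-score computation, here is a minimal Python sketch. It assumes a common recursive form for the exponential moving average and variance, with smoothing factor 1/lookback; the endnote's definitions may differ in detail. The function name `bollinger_z_scores` and the sample prices are made up for this example.

```python
import math

def bollinger_z_scores(gld_close, gdx_close, gdx_weight, lookback):
    """Return Z_score(t) for each bar, or None while the variance is still zero."""
    alpha = 1.0 / lookback  # assumed smoothing factor
    ema = None
    var = 0.0
    z_scores = []
    for gld, gdx in zip(gld_close, gdx_close):
        spread = gld - gdx * gdx_weight  # Spread(t) as defined in the text
        if ema is None:
            ema = spread  # seed the moving average with the first spread
        else:
            ema = alpha * spread + (1 - alpha) * ema
            var = alpha * (spread - ema) ** 2 + (1 - alpha) * var
        z_scores.append((spread - ema) / math.sqrt(var) if var > 0 else None)
    return z_scores

# Hypothetical 1-minute closes, for illustration only.
gld = [120.0, 120.5, 119.0, 121.0, 120.2]
gdx = [40.0, 40.1, 40.0, 40.3, 40.1]
z = bollinger_z_scores(gld, gdx, gdx_weight=2.5, lookback=3)
print(z)
```

The first bar has no history, so its Z-score is left undefined. In practice one would discard a warm-up period of several lookback lengths before trusting the values.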
Similar to a typical mean-reverting strategy using Bollinger Bands, we trade into a new GLD position based on these rules:
a. Buy GLD if Z_score < -entry_threshold (resulting in long position).
b. Short GLD if Z_score > entry_threshold (resulting in short position).
c. Liquidate long position if Z_score > exit_threshold.
d. Liquidate short position if Z_score < -exit_threshold.
exit_threshold can be anywhere between entry_threshold and -entry_threshold. After optimization in the train set, we set exit_threshold = -0.6 * entry_threshold and keep that relationship fixed when we vary entry_threshold in our future (unconditional or conditional) parameter optimizations. We trade the strategy on 1-minute bars between 9:30 and 15:59 ET, and liquidate any position at 16:00. For each combination of our three trading parameters, we record the daily return of the resulting intraday strategy and form a time series of daily strategy returns, to be used as labels for our machine learning step in CPO. Note that since the trading strategy may execute multiple round trips per day before forced liquidation at the market close, this daily strategy return is the sum of such round-trip returns.
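Rules (a) through (d) amount to a small state machine over the position. The sketch below is one possible reading, not the code used for this example: the function name `next_position` is made up, the forced liquidation at 16:00 is left out, and it assumes that an exit and an opposite entry may occur on the same bar, which the rules above do not specify.

```python
def next_position(position, z_score, entry_threshold, exit_ratio=-0.6):
    """Apply rules (a)-(d) to one bar.

    position is +1 (long GLD), -1 (short GLD), or 0 (flat).
    """
    exit_threshold = exit_ratio * entry_threshold
    # Rules (c) and (d): liquidate an existing position first.
    if position == 1 and z_score > exit_threshold:
        position = 0
    elif position == -1 and z_score < -exit_threshold:
        position = 0
    # Rules (a) and (b): enter a new position only when flat.
    if position == 0:
        if z_score < -entry_threshold:
            position = 1
        elif z_score > entry_threshold:
            position = -1
    return position

# Walk a hypothetical Z-score path through the rules.
z_path = [-1.2, -0.9, -0.5, 0.3, 1.4, 0.7, 0.5]
position = 0
history = []
for z_value in z_path:
    position = next_position(position, z_value, entry_threshold=1.0)
    history.append(position)
print(history)
```

With entry_threshold = 1.0 the exit threshold is -0.6, so a long position entered below -1.0 is closed as soon as the Z-score rises above -0.6, well before it returns to zero.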
Unconditional vs. Conditional Parameter Optimizations
In conventional, unconditional parameter optimization, we select the three trading parameters (GDX_weight, entry_threshold, and lookback) that maximize cumulative in-sample return over the three-dimensional parameter grid using exhaustive search. (Gradient-based optimization did not work due to multiple local maxima.) We use that fixed set of three optimal trading parameters to specify the strategy out-of-sample on the test set.
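The exhaustive search can be sketched in a few lines of Python. Everything specific here is hypothetical: the grid values are not the ones used in this example, and `toy_cumulative_return` is only a stand-in for a real in-sample backtest.

```python
from itertools import product

def unconditional_optimum(grid, cumulative_return):
    """Search every grid point and return the parameter set with the
    highest cumulative in-sample return, together with that return."""
    best_params, best_return = None, float("-inf")
    for params in product(*grid.values()):
        r = cumulative_return(*params)
        if r > best_return:
            best_params, best_return = params, r
    return dict(zip(grid.keys(), best_params)), best_return

# Hypothetical grid, much smaller than the 5 x 10 x 8 grid in the text.
grid = {
    "GDX_weight": [2.0, 2.5, 3.0],
    "entry_threshold": [0.2, 0.4, 0.6],
    "lookback": [30, 60, 90],
}

def toy_cumulative_return(gdx_weight, entry_threshold, lookback):
    # Stand-in for a backtest; constructed to peak at (2.5, 0.4, 60).
    return (-((gdx_weight - 2.5) ** 2)
            - (entry_threshold - 0.4) ** 2
            - ((lookback - 60) / 100) ** 2)

best, best_return = unconditional_optimum(grid, toy_cumulative_return)
print(best)
```

Because every grid point is evaluated, the search is not trapped by local maxima, at the cost of one full backtest per parameter combination.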
With conditional parameter optimization, the set of trading parameters used each day depends on a predictive machine learning model trained on the train set. This model will predict the future one-day return of our trading strategy, given the trading parameters and other market conditions. Since the trading parameters can be varied at will (i.e., they are control variables), we can predict a different future return for many sets of trading parameters each day, and select the optimal set that predicts the highest future return. That optimal parameter set will be used for the trading strategy for the next day. This step is taken after the current day’s market close and before the market open of the next day.
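The daily selection step can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_predict` is a made-up stand-in for a trained model, the single "volatility" indicator and the grid values are hypothetical, and none of this is the predictnow.ai API.

```python
from itertools import product

def conditionally_optimal_params(predict, grid, indicators):
    """Return the parameter combination with the highest predicted
    next-day strategy return, given today's market indicators."""
    candidates = product(grid["GDX_weight"],
                         grid["entry_threshold"],
                         grid["lookback"])
    return max(candidates, key=lambda params: predict(*params, indicators))

def toy_predict(gdx_weight, entry_threshold, lookback, indicators):
    # Stand-in model: favors a wider entry threshold when volatility is high.
    target_threshold = 0.6 if indicators["volatility"] > 1.0 else 0.2
    return (-abs(entry_threshold - target_threshold)
            - abs(gdx_weight - 2.5)
            - abs(lookback - 60) / 1000)

grid = {
    "GDX_weight": [2.0, 2.5, 3.0],
    "entry_threshold": [0.2, 0.4, 0.6],
    "lookback": [30, 60, 90],
}

calm = conditionally_optimal_params(toy_predict, grid, {"volatility": 0.5})
stormy = conditionally_optimal_params(toy_predict, grid, {"volatility": 2.0})
print(calm, stormy)
```

The same model yields different optimal parameters on the two days because only the indicators change. Unconditional optimization, by construction, cannot do this.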
In addition to the three trading parameters, the predictors (or “features”) for input to our machine learning model are eight technical indicators obtained from the Technical Analysis Python library: Bollinger Bands Z-score, Money Flow, Force Index, Donchian Channel, Average True Range, Awesome Oscillator, and Average Directional Index. We choose these indicators to represent the market conditions. Each indicator actually produces \(2 \times 7\) features, since we apply them to each of the GLD and GDX price series, and each was computed using seven different lookback windows: 50, 100, 200, 400, 800, 1600, and 3200 minutes. (Note: This is not the same as the trading parameter “lookback” described earlier.) Hence, there are a total of \(3 + 8 \times 2 \times 7 = 115\) features used in predicting the future one-day return of the strategy. But because there are \(5 \times 10 \times 8 = 400\) combinations of the three trading parameters, each trading day comes with 400 rows of training data that look something like the table below (labels are not displayed):
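How one day's rows could be assembled can be sketched in Python. The grid values and the generic indicator names below are placeholders (assumptions for illustration); only the counts of 5, 10, 8, 8, 2, and 7 come from the text.

```python
from itertools import product

# Hypothetical grid values: 5 x 10 x 8 = 400 combinations.
gdx_weights = [2.0, 2.5, 3.0, 3.5, 4.0]
entry_thresholds = [round(0.2 * k, 1) for k in range(1, 11)]
lookbacks = [30, 60, 90, 120, 180, 240, 360, 720]

n_indicators = 8                                  # eight technical indicators
tickers = ["GLD", "GDX"]                          # each applied to both ETFs
windows = [50, 100, 200, 400, 800, 1600, 3200]    # minutes

# Generic placeholder names such as "indicator1_GLD_50".
feature_names = [f"indicator{i}_{ticker}_{w}"
                 for i in range(1, n_indicators + 1)
                 for ticker in tickers
                 for w in windows]

def rows_for_day(indicator_values):
    """Pair one day's indicator values with every parameter combination."""
    return [list(params) + list(indicator_values)
            for params in product(gdx_weights, entry_thresholds, lookbacks)]

indicator_values = [0.0] * len(feature_names)     # placeholder values
rows = rows_for_day(indicator_values)
print(len(feature_names), len(rows), len(rows[0]))
```

Within a day, the 400 rows share the same indicator values and differ only in the three trading-parameter columns and in the label attached to each.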
After the machine learning model is trained, we can use it for live predictions and trading. Each trading day after the market closes, we prepare an input vector, which is structured like one row of the table above, populated with one particular set of the trading parameters and the current values of the technical indicators, and use the machine learning model to predict the trading strategy’s return on the next day. We do that 400 times, varying the trading parameters, but obviously not the technical indicators’ values, and find out which trading parameter set predicts the highest return. We adopt that optimal set for the trading strategy the next day. In mathematical terms,
\[ (\text{GDX\_weight\_optimal},\ \text{entry\_threshold\_optimal},\ \text{lookback\_optimal}) = \operatorname{argmax}_{(\text{GDX\_weight},\ \text{entry\_threshold},\ \text{lookback})} \{\text{predict}(\text{GDX\_weight},\ \text{entry\_threshold},\ \text{lookback},\ \text{technical indicators})\} \]
where predict is the predictive function available from the API of our machine learning website, predictnow.ai, which uses random forest with boosting as the training algorithm. The sample Python Jupyter notebook code fragment for training a model and using it for predictions is displayed here. (The code won’t work unless you sign up for a trial with predictnow.ai.)
# TO BEGIN ANY WORK WITH PREDICTNOW.AI CLIENT, WE START
# BY IMPORTING AND CREATING A CLASS INSTANCE
from predictnow.pdapi import PredictNowClient
import pandas as pd
api_key = "%KeyProvidedToEachOfOurSubscriber"
api_host = "http://12.34.567.890:1000" # our SaaS server
username = "helloWorld"
email = "helloWorld@yourmail.com"
client = PredictNowClient(api_host,api_key)
# You will need to edit this input dataset file path and
# labelname!
file_path = 'my_amazing_features.xlsx'
labelname = 'Next_day_strategy_return'
import os
# FANTASTIC JOB! NOW YOUR PREDICTNOW.AI CLIENT HAS BEEN
# SETUP.
# For classification problems
#params = {'timeseries': 'yes', 'type':
#    'classification', 'feature_selection': 'shap',
#    'analysis': 'none', 'boost': 'gbdt', 'testsize': '0.2',
#    'weights': 'no', 'eda': 'yes', 'prob_calib': 'no',
#    'mode': 'train'}
# For regression problems, suitable for CPO
params = {'timeseries': 'yes', 'type': 'regression',
    'feature_selection': 'none', 'analysis': 'none',
    'boost': 'gbdt', 'testsize': '0.2', 'weights': 'no',
    'eda': 'yes', 'prob_calib': 'no', 'mode': 'train'}
# LET'S CREATE THE MODEL BY SENDING THE PARAMETERS TO
# PREDICTNOW.AI
response = client.create_model(
    username=username, # only letters, numbers, or underscores
    model_name="test1",
    params=params,
)
# LET'S LOAD UP THE FILE TO PANDAS IN THE LOCAL ENVIRONMENT
# If you have the Excel file, replace read_csv with read_excel
from pandas import read_csv
from pandas import read_excel
df = read_excel(file_path, engine="openpyxl") # Same here
df.name = "testdataframe" # Optional, but recommended
response = client.train(
    model_name="test1",
    input_df=df,
    label=labelname,
    username=username,
    email=email,
    return_output=False,
)
print("FANTASTIC! YOUR FIRST-EVER MODEL TRAINING AT "
      "PREDICTNOW.AI HAS BEEN COMPLETED!")
print(response)
# User can now examine the train/test sets results from
# the model by calling the getresult function (and
# providing the name of the model that resides on
# Predictnow.ai server)
status = client.getstatus(username=username,
    train_id=response["train_id"])
if status["state"] == "COMPLETED":
    response = client.getresult(
        model_name="test1",
        username=username,
    )
    predicted_targets_cv = pd.read_json(
        response.predicted_targets_cv)
    print("predicted_targets_cv")
    print(predicted_targets_cv)
    predicted_targets_test = pd.read_json(
        response.predicted_targets_test)
    print("predicted_targets_test")
    print(predicted_targets_test)
    performance_metrics = pd.read_json(
        response.performance_metrics)
    print("performance_metrics")
    print(performance_metrics)
# Now we can make LIVE predictions for many
# combinations of the parameters by populating many
# rows in the example_input_live.csv file with these
# parameter combinations
if status["state"] == "COMPLETED":
    # Input data for live prediction
    df = read_csv("example_input_live.csv")
    df.name = "myfirstpredictname" # optional, but recommended
    # Making live predictions
    response = client.predict(
        model_name="test1",
        input_df=df,
        username=username,
        eda="yes",
        prob_calib=params["prob_calib"],
    )
    # FOR LIVE PREDICTION: (remember labels and
    # probabilities each can have many rows
    # corresponding to many combinations of parameters)
    y_pred = pd.read_json(response.labels)
    print("THE LABELS")
    print(y_pred)
An example output labels file from this step looks like this:
\begin{tabular}{|l|l|}
\hline Date & pred_target \\
\hline 2020-12-24 2.5_30_0.2 20218132334 & 0.011875 \\
\hline 2020-12-24 2.5_60_0.2 20218132344 & 0.012139 \\
\hline 2020-12-24 2.5_90_0.2 20218132354 & 0.012139 \\
\hline 2020-12-24 2.5_120_0.2 20218132364 & 0.012975 \\
\hline 2020-12-24 2.5_180_0.2 20218132374 & 0.012975 \\
\hline 2020-12-24 2.5_240_0.2 20218132384 & 0.012975 \\
\hline 2020-12-24 2.5_360_0.2 20218132394 & 0.012975 \\
\hline 2020-12-24 2.5_720_0.2 20218132404 & 0.012975 \\
\hline
\end{tabular}
where 2.5_30_0.2 is one parameter combination, and 2.5_60_0.2 is another.
input_features = df['Date'].values
for i in range(len(input_features)):
    # split y_pred['Date'] into actual date and
    # parameters string
    date_params = input_features[i].split(' ')
    params = date_params[1]
    if i == 0:
        # initializing max_index and its parameters value
        # E.g. (2.5, 60, 0.2)
        params_cond_optimized = params
        y_pred_max = y_pred[i].values
    else:
        if y_pred[i].values >= y_pred_max:
            # updating max_index and its parameters value
            params_cond_optimized = params
            y_pred_max = y_pred[i].values
# params_cond_optimized is "conditionally optimal"
# parameters for the next day
It is important to understand that unlike a naïve application of machine learning to predict GLD’s one-day return using technical indicators, we are using machine learning to predict the return of a trading strategy applied to GLD given a set of trading parameters, and using those predictions to optimize these parameters on a daily basis. The naïve approach is less likely to succeed because everybody is trying to predict GLD’s (i.e., gold’s) returns, which invites arbitrage activities, but nobody (until they read this book!) is predicting the returns of this particular GLD trading strategy. Furthermore, many traders do not like using machine learning as a black box to predict returns. In CPO, the trader’s own strategy is making the actual predictions. Machine learning is merely used to optimize the parameters of this trading strategy. This provides a much greater degree of transparency and interpretability.
Performance Comparisons
We compare the out-of-sample test set performance of Unconditional vs. Conditional Parameter Optimization on the last three years of data ending on December 31, 2020, and find the cumulative three-year returns to be 73% and 83%, respectively. All other metrics are also improved using CPO. The comparable equity curves can be found in Figure 7.1.
A time series is stationary if it never drifts farther and farther away from its initial value. In technical terms, stationary time series are “integrated of order zero,” or I(0) (Alexander, 2001). It is obvious that if the price series of a security is stationary, it would be a great candidate for a mean-reversion strategy. Unfortunately, most stock price series are not stationary: they exhibit a geometric random walk that takes them farther and farther away from their starting (i.e., initial public offering) values. However, you can often find a pair of stocks such that if you long one and short the other, the market value of the pair is stationary. If this is the case, then the two individual time series are said to be cointegrated. They are so described because a linear combination of them is integrated of order zero. Typically, two stocks that form a cointegrating pair are from the same industry group. Traders have long been familiar with this so-called pair-trading strategy. They buy the pair portfolio when the spread of the stock prices formed by these pairs is low, and sell/short the pair when the spread is high. In other words, it is a classic mean-reverting strategy.
FIGURE 7.2 A stationary time series formed by the spread between GLD and GDX.
An example of a pair of cointegrating price series is the gold ETF GLD versus the gold miners ETF, GDX, which I discussed in Example 3.6. If we form a portfolio that is long 1 share of GLD and short 1.631 shares of GDX, the prices of the portfolio form a stationary time series (see Figure 7.2). The exact number of shares of GLD and GDX can be determined by a regression fit of the two component time series (see Example 7.2). Note that, just as in Example 3.6, I have used only the first 252 data points as the training set for this regression.
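The regression step can be illustrated with a short Python sketch. It assumes a least-squares fit without an intercept and uses synthetic prices constructed to have a hedge ratio near 1.5. Neither the data nor the helper name `hedge_ratio` comes from this book, and the result is unrelated to the 1.631 quoted above.

```python
def hedge_ratio(y, x):
    """Least-squares slope of y on x through the origin (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Synthetic prices, for illustration only: y is 1.5 times x plus a small
# alternating disturbance, so the pair is cointegrated by construction.
x = [40.0 + 0.1 * t for t in range(300)]
y = [1.5 * p + (0.5 if t % 2 == 0 else -0.5) for t, p in enumerate(x)]

train = slice(0, 252)  # fit on the first 252 points only
beta = hedge_ratio(y[train], x[train])

# Spread of long 1 share of y and short beta shares of x, over all points.
spread = [yi - beta * xi for yi, xi in zip(y, x)]
print(round(beta, 3), max(abs(s) for s in spread))
```

Fitting on the first 252 points and then evaluating the spread on all 300 mirrors the train/test discipline described above: the hedge ratio is never re-estimated with data from the test period.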
Example 7.2: How to Form a Good Cointegrating (and Mean-Reverting) Pair of Stocks
As I explained in the main text, if you are long one security and short another one in the same industry group and in the right proportion, sometimes the combination (or “spread”) becomes a stationary series. A stationary series is an excellent candidate for a mean-reverting strategy. This example teaches you how to use a free MATLAB package, downloadable at www.spatial-econometrics.com, to determine if two price series are cointegrated and, if so, how to find the optimal hedge ratio (i.e., the number of shares of the second security versus one share of the first security).
The main method used to test for cointegration is called the cointegrating augmented Dickey-Fuller test, hence the function name cadf. A detailed description of this method can be found in the manual, also available on the same website mentioned earlier. 测试协整的主要方法称为协整增强型迪基-富勒检验,因此函数名为 cadf。该方法的详细描述可以在手册中找到,手册也可在前面提到的同一网站上获得。
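The idea behind such a test can be sketched in two steps: regress one price series on the other to get the hedge ratio, then run a Dickey-Fuller regression on the residual spread. The pure-Python sketch below is a simplified stand-in, not the cadf function itself: it uses no lagged differences, the helper names `hedge_ratio` and `df_tstat` are hypothetical, and the data are synthetic rather than GLD and GDX prices. The resulting t-statistic must be compared with cointegration critical values such as those printed by cadf, not with an ordinary t-table.

```python
import math
import random

def hedge_ratio(y, x):
    """OLS slope of y on x with no intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def df_tstat(z):
    """t-statistic of rho in the regression dz[t] = c + rho * z[t-1] + e[t].

    This is a plain Dickey-Fuller regression with a constant and no lagged
    differences, a simplification of the augmented test.
    """
    lagged = z[:-1]
    dz = [b - a for a, b in zip(z[:-1], z[1:])]
    n = len(dz)
    mean_x = sum(lagged) / n
    mean_y = sum(dz) / n
    sxx = sum((v - mean_x) ** 2 for v in lagged)
    sxy = sum((v - mean_x) * (w - mean_y) for v, w in zip(lagged, dz))
    rho = sxy / sxx
    c = mean_y - rho * mean_x
    sse = sum((w - c - rho * v) ** 2 for v, w in zip(lagged, dz))
    se_rho = math.sqrt(sse / (n - 2) / sxx)
    return rho / se_rho

# Synthetic data (an assumption for illustration, not GLD/GDX prices):
# x follows a geometric random walk, and y = 2*x + a stationary AR(1) term.
rng = random.Random(42)
x, noise = [50.0], [0.0]
for _ in range(999):
    x.append(x[-1] * math.exp(rng.gauss(0.0, 0.01)))
    noise.append(0.5 * noise[-1] + rng.gauss(0.0, 1.0))
y = [2.0 * a + e for a, e in zip(x, noise)]

beta = hedge_ratio(y, x)
spread = [b - beta * a for a, b in zip(x, y)]
print(round(beta, 3), round(df_tstat(spread), 2))
```

Because the synthetic pair is cointegrated by construction, the estimated hedge ratio should come out close to 2 and the t-statistic should be strongly negative.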
Using MATLAB
The following program is available online as epchan.com/book/example7_2.m:
% make sure previously defined variables are erased.
clear;
% read a spreadsheet named "GLD.xls" into MATLAB.
[num, txt]=xlsread('GLD');
% the first column (starting from the second row) is
% the trading days in format mm/dd/yyyy.
tday1=txt(2:end, 1);
% convert the format into yyyymmdd.
tday1=...
datestr(datenum(tday1, 'mm/dd/yyyy'), 'yyyymmdd');
% convert the date strings first into cell arrays and
% then into numeric format.
tday1=str2double(cellstr(tday1));
% the last column contains the adjusted close prices.
adjcls1=num(:, end);
% read a spreadsheet named "GDX.xls" into MATLAB.
[num2, txt2]=xlsread('GDX');
% the first column (starting from the second row) is
% the trading days in format mm/dd/yyyy.
tday2=txt2(2:end, 1);
% convert the format into yyyymmdd.
tday2=...
datestr(datenum(tday2, 'mm/dd/yyyy'), 'yyyymmdd');
% convert the date strings first into cell arrays and
% then into numeric format.
tday2=str2double(cellstr(tday2));
adjcls2=num2(:, end);
% find all the days when either GLD or GDX has data.
tday=union(tday1, tday2);
[foo idx idx1]=intersect(tday, tday1);
% combining the two price series
adjcls=NaN(length(tday), 2);
adjcls(idx, 1)=adjcls1(idx1);
[foo idx idx2]=intersect(tday, tday2);
adjcls(idx, 2)=adjcls2(idx2);
% days where any one price is missing
baddata=find(any(~isfinite(adjcls), 2));
tday(baddata)=[];
adjcls(baddata,:)=[];
trainset=1:252; % define indices for training set
vnames=strvcat('GLD', 'GDX');
adjcls=adjcls(trainset, :);
tday=tday(trainset, :);
% run cointegration check using
% augmented Dickey-Fuller test
res=cadf(adjcls(:, 1), adjcls(:, 2), 0, 1);
prt(res, vnames);
% Output from cadf function:
% Augmented DF test for co-integration variables: GLD,GDX
% CADF t-statistic   # of lags   AR(1) estimate
%      -3.18156477           1        -0.070038
%
% 1% Crit Value   5% Crit Value   10% Crit Value
%        -3.924          -3.380           -3.082
% The t-statistic of -3.18, which is in between the 5%
% Crit Value of -3.38 and the 10% Crit Value of -3.08,
% means that there is a better than 90% probability
% that these 2 time series are cointegrated.
results=ols(adjcls(:, 1), adjcls(:, 2));
hedgeRatio=results.beta
z=results.resid;
% A hedgeRatio of 1.6766 was found.
% I.e. GLD=1.6766*GDX + z, where z can be
% interpreted as the
% spread GLD-1.6766*GDX and should be stationary.
% This should produce a chart similar to Figure 7.2.
plot(z);
Using Python
The following program is available as epchan.com/book/example7_2.ipynb:
# How to Form a Good Cointegrating (and Mean-Reverting) Pair of Stocks
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import coint
from statsmodels.api import OLS
df1=pd.read_excel('GLD.xls')
df2=pd.read_excel('GDX.xls')
df=pd.merge(df1, df2, on='Date', suffixes=('_GLD', '_GDX'))
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)
trainset=np.arange(0, 252)
df=df.iloc[trainset]
# Run cointegration (Engle-Granger) test
coint_t, pvalue, crit_value=coint(df['Adj Close_GLD'], df['Adj Close_GDX'])
(coint_t, pvalue, crit_value)
# Output:
# (-2.3591268376687244,
#  0.3444494880427884,
#  array([-3.94060523, -3.36058133, -3.06139039]))
# abs(t-stat) < critical value at 90%. pvalue says probability of
# null hypothesis (of no cointegration) is about 34%.
# Determine hedge ratio
model=OLS(df['Adj Close_GLD'], df['Adj Close_GDX'])
results=model.fit()
hedgeRatio=results.params
hedgeRatio
# Output:
# Adj Close_GDX    1.631009
# dtype: float64
# spread = GLD - hedgeRatio*GDX
spread=df['Adj Close_GLD']-hedgeRatio.iloc[0]*df['Adj Close_GDX']
plt.plot(spread)
You may notice that the Python code's Engle-Granger test generates a \(t\)-statistic of -2.4, whose absolute value is less than the 90% critical value, indicating that the two series are not cointegrating. This contradicts the results of the MATLAB cadf test. Which should we trust? Let me just say that Python's libraries are free and come with no guarantee of accuracy or correctness, whereas MATLAB employs a staff of numerous PhD computer scientists and statisticians.
Using R
You can download the R code as example7_2.R.
# Need the zoo package for its na.locf function
install.packages('zoo')
# Need the CADFtest package for its CADFtest function
install.packages('CADFtest')
library('zoo')
library('CADFtest')
data1 <- read.delim("GLD.txt") # Tab-delimited
# sort in ascending order of dates (1st column of data)
data_sort1 <- data1[order(as.Date(data1[,1], '%m/%d/%Y')),]
tday1 <- as.integer(format(as.Date(data_sort1[,1], '%m/%d/%Y'), '%Y%m%d'))
adjcls1 <- data_sort1[,ncol(data_sort1)]
data2 <- read.delim("GDX.txt") # Tab-delimited
# sort in ascending order of dates (1st column of data)
data_sort2 <- data2[order(as.Date(data2[,1], '%m/%d/%Y')),]
tday2 <- as.integer(format(as.Date(data_sort2[,1], '%m/%d/%Y'), '%Y%m%d'))
adjcls2 <- data_sort2[,ncol(data_sort2)]
# find the intersection of the two data sets
tday <- intersect(tday1, tday2)
adjcls1 <- adjcls1[tday1 %in% tday]
adjcls2 <- adjcls2[tday2 %in% tday]
# CADFtest cannot have NaN values in input
adjcls1 <- zoo::na.locf(adjcls1)
adjcls2 <- zoo::na.locf(adjcls2)
mydata <- list(GLD=adjcls1, GDX=adjcls2);
trainset <- 1:252
res <- CADFtest(model=GLD~GDX, data=mydata, type="drift", max.lag.X=1, subset=trainset)
summary(res) # As the following output shows, p-value is about 0.005;
# hence we can reject null hypothesis of no cointegration at 99.5% level.
# Covariate Augmented DF test
# CADF test
# t-test statistic: -3.240868894
# estimated rho^2: 0.260414676
# p-value: 0.004975155
# Max lag of the diff. dependent variable: 1.000000000
# Max lag of the stationary covariate(s): 1.000000000
# Max lead of the stationary covariate(s): 0.000000000
#
# Call:
# dynlm(formula = formula(model), start = obs.1, end = obs.T)
#
# Residuals:
#      Min       1Q   Median       3Q      Max
# -2.70728 -0.26235  0.00595  0.29684  1.47164
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.07570    0.29814  -0.254  0.79970
# L(y, 1)     -0.03817    0.01178  -3.241  0.00498 **
# L(d(y), 1)   0.08542    0.03077   2.776  0.00578 **
# L(X, 0)      0.75428    0.02802  26.919  < 2e-16 ***
# L(X, 1)     -0.68942    0.03161 -21.812  < 2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.474 on 378 degrees of freedom
# Multiple R-squared: 0.6577, Adjusted R-squared: 0.6541
# F-statistic: 241.8 on 3 and 378 DF, p-value: < 2.2e-16
# determines the hedge ratio
lmresult <- lm(GLD ~ 0 + GDX, mydata, subset=trainset )
hedgeRatio <- coef(lmresult) # 1.631009
z <- residuals(lmresult) # The residuals should be stationary (mean-reverting)
plot(z) # This should produce a chart similar to Figure 7.2.
The R code's CADF test generates a \(t\)-statistic of -3.2, which rejects the null hypothesis that the pair is not cointegrating. This corroborates the MATLAB cadf test and contradicts the Python result. Moral of the story: Do not trust Python's statistics and econometrics packages.
In case you think that any two stocks in the same industry group would be cointegrating, here is a counterexample: KO (Coca-Cola) versus PEP (Pepsi). The same cointegration test as used in Example 7.2 tells us that there is a less than 90 percent probability that they are cointegrated. (You should try it yourself and then compare with my program epchan.com/book/example7_3.m.) If you use linear regression to find the best fit between KO and PEP, the plot of the resulting spread will resemble Figure 7.3.
If a price series (of a stock, a pair of stocks, or, in general, a portfolio of stocks) is stationary, then a mean-reverting strategy is guaranteed to be profitable, as long as the stationarity persists into the future (which is by no means guaranteed). However, the converse is not true. You don’t necessarily need a stationary price series in order to have a successful mean-reverting strategy. Even a nonstationary price series can have many short-term reversal opportunities that one can exploit, as many traders have discovered.
FIGURE 7.3 A nonstationary time series formed by the spread between KO and PEP.
Many pair traders are unfamiliar with the concepts of stationarity and cointegration. But most of them are familiar with correlation, which superficially seems to mean the same thing as cointegration. In fact, the two are quite different. Correlation between two price series refers to the correlation of their returns over some time horizon (for concreteness, let’s say a day). If two stocks are positively correlated, there is a good chance that their prices will move in the same direction most days. However, having a positive correlation does not say anything about the long-term behavior of the two stocks. In particular, it doesn’t guarantee that the stock prices will not grow farther and farther apart in the long run, even if they do move in the same direction most days. By contrast, if two stocks are cointegrated and remain so in the future, their prices (weighted appropriately) will be unlikely to diverge. Yet their daily (or weekly, or any other time horizon) returns may be quite uncorrelated.
As an artificial example of two stocks, A and B, that are cointegrated but not correlated, see Figure 7.4. Stock B clearly doesn’t move in any correlated fashion with stock A: Some days they move in the same direction, other days the opposite. Most days, stock B doesn’t move at all. But notice that the spread in stock prices between A and B always returns to about \$1 after a while.
FIGURE 7.4 Cointegration is not correlation. Stocks A and B are cointegrated but not correlated.
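An artificial pair of this kind is easy to simulate. The Python sketch below is one possible construction, not the data behind Figure 7.4: the starting prices, the step size, and the rule that stock B catches up every 10th day are all assumptions made for illustration. It uses daily price changes rather than percentage returns to keep the arithmetic simple.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(7)
a = [20.0]
b = [19.0]
for t in range(1, 2001):
    # Stock B stays flat most days; every 10th day it jumps to $1 below
    # yesterday's price of stock A (so the jump uses no same-day information).
    if t % 10 == 0:
        b.append(a[-1] - 1.0)
    else:
        b.append(b[-1])
    # Stock A moves by a small random amount every day.
    a.append(a[-1] + rng.uniform(-0.2, 0.2))

changes_a = [q - p for p, q in zip(a[:-1], a[1:])]
changes_b = [q - p for p, q in zip(b[:-1], b[1:])]
spread = [p - q for p, q in zip(a, b)]

print("correlation of daily changes:", round(pearson(changes_a, changes_b), 3))
print("spread range:", round(min(spread), 2), "to", round(max(spread), 2))
```

By construction the spread can never leave the band from -\$1 to \$3, however far the two prices wander, while the daily changes of the two stocks share no same-day information and so show little correlation.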
Can we find a real-life example showing that correlation and cointegration are different things? KO versus PEP is one, though it runs in the opposite direction from Figure 7.4: the pair is correlated but not cointegrated. In the program example7_3.m, I have shown that they do not cointegrate. If, however, you test their daily returns for correlation, you will find that their correlation of 0.4849 is indeed statistically significant. The correlation test is presented at the end of the example7_3.m program and shown here in Example 7.3.
Example 7.3: Testing the Cointegration versus Correlation Properties between KO and PEP
The cointegration test for KO and PEP is the same as that for GDX and GLD in Example 7.2, so it won’t be repeated here. (It is available from epchan.com/book/example7_3.m.) The cointegration result shows that the \(t\)-statistic for the augmented Dickey-Fuller test is -2.14, larger than the 10 percent critical value of -3.038, meaning that there is a less than 90 percent probability that these two time series are cointegrated.
The following code fragment, however, tests for correlation between the two time series:
Using MATLAB
You can download the MATLAB code as example7_3.m.
% A test for correlation.
dailyReturns=(adjcls-lag1(adjcls))./lag1(adjcls);
[R,P]=corrcoef(dailyReturns(2:end,:));
% R =
%
% 1.0000 0.4849
% 0.4849 1.0000
%
%
% P =
%
% 1 0
% 0 1
% The P value of 0 indicates that the two time series
% are significantly correlated.
Using Python
You can download the Python Jupyter notebook as example7_3.ipynb.
# How to Form a Good Cointegrating (and Mean-Reverting) Pair of Stocks
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import coint
from statsmodels.api import OLS
from scipy.stats import pearsonr
df1=pd.read_excel('KO.xls')
df2=pd.read_excel('PEP.xls')
df=pd.merge(df1, df2, on='Date', suffixes=('_KO', '_PEP'))
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)
# Run cointegration (Engle-Granger) test
coint_t, pvalue, crit_value=coint(df['Adj Close_KO'], df['Adj Close_PEP'])
(coint_t, pvalue, crit_value)
# Output:
# (-1.5815517041517178,
#  0.7286134576473527,
#  array([-3.89783854, -3.33691006, -3.04499143]))
# abs(t-stat) < critical value at 90%. pvalue says probability of
# null hypothesis (of no cointegration) is 73%
# Determine hedge ratio
model=OLS(df['Adj Close_KO'], df['Adj Close_PEP'])
results=model.fit()
hedgeRatio=results.params
hedgeRatio
# Output:
# Adj Close_PEP    1.011409
# dtype: float64
# spread = KO - hedgeRatio*PEP
spread=df['Adj Close_KO']-hedgeRatio.iloc[0]*df['Adj Close_PEP']
plt.plot(spread) # Figure 7.3
# Correlation test
dailyret=df.loc[:, ['Adj Close_KO', 'Adj Close_PEP']].pct_change()
dailyret.corr()
# Output:
#                Adj Close_KO  Adj Close_PEP
# Adj Close_KO       1.000000       0.484924
# Adj Close_PEP      0.484924       1.000000
dailyret_clean=dailyret.dropna()
# first output is correlation coefficient, second output is pvalue.
pearsonr(dailyret_clean.iloc[:,0], dailyret_clean.iloc[:,1])
# Output:
# (0.4849239439370571, 0.0)
Using R
You can download the R code as example7_3.R.
# Need the zoo package for its na.locf function
# install.packages('zoo')
# Need the CADFtest package for its CADFtest function
# install.packages('CADFtest')
library('zoo')
library('CADFtest')
source('calculateReturns.R')
data1 <- read.delim("KO.txt") # Tab-delimited
# sort in ascending order of dates (1st column of data)
data_sort1 <- data1[order(as.Date(data1[,1], '%m/%d/%Y')),]
tday1 <- as.integer(format(as.Date(data_sort1[,1], '%m/%d/%Y'), '%Y%m%d'))
adjcls1 <- data_sort1[,ncol(data_sort1)]
data2 <- read.delim("PEP.txt") # Tab-delimited
# sort in ascending order of dates (1st column of data)
data_sort2 <- data2[order(as.Date(data2[,1], '%m/%d/%Y')),]
tday2 <- as.integer(format(as.Date(data_sort2[,1], '%m/%d/%Y'), '%Y%m%d'))
adjcls2 <- data_sort2[,ncol(data_sort2)]
# find the intersection of the two data sets
tday <- intersect(tday1, tday2)
adjcls1 <- adjcls1[tday1 %in% tday]
adjcls2 <- adjcls2[tday2 %in% tday]
# CADFtest cannot have NaN values in input
adjcls1 <- zoo::na.locf(adjcls1)
adjcls2 <- zoo::na.locf(adjcls2)
mydata <- list(KO=adjcls1, PEP=adjcls2);
res <- CADFtest(model=KO~PEP, data=mydata, type="drift", max.lag.X=1)
summary(res) # As the following output shows, p-value is about 0.16,
# hence we cannot reject null hypothesis.
# Covariate Augmented DF test
# CADF test
# t-test statistic: -2.2255225
# estimated rho^2: 0.8249085
# p-value: 0.1612782
# Max lag of the diff. dependent variable: 1.0000000
# Max lag of the stationary covariate(s): 1.0000000
# Max lead of the stationary covariate(s): 0.0000000
#
# Call:
# dynlm(formula = formula(model), start = obs.1, end = obs.T)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -4.7552 -0.0694 -0.0059  0.0576  4.5976
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.0060546  0.0071439   0.848    0.397
# L(y, 1)     -0.0011457  0.0005148  -2.226    0.161
# L(d(y), 1)   0.0518074  0.0102210   5.069  4.1e-07 ***
# L(X, 0)      0.5359345  0.0127566  42.012  < 2e-16 ***
# L(X, 1)     -0.5348523  0.0127698 -41.884  < 2e-16 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 0.436 on 7828 degrees of freedom
# Multiple R-squared: 0.1852, Adjusted R-squared: 0.1848
# F-statistic: 593 on 3 and 7828 DF, p-value: < 2.2e-16
# determines the hedge ratio
lmresult <- lm(KO ~ 0 + PEP, mydata )
hedgeRatio <- coef(lmresult) # 1.011409
z <- residuals(lmresult) # The residuals form the spread, which is not stationary here
plot(z) # This should produce a chart similar to Figure 7.3.
# A test for correlation
dailyReturns <- calculateReturns(cbind(adjcls1, adjcls2), 1)
result <- cor.test(dailyReturns[,1], dailyReturns[,2])
result # correlation coefficient is 0.4849239, with p-value < 2.2e-16, definitely correlated!
# Pearson's product-moment correlation
# data: dailyReturns[, 1] and dailyReturns[, 2]
# t = 49.0707, df = 7832, p-value < 2.2e-16
# alternative hypothesis: true correlation is not equal to 0
# 95 percent confidence interval:
# 0.4678028 0.5016813
# sample estimates:
# cor
# 0.4849239
Stationarity is not limited to the spread between stocks: it can also be found in certain currency rates. For example, the Canadian dollar/Australian dollar (CAD/AUD) cross-currency rate is quite stationary, both being commodity currencies. Numerous pairs of futures as well as fixed-income instruments can be found to be cointegrating as well. (The simplest examples of cointegrating futures pairs are calendar spreads: long and short futures contracts of the same underlying commodity but different expiration months. Similarly for fixed-income instruments, one can long and short bonds by the same issuer but of different maturities.)
FACTOR MODELS
Financial commentators often say something like this: “The current market favors value stocks,” “The market is focusing on earnings growth,” or, “Investors are paying attention to inflation numbers.” How do we quantify these and other common drivers of returns?
There is a well-known framework in quantitative finance called factor models (also known as arbitrage pricing theory [APT]) that attempts to capture the different drivers of returns such as earnings growth rates, interest rates, or the market capitalization of a company. These drivers are called factors. Mathematically, we can write the excess returns (returns minus risk-free rate) \(R\) of \(N\) stocks as
\[ R = Xb + u \]
where \(X\) is an \(N \times F\) matrix of factor exposures (also known as factor loadings), \(b\) is an \(F\) vector of factor returns, and \(u\) an \(N\) vector of specific returns. (Every one of these quantities is time dependent, but I suppress this explicit dependence for simplicity.)
The terms factor exposure, factor return, and specific return are commonly used in quantitative finance, and it is well worth our effort to understand their meanings.
Let’s focus on a specific category of factors called time-series factors. They are returns on specially constructed long-short portfolios called hedge portfolios, explained as follows. These factor returns are the common drivers of stock returns, and are therefore independent of a particular stock, but they do vary over time (hence, they are called time-series factors).
Factor exposures are the sensitivities of a stock to each of these common drivers. Any part of a stock’s return that cannot be explained by these common factor returns is deemed a specific return (i.e., specific to a stock and essentially regarded as just random noise within the APT framework). Each stock’s specific return is assumed to be uncorrelated to another stock’s.
Let’s illustrate these using a simple time-series factor model called the Fama-French Three-Factor model (Fama and French, 1992).
This model postulates that the excess return of a stock depends linearly on only three factors:
The return of the market (the market factor).
The return of a hedge portfolio that longs small (based on market capitalization) stocks and shorts big stocks. This is the SMB, or small-minus-big, factor.
The return of a hedge portfolio that longs high book-to-price-ratio (or “cheap”) stocks and shorts low book-to-price-ratio (or “expensive”) stocks. This is the HML, or high-minus-low, factor.
More intuitively, the SMB factor measures whether the market favors small-cap stocks. It usually does, except for the last three years as of this writing. The HML factor measures whether the market favors value stocks. It usually does, except for 8 of the last 12 years as of this writing (Phillips, 2020)!
The factor exposures of a stock are the sensitivities (regression coefficients) of its returns with respect to the factor returns: its beta (i.e., its sensitivity to the market index), its sensitivity to SMB, and its sensitivity to HML. Factor exposures are obviously different for each stock. A small-cap stock has a positive exposure to SMB, while a growth stock has a negative exposure to HML. (Factor exposures are often normalized such that the average of the factor exposures within a universe of stocks is zero, and the standard deviation is 1.)
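The normalization mentioned in parentheses is just a cross-sectional z-score. A minimal Python sketch follows; the function name `normalize_exposures` and the five raw exposure values are hypothetical, and the population standard deviation is used on the assumption that the listed stocks are the whole universe.

```python
import statistics

def normalize_exposures(raw):
    """Rescale exposures across a universe to mean 0 and standard deviation 1."""
    mean = statistics.fmean(raw)
    std = statistics.pstdev(raw)  # population standard deviation of the universe
    return [(v - mean) / std for v in raw]

# Hypothetical raw SMB exposures for a five-stock universe.
raw_smb = [0.9, 0.1, -0.4, 1.3, 0.6]
z = normalize_exposures(raw_smb)
print([round(v, 3) for v in z])
```

After this step, an exposure of +1 simply means one standard deviation above the universe average for that factor.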
To find the factor exposures of a stock, run a linear regression of its historical returns against the Fama-French factors, as in Equation 7.1. (You can download the historical Fama-French factors from mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.) Note that this regression is contemporaneous, not predictive: you can’t use these factor returns to predict a stock’s next day’s return, unless you can predict the next day’s SMB and HML.
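The regression just described can be sketched with NumPy. Everything here is synthetic and assumed for illustration: the factor returns are random numbers rather than the downloadable Fama-French data, the "true" exposures of 1.1, 0.5, and -0.3 are made up, and an intercept is included as a modeling choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 2500

# Synthetic daily factor returns: columns are market, SMB, HML (assumed data).
factors = rng.normal(0.0, 0.01, size=(n_days, 3))
true_exposures = np.array([1.1, 0.5, -0.3])

# Synthetic excess returns of one stock: exposures times factors plus noise.
stock_excess = factors @ true_exposures + rng.normal(0.0, 0.005, size=n_days)

# Contemporaneous regression with an intercept: same-day returns on both sides.
design = np.column_stack([np.ones(n_days), factors])
coef, *_ = np.linalg.lstsq(design, stock_excess, rcond=None)
alpha, exposures = coef[0], coef[1:]

print("estimated exposures (market, SMB, HML):", np.round(exposures, 2))
```

The fitted coefficients recover the exposures used to generate the data, which is all a contemporaneous regression can do; it says nothing about tomorrow's factor returns.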
There are other stock factors, such as the price-to-earnings ratio or dividend yield of a stock, where we can directly observe the factor exposures of each stock (e.g., the price-to-earnings factor exposure of AAPL is just AAPL’s P/E ratio!). These have been called cross-sectional factors, because we have to estimate the factor return of a single period by regressing all the stocks’ returns against their factor exposures. For example, if we want to estimate the P/E factor return, the \(X\) variable of this regression is a matrix of different stocks’ P/E ratios, while the \(Y\) variable is a vector of the corresponding returns of those stocks in the calendar quarter during which the earnings were announced. Note that this regression is also contemporaneous, not predictive: you can’t use these factor returns to predict a stock’s next-quarter return, unless you can predict its next quarter’s P/E.
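A cross-sectional regression for a single period can be sketched as follows. The data are synthetic and assumed for illustration: the exposures stand in for normalized P/E ratios, and the "true" factor return of 2% is made up so that the estimate can be checked against it.

```python
import numpy as np

rng = np.random.default_rng(1)
n_stocks = 500

# Observable exposure of each stock for one period, normalized to mean 0 and
# standard deviation 1 (synthetic numbers standing in for, say, P/E ratios).
exposure = rng.normal(0.0, 1.0, size=n_stocks)
exposure = (exposure - exposure.mean()) / exposure.std()

# Synthetic same-period returns generated with a known factor return of 2%.
true_factor_return = 0.02
returns = true_factor_return * exposure + rng.normal(0.0, 0.05, size=n_stocks)

# One cross-sectional regression per period: returns (Y) on exposures (X).
design = np.column_stack([np.ones(n_stocks), exposure])
coef, *_ = np.linalg.lstsq(design, returns, rcond=None)
factor_return = coef[1]

print("estimated factor return:", round(float(factor_return), 4))
```

Repeating this regression period by period would yield a time series of factor returns, one number per factor per period.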
The Fama-French model has no monopoly on the choice of time-series factors. In fact, you can construct as many factors as creativity and rationality allow. For example, some people have constructed the WML (winners-minus-losers) factor, which is a momentum factor that measures the return of a hedge portfolio that longs stocks that previously had positive returns and shorts stocks that previously had negative returns. There are even more choices for cross-sectional factors. For example, you can choose return on equity as a factor exposure. You can choose any number of other economic, fundamental, or technical time-series or cross-sectional factors. Whether the factor exposures you have chosen are sensible will determine whether the factor model explains the excess returns of the stocks adequately. If the factor exposures (and consequently the model as a whole) are poorly chosen, the linear regression fit will produce specific returns of significant sizes, and the \(R^{2}\) statistic of the fit will be small. According to experts (Grinold and Kahn, 1999), the \(R^{2}\) statistic of a good factor model with monthly returns of 1,000 stocks and 50 factors is typically about 30 to 40 percent.
Since these factor models are contemporaneous (that is, given historical returns and factor exposures, we can compute the factor returns of those same historical periods), what good are they for trading? It turns out that factor returns are often more stable than individual stock returns: they exhibit stronger serial autocorrelations than individual stocks’ returns. In other words, they have momentum. You can therefore assume that their values remain unchanged from the current period (known from the regression fit) to the next time period. If this is the case, then, of course, you can also predict the excess returns, as long as the factor exposures are well chosen and therefore the time-varying specific returns are not significant.
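Under that momentum assumption, the prediction step is a single matrix product. The sketch below uses made-up numbers: the 4-by-2 exposure matrix and the two current-period factor returns are hypothetical, and specific returns are simply ignored.

```python
import numpy as np

# Hypothetical exposures of 4 stocks to 2 factors (rows: stocks, columns: factors).
X = np.array([
    [ 1.0,  0.5],
    [ 0.2, -1.0],
    [-0.5,  0.3],
    [ 0.8,  1.2],
])
# Factor returns estimated for the current period (hypothetical values).
b_current = np.array([0.01, -0.02])

# Momentum assumption: next period's factor returns equal the current ones,
# and specific returns are treated as negligible.
expected = X @ b_current

order = np.argsort(expected)   # ascending expected return
short_idx = order[:1]          # lowest expected return
long_idx = order[-1:]          # highest expected return
print(expected, long_idx, short_idx)
```

With these numbers the stock at index 1 has the highest expected return (2.2%) and the stock at index 3 the lowest (-1.6%), so a long-short portfolio would buy the former and short the latter.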
Let me clarify one point of potential confusion. Even though I stated that factor models can be useful as a predictive model (and therefore for trading) only if we assume the factor returns have momentum, it does not mean that factor models cannot capture mean reversion of stock returns. You can, in fact, construct a factor exposure that captures mean reversion, such as the negative of the previous period return. If stock returns are indeed mean reverting, then the corresponding factor return will be positive.
If you are interested in building a trading model based on fundamental factors, there are a number of vendors from whom you can obtain historical factor data:
Sharadar: sharadar.com (This is the most affordable source.)
A very comprehensive and technical introduction to factor models can be found in Ruppert (2015).
Example 7.4: Principal Component Analysis as an Example of the Factor Model
The examples of factor exposures I described above are typically economic (e.g., outperformance of value stocks), fundamental (e.g., book-to-price ratio), or technical (e.g., previous period’s return). However, there is one kind of factor model that relies on nothing more than historical returns to construct. These are the so-called statistical factors, obtained using methods such as the principal component analysis (PCA).
If we use PCA to construct the statistical factor exposures and factor returns, we must assume that the factor exposures are constant (time independent) over the estimation period. (This rules out factors that represent mean reversion or momentum, since these factor exposures depend on the prior period returns.) In a sense, this is more similar to the time-series factors such as SMB than the cross-sectional factors such as P/E, because the factor exposures of time-series factors are also assumed to be constant over a long lookback period. However, unlike time-series factors, the statistical factors are unobservable, and unlike cross-sectional factor exposures, the statistical factor exposures are also unobservable. More importantly, we assume that the factor returns are uncorrelated; that is to say, their covariance matrix \(\langle b b^{T} \rangle\) is diagonal. If we use the eigenvectors of the covariance matrix \(\langle R R^{T} \rangle\) as the columns of the matrix \(X\) in the APT equation \(R = Xb + u\), we will find via elementary linear algebra that \(\langle b b^{T} \rangle\) is indeed diagonal; and furthermore, the eigenvalues of \(\langle R R^{T} \rangle\) are none other than the variances of the factor returns \(b\). But of course, there is no point in using factor analysis if the number of factors is the same as the number of stocks. Typically, we can just pick the eigenvectors with the top few eigenvalues to form the matrix \(X\). The number of eigenvectors to pick is a parameter that you can adjust to optimize your trading model.
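The elementary linear algebra can be spelled out in three lines. This is a sketch under stated assumptions: the returns \(R\) have zero mean, the eigenvectors are chosen orthonormal, and, for the moment, all \(N\) eigenvectors are kept so that the specific return \(u\) vanishes. The symbols \(\Lambda\) and \(\lambda_i\) are introduced here for the eigenvalues and are not in the original text.

```latex
\begin{aligned}
\langle R R^{T} \rangle &= X \Lambda X^{T}, \qquad X^{T} X = I, \qquad
\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_N), \\
R = X b \;&\Rightarrow\; b = X^{T} R, \\
\langle b b^{T} \rangle &= X^{T} \langle R R^{T} \rangle X
  = X^{T} X \Lambda X^{T} X = \Lambda .
\end{aligned}
```

So the factor returns are mutually uncorrelated, and the variance of the \(i\)-th factor return is the eigenvalue \(\lambda_i\). Keeping only the eigenvectors with the largest few eigenvalues pushes the remaining, lower-variance components into the specific return \(u\).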
In the following programs, I illustrate a possible trading strategy applying PCA to S&P 600 small-cap stocks. It is a strategy based on the assumption that factor returns have momentum: They remain constant from the current time period to the next. Hence, we can buy the stocks with the highest expected returns based on these factors, and short the ones with the lowest expected returns. The average annualized return of this strategy is only 2% (MATLAB) to 4% (Python and R), and only when we assume no transaction costs. (The difference in returns among the programs is essentially round-off error.)
Using MATLAB
You can download the MATLAB code as example7_4.m.
clear;
lookback=252; % use lookback days as estimation (training) period for determining factor exposures.
numFactors=5;
topN=50; % for trading strategy, long stocks with topN expected 1-day returns.
onewaytcost=0/10000;
load('IJR_20080114.mat');
% test on SP600 smallcap stocks. (This MATLAB binary input file contains tday, stocks, op, hi, lo, cl arrays.)
mycls=fillMissingData(cl);
positionsTable=zeros(size(cl));
dailyret=(mycls-backshift(1, mycls))./backshift(1, mycls);
% note the rows of dailyret are the observations at different time periods
end_index = length(tday);
for t=lookback+2:end_index
    R=dailyret(t-lookback:t-1,:)'; % here the columns of R are the different observations.
    hasData=find(all(isfinite(R), 2)); % avoid any stocks with missing returns
    R=R(hasData, :);
    [PCALoadings,PCAScores,PCAVar] = pca(R);
    X = PCAScores(:,1:numFactors);
    y = dailyret(t, hasData)';
    Xreg = [ones(size(X, 1), 1) X];
    [b,sigma]=mvregress(Xreg,y);
    pred = Xreg*b;
    Rexp=sum(pred,2); % Rexp is the expected return for next period assuming factor returns remain constant.
    [foo idxSort]=sort(Rexp, 'ascend');
    positionsTable(t, hasData(idxSort(1:topN)))=-1; % short topN stocks with lowest expected returns
    positionsTable(t, hasData(idxSort(end-topN+1:end)))=1; % buy topN stocks with highest expected returns
end
% compute daily returns of trading strategy
ret=smartsum(backshift(1, positionsTable).*dailyret- ...
    onewaytcost*abs(positionsTable-backshift(1, positionsTable)), 2)./ ...
    smartsum(abs(backshift(1, positionsTable)), 2);
fprintf(1, 'AvgAnnRet=%f Sharpe=%f\n', ...
    smartmean(ret,1)*252, sqrt(252)*smartmean(ret,1)/smartstd(ret,1));
% AvgAnnRet=0.020205 Sharpe=0.211120
This program made use of a function mvregress for multivariate linear regression with possible missing or NaN values in the input matrix. Using this function, the computation time is under a minute. Otherwise, it may take hours.
Using Python
You can download the Python code as example7_4.py.
# Principal Component Analysis as an Example of Factor Model
import math
import numpy as np
import pandas as pd
from numpy.linalg import eig
from numpy.linalg import eigh
#from statsmodels.api import OLS
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn import linear_model
from sklearn.linear_model import Ridge
import time
lookback=252 # training period for factor exposure
numFactors=5
topN=50 # for trading strategy, long stocks with topN expected 1-day returns
df=pd.read_table('IJR_20080114.txt')
df['Date']=df['Date'].astype('int')
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)
df.fillna(method='ffill', inplace=True)
dailyret=df.pct_change() # note the rows of dailyret are the observations at different time periods
positionsTable=np.zeros(df.shape)
end_index = df.shape[0]
#end_index = lookback + 10
for t in np.arange(lookback+1,end_index):
    R=dailyret.iloc[t-lookback+1:t,].T # here the columns of R are the different observations.
    hasData=np.where(R.notna().all(axis=1))[0]
    R.dropna(inplace=True) # avoid any stocks with missing returns
    pca = PCA()
    X = pca.fit_transform(R.T)[:, :numFactors]
    X = sm.add_constant(X)
    y1 = R.T
    clf = MultiOutputRegressor(LinearRegression(fit_intercept=False),n_jobs=4).fit(X, y1)
    Rexp = np.sum(clf.predict(X),axis=0)
    R=dailyret.iloc[t-lookback+1:t+1,].T # here the columns of R are the different observations.
    idxSort=Rexp.argsort()
    positionsTable[t, hasData[idxSort[np.arange(0, topN)]]]=-1
    # positionsTable[t, hasData[idxSort[np.arange(-topN, 0)]]]=1
Using R
rm(list=ls()) # clear workspace
backshift <- function(mylag, x) {
rbind(matrix(NaN, mylag, ncol(x)),
as.matrix(x[1:(nrow(x)-mylag),]))
}
#install.packages('pls')
library("pls")
library('zoo')
source('calculateReturns.R')
#source('backshift.R')
lookback <- 252 # use lookback days as estimation (training) period for determining factor exposures.
numFactors <- 5
topN <- 50 # for trading strategy, long stocks with topN expected 1-day returns.
data1 <- read.csv("IJR_20080114.csv") # Comma-delimited
cl <- data.matrix(data1[, 2:ncol(data1)])
cl[ is.nan(cl) ] <- NA
tday <- data.matrix(data1[, 1])
mycls <- na.fill(cl, type="locf", nan=NA, fill=NA)
end_loop <- nrow(mycls)
positionsTable <- matrix(0, end_loop, ncol(mycls))
dailyret <- calculateReturns(mycls, 1)
dailyret[is.nan(dailyret)] <- 0
dailyret <- dailyret[1:end_loop,]
for (it in (lookback+2):end_loop) {
R <- dailyret[(it-lookback+2):it,]
hasData <- which(complete.cases(t(R)))
R <- R[, hasData ]
PCA <- prcomp(t(R))
X <- t(PCA$x[1:numFactors,])
Rexp <- rep(0,ncol(R))
for (s in 1:ncol(R)) {
reg_result <- lm(R[,s] ~ X )
pred <- predict(reg_result)
pred[is.nan(pred)] <- 0
Rexp[s] <- sum(pred)
}
result <- sort(Rexp, index.return=TRUE)
positionsTable[it, hasData[result$ix[1:topN]] ] = -1
positionsTable[it, hasData[result$ix[(length(result$ix)-topN-1):length(result$ix)]] ] = 1
}
capital <- rowSums(abs(backshift(1, positionsTable)),
na.rm = TRUE, dims = 1)
ret <- rowSums(backshift(1, positionsTable)*dailyret,
na.rm = TRUE, dims = 1)/capital
avgret <- 252*mean(ret, na.rm = TRUE)
avgstd <- sqrt(252)*sd(ret, na.rm = TRUE)
Sharpe = avgret/avgstd
print(avgret)
print(avgstd)
print(Sharpe)
#0.04052422056844459
#0.07002908500498846
#0.5786769963588398
How well do factor models perform in real trading? Naturally, it mostly depends on which factor model we are looking at. But one can make a general observation that factor models that are dominated by fundamental and macroeconomic factors have one major drawback: they depend on the fact that investors persist in using the same metric to value companies. This is just another way of saying that the factor returns must have momentum for factor models to work.
For example, even though the value (HML) factor returns are usually positive, there are periods of time when investors prefer growth stocks, such as during the internet bubble in the late 1990s, in 2007, and quite recently from 2017 to 2020. As The Economist noted, one reason growth stocks were back in favor in 2007 is the simple fact that their price premium over value stocks had narrowed significantly (Economist, 2007a). Another reason is that as the US economy slowed, investors increasingly opted for companies that still managed to generate increasing earnings instead of those that were hurt by the recessionary economy. In 2020, Covid-19 caused many sectors of the economy to slump, but not technology companies, as consumers and businesses moved much of their activity online.
Therefore, it is not uncommon for factor models to experience steep drawdowns during the times when investors' valuation methods shift, even if only for a short duration. But then, this problem is common to practically any trading model that holds stocks overnight.
WHAT IS YOUR EXIT STRATEGY?
While entry signals are very specific to each trading strategy, there isn't usually much variety in the way exit signals are generated. They are based on one of these:
A fixed holding period
A target price or profit cap
The latest entry signals
A stop price
A fixed holding period is the default exit strategy for any trading strategy, whether it is a momentum model, a reversal model, or some kind of seasonal trading strategy, which can be either momentum or reversal based (more on this later). I said before that one of the ways momentum is generated is the slow diffusion of information.
In this case, the process has a finite lifetime. The average value of this finite lifetime determines the optimal holding period, which can usually be discovered in a backtest.
One word of caution on determining the optimal holding period of a momentum model: As I said before, this optimal period typically decreases due to the increasing speed of the diffusion of information and the increasing number of traders who catch on to this trading opportunity. Hence, a momentum model that has worked well with a holding period equal to a week in the backtest period may work only with a one-day holding period now. Worse, the whole strategy may become unprofitable a year into the future. Also, using a backtest of the trading strategy to determine the holding period can be fraught with data-snooping bias, since the number of historical trades may be limited. Unfortunately, for a momentum strategy where the trades are triggered by news or events, there are no other alternatives. For a mean-reverting strategy, however, there is a more statistically robust way to determine the optimal holding period that does not depend on the limited number of actual trades.
The mean reversion of a time series can be modeled by an equation called the Ornstein-Uhlenbeck formula (Uhlenbeck and Ornstein, 1930). Let's say we denote the mean-reverting spread (long market value minus short market value) of a pair of stocks as $z(t)$. Then we can write
$$dz(t) = -\theta\,(z(t)-\mu)\,dt + dW$$
where $\mu$ is the mean value of the spread over time and $dW$ is simply some random Gaussian noise. Given a time series of the daily spread values, we can easily find $\theta$ (and $\mu$) by performing a linear regression fit of the daily change in the spread $dz$ against the spread itself. Mathematicians tell us that the average value of $z(t)$ follows an exponential decay to its mean $\mu$, and the half-life of this exponential decay is equal to $\ln(2)/\theta$, which is the expected time it takes for the spread to revert to half its initial deviation from the mean. This half-life can be used to determine the optimal holding period for a mean-reverting position. Since we can make use of the entire time series to find the best estimate of $\theta$, and not just the days where a trade was triggered, the estimate for the half-life is much more robust than can be obtained directly from a trading model. In Example 7.5, I demonstrate this method of estimating the half-life of mean reversion using our favorite spread between GLD and GDX.
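Before applying the regression to real data, it can be reassuring to test it on a spread whose parameters are known. The sketch below simulates a discretized Ornstein-Uhlenbeck process with a one-day time step and then recovers $\theta$, $\mu$, and the half-life by regressing the daily change on the previous spread value. All parameter values here are assumptions chosen for illustration; none of them comes from the GLD-GDX data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed parameters for a synthetic spread (illustrative only).
theta_true = 0.1    # mean-reversion speed per day
mu_true = 5.0       # long-run mean of the spread
sigma = 0.5         # standard deviation of the daily Gaussian noise
n_days = 20000

# Discretized Ornstein-Uhlenbeck process with dt = 1 day:
# z[t] - z[t-1] = -theta * (z[t-1] - mu) + noise
z = np.empty(n_days)
z[0] = mu_true
noise = sigma * rng.standard_normal(n_days)
for t in range(1, n_days):
    z[t] = z[t - 1] - theta_true * (z[t - 1] - mu_true) + noise[t]

# Regress the daily change dz on the previous spread value, with an intercept.
prev_z = z[:-1]
dz = np.diff(z)
slope, intercept = np.polyfit(prev_z, dz, 1)

theta_hat = -slope                  # the slope estimates -theta
mu_hat = intercept / theta_hat      # the intercept estimates theta * mu
half_life = np.log(2) / theta_hat   # continuous-time half-life formula

print(f"theta={theta_hat:.4f}  mu={mu_hat:.3f}  half-life={half_life:.2f} days")
```

Note the sign convention: here $\theta$ is positive, as in the equation above, so the regression slope is negative and the half-life is $\ln(2)/\theta$. Code that defines theta as the regression slope itself would compute the half-life as $-\ln(2)/\theta$ instead.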
Example 7.5: Calculation of the Half-Life of a Mean-Reverting Time Series
We can use the mean-reverting spread between GLD and GDX in Example 7.2 to illustrate the calculation of the half-life of its mean reversion.
Using MATLAB
The MATLAB code is available as example7_5.m. (The first part of the program is the same as example7_2.m.)
% === Insert example7_2.m in the beginning here ===
prevz=backshift(1, z); % z at a previous time-step
dz=z-prevz;
dz (1) = [];
prevz(1)=[];
% assumes dz=theta*(z-mean(z))dt+w,
% where w is error term
results=ols(dz, prevz-mean(prevz));theta=results.beta;
halflife=-log(2)/theta
% halflife =
%
% 7.8390
The program finds that the half-life for mean reversion of the GLD-GDX spread is about 8 days, which is approximately how long you should expect to hold this spread before it becomes profitable.
Using Python
The Python code is available as example7_5.ipynb. (The first part of the program is the same as example7_2.ipynb.)
# Calculation of the Half-Life of a Mean-Reverting Time Series
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import coint
from statsmodels.api import OLS
df1=pd.read_excel('GLD.xls')
df2=pd.read_excel('GDX.xls')
df=pd.merge(df1, df2, on='Date', suffixes=('_GLD', '_GDX'))
df.set_index('Date', inplace=True)
df.sort_index(inplace=True)
# Run cointegration (Engle-Granger) test
coint_t, pvalue, crit_value=coint(df['Adj Close_GLD'], df['Adj Close_GDX'])
(coint_t, pvalue, crit_value) # abs(t-stat) > critical value at 95%. pvalue says probability of null hypothesis (of no cointegration) is only 1.8%
# (-3.6981160763300593,
#  0.018427835409537425,
#  array([-3.92518794, -3.35208799, -3.05551324]))
# Determine hedge ratio
model=OLS(df['Adj Close_GLD'], df['Adj Close_GDX'])
results=model.fit()
hedgeRatio=results.params
hedgeRatio
# Adj Close_GDX    1.639523
# dtype: float64
# z = GLD - hedgeRatio*GDX
z=df['Adj Close_GLD']-hedgeRatio.iloc[0]*df['Adj Close_GDX']
plt.plot(z)
prevz=z.shift()
dz=z-prevz
dz=dz.iloc[1:] # drop the first row, which has no previous value
prevz=prevz.iloc[1:]
model2=OLS(dz, prevz-np.mean(prevz))
results2=model2.fit()
theta=results2.params
theta
# x1   -0.088423
# dtype: float64
halflife=-np.log(2)/theta
halflife
# x1    7.839031
# dtype: float64
Using R
The R code is available as example7_5.R. (The first part of the program is the same as example7_2.R.)
source('backshift.R')
data1 <- read.delim("GLD.txt") # Tab-delimited
data_sort1 <- data1[order(as.Date(data1[,1], '%m/%d/%Y')),] # sort in ascending order of dates (1st column of data)
tday1 <- as.integer(format(as.Date(data_sort1[,1], '%m/%d/%Y'), '%Y%m%d'))
adjcls1 <- data_sort1[,ncol(data_sort1)]
data2 <- read.delim("GDX.txt") # Tab-delimited
data_sort2 <- data2[order(as.Date(data2[,1], '%m/%d/%Y')),] # sort in ascending order of dates (1st column of data)
tday2 <- as.integer(format(as.Date(data_sort2[,1], '%m/%d/%Y'), '%Y%m%d'))
adjcls2 <- data_sort2[,ncol(data_sort2)]
# find the intersection of the two data sets
tday <- intersect(tday1, tday2)
adjcls1 <- adjcls1[tday1 %in% tday]
adjcls2 <- adjcls2[tday2 %in% tday]
# determines the hedge ratio on the trainset
result <- lm(adjcls1 ~ 0 + adjcls2 )
hedgeRatio <- coef(result) # 1.64
spread <- adjcls1-hedgeRatio*adjcls2 # spread = GLD - hedgeRatio*GDX
prevSpread <- backshift(1, as.matrix(spread))
prevSpread <- prevSpread - mean(prevSpread, na.rm = TRUE)
deltaSpread <- c(NaN, diff(spread)) # Change in spread from t-1 to t
result2 <- lm(deltaSpread ~ 0 + prevSpread )
theta <- coef(result2)
halflife <- -log(2)/theta # 7.839031
If you believe that your price series is mean reverting, then you also have a ready-made target price: the mean value of the historical prices of the security, or $\mu$ in the Ornstein-Uhlenbeck formula. This target price can be used together with the half-life as exit signals (exit when either criterion is met).
Target prices can also be used in the case of momentum models if you have a fundamental valuation model of a company. But as fundamental valuation is at best an inexact science, target prices are not as easily justified in momentum models as in mean-reverting models. If it were that easy to profit using target prices based on fundamental valuation, all investors would have to do is check stock analysts' reports every day to make their investment decisions.
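The "exit when either criterion is met" rule can be sketched as a small Python function. This is an illustration under stated assumptions, not a prescribed implementation: the function name `should_exit` and its arguments are hypothetical, the position is taken in the spread itself (+1 long, -1 short), and the time limit is assumed to be exactly one half-life, whereas the text only says that the half-life can be used to determine the holding period.

```python
def should_exit(position, spread, mu, days_held, half_life):
    """Return True when the target price or the time limit has been reached.

    position  : +1 if long the spread, -1 if short the spread, 0 if flat
    spread    : current value of the mean-reverting spread
    mu        : estimated long-run mean of the spread (the target price)
    days_held : number of days the position has been open
    half_life : estimated half-life of mean reversion, in days
    """
    if position == 0:
        return False
    # Target reached: a long spread has risen to the mean,
    # or a short spread has fallen to it.
    hit_target = (spread - mu) * position >= 0
    # Time limit (assumed here to be one half-life) has elapsed.
    timed_out = days_held >= half_life
    return hit_target or timed_out


# Long the spread at 4.0 with mean 5.0 and a half-life of about 8 days.
print(should_exit(+1, 4.0, 5.0, days_held=2, half_life=7.84))
print(should_exit(+1, 5.1, 5.0, days_held=2, half_life=7.84))
print(should_exit(-1, 6.0, 5.0, days_held=9, half_life=7.84))
```

In a backtest you might prefer a multiple of the half-life as the time limit; that multiple would then be one more adjustable parameter, with the usual data-snooping risk.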
Suppose you are running a trading model, and you entered into a position based on its signal. Sometime later, you run this model again. If you find that the sign of this latest signal is opposite to your original position (e.g., the latest signal is "buy" when you have an existing short position), then you have two choices. Either you simply use the latest signal to exit the existing position and become flat, or you can exit the existing position and then enter into an opposite position. Either way, you have used a new, more recent entry signal as an exit signal for your existing position. This is a common way to generate exit signals when a trading model can be run in shorter intervals than the optimal holding period.
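The two choices described above (go flat, or reverse) can be written as a small position-update rule. The following Python sketch is illustrative only; the function name `update_position` and the `reverse` flag are hypothetical, and positions and signals are assumed to be coded as -1 (short or sell), 0 (flat or no signal), and +1 (long or buy).

```python
def update_position(current_position, latest_signal, reverse=False):
    """Use the latest entry signal as the exit signal for an existing position.

    current_position, latest_signal : -1, 0, or +1
    reverse : if True, an opposite signal flips the position;
              if False, an opposite signal only closes it.
    """
    if latest_signal == 0:
        return current_position      # no new signal: keep holding
    if current_position == 0:
        return latest_signal         # flat: the signal is an ordinary entry
    if latest_signal == current_position:
        return current_position      # same direction: nothing to exit
    # Opposite direction: exit, and optionally enter the opposite position.
    return latest_signal if reverse else 0


print(update_position(-1, +1))                # short position, buy signal: go flat
print(update_position(-1, +1, reverse=True))  # short position, buy signal: go long
```

Either branch avoids introducing a separate exit parameter, since the exit is driven by the same model that generated the entry.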
Notice that this strategy of exiting a position based on running an entry model also tells us whether a stop-loss strategy is recommended. In a momentum model, when a more recent entry signal is opposite to an existing position, it means that the direction of momentum has changed, and thus a loss (or more precisely, a drawdown) in your position has been incurred. Exiting this position now is almost akin to a stop loss. However, rather than imposing an arbitrary stop-loss price and thus introducing an extra adjustable parameter, which invites data-snooping bias, exiting based on the most recent entry signal is clearly justified based on the rationale for the momentum model.
Consider a parallel situation when we are running a reversal model. If an existing position has incurred a loss, running the reversal model again will simply generate a new signal with the same sign. Thus, a reversal model for entry signals will never recommend a stop loss. (On the contrary, it can recommend a target price or profit cap when the reversal has gone so far as to hit the opposite entry threshold.) And, indeed, it is much more reasonable to exit a position recommended by a mean-reversal model based on holding period or profit cap than stop loss, as a stop loss in this case often means you are exiting at the worst possible time. (The only exception is when you believe that you have suddenly entered into a momentum regime because of recent news.)
SEASONAL TRADING STRATEGIES
This type of trading strategy is also called the calendar effect. Generally, these strategies recommend that you buy or sell certain securities at a fixed date of every year, and close the position at another fixed date. These strategies have been applied to both equity and commodity futures markets. However, from my own experience, much of the seasonality in equity markets has weakened or even disappeared in recent years, perhaps due to the widespread knowledge of this trading opportunity, whereas some seasonal trades in commodity futures are still profitable.
The most famous seasonal trade in equities is called the January effect. There are actually many versions of this trade. One version states that small-cap stocks that had the worst returns in the previous calendar year will have higher returns in January than small-cap stocks that had the best returns (Singal, 2006). The rationale for this is that investors like to sell their losers in December to benefit from tax losses, which creates additional downward pressure on their prices. When this pressure disappears in January, the prices recover somewhat. This strategy did not work in 2006-2007, but worked wonderfully in January 2008, which was a spectacular month for mean-reversal strategies. (That January was the one that saw a major trading scandal at Société Générale, which indirectly may have caused the Federal Reserve to make an emergency 75-basis-point rate cut before the market opened. The turmoil slaughtered many momentum strategies, but mean-reverting strategies benefited greatly from the initial severe downturn and then dramatic rescue by the Fed.) The codes for backtesting this January effect strategy are given in Example 7.6.
Example 7.6: Backtesting the January Effect
Here are the codes to compute the returns of a strategy applied to S&P 600 small-cap stocks based on the January effect.
Using MATLAB
The MATLAB codes can be found at epchan.com/book/example7_6.m, and the input data is also available there.
clear;
load('IJR_20080131');
onewaytcost=0.0005; % 5bp one way transaction cost
years=year(datetime(tday, 'ConvertFrom', 'yyyymmdd'));
months=month(datetime(tday, 'ConvertFrom', 'yyyymmdd'));
nextdayyear=fwdshift(1, years);
nextdaymonth=fwdshift(1, months);
lastdayofDec=find(months==12 & nextdaymonth==1);
lastdayofJan=find(months==1 & nextdaymonth==2);
% lastdayofDec starts in 2004,
% so remove 2004 from lastdayofJan
lastdayofJan(1)=[];
% Ensure each lastdayofJan date after each
% lastdayofDec date
assert(all(tday(lastdayofJan) > tday(lastdayofDec)));
eoy=find(years~=nextdayyear); % End Of Year indices
eoy(end)=[]; % last index is not End of Year
% Ensure eoy dates match lastdayofDec dates
assert(all(tday(eoy)==tday(lastdayofDec)));
annret=...
    (cl(eoy(2:end),:)-cl(eoy(1:end-1),:))./...
    cl(eoy(1:end-1),:); % annual returns
janret=...
    (cl(lastdayofJan(2:end),:)-cl(lastdayofDec(2:end),:))./...
    cl(lastdayofDec(2:end),:); % January returns
for y=1:size(annret, 1)
    % pick those stocks with valid annual returns
    hasData=...
        find(isfinite(annret(y,:)));
    % sort stocks based on prior year's returns
    [foo sortidx]=sort(annret(y, hasData), 'ascend');
    % buy stocks with lowest decile of returns,
    % and vice versa for highest decile
    topN=round(length(hasData)/10);
    % portfolio returns
    portRet=...
        (smartmean(janret(y, hasData(sortidx(1:topN))))-...
        smartmean(janret(y, hasData(...
        sortidx(end-topN+1:end)))))/2-2*onewaytcost;
    fprintf(1,'Last holding date %i: Portfolio return=%7.4f\n', ...
        tday(lastdayofJan(y+1)), portRet);
end
% These should be the output
% Last holding date 20060131: Portfolio return=-0.0244
% Last holding date 20070131: Portfolio return=-0.0068
% Last holding date 20080131: Portfolio return= 0.0881
This program uses a number of utility programs. The first one is the assert function, which is very useful for ensuring the program is working as expected.
function assert(pred, str)
% ASSERT Raise an error if the predicate is not true.
% assert(pred, string)
if nargin<2, str = ''; end
if ~pred
s = sprintf('assertion violated: %s', str);
error(s);
end
The second one is the fwdshift function, which works in the opposite way to the lag1 function: It shifts the time series one step forward.
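The source of fwdshift is not reproduced here, so the following Python sketch only illustrates the behavior described: shifting a series forward or backward along the time axis and padding the vacated rows with NaN. The NumPy functions below are hypothetical stand-ins written for this illustration, not translations of the downloadable utilities.

```python
import numpy as np


def backshift(lag, x):
    """Shift x back in time by `lag` rows: y[t] = x[t - lag], NaN-padded at the start."""
    x = np.asarray(x, dtype=float)
    if lag == 0:
        return x.copy()
    y = np.full_like(x, np.nan)
    y[lag:] = x[:-lag]
    return y


def fwdshift(lead, x):
    """Shift x forward in time by `lead` rows: y[t] = x[t + lead], NaN-padded at the end."""
    x = np.asarray(x, dtype=float)
    if lead == 0:
        return x.copy()
    y = np.full_like(x, np.nan)
    y[:-lead] = x[lead:]
    return y


# Locate the last trading day of December from a toy month series,
# in the same way the January-effect program uses nextdaymonth.
months = np.array([12, 12, 1, 1, 2])
next_month = fwdshift(1, months)
last_day_of_dec = np.where((months == 12) & (next_month == 1))[0]
print(last_day_of_dec)
```

Because comparisons with NaN evaluate to False in NumPy, the final row (which has no next day) never qualifies as a month end in this sketch.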
Using Python
The Python codes can be found at epchan.com/book/example7_6.py, and the input data is also available there.
# Backtesting the January Effect
import numpy as np
import pandas as pd
onewaytcost=0.0005
df=pd.read_table('IJR_20080131.txt')
df['Date']=df['Date'].round().astype('int')
df['Date']=pd.to_datetime(df['Date'], format='%Y%m%d')
df.set_index('Date', inplace=True)
eoyPrice=df.resample('Y').last()[0:-1] # End of December prices. Need to remove last date because it isn't really end of year
annret=eoyPrice.pct_change().iloc[1:,:] # first row has NaN
eojPrice=df.resample('BA-JAN').last()[1:-1] # End of January prices. Need to remove first date to match the years in lastdayofDec. Need to remove last date because it isn't really end of January.
janret=(eojPrice.values-eoyPrice.values)/eoyPrice.values
janret=janret[1:,] # match number of rows in annret
for y in range(len(annret)):
    hasData=np.where(np.isfinite(annret.iloc[y, :]))[0]
    sortidx=np.argsort(annret.iloc[y, hasData])
    topN=int(np.round(len(hasData)/10)) # integer, so it can be used to build index ranges
    portRet=(np.nanmean(janret[y, hasData[sortidx.iloc[np.arange(0, topN)]]])
             -np.nanmean(janret[y, hasData[sortidx.iloc[np.arange(-topN+1, -1)]]]))/2-2*onewaytcost # portfolio returns
    print("Last holding date %s: Portfolio return=%f" % (eojPrice.index[y+1], portRet))
#Last holding date 2006-01-31 00:00:00: Portfolio return=-0.023853
#Last holding date 2007-01-31 00:00:00: Portfolio return=-0.003641
#Last holding date 2008-01-31 00:00:00: Portfolio return=0.088486
Using R
The R codes can be found at epchan.com/book/example7_6.R, and the input data is also available there.
# Need the lubridate package for its dates handling
# install.packages('lubridate')
library('lubridate')
source('calculateReturns.R')
source('fwdshift.R')
onewaytcost <- 5/10000 # 5bps one way transaction cost
data1 <- read.delim("IJR_20080131.txt") # Tab-delimited
cl <- data.matrix(data1[, 2:ncol(data1)])
tday <- ymd(data.matrix(data1[, 1])) # dates in lubridate format
years <- year(tday)
months <- month(tday)
years <- as.matrix(years, length(years), 1)
months <- as.matrix(months, length(months), 1)
nextdayyear <- fwdshift(1, years)
nextdaymonth <- fwdshift(1, months)
eom <- which(months!=nextdaymonth) # End of month indices.
# End Of Year indices. Note that in R, 2008!=NaN returns NA (which which() drops),
# whereas in Matlab 2008~=NaN returns TRUE
eoy <- which(years!=nextdayyear)
annret <- calculateReturns(cl[eoy,], 1) # annual returns
annret <- annret[-1,]
monret <- calculateReturns(cl[eom,], 1) # monthly returns
janret <- monret[months[eom]==1,] # January returns
janret <- janret[-(1:2),] # First January does not have preceding year
exitDay <- tday[months==1 & nextdaymonth==2] # Last day of January
exitDay <- exitDay[-(c(1))] # Exclude first January
for (y in 1:nrow(annret)) {
  hasData <- which(is.finite(annret[y,])) # pick those stocks with valid annual returns
  sortidx <- order(annret[y, hasData]) # sort stocks based on prior year's returns
  topN <- round(length(hasData)/10) # buy stocks with lowest decile of returns, and vice versa for highest decile
  portRet <- (sum(janret[y, hasData[sortidx[1:topN]]], na.rm=TRUE) -
    sum(janret[y, hasData[sortidx[(length(sortidx)-topN+1):length(sortidx)]]], na.rm=TRUE))/2/topN -
    2*onewaytcost # portfolio returns
  msg <- sprintf('Last holding date %s: Portfolio return=%7.4f\n', as.character(exitDay[y+1]), portRet)
  cat(msg)
}
# Last holding date 2006-01-31: Portfolio return=-0.0244
# Last holding date 2007-01-31: Portfolio return=-0.0068
# Last holding date 2008-01-31: Portfolio return= 0.0881
Does this seasonal stock strategy still work? I will leave it as an out-of-sample exercise for the reader.
Another seasonal strategy in equities was proposed more recently (Heston and Sadka, 2007; available at lcb1.uoregon.edu/rcg/seminars/seasonal072604.pdf). This strategy is very simple: each month, buy a number of stocks that performed the best in the same month a year earlier, and short the same number of stocks that performed the poorest in that month a year earlier. The average annual return before 2002 was more than 13 percent before transaction costs. However, I have found that this effect has disappeared since then, as you can check for yourself in Example 7.7. (See the readers' comments to my blog post epchan.blogspot.com/2007/11/seasonal-trades-in-stocks.html.)
Example 7.7: Backtesting a Year-on-Year Seasonal Trending Strategy
Here are the codes for the year-on-year seasonal trending strategy I quoted earlier. Note that the data contains survivorship bias, as it is based on the S&P 500 index on November 23, 2007.
Using MATLAB
The source code can be downloaded from epchan.com/book/example7_7.m. The data is also available at that site.
%
% written by:
clear;
load('SPX_20071123', 'tday', 'stocks', 'cl');
monthEnds=find(isLastTradingDayOfMonth(tday)); % find the indices of the days that are at month ends.
tday=tday(monthEnds);
cl=cl(monthEnds, :);
monthlyRet=(cl-lag1(cl))./lag1(cl);
positions=zeros(size(monthlyRet));
for m=14:size(monthlyRet, 1)
    [monthlyRetSorted sortIndex]=sort(monthlyRet(m-12, :));
    badData=find(~isfinite(monthlyRet(m-12, :)) | ...
        ~isfinite(cl(monthEnds(m-1), :)));
    sortIndex=setdiff(sortIndex, badData, 'stable');
    topN=floor(length(sortIndex)/10); % take top decile of stocks as longs, bottom decile as shorts
    positions(m-1, sortIndex(1:topN))=-1;
    positions(m-1, sortIndex(end-topN+1:end))=1;
end
ret=smartsum(lag1(positions).*monthlyRet, 2)./ ...
    smartsum(abs(lag1(positions)), 2);
ret(1:13)=[];
avgannret=12*smartmean(ret);
sharpe=sqrt(12)*smartmean(ret)/smartstd(ret);
fprintf(1, 'Avg ann return=%7.4f Sharpe ratio=%7.4f\n', ...
    avgannret, sharpe);
% Output should be
% Avg ann return=-0.0129 Sharpe ratio=-0.1243
This program contains a few utility functions. The first one is isLastTradingDayOfMonth, which returns a logical array of 1s and 0s, indicating whether a day in a trading-date array is the last trading day of a month.
function isLastTradingDayOfMonth=...
    isLastTradingDayOfMonth(tday)
% isLastTradingDayOfMonth=
% isLastTradingDayOfMonth(tday) returns a logical
% array. True if tday(t) is last trading day of month.
tdayStr=datestr(datenum(num2str(tday), 'yyyymmdd'));
todayMonth=month(tdayStr);
tmrMonth=fwdshift(1, todayMonth); % tomorrow's month
isLastTradingDayOfMonth=false(size(tday));
isLastTradingDayOfMonth(todayMonth~=tmrMonth & ...
    isfinite(todayMonth) & isfinite(tmrMonth))=true;
Another is the backshift function, which is like the lag1 function except that one can shift any arbitrary number of periods instead of just 1.
function y=backshift(day,x)
% y=backshift(day,x)
assert (day>=0);
y= [NaN(day,size(x,2), size(x, 3));x(1:end-day,:,:)];
You can try the most recent five years instead of the entire data period, and you will find that the average returns are even worse.
Using Python
# Backtesting a Year-on-Year Seasonal Trending Strategy
import numpy as np
import pandas as pd
df=pd.read_table('SPX_20071123.txt')
df['Date']=df['Date'].round().astype('int')
df['Date']=pd.to_datetime(df['Date'], format='%Y%m%d')
df.set_index('Date', inplace=True)
eomPrice=df.resample('M').last()[:-1] # End of month prices. Need to remove last date because it isn't really end of January.
monthlyRet=eomPrice.pct_change(1, fill_method=None)
positions=np.zeros(monthlyRet.shape)
for m in range(13, monthlyRet.shape[0]):
    hasData=np.where(np.isfinite(monthlyRet.iloc[m-12, :]))[0]
    sortidx=np.argsort(monthlyRet.iloc[m-12, hasData])
    badData=np.where(np.logical_not(np.isfinite(monthlyRet.iloc[m-1, hasData[sortidx]])))[0] # these are indices
    sortidx.drop(sortidx.index[badData], inplace=True)
    topN=np.floor(len(sortidx)/10).astype('int')
The source code can be downloaded as example7_7.R.
# Need the lubridate package for its dates handling
install.packages('lubridate')
library('lubridate')
source('calculateReturns.R')
source('backshift.R')
source('fwdshift.R')
data1 <- read.delim("SPX_20071123.txt") # Tab-delimited
cl <- data.matrix(data1[, 2:ncol(data1)])
tday <- ymd(data.matrix(data1[, 1])) # dates in lubridate format
years <- year(tday)
months <- month(tday)
years <- as.matrix(years, length(years), 1)
months <- as.matrix(months, length(months), 1)
nextdaymonth <- fwdshift(1, months)
eom <- which(months!=nextdaymonth) # End of month indices.
monret <- calculateReturns(cl[eom,], 1) # monthly returns
positions <- matrix(0, nrow(monret), ncol(monret))
for (m in 14:nrow(monret)) {
prevYearSortIdx <- order(monret[m-12,], na.last = NA)
# Note setdiff in R does not re-sort data. It is equivalent to setdiff(x, y, 'stable') in Matlab
prevYearSortIdx <- setdiff(prevYearSortIdx, which(!is.finite(cl[eom[m-1],])))
# buy stocks with top decile of returns, and sell stocks of bottom decile
topN <- round(length(prevYearSortIdx)/10)
positions[m-1, prevYearSortIdx[1:topN]] <- -1
positions[m-1, prevYearSortIdx[(length(prevYearSortIdx)-topN+1):length(prevYearSortIdx)]] <- 1
}
ret <- rowSums(backshift(1, positions)*monret, na.rm = TRUE)/
rowSums(abs(backshift(1, positions)), na.rm = TRUE)
ret <- ret[-(1:13)]
avgannret <- 12*mean(ret, na.rm = TRUE)
avgannret # -0.01139674
sharpe <- sqrt(12)*mean(ret, na.rm = TRUE)/sd(ret, na.rm = TRUE)
sharpe # -0.1095098
In contrast to equity seasonal strategies, commodity futures’ seasonal strategies are alive and well. That is perhaps because seasonal demand for certain commodities is driven by “real” economic needs rather than speculation.
One of the most intuitive commodity seasonal trades is the gasoline future trade: Simply buy the gasoline future contract that expires in May near the middle of April, and sell it by the end of April. This trade has been profitable for 19 of the last 21 years, as of 2015, the last 9 of which are out of sample (see the sidebar for details). It appears that one can always depend on approaching summer driving seasons in North America to drive up gasoline futures prices in the spring.
A SEASONAL TRADE IN GASOLINE FUTURES
Whenever the summer driving season comes up, it should not surprise us that gasoline futures prices will be rising seasonally. The only question for the trader is: which month contract to buy, and to hold for what period? After scanning the literature, the best trade I have found so far is one where we buy one May contract of RB (the unleaded gasoline futures trading on the New York Mercantile Exchange [NYMEX]) at the close of April 13 (or the following trading day if it is a holiday), and sell it at the close of April 25 (or the previous trading day if it is a holiday). Historically, we would have realized a profit every year since 1995. Here is the annual profit and loss (P&L) and maximum drawdown (measured from day 1, the entry point) experienced by this position (the 2007-2015 numbers are out-of-sample results):
For those who desire less risk, you can buy the mini gasoline futures QU at NYMEX, which trades at half the size of RB, though it is illiquid.
This research was inspired by the monthly seasonal trades published by Paul Kavanaugh at PFGBest. Even though the trades were profitable, the futures broker PFGBest was shut down in 2012 because it embezzled client money. You can read up on this and other seasonal futures patterns in Fielden (2006) or Toepke (2004).
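The entry and exit rules of this sidebar can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `seasonal_trade`, the synthetic price series, and the contract multiplier of 42,000 (gallons per RB contract, which should be checked against the exchange's contract specification) are not taken from the book. The prices are made up and do not represent market data.

```python
import numpy as np
import pandas as pd

def seasonal_trade(prices, year, entry_md=(4, 13), exit_md=(4, 25),
                   multiplier=42000.0):
    """P&L and maximum drawdown (measured from entry) of a long position
    entered at the close on or after entry_md and exited at the close on
    or before exit_md. `prices` is a Series of daily settlement prices
    with a sorted DatetimeIndex containing trading days only."""
    entry_target = pd.Timestamp(year=year, month=entry_md[0], day=entry_md[1])
    exit_target = pd.Timestamp(year=year, month=exit_md[0], day=exit_md[1])
    entry_date = prices.loc[entry_target:].index[0]  # roll forward on a holiday
    exit_date = prices.loc[:exit_target].index[-1]   # roll backward on a holiday
    held = prices.loc[entry_date:exit_date]
    pnl_path = (held - held.iloc[0]) * multiplier
    return {
        "entry_date": entry_date,
        "exit_date": exit_date,
        "pnl": float(pnl_path.iloc[-1]),
        "max_drawdown": float(min(pnl_path.min(), 0.0)),
    }

# Synthetic prices in dollars per gallon: a steady rise with one dip
dates = pd.bdate_range("2015-04-01", "2015-04-30")
values = 1.80 + 0.01 * np.arange(len(dates))
values[10] -= 0.05
prices = pd.Series(values, index=dates)

result = seasonal_trade(prices, 2015)
print(result)
```

In this made-up series April 25 falls on a Saturday, so the exit rolls back to April 24. Running the function over one price series per year would produce a P&L and drawdown table of the kind described in this sidebar.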
Besides demand for gasoline, natural gas demand also goes up as summer approaches due to increasing demand from power generators to provide electricity for air conditioning. Hence, another commodity seasonal trade that has been profitable for 13 consecutive years as of this writing is the natural gas trade: Buy the natural gas futures contract that expires in June near the end of February, and sell it by the middle of April. (Again, see sidebar for details.)
A SEASONAL TRADE IN NATURAL GAS FUTURES
The summer season is also when natural gas demand goes up due to the increasing demand from power generators to provide electricity for air conditioning. This suggests a seasonal trade in natural gas where we long a June contract of NYMEX natural gas futures (Symbol: NG) at the close of February 25 (or the following trading day if it is a holiday), and exit this position on April 15 (or the previous trading day if it is a holiday). This trade has been profitable for 14 consecutive years as of this writing. Here is the annual P&L and maximum drawdown of this trade. (The 2007-2015 numbers are out-of-sample results):
| Year | P&L in $ | Maximum Drawdown in $ |
| :--- | ---: | ---: |
| 2012* | -7,180 | -8,070 |
| 2013* | 5,950 | -1,219 |
| 2014* | 40 | -3,168 |
| 2015* | -2,770 | -4,325 |
| 2016* | 530 | -1,166 |
*Out-of-sample results.
Unlike the gasoline trade, this natural gas trade didn’t hold up as well out-of-sample. Natural gas futures are notoriously volatile, and we have seen big trading losses for hedge funds (e.g., Amaranth Advisors, loss = $6 billion) and major banks (e.g., Bank of Montreal, loss = $450 million). Therefore, one should be cautious if one wants to try out this trade, perhaps at reduced capital using the mini QG futures at half the size of the full NG contract.
Commodity futures seasonal trades do suffer from one drawback, despite their consistent profitability: they typically occur only once a year; therefore, it is hard to tell whether the backtest performance is a result of data-snooping bias (which is why I especially marked the more recent years as out-of-sample). As usual, one way to alleviate this problem is to try somewhat different entry and exit dates to see if the profitability holds up. In addition, one should consider only those trades where the seasonality makes some economic sense. The gasoline and natural gas trades amply satisfy these criteria.
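One way to carry out the check on nearby entry and exit dates is to tabulate the P&L for every combination of small shifts around the nominal dates. The sketch below is a hedged illustration: the helper name `pnl_grid`, the two-day shift range, the multiplier, and the random-walk prices are all assumptions made for the example, not results from any real contract.

```python
import numpy as np
import pandas as pd

def pnl_grid(prices, entry_date, exit_date, max_shift=2, multiplier=1.0):
    """P&L of a long trade for each pair of entry/exit dates shifted by up
    to max_shift trading days (rows: entry shift, columns: exit shift)."""
    i0 = prices.index.get_loc(pd.Timestamp(entry_date))
    j0 = prices.index.get_loc(pd.Timestamp(exit_date))
    shifts = list(range(-max_shift, max_shift + 1))
    grid = pd.DataFrame(np.nan, index=shifts, columns=shifts)
    grid.index.name = "entry_shift"
    grid.columns.name = "exit_shift"
    for di in shifts:
        for dj in shifts:
            i, j = i0 + di, j0 + dj
            if 0 <= i < j < len(prices):
                grid.loc[di, dj] = (prices.iloc[j] - prices.iloc[i]) * multiplier
    return grid

# Synthetic random-walk prices with a small upward drift (not market data)
dates = pd.bdate_range("2015-04-01", "2015-04-30")
rng = np.random.default_rng(1)
prices = pd.Series(1.80 + np.cumsum(rng.normal(0.002, 0.01, len(dates))),
                   index=dates)

grid = pnl_grid(prices, "2015-04-13", "2015-04-24", max_shift=2,
                multiplier=42000.0)
print(grid.round(0))
print("Fraction of profitable variants:", (grid > 0).to_numpy().mean())
```

If a trade is profitable only at the exact published dates and loses money on most neighboring dates, that is a warning sign of data-snooping bias. Repeating the grid for each year gives a fuller picture than a single year does.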
HIGH-FREQUENCY TRADING STRATEGIES
In general, if a high Sharpe ratio is the goal of your trading strategy (as it should be, given what I said in Chapter 6), then you should be trading at high frequencies, rather than holding stocks overnight.
What are high-frequency trading strategies, and why do they have superior Sharpe ratios? Many experts in high-frequency trading would not regard any strategy that holds positions for more than a few seconds as high frequency, but here I would take a more pedestrian approach and include any strategy that does not hold a position overnight. Many of the early high-frequency strategies were applied to the foreign exchange market, and then later on to the futures market, because of their abundance of liquidity. In the last decade, however, with the increasing liquidity in the equity market, the availability of historical tick databases for stocks, and mushrooming computing power, these types of strategies have become widespread for stock trading as well (Lewis, 2014).
The reason why these strategies have high Sharpe ratios is simple: Based on the “law of large numbers,” the more bets you can place, the smaller the percent deviation from the mean return you will experience. With high-frequency trading, one can potentially place hundreds if not thousands of bets all in one day. Therefore, provided the strategy is sound and generates positive mean return, you can expect the day-to-day deviation from this return to be minimal. With this high Sharpe ratio, one can increase the leverage to a much higher level than longer-term strategies can, and this high leverage in turn boosts the return-on-equity of the strategy to often stratospheric levels.
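The effect of the number of bets on the Sharpe ratio can be illustrated with a small simulation. This is a sketch under strong assumptions that are not stated in the text: every bet is independent, all bets share the same hypothetical mean and standard deviation, and a year has 252 trading days. Under those assumptions the annualized Sharpe ratio grows with the square root of the number of bets per day.

```python
import numpy as np

rng = np.random.default_rng(0)
MU, SIGMA = 0.0001, 0.01   # hypothetical mean and std of the return on one bet
N_DAYS = 2000              # number of simulated trading days

def theoretical_sharpe(bets_per_day):
    """Annualized Sharpe ratio when bets are independent and identical."""
    return np.sqrt(252 * bets_per_day) * MU / SIGMA

def simulated_sharpe(bets_per_day):
    """Annualized Sharpe ratio estimated from simulated daily returns."""
    bets = rng.normal(MU, SIGMA, size=(N_DAYS, bets_per_day))
    daily = bets.sum(axis=1)  # daily return is the sum of that day's bets
    return np.sqrt(252) * daily.mean() / daily.std(ddof=1)

for n in (1, 10, 100, 1000):
    print(f"bets/day={n:5d}  simulated={simulated_sharpe(n):6.2f}  "
          f"theory={theoretical_sharpe(n):6.2f}")
```

The simulated figures are noisy estimates, especially for small numbers of bets, but the trend follows the square-root rule. Real bets are rarely fully independent, so the improvement in practice is smaller than this idealized case suggests.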
Of course, the law of large numbers does not explain why a particular high-frequency strategy has positive mean return in the first place. In fact, it is impossible to explain in general why high-frequency strategies are often profitable, as there are as many such strategies as there are fund managers. Some of them are mean reverting, while others are trend following. Some are market-neutral pair traders, while others are long-only directional traders. In general, though, these strategies aim to exploit tiny inefficiencies in the market or to provide temporary liquidity needs for a small fee. Unlike betting on macroeconomic trends or company fundamentals where the market environment can experience upheavals during the lifetime of a trade, such inefficiencies and need for liquidity persist day to day, allowing consistent daily profits to be made. Furthermore, high-frequency strategies typically trade securities in modest sizes. Without large positions to unwind, risk management for high-frequency portfolios is fairly easy: “Deleveraging” can be done very quickly in the face of losses, and certainly one can stop trading and be completely in cash when the going gets truly rough. The worst that can happen as these strategies become more popular is a slow death as a result of gradually diminishing returns. Sudden drastic losses are not likely, nor are contagious losses across multiple accounts.
Though successful high-frequency strategies have such numerous merits, it is not easy to backtest such strategies when the average holding period decreases to minutes or even seconds. Transaction costs are of paramount importance in testing such strategies. Without incorporating transaction costs, the simplest strategies may seem to work at high frequencies. As a consequence, just having high-frequency data with last prices is not sufficient: data with bid, ask, and last quotes is needed to find out the profitability of executing on the bid versus the ask. Sometimes, we may even need historical order book information for backtesting. Quite often, the only true test for such strategies is to run them in real time unless one has an extremely sophisticated simulator.
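A toy example shows how much the choice of execution price matters. The quotes and signals below are made up for illustration, and the helper `round_trip_pnl` is a hypothetical name. The sketch assumes market orders of one share that buy at the ask and sell at the bid, and it ignores commissions, queue position, and market impact.

```python
import pandas as pd

# Made-up quotes with a two-cent spread around the last traded price
quotes = pd.DataFrame({
    "bid":  [10.00, 10.01, 10.01, 10.02],
    "ask":  [10.02, 10.03, 10.03, 10.04],
    "last": [10.01, 10.02, 10.02, 10.03],
})
# +1 buys one share at that tick, -1 sells one share; the position ends flat
signal = pd.Series([1, -1, 1, -1])

def round_trip_pnl(quotes, signal, use_quotes):
    """Total cash P&L of the trades, priced either at the last price or at
    the quote a market order would actually receive."""
    if use_quotes:
        # Buys pay the ask, sells receive the bid
        price = quotes["ask"].where(signal > 0, quotes["bid"])
    else:
        price = quotes["last"]
    return float(-(signal * price).sum())

pnl_last = round_trip_pnl(quotes, signal, use_quotes=False)
pnl_quotes = round_trip_pnl(quotes, signal, use_quotes=True)
print(f"P&L at last prices: {pnl_last:+.2f}")
print(f"P&L at bid/ask:     {pnl_quotes:+.2f}")
```

With these numbers the same trades gain two cents when priced at the last price and lose two cents when priced at the bid and ask, which is the kind of reversal that a backtest on last prices alone would miss.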
Backtesting is only a small part of the game in high-frequency trading. High-speed execution may account for a large part of the actual profits or losses. Professional high-frequency trading firms have been writing their strategies in C instead of other, more user-friendly languages, and locating their servers next to the exchange or a major Internet backbone to reduce the microsecond delays. So even though the Sharpe ratio is appealing and the returns astronomical, truly high-frequency trading is not by any means easy for an independent trader to achieve in the beginning. But there is no reason not to work toward this goal gradually as expertise and resources accrue.
IS IT BETTER TO HAVE A HIGH-LEVERAGE VERSUS A HIGH-BETA PORTFOLIO?
In Chapter 6, I discussed the optimal leverage to apply to a portfolio based on the Kelly formula. In the section on factor models earlier in this chapter, I discussed the Fama-French Three-Factor model, which suggests that return of a portfolio (or a stock) is proportional to its beta (if we hold the market capitalization and book value of its stocks fixed). In other words, you can increase return on a portfolio by either increasing its leverage or increasing its beta (by selecting high-beta stocks). Both ways seem commonsensical. In fact, it is clear that given a low-beta portfolio and a high-beta portfolio, it is easy to apply a higher leverage on the low-beta portfolio so as to increase its beta to match that of the high-beta portfolio. And assuming that the stocks of two portfolios have the same average market capitalizations and book values, the average returns of the two will also be the same (ignoring specific returns, which will decrease in importance as long as we increase the number of stocks in the portfolios), according to the Fama-French model. So should we be indifferent to which portfolio to own?
The answer is no. Recall in Chapter 6 that the long-term compounded growth rate of a portfolio, if we use the Kelly leverage, is proportional to the Sharpe ratio squared, and not to the average return. So if the two hypothetical portfolios have the same average return, then we would prefer the one that has the smaller risk or standard deviation. Empirical studies have found that a portfolio that consists of low-beta stocks generally has lower risk and thus a higher Sharpe ratio.
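A short numerical comparison makes the point concrete. The sketch assumes the Gaussian-returns form of the Kelly results, in which the optimal leverage is the mean excess return divided by the variance and the maximum compounded growth rate is the risk-free rate plus half the squared Sharpe ratio. The two portfolios and their figures are hypothetical.

```python
def kelly_leverage(excess_return, volatility):
    """Kelly-optimal leverage under the Gaussian assumption: m / s^2."""
    return excess_return / volatility ** 2

def kelly_growth(excess_return, volatility, risk_free=0.0):
    """Maximum long-term compounded growth rate: r + S^2 / 2."""
    sharpe = excess_return / volatility
    return risk_free + sharpe ** 2 / 2

# Two hypothetical portfolios with the same mean excess return (8% a year)
# but different annualized volatilities
portfolios = {"low-beta": (0.08, 0.10), "high-beta": (0.08, 0.20)}

for name, (m, s) in portfolios.items():
    print(f"{name:9s}  Sharpe={m / s:.2f}  "
          f"Kelly leverage={kelly_leverage(m, s):.1f}  "
          f"growth above risk-free={kelly_growth(m, s):.2f}")
```

Halving the volatility doubles the Sharpe ratio and quadruples the achievable growth rate in this idealized setting, even though the unleveraged average returns are identical. The leverage figures this produces are very large, which is one reason the fat-tail caveat discussed in Chapter 6 matters in practice.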
For example, in a paper titled “Risk Parity Portfolios” (not publicly distributed), Dr. Edward Qian at PanAgora Asset Management argued that a typical 60-40 asset allocation between stocks and bonds is not optimal because it is overweighted with risky assets (stocks in this case). Instead, to achieve a higher Sharpe ratio while maintaining the same risk level as the 60-40 portfolio, Dr. Qian recommended a 23-77 allocation while leveraging the entire portfolio by 1.8.
Somehow, the market is chronically underpricing high-beta stocks. Hence, given a choice between a portfolio of high-beta stocks and a portfolio of low-beta stocks, we should prefer the low-beta ones, which we can then leverage up to achieve the maximum compounded growth rate.
There is one usual caveat, however. All this is based on the Gaussian assumption of return distributions. (See discussions in Chapter 6 on this issue.) Since the actual returns distributions have fat tails, one should be quite wary of using too much leverage on normally low-beta stocks.
SUMMARY
This book has been largely about a particular type of quantitative trading called statistical arbitrage in the investment industry. Despite this fancy name, statistical arbitrage is actually far simpler than trading derivatives (e.g., options) or fixed-income instruments, both conceptually and mathematically. I have described a large part of the statistical arbitrageur’s standard arsenal: mean reversion and momentum, regime switching, stationarity and cointegration, arbitrage pricing theory or factor model, seasonal trading models, and, finally, high-frequency trading.
Some of the important points to note can be summarized here:
Mean-reverting regimes are more prevalent than trending regimes.
There are some tricky data issues involved with backtesting mean-reversion strategies: Outlier quotes and survivorship bias are among them.
Trending regimes are usually triggered by the diffusion of new information, the execution of a large institutional order, or “herding” behavior.
Competition between traders tends to reduce the number of mean-reverting trading opportunities.
Competition between traders tends to reduce the optimal holding period of a momentum trade.
Trading parameters for each day or even each trade can be optimized using a machine-learning-based method we called CPO.
A stationary price series is ideal for a mean-reversion trade.
Two or more nonstationary price series can be combined to form a stationary one if they are “cointegrating.”
Cointegration and correlation are different things: Cointegration is about the long-term behavior of the prices of two or more stocks, while correlation is about the short-term behavior of their returns.
Factor models, or arbitrage pricing theory, are commonly used for modeling how fundamental factors affect stock returns linearly.
One of the most well-known factor models is the Fama-French Three-Factor model, which postulates that stock returns are proportional to their beta and book-to-price ratio, and negatively to their market capitalizations.
Factor models typically have a relatively long holding period and long drawdowns due to regime switches.
Exit signals should be created differently for mean-reversion versus momentum strategies.
Estimation of the optimal holding period of a mean-reverting strategy can be quite robust, due to the Ornstein-Uhlenbeck formula.
Estimation of the optimal holding period of a momentum strategy can be error prone due to the small number of signals.
Stop loss can be suitable for momentum strategies but not reversal strategies.
Seasonal trading strategies for stocks (i.e., calendar effect) have become unprofitable in recent years.
Seasonal trading strategies for commodity futures continue to be profitable.
High-frequency trading strategies rely on the “law of large numbers” for their high Sharpe ratios.
High-frequency trading strategies typically generate the highest long-term compounded growth due to their high Sharpe ratios.
High-frequency trading strategies are very difficult to backtest and very technology-reliant for their execution.
Holding a highly leveraged portfolio of low-beta stocks should generate higher long-term compounded growth than holding an unleveraged portfolio of high-beta stocks.
Most statistical arbitrage trading strategies are some combination of these effects or models: Whether they are profitable or not is more of an issue of where and when to apply them than whether they are theoretically correct.
REFERENCES
Alexander, Carol. 2001. Market Models: A Guide to Financial Data Analysis. West Sussex: John Wiley & Sons Ltd.
Chan, Ernest. 2013. Algorithmic Trading: Winning Strategies and Their Rationale. Wiley.
Fama, Eugene, and Kenneth French. 1992. “The Cross-Section of Expected Stock Returns.” Journal of Finance XLVII (2): 427-465.
Fielden, Sandy. 2006. “Seasonal Surprises.” Energy Risk, September. https://www.risk.net/infrastructure/1523742/seasonal-surprises.
Grinold, Richard, and Ronald Kahn. 1999. Active Portfolio Management. New York: McGraw-Hill.
Heston, Steven, and Ronnie Sadka. 2007. “Seasonality in the Cross-Section of Expected Stock Returns.” AFA 2006 Boston Meetings Paper, July. lcb1.uoregon.edu/rcg/seminars/seasonal072604.pdf.
Khandani, Amir, and Andrew Lo. 2007. “What Happened to the Quants in August 2007?” Preprint. web.mit.edu/alo/www/Papers/august07.pdf.
Kochkodin, Brandon. 2021. “How WallStreetBets Pushed GameStop Shares to the Moon.” www.bloomberg.com/news/articles/2021-01-25/how-wallstreetbets-pushed-gamestop-shares-to-the-moon?sref=MqSE4VuP.
Singal, Vijay. 2006. Beyond the Random Walk. Oxford University Press, USA.
Schiller, Robert. 2008. “Economic View; How a Bubble Stayed under the Radar.” New York Times, March 2. www.nytimes.com/2008/03/02/business/02view.html?ex=1362286800&en=da9e48989b6f937a&ei=5124&partner=permalink&exprod=permalink.
Toepke, Jerry. 2004. “Fill 'Er Up! Benefit from Seasonal Price Patterns in Energy Futures.” Stocks, Futures and Options Magazine (3) (March 3). www.sfomag.com/issuedetail.asp?MonthNameID=March&yearID=2004.
Uhlenbeck, George, and Leonard Ornstein. 1930. “On the Theory of Brownian Motion.” Physical Review 36: 823-841.
CHAPTER 8
Conclusion
Can Independent Traders Succeed?
Quantitative trading gained notoriety in the summer of 2007 when some enormous hedge funds run by some of the most reputable money managers rang up losses measured in billions in just a few days (though some had recovered by the end of the month). It was déjà vu all over again in January 2021. These episodes brought back bad memories of other notorious hedge fund debacles such as that of Long-Term Capital Management and Amaranth Advisors (both referenced in Chapter 6), except that this time it was not just one trader or one firm, but losses at multiple funds over a short period of time.
And yet, ever since I began my career in the institutional quantitative trading business, I have spoken to many small, independent traders, working in shabby offices or their spare bedrooms, who gain small but steady and growing profits year-in and year-out, quite unlike the stereotypical reckless day traders of the popular imagination. In fact, many independent traders that I know of have not only survived the periods when big funds lost billions, but actually thrived in those times. This has been the central mystery of trading to me for many years: how does an independent trader with insignificant equity and minimal infrastructure trade with a high Sharpe ratio while firms with all-star teams fail spectacularly?
At the beginning of 2006, I left the institutional money management business and struck out on my own to experience this firsthand. I figured that if I could not trade profitably when I was free of all institutional constraints and politics, then either trading is a hoax or I am just not cut out to be a trader. Either way, I promised myself that in such an event I would quit trading forever. Fortunately, I survived. Along the way, I also found the key to that central mystery to which I alluded earlier.
The key, it turns out, is capacity, a concept I introduced at the end of Chapter 2. (To recap: Capacity is the amount of equity a strategy can generate good returns on.) It is far, far easier to generate a high Sharpe ratio trading a $100,000 account than a $100 million account. There are many simple and profitable strategies that can work at the low-capacity end that would be totally unsuitable to hedge funds. This is the niche for independent traders like us.
Let me elaborate on this capacity issue. Most profitable strategies that have low capacities are acting as market makers: providing short-term liquidity when it is needed and taking quick profits when the liquidity need disappears. If, however, you have billions of dollars to manage, you now become the party in need of liquidity, and you have to pay for it. To minimize the cost of this liquidity demand, you necessarily have to hold your positions over long periods of time. When you hold for long periods, your portfolio will be subject to macroeconomic changes (i.e., regime shifts) that can cause great damage to your portfolio. Though you may still be profitable in the long run if your models are sound, you cannot avoid the occasional sharp drawdowns that attract newspaper headlines.
Other disadvantages beset large-capacity strategies favored by large institutions. The intense competition among hedge funds means the strategies become less profitable. The lowered returns, in turn, pressure the fund manager to overleverage. To beat out the competition, traders need to resort to more and more complicated models, which in turn invite data-snooping bias. But despite the increasing complexity of the models, the fundamental market inefficiency that they are trying to exploit may remain the same, and thus their portfolios may still end up holding very similar positions. We discussed this phenomenon in Chapter 6. When the market environment changes, a stampede out of similar losing positions can (and did) cause a complete meltdown of the market.
Another reason that independent traders can often succeed when large funds fail is the myriad constraints imposed by management in an institutional setting. For example, as a trader in a quantitative fund, you may be prohibited from trading a long-only strategy, but long-only strategies are often easier to find, simpler, more profitable, and if traded in small sizes, no more risky than market-neutral strategies. Or you may be prohibited from trading futures. You may be required to be not only market neutral but also sector neutral. You may be asked to find a momentum strategy when you know that a mean-reverting strategy would work. And on and on. Many of these constraints are imposed for risk management reasons, but many others may be just whims, quirks, and prejudices of the management. As every student of mathematical optimization knows, any constraint imposed on an optimization problem decreases the optimal objective value. Similarly, every institutional constraint imposed on a trading strategy tends to decrease its returns, if not its Sharpe ratio as well. Finally, some senior managers who oversee frontline portfolio managers of quantitative funds are actually not well versed in quantitative techniques, and they tend to make decisions based on anything but quantitative theories. 独立交易者常常能够成功,而大型基金却失败的另一个原因,是机构环境中管理层施加的各种限制。例如,作为量化基金的交易员,你可能被禁止交易仅做多的策略,但仅做多的策略通常更容易找到,更简单,更有利可图,而且如果交易规模较小,其风险并不比市场中性策略更大。或者你可能被禁止交易期货。你可能不仅被要求保持市场中性,还要保持行业中性。你可能被要求寻找动量策略,而你明明知道均值回归策略更有效。诸如此类。许多这些限制是出于风险管理的考虑,但也有许多可能只是管理层的怪癖、偏见和随意决定。正如每个数学优化的学生都知道的,任何加在优化问题上的约束都会降低最优目标值。同样,任何加在交易策略上的机构限制往往会降低其收益,甚至可能降低其夏普比率。 最后,一些负责监督量化基金一线投资组合经理的高级管理人员实际上并不精通量化技术,他们往往根据除量化理论之外的任何因素做出决策。
When your strategy shows initial profits, these managers may impose enormous pressure for you to scale up quickly, and when your strategy starts to lose, they may force you to liquidate the portfolios and abandon the strategy immediately. None of these interferences in the quantitative investment process is mathematically optimal.
Besides, such managers often have a mercurial temper, which seldom mixes well with quantitative investment management. When money is lost, rationality is often the first victim.
As an independent trader, you are free from such constraints and interferences, and as long as you are emotionally capable of adhering to the discipline of quantitative trading, your trading environment may actually be closer to the optimal than that of a large fund.
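The optimization remark above can be checked with a toy calculation. The following is only a sketch under stated assumptions: the quadratic growth-rate objective g(f) = m*f - s^2*f^2/2 is a common Gaussian approximation, and the values of m, s, and the leverage cap of 1 are made up for illustration.

```matlab
% Toy objective: compounded growth rate as a function of leverage f.
% m and s are assumed (made-up) mean excess return and volatility.
m = 0.10;
s = 0.20;
g = @(f) m*f - (s^2)*(f.^2)/2;

% Candidate leverages from 0 to 5 in steps of 0.01.
f = (0:500)/100;

% Best growth rate over the whole grid (attained near f = m/s^2 = 2.5).
gUnconstrained = max(g(f));

% Best growth rate when an imposed constraint caps leverage at 1.
gConstrained = max(g(f(f <= 1)));

% The constrained optimum cannot exceed the unconstrained one.
fprintf('unconstrained: %.4f, constrained: %.4f\n', gUnconstrained, gConstrained);
```

With these numbers the cap binds, so the constrained maximum is strictly lower; had the cap been above 2.5, the two values would coincide.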
Actually, there is one more reason why it is easier for hedge funds to blow up than for individual traders trading their own accounts to do so. When one is trading other people’s money, one’s upside is almost unlimited, while the downside is simply to get fired. Hence, despite the pro forma adherence to stringent institutional risk management procedures and constraints, one is fundamentally driven to trade riskier strategies in an institutional setting, as long as one can sneak past the risk manager. But Mr. Jérôme Kerviel at Société Générale has shown us that this is not at all difficult!
L’Affaire Société Générale cost the bank $7.1 billion and may have indirectly led to an emergency Fed rate cut in the United States. The bank’s internal controls failed to discover the rogue trades for three years because Mr. Kerviel had worked in the back office and had gained great familiarity with ways to evade the control procedures (Clark, 2008).
In fact, Mr. Kerviel’s deceptive technique is by no means original. When I was working at a large investment bank, there was a pair of proprietary traders who traded quantitatively. They were enclosed in a glass bubble at a corner of the vast trading floor, either because they could not be bothered by the hustle and bustle of the nonquantitative traders or because they had to keep their trade secrets, well, secret. As far as I could tell, neither of them ever talked to anyone. Nor, it seemed, did they ever speak to each other.
One day, one of the traders disappeared, never to return. Shortly thereafter, hordes of auditors were searching through his files and computers. It turned out that, just like Mr. Kerviel, this trader had worked in the information technology (IT) department and was quite computer savvy. He managed to manufacture many millions in false profits without anyone questioning him until, one day, a computer crash somehow stopped his rogue program in its tracks and exposed his activities. Rumor had it that he disappeared to India and has been enjoying the high life ever since.
So there you have it. I hope I have made a convincing case that independent traders can gain an edge over institutional traders, if trading is conducted with discipline and care. Of course, the side benefits of being independent are numerous, and they begin with freedom. Personally, I am much happier with my work now than I have ever been in my career, despite the inevitable gut-wrenching drawdowns from time to time.
NEXT STEPS
So let’s say you have found a few good, simple strategies and are happily trading in your spare bedroom. Where do you go from here? How do you grow?
Actually, I discussed growth in Chapter 6, but in a limited sense. Using the Kelly formula, you can indeed achieve exponential growth of your equity, but only up to the total capacity of your strategies. After that, the source of growth has to come from increasing the number of strategies. You can, for example, look for strategies that trade at higher frequencies than the ones you currently have. To do that, you have to invest in and upgrade your technological infrastructure, and purchase expensive high-frequency historical data. Or, conversely, you can look for strategies that hold for longer periods. Despite their typically lower Sharpe ratios, they do enormously improve your capacity. For many of these strategies, you probably have to invest in expensive historical fundamental data for your backtest. If you are an equity trader, you can branch out into futures or currencies. If you run out of ideas or lack expertise in a new market that you want to enter, you can form collaborations with other like-minded traders, or you can hire consultants to help with the research. If you are running too many strategies to manage manually on your own, you can push your automation further so that there is no need for you to manually intervene in the daily trading unless exceptions or problems occur. Of course, you can also hire a trader to monitor all these strategies for you. Finally, if you don’t think you have a monopoly on all trading ideas in the universe, and would like to diversify and benefit from trading talent all around the world, hire a subadvisor from the many websites that offer them (e.g., iasg.com, rcmalternatives.com).
These investments in data, infrastructure, and personnel are all part of reinvesting some of your earnings to further the growth of your trading business, not unlike growing any other type of business. When you have reached a point where your capacity is higher than what the Kelly formula suggests you can prudently utilize, it may be time for you to start taking on investors, who will at the very least defray the costs of your infrastructure, if not provide an incentive fee.
Alternatively, you might want to take your strategy (and, more importantly, your track record) to one of the larger hedge funds or proprietary trading firms and ask for a profit-sharing contract. In the last couple of years, our own fund at QTS has taken on an increasing number of subadvisors, and overall it has greatly improved the consistency of our returns.
After the major losses at quantitative hedge funds in 2007 and again in 2020 Q1, many people have started to wonder if quantitative trading is viable in the long term. Though the talk of the demise of quantitative strategies appears to be premature at this point, it is still an important question from the perspective of an independent trader. Once you have automated everything and your equity is growing exponentially, can you just sit back, relax, and enjoy your wealth? Unfortunately, experience tells us that strategies do lose their potency over time as more traders catch on to them. It takes ongoing research to supply you with new strategies to overcome this dreaded alpha decay.
Upheavals and major regime changes may occur once every decade or so, and these might cause sudden deaths to certain strategies. As with any commercial endeavor, a period of rapid growth will inevitably be followed by the steady if unspectacular returns of a mature business. As long as financial markets demand instant liquidity, however, there will always be a profitable niche for quantitative trading.
REFERENCES
Clark, Nicola. 2008. “French Bank Says Its Controls Failed for 2 Years.” New York Times, February 21. http://www.nytimes.com/2008/02/21/business/worldbusiness/21bank.html?ex=1361336400&en=cf84f3776a877eac&ei=5124&partner=permalink&exprod=permalink.
APPENDIX
A Quick Survey of MATLAB
MATLAB is a general-purpose software package developed by MathWorks, Inc., which is used by many institutional quantitative researchers and traders as their platform for backtesting, particularly those who work in statistical arbitrage. In Chapter 3, I introduced this platform and compared its pros and cons with some other alternatives. Most of the strategy examples in this book are written in MATLAB. Many of those strategies are portfolio-trading strategies involving hundreds of stocks that are very difficult to backtest in Excel. Here, I will provide a quick survey of MATLAB for those traders who are unfamiliar with the language, so they can see if it is worthwhile for them to invest in acquiring and learning to use this platform for their own backtesting.
MATLAB is not only a programming language; it is also an integrated development platform that includes a very user-friendly program editor and debugger. It is an interpreted language: like Visual Basic, but unlike a conventional programming language such as C, it does not need to be compiled before it can be run. Yet it is much more flexible and powerful for backtesting than Excel or Visual Basic because of the large number of built-in functions useful for mathematical computations, and because it is an array-processing language that is specially designed to make computations on arrays (i.e., vectors or matrices) simply and quickly. In particular, many loops that are necessary in C or Visual Basic can be replaced by just one line of code in MATLAB. It also includes extensive text-processing facilities, which make it a powerful tool for parsing and analyzing texts (such as web pages). Furthermore, it has a comprehensive graphics library that enables easy plotting of many types of graphs, even animations. (Many of the figures and charts in this book were created using MATLAB.) Finally, MATLAB code can be compiled into C or C++ executables that can run on computers without the MATLAB platform installed. In fact, there is third-party software that can convert MATLAB code into C source code, too.
The basic syntax of MATLAB is very similar to Visual Basic or C. For example, we can initialize the elements of an array x like this:
% 3 elements of an array initialized.
% This is by default a row-vector
x(1)=0.1;
x(2)=0.3;
x(3)=0.2;
Note that we don’t need to first “declare” this array, nor do we need to tell MATLAB its expected size beforehand. If you leave out the “;” sign, MATLAB will print the value of the variable being assigned. Any comments can be written after the “%” sign. If you wish, you can initialize a large number of elements en masse to a common value:
% assigning the value 0.8 to all elements of a 3-vector y.
% This is a row-vector.
y=0.8*ones(1, 3)
Now if you want to do a vector addition of the two vectors, you can do it the old-fashioned way (just as you would in C), that is, using a loop:
for i=1:3
z(i)=x(i)+y(i) % z is [0.9 1.1 1]
end
But the power of MATLAB is that it can handle many array operations in parallel very succinctly, without using loops. (That’s why it is called a vector-processing language.) So instead of the previous loop, you can just write
z=x+y % z is the same [0.9 1.1 1]
Even more powerful, you can easily select parts of different arrays and operate on them. What do you think would be the result of the following?
w=x([1 3])+z([2 1])
x([1 3]) selects the first and third elements of x, so x([1 3]) is just [0.1 0.2]. z([2 1]) selects the second and first elements of z, in that order, so z([2 1]) is [1.1 0.9]. So w is [1.2 1.1].
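To see why this matters for backtesting, here is a minimal sketch (not one of the book's examples) that computes simple returns from a price series, first with a loop and then in one vectorized line. The variable names and prices are made up, and the elementwise division operator “./” is used in addition to the operators introduced so far.

```matlab
% Hypothetical closing prices (made-up numbers).
cl = [100 102 101 105];

% Loop version: one return per pair of consecutive prices.
retLoop = zeros(1, numel(cl)-1);
for t = 2:numel(cl)
    retLoop(t-1) = cl(t)/cl(t-1) - 1;
end

% Vectorized version: select two shifted subarrays and divide elementwise.
ret = cl(2:end)./cl(1:end-1) - 1;

% Both approaches give the same returns (up to rounding).
disp(max(abs(ret - retLoop)));
```

The vectorized line relies on the same subarray selection shown above, with the index arrays 2:end and 1:end-1 playing the role of [1 3] and [2 1].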
You can delete parts of an array just as easily:
x([1 3])=[] % this leaves x as [0.3]
To concatenate two arrays is also trivial. To concatenate by rows, use the “;” to separate the arrays:
u=[z([1 1]); w]
% u is now
% [0.9000 0.9000;
% 1.2000 1.1000]
To concatenate by columns, omit the “;”:
v=[z([1 1]) w]
% v is now
% [0.9000 0.9000 1.2000 1.1000]
Selection of a subarray can be done not only with arrays containing indices; it can be done with arrays containing logical values as well. For example, here is a logical array:
vlogical=v<1.1
% vlogical is [1 1 0 0], where the 0s and 1s
% indicate whether that element is less than 1.1 or
% not.
vlt=v(vlogical) % vlt is [0.9 0.9]
In fact, we can select the same subarray with the oft-used shorthand:
vlt=v(v<1.1) % vlt is the same [0.9 0.9]
If, for some reason, you are interested in the actual indices of the elements of v that have value less than 1.1, you can use the “find” function:
idx=find(v<1.1); % idx is [1 2]
Naturally, you can use this index array to select the same subarray as before:
vlt=v(idx); % vlt is again the same [0.9 0.9]
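Logical selection is especially handy for summarizing trading results. The sketch below is illustrative only: the return values and variable names are made up, and it uses the same comparison and “find” mechanics just described.

```matlab
% Hypothetical daily returns (made-up numbers).
ret = [0.02 -0.01 0.03 -0.04 0.01];

% Logical array marking the profitable days.
isUp = ret > 0;

% Returns on profitable days only.
upRet = ret(isUp);

% Number and fraction of profitable days.
numUp = sum(isUp);
hitRate = mean(isUp);

% Index of the first losing day; the second argument of find
% limits the search to the first match.
firstDown = find(ret < 0, 1);
```

Because a logical array behaves like an array of 0s and 1s, summing it counts the selected elements and averaging it gives their proportion.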
So far, the array examples are all one-dimensional. But of course, MATLAB can deal with multidimensional arrays as well. Here is a two-dimensional example:
x=[1 2 3; 4 5 6; 7 8 9];
You can select the entire row or column of a multidimensional array by using the “:” symbol. For example:
xr1=x(1,:) % xr1 is the first row of x, i.e. xr1 is [1 2 3]
xc2=x(:, 2) % xc2 is the second column of x, i.e. xc2 is
% 2
% 5
% 8
Naturally, you can delete an entire row from an array using the same method.
x(1,:) = [] % x is now just [4 5 6; 7 8 9]
The transpose of a matrix is indicated by a simple “'”. So the transpose of x is just x', which is
4 7
5 8
6 9
Elements of arrays do not have to be numbers. They can be strings, or even arrays themselves. This kind of array is called a cell array in MATLAB. In the following example, C is just such a cell array:
C={[1 2 3]; ['a' 'b' 'c' 'd']}
% C is
% [1 2 3]
% 'abcd'
One of the beauties of MATLAB is that practically all built-in functions can work on all elements of arrays concurrently.
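As a minimal sketch of this behavior (the choice of functions and the made-up price matrix are my own illustration), log acts on every element of a matrix at once, while diff, mean, and cumsum act on every column at once:

```matlab
% Hypothetical prices: rows are days, columns are stocks (made-up numbers).
prices = [100 50; 110 55; 121 44];

% Natural log of every element in one call.
logPrices = log(prices);

% Day-over-day log returns, computed column by column.
logRet = diff(logPrices);

% Average price of each stock (one value per column).
avgPrice = mean(prices);

% Cumulative log return of each stock (running sum down each column).
cumRet = cumsum(logRet);
```

No loop over stocks or days is needed; the same calls would work unchanged on a matrix with hundreds of columns.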
If the built-in functions of the basic MATLAB platform do not meet all your needs, you can always purchase additional toolboxes from MathWorks. I will discuss some of them below.
There is also a newer data structure in MATLAB called “tables” that I have found very useful. (It is very similar to Python’s pandas DataFrames, in case any Python programmers are reading this.) Tables are arrays that come with column headings and possibly dates for rows, so you don’t need to remember that, for example, cell (1562, 244) refers to the closing price of Tesla on October 23, 2020. Instead, you can retrieve that price by writing:
TT{'23-Oct-2020', 'TSLA'}
Tables and timetables make for some powerful simplifications in manipulating time series data that has two dimensions (called panel data by economists), such as the P&Ls of a portfolio of stocks. For example, you can use the synchronize function to quickly merge the P&L table and the capital allocation table, automatically aligning their dates. You can also use the retime function to convert daily P&Ls to monthly P&Ls with a single line of code.
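The following sketch shows how these two functions might be used together. It is only an illustration: the dates, numbers, and variable names (pnl, cap, PnL, Capital) are hypothetical, and it assumes a MATLAB release that supports timetables.

```matlab
% Hypothetical daily P&L for five consecutive days (made-up numbers).
dates = datetime(2020,10,19) + caldays(0:4)';
pnl = timetable(dates, [10; -5; 7; 3; -2], 'VariableNames', {'PnL'});

% Hypothetical capital allocation, recorded on only three of those days.
cap = timetable(dates([1 3 5]), [1000; 1100; 1050], 'VariableNames', {'Capital'});

% Merge the two timetables on the union of their dates;
% dates missing from cap are filled with NaN.
both = synchronize(pnl, cap);

% Convert daily P&L to monthly P&L by summing within each month.
monthly = retime(pnl, 'monthly', 'sum');

% Look up a single value by date and column name.
p = pnl{datetime(2020,10,23), 'PnL'};
```

Here the merged timetable has one row per date with both columns aligned, and the monthly timetable has a single row for October 2020 holding the total of the five daily values.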
Some of the toolboxes useful to quantitative traders are Optimization, Global Optimization (when you don’t want to get stuck in a local minimum or maximum and want to try simulated annealing or genetic algorithms), Statistics and Machine Learning (that’s the most useful toolbox for me), Deep Learning, Signal Processing (technical indicators are but a small subset of signal processing functions), Financial (for backtesting!), Financial Instruments (good for options pricing, backing out implied volatility, etc.), Econometrics (GARCH, ARIMA, VAR, and all the techniques you need for financial time series analysis), Datafeed (to retrieve data from IQFeed, Quandl, etc., easily), and Trading (to send orders to Interactive Brokers, TT, or even FIX). These toolboxes typically cost only about $50 each for a home user. If they still do not meet all your needs, there are also a number of free user-contributed toolboxes available for download from the internet. I have introduced one of them in this book: the Econometrics toolbox developed by James LeSage (www.spatial-econometrics.com). There are a number of others that I have used before, such as the Bayes Net toolbox from Kevin Murphy (github.com/bayesnet/bnt). The easy availability of these user-contributed toolboxes and the large community of MATLAB users from whom you can ask for help greatly enhance the usefulness of MATLAB as a computational platform.
You can, of course, write your own functions in MATLAB, too. I have given a number of example functions in this book, all of which can be downloaded from my website, www.epchan.com/book. In fact, it is very helpful for you to develop your own library of utility functions that you often use for constructing trading strategies. As this homegrown library grows, your productivity in developing new strategies will increase as well.
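As a small sketch of what such utilities might look like, here are two hypothetical helpers written as anonymous functions so that the snippet is self-contained; longer utilities would normally live in their own .m files. The names annSharpe and lagOne are made up, they are not the functions distributed with this book, and the Sharpe helper assumes daily excess returns and 252 trading days per year.

```matlab
% Hypothetical helper: annualized Sharpe ratio of a vector of daily
% excess returns, assuming 252 trading days per year.
annSharpe = @(r) sqrt(252) * mean(r) / std(r);

% Hypothetical helper: shift a column vector down by one period,
% padding the first entry with NaN.
lagOne = @(x) [NaN; x(1:end-1)];

% Example usage on made-up data.
dailyRet = [0.01; 0.03];
sr = annSharpe(dailyRet);
prevPrice = lagOne([1; 2; 3]);
```

Once helpers like these are saved on the MATLAB path, each new backtest can call them in one line instead of repeating the same arithmetic.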
Bibliography
Alexander, Carol. 2001. Market Models: A Guide to Financial Data Analysis. West Sussex: John Wiley & Sons Ltd.
Bailey, David, J. Borwein, Marcos López de Prado, and J. Zhu. 2014. “Pseudo-mathematics and financial charlatanism: The effects of backtest overfitting on out-of-sample performance.” Notices of the American Mathematical Society 61 (5) (May): 458-471. https://ssrn.com/abstract=2308659.
Bailey, David, and Marcos López de Prado. 2012. “The Sharpe Ratio Efficient Frontier.” Journal of Risk 15 (2): Winter 2012/13. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1821643.
Chan, Ernest. 2006a. “A ‘Highly Improbable’ Event? A Historical Analysis of the Natural Gas Spread Trade That Bought Down Amaranth.” Quantitative Trading blog, October 2. http://epchan.blogspot.com/2006/10/highly-improbable-event.html.
Chan, Ernest. 2006b. “Reader Suggests a Possible Trading Strategy with the GLD-GDX Spread.” Quantitative Trading. November 17. http://epchan.blogspot.com/2006/11/reader-suggested-possible-trading.html.
Chan, Ernest. 2013. Algorithmic Trading: Winning Strategies and Their Rationale. Wiley.
Chan, Ernest. 2017. “Paradox Resolved: Why Risk Decreased Expected Log Return but Not Expected Wealth.” Quantitative Trading. May 4. http://epchan.blogspot.com/2017/05/paradox-resolved-why-risk-decreases.html.
Chan, Ernest. 2020. “What Is the Probability of Your Profit?” PredictNow.ai. https://www.predictnow.ai/blog/what-is-the-probability-of-profit-of-your-next-trade-introducing-predictnow-ai/.
Clark, Nicola. 2008. “French Bank Says Its Controls Failed for 2 Years.” New York Times, February 21. http://www.nytimes.com/2008/02/21/business/worldbusiness/21bank.html?ex=1361336400&en=cf84f3776a877eac&ei=5124&partner=permalink&exprod=permalink.
Duhigg, Charles. 2006. “Street Scene; A Smarter Computer to Pick Stock.” New York Times, November 24.
Economist. 2007a. “This Year’s Model.” December 13. www.economist.com/finance/displaystory.cfm?story_id=10286619.
Economist. 2007b. “Too Much Information.” July 12. www.economist.com/finance/displaystory.cfm?story_id=9482952.
Economist. 2019. “March of the Machines. The Stock Market Is Now Run by Computers, Algorithms, and Passive Managers.” October 5. www.economist.com/briefing/2019/10/05/the-stockmarket-is-now-run-by-computers-algorithms-and-passive-managers.
Fama, Eugene, and Kenneth French. 1992. “The Cross-Section of Expected Stock Returns.” Journal of Finance XLVII (2): 427-465.
Fielden, Sandy. 2006. “Seasonal Surprises.” Energy Risk. September. https://www.risk.net/infrastructure/1523742/seasonal-surprises.
Gershgorn. 2017. “The Data That Transformed AI Research—and Possibly the World.” Qz. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/.
Grinold, Richard, and Ronald Kahn. 1999. Active Portfolio Management. New York: McGraw-Hill.
Heston, Steven, and Ronnie Sadka. 2007. “Seasonality in the Cross-Section of Expected Stock Returns.” AFA 2006 Boston Meetings Paper, July. lcb1.uoregon.edu/rcg/seminars/seasonal072604.pdf.
Kahneman, Daniel. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.
Khandani, Amir E., and Andrew Lo. 2007. “What Happened to the Quants in August 2007?” MIT. https://web.mit.edu/Alo/www/Papers/august07.pdf.
Lo, Andrew. 2019. Adaptive Markets: Financial Evolution at the Speed of Thought. Princeton University Press.
López de Prado, Marcos. 2018. Advances in Financial Machine Learning. Wiley.
Lowenstein, Roger. 2000. When Genius Failed: The Rise and Fall of Long-Term Capital Management. Random House.
Lux, Hal. 2000. “The Secret World of Jim Simons.” Institutional Investor Magazine, November 1.
Markoff, John. 2007. “Faster Chips Are Leaving Programmers in Their Dust.” New York Times, December 17. www.nytimes.com/2007/12/17/technology/17chip.html?ex=1355634000&en=a81769355deb7953&ei=5124&partner=permalink&exprod=permalink.
Oldfield, Richard. 2007. Simple but Not Easy. Doddington Publishing.
Peters, O., and M. Gell-Mann. 2016. “Evaluating Gambles Using Dynamics.” Chaos 26: 023103. https://doi.org/10.1063/1.4940236.
Phillips, Daniel. 2020. “Investment Strategy Commentary: Value Stocks: Trapped or Spring-Loaded?” Northern Trust, September 24. https://www.northerntrust.com/canada/insights-research/2020/investment-management/value-stocks.
Poundstone, William. 2005. Fortune’s Formula. New York: Hill and Wang.
Ritter, Jay. 2003. “Behavioral Finance.” Pacific-Basin Finance Journal 11 (4) September: 429-437.
Schiller, Robert. 2008. “Economic View; How a Bubble Stayed under the Radar.” New York Times, March 2. www.nytimes.com/2008/03/02/business/02view.html?ex=1362286800&en=da9e48989b6f937a&ei=5124&partner=permalink&exprod=permalink.
Sharpe, William. 1994. “The Sharpe Ratio.” Journal of Portfolio Management, Fall. https://jpm.pm-research.com/content/21/1/49.
Singal, Vijay. 2006. Beyond the Random Walk. Oxford University Press.
Taleb, Nassim. 2007. The Black Swan: The Impact of the Highly Improbable. Random House.
Thaler, Richard. 1994. The Winner’s Curse. Princeton, NJ: Princeton University Press.
Thorp, Edward. 1997. “The Kelly Criterion in Blackjack, Sports Betting, and the Stock Market.” Handbook of Asset and Liability Management, Volume I, Zenios and Ziemba (eds.). Elsevier 2006. www.EdwardOThorp.com.
Toepke, Jerry. 2004. “Fill 'Er Up! Benefit from Seasonal Price Patterns in Energy Futures.” Stocks, Futures and Options Magazine, March 3 (3). www.sfomag.com/issuedetail.asp?MonthNameID=March&yearID=2004.
Uhlenbeck, George, and Leonard Ornstein. 1930. “On the Theory of Brownian Motion.” Physical Review 36: 823-841.
About the Author
Ernest P. Chan is the founder of PredictNow.ai, a financial machine-learning SaaS available as a no-code service or via an API. He is also the founder of QTS Capital Management, LLC (www.qtscm.com), which manages a hedge fund and separate client brokerage accounts. He has been quoted by the Wall Street Journal, New York Times, and CIO magazine on quantitative investing, and has appeared on CNBC’s Closing Bell. His firm has also been profiled in a Bloomberg Businessweek article.
Ernie is the author of Quantitative Trading: How to Build Your Own Algorithmic Trading Business, Algorithmic Trading: Winning Strategies and Their Rationale, and Machine Trading: Deploying Computer Algorithms to Conquer the Markets, all published by John Wiley & Sons. To learn more about his books and training courses, visit www.epchan.com. He maintains a popular blog at predictnow.ai/blog, where readers can also download his other publications.
Ernie is an expert in developing statistical models and advanced computer algorithms to discover patterns and trends from large quantities of data. He was a machine-learning researcher at IBM’s T. J. Watson Research Center’s Human Language Technologies group, at the Data Mining and Artificial Intelligence group at Morgan Stanley, and at the Horizon proprietary trading group at Credit Suisse, as well as at various other hedge funds. He is also an adjunct faculty member at Northwestern University’s Master’s in Data Science program. Ernie received his undergraduate degree from the University of Toronto and a doctor of philosophy degree in theoretical physics from Cornell University.
Index 索引
A
Academic trading strategies, 11-12 学术交易策略,第 11-12 页
Actual performance, expected vs., 104-107 实际表现,预期与实际,第 104-107 页
AI (artificial intelligence), xii, 28-29 人工智能(AI),第 xii 页,第 28-29 页
Algorithmic trading, see Quantitative trading 算法交易,参见量化交易
Algoseek, 41, 95 Algoseek,第 41 页,第 95 页
Alpaca, 86, 95, 99 Alpaca,第 86 页,第 95 页,第 99 页
Alpha decay, xii-xiii, 198 阿尔法衰减,第 xii-xiii 页,第 198 页
Alternative trading systems, 85 另类交易系统,85
Amaranth Advisors, 129, 186, 193 Amaranth Advisors,129,186,193
Annualized Sharpe ratio, 48-49 年化夏普比率,48-49
Annualized standard deviation of returns, 49 年化收益标准差,49
Anwar, Yaser, 28 安瓦尔,亚瑟,28
Application programming interface (API), 86, 94, 98-99 应用程序编程接口(API),86,94,98-99
Arbitrage pricing theory (APT), see Factor models 套利定价理论(APT),见因子模型
Arithmetic mean, 112 算术平均数,112
Array processing, in MATLAB, 199-203 数组处理,MATLAB,199-203
Artificial intelligence (AI), xii, 28-29 人工智能(AI),xii,28-29
assert function, MATLAB, 176-177 assert 函数,MATLAB,176-177
Asset allocation, optimal, 189 资产配置,最优,189
Augmented Dickey-Fuller test of cointegration, 149, 155 协整的增强型迪基-富勒检验,149,155
Automated trading systems, 93-101 自动交易系统,93-101
fully automated, 94, 98-100 全自动,94,98-100
growth with, 197 随之增长,197
hiring consultants to build, 100-101 雇佣顾问来构建,100-101
paper trading to test, 103-104 模拟交易测试,103-104
for part-time traders, 15 适合兼职交易者,15
semiautomated, 94-98 半自动化,94-98
software risk with, 125 软件风险,125
and time commitment for quantitative trading, 5-7 量化交易所需的时间和精力投入,5-7
Averages, ensemble vs. time series, 127, 128 平均值,集成与时间序列,127,128
Average daily volume, 101-102 平均日交易量,101-102
B
backshift function, MATLAB, 181 反移位函数,MATLAB,181
Backtesting, xvii, 33-79 回测,xvii,33-79
of academic strategies, 11-12 学术策略的回测,11-12
common platforms for, 34-40 常用回测平台,34-40
defined, 57-58 已定义,57-58
determining optimal holding period with, 170 用以确定最佳持有期,170
execution and, 95, 100 执行与回测,95,100
of half-Kelly betting, 122-123 半凯利投注,122-123
of high-frequency strategies, 188 高频策略,188
historical databases for, 40-47 历史数据库,40-47
impact of survivorship bias on, 26-27 生存者偏差的影响,26-27
by independent traders, 3 由独立交易者,3
of January effect, 175-179 一月效应,175-179
of mean-reverting strategies, 135 均值回归策略,135
minimum duration of, 60-61 最短持续时间,60-61
paper trading and, 103-104 模拟交易和,103-104
performance measurement for, 47-57 绩效衡量,47-57
pitfalls with, 57-72 陷阱,57-72
of prospective strategies, 20 潜在策略,20
for risk management, 124 风险管理,124
of strategy modifications, 128-129 策略修改,128-129
strategy refinement after, 77-78 策略改进后,77-78
of trader-forum strategies, 12-13 交易者论坛策略,12-13
transaction costs in, 72-77 交易成本,72-77
Backtracker, 94 回溯器,94
Bailey, David J., 60 David J. Bailey,60
Bank of Montreal, 186 蒙特利尔银行,186
Bankruptcy risk, 84 破产风险,84
BasketTrader, 96, 97 BasketTrader,96,97
Basket traders, 94, 96-98 篮子交易员,94,96-98
Bayes Net toolbox, MATLAB, 204 贝叶斯网络工具箱,MATLAB,204
Half-Kelly betting, 112, 122-123 半凯利投注,112,122-123
Half-life, mean-reverting time series, 170-173 半衰期,均值回复时间序列,170-173
“Hard to borrow” stocks, 106 “难以借入”股票,106
Harris, Mike, 12 哈里斯,迈克,12
Hedge funds, 120-122, 193, 198 对冲基金,120-122,193,198
Hedge portfolios, 160 对冲投资组合,160
Hedge ratio, 68-69, 149, 151 对冲比率,68-69,149,151
Herdlike investor behavior, 136-137 群体投资者行为,136-137
High-beta portfolios, 188-189 高贝塔投资组合,188-189
High-capital accounts, 15-19 高资本账户,15-19
High data, in historical databases, 46-47 历史数据库中的高数据,46-47
High-frequency trading: 高频交易:
automated systems for, 94, 100 自动化系统,94,100
capital availability for, 17 资金可用性,17
data-snooping bias and, 59 数据窥探偏差与,59
growth in, 197 增长于,197
programming skills for, 15 编程技能,15
strategies involving, 186-188 涉及的策略,186-188
High-leverage portfolios, 188-189 高杠杆投资组合,188-189
High-minus-low (HML) factor, 161, 169 高减低(HML)因子,161,169
High-speed internet connection, for trading, 88 高速互联网连接,用于交易,88
High watermark, 23 高水位线,23
Historical data, 29, 40-41 历史数据,29,40-41
Historical databases: 历史数据库:
backtesting with, 40-47 用于回测,40-47
errors in, 135 其中的错误,135
high and low data in, 46-47 高低数据,46-47
split and dividend adjusted data in, 41-44 拆分和股息调整数据,41-44
survivorship-bias free data in, 44-46 无生存偏差数据,44-46
HML (high-minus-low) factor, 161, 169 HML(高减低)因子,161,169
Hoffstein, Corey, 12 Hoffstein,Corey,12
Holding period: 持有期:
fixed, 169-173 固定,169-173
and lookback period, 119-120 和回溯期,119-120
optimal, 170-173 最优,170-173
return consistency and, 19 返回一致性,19
HTTP requests, 99 HTTP 请求,99
Hunter, Brian, 129 Hunter,Brian,129
I
IBM, 3 IBM,3
IBroker, 94 IBroker,94
IDEs (Integrated Development Environments), 37, 38 集成开发环境(IDEs),37,38
Incorporation, 82-83 公司注册,82-83
Incremental execution, for large orders, 102, 136 大额订单的增量执行,102,136
Independent traders: 独立交易者:
average daily volume for, 101-102 平均日交易量,101-102
business structure for, 81-85 业务结构,81-85
characteristics of, 3-4 特征,3-4
drawdown for, 24 回撤,24
exchange of ideas for, 13, 14 交流想法,13,14
performance of, xi, 193-196 表现,xi,193-196
processes for, 8-9 流程,8-9
programming consultants for, 100 编程顾问,100
psychological preparedness for, 130 心理准备,130
quantitative trading for, xv-xvi 量化交易,xv-xvi
Industry, cointegration of stocks in same, 153 行业,同一行业股票的协整,153
Information ratio, 21 信息比率,21
Institutional traders: 机构交易者:
drawdown for, 24 回撤,24
independent trading strategies for, xvi 独立交易策略,xvi
low-priced stock trades for, 101 低价股交易的成本,101
market impact costs for, 102 市场影响成本,102
performance of, xi, 193-196 表现,xi,193-196
prospective strategies that compete with, 30 与之竞争的潜在策略,30
psychological preparedness for, 129-130 心理准备,129-130
qualifications of, 2 资格,2
quantitative trading for, xv-xvi 定量交易,xv-xvi
Integrated Development Environments (IDEs), 37, 38 集成开发环境(IDEs),37,38
Interactive Brokers (IBKR), 17 盈透证券(IBKR),17
databases for backtesting from, 41 用于回测的数据库,41
dividend and earnings data from, 94 股息和收益数据,94
execution by, 96-97 执行,96-97
execution system of, 85-86 执行系统,85-86
portfolio margin with, 16 组合保证金,16
programming consultant referrals from, 100 编程顾问推荐,100
RESTful API of, 99 RESTful API,99
retrieving input data with, 95 检索输入数据,95
Interday positions, 16 日间持仓,16
Internet software firm, profitability for, 9 互联网软件公司,盈利能力,9
Intraday positions, 16, 18, 41 盘中持仓,16,18,41
Investment products, 8, 86 投资产品,8,86
ITG, 85 ITG,85
J
January effect, 175-179 一月效应,175-179
Java, 15, 94, 99 Java,15,94,99
K
Kahneman, Daniel, 127 丹尼尔·卡尼曼,127
Kavanaugh, Paul, 184 保罗·卡瓦诺,184
Kelly formula, 111-125 凯利公式,111-125
for capital allocation, 111-112, 114-119 用于资本分配,111-112,114-119
defined, 111 定义,111
with Gaussian return distribution, 131-132 具有高斯收益分布,131-132
half-Kelly betting, 112, 122-123 半凯利投注,112,122-123
for leverage, 112, 113 用于杠杆,112,113
maximum capital and leverage from, 120 最大资本和杠杆,120
psychological preparedness to use, 129-130 使用的心理准备,129-130
for risk management, 120-125 风险管理,120-125
Kerviel, Jérôme, 196 杰罗姆·凯尔维尔,196
Khandani, Amir, 72, 120, 135 阿米尔·汗达尼,72,120,135
Knight Capital Group, 98 奈特资本集团,98
KO (Coca-Cola), 153-159 KO(可口可乐),153-159
Kurzweil, Ray, 28 库兹韦尔,雷,28
L
Lagged historical data, 58 滞后历史数据,58
Large numbers, law of, 187 大数定律,187
Large orders, incremental execution of, 102, 136 大额订单,分批执行,102,136
LastTradingDayOfMonth function, MATLAB, 180-181 LastTradingDayOfMonth 函数,MATLAB,180-181
Law of large numbers, 187 大数定律,187
lcb1.uoregon.edu, 179 lcb1.uoregon.edu,179
LEAN engine, 40 LEAN 引擎,40
Legal requirements, for proprietary trading, 82, 84 专有交易的法律要求,82,84
Legg Mason, 88 Legg Mason,88
LeSage, James, 204 LeSage,James,204
Leverage: 杠杆:
in high-frequency trading, 187-188 在高频交易中的应用,187-188
high-leverage portfolios, 188-189 高杠杆投资组合,188-189
impact of despair and greed on, 129 绝望与贪婪对杠杆的影响,129
for institutional traders, 194 机构交易者,194
Kelly formula for, 110, 112, 113 凯利公式,110,112,113
and long-term capital gain, 19-20 以及长期资本收益,19-20
for low-capital accounts, 16 低资本账户,16
and MAR ratio, 48 和 MAR 比率,48
maximum, 112, 120, 123 最大值,112,120,123
optimal, 109-111, 113 最优,109-111,113
overleveraging, 5, 129, 194 过度杠杆,5,129,194
reducing, 124-125 减少,124-125
in retail vs. proprietary trading, 81-82, 84 零售交易与自营交易中的区别,81-82,84
scaling up by using, 5 通过使用进行规模扩大,5
Leveraged return, 22 杠杆回报,22
Liability, 82-83 责任,82-83
Limited liability companies (LLCs), 82-83 有限责任公司(LLCs),82-83
Limit orders, 46 限价单,46
Limit prices, 97 限价,97
Linear scales, order size and market cap, 102 线性刻度、订单规模和市值,102
Liquidity: 流动性:
average daily volume as measure of, 101-102 以平均日成交量作为衡量标准,101-102
and capacity, 194 以及容量,194
cost of, 25 成本,25
dark-pool, 85, 103 暗池,85,103
in high-frequency trading strategies, 187 在高频交易策略中,187
in mean-reverting regimes, 123-124 在均值回归状态中,123-124
momentum changes due to private, 136-137 动量变化由于私有,136-137
Liquidnet, 85 Liquidnet,85
LLCs (limited liability companies), 82-83 有限责任公司(LLCs),82-83
Lo, Andrew, 72, 120, 135 Andrew Lo,72,120,135
Long-only strategies, 20, 48-51, 195 仅多头策略,20,48-51,195
Long-short dollar-neutral strategy, 20 多空美元中性策略,20
Long-term capital gain, 19-20 长期资本收益,19-20
Long-Term Capital Management, 129, 193 长期资本管理公司,129,193
Long-term compounded growth rate (g): 长期复合增长率 (g):
and high-leverage vs. high-beta portfolios, 189 以及高杠杆与高贝塔投资组合,189
maximized, 112-113 最大化,112-113
for optimal capital allocation, 116 用于最优资本配置,116
optimization of, 110 的优化,110
for stock with geometric random walk, 111-112 对于几何随机游走的股票,111-112
Long-term wealth, 110 长期财富,110
Look-ahead bias: 前瞻性偏差:
in backtesting, 58-59 在回测中,58-59
checking for, in MATLAB, 67 在 MATLAB 中检查,67
paper trading to identify, 103 模拟交易以识别,103
predictors of, 29 的预测指标,29
using Excel to avoid, 34 使用 Excel 避免,34
Lookback period, 119-120 回溯期,119-120
Loss aversion, 126-128 损失厌恶,126-128
Losses: 损失:
behavioral biases and, 126, 128, 129 行为偏差,126,128,129
in high-frequency trading, 187-188 高频交易中的损失,187-188
P&L data, 103-104, 204 盈亏数据,103-104,204
position size and, 120 仓位大小和,120
at quantitative hedge funds, 193, 198 在量化对冲基金,193,198
Low-beta stocks, 189 低贝塔股票,189
Low-capital accounts, 15-19 低资本账户,15-19
Low data, in historical databases, 46-47 历史数据库中的低数据,46-47
Low-priced stocks, 101 低价股票,101
M
Machine learning, xii 机器学习,xii
for backtesting strategies, 38 用于回测策略,38
in Conditional Parameter Optimization, 138-139, 146 在条件参数优化中,138-139,146
for predicting returns on strategy, 146 用于预测策略收益,146
McKinney, Wes, 37 McKinney,Wes,37
Macroeconomic factors, 27, 168 宏观经济因素,27,168
Magazines, trading strategies from, 12 杂志,交易策略来自,12
Manual interference, in trading, 6-7, 125-126 交易中的人工干预,6-7,125-126
Market capitalization, 102 市值,102
Market factor, 161 市场因子,161
Market impact, 25, 83, 101-102 市场影响,25,83,101-102
Market index, 20, 21 市场指数,20,21
Marketing, 7-8 市场营销,7-8
Market-neutral portfolio, 16, 51-53 市场中性投资组合,16,51-53
Market on close (MOC) orders, 46-47 收盘市价单(MOC),46-47
Market on open (MOO) orders, 46-47, 78 开盘市价单(MOO),46-47,78
MAR ratio, 47-48 MAR 比率,47-48
MASS (software package), 37 MASS(软件包),37
Massachusetts Institute of Technology (MIT), 72 麻省理工学院(MIT),72
Mathworks, Inc., 199 Mathworks 公司,199
MATLAB, xiii, xvii, 199-204 MATLAB,xiii,xvii,199-204
analyzing January effect in, 175-177 分析一月效应,175-177
array processing in, 199-203 数组处理,199-203
avoiding look-ahead bias with, 58-59 利用避免前瞻偏差,58-59
backtesting with, 33-36 回测,33-36
benefits of, 199-200, 203-204 优势,199-200,203-204
cointegrating pairs of stocks formed with, 149-150 用……形成的协整股票对,149-150
correlation testing with, 156 用……进行相关性测试,156
evaluating transaction costs in, 73-75 用……评估交易成本,73-75
execution programs in, 96-98 执行程序,96-98 页
half-life of mean-reverting time series in, 171 均值回复时间序列的半衰期,171 页
maximum drawdown calculation in, 54-55 最大回撤计算,54-55 页
maximum drawdown duration in, 54-55 最大回撤持续时间,54-55 页
optimal capital allocation in, 114-116 最佳资本配置,114-116
out-of-sample testing with, 63-67 使用的样本外测试,63-67
PCA factor model using, 164-165 使用的 PCA 因子模型,164-165
Python vs., 37 Python 与,37
retrieving Yahoo! Finance data with, 35-36 使用 Yahoo! Finance 数据检索,35-36
R vs., 38 R 与,38
Sharpe ratio in, 50, 52 夏普比率,50,52
year-on-year seasonal strategy in, 180-181 同比季节性策略,180-181
Mauboussin, Michael, 88 莫博辛,迈克尔,88
Maximum capital, 120 最大资本,120
Maximum drawdown, 23, 24, 47, 53-57 最大回撤,23,24,47,53-57
Maximum drawdown duration, 23-24, 53-57 最大回撤持续时间,23-24,53-57
Maximum leverage, 112, 120, 123 最大杠杆,112,120,123
Mean excess return, 116 平均超额收益,116
Mean-reverting regimes, 123, 126, 163 均值回复状态,123,126,163
Mean-reverting strategies: 均值回复策略:
exit signals for, 170-173 退出信号,170-173
momentum strategies vs., 134-137 动量策略与,134-137
refining, 78 优化,78
stationarity and cointegration of time series in, 133, 147, 153 时间序列的平稳性和协整,133,147,153
transaction costs for, 26, 72-77 交易成本,26,72-77
Mean-reverting time series, half-life of, 170-173 均值回复时间序列,半衰期,170-173
Melvin Capital, 138 Melvin Capital,138
“Meme” stocks, 138 “梗”股票,138
Metalabeling, xii, 29 元标签,xii,29
Micro E-mini contracts (MES), 19 微型 E-mini 合约(MES),19
Millennium Partners, 14 千禧合伙人,14
Mini gasoline futures, 184 迷你汽油期货,184
Minimum backtest length, 60-61 最短回测长度,60-61
Mini natural gas futures, 186 迷你天然气期货,186
MIT (Massachusetts Institute of Technology), 72 麻省理工学院(MIT),72
mnormt (software package), 37 mnormt(软件包),37
MOC (market on close) orders, 46-47 收盘市价单(MOC),46-47
Model risk, 124-125 模型风险,124-125
Momentum, of factor returns, 162-163, 168-169 因子收益的动量,162-163,168-169
Momentum regimes, 123, 126, 135-137 动量状态,123,126,135-137
Momentum strategies: 动量策略:
exit signals for, 169-170, 173, 174 退出信号,169-170,173,174
mean-reverting strategies vs., 134-137 均值回归策略对比,134-137
Money management, 8 资金管理,8
Monitors, for trading, 89 监控器,用于交易,89
MOO (market on open) orders, 46-47, 78 MOO(开盘市价单)订单,46-47,78
W
Walk Forward Optimization, 139 前向优化,139
Wall Street Horizon, 94 华尔街地平线,94
Wealth, as objective, 4, 5 财富,作为目标,4,5
Wealth-Lab, 13 Wealth-Lab,第 13 页
“What Happened to the Quants in August 2007?” (Khandani and Lo), 120-121 “2007 年 8 月量化交易者发生了什么?”(Khandani 和 Lo),第 120-121 页
“Where Have All the Stat Arb Profits Gone?” (Sterge), 106 “统计套利的利润都去哪了?”(Sterge),第 106 页
Windows Remote Desktop, 89 Windows 远程桌面,第 89 页
Winners-minus-losers (WML) factor, 162 赢家减输家(WML)因子,162
Wittgenstein, Ludwig, 123 维特根斯坦,路德维希,123
Working hours, strategy fit to, 14-15 工作时间,策略适应,14-15
WorldCom, 86 WorldCom,86
¹ This section was adapted from my blog article “Parameterless Trading Models,” which you can find at epchan.blogspot.com/2008/05/parameterless-tradingmodels.html. ¹ 本节内容改编自我的博客文章“无参数交易模型”,您可以在 epchan.blogspot.com/2008/05/parameterless-tradingmodels.html 查阅。
¹ This example was reproduced with corrections from my blog article “Maximizing Compounded Rate of Return,” which you can find at epchan.blogspot.com/2006/10/maximizing-compounded-rate-of-return.html. ¹ 这个例子是根据我博客文章《最大化复合收益率》修正后重现的,您可以在 epchan.blogspot.com/2006/10/maximizing-compounded-rate-of-return.html 找到该文章。
* This commentary was reproduced from my blog article of the same title at predictnow.ai/blog/loss-aversion-is-not-a-behavioral-bias/. * 本评论摘自我在 predictnow.ai/blog/loss-aversion-is-not-a-behavioral-bias/ 上同名博客文章。