Commit

updates
jph00 committed Mar 5, 2020
1 parent e477ef2 commit 91206f8
Showing 6 changed files with 278 additions and 308 deletions.
162 changes: 45 additions & 117 deletions 01_intro.ipynb

Large diffs are not rendered by default.

35 changes: 9 additions & 26 deletions 02_production.ipynb
@@ -29,7 +29,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The five lines of code we saw in <<chaptter_intro>> are just one small part of the process of using deep learning in practice. In this chapter, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically: we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, learn about how to create datasets, look at possible gotchas when using deep learning in practice, and more. Many of the key points will apply equally well to other deep learning problems, such as we showed in <<chaptter_intro>>. If you work through a problem similar in key respects to our example problems, we expect you to get excellent results with little code, quickly.\n",
"The five lines of code we saw in <<chapter_intro>> are just one small part of the process of using deep learning in practice. In this chapter, we're going to use a computer vision example to look at the end-to-end process of creating a deep learning application. More specifically: we're going to build a bear classifier! In the process, we'll discuss the capabilities and constraints of deep learning, learn about how to create datasets, look at possible gotchas when using deep learning in practice, and more. Many of the key points will apply equally well to other deep learning problems, such as we showed in <<chapter_intro>>. If you work through a problem similar in key respects to our example problems, we expect you to get excellent results with little code, quickly.\n",
"\n",
"Let's start with how you should frame your problem."
]
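For orientation, here is a sketch along the lines of the <<chapter_intro>> training code referred to above (recalled from fastai's standard pets example, so the exact lines in that chapter may differ slightly):

```python
from fastai.vision.all import *

# Download and extract a small dataset of labelled pet photos
path = untar_data(URLs.PETS)/'images'

def is_cat(x): return x[0].isupper()  # in this dataset, cat filenames are capitalized

# Build training/validation DataLoaders from the image files
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# Fine-tune a pretrained resnet34 and report the error rate
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```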
@@ -274,7 +274,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> important: Services that can be used for creating datasets come and go all the time, and their features, interfaces, and pricing change regularly too. In this section, we'll show how to use one particular provider, *Bing Image Search*, using the service they have as this book as written. We'll be providing more options and more up to date information on the [book website](https://book.fast.ai), so be sure to have a look there now to get the most current information on how to download images from the web to create a dataset for deep learning."
"> important: Services that can be used for creating datasets come and go all the time, and their features, interfaces, and pricing change regularly too. In this section, we'll show how to use one particular provider, _Bing Image Search_, using the service they have as this book as written. We'll be providing more options and more up to date information on the http://book.fast.ai[book website], so be sure to have a look there now to get the most current information on how to download images from the web to create a dataset for deep learning."
]
},
{
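To make the download step concrete, here is a rough sketch of the pattern this section builds up. It assumes the `search_images_bing` helper distributed with the book's notebooks and your own Azure search key; the attribute holding the image URL has varied across versions of the API:

```python
from fastai.vision.all import *
from utils import search_images_bing  # helper distributed with the book's notebooks

key = 'XXX'  # replace with your own Bing Image Search key from Azure
results = search_images_bing(key, 'grizzly bear')
urls = results.attrgot('content_url')  # attribute name may differ by API version

# Download the images, then discard any that fail verification
dest = Path('bears')/'grizzly'
dest.mkdir(exist_ok=True, parents=True)
download_images(dest, urls=urls)
failed = verify_images(get_image_files(dest))
failed.map(Path.unlink)
```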
@@ -467,7 +467,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> j: I just love this about working in Jupyter notebooks! It's so easy to gradually build what I want, and check my work every step of the way. I make a *lot* of mistakes, so this is really helpful to me..."
"> j: I just love this about working in Jupyter notebooks! It's so easy to gradually build what I want, and check my work every step of the way. I make a _lot_ of mistakes, so this is really helpful to me..."
]
},
{
@@ -622,7 +622,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> jargon: DataLoaders: a fastai class which stores whatever `DataLoader` objects you pass to it, and makes them available as properties."
"> jargon: DataLoaders: A fastai class which stores whatever `DataLoader` objects you pass to it, and makes them available as properties."
]
},
{
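In other words, `DataLoaders` is a very thin wrapper. A minimal sketch of the idea (our simplification, not fastai's actual source):

```python
class DataLoaders:
    "Minimal sketch: store DataLoader objects and expose the first two as train/valid"
    def __init__(self, *loaders): self.loaders = loaders
    def __getitem__(self, i): return self.loaders[i]
    @property
    def train(self): return self[0]   # first loader, by convention
    @property
    def valid(self): return self[1]   # second loader, by convention
```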
@@ -1181,7 +1181,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> note: After cleaning the dataset using the above steps, we generally are seeing 100% accuracy on this task. We even see that result when we download a lot less images than the 150 per class we're using here. As you can see, the common complaint *you need massive amounts of data to do deep learning* can be a very long way from the truth!"
"\n",
"> note: After cleaning the dataset using the above steps, we generally are seeing 100% accuracy on this task. We even see that result when we download a lot less images than the 150 per class we're using here. As you can see, the common complaint _you need massive amounts of data to do deep learning_ can be a very long way from the truth!"
]
},
{
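As a reminder of the cleaning steps the note refers to, the pattern used in this chapter is roughly the following, where `learn` is the trained `Learner` (a sketch; see the chapter's cells for the exact code):

```python
from fastai.vision.widgets import ImageClassifierCleaner
import shutil

cleaner = ImageClassifierCleaner(learn)  # renders the review widget in the notebook
# ...after marking images in the widget, apply your decisions:
for idx in cleaner.delete():
    cleaner.fns[idx].unlink()                     # delete images marked as broken/irrelevant
for idx, cat in cleaner.change():
    shutil.move(str(cleaner.fns[idx]), path/cat)  # re-file images given a corrected label
```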
@@ -1707,20 +1708,6 @@
"6. Click \"Launch\"."
]
},
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
- },
{
"cell_type": "markdown",
"metadata": {},
@@ -1820,7 +1807,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"> j: I started a company 20 years ago called *Optimal Decisions* which used machine learning and optimisation to help giant insurance companies set their pricing, impacting tens of billions of dollars of risks. We used the approaches described above to manage the potential downsides of something that might go wrong. Also, before we worked with our clients to put anything in production, we tried to simulate the impact by testing the end to end system on their previous year's data. It was always quite a nerve-wracking process, putting these new algorithms in production, but every rollout was successful."
"> j: I started a company 20 years ago called _Optimal Decisions_ which used machine learning and optimisation to help giant insurance companies set their pricing, impacting tens of billions of dollars of risks. We used the approaches described above to manage the potential downsides of something that might go wrong. Also, before we worked with our clients to put anything in production, we tried to simulate the impact by testing the end to end system on their previous year's data. It was always quite a nerve-wracking process, putting these new algorithms in production, but every rollout was successful."
]
},
{
@@ -1834,13 +1821,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the biggest challenges in rolling out a model is that your model may change the behaviour of the system it is a part of. For instance, consider YouTube's recommendation system. A couple of years ago Google talked about how they had introduced reinforcement learning (closely related to deep learning, but where your loss function represents a result which could be a long time after an action occurs) to improve their recommendation system. They described how they used an algorithm which made recommendations such that watch time would be optimised.\n",
"\n",
"However, human beings tend to be drawn towards controversial content. This meant that videos about things like conspiracy theories started to get recommended more and more by the recommendation system. Furthermore, it turns out that the kinds of people that are interested in conspiracy theories are also people that watch a lot of online videos! So, they started to get drawn more and more towards YouTube. The increasing number of conspiracy theorists watching YouTube resulted in the algorithm recommending more and more conspiracy theories and other extremist content, which resulted in more extremists watching videos on YouTube, and more people watching YouTube developing extremist views, which led to the algorithm recommending more extremist content... The system became so out of control that in February 2019 it led the New York Times to run the headline \"YouTube Unleashed a Conspiracy Theory Boom. Can It Be Contained?\"footnote:[https://www.nytimes.com/2019/02/19/technology/youtube-conspiracy-stars.html]\n",
"\n",
"One of our reviewers for this book, Aurélien Géron, led YouTube's video classification team from 2013 to 2016. He pointed out that it's not just feedback loops involving humans that are a problem. There can also be feedback loops without humans! He told us about an example from YouTube:\n",
"One of the biggest challenges in rolling out a model is that your model may change the behaviour of the system it is a part of. For instance, consider a \"predictive policing\" algorithm that predicts more crime in certain neighborhoods, causing more police officers to be sent to those neighborhoods, which can result in more crime being recorded in those neighborhoods, and so on. In the Royal Statiscal Society paper [To predict and serve](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/j.1740-9713.2016.00960.x), Kristian Lum and William Isaac write: \"predictive policing is aptly named: it is predicting future policing, not future crime\".\n",
"\n",
"> \"One important signal to classify the main topic of a video is the channel it comes from. For example, a video uploaded to a cooking channel is very likely to be a cooking video. But how do we know what topic a channel is about? Well… in part by looking at the topics of the videos it contains! Do you see the loop? For example, many videos have a description which indicates what camera was used to shoot the video. As a result, some of these videos might get classified as videos about “photography”. If a channel has such as misclassified video, it might be classified as a “photography” channel, making it even more likely for future videos on this channel to be wrongly classified as “photography”. This could even lead to runaway virus-like classifications! One way to break this feedback loop is to classify videos with and without the channel signal. Then when classifying the channels, you can only use the classes obtained without the channel signal. This way, the feedback loop is broken.\"\n",
"Part of the issue in this case is that in the presence of *bias* (which we'll discuss in depth in the next chapter), feedback loops can result in negative implications of that bias getting worse and worse. For instance, there are concerns that this is already happening in the US, where there is significant bias in arrest rates on racial grounds. [According to the ACLU](https://www.aclu.org/issues/smart-justice/sentencing-reform/war-marijuana-black-and-white), \"despite roughly equal usage rates, Blacks are 3.73 times more likely than whites to be arrested for marijuana\". The impact of this bias, along with the roll-out of predictive policing algorithms in many parts of the US, led Bärí Williams to [write in the NY Times](https://www.nytimes.com/2017/12/02/opinion/sunday/intelligent-policing-and-my-innocent-children.html): \"The same technology that’s the source of so much excitement in my career is being used in law enforcement in ways that could mean that in the coming years, my son, who is 7 now, is more likely to be profiled or arrested — or worse — for no reason other than his race and where we live.\"\n",
"\n",
"A helpful exercise prior to rolling out a significant machine learning system is to consider this question: \"what would happen if it went really, really well?\" In other words, what if the predictive power was extremely high, and its ability to influence behaviour was extremely significant? In that case, who would be most impacted? What would the most extreme results potentially look like? How would you know what was really going on?\n",
"\n",
