---
title: "Introduction to R and RStudio"
output: statsr:::statswithr_lab
---


<div id="instructions"> Complete all **Exercises**, and submit answers to **Questions** on the Coursera platform. </div>


The goal of this lab is to introduce you to R and RStudio, which you'll be using throughout the course both to learn the statistical concepts discussed in the course and to analyze real data and come to informed conclusions. To straighten out which is which: R is the name of the programming language itself and RStudio is a convenient interface.


As the labs progress, you are encouraged to explore beyond what the labs dictate; a willingness to experiment will make you a much better programmer. Before we get to that stage, however, you need to build some basic fluency in R. Today we begin with the fundamental building blocks of R and RStudio: the interface, reading in data, and basic commands.


## RStudio

Your RStudio window has four panels.


Your R Markdown file (this document) is in the upper left panel.


The panel on the lower left is where the action happens. It's called the *console*. Every time you launch RStudio, it will have the same text at the top of the console telling you the version of R that you're running. Below that information is the *prompt*. As its name suggests, this prompt is really a request, a request for a command. Initially, interacting with R is all about typing commands and interpreting the output. These commands and their syntax have evolved over decades (literally) and now provide what many users feel is a fairly natural way to access data and organize, describe, and invoke statistical computations.


The panel in the upper right contains your *workspace* as well as a history of the commands that you've previously entered.


Any plots that you generate will show up in the panel in the lower right corner. This is also where you can browse your files, access help, manage packages, etc.


## R Packages

R is an open-source programming language, meaning that users can contribute packages that make our lives easier, and we can use them for free. For this lab, and many others in the future, we will use the following R packages:


- `statsr`: for data files and functions used in this course
- `dplyr`: for data wrangling
- `ggplot2`: for data visualization


You should have already installed these packages using commands like `install.packages` and `install_github`.


Next, you need to load the packages in your working environment. We do this with the `library` function. Note that you only need to **install** packages once, but you need to **load** them each time you relaunch RStudio.


```{r load-packages, message = FALSE}
library(dplyr)
library(ggplot2)
library(statsr)
```


To do so, you can


- click on the green arrow at the top of the code chunk in the R Markdown (Rmd) file, or
- highlight these lines, and hit the **Run** button on the upper right corner of the pane, or
- type the code in the console.


Going forward you will be asked to load any relevant packages at the beginning of each lab.


## Dataset 1: Dr. Arbuthnot's Baptism Records


To get you started, run the following command to load the data.


```{r load-abrbuthnot-data}
data(arbuthnot)
```


To do so, once again, you can


- click on the green arrow at the top of the code chunk in the R Markdown (Rmd) file, or
- put your cursor on this line, and hit the **Run** button on the upper right corner of the pane, or
- type the code in the console.


This command instructs R to load some data: the Arbuthnot baptism counts for boys and girls. You should see that the workspace area in the upper right-hand corner of the RStudio window now lists a data set called `arbuthnot` that has 82 observations on 3 variables. As you interact with R, you will create a series of objects. Sometimes you load them as we have done here, and sometimes you create them yourself as the byproduct of a computation or some analysis you have performed.


The Arbuthnot data set refers to Dr. John Arbuthnot, an 18<sup>th</sup> century physician, writer, and mathematician. He was interested in the ratio of newborn boys to newborn girls, so he gathered the baptism records for children born in London for every year from 1629 to 1710. We can take a look at the data by typing its name into the console.


```{r view-data}
arbuthnot
```

However, printing the whole dataset in the console is not that useful. One advantage of RStudio is that it comes with a built-in data viewer. Click on the name `arbuthnot` in the *Environment* pane (upper right window) that lists the objects in your workspace. This will bring up an alternative display of the data set in the *Data Viewer* (upper left window). You can close the data viewer by clicking on the *x* in the upper left-hand corner.


What you should see are four columns of numbers, each row representing a different year: the first entry in each row is simply the row number (an index we can use to access the data from individual years if we want), the second is the year, and the third and fourth are the numbers of boys and girls baptized that year, respectively. Use the scrollbar on the right side of the console window to examine the complete data set.


Note that the row numbers in the first column are not part of Arbuthnot's data. R adds them as part of its printout to help you make visual comparisons. You can think of them as the index that you see on the left side of a spreadsheet. In fact, the comparison to a spreadsheet will generally be helpful. R has stored Arbuthnot's data in a kind of spreadsheet or table called a *data frame*.


You can see the dimensions of this data frame by typing:


```{r dim-data}
dim(arbuthnot)
```

This command should output `[1] 82 3`, indicating that there are 82 rows and 3 columns (we'll get to what the `[1]` means in a bit), just as it says next to the object in your workspace. You can see the names of these columns (or variables) by typing:


```{r names-data}
names(arbuthnot)
```

1. How many variables are included in this data set?
<ol>
<li> 2 </li>
<li> 3 </li>
<li> 4 </li>
<li> 82 </li>
<li> 1710 </li>
</ol>


<div id="exercise"> **Exercise**: What years are included in this dataset? Hint: Take a look at the year variable in the Data Viewer to answer this question. </div>


You should see that the data frame contains the columns `year`, `boys`, and `girls`. At this point, you might notice that many of the commands in R look a lot like functions from math class; that is, invoking R commands means supplying a function with some number of arguments. The `dim` and `names` commands, for example, each took a single argument, the name of a data frame.
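To make the "function plus argument" idea concrete, here is a minimal sketch on a small stand-in data frame. The object `toy` and all of its values are hypothetical and are not part of the `arbuthnot` data; the functions `dim`, `names`, `nrow`, and `ncol` are all base R.

```r
# Hypothetical three-row stand-in with the same column names as `arbuthnot`.
# The values are placeholders for illustration only.
toy <- data.frame(
  year  = c(1629, 1630, 1631),
  boys  = c(5218, 5000, 4500),
  girls = c(4683, 4600, 4700)
)

# Each call supplies one argument: the data frame itself.
dim(toy)    # number of rows, then number of columns
names(toy)  # the column (variable) names

# nrow() and ncol() return the two dimensions separately.
nrow(toy)
ncol(toy)
```

Running the same calls on `arbuthnot` works the same way, just with more rows.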


<div id="boxedtext"> **Tip:** If you use the up and down arrow keys, you can scroll through your previous commands, your so-called command history. You can also access it by clicking on the history tab in the upper right panel. This will save you a lot of typing in the future. </div>


### R Markdown

So far we asked you to type your commands in the console. The console is a great place for playing around with some code; however, it is not a good place for documenting your work. Working in the console exclusively makes it difficult to document your work as you go and to reproduce it later.


R Markdown is a great solution for this problem. And you have already worked with an R Markdown document: this lab! Going forward, type the code for the questions in the code chunks provided in the R Markdown (Rmd) document for the lab, and **Knit** the document to see the results.


### Some Exploration

Let's start to examine the data a little more closely. We can access the data in a single column of a data frame separately using a command like


```{r view-boys}
arbuthnot$boys
```

This command will only show the number of boys baptized each year. The dollar sign basically says "go to the data frame that comes before me, and find the variable that comes after me".
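As a rough illustration of what the dollar sign does, the sketch below pulls one column out of a small stand-in data frame. The object `toy` and its values are hypothetical; the double-bracket form `[[ ]]` is a base R alternative that this lab does not otherwise use, shown only to confirm that `$` returns the column itself.

```r
# Hypothetical stand-in data frame; values are placeholders, not real records.
toy <- data.frame(
  year  = c(1629, 1630, 1631),
  boys  = c(5218, 5000, 4500),
  girls = c(4683, 4600, 4700)
)

# "Go to the data frame `toy` and find the variable `boys`."
toy$boys

# Base R's double-bracket lookup by name returns the same column.
toy[["boys"]]
identical(toy$boys, toy[["boys"]])
```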


2. What command would you use to extract just the counts of girls born?
<ol>
<li> `arbuthnot$boys` </li>
<li> `arbuthnot$girls` </li>
<li> `girls` </li>
<li> `arbuthnot[girls]` </li>
<li> `$girls` </li>
</ol>


```{r extract-counts-of-girls-born}
# type your code for the Question 2 here, and Knit

```

Notice that the way R has printed these data is different. When we looked at the complete data frame, we saw 82 rows, one on each line of the display. These data are no longer structured in a table with other variables, so they are displayed one right after another. Objects that print out in this way are called *vectors*; they represent a set of numbers. R has added numbers in square brackets along the left side of the printout to indicate locations within the vector. For example, in the `arbuthnot$boys` vector, 5218 follows `[1]`, indicating that 5218 is the first entry in the vector. And if `[43]` starts a line, then the first number on that line is the 43rd entry in the vector.
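The bracketed position labels are easier to see on a vector whose values make the positions obvious. The sketch below uses a hypothetical vector `counts` (not lab data) and also shows that the same bracket notation can be typed to pull out a single entry, which is standard base R indexing.

```r
# Hypothetical vector of 50 values: 101, 102, ..., 150.
counts <- 101:150

# Printing wraps across several lines; the bracketed number at the start
# of each line is the position of the first value shown on that line.
print(counts)

# The same positions can be used to extract individual entries.
counts[1]    # first entry
counts[43]   # 43rd entry
length(counts)
```

How many values fit on each printed line depends on the width of your console, so the bracketed labels you see may differ from someone else's.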


R has some powerful functions for making graphics. We can create a simple plot of the number of girls baptized per year with the command


```{r plot-girls-vs-year}
ggplot(data = arbuthnot, aes(x = year, y = girls)) +
  geom_point()
```

Before we review the code for this plot, let's summarize the trends we see in the data.


3. Which of the following best describes the number of girls baptised over the years included in this dataset?
<ol>
<li> There appears to be no trend in the number of girls baptised from 1629 to 1710. </li>
<li> There is initially an increase in the number of girls baptised, which peaks around 1640. After 1640 there is a decrease in the number of girls baptised, but the number begins to increase again in 1660. Overall the trend is an increase in the number of girls baptised. </li>
<li> There is initially an increase in the number of girls baptised. This number peaks around 1640 and then after 1640 the number of girls baptised decreases. </li>
<li> The number of girls baptised has decreased over time. </li>
<li> There is an initial increase in the number of girls baptised but this number appears to level around 1680 and not change after that time point. </li>
</ol>


Back to the code... We use the `ggplot()` function to build plots. If you run the plotting code in your console, you should see the plot appear under the *Plots* tab of the lower right panel of RStudio. Notice that the command above again looks like a function, this time with arguments separated by commas.


- The first argument is always the dataset.
- Next, we provide the variables from the dataset to be assigned to `aes`thetic elements of the plot, e.g. the x and the y axes.
- Finally, we use another layer, separated by a `+`, to specify the `geom`etric object for the plot. Since we want a scatterplot, we use `geom_point`.
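One way to see the three pieces listed above is to build the plot in separate steps and store each step in an object. This is only a sketch on a hypothetical stand-in data frame `toy` (its values are placeholders); the object names `p` and `p_points` are likewise made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in data frame; values are placeholders, not real records.
toy <- data.frame(
  year  = c(1629, 1630, 1631),
  girls = c(4683, 4600, 4700)
)

# Steps 1 and 2: the dataset plus the mapping of variables to the x and y axes.
# On its own this describes an empty plot with no geometric objects.
p <- ggplot(data = toy, aes(x = year, y = girls))

# Step 3: add a layer of points with `+`.
p_points <- p + geom_point()

# Typing the object name (here, p_points) at the console draws the plot.
```

Storing intermediate steps is optional; the one-expression form used in the lab produces the same kind of plot.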


You might wonder how you are supposed to know the syntax for the `ggplot` function. Thankfully, R documents all of its functions extensively. To read what a function does and learn the arguments that are available to you, just type in a question mark followed by the name of the function that you're interested in. Try the following in your console:


```{r plot-help, tidy = FALSE}
?ggplot
```

Notice that the help file replaces the plot in the lower right panel. You can toggle between plots and help files using the tabs at the top of that panel.


<div id="boxedtext"> More extensive help for plotting with the `ggplot2` package can be found at http://docs.ggplot2.org/current/. The best (and easiest) way to learn the syntax is to take a look at the sample plots provided on that page, and modify the code bit by bit until you achieve the plot you want. </div>


### R as a big calculator


Now, suppose we want to plot the total number of baptisms. To compute this, we could use the fact that R is really just a big calculator. We can type in mathematical expressions like


```{r calc-total-bapt-numbers}
5218 + 4683
```

to see the total number of baptisms in 1629. We could repeat this once for each year, but there is a faster way. If we add the vector for baptisms for boys to that of girls, R will compute all sums simultaneously.


```{r calc-total-bapt-vars}
arbuthnot$boys + arbuthnot$girls
```

What you will see are 82 numbers (in that packed display, because we aren't looking at a data frame here), each one representing the sum we're after. Take a look at a few of them and verify that they are right.
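A small sketch of what "computing all sums simultaneously" means: adding two vectors of the same length adds them position by position. The vectors below are hypothetical; only the first pair of values (5218 and 4683) comes from the 1629 figures quoted above.

```r
# Hypothetical vectors; only the first entry of each uses the 1629 counts
# quoted in this lab, the rest are placeholder values.
boys  <- c(5218, 5000, 4500)
girls <- c(4683, 4600, 4700)

# Element-wise addition: first with first, second with second, and so on.
total <- boys + girls
total

# Spot check: the first entry matches the sum typed by hand.
total[1] == 5218 + 4683
```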


### Adding a new variable to the data frame


We'll be using this new vector to generate some plots, so we'll want to save it as a permanent column in our data frame.


```{r calc-total-bapt-vars-save}
arbuthnot <- arbuthnot %>%
  mutate(total = boys + girls)
```

What in the world is going on here? The `%>%` operator is called the **piping** operator. Basically, it takes the output of the current line and pipes it into the following line of code.


<div id="boxedtext"> **A note on piping:** Note that we can read this code as the following:


*"Take the `arbuthnot` dataset and **pipe** it into the `mutate` function. Use `mutate` to create a new variable called `total` that is the sum of the variables called `boys` and `girls`. Then assign the resulting dataset to the object called `arbuthnot`, i.e. overwrite the old `arbuthnot` dataset with the new one containing the new variable."*


This is essentially equivalent to going through each row and adding up the boys and girls counts for that year and recording that value in a new column called total. </div>
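To check the reading of the pipe given above, the sketch below compares the piped form with an ordinary function call on a hypothetical stand-in data frame `toy` (placeholder values). The names `piped` and `nested` are made up for this illustration.

```r
library(dplyr)

# Hypothetical stand-in data frame; values are placeholders, not real records.
toy <- data.frame(
  year  = c(1629, 1630, 1631),
  boys  = c(5218, 5000, 4500),
  girls = c(4683, 4600, 4700)
)

# Piped form: `toy` becomes the first argument of mutate().
piped <- toy %>%
  mutate(total = boys + girls)

# Ordinary call: the data frame is written as the first argument directly.
nested <- mutate(toy, total = boys + girls)

identical(piped, nested)  # the two forms give the same result

# mutate() returns a new data frame; `toy` itself is unchanged
# until the result is assigned back with `<-`.
names(piped)
names(toy)
```

The pipe matters more once several steps are chained, because the code then reads top to bottom in the order the steps happen.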


<div id="boxedtext"> **Where is the new variable?** When you make changes to variables in your dataset, click on the name of the dataset again to update it in the data viewer. </div>


You'll see that there is now a new column called `total` that has been tacked on to the data frame. The special symbol `<-` performs an *assignment*, taking the output of one line of code and saving it into an object in your workspace. In this case, you already have an object called `arbuthnot`, so this command updates that data set with the new mutated column.


We can make a plot of the total number of baptisms per year with the following command.


```{r plot-total-vs-year-line}
ggplot(data = arbuthnot, aes(x = year, y = total)) +
  geom_line()
```

Note that using `geom_line()` instead of `geom_point()` results in a line plot instead of a scatter plot. You want both? Just layer them on:


```{r plot-total-vs-year-line-and-point}
ggplot(data = arbuthnot, aes(x = year, y = total)) +
  geom_line() +
  geom_point()
```

<div id="exercise"> **Exercise**: Now, generate a plot of the proportion of boys born over time. What do you see? </div>


```{r plot-proportion-of-boys-over-time}
# type your code for the Exercise here, and Knit

```

Finally, in addition to simple mathematical operators like subtraction and division, you can ask R to make comparisons like greater than, `>`, less than, `<`, and equality, `==`. For example, we can ask if boys outnumber girls in each year with the expression


```{r boys-more-than-girls}
arbuthnot <- arbuthnot %>%
  mutate(more_boys = boys > girls)
```

This command adds a new variable to the `arbuthnot` data frame containing the value `TRUE` if that year had more boys than girls, or `FALSE` if that year did not (the answer may surprise you). This variable contains a different kind of data than we have considered so far. All other columns in the `arbuthnot` data frame have numerical values (the year, the number of boys and girls). Here, we've asked R to create *logical* data, data where the values are either `TRUE` or `FALSE`. In general, data analysis will involve many different kinds of data types, and one reason for using R is that it is able to represent and compute with many of them.
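The sketch below shows what logical data look like and a few base R functions that summarize them. The vectors are hypothetical placeholder values chosen so that both `TRUE` and `FALSE` appear; they say nothing about what the real `arbuthnot` data contain.

```r
# Hypothetical vectors, deliberately chosen to give a mix of TRUE and FALSE.
boys  <- c(5218, 5000, 4500)
girls <- c(4683, 4600, 4700)

# A comparison between two vectors is done element by element
# and returns one TRUE or FALSE per position.
more_boys <- boys > girls
more_boys

class(more_boys)  # the data type of the result

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values.
sum(more_boys)

# all() asks whether every value is TRUE; any() asks whether at least one is.
all(more_boys)
any(more_boys)
```

Applying `sum()` or `all()` to a `more_boys` column is one quick way to summarize a comparison across many years without reading every row.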


## Dataset 2: Present birth records


In the previous few pages, you recreated some of the displays and preliminary analysis of Arbuthnot's baptism data. Next you will do a similar analysis, but for present day birth records in the United States. Load up the present day data with the following command.


```{r load-present-data}
data(present)
```

The data are stored in a data frame called `present` which should now be loaded in your workspace.


4. How many variables are included in this data set?
<ol>
<li> 2 </li>
<li> 3 </li>
<li> 4 </li>
<li> 74 </li>
<li> 2013 </li>
</ol>


```{r variables-in-present}
# type your code for Question 4 here, and Knit

```

<div id="exercise"> **Exercise**: What years are included in this dataset? **Hint:** Use the `range` function and `present$year` as its argument. </div>


```{r years-in-present-data}
# type your code for Exercise here, and Knit

```

5. Calculate the total number of births for each year and store these values in a new variable called `total` in the `present` dataset. Then, calculate the proportion of boys born each year and store these values in a new variable called `prop_boys` in the same dataset. Plot these values over time and based on the plot determine if the following statement is true or false: The proportion of boys born in the US has decreased over time.
<ol>
<li> True </li>
<li> False </li>
</ol>


```{r prop-boys-over-time}
# type your code for Question 5 here, and Knit

```

6. Create a new variable called `more_boys` which contains the value `TRUE` if that year had more boys than girls, or `FALSE` if that year did not. Based on this variable, which of the following statements is true?
<ol>
<li> Every year there are more girls born than boys. </li>
<li> Every year there are more boys born than girls. </li>
<li> Half of the years there are more boys born, and the other half more girls born. </li>
</ol>


```{r more-boys-per-year}
# type your code for Question 6 here, and Knit

```

7. Calculate the boy-to-girl ratio each year, and store these values in a new variable called `prop_boy_girl` in the `present` dataset. Plot these values over time. Which of the following best describes the trend?
<ol>
<li> There appears to be no trend in the boy-to-girl ratio from 1940 to 2013. </li>
<li> There is initially an increase in boy-to-girl ratio, which peaks around 1960. After 1960 there is a decrease in the boy-to-girl ratio, but the number begins to increase in the mid 1970s. </li>
<li> There is initially a decrease in the boy-to-girl ratio, and then an increase between 1960 and 1970, followed by a decrease. </li>
<li> The boy-to-girl ratio has increased over time. </li>
<li> There is an initial decrease in the boy-to-girl ratio, but this number appears to level around 1960 and remain constant since then. </li>
</ol>


```{r prop-boy-girl-over-time}
# type your code for Question 7 here, and Knit

```

8. In what year did we see the largest total number of births in the U.S.? *Hint:* Sort your dataset in descending order based on the `total` column. You can do this interactively in the data viewer by clicking on the arrows next to the variable names. Alternatively, arrange the data in descending order with `arrange` and a new function: `desc` (for descending order).
<ol>
<li> 1940 </li>
<li> 1957 </li>
<li> 1961 </li>
<li> 1991 </li>
<li> 2007 </li>
</ol>


```{r most-total-births}
# type your code for Question 8 here
# sample code is provided below, edit as necessary, uncomment, and then Knit
#present %>%
#  mutate(total = ?) %>%
#  arrange(desc(total))
```
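As a rough sketch of how `arrange` and `desc` work together, the example below sorts a tiny hypothetical data frame. The object `toy`, its years, and its totals are made up for illustration and are unrelated to the `present` data.

```r
library(dplyr)

# Hypothetical data frame; the years and totals are made-up values.
toy <- data.frame(
  year  = c(2001, 2002, 2003),
  total = c(30, 50, 40)
)

# arrange() sorts rows by a column; wrapping the column in desc()
# reverses the order so the largest value comes first.
sorted <- toy %>%
  arrange(desc(total))
sorted

# After sorting in descending order, the first row holds the largest total.
sorted$year[1]
```

Without `desc`, `arrange(total)` would sort from smallest to largest instead.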


## Resources for learning R and working in RStudio


That was a short introduction to R and RStudio, but we will provide you with more functions and a more complete sense of the language as the course progresses. You might find the following tips and resources helpful.


- In this course we will be using the `dplyr` (for data wrangling) and `ggplot2` (for data visualization) packages extensively. If you are googling for R code, make sure to also include these package names in your search query. For example, instead of googling "scatterplot in R", google "scatterplot in R with ggplot2".


- The following cheatsheets may come in handy throughout the course. Note that some of the code on these cheatsheets may be too advanced for this course; however, the majority of it will become useful as you progress through the course material.
    - [Data wrangling cheatsheet](http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf)
    - [Data visualization cheatsheet](http://www.rstudio.com/wp-content/uploads/2015/12/ggplot2-cheatsheet-2.0.pdf)
    - [R Markdown](http://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)


- While you will get plenty of exercise working with these packages in the labs of this course, if you would like further opportunities to practice we recommend checking out the relevant courses at [DataCamp](https://www.datacamp.com/courses).


<div id="license"> This is a derivative of an [OpenIntro](https://www.openintro.org/stat/labs.php) lab, and is released under an [Attribution-NonCommercial-ShareAlike 3.0 United States](https://creativecommons.org/licenses/by-nc-sa/3.0/us/) license. </div>