Regression

Regression statistics expand on correlation to allow us to use relationships between variables to make predictions. They provide us with tools to write linear equations which can be used to predict the value of a dependent or criterion variable from the value of one or a set of predictor variables.

Linear Functions

Before we start talking about regression, we are first going to talk about the general topic of linear functions, which some of you may remember from earlier math classes.

Linear Functions Form

The general form of a simple linear function is this equation (Y = a + bX). This equation describes any straight line. The slope of the line is represented by the letter b. The intercept is represented in the equation by the letter a. The value of the intercept, a, is where the line crosses the Y axis. By arbitrarily picking values for X and using this formula, we can determine the corresponding values of Y and therefore draw the line.

Linear Function Example 1

In the blue box at the bottom of the illustration is our equation for a particular line, Y = 2X + 1. From this equation we can determine how the line will appear on a graph. I'll call this graph a "scatterplot" for reasons that we discussed in the lecture on correlation.

Note: In this example, I have reversed the order of the parameters on the right side of the equation, but it is essentially the same. That is, (Y = a + bX) is the same mathematically as (Y = bX + a).

In this example, the slope b = 2 and the intercept a = 1. The next step is to arbitrarily pick at least three X values. For this example I picked the X values of 0, 1, and 2. Using the equation (Y = 2X + 1) I can determine that when X = 0 then Y = 1, when X = 1 then Y = 3, and when X = 2 then Y = 5. Try these out for yourself and perhaps try some other values for X as well. Notice that when you draw a line through these points, the line crosses the Y axis right at the value of 1. The value "1" is the intercept.
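The arithmetic above can be checked with a short Python sketch. The function name `linear` is my own choice for illustration and does not come from the lecture; it simply evaluates Y = a + bX for each chosen X.

```python
def linear(x, a, b):
    """Evaluate Y = a + bX for intercept a and slope b."""
    return a + b * x

# Example 1: slope b = 2, intercept a = 1, with the three X values from the text.
xs = [0, 1, 2]
points = [(x, linear(x, a=1, b=2)) for x in xs]

for x, y in points:
    print(f"X = {x} -> Y = {y}")
```

Adding other X values to `xs` gives more points on the same line. The point at X = 0 always has Y equal to the intercept.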

Notice that high values of X give you high values of Y. Later, we'll see that a positive slope corresponds to the idea of a positive relationship in correlation.

Linear Function Example 2
Here's another example. For this line the equation is Y = .5x - 1. This equation looks a little different because the value of the Y-intercept is -1. That means the line will cross the Y axis at -1. The form of our equation remains the same (Y = bX + a), but if you remember your high school algebra, the equation Y = .5x + (-1) is the same as Y = .5x - 1. This second form is just a simpler way of writing the equation if the value of a is less than 0.

Here I arbitrarily picked the X values of 0, 1, 2, and 3. Then, using the equation Y = .5x - 1, I determined that the associated Y values would be -1, -.5, 0, and .5. Why don't you try this and check my calculations?

Compare Example 1 and Example 2. What is the effect of changing the slope from 2 (in Example 1) to .5 (in Example 2)? The line with the lower value of slope (.5) is less steep than the line with the higher slope.

What is the effect of changing the intercept, a, from 1 to -1? Notice that the line cuts the Y axis in each case exactly at the value of "a."
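The comparison between the two examples can be made concrete with a small Python sketch. The helper names `line_points` and `rise_per_unit` are hypothetical, introduced only for this illustration. The sketch tabulates both lines and reports how much Y changes per one-unit step in X (the slope) and the value of Y at X = 0 (the intercept).

```python
def line_points(a, b, xs):
    """Return (X, Y) pairs for the line Y = a + bX."""
    return [(x, a + b * x) for x in xs]

def rise_per_unit(points):
    """Change in Y per one-unit change in X, taken from the first two points."""
    (x0, y0), (x1, y1) = points[0], points[1]
    return (y1 - y0) / (x1 - x0)

xs = [0, 1, 2, 3]
example_1 = line_points(a=1, b=2, xs=xs)      # Y = 2X + 1
example_2 = line_points(a=-1, b=0.5, xs=xs)   # Y = .5X - 1

for name, pts in [("Example 1", example_1), ("Example 2", example_2)]:
    y_at_zero = pts[0][1]  # xs starts at 0, so this is the intercept
    print(name, "points:", pts)
    print(name, "rise per unit of X:", rise_per_unit(pts), "| Y at X = 0:", y_at_zero)
```

The steeper line is the one whose Y climbs more for each step in X, and each line meets the Y axis at its own value of a.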

Linear Function Example 3
For this line the equation is Y = -1.5x + 2. Here we have a negative value (-1.5) for the slope and the intercept has a value of 2. I picked the X values of 0, 1, and 2 again and then used the equation to determine the Y values.

Compare: What is the effect of making the slope negative? The line slopes the other way. Notice that high values of X now give you low values of Y. This is always true when the slope is negative. We'll see later that a negative slope corresponds to the idea of a negative relationship in correlation.
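As a quick check on Example 3, here is a minimal Python sketch that works out the Y values for the three chosen X values and confirms that Y falls as X rises. The variable names are mine, chosen for illustration.

```python
# Example 3: Y = -1.5X + 2, so slope b = -1.5 and intercept a = 2.
a, b = 2, -1.5
xs = [0, 1, 2]
ys = [a + b * x for x in xs]

for x, y in zip(xs, ys):
    print(f"X = {x} -> Y = {y}")

# With a negative slope, each larger X gives a smaller Y.
decreasing = all(later < earlier for earlier, later in zip(ys, ys[1:]))
print("Y falls as X rises:", decreasing)
```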

When the Slope (b) = 0 -- No Relationship
What does the line look like if the slope is 0? It will always look like a flat horizontal line. In this example, regardless of what value you put in for X, Y will always equal 3.

Later, we'll see that a slope of 0 corresponds to the idea of no relationship in correlation.
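The three cases covered so far (positive, negative, and zero slope) can be summarized in a few lines of Python. The function name `relationship` is a hypothetical label for this sketch. The second half shows that with b = 0 and a = 3, Y stays at 3 for any X we try.

```python
def relationship(b):
    """Name the direction of the line from the sign of its slope b."""
    if b > 0:
        return "positive"
    if b < 0:
        return "negative"
    return "none"

# Slope 0 with intercept 3: Y = 3 + 0X, so Y is 3 whatever X is.
a, b = 3, 0
ys = [a + b * x for x in [-10, 0, 1, 2, 100]]
print(ys)
print(relationship(2), relationship(-1.5), relationship(0))
```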

Y = X
Y = 0 + 1x or simply Y = X is the line that starts at the origin (0, 0) and goes up at a 45 degree angle. By picking the X values of 0, 1, and 2 and using the formula you can determine that the Y values are also 0, 1, and 2, respectively.

Regression Line

In statistics, when we want to predict or estimate one variable, Y, from a second variable, X, we use a procedure called "regression." The "regression line" is the linear function we use to make this prediction. If that doesn't make much sense to you at this point, it's OK. We'll spend a lot of time learning this concept.

NOTATION: When we talk about predicted or estimated values of variable Y, we generally use some symbol like a Y with a little caret or a little hat on the top of it (^), or we use Y prime (Y'). In this class, we will use Y' because it's a lot easier to type an apostrophe for prime than it is to draw one of those little hats in HTML at this time. But in statistics books you'll see several different notations.

In words we'll say "Y prime is equal to a plus bX." In symbols, we'll write Y' = a + bX.

Obviously (except for the prime) Y' = a + bX is very similar to the linear function that we just reviewed.

When you are predicting or estimating values of Y from X, Y is called the criterion variable, and X is called the predictor variable. The criterion variable is often called the dependent variable.

Cigarettes and Health Example
For the purposes of our lecture today we are going to use a made-up example which examines the relationship between cigarettes and health. So, Y might be the number of health problems experienced by an individual between the ages of 65 and 70, and X might be the number of cigarettes he or she smoked per day from the age of 20 until the age of 50. We want to predict Y from X; that is, we want to estimate the number of health problems later in life from the number of cigarettes smoked earlier in life. In statistical jargon, we will find the regression line, Y' = a + bX.
Health Problems and Smoking
Measurement Operations: Translating a life into the number of cigarettes smoked per day. There are two general methodologies used in such studies. In a RETROSPECTIVE STUDY, we would ask research participants to review their life and report how many cigarettes they smoked per day between the ages of 20 and 50. In a PROSPECTIVE STUDY, we would track people across their lifetime, asking them to record the number of cigarettes they smoke per day. The retrospective study could be done in a few months. The prospective study would take many years. The data from a prospective study is much higher quality because it doesn't rely on the subjects' memory. Either way, the number of cigarettes smoked per day is X; it will be our predictor variable.

Then we're going to predict the number of health problems a participant has between the ages of 65 and 70 from how much they smoked.

Let's say we do a retrospective study. We examine the medical records of participants when they were between 65 and 70 years old, counting the number of health problems they had. Then we give them a questionnaire on how much they've smoked at different times in their life. We want to predict health problems from smoking rate.

Regression Line

Let's say that we're going to have a tiny little sample. Usually there are thousands of people in such studies, but we're just going to have a few so that our calculations will be simple. The data is made up.

On the illustration, the data is ordered in the table on the left from the lowest X value to the highest; that is, it goes from the smallest number of cigarettes smoked to the largest. So the person who smoked one cigarette per day had three health problems, the person who smoked two packs per day had ten health problems, and so on.

The table contains the data of individual participants and we've measured two things about each of them. In the general scheme of methodology, a regression study is still a correlational study.

Next we're going to draw a scatterplot. Each dot on the scatterplot represents the data of one person. Perhaps by now you've got enough experience with scatterplots from studying correlations to know that this scatterplot shows a positive relationship: the more smoking, the more health problems. The scatterplot shows a pretty high correlation.

We have a scatterplot; but the question is how do we find the linear regression line? How can we draw a line that goes as close as possible to all the points on the graph?

Regression Line

Describing the relationship between X and Y. Little r is one descriptive statistic which can summarize the relationship between cigarettes and health problems in this data. You've studied correlation and know how to calculate r.
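As a reminder of how r summarizes a set of paired scores, here is a minimal Python sketch. The data are hypothetical, invented for this illustration; they are not the values in the lecture's table. The function uses the usual deviation-score formula for Pearson's r, which may look different from the computational formula used in the lecture on correlation but gives the same answer.

```python
import math

# Hypothetical data, invented for illustration (not the lecture's table).
cigarettes = [1, 10, 20, 30, 40]   # X: cigarettes smoked per day
problems = [3, 4, 6, 8, 10]        # Y: number of health problems

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-products of deviations over the
    square root of the product of the two sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(cigarettes, problems)
print(f"r = {r:.3f}")
```

For these made-up numbers r comes out close to +1, which matches a scatterplot where the dots lie near a rising straight line.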

There's another descriptive statistic called the regression line. The regression line is the best straight line that we can draw through or between the points on the scatterplot. Obviously a straight line can't connect all the dots, because then you'd have to bounce up and down and up and down from one dot to the next, and it wouldn't be a straight line. So we want to be able to draw a single line that comes as close to all the dots as possible.

Least Squares Principle. We'll have to have a criterion for what we mean by "close." The criterion is called the least squares principle. Recall our discussion of variance. We showed how the variance squares the deviations around the mean. In regression we will square the deviations around the regression line instead of around the mean. The best-fit regression line is the line that has the smallest value for the sum of the squared deviations around it, the least squared deviations. That's essentially the whole idea of least squares. But we'll talk about it more later after you are more familiar with t
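To make the least squares idea concrete, here is a minimal Python sketch. The data are hypothetical, invented for this illustration; they are not the values in the lecture's table. The slope and intercept come from the standard closed-form least squares formulas, which are not given in the text above, so treat them as an assumption of this sketch. The function names are also mine.

```python
# Hypothetical data, invented for illustration (not the lecture's table).
xs = [1, 10, 20, 30, 40]   # X: cigarettes smoked per day
ys = [3, 4, 6, 8, 10]      # Y: number of health problems

def sum_squared_deviations(a, b, xs, ys):
    """Sum of squared vertical distances between each Y and the line a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Standard closed-form intercept and slope of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = least_squares_line(xs, ys)
best = sum_squared_deviations(a, b, xs, ys)
print(f"least squares line: Y' = {a:.3f} + {b:.3f}X, squared deviations = {best:.3f}")

# Any other line, such as these nudged versions, has a larger total.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.02), (0.0, -0.02)]:
    other = sum_squared_deviations(a + da, b + db, xs, ys)
    print(f"a{da:+.2f}, b{db:+.2f}: squared deviations = {other:.3f}")
```

Nudging either the intercept or the slope away from the fitted values always raises the sum of squared deviations, which is exactly what "least squares" means.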


0/5000

Từ: -

Sang: -

Kết quả (Việt) 1: [Sao chép]

Sao chép!

RegressionClick above to start an Interactive Visual Presentation (Plugin Required)Click here to go to our plugin download and plugin tutorial pageRegression statistics expand on correlation to allow us to use relationships between variables to make predictions. They provide us with tools to write linear equations which can be used to predict the value of a dependent or criterion variable from the value of one or a set of predictor variables.Linear FunctionsBefore we start talking about regression we are first going to talk about the general topic of linear functions which some of you may remember from earlier math classes.Linear Functions FormThe general form of a simple linear function is this equation (Y = a + bX). This equation describes any straight line. The slope of the line is represented by the letter b. The intercept is represented in the equation by the letter a. The value of the intercept, a, is where the line crosses the Y axis. By arbitrarily picking a value for X and using this formula we can determine the the values of Y and therefore draw the line.Linear Function Example 1In the blue box at the bottom of the illustration is our equation for a particular line, Y = 2X + 1. From this equation we can determine how the line will appear on a graph. I'll call this graph a "scatterplot" for reasons that we discussed in the lecture on correlation.Note: In this example, I have reversed the order of the parameters on the right side of the equation, but it is essentially the same. That is, (Y = a + bX) is the same mathematically as (Y = bX + a).In this example, the slope b = 2 and the intercept a = 1. The next step is to arbitrarily pick at least three X values. For this example I picked the X values of 0, 1, and 2. Using the equation (Y = 2x +1) I can determine that when X = 0 then Y = 1. When X = 1 then Y = 3 and when X = 2 then Y = 5. Try these out for yourself and perhaps try some other values for X as well. 
Notice that when you draw a line through these points the line crosses the Y axis right at the value of 1. The value "1" is the intercept.Notice that high values of X give you high values of Y. Later, we'll see that a positive slope corresponds to the idea of a positive relationship in correlation.Linear Function Example 2Here's another example. For this line the equation is Y = .5x - 1. This equation looks a little different because the value of the Y-intercept is -1. That means the line will cross the Y axis at -1. The form of our equation remains the same (Y = bX + a) but if you remember your high school algebra the equation Y = .5x + (-1) is the same as Y = .5x -1. This second form is just a simpler way of writing the equation if the value of a is less than 0.Here I arbitrarily picked the X values of 0, 1, 2, and 3. Then using the equation Y = .5x -1, I determined that the associated Y values would be -1. -.5, 0, and .5. Why don't you try this and check my calculations.Hãy so sánh các ví dụ 1 và ví dụ 2. Tác dụng của việc thay đổi độ dốc từ 2 (trong ví dụ 1 để.5 trong ví dụ 2) là gì? Phù hợp với giá trị thấp hơn của độ dốc (. 5) là ít dốc hơn dòng với độ dốc cao.Những gì là tác dụng của việc thay đổi đánh chặn, một, từ 1 đến -1? Thông báo rằng dòng cắt giảm các trục Y trong mỗi trường hợp chính xác tại giá trị của "một." Chức năng tuyến tính ví dụ 3Đối với dòng này phương trình là Y = - 1.5 x + 2. Ở đây chúng tôi có một giá trị tiêu cực (-1,5) cho dốc và đánh chặn có giá trị bằng 2. Tôi chọn các giá trị X 0, 1 và 2 một lần nữa và sau đó sử dụng phương trình để xác định giá trị Y.So sánh: Hiệu quả thực hiện dốc phủ định là gì? Đường dốc ngược. Thông báo rằng cao giá trị của X bây giờ cung cấp cho bạn thấp giá trị của Y. Điều này cũng đúng khi dốc là tiêu cực. Chúng ta sẽ thấy sau một sườn tiêu cực tương ứng với ý tưởng của một mối quan hệ tiêu cực trong mối tương quan. Khi dốc (b) = 0 - không có mối quan hệNhững gì nhìn dòng như thế nào nếu dốc là 0? 
Nó sẽ luôn luôn trông giống như một đường ngang bằng phẳng. Trong ví dụ này, bất kể những gì giá trị mà bạn đặt vào với X, Y sẽ luôn luôn bằng 3.Sau đó, chúng ta sẽ thấy rằng độ dốc 0 tương ứng với ý tưởng không có mối quan hệ trong mối tương quan.Y = XY = 0 + 1 x hoặc chỉ đơn giản là Y = X là dòng mà bắt đầu nguồn gốc (0, 0) và đi lên ở một góc 45 độ. Bằng cách chọn các giá trị X 0, 1 và 2 và sử dụng công thức bạn có thể xác định rằng các giá trị Y cũng 0, 1 và 2, tương ứng.Regression LineIn statistics, when we want to predict or estimate one variable, Y, from a second variable, X, we use a procedure called "regression." The "regression line" is the linear function we use to make this prediction. If that doesn't make much sense to you at this point, it's OK. We'll spend a lot of time learning this concept.NOTATION: When we talk about predicted or estimated values of variable Y, we generally use some symbol like a Y with a little caret or a little hat on the top of it (^), or we use Y prime (Y'). In this class, we will use Y' because it's a lot easier to type an apostrophe for prime than it is to draw one of those little hats in HTML at this time. But in statistics books you'll see several different notations.In words we'll say "Y prime is equal to a plus bX." In symbols, we'll write Y' = a + bX.Obviously (except for the prime) Y' = a + bX is very similar to the linear function that we just reviewed.When you are predicting or estimating values of Y from X, Y is called the criterion variable, and X is called the predictor variable. The criterion variable is often called the dependent variable. Cigarettes and Health ExampleCho các mục đích của bài thuyết trình của chúng tôi vào ngày hôm nay chúng ta sẽ sử dụng một ví dụ thực hiện lên mà kiểm tra mối quan hệ giữa thuốc lá và sức khỏe. Vì vậy, Y có thể là số lượng các vấn đề sức khỏe của một cá nhân tuổi từ 65 và 70; và X có thể là số lượng thuốc lá Anh ta hoặc cô ấy hút thuốc mỗi ngày từ tuổi 20 đến 50 tuổi. 
Chúng tôi muốn để dự đoán Y từ X, đó là chúng tôi muốn để ước tính số lượng các vấn đề sức khỏe sau này trong cuộc sống từ số lượng thuốc lá hút thuốc trước đó trong cuộc sống. Trong thống kê biệt ngữ, chúng tôi sẽ tìm thấy dòng hồi qui, Y' = một + bXThuốc và vấn đề sức khỏeĐo lường hoạt động: Dịch cuộc sống vào số lượng thuốc lá hút thuốc cho một ngày. Không có hai phương pháp tổng hợp được sử dụng trong các nghiên cứu như vậy. Trong một nghiên cứu quá khứ, chúng tôi sẽ yêu cầu người tham gia nghiên cứu để xem xét cuộc sống của họ và báo cáo thuốc lá bao nhiêu họ hút thuốc một ngày tuổi từ 20 và 50. Trong một nghiên cứu tương lai, chúng tôi sẽ theo dõi người qua đời sống của họ, yêu cầu họ để ghi lại số lượng thuốc lá họ hút thuốc cho một ngày. Việc nghiên cứu quá khứ có thể được thực hiện trong một vài tháng. Nghiên cứu tương lai sẽ phải mất nhiều năm. Dữ liệu từ một nghiên cứu tương lai là nhiều chất lượng cao hơn bởi vì nó không dựa vào các đối tượng bộ nhớ. Dù bằng cách nào, số lượng thuốc lá hút thuốc một ngày là X; nó sẽ là thay đổi dự báo của chúng tôi.Sau đó chúng tôi sẽ dự đoán số lượng các vấn đề sức khỏe một người tham gia có tuổi từ 65 và 70 từ bao nhiêu họ hút thuốc.Let's say we do a retrospective study. We examine the medical records of participants when they were between 65 and 70 years old, counting the number of health problems they had. Then we give them a questionnaire on how much they've smoked at different times in their life. We want to predict health problems from smoking rate.Regression LineLet's say that we're going to have a tiny little sample, usually there are thousands of people in such studies, but we're just going to have a few so that our calculations will be simple. The data is made up.On the illustration the data is ordered in the table on the left from the lowest X value to the highest; that is, it goes from the least number of cigarettes smoked to the highest. 
So the person who smoked one per day had three health problems; the person who smoked two packs had ten health problems, and so on.The table contains the data of individual participants and we've measured two things about each of them. In the general scheme of methodology, a regression study is still a correlational study.Next we're going to draw a scatterplot. Each dot on the scatterplot represents the data of one person. Perhaps by now you've got enough experience with scatterplots from studying correlations to know that this scatterplot shows a positive relationship. The more smoking the more health problems. The scatterplot shows a pretty high correlation.We have a scatterplot; but the question is how do we find the linear regression line? How can we draw a line that goes as close as possible to all the points on the graph?

Regression Line

Describing the relationship between X and Y. Little r is one descriptive statistic which can summarize the relationship between cigarettes and health problems in this data. You've studied correlation and know how to calculate r.

There's another descriptive statistic called the regression line. The regression line is the best straight line that we can draw through or between the points on the scatterplot. Obviously a straight line can't connect the all dots because then you'd have to bounce up and down and up and down from one dot to the next, and it wouldn't be a straight line. So we want to be able to draw a single line that comes as close to all the dots as possible.

Least Squares Principle. We'll have to have a criteria for what we mean by "close." The criteria is called the least squares principle. Recall back to our discusssion of Variance. We showed how the variance squares the deviations around the mean. In Regression we will square the deviations around the regression line instead of around the mean. The best fit regression line is the line that has the smallest value for the squared deviations around it, the least squared deviations. That's essentially the whole idea of least squares. But we'll talk about it more later after you are more familiar with t

đang được dịch, vui lòng đợi..

Kết quả (Việt) 2:[Sao chép]

Sao chép!

Regression Click vào đây để bắt đầu một thuyết trình Visual Interactive (Plugin buộc) Bấm vào đây để đến Plugin tải về và cắm trang hướng dẫn của chúng tôi hồi quy thống kê mở rộng về mối tương quan để cho phép chúng ta sử dụng các mối quan hệ giữa các biến để đưa ra dự đoán. Họ cung cấp cho chúng tôi với các công cụ để viết phương trình tuyến tính mà có thể được sử dụng để dự đoán giá trị của một biến phụ thuộc hoặc có chỉ tiêu từ giá trị của một hoặc một tập hợp các biến dự báo. Chức năng tuyến tính Trước khi chúng tôi bắt đầu nói về hồi quy chúng ta đầu tiên sẽ nói về chủ đề chung của các hàm tuyến tính mà một số bạn có thể nhớ từ lớp toán trước đó. Chức năng tuyến Mẫu Hình thức tổng quát của một hàm tuyến tính đơn giản là phương trình này (Y = a + bX). Phương trình này mô tả bất kỳ đường thẳng. Độ dốc của đường được đại diện bởi chữ b. Các đánh chặn được đại diện trong phương trình bằng chữ một. Giá trị của các đánh chặn, một, là nơi dòng đi qua các trục Y. By tùy tiện chọn một giá trị cho X và sử dụng công thức này, chúng ta có thể xác định các giá trị của Y và do đó vẽ đường. Tuyến tính Chức năng Ví dụ 1 Trong hộp màu xanh ở dưới cùng của hình minh họa là phương trình của chúng tôi cho một dòng cụ thể, Y = 2X + 1. Từ phương trình này chúng ta có thể xác định cách dòng sẽ xuất hiện trên đồ thị. Tôi sẽ gọi đồ thị này là một "phân tán" vì những lý do mà chúng ta đã thảo luận trong bài thuyết trình về tương quan. Lưu ý: Trong ví dụ này, tôi đã đảo ngược thứ tự của các tham số ở phía bên phải của phương trình, nhưng nó là cơ bản giống nhau. Đó là, (Y = a + bX) là giống như toán học (Y = bX + a). Trong ví dụ này, độ dốc b = 2 và trên trục a = 1. Bước tiếp theo là tùy tiện chọn ít nhất ba X giá trị. Đối với ví dụ này tôi chọn giá trị X = 0, 1, và 2. Sử dụng phương trình (Y = 2x +1) tôi có thể xác định rằng khi X = 0 thì Y = 1. Khi X = 1 thì Y = 3 và khi X = 2 thì Y = 5. Hãy thử những hiểu cho chính mình và có thể thử một số giá trị khác cho X là tốt. 
Notice that when you draw a line through these points, the line crosses the Y axis right at the value of 1. The value "1" is the intercept. Notice also that high values of X give you high values of Y. Later we will see that a positive slope corresponds to the idea of a positive relationship in correlation.

Linear Function Example 2

Here is another example. The equation for this line is Y = .5X - 1. This equation is slightly different because the value of the Y intercept is -1. That means that the line will cross the Y axis at -1. The form of our equation remains the same (Y = bX + a), but if you remember high school algebra, the equation Y = .5X + (-1) is the same as Y = .5X - 1. This second form is just a simpler way of writing the equation when the value of a is less than 0. Here I arbitrarily picked the X values 0, 1, 2, and 3. Then, using the equation Y = .5X - 1, I determined that the associated Y values are -1, -.5, 0, and .5. Why don't you try this and check my calculations?

Compare Example 1 and Example 2. What is the effect of changing the slope from 2 (in Example 1) to .5 (in Example 2)? The line with the lower slope (.5) is less steep than the line with the higher slope. What is the effect of changing the intercept, a, from 1 to -1? Notice that the line crosses the Y axis in each case exactly at the value of "a."

Linear Function Example 3

The equation for this line is Y = -1.5X + 2. Here we have a negative value (-1.5) for the slope, and the intercept has a value of 2. I again picked the X values 0, 1, and 2 and then used the equation to determine the Y values.

Compare: what is the effect of making the slope negative? The line slopes the other way. Notice that high values of X now give you low values of Y. This is true whenever the slope is negative. We will see later that a negative slope corresponds to the idea of a negative relationship in correlation.
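To check the calculations for the three examples side by side, a short Python sketch such as the following may help. The function name `line_points` is an illustrative assumption; the slopes, intercepts, and X values are taken from the examples.

```python
def line_points(a, b, xs):
    """Return the (X, Y) pairs that lie on the line Y = a + bX."""
    return [(x, a + b * x) for x in xs]


example_1 = line_points(a=1, b=2, xs=[0, 1, 2])        # Y = 2X + 1
example_2 = line_points(a=-1, b=0.5, xs=[0, 1, 2, 3])  # Y = .5X - 1
example_3 = line_points(a=2, b=-1.5, xs=[0, 1, 2])     # Y = -1.5X + 2

for name, points in [("Example 1", example_1),
                     ("Example 2", example_2),
                     ("Example 3", example_3)]:
    print(name, points)
```

In each list the first pair has X = 0, so its Y value is the intercept. The Y values rise with X in Examples 1 and 2 (positive slopes) and fall in Example 3 (negative slope).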
When Slope (b) = 0 - No Relationship

What does the line look like if the slope is 0? It will always look like a flat horizontal line. In this example, no matter what value you put in for X, Y will always equal 3. Later we will see that a slope of 0 corresponds to the idea of no relationship in correlation.

Y = X

Y = 0 + 1X, or simply Y = X, is the line that passes through the origin (0, 0) and goes up at a 45 degree angle when both axes use the same scale. By picking the X values 0, 1, and 2 and using the formula, you can determine that the Y values are also 0, 1, and 2, respectively.

Regression Line

In statistics, when we want to predict or estimate one variable, Y, from a second variable, X, we use a procedure called "regression." The "regression line" is the linear function we use to make this prediction. If that does not make much sense to you at this point, that is OK. We will spend a lot of time learning this concept.

Notation: When we talk about predicted or estimated values of the variable Y, we usually use a symbol such as a Y with a small caret or little hat on top of it (^), or we use Y prime (Y'). In this class we will use Y' because it is a lot easier to type an apostrophe for the prime than to draw one of those little hats in HTML. In statistics books you will see various notations. In words we will say "Y prime equals a plus bX." In symbols, we will write Y' = a + bX. Clearly, except for the prime, Y' = a + bX is very similar to the linear functions that we just looked at.

When you are predicting or estimating the value of Y from X, Y is called the criterion variable and X is called the predictor variable. The criterion variable is often called the dependent variable.

Cigarettes and Health Example

For the purposes of our lecture today, we will use a made-up example that looks at the relationship between cigarettes and health.
So Y might be the number of health problems experienced by an individual between the ages of 65 and 70, and X might be the number of cigarettes he or she smoked per day from age 20 to age 50. We want to predict Y from X; that is, we want to estimate the number of health problems later in life from the number of cigarettes smoked earlier in life. In statistical terms, we will find the regression line, Y' = a + bX.

Health Problems and Smoking

Operations of Measurement: Translating a lifetime into the number of cigarettes smoked per day. There are two general methods used in this kind of research. In a retrospective study, we would ask the participants to look back over their lives and report how many cigarettes they smoked per day between the ages of 20 and 50. In a prospective study, we would follow people over their lifetimes, asking them to record the number of cigarettes they smoke each day. The retrospective study can be done in a few months. The prospective study will take years. The data from a prospective study are of much higher quality because they do not rely on the memory of the subjects.

Either way, the number of cigarettes smoked per day is X; it will be our predictor variable. We will then predict the number of health problems a participant has between the ages of 65 and 70 from how much he or she smoked.

Let's say we do a retrospective study. We look at the medical records of participants when they are between 65 and 70 years old, counting the number of health problems they have had. Then we give them a questionnaire about how much they smoked at various times in their lives. We want to predict health problems from smoking rate.

Regression Line

Let's say we take a very small sample. There are usually thousands of people in such studies, but we are only going to take a few so that our calculations will be simple. The data are made up.
In the illustration, the data are ordered in the table on the left from the lowest to the highest X values; that is, from the smallest number of cigarettes smoked to the largest. So the person who smoked one a day had three health problems, the person who smoked two packs had ten health problems, and so on. The table contains data from individual participants, and we have measured two things about each of them. In the general scheme of methodology, a regression study is still a correlational study.

Next we draw a scatterplot. Each dot on the scatterplot represents one person's data. You probably have enough experience with scatterplots from studying correlation to know that this scatterplot shows a positive relationship: the more smoking, the more health problems. The scatterplot shows a fairly high correlation.

We have a scatterplot, but the question is, how do we find the linear regression line? How can we draw a line that comes as close as possible to all the points on the graph?

Regression Line Describes the Relationship between X and Y

Little r is a descriptive statistic that can summarize the relationship between cigarettes and health problems in these data. You have studied correlation and know how to compute r. There is another descriptive statistic called the regression line. The regression line is the best-fitting straight line that we can draw through or among the points on the scatterplot. Obviously a straight line cannot connect all the dots, because then it would have to go up and down, up and down, from one dot to the next, and it would not be a straight line. So we want to draw a single line that comes as close to all the dots as possible.

Least Squares Principle

We need a criterion for what we mean by "close." The criterion is called the least squares principle. Recall our discussion of variance.
We showed how to square the deviations around the mean. In regression, we square the deviations around the regression line instead of around the mean. The best-fitting regression line is the line that has the smallest sum of squared deviations around it, the least squared deviations. That is basically the whole idea of least squares. But we will talk about it more later, after you have become more familiar with t
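The least squares principle described above can be sketched in Python. Several things here are assumptions rather than part of the lecture: the data values are invented purely for illustration (they are not the values in the lecture's table), the function names are hypothetical, and the slope and intercept are computed with the standard closed-form least squares solution for a single predictor, which the lecture has not introduced at this point.

```python
def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared deviations of the points around the line Y' = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) for the line Y' = a + bX with the smallest sum of
    squared deviations, using the standard one-predictor formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data, invented for this sketch only.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 2, 5, 4]

a, b = least_squares_line(xs, ys)
best = sum_squared_deviations(xs, ys, a, b)

# Any other line, such as Y' = 1 + 1X, should not do better than the fit.
other = sum_squared_deviations(xs, ys, a=1, b=1)

print(f"fitted line: Y' = {a:.2f} + {b:.2f}X")
print(f"sum of squared deviations, fitted line: {best:.2f}")
print(f"sum of squared deviations, Y' = 1 + 1X: {other:.2f}")
```

Trying other values of `a` and `b` in `sum_squared_deviations` is one way to see that no straight line gives a smaller total than the fitted one for these points.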
