Andri Faisal R Stat: How to Construct a Panel Dataset from Scratch in R

Managing panel data in RStudio to estimate regression equations and determine the effect of independent variables on dependent variables.

One way to determine how independent variables influence a dependent variable is to use panel data. Estimating that influence lets us predict the dependent variable. Panel data combines two dimensions: the same individuals (here, companies) observed over several time periods.

Creating a panel data structure

R, used here through RStudio, differs from some other statistical software. Before any analysis, the data must be stored in a data structure that R recognizes. Other software lets you copy and paste spreadsheet data, whether from Excel or Google Sheets, and work with it immediately; in RStudio the data must first be imported and converted into such a structure.

The steps are to import the spreadsheet file and make a few relatively simple adjustments so the data is easier to process. For panel analysis specifically, the required structure is pdata.frame, short for panel data frame, which is provided by the plm package. It differs from a regular data frame because it records both the individual and the time dimension of each observation.
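To make the difference concrete, here is a minimal sketch that assumes the plm package is installed. The toy data frame and its column names (Comp, Year, DAR) are hypothetical and only illustrate the idea.

```r
library(plm)  # provides pdata.frame()

# Hypothetical toy data: 2 companies observed over 3 years
toy <- data.frame(
  Comp = rep(c("A", "B"), each = 3),
  Year = rep(2014:2016, times = 2),
  DAR  = c(0.49, 0.44, 0.42, 0.51, 0.47, 0.45)
)

# Tell plm which column is the individual and which is the time dimension
ptoy <- pdata.frame(toy, index = c("Comp", "Year"))

class(toy)   # a regular data frame
class(ptoy)  # a pdata.frame that is still also a data.frame
```

The only extra information supplied is the pair of index columns; everything else is the same table.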

On the right, in the Environment pane, click "Import Dataset" and select Excel. Other options include SPSS, SAS, Stata, and text files. If your data is in a spreadsheet, select Excel.

Next, choose the sheet. If the file contains several sheets, you must select one of them in the import dialog. You also need to keep the spreadsheet tidy. For example, if there is a gap between the table title and the data, the empty cells are imported as "NA" (Not Available), meaning the data is missing.
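The same import can be done in code. The sketch below assumes the readxl package is installed and uses the example workbook that ships with it, not the author's file; the sheet name "mtcars" belongs to that example workbook.

```r
library(readxl)

# Example workbook bundled with readxl (stands in for your own file)
path <- readxl_example("datasets.xlsx")

# List the sheets, then read exactly one of them
sheets <- excel_sheets(path)
sheets

cars <- read_excel(path, sheet = "mtcars")
head(cars)

# If a title and a blank row sit above the header in your own file,
# the skip argument drops them, e.g. read_excel(path, sheet = 1, skip = 2)
```

Skipping the rows above the header is one way to avoid the NA cells caused by a gap between the title and the data.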

Preparing Excel as Data

It is easier to organize the data in a spreadsheet first. Arrange it so that each row is one company in one year, with a column for the company, a column for the year, and one column per variable.
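As a rough illustration of that layout, the sketch below builds an empty skeleton in base R. The two company names and the years 2014 to 2018 are taken from the output later in this post; the NA values are placeholders to be filled with real figures.

```r
# One row per company and year (long format)
companies <- c("Adaro", "ATPK")
years <- 2014:2018

# expand.grid varies the first argument fastest, so years cycle within each company
layout <- expand.grid(Year = years, Comp = companies, stringsAsFactors = FALSE)
layout <- layout[, c("Comp", "Year")]

# Placeholder columns for the variables
layout$DAR <- NA_real_
layout$DER <- NA_real_
layout$Tobin.Q <- NA_real_

layout
```

Each company-year pair appears exactly once, which is what the panel structure relies on.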

I saved the data in CSV format and loaded it into RStudio. Note that read.csv2 expects semicolon-separated fields and a decimal comma:

tobinq3 <- read.csv2("~/jurnal/tobinq3.csv")
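If you are unsure whether your file suits read.csv2, the following sketch shows what it expects. The temporary file and its three columns are hypothetical stand-ins for the real CSV.

```r
# Write a small hypothetical file: semicolons between fields, comma as decimal mark
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Comp;Year;DAR",
  "Adaro;2014;0,49",
  "Adaro;2015;0,44"
), tmp)

dat <- read.csv2(tmp)

# DAR should come back as a number, not as text
str(dat)
```

If a numeric column is read as text instead, the file probably uses a different separator or decimal mark, and read.csv may be the better fit.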

Then I can view the data like this:

View(tobinq3)

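The data is not yet a pdata.frame, so we convert it with pdata.frame() from the plm package. The index argument names the company column and the year column; in this file the year column is called Tahun (Indonesian for "year"), as the output below shows:

library(plm)

ptobinq <- pdata.frame(tobinq3, index = c("Comp", "Tahun"), drop.index = TRUE, row.names = TRUE)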
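Before relying on the converted object, it can help to check that the panel is well formed. This is a sketch on hypothetical toy data, assuming the plm package; pdim() and is.pbalanced() are plm functions.

```r
library(plm)

# Hypothetical toy data: 2 companies, 3 years each
toy <- data.frame(
  Comp = rep(c("A", "B"), each = 3),
  Year = rep(2014:2016, times = 2),
  DAR  = c(0.49, 0.44, 0.42, 0.51, 0.47, 0.45)
)

# Each company-year pair should appear exactly once
stopifnot(anyDuplicated(toy[, c("Comp", "Year")]) == 0)

ptoy <- pdata.frame(toy, index = c("Comp", "Year"),
                    drop.index = TRUE, row.names = TRUE)

pdim(ptoy)          # number of individuals, time periods, and observations
is.pbalanced(ptoy)  # TRUE when every company has every year
```

An unbalanced panel is not an error, but it is worth knowing about before estimating anything.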

ptobinq is simply a name I chose to distinguish the panel version from the original data. The data structure is now a panel data frame. Inspecting it with str(ptobinq) gives output like this:

Classes ‘pdata.frame’ and 'data.frame': 40 obs. of 3 variables:
 $ DAR    : 'pseries' Named num 0.49 0.44 0.42 0.4 0.4 0.49 0.44 0.42 0.4 0.4 ...
  ..- attr(*, "names")= chr [1:40] "Adaro-2014" "Adaro-2015" "Adaro-2016" "Adaro-2017" ...
  ..- attr(*, "index")=Classes ‘pindex’ and 'data.frame': 40 obs. of 2 variables:
  .. ..$ Comp : Factor w/ 8 levels "Adaro","ATPK",..: 1 1 1 1 1 2 2 2 2 2 ...
  .. ..$ Tahun: Factor w/ 5 levels "2014","2015",..: 1 2 3 4 5 1 2 3 4 5 ...
 $ DER    : 'pseries' Named num 0.97 0.78 0.72 0.67 0.66 0.97 0.78 0.72 0.67 0.66 ...
  ..- attr(*, "names")= chr [1:40] "Adaro-2014" "Adaro-2015" "Adaro-2016" "Adaro-2017" ...
  ..- attr(*, "index")=Classes ‘pindex’ and 'data.frame': 40 obs. of 2 variables:
  .. ..$ Comp : Factor w/ 8 levels "Adaro","ATPK",..: 1 1 1 1 1 2 2 2 2 2 ...
  .. ..$ Tahun: Factor w/ 5 levels "2014","2015",..: 1 2 3 4 5 1 2 3 4 5 ...
 $ Tobin.Q: 'pseries' Named num -0.2702 0.2346 0.2706 0.034 0.0336 ...
  ..- attr(*, "names")= chr [1:40] "Adaro-2014" "Adaro-2015" "Adaro-2016" "Adaro-2017" ...
  ..- attr(*, "index")=Classes ‘pindex’ and 'data.frame': 40 obs. of 2 variables:
  .. ..$ Comp : Factor w/ 8 levels "Adaro","ATPK",..: 1 1 1 1 1 2 2 2 2 2 ...
  .. ..$ Tahun: Factor w/ 5 levels "2014","2015",..: 1 2 3 4 5 1 2 3 4 5 ...
 - attr(*, "index")=Classes ‘pindex’ and 'data.frame': 40 obs. of 2 variables:
  ..$ Comp : Factor w/ 8 levels "Adaro","ATPK",..: 1 1 1 1 1 2 2 2 2 2 ...
  ..$ Tahun: Factor w/ 5 levels "2014","2015",..: 1 2 3 4 5 1 2 3 4 5 ...

The output shows the class pdata.frame, followed by the index made up of the company (Comp) and the year (Tahun). The data is now ready for panel data analysis.
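As a sketch of what that analysis might look like, the example below fits a panel regression with plm(). The numbers are simulated, not the author's data, and the choice of a fixed-effects ("within") model is my assumption; the post itself does not say which model is used.

```r
library(plm)

set.seed(1)  # simulated numbers for illustration only

# 8 hypothetical companies observed from 2014 to 2018
sim <- data.frame(
  Comp = rep(paste0("C", 1:8), each = 5),
  Year = rep(2014:2018, times = 8),
  DAR  = runif(40),
  DER  = runif(40)
)
sim$Tobin.Q <- 0.5 * sim$DAR - 0.2 * sim$DER + rnorm(40, sd = 0.1)

psim <- pdata.frame(sim, index = c("Comp", "Year"))

# Fixed-effects ("within") regression of Tobin's Q on the two ratios
fe <- plm(Tobin.Q ~ DAR + DER, data = psim, model = "within")
summary(fe)
```

Changing the model argument to "random" or "pooling" would give the other common panel estimators on the same object.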

We can view the first rows of the data with the head command.

> head(ptobinq)
            DAR  DER     Tobin.Q
Adaro-2014 0.49 0.97 -0.27020301
Adaro-2015 0.44 0.78  0.23455470
Adaro-2016 0.42 0.72  0.27061008
Adaro-2017 0.40 0.67  0.03397098
Adaro-2018 0.40 0.66  0.03363631
ATPK-2014  0.49 0.97  0.30736531

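The data now looks different: company and year are no longer ordinary columns as they were in Excel. Because of drop.index = TRUE, they have moved into the index and appear combined in the row names (for example, Adaro-2014). With the pdata.frame structure, both the time dimension and the company dimension are taken into account.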
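The company and year are still available even though they are no longer columns. The sketch below uses hypothetical toy data and assumes the plm package; index() is the plm function that returns the stored index.

```r
library(plm)

# Hypothetical toy data: 2 companies, 3 years each
toy <- data.frame(
  Comp = rep(c("A", "B"), each = 3),
  Year = rep(2014:2016, times = 2),
  DAR  = c(0.49, 0.44, 0.42, 0.51, 0.47, 0.45)
)

ptoy <- pdata.frame(toy, index = c("Comp", "Year"),
                    drop.index = TRUE, row.names = TRUE)

names(ptoy)        # only the variable column remains
head(index(ptoy))  # company and year, kept in the index
rownames(ptoy)     # combined labels such as "A-2014"
```

Setting drop.index = FALSE instead would keep both columns in the table as well as in the index.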

Andri Faisal R Stat

Monday, 20 April 2026
