Monday, 20 April 2026

How to Construct a Panel Dataset from Scratch in R

 

Managing panel data in RStudio to estimate regression equations and determine the effect of independent variables on dependent variables.

One way to examine how independent variables influence a dependent variable is to use panel data. Panel data combines two dimensions: the same individuals (here, companies) are observed over several time periods, so it merges cross-sectional and time-series information in a single dataset.

 

Creating a panel data structure

RStudio differs from other statistical software. Before running any analysis, it needs the data in an appropriate structure. While other packages let you simply copy and paste spreadsheet data, whether from Excel or Google Sheets, and work with it immediately, in RStudio you first have to convert it into a data structure that R recognizes.

The steps are to import the spreadsheet file and make a few relatively simple adjustments so the data is easier to process. For panel analysis specifically, the required structure is a pdata.frame, short for panel data frame, provided by the plm package. It differs from a regular data frame in R because it keeps track of both the individual and the time dimension. That is what sets it apart.

In the Environment pane on the right, click "Import Dataset" and choose Excel. There are several other options, such as SPSS, SAS, Stata, and text files; if you have a spreadsheet, select Excel.

If the file contains multiple sheets, you must then pick one of them in the dialog below. You should also pay attention to how tidy the sheet is: if there is a gap between the table title and the data content, the empty cells will be marked "NA" (Not Available), meaning the data is missing.
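If the table in your sheet does not start at the first row (say, a title line sits above the column headers), the readxl package behind RStudio's importer can skip those rows. A minimal sketch; the file name, sheet number, and skip count here are hypothetical:

```r
library(readxl)

# Read the second sheet, skipping one title row above the column headers
mydata <- read_excel("~/jurnal/tobinq3.xlsx", sheet = 2, skip = 1)
```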

Preparing Excel as Data

It is easiest to organize the data in a spreadsheet first, with one row per company-year observation, as in the example below.
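For illustration, the long-format layout the spreadsheet should follow can be mimicked directly in R. The values below are made up for the sketch; only the variable names echo the data used later:

```r
# Long format: one row per company-year, with the panel identifiers
# (Comp, Year) as ordinary columns next to the measured variables
tobinq_example <- data.frame(
  Comp    = rep(c("Adaro", "ATPK"), each = 2),
  Year    = rep(c(2014, 2015), times = 2),
  DAR     = c(0.49, 0.44, 0.49, 0.44),   # illustrative values
  DER     = c(0.97, 0.78, 0.97, 0.78),
  Tobin.Q = c(-0.27, 0.23, 0.31, 0.25))
head(tobinq_example)
```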

 




I uploaded the data in CSV format into RStudio.

tobinq3 <- read.csv2("~/jurnal/tobinq3.csv")

Then I can view the data like this:

View(tobinq3)

 

 


The data isn't in pdata.frame format yet, so we convert it with the pdata.frame function from the plm package:

 

library(plm)

ptobinq <- pdata.frame(tobinq3, index = c("Comp", "Year"),
                       drop.index = TRUE, row.names = TRUE)

 

The name ptobinq is just a name I chose to distinguish it from other objects. At this point the data structure has been transformed into a panel data frame. Inspecting it with str(ptobinq) shows something like this:

 

Classes 'pdata.frame' and 'data.frame': 40 obs. of  3 variables:
 $ DAR    : 'pseries' Named num  0.49 0.44 0.42 0.4 0.4 0.49 0.44 0.42 0.4 0.4 ...
  ..- attr(*, "names")= chr [1:40] "Adaro-2014" "Adaro-2015" "Adaro-2016" "Adaro-2017" ...
  ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 40 obs. of  2 variables:
  .. ..$ Comp : Factor w/ 8 levels "Adaro","ATPK",..: 1 1 1 1 1 2 2 2 2 2 ...
  .. ..$ Tahun: Factor w/ 5 levels "2014","2015",..: 1 2 3 4 5 1 2 3 4 5 ...
 $ DER    : 'pseries' Named num  0.97 0.78 0.72 0.67 0.66 0.97 0.78 0.72 0.67 0.66 ...
  ..- attr(*, "names")= chr [1:40] "Adaro-2014" "Adaro-2015" "Adaro-2016" "Adaro-2017" ...
  ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 40 obs. of  2 variables:
  .. ..$ Comp : Factor w/ 8 levels "Adaro","ATPK",..: 1 1 1 1 1 2 2 2 2 2 ...
  .. ..$ Tahun: Factor w/ 5 levels "2014","2015",..: 1 2 3 4 5 1 2 3 4 5 ...
 $ Tobin.Q: 'pseries' Named num  -0.2702 0.2346 0.2706 0.034 0.0336 ...
  ..- attr(*, "names")= chr [1:40] "Adaro-2014" "Adaro-2015" "Adaro-2016" "Adaro-2017" ...
  ..- attr(*, "index")=Classes 'pindex' and 'data.frame': 40 obs. of  2 variables:
  .. ..$ Comp : Factor w/ 8 levels "Adaro","ATPK",..: 1 1 1 1 1 2 2 2 2 2 ...
  .. ..$ Tahun: Factor w/ 5 levels "2014","2015",..: 1 2 3 4 5 1 2 3 4 5 ...
 - attr(*, "index")=Classes 'pindex' and 'data.frame': 40 obs. of  2 variables:
  ..$ Comp : Factor w/ 8 levels "Adaro","ATPK",..: 1 1 1 1 1 2 2 2 2 2 ...
  ..$ Tahun: Factor w/ 5 levels "2014","2015",..: 1 2 3 4 5 1 2 3 4 5 ...

 

You can see the class pdata.frame in the output, along with the company and year indexes. The data is now ready for panel data analysis.

We can view the top rows of the data with the head command.

> head(ptobinq)
            DAR  DER     Tobin.Q
Adaro-2014 0.49 0.97 -0.27020301
Adaro-2015 0.44 0.78  0.23455470
Adaro-2016 0.42 0.72  0.27061008
Adaro-2017 0.40 0.67  0.03397098
Adaro-2018 0.40 0.66  0.03363631
ATPK-2014  0.49 0.97  0.30736531

 

The data now looks different: company and year are no longer ordinary variables as they were in Excel. Instead, each observation is identified by its company-year index, so the pdata.frame accounts for both the company dimension and the time dimension.
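With the data in pdata.frame form, the regression itself can be estimated with the plm package. A minimal sketch, assuming the ptobinq object built above; in practice the choice between fixed and random effects would follow a Hausman test:

```r
library(plm)

# Fixed-effects (within) and random-effects estimates of the same equation
fixed  <- plm(Tobin.Q ~ DAR + DER, data = ptobinq, model = "within")
random <- plm(Tobin.Q ~ DAR + DER, data = ptobinq, model = "random")

# Hausman test: a small p-value favors the fixed-effects model
phtest(fixed, random)

summary(fixed)
```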



Monday, 6 October 2025

SEM PLS with moderation using seminr package

semplscorp

SEM PLS

Partial Least Squares Structural Equation Modeling (PLS-SEM) is a technique for estimating the relationships among several variables at once. It is useful when those relationships are difficult to predict with ordinary regression alone.

The Corporate Reputation Model

I wanted to use corporate reputation data, but at first I had trouble finding it in my RStudio installation. Fortunately, my seminr package (version 2.3.2) ships with a corporate reputation data set whose variables are similar, so I used it to learn PLS-SEM. According to Chidung (n.d.), the seminr package helps us analyze data with PLS-SEM. First, we create the measurement model: the constructs command builds constructs from the variables in the corporate reputation data set that comes with the package. If you have seminr installed, you can use the same data set, which is no different from mine. I created the constructs with the reflective function.

# Create the measurement model:
library(seminr)

corp_mm <- constructs(
  reflective("COMP", multi_items("comp_", 1:3)),
  reflective("LIKE", multi_items("like_", 1:3)),
  reflective("CUSA", single_item("cusa")),
  reflective("CUSL", multi_items("cusl_", 1:3)))

# Create the structural model:
corp_rep_sm <- relationships(
  paths(from = c("COMP", "LIKE"), to = c("CUSA", "CUSL")),
  paths(from = c("CUSA"), to = c("CUSL")))

After specifying the measurement and structural models, estimate the PLS model with the estimate_pls function.

my_model <- estimate_pls(data = corp_rep_data2,
                         measurement_model = corp_mm,
                         structural_model = corp_rep_sm,
                         inner_weights = path_weighting,
                         missing = mean_replacement,
                         missing_value = "-99")
Generating the seminr model
All 347 observations are valid.
summary_model <- summary(my_model)

Use the model summary to check the factor loadings and reliability. You can also display plots with the plot function.

summary_model$loadings
        COMP  LIKE  CUSA  CUSL
comp_1 0.893 0.000 0.000 0.000
comp_2 0.619 0.000 0.000 0.000
comp_3 0.645 0.000 0.000 0.000
like_1 0.000 0.864 0.000 0.000
like_2 0.000 0.799 0.000 0.000
like_3 0.000 0.733 0.000 0.000
cusa   0.000 0.000 1.000 0.000
cusl_1 0.000 0.000 0.000 0.798
cusl_2 0.000 0.000 0.000 0.879
cusl_3 0.000 0.000 0.000 0.750
summary_model$reliability
     alpha  rhoC   AVE  rhoA
COMP 0.773 0.768 0.532 0.799
LIKE 0.841 0.842 0.641 0.847
CUSA 1.000 1.000 1.000 1.000
CUSL 0.849 0.852 0.658 0.857

Alpha, rhoC, and rhoA should exceed 0.7, while AVE should exceed 0.5.
plot(summary_model$reliability)

summary_model$validity$cross_loadings
        COMP  LIKE  CUSA  CUSL
comp_1 0.841 0.638 0.464 0.456
comp_2 0.793 0.475 0.321 0.317
comp_3 0.844 0.528 0.325 0.340
like_1 0.644 0.885 0.542 0.579
like_2 0.536 0.885 0.463 0.568
like_3 0.580 0.842 0.426 0.520
cusa   0.461 0.551 1.000 0.706
cusl_1 0.462 0.604 0.574 0.853
cusl_2 0.418 0.603 0.673 0.925
cusl_3 0.327 0.467 0.608 0.851
summary_model$validity$fl_criteria
      COMP  LIKE  CUSA  CUSL
COMP 0.729     .     .     .
LIKE 0.675 0.800     .     .
CUSA 0.461 0.551 1.000     .
CUSL 0.461 0.639 0.706 0.811

The FL (Fornell-Larcker) criteria table reports the square root of AVE on the diagonal and the construct correlations in the lower triangle.
summary_model$validity$htmt
      COMP  LIKE  CUSA CUSL
COMP     .     .     .    .
LIKE 0.817     .     .    .
CUSA 0.507 0.598     .    .
CUSL 0.551 0.752 0.765    .

Then we plot the model; the plot shows the relationships between the constructs.

plot(my_model)

After inspecting the model, you can bootstrap it. Bootstrapping repeatedly resamples the data to assess how stable the estimates of the PLS model are; it is a standard procedural step in PLS-SEM analysis.

# Store the summary of the bootstrapped model: 
boot_model_htmt <- bootstrap_model(seminr_model = my_model, nboot = 1000)
Bootstrapping model using seminr...
SEMinR Model successfully bootstrapped
sum_boot_model_htmt <- summary(boot_model_htmt, alpha = 0.10)
sum_boot_model_htmt$bootstrapped_HTMT
               Original Est. Bootstrap Mean Bootstrap SD T Stat. 5% CI 95% CI
COMP  ->  LIKE         0.817          0.817        0.035  23.621 0.760  0.873
COMP  ->  CUSA         0.507          0.506        0.057   8.949 0.407  0.595
COMP  ->  CUSL         0.551          0.549        0.060   9.221 0.450  0.648
LIKE  ->  CUSA         0.598          0.599        0.040  14.951 0.528  0.661
LIKE  ->  CUSL         0.752          0.753        0.036  20.711 0.693  0.811
CUSA  ->  CUSL         0.765          0.766        0.032  23.889 0.713  0.818

After the bootstrap we can look at the reliability measures again: alpha, rhoC, and rhoA should exceed 0.7, and AVE should exceed 0.5.

HTMT Test

After the bootstrap we also run the HTMT test to confirm discriminant validity. Every upper 95% CI value is below 0.9, which indicates the model is good.
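The post's title mentions moderation, which the example above does not yet include. A sketch of how a moderating effect could be specified with seminr's interaction_term function; the choice of CUSA as moderator of the LIKE → CUSL path here is purely illustrative, not part of the original model:

```r
library(seminr)

# Measurement model with a two-stage interaction term (hypothetical
# moderation: CUSA moderating the LIKE -> CUSL relationship)
mod_mm <- constructs(
  reflective("COMP", multi_items("comp_", 1:3)),
  reflective("LIKE", multi_items("like_", 1:3)),
  reflective("CUSA", single_item("cusa")),
  reflective("CUSL", multi_items("cusl_", 1:3)),
  interaction_term(iv = "LIKE", moderator = "CUSA", method = two_stage))

# Structural model: seminr names the interaction construct "LIKE*CUSA"
mod_sm <- relationships(
  paths(from = c("COMP", "LIKE", "CUSA", "LIKE*CUSA"), to = "CUSL"))

mod_model <- estimate_pls(data = corp_rep_data,
                          measurement_model = mod_mm,
                          structural_model = mod_sm)

# The path coefficient on LIKE*CUSA is the moderation effect
summary(mod_model)$paths
```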
