Assessing Item Validity and Reliability of Shariah Compliant Gold Investment (SCGI) Instrument Using Rasch Measurement Model

Abstract: This paper describes the process of assessing the validity and reliability of a new instrument, the Shariah Compliant Gold Investment (SCGI). The instrument consists of 33 items grouped into three dimensions and was administered to 27 Malaysian investors and investment institutions. The Rasch model was used to examine item validity against two criteria: (1) point measure correlation (PTMEA CORR) and (2) fit statistics (infit/outfit MNSQ and z-std). The findings indicate that the reliability indices for respondents and items are high (0.91 and 0.81, respectively), with a Cronbach alpha of 0.93. The item separation is 2.07 and the person separation is 3.15. In terms of item polarity, all PTMEA CORR values are positive except for item A03 (-0.17), and the infit MNSQ values of the remaining items range from 0.44 to 1.66. The item fit analysis shows that the mean infit MNSQ plus or minus one SD lies between 0.68 and 1.30. Only one item, A03, falls in the elimination range because of its negative PTMEA CORR and z-std > 2.0. The results suggest that this item be removed, leaving 32 items.

Najahudin Lateh, Ghafarullahhuddin Din, Siti Noorbiah Rejab & Amal Hayati Ishak

Keywords: gold investment, rasch model, shariah compliant

  1. Introduction

A shariah-compliant gold transaction is supported by several authentic hadith, among them one narrated by ‘Ubadah ibn al-Samit in which the Prophet Muhammad SAW said: “Gold (exchanged) with gold, silver for silver, wheat for wheat, barley and barley, salt with salt, and they should be of equal weight scales, and shall be submitted in its entirety. If the types of goods exchanged are different, then sell without delay and submit the goods directly.” (Muslim, 2010). The hadith points to two conditions for a shariah-compliant gold investment: payment in cash and an on-the-spot transaction (al-Sharbini, 1978; al-Saddam, 2006).

In Malaysia, both criteria have been gazetted as the “Gold Investment Parameter”, endorsed by the National Fatwa Association. It serves as guidance for investors as well as investment institutions. However, the parameters are too general, which has prompted the Shariah Advisory Council of Malaysian financial institutions to call for them to be reviewed (Jakim, 2012). Recently, Najahudin et al. (2014) proposed the Shariah Compliant Gold Investment (SCGI) as new guidance. This research therefore aims to evaluate the validity and reliability of the SCGI using the Rasch Measurement Model. The Rasch analyses focus on the interpretation of data reliability, item polarity, fit statistics and the persons-items distribution map.

  • Shariah Compliant Gold Investment (SCGI)

The SCGI was developed through systematic procedures involving relevant experts (Najahudin et al., 2014). It is more specific than the existing parameters and consists of three dimensions: (i) investor and investment institutions; (ii) products and prices; and (iii) the contracts offered. These three dimensions, with a total of 33 items, were unanimously agreed by 13 experts over two rounds of the Delphi technique. Each round used a questionnaire with a 4-point Likert scale: (1) strongly disagree, (2) disagree, (3) agree, and (4) strongly agree. Data collected from each round were analyzed using the Statistical Package for Social Science (SPSS) to identify the agreed dimensions and items. In the second round, expert consensus was successfully obtained. All items showed consensus, with an interquartile range (IQR) of 0 or 1, median = 4.00 and mode = 4, above the 95 percent level and the median frequency distribution of 3.8 (Green, 1981). The agreed items are shown in Table 1.

Table 1. Dimensions and Items of SCGI.

Dimensions | Items | Total
Investor and investment institutions | A01, A02, A03, A04, A05 | 5 items
Product and prices | B01, B02, B03, B04, B05, B06, B07, B08, B09, B10, B11, B12, B13, B14, B15, B16, B17 | 17 items
Contract deal | C01, C02, C03, C04, C05, C06, C07, C08, C09, C10, C11 | 11 items
  • Method
    • Source of Data

To validate the 33 items of the SCGI, the researcher organized a special seminar on 4 April 2015. The seminar attracted 27 participants, all of whom were gold investors. Before administering the SCGI, the researcher thoroughly explained the dimensions and items to ensure that the respondents’ understanding corresponded to the researcher’s. At the end of the seminar, 27 valid responses were collected.

  • Rasch Measurement Model

The Rasch model is a measurement model for the probability of an interaction between a person and an item. Persons are located according to their ability and items according to their difficulty. The model was formulated by taking into consideration the ability of the person answering the questionnaire or instrument and the difficulty posed by each question or item. Person ability and item difficulty are both expressed in logits, through the transformation of ordinal data into interval-level measures. The model can therefore predict the pattern of responses from the ability of each person and the difficulty of each item (Rasch, 1980). The probability of success depends on the difference between the ability of the respondent and the difficulty of the item. According to the Rasch model, (i) a person with higher ability has a higher probability of agreeing with the items; and (ii) less difficult items have a higher probability of being agreed with by all respondents (Bond & Fox, 2007).
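The dependence of the response probability on the difference between ability and difficulty can be sketched numerically. The following minimal Python sketch uses the dichotomous form of the Rasch model, which is a simplification: the SCGI uses a 4-point scale, for which a polytomous extension with category thresholds would be needed. The function name `rasch_probability` is illustrative and is not part of WINSTEPS; the logit values are the person mean and the item measures reported in this paper.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of endorsing an item under the dichotomous Rasch model.

    Both arguments are in logits; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Illustrative values taken from this paper (logits).
mean_person = 2.77    # mean person measure
hardest_item = 2.90   # item A05
easiest_item = -3.19  # item A03

p_hard = rasch_probability(mean_person, hardest_item)
p_easy = rasch_probability(mean_person, easiest_item)
print(f"P(agree with A05) = {p_hard:.2f}")
print(f"P(agree with A03) = {p_easy:.2f}")
```

When ability equals difficulty the probability is exactly 0.5, and it rises towards 1 as the person measure exceeds the item measure.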

The Rasch model supports an accurate assessment of validity and reliability because it focuses on both the person and the item. Moreover, the model can show which items or constructs fit, misfit, require further investigation or should be eliminated (Azrilah, 2010), based on established rating scale criteria. This study uses the rating scale quality criteria for Rasch statistics shown in Table 2.

Table 2. Rating Scale Instrument Quality Criteria using Rasch Model.

Criteria | Statistical Info | Results
Item Validity | a. Item Polarity | 0.4 < PTMEA CORR < 0.8 (Linacre, 2011; Azrilah, 2010)
Item | b. Item Fit | Total MNSQ infit and outfit of 0.5 – 1.5 (Linacre, 2011; Linacre, 2002)
Item Misfit | c. Separation | All items show ≥ 2.0 (Linacre, 2011; Fisher, 2007)
 | d. Person Reliability | Value > 0.8 (Bond & Fox, 2007)
 | e. Item Reliability | Value > 0.8 (Bond & Fox, 2007)

Source: Linacre (2011); Azrilah (2010); Bond & Fox (2007); Fisher (2007); Linacre (2002); Wright & Stone (2004).

  • Data analysis

The data were analyzed using the Rasch analysis software WINSTEPS 3.72.3. Rasch analysis predicts the probability of a person responding to an item, and the probability of each item being responded to by a person. In the Rasch Measurement Model, the validity of an instrument can be examined through several major analyses, such as item polarity, person-item fit, person-item misfit, the person-items distribution map, person-item separation, unidimensionality and scale calibration (Rasch, 1980; Bond & Fox, 2007; Linacre, 2011). However, this study reports only the reliability values, item polarity, fit statistics and the person-items distribution map (PIDM). Figure 1 summarizes the types of analyses performed.

[Figure 1: flow of the analysis. The Shariah-Compliant Gold Investment (SCGI) instrument comprises Dimension A: Investor and investment institutions (5 items), Dimension B: Product and prices (17 items) and Dimension C: Contract deal (11 items). Three sets of analyses are applied: (1) Summary statistics: reliability (person and item), Cronbach alpha, separation (person and item); (2) Fit statistics: PTMEA CORR, infit/outfit MNSQ, infit/outfit z-std (person fit, item fit, misfit respondents and misfit items); (3) PIDM: mean, maximum and minimum logit (person and item), separation (person/item), logit scale ruler.]

Figure 1. Analysis and validation process.

  • Results and discussion
    • Reliability

Reliability is the index that indicates the consistency of the positions of persons and items on the logit scale. The person reliability index shows how consistently respondents would be positioned if they were given another set of items measuring the same construct, whereas the item reliability index shows how consistently the items would be positioned if they were answered by a different group of respondents with similar abilities. A coefficient close to 1.00 denotes high reliability (Nunnally & Bernstein, 1994).

According to Bond & Fox (2007) and Linacre (2011), a person or item reliability of at least 0.80 (≥ 0.80) is strongly acceptable. Fisher (2007) divides the rating scale for person and item reliability into “poor” (<0.67), “fair” (0.67 – 0.80), “good” (0.81 – 0.90), “very good” (0.91 – 0.94) and “excellent” (>0.94). The accepted separation value for persons and items must be at least 2.0 (≥ 2.0) (Linacre, 2011; Fisher, 2007).
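The rating bands above can be written as a small lookup. The following is a minimal Python sketch, assuming reliability is reported to two decimal places so that the gaps between bands (for example between 0.80 and 0.81) do not matter; the function names are illustrative.

```python
def fisher_rating(reliability):
    """Map a reliability coefficient to the bands attributed to Fisher (2007)."""
    if reliability < 0.67:
        return "poor"
    if reliability <= 0.80:
        return "fair"
    if reliability <= 0.90:
        return "good"
    if reliability <= 0.94:
        return "very good"
    return "excellent"


def separation_acceptable(separation, minimum=2.0):
    """Check the separation criterion (at least 2.0)."""
    return separation >= minimum


# Values reported in this study.
print(fisher_rating(0.91), separation_acceptable(3.15))  # person
print(fisher_rating(0.81), separation_acceptable(2.07))  # item
```

Under these bands, the person reliability of 0.91 rates as "very good" and the item reliability of 0.81 as "good".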

Based on Figure 2, the summary statistics display acceptable person and item reliability values. In addition, the Cronbach-α of 0.93 is good, indicating that the instrument is internally consistent and capable of identifying the level of shariah compliance of gold investment products. The item reliability is 0.81, which indicates that there are sufficient items to measure what needs to be measured (Nunnally & Bernstein, 1994).

The respondent reliability index is 0.91, indicating a high probability that the respondents would be ordered in the same way if given another set of items measuring the same construct (Azrilah, 2010). In addition, the separation values for respondents and items are 3.15 and 2.07, respectively. A value of ≥ 2.0 is good, indicating the ability of the SCGI to separate levels of respondent ability and item difficulty.
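The reported separation and reliability values can be cross-checked against each other, because in Rasch analysis the two indices are linked by the standard relation R = G^2 / (1 + G^2), where G is the separation index. The following minimal Python sketch applies that relation to the values reported here; the function name is illustrative.

```python
def reliability_from_separation(separation):
    """Reliability implied by a separation index G: R = G^2 / (1 + G^2)."""
    g_squared = separation ** 2
    return g_squared / (1.0 + g_squared)


person_reliability = reliability_from_separation(3.15)
item_reliability = reliability_from_separation(2.07)
print(f"person: {person_reliability:.2f}")
print(f"item:   {item_reliability:.2f}")
```

Rounded to two decimals, the results agree with the reported reliabilities of 0.91 for persons and 0.81 for items.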

Figure 2. Person and item reliability coefficients.

  • Polarity of the Item

Item polarity is a precondition that is checked by reviewing the point measure correlation (PTMEA CORR) coefficient. Items are assumed to be able to differentiate respondent ability when the PTMEA CORR values are high. The value must be positive to indicate that the item moves in the same direction as the construct (Bond & Fox, 2007). A negative or zero PTMEA CORR indicates that the responses to the item conflict with the variable or construct (Linacre, 2011), that is, an inverse direction of measurement and an uncommon decision-making variable (Azrilah, 2010). Nunnally & Bernstein (1994) and Finlayson et al. (2009) hold that an item PTMEA CORR of at least +0.30 can measure a construct systematically, whereas a value of +0.32 measures it only in an average manner. However, this study uses values between +0.4 and +0.8 (0.4 < x < 0.8) to show that the constructed items can be measured and can differentiate the respondents (Linacre, 2011; Fisher, 2007; Azrilah, 2010).

Figure 3 shows that all items had positive PTMEA CORR values and a small mean standard error of measurement (SE = 0.39 logit), except item A03, which reported a negative value of -0.17 (SE = 0.76 logit). This item was considered for elimination, as it did not measure what it was meant to measure (Azrilah, 2010). Most of the other items have values between +0.42 and +0.77, except for two items reported outside the specified range, A01 (+0.77) and A02 (+0.15). However, both items were retained on the basis of their acceptable infit MNSQ (1.49 and 1.48, respectively) and z-std (1.7 and 0.6, respectively).

Figure 3. Item Point Measure Correlation.

  • Fit statistics

The Rasch model provides fit statistics to detect item or person misfit. The fit statistics are (i) the infit and outfit mean square (MNSQ) and (ii) the infit and outfit standardized statistic (z-std), for both persons and items. MNSQ is the ratio of what is observed to what the model expects. Its ideal value is 1, when the observations correspond to the expectation. Fit is considered to depart from expectation when the mean infit MNSQ plus or minus one SD (mean infit MNSQ +/- SD) falls outside the specified range.
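To make the mean-square statistics concrete, the following minimal Python sketch computes the infit and outfit MNSQ of a single item. It is a simplified illustration, not the WINSTEPS procedure: it assumes dichotomous (0/1) responses rather than the 4-point scale used in this study, it treats the person abilities and item difficulty as already known, and the response data are hypothetical. Outfit is the plain average of squared standardized residuals, while infit weights the squared residuals by the model variance.

```python
import math


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch probability of a positive response (logit inputs)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def item_fit(responses, abilities, difficulty):
    """Return (infit MNSQ, outfit MNSQ) for one item with 0/1 responses."""
    squared_residuals = 0.0
    model_variance = 0.0
    squared_standardized = 0.0
    for observed, ability in zip(responses, abilities):
        expected = rasch_probability(ability, difficulty)
        variance = expected * (1.0 - expected)
        residual = observed - expected
        squared_residuals += residual ** 2
        model_variance += variance
        squared_standardized += residual ** 2 / variance
    infit = squared_residuals / model_variance
    outfit = squared_standardized / len(responses)
    return infit, outfit


# Hypothetical data: the first person is far above the item yet disagrees.
infit, outfit = item_fit([0, 1, 0, 1], [3.0, 0.0, 0.0, 0.0], 0.0)
print(f"infit MNSQ = {infit:.2f}, outfit MNSQ = {outfit:.2f}")
```

In this hypothetical case a single unexpected response from a person located far from the item raises the outfit more than the infit, because the outfit is not variance-weighted.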

According to Bond & Fox (2007), the infit and outfit MNSQ values for each person and item on a Likert scale must be between 0.6 and 1.4. Fisher (2007) rates item fit as fair within 0.34 to 2.9 and as good within 0.50 to 2.0. However, this study uses the range recognized by Linacre (2002), that is, values between 0.5 and 1.5 (0.5 < y < 1.5), to identify fit and misfit for a person or an item. The outfit is usually more sensitive to unexpected responses than the infit (Linacre, 2002). Misfitting items or outliers can be further confirmed with the z-std values, which must lie between -2.0 and +2.0 (-2.0 < z < +2.0). The expected value of z-std is 0.0. Persons or items that do not meet these criteria are considered for elimination, unless their PTMEA CORR values are between +0.4 and +0.8 (0.4 < x < 0.8).

This study focuses on item fit rather than person fit. A fitting item is one that functions as intended and is able to measure the required latent trait. Misfit occurs when (i) the item does not measure the desired trait; (ii) the item is too difficult or too easy for the person; or (iii) the person responds inconsistently. Figure 3 shows that the mean infit MNSQ plus or minus one SD (0.99 +/- 0.31) spans 0.68 to 1.30, which is within the acceptable range (0.5 < y < 1.5).
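The band check described above is simple arithmetic and can be reproduced as follows. This is a minimal Python sketch with illustrative function names, using the mean and SD reported in this study and the 0.5 to 1.5 range adopted from Linacre (2002).

```python
def fit_band(mean_mnsq, sd):
    """Lower and upper limits of the mean infit MNSQ plus or minus one SD."""
    return mean_mnsq - sd, mean_mnsq + sd


def within_range(low, high, lower=0.5, upper=1.5):
    """True when the whole band lies strictly inside the acceptable range."""
    return lower < low and high < upper


low, high = fit_band(0.99, 0.31)
print(f"band: {low:.2f} to {high:.2f}; acceptable: {within_range(low, high)}")
```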

All items were accepted except item A03, which was outside the acceptable z-std range (outfit 2.90) and had a negative PTMEA CORR (-0.17). Although the infit MNSQ values of items B14 (1.66), C06 (1.51) and B09 (1.51) were beyond the acceptable range (0.5 < y < 1.5), all of them were accepted because their outfit z-std values were within the acceptable range (B14 = 1.8; C06 = 1.7; B09 = 1.4). They were also measuring in the right direction, as their PTMEA CORR values were positive (B14 = +0.42; C06 = +0.56; B09 = +0.55). Therefore, 32 of the 33 items were retained and only item A03 was removed.
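The screening logic can be summarized in a short sketch. The following Python code encodes one possible reading of the rule stated in this section: an item is a candidate for elimination when its fit statistics fall outside the accepted ranges, unless its PTMEA CORR lies between 0.4 and 0.8. The function name and the exact combination of conditions are assumptions made for illustration; the statistics are those reported in this paper for the four items discussed, with the infit MNSQ of A03 (1.27) taken from the conclusion.

```python
def review_item(ptmea_corr, infit_mnsq, outfit_zstd):
    """Return 'retain' or 'eliminate' under an assumed reading of the rule."""
    fits = 0.5 < infit_mnsq < 1.5 and -2.0 < outfit_zstd < 2.0
    polarity_in_range = 0.4 < ptmea_corr < 0.8
    if fits or polarity_in_range:
        return "retain"
    return "eliminate"


# (PTMEA CORR, infit MNSQ, outfit z-std) as reported in this paper.
reported = {
    "A03": (-0.17, 1.27, 2.9),
    "B14": (0.42, 1.66, 1.8),
    "C06": (0.56, 1.51, 1.7),
    "B09": (0.55, 1.51, 1.4),
}

decisions = {name: review_item(*stats) for name, stats in reported.items()}
for name, decision in decisions.items():
    print(name, decision)
```

Under this reading, only A03 is flagged for elimination, which matches the decision reported above.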

The summary statistics after item A03 was eliminated are shown in Table 3. Overall, the findings show that the instrument has fair item reliability (0.75), with a mean infit MNSQ of 1.02 and a mean outfit z-std of 0.00.

Table 3. Summary statistics before and after removal of the misfit item.

Statistics | Group | Before Item Removal | After Item Removal
Mean measure (logits) | Person | +2.77 | +2.70
Mean infit MNSQ | Person | +1.02 | +1.02
Mean outfit MNSQ | Person | +1.21 | +0.96
Mean infit z-std | Person | 0.00 | 0.00
Mean outfit z-std | Person | -0.10 | -0.10
  • Persons-Items Distribution Map (PIDM)

The PIDM is the heart of Rasch model analysis, showing the hierarchical relationship between person ability and item difficulty (Bond & Fox, 2007). Persons with higher ability and more difficult items are placed at the top, whereas persons with lower ability and easier items are placed at the bottom. Based on Figure 2, the mean person measure is +2.77 logit and the mean item measure is 0.00 logit. The minimum person measure is +0.23 logit and the maximum is +5.08 logit. The minimum item measure is -3.19 logit and the maximum is +2.90 logit. The reported ruler length is 5.31 logit for persons against 6.09 logit for items, so the person scale falls short of the item scale by about 0.78 logit (6.09 – 5.31). This hierarchy is shown in Figure 4.

Figure 4. Hierarchy of relationship.

The PIDM above shows that the items separate the respondents into three categories: item-free persons, persons above the mean and persons below the mean. The items are divided into two categories, difficult and easy, with the item mean (0.00 logit) as the dividing line. This division is aligned with the separation values for persons (3.15) and items (2.07) as shown in Table 3. Of the 27 respondents, Group 1 (excellent) contains 12 people located above the maximum item location, between +2.90 logit and +5.08 logit. Group 2 (good) contains 7 people located just below, between +2.07 logit and +2.90 logit, and the remaining respondents form Group 3 (mediocre), located between the item mean of 0.00 logit and +2.07 logit.

The map shows that many of the respondents are item-free, that is, no items are located at their level, and more items are required to measure them. The respondents gave high ratings and had no difficulty agreeing with most of the items in the instrument. The items measure only the persons in Groups 2 and 3; there were no items difficult enough to measure the people in Group 1. Most of the items are easy and fall below the respondent mean (+2.77 logit). No respondents fall below the item mean (0.00 logit). This is aligned with the view of Bond & Fox (2007) that an easier item is more likely to be agreed with by all respondents.

The easiest item to agree with was A03 (-3.19 logit) and the most difficult was A05 (+2.90 logit). There are also large gaps at two places on the map, between items A01 and A05 and between items B01 and A03. These gaps contributed to the item reliability of 0.81.

  • Conclusion

The Rasch analysis shows that the SCGI is acceptable and has high reliability (person = 0.91; item = 0.81). Nevertheless, most of the items are too easy relative to the high ratings given by the respondents, and no items were difficult enough to measure most of the respondents. Of the original 33 items, one was a misfit and had to be eliminated to obtain a valid instrument under the Rasch model: item A03 (infit MNSQ = 1.27; z-std = 2.9; PTMEA CORR = -0.17) did not fulfil the validity requirements. The remaining items were retained on the strength of their validity characteristics (positive PTMEA CORR values; infit MNSQ = 0.44 to 1.66; and z-std within ±2). The final instrument therefore contains 32 items.


The authors wish to thank the Ministry of Education Malaysia for providing the funding under research scholarship, and the Academy of Contemporary Islamic Studies, Universiti Teknologi MARA (UiTM) for supporting this research.


Al-Sharbini, M. K. (1978). Mughni al-muhtaj ila ma’rifat ma’ani alfaz al-minhaj. Beirut: Dar al-Fikr.

Azrilah, A. A. (2010). Rasch measurement fundamentals: Scale construct and measurement structure. Kuala Lumpur: Integrated Advance Publishing.

Bond, T. G. & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). New Jersey: Lawrence Erlbaum Publishers.

Finlayson, M. L., Peterson, E. W., Fujimoto, K. A. & Plow, M. A. (2009). Rasch validation of the falls prevention strategies survey. Archives of Physical Medicine and Rehabilitation, 90 (2), 2039–2046.

Fisher, W. P. (2007). Rating scale instrument quality criteria. Rasch measurement transactions, 21:1, 1095.

Green, P. J. (1981). The content of a college-level outdoor leadership course for land-based outdoor pursuits in the Pacific Northwest: A Delphi consensus. Oregon: University of Oregon.

Jakim, J.K. (2012). Summary Discussion Syariah Advisory Council Members Financial Institutions in Malaysia was the 8th. From

Linacre, J. M. (2002). What do infit and outfit, mean-square and standardized mean? Rasch measurement transactions, 16 (2), 878.

Linacre, J. M. (2011). Winsteps® Rasch measurement computer program user’s guide. Beaverton, Oregon:

Muslim, H. Q. (2010). Al-musnad al-sahih al-mukhtasar bi naql al-‘adl ila rasulallah. Beirut: Dar Ihya’ al-Turath al-‘Arabi.

Najahudin, L., Ghafarullahhuddin, D., Rahimi, O., Ezani, Y., Noorbiah, R. (2014). Application of the Delphi Technique in the formation of Shariah-Compliant Gold Investment (SCGI), 2nd International Halal Conference. Istanbul, Turkey, paper #70.

Nunnally, J. C. & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw Hill.

Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press.

Saddam, A. A. (2006). Bay’ al-dhahab wa al-fiddah wa tatbiqatu al-mua’asirah fi al-fiqh al-Islami. Beirut: Dar al-Nafa’is.

Wright, B. D. & Stone, M. H. (2004). Making measures. Chicago: Phaneron Press.