PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia
ISSN: 1983-9456 (Print) ISSN: 2317-0123 (Online)
Editor: Fauze Najib Mattar
Review system: Triple Blind Review
Languages: Portuguese and English
Publisher: ABEP – Associação Brasileira de Empresas de Pesquisa

Precision of Numeric Results in Quantitative Research: Reflections for the Fields of Social and Behavioral Sciences
Precisão de Resultados Numéricos em Pesquisa Quantitativa: Reflexões nos Campos de Ciências Sociais e Comportamentais

Submission: Oct./26/2014 - Approval: Mar./7/2015

Francisco José da Costa
Dr. Francisco José da Costa holds a doctorate in Business Administration from Fundação Getulio Vargas – School of Business Administration of São Paulo/FGV-SP, and a master's degree in Business Administration from the State University of Ceará – UECE. He earned bachelor's degrees in Business Administration and in Statistics at the Federal University of Paraíba – UFPB. Dr. Costa is a professor at the Department of Administration of the Federal University of Paraíba. E-mail: [email protected] Professional address: Departamento de Administração, Centro de Ciências Sociais Aplicadas – DADM/CCSA, Campus Universitário I/UFPB, Cidade Universitária – 58051-900 - João Pessoa/PB – Brasil.

ABSTRACT

This article examines how results are presented in quantitative research, focusing specifically on the rounding of numerical results. The methodological approach consisted of computer simulations that assess the information content of the rounded digits of descriptive statistics and of estimated regression parameters. The results show that, for different sample sizes, the first and second decimal digits carry information that approximates the true values of the measures.
The third digit is randomly distributed among the possible values (0 to 9, following the natural numbers), except when the sample size is sufficiently large (5,000 or greater). The results are in line with others published internationally and show that, in sample-based research, reporting results with more than two decimal places adds no useful information, although it appears to signal greater precision.

KEYWORDS: Quantitative research, rounding practices, simulation.

PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia (ISSN 1983-9456 Impressa e ISSN 2317-0123 On-line), São Paulo, Brasil, V. 16, p.
45-59, abril, 2015 - www.revistapmkt.com.br

1 INTRODUCTION

This article reflects on the precision with which numerical results are presented in research in the Applied Social Sciences. We analyze some practices and alternatives for presenting results, and we simulate rounding procedures for numerical results in quantitative research. The study contributes to the debate on quantitative research in the Applied Social Sciences, a field whose methodological procedures are gradually maturing through the adoption of classical quantitative techniques and of more recent innovations (WILCOX, 2012; CUMMING, 2013).

Considering the publications in the area since the 2000s, it can be said that, in Brazil, the application of quantitative methods in the Social Sciences in general (and particularly in Administration) is still at an early stage, except in the Financial Management and Marketing areas. Indeed, the methodological advances experienced in the field since 2000 include the adoption of greater rigor in qualitative research and the incorporation of advanced quantitative methods and tools. The condition of the Social and Behavioral Sciences in the 2010s therefore provides an opportunity for more careful consideration of the procedures in use, with the aim of enabling continuous improvement of research and of the knowledge generated (BOTELHO; ZOUAIN, 2006; AGUINIS; EDWARDS, 2014; BEDEIAN, 2014). Such reflection motivates a wide range of possible studies, from reviews of the main techniques used to the most commonly employed fieldwork designs, as well as measurement and reporting strategies (AGUINIS et al., 2011; CORDEIRO et al., 2014).
In this study, the focus is more specifically on the presentation of numerical results and on the tendency to pursue precision in those results. To that end, the second section discusses practices and alternatives for presenting results; the third section details the simulation procedures carried out and the results found; and the final section presents the main findings, the theoretical and practical recommendations, and the limitations of the study.

2 CONVENTIONAL PROCEDURES

Presenting numerical results with decimal places is universal in quantitative research, but there is no universally accepted rule for doing so. Most statistical packages adopt their own pattern, and there is no uniformity among software applications, nor even within each application. Take as reference the most used ones: Statistical Package for Social Sciences (SPSS), Microsoft Excel and the free software R. According to an exploratory evaluation and to the author's experience, SPSS is the most widely used package in the Social Sciences in Brazil, while R is the most used in the Exact Sciences, although its use is spreading, possibly owing to the greater availability of tools for modeling new problems. Finally, Excel is less used, but it is extremely useful as a complement to other software.

For example, SPSS tabulates most of its outputs with two decimal places by default; however, the output of its statistical techniques varies between two and three decimal places. R presents results with seven significant digits by default and usually switches to scientific notation, with a base-10 exponent, for very large or very small values. Excel has a standard presentation of six decimal places and, like SPSS, also adopts scientific notation for very large or very small values.

The alternatives offered by the programs most used for research in the Social Sciences draw attention. Considering SPSS specifically, there seems to be a signal that a maximum of three decimal places is sufficient for a first presentation of results (it is worth noting that the software gives access to results with more decimal places). The immediate question is: is the third decimal place enough to present accurate results in Social Sciences research? Would results with more decimal places not show greater accuracy in quantitative research?

These questions revisit debates that are well advanced, though still unfinished, in the methodological discussions of the Social and Behavioral Sciences, and which stress the need to recognize, at least, that the field must take issues of precision and accuracy into account (AGUINIS; EDWARDS, 2014; BEDEIAN, 2014). A central conclusion in academia, widely accepted in these discussions, is that research in any field must first of all be informative and serve the purpose for which it is intended, considering all the contextual variables that affect its results. Judging by the guidance of authors and institutions of international reputation, decimal places beyond a minimum number illustrate an excess of information rather than accuracy.
For example, the Academy of Management (2011), the US institution that publishes the prestigious Academy of Management Journal, states that any numerical result in that journal should be presented with no more than two decimal places. This signals the understanding that digits in the third decimal place and beyond probably carry no informative content, although they appear to indicate accuracy. Whether such digits are meaningful can only be settled by assessing how the digits behave in their respective decimal places.

That is what Bedeian, Sturman and Streiner (2009) did specifically for correlation measurements. The authors used simulation to generate 10,000 pairs of samples of normally distributed variables with a predetermined correlation (0.150), and they tested the stability of the first three decimal digits for sample sizes 60, 100, 200, 500, 1,000, 10,000 and 100,000. The decimal digits of the sample correlation of each generated pair will naturally vary, but they are expected to concentrate around the known values (that is, '1' for the first digit, '5' for the second and '0' for the third). If they do not, this is a sign that, for that sample size, presenting the digit may be meaningless and gives no indication of precision. In other words, for sample sizes in which the digits do not follow the expected distribution (concentrated around the known value), presenting the digits adds no information, let alone precision.

In the case of correlation, the simulations of Bedeian, Sturman and Streiner (2009) showed the following. For the first decimal place, the digits concentrate around the known value (1) from sample size 60 onwards.
In samples of size 10,000 and beyond, the known digit is the only one that appears; that is, in samples of this size the first digit is always equal to the known one. For the second decimal place, the authors observed that the digit concentrates around the known value (5) only from sample size 1,000 onwards, and sample sizes up to 500 showed a uniform distribution of the 10 possible digits (0 to 9). As for the third decimal place, when digits beyond the third place are truncated, the possible digits (0 to 9) remain uniformly distributed even for samples of size 10,000; when they are rounded, this sample size shows only a marginal sign that the digits concentrate around the known value (0).

The results of Bedeian, Sturman and Streiner (2009) indicate that, for correlation measurements, when the sample is small (around 60 units), only the first digit is informative; the others are unstable, with any of the 10 possible digits (0-9) equally likely to emerge. The second decimal place is meaningful for analysis and for conveying information only from sample size 1,000 onwards; for sample sizes of 500 or less, the second digit is unstable and any of the possible digits (0 to 9) is equally likely to emerge. Finally, there is evidence that the third decimal place carries no informative content (and therefore signals no precision) in samples smaller than 10,000.
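The digit-stability check described above can be sketched in a few lines. The block below is a minimal Python illustration and not the code used by Bedeian, Sturman and Streiner (2009): the function names, the seed, the default of 1,000 replications and the use of the absolute value for negative sample correlations are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def sample_correlation(n, rho, rng):
    """Pearson correlation of n draws from a bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y mixes x with independent noise so that corr(x, y) = rho in the population
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def decimal_digit(value, position, mode="round"):
    """Digit at a decimal position (1 = tenths) after rounding or truncating to 3 places.

    Assumption: the sign is ignored, so -0.152 and 0.152 give the same digits.
    """
    scaled = abs(value) * 1000
    thousandths = round(scaled) if mode == "round" else math.floor(scaled)
    return (thousandths // 10 ** (3 - position)) % 10


def digit_counts(n, rho=0.150, replications=1000, position=2, mode="round", seed=1):
    """Count how often each digit 0-9 appears at one decimal position."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replications):
        r = sample_correlation(n, rho, rng)
        counts[decimal_digit(r, position, mode)] += 1
    return [counts[d] for d in range(10)]


if __name__ == "__main__":
    for n in (60, 500, 1000):
        print(n, digit_counts(n, position=2))
```

Calling `digit_counts` with `position=1`, `2` or `3` and with `mode="round"` or `mode="truncate"` gives the two selections (rounding and truncation) that the text distinguishes; a flat row of counts suggests that the digit is uninformative at that sample size.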
Considering that little research in the Social and Behavioral Sciences uses samples larger than 10,000, this result suggests that presenting correlation results with more than two decimal places means presenting useless information. That is, the position of the Academy of Management, quoted above, seems well justified, at least for correlation measurements. What remains to be answered, however, is whether the same holds for other research results such as descriptive statistics, regression parameters, time series, structural equation modeling or techniques applied to experimental procedures (analysis of variance, Kruskal-Wallis test, among others).

3 SIMULATION AND RESULTS

This section extends the simulation analysis by applying procedures similar to those of Bedeian, Sturman and Streiner (2009), focusing on two categories of measures: descriptive measures and regression analysis measures. These two techniques were chosen because of their frequent use in Social and Behavioral Sciences research. Procedures and results are detailed in the following two subsections.

3.1 SIMULATION OF RESULTS FOR DESCRIPTIVE MEASURES

To evaluate the stability of the decimal places of descriptive measures, an algorithm was developed in the statistical package R (see Appendix A) to simulate 2,000 samples of a normally distributed variable with mean 3 and variance 1. The 2,000 samples were generated for each of the sample sizes 30, 200, 500, 1,000, 5,000 and 10,000, and the following measures were extracted from each sample: mean, variance, and Pearson's coefficients of skewness and kurtosis. The sample sizes were chosen to reflect the sizes most frequently observed in published research, except for 5,000 and 10,000, which are rare in research published in the main journals.
Likewise, based on how frequently they are used, the measures were chosen to cover location (mean), dispersion (variance) and shape (skewness and kurtosis). For each sample size, the 2,000 estimated values of each measure (mean, variance, ...) were recorded in a matrix and rounded to three decimal places. Afterwards, the decimal places were isolated, and the number of times each digit from 0 to 9 appeared at each position was counted in order to examine its adherence to a uniform distribution. The assumption is that, if the counts are spread evenly among the possible values, each digit is equally probable; that is, any digit that emerges in a result might as well have been assigned to that position at random. As an illustration, Table 1 presents the number of times each of the ten digits appeared in the sample variance for the six sample sizes used.

TABLE 1
Results illustration (digit counts for the sample variance, 2,000 samples per sample size).

| DIGIT | n = 30, 1st | n = 30, 2nd | n = 30, 3rd | n = 200, 1st | n = 200, 2nd | n = 200, 3rd | n = 500, 1st | n = 500, 2nd | n = 500, 3rd |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 277 | 191 | 212 | 632 | 183 | 217 | 846 | 198 | 225 |
| 1 | 234 | 176 | 175 | 282 | 198 | 195 | 144 | 224 | 171 |
| 2 | 168 | 207 | 228 | 49 | 183 | 201 | 6 | 207 | 241 |
| 3 | 146 | 196 | 225 | 7 | 205 | 166 | 0 | 185 | 181 |
| 4 | 91 | 208 | 187 | 0 | 209 | 186 | 0 | 217 | 208 |
| 5 | 99 | 212 | 184 | 0 | 194 | 203 | 0 | 206 | 184 |
| 6 | 166 | 195 | 193 | 0 | 205 | 219 | 0 | 185 | 204 |
| 7 | 238 | 219 | 201 | 29 | 208 | 186 | 1 | 208 | 184 |
| 8 | 290 | 200 | 199 | 296 | 205 | 212 | 103 | 164 | 219 |
| 9 | 291 | 196 | 196 | 705 | 210 | 215 | 900 | 206 | 183 |

| DIGIT | n = 1,000, 1st | n = 1,000, 2nd | n = 1,000, 3rd | n = 5,000, 1st | n = 5,000, 2nd | n = 5,000, 3rd | n = 10,000, 1st | n = 10,000, 2nd | n = 10,000, 3rd |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 954 | 191 | 239 | 1,026 | 411 | 274 | 996 | 523 | 320 |
| 1 | 36 | 177 | 167 | 0 | 320 | 89 | 0 | 312 | 88 |
| 2 | 0 | 204 | 246 | 0 | 159 | 286 | 0 | 132 | 335 |
| 3 | 0 | 206 | 166 | 0 | 101 | 117 | 0 | 25 | 93 |
| 4 | 0 | 193 | 245 | 0 | 34 | 277 | 0 | 4 | 288 |
| 5 | 0 | 205 | 154 | 0 | 36 | 140 | 0 | 5 | 109 |
| 6 | 0 | 198 | 233 | 0 | 99 | 275 | 0 | 28 | 267 |
| 7 | 0 | 205 | 174 | 0 | 196 | 143 | 0 | 115 | 120 |
| 8 | 24 | 215 | 206 | 0 | 308 | 263 | 0 | 321 | 248 |
| 9 | 986 | 206 | 170 | 974 | 336 | 136 | 1,004 | 535 | 132 |

Source: Simulation carried out according to Appendix A algorithm.

For the reference variable of the simulation (normally distributed with mean 3.000 and variance 1.000), the skewness is known to be 0.000 and the kurtosis 3.000. Therefore, for each measure, the true values of all three decimal places are known and equal to zero. Ideally, the sample estimates should produce decimal values that are equal to the true value or that round to it. For smaller samples, where the sampled values vary more, this is not expected to happen in the same way for all three decimal places; for larger samples, however, direct convergence or rounding to the true value is expected. If this does not occur in one of the decimal places, the evidence is that presenting that digit is not informative.

For the variance, shown in Table 1, this can be clearly observed in the first digit from sample size 30 onwards, and more clearly for samples of size 200 or larger. For the second digit, concentration occurs only from sample size 5,000 onwards, and for the third digit there is no evidence that the sample values concentrate around the true ones, not even for samples of size 10,000. In fact, the distribution of values in the third digit is approximately uniform for every sample size and, for the second digit, the same happens for samples up to size 1,000.
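The counting procedure behind Table 1 can be illustrated as follows. This is a hedged Python sketch, not the R algorithm of Appendix A (which is not reproduced here): the function names, the seed and the reduced number of replications in the usage line are assumptions, and the counts it produces will differ from Table 1 because the random draws differ.

```python
import random
from collections import Counter


def decimal_digits(value, places=3):
    """Round to `places` decimals and return the decimal digits as a list of ints."""
    text = f"{abs(value):.{places}f}"
    return [int(ch) for ch in text.split(".")[1]]


def sample_variance(sample):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def variance_digit_table(n, replications=2000, seed=42):
    """Digit counts (rows: 1st, 2nd, 3rd decimal place; columns: digits 0-9)
    for the variance of samples of size n drawn from N(mean 3, variance 1)."""
    rng = random.Random(seed)
    tables = [Counter() for _ in range(3)]
    for _ in range(replications):
        sample = [rng.gauss(3.0, 1.0) for _ in range(n)]
        for position, digit in enumerate(decimal_digits(sample_variance(sample))):
            tables[position][digit] += 1
    return [[t[d] for d in range(10)] for t in tables]


if __name__ == "__main__":
    for row in variance_digit_table(200, replications=500):
        print(row)
```

Because the true variance is 1.000, estimates slightly below 1 (0.9xx) and slightly above it (1.0xx) both count as close to the true value, which is why concentration in the first decimal place shows up at the digits 9 and 0 rather than at 0 alone.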
For this analysis, we chose to examine how the values of the digits behave, to see whether they concentrate on specific values (close to the true value) or are distributed randomly among the available options (digits 0 to 9). We therefore tested whether the digits follow a discrete uniform distribution with values between 0 and 9 (the sequence of whole numbers) by means of a chi-square goodness-of-fit test. Only the significance (p-value) of the test for each measure and each sample size is presented here.

TABLE 2
Results of the tests for descriptive measures.

| SAMPLE SIZE | MEAN, 1st | MEAN, 2nd | MEAN, 3rd | VARIANCE, 1st | VARIANCE, 2nd | VARIANCE, 3rd |
|---|---|---|---|---|---|---|
| 30 | 0.000 | 0.504 | 0.699 | 0.000 | 0.672 | 0.147 |
| 200 | 0.000 | 0.469 | 0.169 | 0.000 | 0.860 | 0.162 |
| 500 | 0.000 | 0.060 | 0.246 | 0.000 | 0.122 | 0.005 |
| 1,000 | 0.000 | 0.000 | 0.447 | 0.000 | 0.823 | 0.000 |
| 5,000 | 0.000 | 0.000 | 0.896 | 0.000 | 0.000 | 0.000 |
| 10,000 | 0.000 | 0.000 | 0.403 | 0.000 | 0.000 | 0.000 |

| SAMPLE SIZE | SKEWNESS, 1st | SKEWNESS, 2nd | SKEWNESS, 3rd | KURTOSIS, 1st | KURTOSIS, 2nd | KURTOSIS, 3rd |
|---|---|---|---|---|---|---|
| 30 | 0.055 | 0.340 | 0.378 | 0.000 | 0.608 | 0.315 |
| 200 | 0.000 | 0.925 | 0.262 | 0.000 | 0.713 | 0.608 |
| 500 | 0.000 | 0.966 | 0.048 | 0.000 | 0.825 | 0.367 |
| 1,000 | 0.000 | 0.793 | 0.755 | 0.000 | 0.837 | 0.572 |
| 5,000 | 0.000 | 0.000 | 0.027 | 0.000 | 0.435 | 0.806 |
| 10,000 | 0.000 | 0.000 | 0.839 | 0.000 | 0.222 | 0.614 |

Source: Simulation carried out according to Appendix A algorithm.

The results are shown in Table 2, and the reference interpretation is as follows: a significance level below 0.05 indicates that the hypothesis that the digits are evenly distributed between 0 and 9 can be rejected (that is, the values 0, 1, ..., 9 are not equally probable for that decimal place, and there is possibly a concentration around some values).
Otherwise, the evidence is that the digits are uniformly distributed (which indicates that any digit is equally probable). Table 2 shows the following evidence:

Regarding the mean, the first digit stabilizes around a few values even in small samples (the simulation tables show that these are values that, when rounded, approach 0, the true value). The second digit is randomly distributed up to sample size 500, but in larger samples it behaves like the first digit. Finally, the third digit behaves randomly among the available options (0-9), even in sufficiently large samples;

Concerning the variance, the first digit is already concentrated around the true value even in smaller samples, but the second digit follows this behavior only in larger samples (sizes 5,000 and 10,000). The third digit behaves differently: from sample size 500 onwards, the distribution of the 2,000 values is no longer uniform over the available options (0 to 9). However, inspection of the data (Table 1) shows that the non-uniformity comes not from a concentration near the true value by rounding (which is zero) but from an oscillation among the various options. Therefore, although the test signals non-uniformity, the third digit does not seem to be informative about the known true value;

For skewness, the first digit behaves as in the other measures (except for sample size 30, which came slightly close to the uniform pattern). The second digit concentrates only in large samples (sizes 5,000 and 10,000). The third digit, on the other hand, remained uniform for all sample sizes except 5,000 and, marginally, 500 (inspection of the results table indicates behavior similar to that observed for the variance);

Finally, for the kurtosis, the evidence shows that only the first digit is concentrated around the true value, even in smaller samples. The remaining digits are evenly distributed over the available options (0-9), even in large samples.

As a general conclusion, for the most used descriptive measures, the decimal places are reliably informative only in the first digit, which stabilizes around values close to the true ones even in smaller samples. The information carried by the second digit depends on the measure and on the sample size; in general, only larger samples (size 1,000 onwards) give the second digit some meaning. Finally, the third digit adds virtually no meaningful information, even for large samples, whatever the reference measure. We therefore conclude that presenting the results of descriptive measures with more than two decimal places is a false rigor. Presenting results with two decimal places only makes sense, in terms of information content, for samples larger than 1,000, depending on the measure. Accordingly, the evidence is that research results are sufficiently informative with only one decimal place, especially in smaller samples.
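The chi-square goodness-of-fit test used throughout this subsection can be reproduced by hand for a single column of Table 1. The sketch below is in Python and uses only the standard library; the function names are hypothetical, and the closed-form survival function is the standard expression for a chi-square distribution with 9 degrees of freedom (ten digits minus one), written out here so that no statistics package is needed. The example counts are the ten values read from Table 1 as the third-digit column of the variance for n = 500.

```python
import math


def chi2_sf_df9(x):
    """P(chi-square with 9 degrees of freedom > x), closed form for odd df = 9."""
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))
    series = 1.0 + x / 3.0 + x ** 2 / 15.0 + x ** 3 / 105.0
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * series


def uniform_digit_test(observed):
    """Chi-square statistic and p-value for H0: the ten digits are equally likely."""
    if len(observed) != 10:
        raise ValueError("expected one count per digit 0-9")
    expected = sum(observed) / 10.0
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, chi2_sf_df9(statistic)


if __name__ == "__main__":
    # Third decimal digit of the sample variance, n = 500 (counts from Table 1)
    third_digit_n500 = [225, 171, 241, 181, 208, 184, 204, 184, 219, 183]
    statistic, p_value = uniform_digit_test(third_digit_n500)
    print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

For these counts the statistic is 23.75 and the p-value rounds to 0.005, in line with the corresponding entry of Table 2. The alternating high and low counts also illustrate the point made in the text: uniformity is rejected because the counts oscillate across the digits, not because they pile up near the true value of zero.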
3.2 SIMULATION OF RESULTS FOR REGRESSION ANALYSIS

To evaluate the stability of the decimal places of the estimators of regression parameters, an algorithm was again developed in the statistical package R (see Appendix B), with 2,000 simulations for each of six sample sizes (30, 200, 500, 1,000, 5,000 and 10,000). Since the precision of regression estimators depends on the error variance of the model, the simulation was run for different levels of variance (models with smaller error variance are expected to yield more precise decimal values than models with larger variance). We therefore adopted a design in which the error variance starts very small relative to the model (generating models with a level of explanation, R², of around 99%) and grows to a very high level (with explanation levels between 50% and 70%).

For the simulation, the explanatory variable (X) was uniformly distributed between 0 and 1, and the dependent variable (Y) was equal to 5 plus 10 times the explanatory variable, plus an error term ($\varepsilon$) normally distributed with expected value equal to zero and constant variance. In other words, the true and known model was $Y = 5 + 10X + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$. We simulated nine levels of the error standard deviation ($\sigma$): 0.01, 0.05, 0.10, 0.50, 1.00, 1.25, 1.50, 1.75 and 2.00. At each level of error variance, the 2,000 estimated coefficients were recorded in a matrix and then rounded to three decimal places. As in the first simulation, the decimal places were then isolated and their adherence to a discrete uniform distribution with values between 0 and 9 (the sequence of integers) was analyzed by means of a chi-square goodness-of-fit test.
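The design just described can be sketched as follows. This is a Python illustration under stated assumptions, not the R algorithm of Appendix B: the function names and the seed are hypothetical, only the slope estimate is examined, and `slope_standard_error` is an added large-sample approximation (using the fact that a uniform variable on [0, 1] has variance 1/12) that does not appear in the article.

```python
import math
import random
from collections import Counter


def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_digit_counts(n, sigma, replications=2000, seed=7):
    """Digit counts (rows: 1st, 2nd, 3rd decimal place; columns: digits 0-9) of the
    estimated slope under Y = 5 + 10 X + error, X ~ U(0, 1), error ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    tables = [Counter() for _ in range(3)]
    for _ in range(replications):
        xs = [rng.random() for _ in range(n)]
        ys = [5.0 + 10.0 * x + rng.gauss(0.0, sigma) for x in xs]
        decimals = f"{abs(ols_slope(xs, ys)):.3f}".split(".")[1]
        for position, ch in enumerate(decimals):
            tables[position][int(ch)] += 1
    return [[t[d] for d in range(10)] for t in tables]


def slope_standard_error(n, sigma):
    """Approximate standard error of the slope: sigma / sqrt(n * Var(X)), Var(X) = 1/12."""
    return sigma * math.sqrt(12.0 / n)


if __name__ == "__main__":
    for sigma in (0.01, 0.50, 2.00):
        print(sigma, round(slope_standard_error(200, sigma), 4))
        for row in slope_digit_counts(200, sigma, replications=500):
            print("   ", row)
```

A rough reading aid, offered as a heuristic rather than a result of the article: a decimal place can only be expected to stabilize when the standard error of the estimate is clearly smaller than the unit of that place (0.1, 0.01 or 0.001), which is why both a larger n and a smaller σ help.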
The p-values of the tests applied to the digits, for each sample size and each level of the error standard deviation, are presented in Table 3. A significance level below 0.05 indicates that the hypothesis that the digits are evenly distributed between 0 and 9 can be rejected; otherwise, the evidence is that the digits are uniformly distributed, which indicates that any digit is equally probable. Taking this interpretation as reference:

In small samples (size 30), the first decimal digit concentrates around values that round to zero (the known true value) when the error variance of the model is low, but as the variance grows this digit becomes random over the sequence 0 to 9; the second digit is non-random only at very small levels of error variance; and the third digit is uniformly distributed at all levels of error variance;

For moderate sample sizes (200 and 500), the first digit quickly stabilizes around a few values, although in samples of size 200 it tends toward randomness at the higher levels of error variance; the second digit stabilizes around some values when the variance is low; and the third digit concentrates around some values at very small levels of variance, but otherwise tends to be random;

For sample sizes 1,000 and 5,000, the behavior is similar to that observed for sample size 500, with a slight sign of stability of the second and third digits around a few values when the error variance is low;

Finally, for large samples (size 10,000), the third digit stabilizes around some values only when the error variance is small, and the same happens with the
second digit.

In general, the conclusion is that the first digit stabilizes around a few values, namely those that round to the true value, which in this case was zero, since the true coefficient multiplying the explanatory variable was 10.000 (to save space, illustrative counts such as those in Table 1 are not shown). The second digit oscillates: it concentrates around values close to the true value when the error variance is small, but it tends to be uniformly distributed between 0 and 9 at higher levels of error variance. Finally, the third digit is randomly distributed between 0 and 9 in most cases and at the higher levels of error variance. This behavior appears even in very large samples.

TABLE 3
Results of the tests for regression models.

| ERROR DEVIATION | n* = 30, 1st | n = 30, 2nd | n = 30, 3rd | n = 200, 1st | n = 200, 2nd | n = 200, 3rd | n = 500, 1st | n = 500, 2nd | n = 500, 3rd |
|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.000 | 0.000 | 0.947 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.05 | 0.000 | 0.000 | 0.328 | 0.000 | 0.000 | 0.825 | 0.000 | 0.000 | 0.913 |
| 0.10 | 0.000 | 0.310 | 0.720 | 0.000 | 0.000 | 0.910 | 0.000 | 0.000 | 0.883 |
| 0.50 | 0.000 | 0.277 | 0.738 | 0.000 | 0.045 | 0.060 | 0.000 | 0.891 | 0.467 |
| 1.00 | 0.121 | 0.122 | 0.086 | 0.000 | 0.214 | 0.448 | 0.000 | 0.445 | 0.333 |
| 1.25 | 0.241 | 0.687 | 0.295 | 0.000 | 0.598 | 0.097 | 0.000 | 0.364 | 0.623 |
| 1.50 | 0.464 | 0.932 | 0.458 | 0.360 | 0.113 | 0.208 | 0.000 | 0.967 | 0.918 |
| 1.75 | 0.490 | 0.935 | 0.802 | 0.465 | 0.489 | 0.439 | 0.000 | 0.447 | 0.417 |
| 2.00 | 0.964 | 0.648 | 0.253 | 0.863 | 0.121 | 0.324 | 0.000 | 0.487 | 0.423 |

| ERROR DEVIATION | n = 1,000, 1st | n = 1,000, 2nd | n = 1,000, 3rd | n = 5,000, 1st | n = 5,000, 2nd | n = 5,000, 3rd | n = 10,000, 1st | n = 10,000, 2nd | n = 10,000, 3rd |
|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.05 | 0.000 | 0.000 | 0.346 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.10 | 0.000 | 0.000 | 0.544 | 0.000 | 0.000 | 0.629 | 0.000 | 0.000 | 0.001 |
| 0.50 | 0.000 | 0.718 | 0.078 | 0.000 | 0.000 | 0.305 | 0.000 | 0.000 | 0.876 |
| 1.00 | 0.000 | 0.896 | 0.076 | 0.000 | 0.738 | 0.966 | 0.000 | 0.000 | 0.658 |
| 1.25 | 0.000 | 0.485 | 0.256 | 0.000 | 0.075 | 0.526 | 0.000 | 0.867 | 0.155 |
| 1.50 | 0.000 | 0.234 | 0.863 | 0.000 | 0.624 | 0.666 | 0.000 | 0.823 | 0.439 |
| 1.75 | 0.000 | 0.414 | 0.251 | 0.000 | 0.343 | 0.373 | 0.000 | 0.518 | 0.385 |
| 2.00 | 0.000 | 0.108 | 0.947 | 0.000 | 0.580 | 0.064 | 0.000 | 0.789 | 0.495 |

*n is the sample size.
Source: Simulation carried out according to Appendix B algorithm.

The results confirm that presenting regression estimates with more than two decimal places does not indicate any precision, even when samples are large. Even the second decimal place is stable around the true values only when the error variance is small and the samples are larger. Finally, the first digit is always more stable around the true values in large samples, but in smaller samples this digit, too, is likely to fall randomly among the possible digits (0 to 9) if the variability of the errors is large.

4 FINAL CONSIDERATIONS

In an extensive study, Antonakis et al. (2014) concluded that the effort to build high-impact articles (those that generate higher levels of citations) necessarily requires a solid methodological construction and accurate, robust techniques for dealing with the classical problems of the Social and Behavioral Sciences (presence of extreme values, skewness of the variables, indirect measurement of latent constructs etc.).
The authors' conclusion reinforces a commonly heard argument, namely that researchers need to strengthen the accuracy and precision of their research, even in limited contexts. Nonetheless, accuracy cannot be pursued without real concern for what statistical software actually produces. The increased use of statistical packages by researchers has facilitated the use of techniques and improved research results, but it has possibly also introduced the potential for defects that arise when software output is simply transferred to research reports. Indeed, such transfers, when performed without due reflection on the meanings of the values, generate a false impression of precision and accuracy, as well as false conclusions. This seems to be the issue at the heart of the contemporary debate about statistical significance, now a well-advanced discussion, as shown by Cortina and Landis (2011), Kelley and Preacher (2012) and Orlitzky (2012).

This article sought to contribute to this debate, specifically on the issue of the presentation of results, in alignment with studies that focus on specific aspects of quantitative research reports and that have motivated publications in reputable journals in the Social and Behavioral Sciences. The results of the simulations in this article therefore point to quite specific recommendations, which add to many others that may subsequently be incorporated into the editorial policies of journals (ACADEMY OF MANAGEMENT, 2011) or into more general recommendations of research practice, such as those of Aguinis and Edwards (2014) and Bedeian (2014).

Although the statistical and operational requirements of the simulations were met, the article shares the limitations of any research based on this type of technique.
Real-data procedures would be possible and are recommended, for example by applying bootstrapping to calculate specific measures. The study presented results for a specific set of measures (descriptive statistics and estimators of regression parameters), which converge with the findings of Bedeian, Sturman and Streiner (2009) for correlation coefficients. Space limitations prevented the exploration of other measures; it is therefore recommended that further studies or exercises evaluate other measures (possibly by using or improving the algorithm developed for this application).

Finally, it is advisable that researchers in the Social and Behavioral Sciences in Brazil undertake analyses of these research and methodological practices, and that national journals open space for this type of study, as has been the tendency in some international studies. Although there are initiatives in this direction (BARBOZA et al., 2013), there is still ample room for the development of this type of research.

5 REFERENCES

ACADEMY OF MANAGEMENT. Style guide for authors. Academy of Management Journal, v. 54, n. 5, p. 1081-1084, 2011.
AGUINIS, H.; BOYD, B. K.; PIERCE, C. A.; SHORT, J. C. Walking new avenues in management research methods and theories: bridging micro and macro domains. Journal of Management, v. 37, n. 2, p. 395-403, 2011.
AGUINIS, H.; EDWARDS, J. R. Methodological wishes for the next decade and how to make wishes come true. Journal of Management Studies, v. 51, n. 1, p. 143-174, 2014.
ANTONAKIS, J.; BASTARDOZ, N.; LIU, Y.; SCHRIESHEIM, C. A. What makes articles highly cited?
The Leadership Quarterly, v. 25, n. 1, p. 152-179, 2014.
BARBOZA, S. I. S.; CARVALHO, D. L. T.; SOARES NETO, J. B.; COSTA, F. J. Variações de Mensuração pela Escala de Verificação: uma análise com escalas de 5, 7 e 11 pontos. TPA-Teoria e Prática em Administração, v. 3, n. 2, p. 99-120, 2013.
BEDEIAN, A. G. More than meets the eye: a guide to interpreting the descriptive statistics and correlation matrices reported in management research. Academy of Management Learning & Education, v. 13, n. 1, p. 121-135, 2014.
BEDEIAN, A. G.; STURMAN, M. C.; STREINER, D. L. Decimal dust, significant digits, and the search for stars. Organizational Research Methods, v. 12, n. 4, p. 687-694, 2009.
BOTELHO, D.; ZOUAIN, D. M. Pesquisa quantitativa em administração. São Paulo: Atlas, 2006.
CORDEIRO, R. A.; SANCHES, P. L. B.; CAVALCANTE, K. O.; PEIXOTO, A. F.; LEITE, J. C. L. Pesquisa quantitativa em finanças: uma análise das técnicas estatísticas utilizadas por artigos científicos publicados em periódicos qualificados no triênio 2007-2009. Revista de Administração da UFSM, v. 7, n. 1, p. 117-134, 2014.
CORTINA, J. M.; LANDIS, R. S. The earth is not round (p=.00). Organizational Research Methods, v. 14, n. 2, p. 332-349, 2011.
CUMMING, G. Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. New York: Routledge, 2013.
KELLY, K.; PREACHER, K. J. On effect size. Psychological Methods, v. 17, n. 2, p. 137-152, 2012.
ORLITZKY, M. How can significance tests be deinstitutionalized? Organizational Research Methods, v. 15, n. 2, p. 199-229, 2012.
WILCOX, R. Modern statistics for the social and behavioral sciences: a practical introduction. Kentucky: CRC Press, 2012.

Appendix A - Algorithm implemented in R for the simulation of descriptive measures

library(moments)  # provides skewness() and kurtosis(), in case var() below is swapped for them
t_amostra <- c(30, 200, 500, 1000, 5000, 10000)  # sample sizes
n <- t_amostra
pvalues <- matrix(0, 3, 9)
p_val1 <- c()
p_val2 <- c()
p_val3 <- c()
k <- 2000  # number of samples

for (j in 1:length(t_amostra)) {
  M <- matrix(0, k, n[j] + 6)
  for (i in 1:k) {
    set.seed(i + 11)
    M[i, 1:n[j]] <- rnorm(n[j], 3, 1)                    # reference distribution
    M[i, n[j] + 1] <- var(M[i, 1:n[j]])                  # sample statistic, to be iterated
    M[i, n[j] + 2] <- round(M[i, n[j] + 1], 3)           # rounding of decimal numbers
    M[i, n[j] + 3] <- round(1000 * M[i, n[j] + 2])       # separation of decimals; round() guards floor() against floating-point error
    M[i, n[j] + 4] <- floor(M[i, n[j] + 3] / 100) %% 10  # isolating the first decimal
    M[i, n[j] + 5] <- floor(M[i, n[j] + 3] / 10) %% 10   # isolating the second decimal
    M[i, n[j] + 6] <- floor(M[i, n[j] + 3]) %% 10        # isolating the third decimal
  }

  ## ALGORITHM FOR DIGITS MATRIX CONSTRUCTION
  Freq_dígito <- matrix(0, 3, 10)
  for (l in 1:3) {
    Freq_observada <- c()
    dig <- M[, l + n[j] + 3]  # renamed from `var` so the base function var() is not masked
    for (i in 1:10) {
      i1 <- i - 1
      Freq_observada[i] <- length(dig[dig == i1])
    }
    Freq_dígito[l, 1:10] <- Freq_observada
  }
  D <- as.data.frame(Freq_dígito); print(D)

  # ELEMENTS OF THE TABLE OF RESULTS
  Díg. <- 0:9
  Fr_esp <- rep(k / 10, 10)
  Cum_esp <- cumsum(rep(k / 10, 10))
  Fr_ob_Dig1 <- t(D)[, 1]; Fr_ob_Dig2 <- t(D)[, 2]; Fr_ob_Dig3 <- t(D)[, 3]
  Cum_ob_Dig1 <- cumsum(t(D)[, 1]); Cum_ob_Dig2 <- cumsum(t(D)[, 2]); Cum_ob_Dig3 <- cumsum(t(D)[, 3])

  # RESULTS FREQUENCY TABLE, CUMULATIVE, EXPECTED AND OBSERVED
  Tabela <- cbind(Díg., Fr_esp, Fr_ob_Dig1, Fr_ob_Dig2, Fr_ob_Dig3,
                  Cum_esp, Cum_ob_Dig1, Cum_ob_Dig2, Cum_ob_Dig3); print(Tabela)

  # QUANTILE-QUANTILE GRAPHS FOR THE UNIFORM DISTRIBUTION
  # abline(0, 1) draws the identity line that the source obtained by fitting lm(y ~ x) with y = x.
  # The source histograms used M[, 5], M[, 6] and M[, 7], which in this script are raw draws;
  # the digit columns n[j] + 4 to n[j] + 6 are assumed to be the intended ones.
  par(mfrow = c(3, 2))
  plot(Cum_esp, Cum_ob_Dig1, col = 2); abline(0, 1); hist(M[, n[j] + 4])
  plot(Cum_esp, Cum_ob_Dig2, col = 3); abline(0, 1); hist(M[, n[j] + 5])
  plot(Cum_esp, Cum_ob_Dig3, col = 4); abline(0, 1); hist(M[, n[j] + 6])

  ## UNIFORM DISTRIBUTION TEST (CHI-SQUARED)
  # FIRST DIGIT
  ob1 <- Tabela[, 3]  # observed frequencies
  es1 <- Tabela[, 2]  # expected frequencies
  ct1 <- chisq.test(ob1, p = es1 / k)
  p_val1[j] <- ct1$p.value
  # SECOND DIGIT
  ob2 <- Tabela[, 4]  # observed frequencies
  es2 <- Tabela[, 2]  # expected frequencies
  ct2 <- chisq.test(ob2, p = es2 / k)
  p_val2[j] <- ct2$p.value
  # THIRD DIGIT
  ob3 <- Tabela[, 5]  # observed frequencies
  es3 <- Tabela[, 2]  # expected frequencies
  ct3 <- chisq.test(ob3, p = es3 / k)
  p_val3[j] <- ct3$p.value
}
Tabela
p_val <- round(cbind(t_amostra, p_val1, p_val2, p_val3), 3); p_val

Appendix B - Algorithm implemented in R for the simulation of regressions

tamanho <- 30  # sample size: rerun with each of 30, 200, 500, 1000, 5000, 10000 (the value is blank in the source; 30 is only a placeholder)
k <- 2000  # number of samples
M <- matrix(0, k, 7)
Sigma <- c(.01, .05, .1, .5, 1, 1.25, 1.5, 1.75, 2)  # standard deviations of the error term
nsigma <- length(Sigma)
pvalues <- matrix(0, 3, 9)
p_val1 <- c()
p_val2 <- c()
p_val3 <- c()

for (j in 1:nsigma) {
  for (i in 1:k) {
    num <- runif(tamanho)
    x <- sample(num, tamanho, replace = TRUE)
    y <- 5 + 10 * x + rnorm(tamanho, 0, Sigma[j])  # reference distribution
    reg <- lm(y ~ x)
    M[i, 1:2] <- reg$coefficients
    M[i, 3] <- round(M[i, 2], 3)          # rounding of decimal numbers (slope estimate)
    M[i, 4] <- round(1000 * M[i, 3])      # separation of decimals; round() guards floor() against floating-point error
    M[i, 5] <- floor(M[i, 4] / 100) %% 10 # isolating the first decimal
    M[i, 6] <- floor(M[i, 4] / 10) %% 10  # isolating the second decimal
    M[i, 7] <- floor(M[i, 4]) %% 10       # isolating the third decimal
  }

  ## ALGORITHM FOR DIGITS MATRIX CONSTRUCTION
  Freq_dígito <- matrix(0, 3, 10)
  for (l in 1:3) {
    Freq_observada <- c()
    dig <- M[, l + 4]  # renamed from `var` so the base function var() is not masked
    for (i in 1:10) {
      i1 <- i - 1
      Freq_observada[i] <- length(dig[dig == i1])
    }
    Freq_dígito[l, 1:10] <- Freq_observada
  }
  D <- as.data.frame(Freq_dígito); print(D)

  # ELEMENTS OF THE TABLE OF RESULTS
  Díg. <- 0:9
  Fr_esp <- rep(k / 10, 10)
  Cum_esp <- cumsum(rep(k / 10, 10))
  Fr_ob_Dig1 <- t(D)[, 1]; Fr_ob_Dig2 <- t(D)[, 2]; Fr_ob_Dig3 <- t(D)[, 3]
  Cum_ob_Dig1 <- cumsum(t(D)[, 1]); Cum_ob_Dig2 <- cumsum(t(D)[, 2]); Cum_ob_Dig3 <- cumsum(t(D)[, 3])

  # RESULTS FREQUENCY TABLE, CUMULATIVE, EXPECTED AND OBSERVED
  Tabela <- cbind(Díg., Fr_esp, Fr_ob_Dig1, Fr_ob_Dig2, Fr_ob_Dig3,
                  Cum_esp, Cum_ob_Dig1, Cum_ob_Dig2, Cum_ob_Dig3); print(Tabela)

  # QUANTILE-QUANTILE GRAPHS FOR THE UNIFORM DISTRIBUTION
  # abline(0, 1) draws the identity line that the source obtained by fitting lm(y ~ x) with y = x.
  par(mfrow = c(3, 2))
  plot(Cum_esp, Cum_ob_Dig1, col = 2); abline(0, 1); hist(M[, 5])
  plot(Cum_esp, Cum_ob_Dig2, col = 3); abline(0, 1); hist(M[, 6])
  plot(Cum_esp, Cum_ob_Dig3, col = 4); abline(0, 1); hist(M[, 7])

  ## UNIFORM DISTRIBUTION TEST (CHI-SQUARED)
  # FIRST DIGIT
  ob1 <- Tabela[, 3]  # observed frequencies
  es1 <- Tabela[, 2]  # expected frequencies
  ct1 <- chisq.test(ob1, p = es1 / k)
  p_val1[j] <- ct1$p.value
  # SECOND DIGIT
  ob2 <- Tabela[, 4]  # observed frequencies
  es2 <- Tabela[, 2]  # expected frequencies
  ct2 <- chisq.test(ob2, p = es2 / k)
  p_val2[j] <- ct2$p.value
  # THIRD DIGIT
  ob3 <- Tabela[, 5]  # observed frequencies
  es3 <- Tabela[, 2]  # expected frequencies
  ct3 <- chisq.test(ob3, p = es3 / k)
  p_val3[j] <- ct3$p.value
}
p_val <- round(cbind(Sigma, p_val1, p_val2, p_val3), 3); p_val
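The Final Considerations recommend real-data procedures such as bootstrapping as a complement to the simulations. The following is a minimal sketch of how that might look in R; it is not part of the original study. The helper names `decimal_digit` and `bootstrap_digits`, the choice of B = 2000 resamples, and the simulated stand-in vector `observed` are all assumptions made for illustration. The sketch resamples an observed vector with replacement, extracts the first three decimal digits of the bootstrapped variance, and tests each digit's frequencies against a uniform distribution, mirroring the chi-squared test used in Appendices A and B.

```r
# Hypothetical helper: d-th decimal digit (d = 1, 2, 3) of a value rounded to 3 places.
decimal_digit <- function(x, d) {
  scaled <- round(abs(x) * 1000)      # whole number of thousandths
  (scaled %/% 10^(3 - d)) %% 10
}

# Hypothetical helper: bootstrap a statistic and test the uniformity of its decimal digits.
bootstrap_digits <- function(data, statistic = var, B = 2000, seed = 1) {
  set.seed(seed)
  # B bootstrap replicates of the statistic (resampling with replacement)
  est <- replicate(B, statistic(sample(data, length(data), replace = TRUE)))
  # B x 3 matrix: one column per decimal place
  digits <- sapply(1:3, function(d) decimal_digit(est, d))
  # 10 x 3 matrix of observed frequencies for the digits 0 to 9
  freq <- apply(digits, 2, function(col) table(factor(col, levels = 0:9)))
  # chi-squared goodness-of-fit test against the uniform distribution
  p <- apply(freq, 2, function(f) chisq.test(f, p = rep(0.1, 10))$p.value)
  list(freq = freq, p_value = p)
}

# Stand-in for a real data vector (assumption: replace with observed data).
set.seed(42)
observed <- rnorm(200, mean = 3, sd = 1)

res <- bootstrap_digits(observed)
round(res$p_value, 3)   # one p-value per decimal place (1st, 2nd, 3rd)
```

A p-value near zero suggests that the digit concentrates on a few values and therefore carries information, whereas a large p-value is consistent with the digit being scattered uniformly over 0 to 9.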