PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia
ISSN: 1983-9456 (Print)
ISSN: 2317-0123 (Online)
Editor: Fauze Najib Mattar
Review system: Triple Blind Review
Languages: Portuguese and English
Publication: ABEP – Associação Brasileira de Empresas de Pesquisa
Precision of Numeric Results in Quantitative Research: Reflections for the Fields of Social
and Behavioral Sciences
Precisão de Resultados Numéricos em Pesquisa Quantitativa: Reflexões nos Campos de
Ciências Sociais e Comportamentais
Submission: Oct. 26, 2014 - Approval: Mar. 7, 2015
Francisco José da Costa
Dr. Francisco José da Costa holds a doctorate in Business Administration from Fundação Getulio
Vargas – School of Business Administration of São Paulo/FGV-SP, and a master's degree in Business
Administration from the State University of Ceará – UECE. He earned bachelor's degrees in both
Business Administration and Statistics at the Federal University of Paraíba – UFPB. Dr. Costa is a
professor at the Department of Administration of the Federal University of Paraíba.
E-mail: [email protected]
Professional address: Departamento de Administração, Centro de Ciências Sociais Aplicadas –
DADM/CCSA, Campus Universitário I/UFPB, Cidade Universitária – 58051-900 - João Pessoa/PB
– Brasil.
ABSTRACT
This article examines procedures for presenting results in quantitative research, focusing
specifically on the rounding of numerical results. The methodological approach consisted of computer
simulations to assess the information content of rounded descriptive statistics and of estimated
parameters of regression models. The results show that, for different sample sizes, the first and
second decimal digits carry information that approximates the true values of the measures. The third
digit is randomly distributed over the possible values (from 0 to 9, in the sequence of the natural
numbers), except when the sample size is sufficiently large (5,000 or greater). The results are in
line with others published internationally and show that, in sample-based research, reporting
results rounded to more than two decimal places adds no useful information, although it appears to
signal greater precision.
KEYWORDS:
Quantitative research, rounding practices, simulation.
RESUMO (ABSTRACT)
This article aims to analyze the procedures for presenting results in quantitative research,
keeping the focus specifically on the practices of rounding numerical results. The methodological
procedure consisted of computer simulations to evaluate the level of information of rounded
descriptive statistics and of estimated parameters of regression models. The results show that, for
different sample sizes, the first and second digits have levels of information that approximate the
true results of the measures. The third digit, in turn, is randomly distributed over the possible
values (from 0 to 9, in the sequence of the natural numbers), except when the sample size is
sufficiently large (above 5,000). The results are in line with other international publications and
show that, in sample-based research, rounding with more than two decimal places has no useful
informative content, although it appears to signal precision.
PALAVRAS-CHAVE (KEYWORDS):
Quantitative research, rounding, simulation.
PMKT – Revista Brasileira de Pesquisas de Marketing, Opinião e Mídia (ISSN 1983-9456 Print and ISSN 2317-0123 Online), São Paulo, Brazil, V.
16, p. 45-59, April 2015 - www.revistapmkt.com.br
1 INTRODUCTION
This article reflects on the precision with which numerical results are presented in research in the
area of Applied Social Sciences. We analyze some practices and alternatives for presenting results,
and we simulate some rounding procedures for numerical results in quantitative research.
The study contributes to the debate on quantitative research in the field of Applied Social
Sciences, which is gradually maturing its methodological procedures by incorporating classical
quantitative techniques and the latest innovations into its research (WILCOX, 2012; CUMMING, 2013).
Considering the publications in the area since the 2000s, it can be said that, in Brazil, the
application of quantitative methods in the Social Sciences in general (and particularly in
Administration) is still at an early stage, except in the Financial Management and Marketing areas.
Indeed, the methodological advances experienced in the field since the year 2000 included the
adoption of greater methodological rigor in qualitative research and the incorporation of advanced
quantitative methods and tools. The condition of the Social and Behavioral Sciences in the 2010s
therefore provided an opportunity for more careful consideration of the procedures in use, with the
aim of enabling continuous improvement of research and of the knowledge generated
(BOTELHO; ZOUAIN, 2006; AGUINIS; EDWARDS, 2014; BEDEIAN, 2014).
Such reflection motivates a wide range of analyses and possible studies, from reviews of the main
techniques used to the most commonly employed procedures of fieldwork design, as well as measurement
and reporting strategies (AGUINIS et al., 2011; CORDEIRO et al., 2014). In this study, the focus is
more specifically on the presentation of numerical results and the tendency to seek precision in
them.
To this end, Section 2 discusses practices and alternatives for presenting results; Section 3
details the simulation procedures carried out and the results found; and Section 4 presents the main
findings of the study, its theoretical and practical recommendations, and its weaknesses.
2 CONVENTIONAL PROCEDURES
Reporting numerical results with decimal places is universal in quantitative research, but there is
no universally accepted rule for presenting this type of result. Most statistical packages adopt
their own default pattern, and there is no uniformity across software applications, nor even within
each application. Consider the most widely used ones: Statistical Package for the Social Sciences
(SPSS), Microsoft Excel and the free software R.
According to an exploratory evaluation and to the author's experience, SPSS is the most widely used
package in the Social Sciences in Brazil, while R is the most used in the Exact Sciences, although
its use is spreading, possibly because of the greater availability of tools for modeling new
problems. Finally, Excel is used less, but it is extremely useful as a complement to other types of
software. SPSS, for example, tabulates values with two decimal places by default in most of its
outputs; however, the output of statistical techniques varies between two and three decimal places.
R presents results with up to seven significant digits by default and it usually uses scientific
notation, with a base-10 exponent, for very large or very small values. Excel has a standard
presentation with six decimal places and, like SPSS, also adopts scientific notation for very large
or very small values.
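As a small illustration of how the same estimate looks under different display defaults, the Python sketch below formats one hypothetical value in several ways. The value and the variable names are assumptions made only for this example; the snippet does not reproduce the output of SPSS, R or Excel.

```python
# Hypothetical estimate, used only to illustrate display precision.
estimate = 0.1503729

two_places = f"{estimate:.2f}"         # fixed notation, two decimal places
three_places = f"{estimate:.3f}"       # fixed notation, three decimal places
seven_digits = f"{estimate:.7g}"       # seven significant digits
scientific = f"{estimate / 1e6:.3e}"   # scientific notation for a very small value

print(two_places, three_places, seven_digits, scientific)
```

The number stored is the same in every case; only the amount of it shown to the reader changes, which is the choice discussed in the remainder of this section.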
The alternatives offered by the programs most used for research in the Social Sciences draw
attention. In fact, considering SPSS specifically, the defaults seem to signal that a maximum of
three decimal places is sufficient for a first presentation of results (although the software allows
access to results with more decimal places). An immediate question follows: Is the third decimal
place enough to present accurate results in research in the field of the Social Sciences? Wouldn't
results with more decimal places show greater accuracy in quantitative research?
These questions revisit debates that are well advanced, though still ongoing and unfinished, in the
methodological discussions of the Social and Behavioral Sciences, and which stress the need to at
least recognize that the field must take issues of precision and accuracy into account (AGUINIS;
EDWARDS, 2014; BEDEIAN, 2014). A central and widely accepted conclusion in academia is that
research, in any field, must first of all be informative and serve the use for which it is intended,
considering all the contextual variables that affect the research results.
Judging by the guidance of authors and institutions of international reputation, decimal places
beyond a minimum number illustrate an excess of information rather than accuracy. For example, the
Academy of Management (2011), a US institution that publishes the prestigious Academy of Management
Journal, states that, in that journal, numerical results should be presented with no more than two
decimal places. This signals the understanding that digits in the third decimal place and beyond
probably carry no informative content, although they appear to indicate accuracy.
Whether such digits are meaningful can only be established by a specific assessment of how the
digits behave in their respective decimal places. That is what Bedeian, Sturman and Streiner (2009)
did for correlation measures. The authors used simulation to generate 10,000 pairs of samples of
normally distributed variables with a predetermined correlation (of 0.150), and they tested the
stability of the first three digits for sample sizes 60, 100, 200, 500, 1,000, 10,000 and 100,000.
Obviously, the decimal digits of the sample correlation of each generated pair may vary; on the
other hand, they are expected to be concentrated around the known values (that is, '1' for the first
digit, '5' for the second and '0' for the third). If they are not, this signals that, for that
sample size, presenting the digit may be meaningless and may convey no indication of precision.
In other words, for sample sizes in which the digits do not follow the expected distribution
(concentrated around the known value), presenting the digits adds no information, let alone
precision. In the case of correlation, the simulations of Bedeian, Sturman and Streiner (2009)
showed that:
• For the first decimal place, the digits are concentrated around the known value (1) from samples
of size 60 onwards. In samples of size 10,000 and beyond, the known value is the only digit that
appears; that is, in samples of this size the first digit is always equal to the known one;
• For the second decimal place, the digits are concentrated around the known value (5) only from
sample size 1,000 onwards; sample sizes up to 500 showed a uniform distribution over the 10
possible digits (from 0 to 9);
• As for the third decimal place, when the selection is made by truncating the digits positioned beyond the third
place, even for samples of size 10,000 there is a uniform distribution over the possible digits
(from 0 to 9); and when the selection is made by rounding, at this sample size there is only a
marginal sign that the digits are concentrated around the known value (which is 0).
The results of Bedeian, Sturman and Streiner (2009) indicate that, for correlation measures, when
the sample is small (around 60 units), only the first digit is informative; the others are unstable,
with any of the 10 possible digits (0-9) equally likely to emerge. The second decimal place is
informative only for sample sizes of 1,000 onwards; for sample sizes of 500 or less, the second
digit is unstable and any of the possible digits (from 0 to 9) is equally likely to emerge. Finally,
there is evidence that the third decimal place carries no informative content (and therefore signals
no precision) in sample sizes smaller than 10,000.
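The kind of experiment described above can be sketched in a few lines of Python. This is only an illustration, not the procedure of Bedeian, Sturman and Streiner (2009): the function names, the number of replications (200 rather than 10,000), the seed and the sample sizes shown are assumptions chosen to keep the example fast.

```python
import math
import random
from collections import Counter

def sample_correlation(n, rho, rng):
    """Pearson correlation of n bivariate-normal pairs with true correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * z)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def first_decimal_counts(n, rho=0.150, replications=200, seed=1):
    """Count how often each digit occupies the first decimal place of r."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replications):
        r = sample_correlation(n, rho, rng)
        rounded = f"{abs(r):.3f}"        # round to three decimal places
        counts[int(rounded[2])] += 1     # character right after "0."
    return counts

small = first_decimal_counts(n=60)      # assumed small-sample case
large = first_decimal_counts(n=2000)    # assumed larger-sample case
print(sorted(small.items()))
print(sorted(large.items()))
```

Comparing the two printed tallies shows how the spread of the first digit narrows around the known value as the sample grows; the same counting can be repeated for the second and third characters after the decimal point.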
Considering that little research in the Social and Behavioral Sciences uses samples larger than
10,000, this result suggests that presenting correlation results with more than two decimal places
means presenting useless information. That is, the position of the Academy of Management, quoted
above, seems well justified, at least for correlation measures. What remains to be answered,
however, is whether the same occurs in other research results such as descriptive statistics,
regression parameters, time series, structural equation modeling or techniques applied to
experimental procedures (analysis of variance, Kruskal-Wallis test, among others).
3 SIMULATION AND RESULTS
This section extends the simulation analysis by applying procedures similar to those of Bedeian,
Sturman and Streiner (2009), but focusing on two categories of measures: descriptive measures and
regression analysis measures. These two techniques were chosen because of their frequent use in
Social and Behavioral Sciences research. Procedures and results are detailed in the following two
subsections.
3.1 SIMULATION OF RESULTS FOR DESCRIPTIVE MEASURES
To evaluate the stability of the decimal places of descriptive measures, an algorithm was developed
in the statistical package R (see Appendix A) to simulate 2,000 samples of a normally distributed
variable with mean 3 and variance 1. The 2,000 samples were generated for each of the sample sizes
30, 200, 500, 1,000, 5,000 and 10,000, and the following measures were extracted from each sample:
mean, variance, and Pearson's coefficients of skewness and kurtosis. The sample sizes were chosen
based on those most frequently observed in published research, except for sizes 5,000 and 10,000,
which are rare in research published in the main journals. Also based on their recurrence in
published research, the measures chosen evaluate position (mean), dispersion (variance) and shape
(skewness and kurtosis).
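The sampling step can be sketched as follows. This Python code is not the Appendix A algorithm (which was written in R); it is a minimal illustration in which the function names and the seed are assumptions, and in which skewness and kurtosis are taken to be the moment coefficients m3/m2^1.5 and m4/m2^2.

```python
import random

def descriptive_measures(sample):
    """Return mean, unbiased variance, moment skewness and moment kurtosis."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    variance = m2 * n / (n - 1)      # sample variance with n - 1 in the denominator
    skewness = m3 / m2 ** 1.5        # assumed definition: moment coefficient
    kurtosis = m4 / m2 ** 2          # assumed definition: equals 3 for a normal variable
    return mean, variance, skewness, kurtosis

def simulate(n, replications=2000, seed=2015):
    """Draw `replications` samples of size n from N(3, 1) and summarize each one."""
    rng = random.Random(seed)
    return [
        descriptive_measures([rng.gauss(3.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]

results = simulate(n=30)
print(results[0])
```

Each element of `results` holds the four measures for one simulated sample, so the list plays the role of the results matrix for a single sample size.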
Thus, for each sample size, the 2,000 estimated values of each measure (mean, variance...) were
recorded in a matrix and rounded to three decimal places. Afterwards, the decimal places were
isolated, and the number of times that each digit from 0 to 9 appeared at each position was counted,
in order to examine adherence to a uniform distribution of counts. The assumption is that, if the
counts are distributed evenly among the possible values, each digit is equally probable; that is,
any value that emerges in a result might just as well have been randomly assigned to that position.
As an illustration, Table 1 presents the number of times that each of the ten
digits appeared in the extraction of the sample variance – in the six sample sizes used.
TABLE 1
Results illustration.
               n = 30               n = 200              n = 500
DIGIT      1st    2nd    3rd    1st    2nd    3rd    1st    2nd    3rd
0          277    191    212    632    183    217    846    198    225
1          234    176    175    282    198    195    144    224    171
2          168    207    228     49    183    201      6    207    241
3          146    196    225      7    205    166      0    185    181
4           91    208    187      0    209    186      0    217    208
5           99    212    184      0    194    203      0    206    184
6          166    195    193      0    205    219      0    185    204
7          238    219    201     29    208    186      1    208    184
8          290    200    199    296    205    212    103    164    219
9          291    196    196    705    210    215    900    206    183

             n = 1,000            n = 5,000            n = 10,000
DIGIT      1st    2nd    3rd    1st    2nd    3rd    1st    2nd    3rd
0          954    191    239  1,026    411    274    996    523    320
1           36    177    167      0    320     89      0    312     88
2            0    204    246      0    159    286      0    132    335
3            0    206    166      0    101    117      0     25     93
4            0    193    245      0     34    277      0      4    288
5            0    205    154      0     36    140      0      5    109
6            0    198    233      0     99    275      0     28    267
7            0    205    174      0    196    143      0    115    120
8           24    215    206      0    308    263      0    321    248
9          986    206    170    974    336    136  1,004    535    132
Source: Simulation carried out according to Appendix A algorithm.
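The digit isolation and counting behind Table 1 can be sketched in Python as below. This is an illustrative reading of the procedure rather than the Appendix A code: the helper names are assumptions and the four input values are hypothetical variances, not simulation output.

```python
from collections import Counter

def decimal_digits(value, places=3):
    """Round to `places` decimals and return the decimal digits as integers."""
    rounded = f"{abs(value):.{places}f}"
    return [int(ch) for ch in rounded.split(".")[1]]

def digit_counts(values, position, places=3):
    """Count how often each digit 0-9 occupies the given decimal position (1-based)."""
    counts = Counter(decimal_digits(v, places)[position - 1] for v in values)
    return [counts.get(d, 0) for d in range(10)]

# Hypothetical sample variances, used only to show the mechanics.
variances = [0.954, 1.0004, 1.2361, 0.9871]
print(digit_counts(variances, position=1))  # -> [1, 0, 1, 0, 0, 0, 0, 0, 0, 2]
```

With a true variance of 1.000, estimates just below 1 contribute a first digit of 9 and estimates just above 1 contribute a first digit of 0, which is why both ends of the first-digit columns in Table 1 fill up as the sample size grows.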
In the reference variable for the simulation (normally distributed with mean 3.000 and variance
1.000), the skewness is known to be 0.000 and the kurtosis 3.000. Therefore, for each measure, the
true values of the decimal places are all known and equal to zero. Ideally, the values extracted
from the samples should have decimal digits equal to the true ones or, after rounding, coincide with
the true value. Obviously, because sampled values vary more in smaller samples, this is not expected
to occur equally for all three decimal places; in larger samples, however, direct convergence or
rounding to the true value is expected. If this does not occur in one of the decimal places, the
evidence is that presenting that digit is not informative.
For the variance shown in Table 1, this can be clearly observed in the first digit from sample size
30 onwards, and more clearly in samples of size 200 or larger. For the second digit, the
concentration only occurs from sample size 5,000 onwards and, for the third digit, the evidence is
that there is no concentration of sample values around the true ones, not even for samples of size
10,000. In fact, the distribution of values in the third digit is approximately uniform in any of
the sample sizes and, for the second digit, the same happens for samples up to size 1,000.
For this analysis, we chose to examine how the values of the digits behave, to see whether they are
concentrated on specific values (close to the true value) or are distributed randomly among the
available options (digits 0 to 9). We therefore tested whether the digits follow a discrete uniform
distribution with values between 0 and 9 (the sequence
of whole numbers) by means of a chi-square goodness-of-fit (adhesion) test. Here, only the
significance (p-value) of the test for each measure and each sample size is presented.
TABLE 2
Results of the tests for descriptive measures.
                       MEAN                    VARIANCE
SAMPLE SIZE     1st     2nd     3rd     1st     2nd     3rd
30            0.000   0.504   0.699   0.000   0.672   0.147
200           0.000   0.469   0.169   0.000   0.860   0.162
500           0.000   0.060   0.246   0.000   0.122   0.005
1,000         0.000   0.000   0.447   0.000   0.823   0.000
5,000         0.000   0.000   0.896   0.000   0.000   0.000
10,000        0.000   0.000   0.403   0.000   0.000   0.000

                     SKEWNESS                  KURTOSIS
SAMPLE SIZE     1st     2nd     3rd     1st     2nd     3rd
30            0.055   0.340   0.378   0.000   0.608   0.315
200           0.000   0.925   0.262   0.000   0.713   0.608
500           0.000   0.966   0.048   0.000   0.825   0.367
1,000         0.000   0.793   0.755   0.000   0.837   0.572
5,000         0.000   0.000   0.027   0.000   0.435   0.806
10,000        0.000   0.000   0.839   0.000   0.222   0.614
Source: Simulation carried out according to Appendix A algorithm.
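The chi-square adhesion test against a discrete uniform distribution can be sketched in Python using only the standard library. The function name is an assumption, and the closed-form p-value used here is valid only for ten categories (nine degrees of freedom). The first input is the set of second-digit counts for the variance at n = 30 reported in Table 1, and the second is the corresponding set at n = 10,000.

```python
import math

def chi_square_uniform_test(counts):
    """Chi-square goodness-of-fit test of ten digit counts against a uniform split.

    Returns the statistic and its p-value for 9 degrees of freedom.
    """
    if len(counts) != 10:
        raise ValueError("expected counts for the ten digits 0-9")
    expected = sum(counts) / 10
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    half = statistic / 2
    # Upper-tail probability for odd degrees of freedom, built up from df = 1 to df = 9.
    p_value = math.erfc(math.sqrt(half))
    for nu in (1, 3, 5, 7):
        p_value += half ** (nu / 2) * math.exp(-half) / math.gamma(nu / 2 + 1)
    return statistic, p_value

second_digit_n30 = [191, 176, 207, 196, 208, 212, 195, 219, 200, 196]
second_digit_n10000 = [523, 312, 132, 25, 4, 5, 28, 115, 321, 535]

stat_small, p_small = chi_square_uniform_test(second_digit_n30)
stat_large, p_large = chi_square_uniform_test(second_digit_n10000)
print(round(stat_small, 2), round(p_small, 3))
print(round(stat_large, 2), p_large < 0.001)
```

For the n = 30 counts the p-value comes out at about 0.67, in line with the value reported for the second digit of the variance at that sample size, while the n = 10,000 counts reject uniformity decisively.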
The results are shown in Table 2, and the reference interpretation is as follows: a significance
level below 0.05 indicates that the hypothesis that the digits are evenly distributed between 0 and
9 can be rejected (that is, the values 0, 1, ..., 9 are not equally probable for that decimal place
and there is possibly a concentration around some values). Otherwise, the evidence is that the
digits are distributed uniformly (which indicates that any digit is equally probable). Table 2 shows
the following evidence:
• Regarding the mean, the first digit is stabilized around a few values (according to the simulation
tables, these are values that, when rounded, approach 0, which is the true value), even in small
samples. The second digit is randomly distributed up to sample size 500, but in larger samples it
behaves like the first digit. Finally, the third digit behaves randomly among the available
options (0-9), even in large samples;
• Concerning the variance, the first digit is concentrated around the true value even in smaller
samples, but the second digit only follows this behavior in larger samples (sizes 5,000 and
10,000). The third digit behaves differently, as the distribution of the 2,000 values is no longer
uniform over the existing options (0 to 9) from sample size 500 onwards. However, inspection of
the data (Table 1) indicates that the non-uniformity derives not from a concentration near the
true value after rounding (which is zero), but from an oscillation among the various options.
Therefore, although the test signals non-uniformity, the third digit does not seem to be
informative about the known true value;
• For skewness, the first digit behaves as in the other measures (except for sample size 30, which
was slightly close to the uniform pattern). The second digit is concentrated around some values
only in large samples (sizes 5,000 and 10,000). The third digit, on the other hand, kept the
uniform pattern in all sample sizes except 500 (marginally, with significance 0.048) and 5,000
(the results table indicates behavior similar to that observed for the variance);
• Finally, for kurtosis, the evidence shows that only the first digit is concentrated around the
true value, even in smaller samples. The remaining digits are evenly distributed over the options
available (0-9), even in large samples.
As a general conclusion, for the most used descriptive measures, the presentation of decimal results
is informative only in the first digit, which stabilizes around values close to the true ones even
in smaller samples. The second digit, in turn, is informative depending on the measure and the
sample size: in general, only larger samples (size 1,000 onwards) give some meaning to the values of
the second digit. Finally, the third digit adds virtually no meaningful information, even for large
samples, whatever the measure.
Thus, we conclude that presenting the results of descriptive measures with more than two decimal
places is a false rigor. Moreover, presenting results with two decimal places only makes sense, in
terms of information content, for samples larger than 1,000, depending on the measure. Accordingly,
the evidence is that research results are sufficiently informative with only one decimal place,
especially in smaller samples.
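One way to read these findings, offered here only as a complementary illustration, is through the standard errors of the estimators. The Python sketch below uses the textbook formulas for independent normal observations (standard error of the mean equal to sigma/sqrt(n), and of the sample variance equal to sigma^2 * sqrt(2/(n - 1))). It is not part of the simulation reported above, and the function name is an assumption.

```python
import math

SIGMA = 1.0  # standard deviation of the simulated variable (variance 1)
SAMPLE_SIZES = [30, 200, 500, 1000, 5000, 10000]

def standard_errors(n, sigma=SIGMA):
    """Standard errors of the sample mean and sample variance for normal data."""
    se_mean = sigma / math.sqrt(n)
    se_variance = sigma ** 2 * math.sqrt(2.0 / (n - 1))
    return se_mean, se_variance

for n in SAMPLE_SIZES:
    se_mean, se_variance = standard_errors(n)
    print(f"n={n:>6}  SE(mean)={se_mean:.4f}  SE(variance)={se_variance:.4f}")
```

Even at n = 10,000 the standard error of the mean is 0.01, that is, ten units of the third decimal place, which is consistent with the third digit behaving as noise in the samples considered here.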
3.2 SIMULATION OF RESULTS FOR REGRESSION ANALYSIS
To evaluate the stability of the decimal places of the estimators of regression parameters, an
algorithm was again developed in the statistical package R (see Appendix B), with 2,000 simulations
for each of six sample sizes (30, 200, 500, 1,000, 5,000 and 10,000).
Since the precision of regression estimators is associated with the error variance of the model,
the simulation considered different levels of variance (indeed, models with smaller error variance
are expected to yield more precise decimal values than models with larger variance). We therefore
adopted a design in which the error variance starts very small in relation to the model (generating
models with a level of explanation, R², of around 99%) and grows to a very high level (with
explanation levels between 50% and 70%).
Thus, for the simulation, the explanatory variable ($X$) was uniformly distributed in the range
between 0 and 1, and the dependent variable ($Y$) was set equal to 5 plus 10 times the explanatory
variable, plus an error term ($\varepsilon$) normally distributed with expected value equal to zero
and constant variance. In other words, the true and known model was $Y = 5 + 10X + \varepsilon$,
with $\varepsilon \sim N(0, \sigma^2)$. We simulated nine standard deviation levels ($\sigma$):
0.01, 0.05, 0.10, 0.50, 1.00, 1.25, 1.50, 1.75 and 2.00.
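As a worked illustration, the population level of explanation implied by this model can be written in closed form. This is a sketch that uses the variance of a uniform variable on (0, 1), which is 1/12; the symbol $\beta_1$ for the slope is introduced here only for the example, and the R² of any particular simulated sample will differ from the population value.

```latex
\[
R^2 \;=\; \frac{\beta_1^2\,\mathrm{Var}(X)}{\beta_1^2\,\mathrm{Var}(X) + \sigma^2}
    \;=\; \frac{100/12}{100/12 + \sigma^2},
\qquad \beta_1 = 10,\quad \mathrm{Var}(X) = \tfrac{1}{12}.
\]
```

Under these assumptions, the smallest error level (standard deviation 0.01) gives a value of about 0.99999 and the largest (standard deviation 2.00) gives about 0.676.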
At each level of error variance, the 2,000 estimated coefficients were recorded in a matrix and then
rounded to three decimal places. As before, the decimal places were isolated to analyze their
adherence to a discrete uniform distribution with values between 0 and 9 (in the sequence of
integers) by means of a chi-square goodness-of-fit test, as in the first simulation.
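A minimal Python sketch of this regression simulation is given below. It is not the Appendix B algorithm (which was written in R): the function names, the seed and the reduced number of replications (500) are assumptions, and only the slope estimate is tracked.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n, sigma, replications=500, seed=2015):
    """Estimate the slope of Y = 5 + 10X + error for repeated simulated samples."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(replications):
        xs = [rng.random() for _ in range(n)]                      # X uniform on [0, 1)
        ys = [5.0 + 10.0 * x + rng.gauss(0.0, sigma) for x in xs]  # normal error term
        slopes.append(ols_slope(xs, ys))
    return slopes

def decimal_digits(value, places=3):
    """Round to `places` decimals and return the decimal digits as integers."""
    rounded = f"{abs(value):.{places}f}"
    return [int(ch) for ch in rounded.split(".")[1]]

slopes = simulate_slopes(n=200, sigma=0.01)
first_digits = sorted({decimal_digits(b)[0] for b in slopes})
print(first_digits)
```

With a very small error standard deviation, the estimated slopes sit just above or just below 10, so the first decimal digit is either 0 or 9; raising `sigma` in the call spreads the digits over more values.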
The p-values of the tests applied to the digits, for each sample size and at every level of error
standard deviation, are presented in Table 3. As before, a significance level below 0.05 indicates
that the hypothesis that the digits are evenly distributed between 0 and 9 can be rejected;
otherwise, the evidence is that the digits are distributed uniformly, which indicates that any digit
is equally probable. Taking this interpretation as reference:
• In small samples (size 30), the first decimal digit is concentrated around values that round to
zero (the known true value) when the error variance of the model is low, but as the variance grows
this digit becomes random over the digits 0 to 9; the second digit
does not show random behavior at very small levels of error variance; and the third digit is
distributed uniformly at all levels of error variance;
• For moderate sample sizes (200 and 500), the first digit quickly stabilizes around a few values,
although in samples of size 200 it tends towards randomness at higher levels of error variance;
the second digit stabilizes around some values for low variances; and the third digit is
concentrated around some values only for very small variance levels, the trend otherwise being
randomness;
• For sample sizes 1,000 and 5,000, the behavior is similar to that observed for sample size 500,
with a small sign of stability of the second and third digits around a few values when the error
variance is low;
• Finally, for large samples (size 10,000), the third digit only stabilizes around some values when
the error variance is small, and the same happens with the second digit.
In general, the conclusion is that the first digit stabilizes around a few values, namely those
that, when rounded, lead to the true value, which in this case was zero, since the true coefficient
multiplying the explanatory variable was 10.000. To save space, we chose not to present illustrative
results like those in Table 1.
The second digit oscillates, concentrating around values close to the true value when the error
variance is small, but it tends to be uniformly distributed between 0 and 9 at higher levels of
error variance.
Finally, the third digit is randomly distributed between 0 and 9 in most extractions and at higher
levels of error variance. This behavior appears even in very large samples.
TABLE 3
Results of the tests for regression models.
ERROR                n* = 30                  n = 200                  n = 500
DEVIATION       1st     2nd     3rd     1st     2nd     3rd     1st     2nd     3rd
0.01          0.000   0.000   0.947   0.000   0.000   0.000   0.000   0.000   0.000
0.05          0.000   0.000   0.328   0.000   0.000   0.825   0.000   0.000   0.913
0.10          0.000   0.310   0.720   0.000   0.000   0.910   0.000   0.000   0.883
0.50          0.000   0.277   0.738   0.000   0.045   0.060   0.000   0.891   0.467
1.00          0.121   0.122   0.086   0.000   0.214   0.448   0.000   0.445   0.333
1.25          0.241   0.687   0.295   0.000   0.598   0.097   0.000   0.364   0.623
1.50          0.464   0.932   0.458   0.360   0.113   0.208   0.000   0.967   0.918
1.75          0.490   0.935   0.802   0.465   0.489   0.439   0.000   0.447   0.417
2.00          0.964   0.648   0.253   0.863   0.121   0.324   0.000   0.487   0.423

ERROR               n = 1,000                n = 5,000                n = 10,000
DEVIATION       1st     2nd     3rd     1st     2nd     3rd     1st     2nd     3rd
0.01          0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
0.05          0.000   0.000   0.346   0.000   0.000   0.000   0.000   0.000   0.000
0.10          0.000   0.000   0.544   0.000   0.000   0.629   0.000   0.000   0.001
0.50          0.000   0.718   0.078   0.000   0.000   0.305   0.000   0.000   0.876
1.00          0.000   0.896   0.076   0.000   0.738   0.966   0.000   0.000   0.658
1.25          0.000   0.485   0.256   0.000   0.075   0.526   0.000   0.867   0.155
1.50          0.000   0.234   0.863   0.000   0.624   0.666   0.000   0.823   0.439
1.75          0.000   0.414   0.251   0.000   0.343   0.373   0.000   0.518   0.385
2.00          0.000   0.108   0.947   0.000   0.580   0.064   0.000   0.789   0.495
*n is the sample size.
Source: Simulation carried out according to Appendix B algorithm.
The results confirm that presenting more than two decimal places in regression output does not
indicate any precision, even when samples are large. Even the second decimal place is stable around
the true values only when the error variance is small and the samples are larger. Finally, the first
digit is always more stable around the true values when large samples are taken, but in smaller
samples this digit is also likely to be distributed randomly among the possible digits (0 to 9) if
the variability of the errors is large.
4 FINAL CONSIDERATIONS
In an extensive study, Antonakis et al. (2014) concluded that the effort to build high-impact
articles (those that generate more citations) necessarily requires a solid methodological
construction and accurate, robust techniques for dealing with the classical problems of the Social
and Behavioral Sciences (presence of extreme values, skewness of the variables, indirect measurement
of latent constructs etc.). The authors' conclusion reinforces the commonly observed argument that
researchers need to strengthen the accuracy and precision of their research, even in limited
contexts.
Nonetheless, the goal of accuracy cannot be pursued without real concern for what the statistical
software actually generates. The increased use of statistical packages by researchers has
facilitated the use of techniques and improved research results, but it has possibly also brought
the potential for defects that arise when software output is simply transferred to research reports.
Indeed, such a transfer, when performed without due reflection on the meanings of the values,
generates a false impression of precision and accuracy, besides false conclusions. This seems to be
the same problem that underlies the contemporary debate about statistical significance, by now a
well-advanced discussion, as shown by Curtain and Landis (2011), Kelly and Preacher (2012) and
Orlitzky (2012).
This article sought to contribute to this debate, specifically on the issue of the presentation of
results, in alignment with studies that focus on specific aspects of quantitative research reports
and that have motivated publications in reputable journals in the area of Social and Behavioral
Sciences. The simulation results in this article therefore point to quite specific recommendations,
which add to many others that may subsequently be incorporated into the editorial policies of
journals (ACADEMY OF MANAGEMENT, 2011) or into more general recommendations on research practices,
such as those of Aguinis and Edwards (2014) and Bedeian (2014).
Although the statistical and operational requirements of the simulations were met, the article has
the limitations inherent to research based on this type of technique. Procedures based on real data
would be possible and are recommended, for example through the application of bootstrapping to
calculate specific measures. The study presented here reports results for a specific set of measures
(descriptive statistics and estimators of regression parameters), which converge with the findings
of Bedeian, Sturman and Streiner (2009) for correlation coefficients. Space limitations prevented
the exploration of other measures; it is therefore recommended that further studies or exercises
evaluate other measures (possibly by using or improving the algorithm developed for this application).
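As a minimal sketch of the bootstrapping route mentioned above, assuming that a real numeric variable is available (here the built-in R data set mtcars, column mpg, chosen only for illustration), the digit analysis of Appendix A can be applied to bootstrap replicates of the sample variance. The names obs, boot_var and decimal_digit, the seed and the number of resamples B are assumptions of this sketch, not part of the original algorithm.

```r
set.seed(123)            # arbitrary seed, only for reproducibility
obs <- mtcars$mpg        # illustrative real data (assumption)
B <- 2000                # number of bootstrap resamples (assumption)

# Sample variance recomputed on B resamples drawn with replacement.
boot_var <- replicate(B, var(sample(obs, replace = TRUE)))

# d-th decimal digit of x after rounding to `places` decimals.
decimal_digit <- function(x, d, places = 3) {
  scaled <- round(abs(x) * 10^places)   # integer-valued, avoids truncation
  (scaled %/% 10^(places - d)) %% 10
}

# Observed frequency of each digit (rows 0 to 9) per decimal place (columns).
freq <- sapply(1:3, function(d)
  table(factor(decimal_digit(boot_var, d), levels = 0:9)))
colnames(freq) <- c("1st", "2nd", "3rd")

# Chi-squared test of uniformity for each decimal place.
p_val <- apply(freq, 2, function(f) chisq.test(f)$p.value)
print(freq)
print(round(p_val, 3))
```

A high p-value for a given decimal place would be compatible with digits spread uniformly over 0 to 9, that is, with a decimal place that carries little information about the statistic.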
Finally, it is advisable that researchers in the field of Social and Behavioral Sciences in Brazil
undertake efforts to analyze these research and methodological practices, and that national journals
open space for this type of study, as has been the trend internationally.
Although there are actions in this direction (BARBOZA et al., 2013), it is understood that there is
still ample room for the development of this type of research.
5 REFERENCES
ACADEMY OF MANAGEMENT. Style guide for authors. Academy of Management Journal, v. 54,
n. 5, p. 1081-1084, 2011.
AGUINIS, H.; BOYD, B. K.; PIERCE, C. A.; SHORT, J. C. Walking new avenues in management
research methods and theories: bridging micro and macro domains. Journal of Management, v. 37,
n. 2, p. 395-403, 2011.
AGUINIS, H.; EDWARDS, J. R. Methodological wishes for the next decade and how to make wishes
come true. Journal of Management Studies, v. 51, n. 1, p. 143-174, 2014.
ANTONAKIS, J.; BASTARDOZ, N.; LIU, Y.; SCHRIESHEIM, C. A. What makes articles highly
cited? The Leadership Quarterly, v. 25, n. 1, p. 152-179, 2014.
BARBOZA, S. I. S.; CARVALHO, D. L. T.; SOARES NETO, J. B.; COSTA, F. J. Variações de
Mensuração pela Escala de Verificação: uma análise com escalas de 5, 7 e 11 pontos. TPA-Teoria e
Prática em Administração, v. 3, n. 2, p. 99-120, 2013.
BEDEIAN, A. G. More than meets the eye: a guide to interpreting the descriptive statistics and
correlation matrices reported in management research. Academy of Management Learning &
Education, v. 13, n. 1, p. 121-135, 2014.
BEDEIAN, A. G.; STURMAN, M. C.; STREINER, D. L. Decimal dust, significant digits, and the
search for stars. Organizational Research Methods, v. 12, n. 4, p. 687-694, 2009.
BOTELHO, D.; ZOUAIN, D. M. Pesquisa quantitativa em administração. São Paulo: Atlas, 2006.
CORDEIRO, R. A.; SANCHES, P. L. B.; CAVALCANTE, K. O.; PEIXOTO, A. F.; LEITE, J. C. L.
Pesquisa quantitativa em finanças: uma análise das técnicas estatísticas utilizadas por artigos
científicos publicados em periódicos qualificados no triênio 2007-2009. Revista de Administração da
UFSM, v. 7, n. 1, p. 117-134, 2014.
CORTINA, J. M.; LANDIS, R. S. The earth is not round (p=.00). Organizational Research Methods,
v. 14, n. 2, p. 332-349, 2011.
CUMMING, G. Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. New York: Routledge, 2013.
KELLEY, K.; PREACHER, K. J. On effect size. Psychological Methods, v. 17, n. 2, p. 137-152, 2012.
ORLITZKY, M. How can significance tests be deinstitutionalized? Organizational Research
Methods, v. 15, n. 2, p. 199-229, 2012.
WILCOX, R. Modern statistics for the social and behavioral sciences: a practical introduction.
Kentucky: CRC Press, 2012.
Appendix A - Algorithm implemented in R for the simulation of descriptive measures
library(moments)  # provides skewness() and kurtosis(); var() itself is base R
t_amostra=c(30, 200, 500, 1000, 5000, 10000)  # sample sizes
n=t_amostra
pvalues=matrix(0,3,9)
p_val1=c()
p_val2=c()
p_val3=c()
k=2000  # number of samples
for(j in 1:length(t_amostra)){
  M=matrix(0,k,n[j]+6)
  for (i in 1:k){
    set.seed(i+11)
    M[i,1:n[j]]=rnorm(n[j],3,1)             # reference distribution
    M[i,n[j]+1]=var(M[i,1:n[j]])            # sample statistic, to be iterated
    M[i,n[j]+2]=round(M[i,n[j]+1], 3)       # rounding of decimal numbers
    M[i,n[j]+3]=round(1000*M[i,n[j]+2])     # separation of decimals; round() avoids floating-point truncation
    M[i,n[j]+4]=floor(M[i,n[j]+3]/100)%%10  # isolating the first decimal
    M[i,n[j]+5]=floor(M[i,n[j]+3]/10)%%10   # isolating the second decimal
    M[i,n[j]+6]=floor(M[i,n[j]+3])%%10      # isolating the third decimal
  }
  ## ALGORITHM FOR DIGITS MATRIX CONSTRUCTION
  Freq_dígito=matrix(0, 3, 10)
  for(l in 1:3){
    Freq_observada=c()
    digitos=M[, l+n[j]+3]  # column holding the l-th decimal digit
    for(i in 1:10){
      i1<-i-1
      Freq_observada[i]<-length(digitos[digitos==i1])
    }
    Freq_dígito[l, 1:10]<-Freq_observada
  }
  D=as.data.frame(Freq_dígito); print(D)
  # ELEMENTS OF THE TABLE OF RESULTS
  Díg.=0:9
  Fr_esp=rep(k/10, 10)
  Cum_esp=cumsum(c(rep(k/10,10)))
  Fr_ob_Dig1=t(D)[,1]; Fr_ob_Dig2=t(D)[,2]; Fr_ob_Dig3=t(D)[,3]
  Cum_ob_Dig1=cumsum(t(D)[,1]);
  Cum_ob_Dig2=cumsum(t(D)[,2]);
  Cum_ob_Dig3=cumsum(t(D)[,3])
  # RESULTS FREQUENCY TABLE, CUMULATIVE, EXPECTED AND OBSERVED
  Tabela=cbind(Díg., Fr_esp, Fr_ob_Dig1, Fr_ob_Dig2, Fr_ob_Dig3,
               Cum_esp, Cum_ob_Dig1, Cum_ob_Dig2, Cum_ob_Dig3); print(Tabela)
  # QUANTILE-QUANTILE GRAPHS FOR THE UNIFORM DISTRIBUTION
  # abline(0, 1) draws the identity line (observed = expected);
  # the histograms use the digit columns n[j]+4 to n[j]+6 of M
  par(mfrow=c(3,2))
  plot(Cum_esp, Cum_ob_Dig1, col=2); abline(0, 1); hist(M[, n[j]+4])
  plot(Cum_esp, Cum_ob_Dig2, col=3); abline(0, 1); hist(M[, n[j]+5])
  plot(Cum_esp, Cum_ob_Dig3, col=4); abline(0, 1); hist(M[, n[j]+6])
  ## UNIFORM DISTRIBUTION TEST (CHI-SQUARED)
  # FIRST DIGIT
  ob1=Tabela[,3]  # observed frequencies
  es1=Tabela[,2]  # expected frequencies
  ct1=chisq.test(ob1, p=es1/k)
  p_val1[j]=ct1$p.value
  # SECOND DIGIT
  ob2=Tabela[,4]  # observed frequencies
  es2=Tabela[,2]  # expected frequencies
  ct2=chisq.test(ob2, p=es2/k)
  p_val2[j]=ct2$p.value
  # THIRD DIGIT
  ob3=Tabela[,5]  # observed frequencies
  es3=Tabela[,2]  # expected frequencies
  ct3=chisq.test(ob3, p=es3/k)
  p_val3[j]=ct3$p.value
}
Tabela
p_val=round(cbind(t_amostra, p_val1, p_val2, p_val3), 3); p_val
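A note on the digit-isolation step: extracting digits by scaling a value by 1000 and applying floor() is sensitive to binary floating-point representation, because most three-decimal values are not stored exactly and the product can fall just below the intended integer. The sketch below is only an illustrative check (the variable names are assumptions) that compares floor() alone with rounding to the nearest integer first.

```r
# All values with exactly three decimals between 0.000 and 9.999.
integers <- 0:9999
x <- round(integers / 1000, 3)

# floor() alone may truncate 1000 * x when the product falls just below
# the intended integer; round() snaps it to the nearest integer first.
floor_only <- floor(1000 * x)
with_round <- floor(round(1000 * x))

cat("Mismatches with floor() only:", sum(floor_only != integers), "\n")
cat("Mismatches with round() first:", sum(with_round != integers), "\n")

# Third decimal digit extracted from the rounded scaling.
third_digit <- with_round %% 10
```

Any mismatches reported for the floor-only version correspond to values whose isolated digits would be shifted by one, which is why rounding before flooring is the safer choice.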
Appendix B - Algorithm implemented in R for the simulation of regressions
tamanho<-30  # sample size (30 shown as an example); set in turn to 30, 200, 500, 1000, 5000, 10000
k=2000       # number of samples
M=matrix(0, k, 7)
Sigma = c(.01,.05,.1, .5, 1, 1.25, 1.5, 1.75, 2)  # error standard deviations
nsigma = length(Sigma)
pvalues = matrix(0,3,9)
p_val1=c()
p_val2=c()
p_val3=c()
for (j in 1:nsigma) {
  for (i in 1:k){
    num=runif(tamanho)
    x=sample(num, tamanho, replace=TRUE)
    y=5+10*x+rnorm(tamanho, 0, Sigma[j])  # reference distribution
    reg<-lm(y~x)
    M[i,1:2]=reg$coefficients             # intercept and slope estimates
    M[i,3]=round(M[i,2], 3)               # rounding of decimal numbers (slope)
    M[i,4]=round(1000*M[i,3])             # separation of decimals; round() avoids floating-point truncation
    M[i,5]=floor(M[i,4]/100)%%10          # isolating the first decimal
    M[i,6]=floor(M[i,4]/10)%%10           # isolating the second decimal
    M[i,7]=floor(M[i,4])%%10              # isolating the third decimal
  }
  ## ALGORITHM FOR DIGITS MATRIX CONSTRUCTION
  Freq_dígito=matrix(0, 3, 10)
  for(l in 1:3){
    Freq_observada=c()
    digitos=M[, l+4]  # column holding the l-th decimal digit
    for(i in 1:10){
      i1<-i-1
      Freq_observada[i]<-length(digitos[digitos==i1])
    }
    Freq_dígito[l, 1:10]<-Freq_observada
  }
  D=as.data.frame(Freq_dígito); print(D)
  # ELEMENTS OF THE TABLE OF RESULTS
  Díg.=0:9
  Fr_esp=rep(k/10, 10)
  Cum_esp=cumsum(c(rep(k/10,10)))
  Fr_ob_Dig1=t(D)[,1]; Fr_ob_Dig2=t(D)[,2]; Fr_ob_Dig3=t(D)[,3]
  Cum_ob_Dig1=cumsum(t(D)[,1]);
  Cum_ob_Dig2=cumsum(t(D)[,2]);
  Cum_ob_Dig3=cumsum(t(D)[,3])
  # RESULTS FREQUENCY TABLE, CUMULATIVE, EXPECTED AND OBSERVED
  Tabela=cbind(Díg., Fr_esp, Fr_ob_Dig1, Fr_ob_Dig2, Fr_ob_Dig3,
               Cum_esp, Cum_ob_Dig1, Cum_ob_Dig2, Cum_ob_Dig3); print(Tabela)
  # QUANTILE-QUANTILE GRAPHS FOR THE UNIFORM DISTRIBUTION
  # abline(0, 1) draws the identity line (observed = expected);
  # columns 5 to 7 of M hold the first, second and third decimal digits
  par(mfrow=c(3,2))
  plot(Cum_esp, Cum_ob_Dig1, col=2); abline(0, 1); hist(M[, 5])
  plot(Cum_esp, Cum_ob_Dig2, col=3); abline(0, 1); hist(M[, 6])
  plot(Cum_esp, Cum_ob_Dig3, col=4); abline(0, 1); hist(M[, 7])
  ## UNIFORM DISTRIBUTION TEST (CHI-SQUARED)
  # FIRST DIGIT
  ob1=Tabela[,3]  # observed frequencies
  es1=Tabela[,2]  # expected frequencies
  ct1=chisq.test(ob1, p=es1/k)
  p_val1[j]=ct1$p.value
  # SECOND DIGIT
  ob2=Tabela[,4]  # observed frequencies
  es2=Tabela[,2]  # expected frequencies
  ct2=chisq.test(ob2, p=es2/k)
  p_val2[j]=ct2$p.value
  # THIRD DIGIT
  ob3=Tabela[,5]  # observed frequencies
  es3=Tabela[,2]  # expected frequencies
  ct3=chisq.test(ob3, p=es3/k)
  p_val3[j]=ct3$p.value
}
p_val=round(cbind(Sigma, p_val1, p_val2, p_val3), 3); p_val