Linguistic results

Preliminary Version,
January 2016

Quantitative outlook

A few experimental queries with Corpus Search were conducted over a specific set of data – noun phrases projected by nouns, in high argumental positions (i.e., subjects and accusative objects), in main clauses – to observe the ordering patterns of the phrases in combination with their referential properties.
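To make the notion of 'order' concrete, here is a minimal Python sketch of how a subject or accusative object can be classified as pre-verbal or post-verbal from a bracketed tree. It is only an illustration, not the Corpus Search queries actually used. The function names, the list of finite verb tags (taken from the example trees in this text) and the simplified sample tree are all assumptions made for the example.

```python
import re

# Finite verb tags seen in the example trees of this text (an assumption,
# not a complete tag list for the corpus).
FINITE_VERB = re.compile(r"^(VB|SR|HV|TR)-[PD]$")


def parse(text):
    """Turn a bracketed tree into nested lists of the form [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip the opening bracket
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing bracket
        return items

    return node()


def is_empty(phrase):
    """True for null arguments such as (NP-SBJ *exp*) or (NP-SBJ *T*-1)."""
    return all(isinstance(c, str) and c.startswith("*") for c in phrase[1:])


def argument_orders(tree):
    """Map each overt NP-SBJ / NP-ACC daughter of IP-MAT to its order."""
    clause = next(c for c in tree if isinstance(c, list) and c[0] == "IP-MAT")
    daughters = [c for c in clause[1:] if isinstance(c, list)]
    verb_at = next(i for i, d in enumerate(daughters) if FINITE_VERB.match(d[0]))
    orders = {}
    for i, d in enumerate(daughters):
        for role in ("NP-SBJ", "NP-ACC"):
            if d[0].startswith(role) and not is_empty(d):
                orders[role] = "pre-verbal" if i < verb_at else "post-verbal"
    return orders


# Simplified version of sentence G_008,9.41 (ID nodes and relative clause removed).
SAMPLE = """
( (IP-MAT (NP-SBJ (D Este) (N rio))
          (TR-P tem)
          (PP (P n@) (NP (D-F @a) (N entrada)))
          (NP-ACC (Q-F-P muitas) (N-P ilhas))
          (, ,))
  (ID G_008,9.41))
"""

orders = argument_orders(parse(SAMPLE))
print(orders)
```

The sketch only looks at the immediate daughters of the main clause, which matches the restriction to high argumental positions in main clauses described above.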

The quantitative results are presented below in a series of graphs. We start here with a brief summary of what the graphs show, and discuss them further on.

As regards the order of arguments alone, the general picture in this text is not surprising and follows what was expected from previous research (in which reference patterns were not accounted for): overall there is an even rate of pre-verbal and post-verbal arguments (50%-50%), but the balance tips in opposite directions when we consider subjects (67% pre-verbal) and objects (17% pre-verbal) separately. As mentioned, none of this is unexpected.

However, when we try to refine these results according to some aspects pertaining to the reference patterns, a few interesting patterns may already be observed. The criteria used for this refinement were of two main kinds:

  1. First, the data on argument order was separated according to the status of the noun phrases – whether they were new referents (‘heads’) or further references to already established referents (‘mentions’);
  2. Second, the data on argument order was separated according to the patterns of determination in each NP – mainly, the kind of determiner used (if any): definite, indefinite, or demonstrative.
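As a rough illustration of the first criterion, the sketch below counts 'heads' and 'mentions' from the ID labels that appear in the annotated trees further down. It assumes that the leading letter of each label (H or M) marks head or mention status. That is a reading inferred from the example trees in this text, not a documented description of the scheme. The regular expression and function name are likewise hypothetical.

```python
import re
from collections import Counter

# Assumed label shape, inferred from the example trees: an optional running
# prefix ("ID.0134."), then H or M, then two more hyphen-separated fields.
STATUS = re.compile(r"\(ID\s+(?:ID\.\d+\.)?([HM])-[A-Z]{2}-[A-Z]{3}=")


def reference_status_counts(tree_text):
    """Count ID labels by their leading letter (assumed H = head, M = mention)."""
    return Counter(STATUS.findall(tree_text))


# ID nodes copied from sentence G_008,8.35 ('Esta província é ...').
SAMPLE = """
(NP-SBJ (ID M-RR-ENN=0040/013) (D-F Esta) (N=0309:0040/008 província))
(NP (ID H-RR-DNN=0050/000) (D-F @a) (N=0310:0050/002 vista))
(NP (ID M-RC-NNC=5081/001) (ADJ-G grande) (N=0311:0113/004 maneira))
(ID G_008,8.35)
"""

counts = reference_status_counts(SAMPLE)
print(counts["H"], counts["M"])
```

The sentence identifier at the end of the sample shows that other ID nodes are ignored by the pattern.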

In brief terms, the first results show the following scenario. The separation of the data according to reference status does not show very promising results if the only criterion used is whether the phrases are ‘heads’ or ‘mentions’. The refinement by patterns of determination, however, yielded more intriguing observations: we can see different behaviours in subjects and objects, and also different behaviours in each ordering pattern (pre-verbal or post-verbal). Finally, when we combine reference status and pattern of determination, we start to see really interesting behaviours.

The full set of results is presented in the graphs below.

(1) Order of arguments

The general rate of pre-verbal and post-verbal arguments in the text was exactly even, with 50% pre-verbal and 50% post-verbal:

Considering subjects only, the rates change to 67% pre-verbal and 33% post-verbal:

Considering objects only, the rates are reversed, with 17% pre-verbal objects and 83% post-verbal objects:

 

(2) Order of arguments by reference status (‘heads’ or ‘mentions’)

The first refinement over this raw data was the separation of phrases by their status as ‘heads’ or ‘mentions’. The graphs in this sub-section show the results in the following way: each graph has two columns of data, and both refer to the ‘pre-verbal’ order rates in each environment (all arguments; subjects; objects). The gray column repeats the general rates already shown above; the coloured column presents the rate in the specific pattern (‘head’ or ‘mention’).

The graph below shows the first result for all arguments (i.e., subjects and objects combined) that are first references (‘heads’). The rate is 44%, slightly lower than the general rate of 50% mentioned above:

The rate of pre-verbal subjects that are ‘heads’ is 66%, virtually the same as the general rate of 67% seen above; and the rate of pre-verbal objects that are ‘heads’ is 17%, identical to the general rate of 17% seen above:

 

The graphs below show the same patterns, but in reverse, with the rates of arguments that are ‘mentions’ compared to the general rates:

 

 

(3) Patterns of determination

The second criterion used to observe the data was the pattern of determination in each phrase – first of all, the kind of determiner. The graphs below give a first overview, independent of order: they present the different types of determiners in all arguments, then in all subjects, then in all objects, with pre-verbal and post-verbal phrases counted together. Each graph is a single column with the percentages of:

  • Not determined (‘Bare noun phrases’ in the graphs)
  • Phrases with indefinite determiners
  • Phrases with definite determiners
  • Phrases with demonstratives

Considering all high arguments, subjects and objects, pre-verbal and post-verbal, the distribution is as below – 32% of arguments are not determined; 13% have an indefinite determiner; 35% have a definite determiner; and 20% have a demonstrative. This of course doesn’t mean much in itself, and is presented only as a basis for the comparisons with the separated data below, for subjects and for objects:

The pattern for subjects is similar to the general pattern; the main differences are a lower rate of not-determined phrases (22% versus 32%) and phrases with indefinite determiners (10% versus 13%), compensated by a higher rate of  phrases with definite determiners (41% versus 35%) and phrases with demonstratives (27% versus 20%); but the general proportion is similar. However, in objects, the pattern is remarkably different, with a much higher rate of non-determined objects (53%, versus 32%) and a much lower rate of phrases with demonstratives (7% versus 20%):
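The comparison between subjects and the general pattern can be restated as differences in percentage points. The short Python sketch below does this with the figures quoted above. The dictionary keys and the function name are only illustrative labels.

```python
# Percentages quoted in the text: all high arguments versus subjects only.
general = {"bare": 32, "indefinite": 13, "definite": 35, "demonstrative": 20}
subjects = {"bare": 22, "indefinite": 10, "definite": 41, "demonstrative": 27}


def point_differences(subset, baseline):
    """Percentage-point difference of each pattern relative to the baseline."""
    return {pattern: subset[pattern] - baseline[pattern] for pattern in baseline}


# Each distribution should account for all phrases in its group.
assert sum(general.values()) == 100
assert sum(subjects.values()) == 100

diffs = point_differences(subjects, general)
for pattern, diff in diffs.items():
    print(f"{pattern:<14} {diff:+d} points")
```

Read this way, subjects lose 13 points in the two 'weaker' patterns (bare and indefinite) and gain the same 13 points in the definite and demonstrative patterns.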

 

(3.1) Patterns of determination and reference status (‘heads’ or ‘mentions’)

If we now separate the data, in each group, according to the reference status of the phrases as well, the patterns change considerably.

The first set of graphs shows the patterns of determination considering only phrases that are ‘heads’ (new referents).

In the general case, i.e., for all arguments, the most important difference from the general rate is that in the subset of phrases that are heads, there are no phrases with demonstratives. This is true for all the environments and is not surprising at all, as a phrase with a demonstrative was never interpreted as a new referent in my reading.

The second difference is more striking: for heads, the proportion of not-determined phrases is considerably higher than the general rate: 47% of new referents are undetermined, in contrast with 32% in general (as seen above).

The same is true for subjects and objects that are heads (no demonstratives, and relatively higher rates of undetermined phrases):

 

The graphs below show the patterns of determination considering only phrases that are ‘mentions’ (not new referents). Note that the data is not complementary to what is shown above for heads; the universe is different, as it now includes the phrases with demonstratives.

The most important difference as compared to the general rates in this group is precisely the proportion of phrases with demonstratives: for all arguments, 55% of phrases that are mentions have a demonstrative, as opposed to 27% in the general scenario. In contrast, the proportion of non-determined phrases is lower for mentions than in the general scenario:

For subjects and objects, similar relative patterns are seen: elevation in the rates of demonstratives, and lowering in the rates of undetermined phrases. Notice the case of undetermined objects, with a rate of 22% – a relatively high rate when compared to subjects (3%) and all arguments (6%), but lower than the rate of undetermined phrases in all kinds of objects (32%):

 

(4) Patterns of determination and reference status in different argument orders

The next group of graphs shows the results when we combine (3) above with (2) further above, presenting the patterns of determination in each of the types of order – pre-verbal and post-verbal – for all arguments, only subjects, and only objects.

The form of the graphs is similar to the ones in (3) above, with each column showing the full distribution of the five patterns considered, but with two columns in each graph, one for pre-verbal arguments and one for post-verbal arguments.

The first graph presents the distribution of each type of determination pattern for all arguments (pre-verbal versus post-verbal). The first observation is that the patterns are considerably different in each order: in pre-verbal arguments, the proportion of undetermined phrases is lower than in post-verbal arguments (23% versus 41%), and the proportion of phrases with indefinite determiners is also lower (8% versus 18%); in contrast, the proportion of phrases with definite determiners is higher (40% versus 29%), and so is the proportion of phrases with demonstratives (29% versus 12%):

In the subsets subjects only (pre-verbal versus post-verbal), and objects only (pre-verbal versus post-verbal), the patterns are also different according to the order. However, they are different in different ways. The patterns for subjects are not distant from what is seen for arguments in general; the patterns for objects are characteristic, with a higher proportion of indetermination in pre-verbal objects than in post-verbal objects (instead of lower, as with subjects and in general):

 

The next group of graphs combines the patterns of determination in each order with the criterion of referent status, ‘heads’ or ‘mentions’.

The first subgroup are phrases that are ‘heads’ (again notice the absence of demonstratives). Notice that here, as with the general patterns above, post-verbal arguments tend to present a higher proportion of non-determination and indetermination than pre-verbal arguments – and this time, this is true for the three cases: all arguments, subjects only, and objects only:

 

 

For ‘mentions’, the data is as follows (now including demonstratives). Again the patterns of determination are different in each order, pre-verbal or post-verbal. For all arguments considered together, the proportion of phrases with demonstratives is considerably higher in pre-verbal position than in post-verbal position, affecting all other possible patterns:

For subjects, the relative change is similar to the general pattern. For objects, the change is different. The proportion of demonstratives is lower in pre-verbals than in post-verbals, and the proportion of phrases with definite determiners is higher in pre-verbals than in post-verbals:

 

(5) Order of arguments by different patterns of determination and reference status

We shall now see the combination of patterns of determination with reference status in a different light.

The first group of graphs below compares the rates of pre-verbal arguments in different environments defined by patterns of determination. This of course repeats part of the data shown above, but now highlighting the contrast within the pre-verbal cases only.

In the first graph, for instance, we can see the general rate of pre-verbal/post-verbal arguments (in gray) contrasted with the rate of pre-verbal/post-verbal arguments in the pattern ‘phrases with no determiner’. As can be seen, the rate of pre-verbal arguments in this environment, at 36%, is lower than the general rate of 50%:

This translates into a similar pattern for subject arguments only, with a rate of 54% pre-verbals in the environment ‘non-determined’, as compared to the 67% general rate:

For objects, the opposite happens: the rate of pre-verbal objects in the pattern ‘non-determined’ is 22%, slightly higher than the general rate of OV, 17%:

Something similar happens in the pattern ‘phrases with indefinite determiners’, with the rates of pre-verbal subjects lower in this group than in the general rate for subjects only; for objects, however, the rate is stable:

When we take the group ‘phrases with definite determiners’, for subjects only the rates of pre-verbals are slightly higher than in the general scenario – 71% versus 67%; the rates for objects do not change:


In the group ‘phrases with demonstratives’, a more remarkable contrast is seen. For subjects only, the rate of pre-verbals is considerably higher than in the general scenario, 80% versus 67%; for objects, on the other hand, the rate of pre-verbals is lower in this group (8% versus 17%):

(6) Further data – the ‘specificity’ factor

Another way to look into this data is by separating phrases that are ‘more specific’ and ‘less specific’ (as will be justified below) – this ‘specificity’ being defined, here, by the depth of modification material present in the phrase (in the form of modifiers and complements).

The smaller group of graphs below shows this. It contrasts the general rates, already seen above, of pre-verbal arguments that are ‘heads’ and that are ‘mentions’ with the rates of pre-verbal arguments (also ‘heads’ and ‘mentions’) in constructions without modifiers and complements, and then in constructions with modifiers or complements.

The first sub-set shows the environment ‘non-modified phrases’, only for ‘heads’. For subjects, the rate of pre-verbals here (76%) is higher than for all heads (67%) – as for objects, there are no cases of pre-verbal objects that are at the same time heads and non-modified phrases:

If we now move to the opposite side of the spectrum – phrases that at the same time are ‘mentions’ and contain modifying material – we will see that both with subjects and with objects the proportion of pre-verbals is higher with this type of phrase than in general. In objects, however, the difference is more striking, with a rate of 50% pre-verbals in this environment, contrasting with 17% with objects more generally considered:

The first conclusions that may be drawn from this data are discussed below.

First conclusions

Technical consequences of the first survey on the data

The first conclusion that can be drawn from the study of this first set of data is of a more technical nature. It has to do with the contrast between the results I had expected from the annotation and the results that actually emerged. When the project started, my expectation, following my preliminary hypothesis, was that the contrast between pre-verbal and post-verbal subjects in this text was connected to their ‘discourse prominence’, and that this ‘prominence’ would be captured in the annotation simply by marking referents as ‘new’ or ‘old’ (or ‘heads’ and ‘mentions’ in my terminology). Nothing could be further from the truth.

As the preliminary data already shows, this criterion yields very little insight into the data. There is no appreciable difference in the rate of pre-verbal subjects between ‘new’ and ‘old’ information, at 66% and 70% respectively (the relevant graphs are repeated here):

This has, first of all, an impact on the annotation: it means that the markup cannot be limited to marking ‘new’ and ‘old’. Thanks to the fact that some queries with Corpus Search were applied as tests at the very beginning of the process, the annotation was changed accordingly. In broad terms, the annotation had to be deeper, marking not only the status of ‘novelty’ of each referent, but also relevant aspects of its internal construction.

Those aspects were the ones that I deemed influential as the annotation proceeded, and that are covered in the graphs shown above: the patterns of determination, and the depth to which each phrase is modified and complemented. The final results with the data seem to show this was a good decision, as the more detailed markup allows more refined queries and better results.

Main conclusions

In linguistic terms, the data studied so far leads to one general conclusion, in my view.

The main factor in the relation between the order of arguments in the clause and their referential patterns in this text is what I would call ‘specificity’.

By this I mean a combination of two factors: the pattern of determination of the phrase as regards the kind of determiner that is present or not; and the presence of modifying materials (adjectives, quantifiers) or complements.

In this sense, a phrase with no determiners and no modifying material would be on one side of a continuum – as in, ‘-specified’ – and a phrase with determiners, modifiers and/or complements, on the opposite side – as in, ‘+specified’:

‘fruto’,
(‘fruit’)

in
De outras plantas e ervas que não dão fruto , nem se sabe o para que prestam se podia escrever muitas coisas de que aqui não faço menção , porque meu intento , não foi senão dar notícia  ( como disse d @estas de cujo fruto se aproveitam os moradores d@ @a terra . (ID G_008,17.265)

vs.

‘o fruto desta doutrina’,
(‘the fruit of this doctrine’)

in
E para que o fruto d@ @esta doutrina se não perdesse ,  antes de cada vez fosse em mais crescimento , determinaram os mesmos Padres de atalhar todas as ocasiões que lhe podiam d@ @a nossa parte ser  impedimento ,  causa de escândalo , e  prejuízo a@  @as consciências d@  @os moradores d@  @a terra . (…)  (ID G_008,45.838)

This is, of course, only a very tentative and preliminary typology. But the patterns presented above and the experience of annotation show, in my view, that this is the right path to follow in order to continue the research into the relation between the referential properties of the constituents and their order in the texts.
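One hypothetical way to operationalise this continuum is to check, for each noun phrase, whether it has a determiner, a modifier, or a complement among its immediate daughters. The Python sketch below does this for the two phrases contrasted above. The two bracketed phrases are constructed for the example (with tags modelled on the annotated trees in this text), and the grouping of tags into the three classes, with prepositional phrases and relative clauses counted together as complements, is an assumption rather than part of the annotation scheme.

```python
import re


def parse(text):
    """Turn a bracketed phrase into nested lists of the form [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip the opening bracket
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing bracket
        return items

    return node()


# Tag groupings are assumptions based on the labels seen in the example trees.
DETERMINER = re.compile(r"^(D|DEM)(-|$)")
MODIFIER = re.compile(r"^(ADJP?|Q)(-|$)")
COMPLEMENT = re.compile(r"^(PP|CP-REL)(-|$)")


def specificity_profile(np):
    """Report which kinds of 'specifying' material the phrase contains."""
    labels = [child[0] for child in np[1:] if isinstance(child, list)]
    return {
        "determiner": any(DETERMINER.match(label) for label in labels),
        "modifier": any(MODIFIER.match(label) for label in labels),
        "complement": any(COMPLEMENT.match(label) for label in labels),
    }


def specificity_score(np):
    """Number of kinds of specifying material present (0 to 3)."""
    return sum(specificity_profile(np).values())


# Hypothetical bracketings of the two phrases contrasted in the text.
BARE = "(NP (N fruto))"
RICH = "(NP (D o) (N fruto) (PP (P d@) (NP (D-F @esta) (N doutrina))))"

bare_score = specificity_score(parse(BARE))
rich_score = specificity_score(parse(RICH))
print(bare_score, rich_score)
```

A score of this kind only places phrases on a rough scale. It does not distinguish between types of determiner, which the results above suggest also matters.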

This seems to be important, first and foremost, to understand the patterns for the placement of subjects – the main concern of this research as regards its syntactic aims.

In this regard, what can be generalized in this text, in my view, is the following:

The more ‘specific’ the subject,
the more it tends to be pre-verbal.

This is what I see in the results from the searches – as we saw above, for instance, the rates of pre-verbal subjects are higher in constructions with definite determiners, at 71%, than in constructions without determiners, at 54% (see first two selected graphs repeated below); and are at their highest, at 80%, in constructions with demonstratives (third selected graph):
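The ordering behind this generalization can be checked with a few lines of Python, using only the rates of pre-verbal subjects quoted in this text (67% in general; 54%, 71% and 80% in the three environments). The environment labels are illustrative names chosen for the example.

```python
GENERAL_RATE = 67  # pre-verbal subjects overall, in percent

# Rates of pre-verbal subjects per determination pattern, as quoted in the text.
rates = {"no determiner": 54, "definite determiner": 71, "demonstrative": 80}

# Sort the environments from the lowest to the highest pre-verbal rate.
ordered = sorted(rates, key=rates.get)
for environment in ordered:
    delta = rates[environment] - GENERAL_RATE
    print(f"{environment:<20} {rates[environment]}%  ({delta:+d} points)")
```

The environments come out in the order bare, definite, demonstrative, with only the bare pattern falling below the general rate for subjects.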


To begin a tentative generalization – pre-verbal subjects tend to present internal structures such as the following:

With determiners, and modifiers or complements

‘E os índios da terra que ali se ajuntaram ouviam tudo com muita quietação…’
And the indians of the land who there gathered listened to all with much quietness…

( (IP-MAT (CONJ e)
	  (NP-SBJ (ID ID.0134.H-AC-DNC=5055.0079/000) 
		  (D-P os)
		  (NPR=0079/000 Índios)
		  (PP (P d@)
		      (NP  (ID ID.0135.M-RR-DNN=0046/003)  (D-F @a) (N=0080:0046/003 terra)))
		  (CP-REL (WNP-1  (ID ID.0136.M-EE-PPP=5055/001)  (WPRO que))
			  (IP-SUB (NP-SBJ *T*-1)
				  (ADVP (ADV ali))
				  (NP-ACC  (ID ID.0137.M-EE-PPP=5055/002)  (CL se))
				  (VB-D ajuntaram))))
	  (VB-D ouviam)
	  (NP-ACC  (ID ID.0138.M-EC-QQQ=8401#/000)  (Q tudo))
	  (PP (P com)
	      (NP  (ID ID.0139.H-AC-NNQ=8303.0081/000)  (Q-F muita) (N=0081/000 quietação)))
	  (, ,)
	  (IP-GER (VB-G usando)
		  (PP (P de)
		      (NP (ID ID.0140.H-BB-NAN=9300.0082.0083/000) 
			  (Q-P todos)
			  (D-P os)
			  (N-P=0082/000 atos)
			  (CONJP (CONJ e)
				 (NX (N-P=0083/000 cerimônias)))
			  (CP-REL (WNP-2  (ID ID.0141.M-EE-PPP=9300/001)  (WPRO que))
				  (IP-SUB (NP-SBJ (ID ID.0142.M-EE-PPP=5055/003) 
						  (*pro* *pro*))
					  (VB-D viam)
					  (IP-INF (NP-ACC *T*-2)
						  (VB fazer)
						  (PP-SBJ (P a@)
							  (NP  (ID ID.0143.H-EC-DSS=0000$/000)  (D-P @os) (PRO$ nossos)))))))))
	  (. .))
  (ID G_008,6.9))

With modifiers or complements:

‘Outros animais ha nesta província muito feros…’
Other animals are there in this province very fierce…

( (IP-MAT (NP-SBJ *exp*)
	  (NP-ACC (ID H-RC-NNO=7533.1473/000) 
		  (OUTRO-P Outros)
		  (N-P=1734:1473/010 animais)
		  (CP-REL *ICH*-1))
	  (HV-P há)
	  (PP (P n@)
	      (NP  (ID M-RR-ENN=0040/037)  (D-F @esta) (N=1735:0040/024 província)))
	  (ADJP (Q muito)
		(ADJ-P feros)
		(, ,)
		(CONJP (CONJ e)
		       (ADJX (ADJ-G-P prejudiciais)
			     (PP (PP (P a)
				     (NP  (ID M-RC-ENQ=8220.0856##5400/000)  (Q-F toda) (D-F esta) (N=1736:0856/002 caça)))
				 (, ,)
				 (CONJP (CONJ e)
					(PP (P a@)
					    (NP (ID H-RC-DNC=5049.1596/000) 
						(D @o)
						(N=1737:1596/001 gado)
						(PP (P d@)
						    (NP  (ID M-RR-DNN=0458/024)  (D-P @os) (N-P=1738:0458/027 moradores))))))))))
	  (. ;)
	  (CP-REL-1 (WPP-2 (P a@)
			   (NP  (ID M-EE-DPP=7533/001)  (D-P @os) (WPRO-P quais)))
		    (IP-SUB (NP-SBJ (ID M-EE-PPP=8888/066) 
				    (*pro* *pro*))
			    (VB-P chamam)
			    (IP-SMC (PP-SBJ *T*-2)
				    (NP-ACC (ID H-BB-NOM=7533:1739/000) 
					    (N-P=1739/000 Tigres)))
			    (, ,)
			    (ADVP (ADV ainda)
				  (CP-ADV (C que)
					  (IP-SUB (PP (P n@)
						      (NP  (ID M-RR-DNN=0046/077)  (D-F @a) (N=1740:0046/079 terra)))
						  (NP-SBJ (ID H-EC-DCC=5536/000) 
							  (D-F a)
							  (ADVP (ADV-R mais)
								(PP (P d@)
								    (NP  (ID H-RR-DNN=0051/000)  (D-F @a) (N=1741:0051/011 gente)))))
						  (NP-ACC  (ID M-EE-PPP=7533/001)  (CL os))
						  (VB-P nomeia)
						  (PP (P por)
						      (NP (ID H-BB-NOM=7533:1742/000) 
							  (N-P=1742/000 Onças))))))))
	  (. :))
  (ID G_008,21.316))

With definite determiners and complements:

‘As fontes que há na terra, são infinitas’
The fountains that there is in the land, are infinite

( (IP-MAT (NP-SBJ (ID H-AC-DNC=5149.0333/000) 
		  (D-F-P As)
		  (N-P=0333/000 fontes)
		  (CP-REL (WNP-1  (ID M-EE-PPP=0333/001)  (WPRO que))
			  (IP-SUB (NP-ACC *T*-1)
				  (NP-SBJ *exp*)
				  (HV-P há)
				  (PP (P n@)
				      (NP  (ID M-RR-DNN=0046/021)  (D-F @a) (N=0334:0046/014 terra)))))
		  (CP-REL *ICH*-3))
	  (, ,)
	  (SR-P são)
	  (ADJP (ADJ-F-P infinitas))

‘A segunda capitania que adiante se segue se chama Paranambuco’
The second captaincy that follows ahead is called Paranambuco

( (IP-MAT (NP-SBJ-3 (ID H-RC-DNC=5501.0608:0112##8500/000) 
		    (D-F A)
		    (ADJ-F segunda)
		    (N=0621:0608/001 capitania)
		    (CP-REL (WNP-1  (ID M-EE-PPP=5501/001)  (WPRO que))
			    (IP-SUB (NP-SBJ *T*-1)
				    (PP (P a)
					(ADVP (ADV diante)))
				    (NP-SE  (ID M-EE-PPP=5501/002)  (CL se))
				    (VB-P segue)))
		    (CP-REL *ICH*-4))
	  (NP-3 (ID COSE) (CL se))
	  (VB-P chama)
	  (IP-SMC (NP-SBJ *-3)
		  (NP-ACC (ID H-BB-NOM=5501:0622/000) 
			  (NPR=0622/000 Paranambuco)))
	  (. :)
	  (CP-REL-4 (WNP-1  (ID M-EE-DPP=5501/003)  (D-F a) (WPRO qual)) ...
		    )
	  (. .))
  (ID G_008,11.84))

With demonstratives and nouns:

‘Esta província é a vista muito deliciosa e fresca em grande maneira’
This province is to the sight very delicious and fresh in great way

( (IP-MAT (NP-SBJ  (ID M-RR-ENN=0040/013)  (D-F Esta) (N=0309:0040/008 província))
	  (SR-P é)
	  (ADJP (ADJP (PP (P a@)
			  (NP (ID H-RR-DNN=0050/000) (D-F @a) (N=0310:0050/002 vista)))
		      (Q muito)
		      (ADJ-F deliciosa))
		(CONJP (CONJ e)
		       (ADJP (ADJ-F fresca)
			     (PP (P em)
				 (NP  (ID M-RC-NNC=5081/001)  (ADJ-G grande) (N=0311:0113/004 maneira))))))
	  (. ;))
  (ID G_008,8.35))

With demonstratives and nouns that are ‘equivalents’ or ‘synonyms’:

‘Este rio tem na entrada muitas ilhas que o dividem em diversas partes’
This river has in its entrance many islands that divide it in many parts

( (IP-MAT (NP-SBJ (ID M-AA-ENN=0362:0336#6010/002) 
		  (D Este)
		  (N=0362/000 rio))
	  (TR-P tem)
	  (PP (P n@)
	      (NP  (ID H-AA-DNN=0363$6010/000)  (D-F @a) (N=0363/000 entrada)))
	  (NP-ACC (ID H-RC-NNQ=8003.0019/000) 
		  (Q-F-P muitas)
		  (N-P=0364:0019/001 ilhas)
		  (CP-REL (WNP-1  (ID M-EE-PPP=8003/001)  (WPRO que))
			  (IP-SUB (NP-SBJ *T*-1)
				  (NP-ACC  (ID M-EE-PPP=6010/003)  (CL o))
				  (VB-P dividem)
				  (PP (P em)
				      (NP  (ID H-RC-NNC=5096.0013/000)  (ADJ-F-P diversas) (N-P=0365:0013/005 partes))))))
	  (, ,))
  (ID G_008,9.41))

‘Este óleo não se acha todo ano perfeitamente nestas árvores’
This oil is not found all the year round perfectly in these trees

( (IP-MAT (NP-SBJ-1  (ID M-AA-ENN=1245~1439#5200/007)  (D Este) (N=1245/000 óleo))
	  (NEG não)
	  (NP-SE-1 (ID COSE) (CL se))
	  (VB-P acha)
	  (NP-ADV  (ID H-RC-NNQ=8038.0018/000)  (Q todo) (N=1455:0018/005 ano))
	  (ADVP (ADV perfeitamente))
	  (PP (P n@)
	      (NP  (ID M-RR-ENN=6076/003)  (D-F-P @estas) (N-P=1456:1288/004 árvores)))
	  (, ,))
  (ID G_008,17.252))

With demonstratives/definite determiners and ellipsis:

‘Os louros têm um cabelo muito fino’
The blond (ones) have very fine hair

‘Os pardos se acham daí para o Norte em todas as outras capitanias’
The brown (ones) are found from there to the North in all other captaincies

( (IP-MAT (IP-MAT-PRN (NP-SBJ *exp*)
		      (VB-P convém)
		      (PP (P a)
			  (IP-INF (VB saber))))
	  (, ,)
	  (NP-SBJ *exp*)
	  (HV-P há)
	  (IP-SMC (IP-SMC (NP-SBJ  (ID H-EE-UUU=6110##6109/000)  (D-UM-P uns))
			  (ADJP (ADJ-P louros)))
		  (, ,)
		  (CONJP (CONJ e)
			 (IP-SMC (NP-SBJ  (ID H-EC-OOO=7467##6109/000)  (OUTRO-P outros))
				 (ADJP (ADJ-P pardos)))))
	  (. .))
  (ID G_008,23.370))

( (IP-MAT (NP-SBJ  (ID M-EC-DCC=6110/001) (D-P Os) (ADJ-P louros))
	  (TR-P têm)
	  (NP-ACC (ID H-RC-UNC=6111.1752/000) 
		  (D-UM um)
		  (N=1935:1752/001 cabelo)
		  (ADJP (Q muito) (ADJ fino)))
	  (, ,))
  (ID G_008,23.371))

(...)

( (IP-MAT (NP-SBJ-1  (ID M-EC-DCC=7467/001)  (D-P Os) (ADJ-P pardos))
	  (NP-SE-1 (ID COSE) (CL se))
	  (VB-P acham)
	  (PP (PP (P d@)
		  (ADVP (ADV @aí)))
	      (P para)
	      (NP  (ID M-RR-DNN=0219/013)  (D o) (NPR=1943:0219/012 Norte)))
	  (PP (P em)
	      (NP  (ID H-RC-DNQ=8867.0112/000)  (Q-F-P todas) (D-F-P as) (ADV-R mais) (N-P=1944:0112/014 capitanias)))
	  (. .))
  (ID G_008,23.375))

With demonstratives only:

‘Isto geralmente se costuma nestas partes’
This generally is used in these parts

( (IP-MAT (NP-SBJ-1  (ID M-EE-EEE=9910#/000)  (DEM Isto))
	  (ADVP (ADV geralmente))
	  (NP-SE-1 (ID COSE) (CL se))
	  (VB-P costuma)
	  (PP (P n@)
	      (NP  (ID M-RR-ENN=0013/008)  (D-F-P @estas) (N-P=1102:0013/015 partes)))
	  (, ,))
  (ID G_008,15.161))

A first idea for future analyses – ‘syntactic topics’

In brief, my partial conclusion is that typical pre-verbal subjects in this text are phrases with a ‘+specific’ feature, whether they are new or old referents.

My first hypothesis, outlined here, was that the pre-verbal position in Classical Portuguese was reserved for ‘prominent’ constituents. Having finished the first stage of the research and being able to study its results and make these first generalizations, I do not think the present conclusion is in contradiction with my preliminary hypothesis.

In fact, I believe this ‘specificity’ property I am trying to define is deeply connected with my first intuition, which was that the subjects in pre-verbal position had a ‘prominent’ referential property – that could be defined as ‘topicality’.

What is, after all, a topic? The most general way to describe it is as ‘old information’ (a well-worn term I was trying to avoid). In pragmatic terms, this may mean a piece of shared knowledge or an already mentioned referent – in any case, a ‘given’, something that is taken as ‘understood’ in the (textual or broader) context. The more theoretical ways to describe ‘topics’ are varied, as we know; but most of them do rely on this basic assumption – that ‘topicality’ is related to ‘given-ness’.

The biggest question here is – how can this general description help syntactic research? In other words, how does ‘given-ness’ translate syntactically? Is there any way we can observe it, other than by intuition and interpretation of the referential properties of a text?

My tentative answer would be that, in syntactic terms, ‘old information’ can come in many guises – and one of them is in the form of very clearly ‘specified’ constructions, where the ‘specification’ material serves the purpose of establishing the set of ‘given-ness’ of the referent (such as in the specified, construction-rich phrase ‘the fruit of this doctrine’, above, in contrast with the un-specific, construction-poor phrase ‘fruit’). This is what I am calling ‘specificity’ – a constructed specificity.

I would venture that such ‘specific’, ‘richly constructed’ phrases could be called syntactic topics.

‘Syntactic topics’ would be constructions that may be identified as carrying a strong component of ‘given-ness’ – even before any interpretation or recourse to the broader context. They would be classified as ‘topics’ for the simple reason that the common knowledge presupposed in the context of communication is translated into formal material that explicitly points out, makes reference, and specifies: material such as determiners, modifiers, and complements:

‘E os índios da terra que ali se ajuntaram ouviam tudo com muita quietação…’
And the indians of the land who there gathered listened to all with much quietness…

‘As fontes que há na terra, são infinitas’
The fountains that there is in the land, are infinite

‘A segunda capitania que adiante se segue se chama Paranambuco’
The second captaincy that follows ahead is called Paranambuco

‘Esta província é a vista muito deliciosa e fresca em grande maneira’
This province is to the sight very delicious and fresh in great way

‘Este rio tem na entrada muitas ilhas que o dividem em diversas partes’
This river has in its entrance many islands that divide it in many parts

‘Este óleo não se acha todo ano perfeitamente nestas árvores’
This oil is not found all the year round perfectly in these trees

‘Os louros têm um cabelo muito fino’
The blond (ones) have very fine hair

‘Os pardos se acham daí para o Norte em todas as outras capitanias’
The brown (ones) are found from there to the North in all the other captaincies

‘Isto geralmente se costuma nestas partes’
This generally is used in these parts

In this context, the ‘profile’ of phrases that tend to appear in the positions that I was calling ‘prominent’ is one of high specificity, so that I could now call them syntactic topics. They do tend to be ‘old information’ (‘mentions’ in the terminology of my annotation) – but more than that: they tend to be constructed in such a way that all indication of this ‘given-ness’ is explicit in the construction.

I do not believe this is a generalisation that would have any bearing on a more general definition of ‘topic’ in an informational or pragmatic perspective. It is also not an absolute statement, as ‘weakly’ constructed phrases of course may also be ‘topics’ – as given-ness does not, absolutely, need to rely on structure. It may rely, for instance, on knowledge of the world, which explains why poorly constructed phrases may appear as clear ‘givens’ (the prominent case would be bare noun phrases with proper nouns, or expressions such as ‘The King’). It is all, first of all, a matter of scales in a continuum.

But in this continuum, it would be interesting to be able to mark some discrete category to which we may ‘anchor’ research – in particular, in the case of comparative syntax research. The idea of ‘syntactic topics’ aims to walk towards this anchoring.


Final remarks on the first linguistic results

Combining the linguistic aspects and the technical aspects mentioned above, my main conclusion from these first empirical results is that it will be necessary to improve this annotation in the future by conducting further queries with Corpus Search and thereby trying to find the exact factors that are at play; but that, nevertheless, the state of the technique up to this point is a good start.

This is mainly because it allows patterns to be found in the text that are independent of intuitive interpretation and yet do not entirely clash with it; rather, they make it more precise, and allow it to be reproduced for future texts (as was, in fact, the main idea in starting the annotation).



Gallery of graphs

Graphs gallery – (1) Order of arguments, simple:

Graphs gallery – (2) Order of argument by reference status (‘heads’ or ‘mentions’):

Graphs gallery – (3) Patterns of determination and reference status (‘heads’ or ‘mentions’):

Graphs gallery – (4) Patterns of determination and reference status in different argument orders:

Graphs gallery – (5) Order of arguments by different patterns of determination (type of determiner) and reference status:

Graphs gallery – (6) Order of arguments by different patterns of determination (NP modification) and reference status:
