Histories of Brazil:
A linked-data repository
of three 16th century
This project is building an annotated corpus composed of three 16th-century Portuguese narratives about Brazil: “História da Província Santa Cruz”, by Pero Magalhães de Gândavo (1576); “Tratados da terra e gente do Brasil”, by Fernão Cardim (1584); and “Notícias do Brasil”, by Gabriel Soares de Sousa (1584). The corpus forms a comprehensive collection of the eighteen editions of the texts produced between 1576 and 1925. The project was motivated by the linguistic relevance of these texts, by the philological challenges they pose, and by the potential for interesting computational work that lies at the intersection of that linguistic value and those philological challenges.
16th-century Portuguese narratives about “the New World” have been traditional objects of scholarly interest in social, political and cultural history for over a century. Indeed, the profusion and solid tradition of studies founded on those writings force any new historiographical approach to argue for its relevance not in terms of the importance of the narratives themselves, but in terms of the novelty of the approach.
In Portuguese linguistics, however, these documents have not traditionally attracted comparable interest. This situation has changed considerably only in recent years, as research on the history of Portuguese in Brazil began to attach greater importance to the written documentation of the language spoken by the Portuguese settlers during the first stage of colonial history (16th and 17th centuries). Diachronic research has recently taken early colonial Portuguese documents as essential pieces in the puzzle of how the marked grammatical changes that gave rise to contemporary Brazilian Portuguese emerged, particularly thanks to the important collective projects started in the 1990s, such as Mattos e Silva (1991), Galves (1998), and Castilho (1998). In this context, the first narratives written by the Portuguese on arrival at the conquered continent gained particular relevance, as is clear from the emblematic case of the pioneering work of Prof. Rosa Virgínia Mattos e Silva on the Carta by Pero Vaz de Caminha, the inaugural document on the ‘discovery’ of Brazil by the Portuguese (Mattos e Silva, 1996). More recently, the linguistic community’s interest in early colonial documents can be seen in the lexicographical studies conducted by F. Gonçalves and C. Murakawa on the texts by Fernão Cardim (Gonçalves & Murakawa, 2009, 2012; Gonçalves, 2007). My own work on the syntax of Classical Portuguese, mostly carried out within the collective projects mentioned above and most consistently with Galves (1998, 2002, 2012), has only been possible thanks to access to key 16th-century narratives, most notably those by M. de Gândavo and F. Cardim, whose texts ground the analysis of the change from Classical to Brazilian Portuguese that I have suggested in Paixão de Sousa (2008 a/b, 2009, 2012).
The expansion of linguistic investigation founded on 16th-century Portuguese narratives, however, is currently hindered by important philological challenges. The three texts selected for this project, in particular, have remarkably complex editorial trajectories, to the point that their authorship cannot be safely affirmed. In a preliminary survey for the present project, finding critical editions, dedicated critiques, and original manuscripts for the three titles proved a daunting task. The survey did find myriad versions of the three titles (twenty-six editions in all, including manuscript copies, 16th- to 20th-century printed editions, and relevant translations) but, unfortunately, only one instance of reasonably trustworthy original manuscripts (for Cardim’s text). The survey also showed that a good part of this material is freely available in digital format on the World Wide Web; however, the overall organisation of the different versions and editions leaves much to be desired, and the quality of the digital editions (with only one exception) is well below what would be acceptable for linguistic studies.
We are faced, therefore, with the following problem: documents of immense value and relevance for linguistic investigation exist, most of them in digital format, but their digital versions have not been scholarly edited and are dispersed around the web. With this project, we propose to develop a computational treatment for this group of documents, by which the dispersed versions would be collected into a logical network of scholarly digital editions. In these editions, the relevant linguistic structures in each text would be mapped and encoded, opening the possibility of comparative studies of the different texts with one another, and of the different versions of each text over time.
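As a rough illustration of what such a logical network of versions might look like, the sketch below stores subject-predicate-object statements in memory and answers two simple questions: which versions of a work are recorded, in chronological order, and from which earlier version a given one derives. It is a minimal sketch under stated assumptions, not the project's actual data model. The class `VersionGraph`, the predicate names, the version identifiers, and the year 1900 are all hypothetical and serve only to make the example run.

```python
# Works named in the project description (titles, authors, dates as given there).
WORKS = {
    "gandavo": ("História da Província Santa Cruz", "Pero Magalhães de Gândavo", 1576),
    "cardim": ("Tratados da terra e gente do Brasil", "Fernão Cardim", 1584),
    "soares": ("Notícias do Brasil", "Gabriel Soares de Sousa", 1584),
}

# Hypothetical predicate names; a real repository would more likely reuse
# an established vocabulary.
IS_VERSION_OF = "isVersionOf"
DATED = "dated"
DERIVES_FROM = "derivesFrom"


class VersionGraph:
    """A tiny in-memory store of (subject, predicate, object) statements."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        return [o for s, p, o in self.triples if s == subject and p == predicate]

    def subjects(self, predicate, obj):
        return [s for s, p, o in self.triples if p == predicate and o == obj]

    def versions_of(self, work_id):
        """Return the recorded versions of one work, ordered by date.

        Assumes every version carries exactly one DATED statement.
        """
        versions = self.subjects(IS_VERSION_OF, work_id)
        return sorted(versions, key=lambda v: self.objects(v, DATED)[0])

    def lineage(self, version_id):
        """Follow derivesFrom links back to the earliest recorded source."""
        chain = [version_id]
        while True:
            parents = self.objects(chain[-1], DERIVES_FROM)
            # Stop at a version with no recorded source, or on a cycle.
            if not parents or parents[0] in chain:
                return chain
            chain.append(parents[0])


# Placeholder records: identifiers are invented, and the year 1900 is an
# arbitrary illustrative value, not a bibliographic claim.
graph = VersionGraph()
graph.add("version-a", IS_VERSION_OF, "gandavo")
graph.add("version-a", DATED, WORKS["gandavo"][2])
graph.add("version-b", IS_VERSION_OF, "gandavo")
graph.add("version-b", DATED, 1900)
graph.add("version-b", DERIVES_FROM, "version-a")

print(graph.versions_of("gandavo"))
print(graph.lineage("version-b"))
```

Keeping every relation as a separate statement means new kinds of links (for instance, between a translation and its source, or between an annotation and a passage) could be added without changing the structure of existing records.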
The project begins in 2015 with a detailed linguistic study of one of the titles in the corpus, the “História da Província Santa Cruz” by M. de Gândavo, taken as a pilot for developing the linguistic annotation that will later be extended to the other texts.