
5 Case study: the Contemporary Wayang Archive


The mission of CWA is to annotate, translate and preserve contemporary wayang kulit (Javanese shadow puppetry) audiovisual documents (Escobar Varela 2016). Wayang is not text-based, but language is an essential feature of this art form. It is a verbose practice, where words are very important (Arps and Soeroto 2016; Cohen 2010; Emerson and Asmoro 2013a, 2013b; Mrázek 2005).

Best practices in language documentation target a range of goals, from collection to representation to archiving. First, the language material collected should be of sufficiently high quality. Second, the language and its metadata should be represented in ways that are useful to a wide range of users. Third, the language material should be properly archived in perpetuity. These are only some of the primary concerns of those engaged in language documentation; others include ethical questions and considerations about the types of data that one should be collecting.

The CWA was originally conceived for theatre scholars. However, given the linguistic aspect of the performances (a unique combination of Javanese and Indonesian words), we imagine that this archive could also be of use to language researchers. In presentations and discussions with language scholars, several recommendations were made to improve CWA. In the following paragraphs, we describe the suggestions, the extent to which they are relevant and practical, and the current state of their implementation. Our aim here is to show that while not all suggestions from language documentation are necessary or realistic, they help uncover blind spots in theatre archives. The suggestions are the following:


  1. Better metadata.

  2. Enable better search and retrieval of data.

  3. More visualizations and corpus-based search features.

  4. Automatic dictionaries.

  5. Modular documentation.

  6. Transcript features.

  7. Easy citation guidelines.


Better metadata. Metadata, or "data about data", serves several crucial functions. It makes resources more easily findable, helps migrate files across generations of software and hardware, and makes resources compatible across different collections. The problem is finding an appropriate metadata schema. A comprehensive schema based on the CIDOC-CRM would be useful in describing the minutiae of the wayang collection. However, the CIDOC-CRM (CIDOC 2006) is costly to implement and few other theatre archives currently use it, which lowers the potential for interoperability. Language archives often implement the IMDI (ISLE Metadata Initiative) metadata schema, developed by the Language Archive, which is "a metadata standard to describe multi-media and multi-modal language resources" (Max Planck Institute for Psycholinguistics 2003). This standard is very well maintained, but its emphasis on sessions and projects makes it less applicable to theatre archives, where this vocabulary would need to be adapted. Another option is the TEI (Text Encoding Initiative) Guidelines, widely used in literary digital humanities projects. The TEI Guidelines include a section on theatre (Text Encoding Initiative 2017), but they are narrowly focused on performance texts and are ill-suited to the improvisatory, highly musical aspects of wayang.

The option we have chosen is to limit the metadata to Dublin Core (DC) records, for the time being. DC records capture very general information about a resource (Dublin Core Metadata Initiative 1995-2017). At its simplest, DC comprises fifteen general elements, among them Title, Creator, Subject, Description, Publisher, Contributor, Date and Type. They are aimed at any type of record, and this is a double-edged sword. Their generality makes them widely applicable (they are the most commonly used metadata standard by a wide margin (Berners-Lee and Kagal 2008)), but they cannot be used to make specific assertions about the contents of a record. Qualified models (i.e., extended versions) of DC exist and we are looking into implementing them in the future, perhaps in conjunction with a metadata standard that better describes the context. Many language documentation projects include both DC records for each item (a recording) and more granular descriptions of the record in IMDI. This combination of general and specific metadata schemas is known as a fractal approach and it might be the best solution in the future (Berners-Lee and Kagal 2008). For the time being, we have implemented DC metadata records for all the video recordings in CWA, with the hope that this will increase the findability of our items.
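To make the idea of a simple DC record concrete, the following is a minimal sketch in Python of how one such record could be serialized as XML. It is an illustration only and does not reproduce the CWA implementation: the wrapper element name, the helper function and all field values are hypothetical. Only the DC element names and the namespace URI come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen-element Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Serialize a simple DC record to an XML string.

    `fields` maps a DC element name (e.g. "title") to one string or to a
    list of strings, since DC elements are repeatable.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, invented for this example.
example = {
    "title": "Example wayang kulit performance",
    "creator": "Example Dalang",
    "subject": ["wayang kulit", "shadow puppetry"],
    "language": ["jv", "id"],
    "type": "MovingImage",
    "date": "2015",
}
record_xml = build_dc_record(example)
print(record_xml)
```

A record this general can describe any recording, which is exactly the trade-off discussed above: it says nothing about puppets, music or story sources, which would require a more specific schema alongside it.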



Enable better search and retrieval of data. Data is only truly usable when researchers can find what they are looking for. A good search and retrieval system ensures that data within the archive will be useful to a wider audience. At the moment CWA enables two kinds of search. A visual search function (based on a local metadata schema) enables users to find recordings according to several criteria that are specific to wayang (types of puppets used, performance space, performers, music, language and story source). The second search function is directly inspired by linguistics. It is a concordance search function that allows users to find specific strings of text anywhere in the recordings. The results are returned as a list of sentences where the string is used, together with the time-code at which this text is found in a given recording.
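A concordance search of this kind can be sketched in a few lines of Python. The sketch below assumes that each subtitle line is stored with the identifier of its recording and a start time in seconds; the `Cue` structure, the function names and the sample sentences are hypothetical and are not taken from the CWA code base.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One time-coded subtitle line (hypothetical structure)."""
    recording: str
    start_seconds: float
    text: str


def format_timecode(seconds):
    """Render a start time in seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def concordance(cues, query):
    """Return (recording, time-code, sentence) for every cue containing `query`.

    Matching is a case-insensitive substring test.
    """
    needle = query.casefold()
    return [
        (cue.recording, format_timecode(cue.start_seconds), cue.text)
        for cue in cues
        if needle in cue.text.casefold()
    ]


# Invented sample data.
cues = [
    Cue("rec-01", 75.0, "The king enters the hall."),
    Cue("rec-01", 3725.5, "Where is the king tonight?"),
    Cue("rec-02", 12.0, "The clowns begin to sing."),
]
hits = concordance(cues, "King")
for hit in hits:
    print(hit)
```

Substring matching keeps the sketch short; a production system would more likely tokenize the text or use a full-text index so that a query for one word does not match inside longer words.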

More visualizations and corpus-based search features. The corpus-based search feature mentioned above opens the door to more possibilities aimed at researchers with an interest in language. We are exploring the possibility of including visual search features such as those found in Sinclair and Rockwell (2017) and culturomics.org (2016), both of which use intuitive yet powerful visual idioms to display linguistic information about a collection. For the time being, we have implemented only one new feature. Each performance page includes an interactive table with all the words uttered in the performance. The words can be sorted by frequency and users can search within this table. We will test the reception of this feature before deploying more comprehensive linguistic features.
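The data behind such a word table can be approximated with a simple frequency count. The following Python sketch is an assumption about how it could be done, not a description of the CWA implementation; the tokenization rule, the function names and the sample lines are all hypothetical.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined by internal hyphens or
# apostrophes. This is a simplifying assumption, not a linguistic claim.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def word_frequencies(lines):
    """Return (word, count) pairs for a transcript, most frequent first."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_PATTERN.findall(line.casefold()))
    return counts.most_common()


def filter_table(table, prefix):
    """Keep only the rows whose word starts with `prefix`."""
    prefix = prefix.casefold()
    return [row for row in table if row[0].startswith(prefix)]


# Invented sample lines.
lines = ["The king enters.", "The king speaks; the clowns laugh."]
table = word_frequencies(lines)
print(table[:2])
print(filter_table(table, "k"))
```

Sorting by frequency and filtering by prefix correspond to the two interactions described above: ordering the table and searching within it.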

Automatic dictionaries. An intriguing possible use of our linguistic data in three languages (Indonesian, Javanese, English) is the creation of an automatic dictionary. The theatre scholars behind CWA had not originally considered this possibility until they saw the potential and the available tools. They are now exploring the use of FieldWorks Language Explorer (FLEx), developed by the non-profit language organization SIL, for this purpose. There are very few Javanese-English dictionaries and none of them exists in a machine-readable format. Creating this dictionary would be a clear contribution of CWA to the language community. Perhaps this application is narrowly premised on the specific context of CWA, but we imagine that other theatre archives also deal with language from places (or historical periods) for which machine-readable dictionaries are not readily available. These dictionaries can be of enormous help both to language learners and to digital humanities applications.

Modular documentation. As suggested by the linguist, the individual components of the documentation (transcripts, audio, images) could be taken apart and analyzed more efficiently. If this suggestion were implemented, the best practice would be to distribute these files in uncompressed, non-proprietary formats. Uncompressed formats enable different kinds of analysis and are more future-proof. For example, an uncompressed 96 kHz version of a sound file (as opposed to 44.1 kHz) may enable some future phonological analysis that is currently not possible. Non-proprietary files are more sustainable and their usage is less restricted. However, the challenge for video is that non-proprietary files cannot be streamed online across all browsers and that uncompressed files are large and expensive to maintain. Our current solution is to stream compressed versions online and to keep high-quality recordings in a physical archive at our local university library. Due to copyright restrictions, audio and video files are only available for online viewing, but we are exploring the possibility of alternative licenses for modular distribution in future contributions to the archive.

Transcript features. A transcript separate from the subtitles could be useful for those interested in language research. However, a theatre audience would perhaps be interested in transcripts that include information not found in the subtitles, such as descriptions of scene transitions and character names. The exquisitely detailed project by Kathryn Emerson on the wayang kulit performances of Purbo Asmoro follows this practice (Emerson and Asmoro 2013a, 2013b). The DVDs of the project include condensed subtitles and the companion books include transcripts and translations with descriptions and character names. We would like to follow this practice in CWA and are thus working towards these kinds of transcripts, one for language researchers and another for theatre scholars, to be made available in the future.

Easy citation guidelines. Clear citation guidelines are one of the best ways to encourage depositors to contribute to an archive. They are also fundamental for showing how the archive is being used when making a case for funding. Editors, teachers and researchers should cite online archives more often and encourage others to do so as well. According to Thieberger (2004), this is one of the most important ways in which archives can be preserved. In the case of CWA, we have changed each video page to display citation recommendations prominently. We have also generated standard URLs and updated our citation metadata to make sure that Google Scholar and other platforms correctly represent citations of CWA material.
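Citation metadata of this kind is commonly exposed to indexing services as HTML meta tags in the head of each page. The Python sketch below generates a few such tags using the widely used `citation_title`, `citation_author` and `citation_publication_date` names. Which tags CWA actually emits is not stated here, so the selection, the function name and the sample values are assumptions.

```python
from html import escape


def citation_meta_tags(title, authors, publication_date):
    """Return HTML meta tags describing one archive item.

    One `citation_author` tag is emitted per author, in the order given.
    """
    pairs = [("citation_title", title)]
    pairs += [("citation_author", author) for author in authors]
    pairs.append(("citation_publication_date", publication_date))
    return [
        f'<meta name="{name}" content="{escape(value)}">'
        for name, value in pairs
    ]


# Invented sample values.
tags = citation_meta_tags(
    title="Example Performance & Commentary",
    authors=["Example Dalang", "Example Translator"],
    publication_date="2015/06/01",
)
print("\n".join(tags))
```

Escaping the values matters because titles often contain ampersands or quotation marks, which would otherwise produce malformed tags and unreliable citation records.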

We also discussed the reusability policy of the archive. From a research perspective, an open-access, Creative Commons policy would be ideal, since it would mean that anyone could use the archive and reuse the material in any way they see fit. Although we recognize the potential of this approach, we also recognize that this might not be what all stakeholders in the archive want. After all, the ownership of the intellectual property rights remains with the artists who donated their material to CWA, as is the case in most digital archives. One suggestion for bigger archives is to implement a tiered access and copyright model, rather than a blanket copyright and access policy. Some language archives implement such a policy and grant depositors more flexibility when choosing who can access, use and reuse their data.
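A tiered model of this kind can be thought of as a per-item policy that names the minimum tier required for each action. The Python sketch below illustrates the idea under stated assumptions: the tier names, the three actions and the sample policy are hypothetical and do not describe the policy of CWA or of any particular language archive.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessTier(IntEnum):
    """Hypothetical user tiers, ordered from least to most privileged."""
    PUBLIC = 0
    REGISTERED = 1
    RESEARCHER = 2
    DEPOSITOR = 3


@dataclass(frozen=True)
class ItemPolicy:
    """Minimum tier required for each action on one archive item."""
    view: AccessTier
    download: AccessTier
    reuse: AccessTier


def is_allowed(policy, user_tier, action):
    """Check whether `user_tier` meets the item's requirement for `action`."""
    if action not in ("view", "download", "reuse"):
        raise ValueError(f"unknown action: {action}")
    return user_tier >= getattr(policy, action)


# A depositor-chosen policy: anyone may view, only researchers may
# download, and reuse stays with the depositor.
policy = ItemPolicy(
    view=AccessTier.PUBLIC,
    download=AccessTier.RESEARCHER,
    reuse=AccessTier.DEPOSITOR,
)
print(is_allowed(policy, AccessTier.REGISTERED, "view"))
print(is_allowed(policy, AccessTier.REGISTERED, "download"))
```

Storing the policy per item, rather than per archive, is what gives depositors the flexibility described above: two recordings in the same collection can carry different conditions.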


