This readme.txt file was generated on 20210408 by Frank Weij

-------------------
GENERAL INFORMATION
-------------------

Author Information:
- Frank Weij, Erasmus University Rotterdam, frankweij@gmail.com

Principal Investigator:
- Frank Weij

Co-investigators:
- Prof.dr. Koen van Eijck
- dr. Pauwke Berkers
- dr. Jiska Engelbert

Date of data collection:
- 01/09/2015 until 31/12/2019

Geographic location of data collection:
- Rotterdam, the Netherlands

Information about funding sources or sponsorship that supported the collection of the data:
- NWO, dossier number 322-45-006

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse:
- CC-BY-NC 4.0: others are free to copy and redistribute the material in any medium or format, and to remix, transform, and build upon the material, as long as appropriate credit is given and the material is not used for commercial purposes.

--------------------
DATA & FILE OVERVIEW
--------------------

File list:
- tweets_final: Twitter messages (tweets) for the period 20100101 - 20150101 that mention Ai Weiwei, Banksy, Hans Haacke, Jafar Panahi, Jonas Staal, or Pussy Riot.
- dataset_kranten: newspaper articles for the period 20100101 - 20150101 that mention Ai Weiwei, Banksy, Hans Haacke, Jafar Panahi, Jonas Staal, or Pussy Riot. The following newspapers are included in the data collection process:
  - Netherlands: NRC, Volkskrant, Algemeen Dagblad, Telegraaf
  - United States: New York Times, Daily News, Washington Post, USA Today
  - United Kingdom: Daily Mirror, Daily Telegraph, Daily Star, Evening Standard
- interview_data: interviews with curators in the cultural sector, conducted during the period 20190101 - 20191231, for those interviewees consenting to secondary use of data.
- tweets_cleaning_code.txt: Python code used to clean and process the Twitter data.
- newspaper_cleaning_code.txt: Python code used to clean and process the newspaper data.

Relationship between files, if important for context:
- The datasets tweets_final and dataset_kranten exclusively include text documents (as units of analysis) pertaining to the same cases mentioned above.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data:
- tweets_final: scraped from Twitter.com.
- dataset_kranten: derived from LexisNexis.

Methods for processing the data:
- tweets_final: Python code was used to clean and process the data; see tweets_cleaning_code.txt.
- dataset_kranten: Python code was used to clean and process the data; see newspaper_cleaning_code.txt.

Software- or instrument-specific information needed to interpret the data, including software and hardware version numbers:
- The Twitter dataset (tweets_final) exceeds 1 million rows and therefore cannot be fully imported into Excel, which is limited to 1,048,576 rows per worksheet. Python (e.g. with the pandas package) or R (e.g. with the ggplot2 package) is recommended. For the other two datasets, any regular quantitative/qualitative data processing software is sufficient.

-------------------------
DATA-SPECIFIC INFORMATION
-------------------------

1. Twitter data in tweets_final.txt (492.18 MB)
- Number of variables: 6
- Number of rows: 2380285; each row represents one individual tweet
- Variable list:
  - Name: Twitter account name and screen name, categorical values
  - Date: date of the tweet, e.g. 31 dec. 2010
  - Tweet: original tweet text
  - Replies: number of replies to the tweet, in absolute numbers
  - Retweets: number of retweets of the tweet, in absolute numbers
  - Likes: number of likes of the tweet, in absolute numbers
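Because tweets_final.txt is too large for Excel, it is best read in chunks. A minimal sketch with pandas, assuming a tab-separated file without a header row carrying the six variables listed above (the separator, header convention, and file layout are assumptions; inspect the raw file and adjust sep=/header=/encoding= accordingly). The tiny demo file stands in for the real 492 MB dataset:

```python
import pandas as pd

# Column names as documented above. The tab-separated, headerless layout
# is an assumption -- check the raw file and adjust sep= and header= as needed.
columns = ["Name", "Date", "Tweet", "Replies", "Retweets", "Likes"]

def count_rows_and_likes(path, chunksize=100_000):
    """Stream the file in chunks so the >2.3 million rows never
    have to fit in memory at once."""
    total_rows, likes_sum = 0, 0
    for chunk in pd.read_csv(path, sep="\t", names=columns, header=None,
                             chunksize=chunksize):
        total_rows += len(chunk)
        # Coerce malformed counts to 0 rather than failing mid-file.
        likes_sum += int(pd.to_numeric(chunk["Likes"],
                                       errors="coerce").fillna(0).sum())
    return total_rows, likes_sum

# Tiny synthetic demo file in the assumed layout:
with open("tweets_demo.txt", "w", encoding="utf-8") as f:
    f.write("user1\t31 dec. 2010\tBanksy in town\t2\t5\t10\n")
    f.write("user2\t01 jan. 2011\tAi Weiwei exhibit\t0\t1\t3\n")

rows, likes = count_rows_and_likes("tweets_demo.txt", chunksize=1)
print(rows, likes)  # 2 13
```

The same pattern (replace the demo path with tweets_final.txt and a larger chunksize) keeps memory use bounded regardless of file size.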
2. Newspaper data in dataset_kranten.xlsx (3.73 MB)
- Number of variables: 10
- Number of rows: 2351; each row represents one individual news article
- Variable list:
  - Artivist: activist case mentioned in the article text, categorical values
  - Newspaper: newspaper in which the article is published
  - Country: country from which the newspaper originates, in country codes
  - Page: page number of the published article within the newspaper
  - Section: section (recoded) in which the article is published
  - Length: length of the article in words
  - Date: date of the article in years, yyyy
  - Type: type of article
  - Headline: headline of the article text
  - Article: full article text

3. Interview data in interview_data.txt (471.85 KB)
- Entails qualitative data; variables and rows do not apply. Interviews are separated by a recurring line, e.g. '---------- interview 1 ---------'.
- Data is anonymized.
- Each interview contains lines by the interviewer (1) and the interviewee (2).
- The dataset contains only interviews of candidates who agreed to third-party use of the interview data.
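Given the separator convention described above, the transcript file can be split into individual interviews programmatically. A minimal Python sketch; the separator regex follows the example line above, and the "(1)"/"(2)" speaker markers in the demo text are an assumption about how the interviewer/interviewee labels appear, so check interview_data.txt and adapt:

```python
import re

def split_interviews(text):
    """Split the transcript on recurring separator lines such as
    '---------- interview 1 ---------', returning one string per interview."""
    parts = re.split(r"^-+\s*interview\s+\d+\s*-+$", text,
                     flags=re.MULTILINE | re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]

# Tiny synthetic example standing in for interview_data.txt; the speaker
# markers (1)/(2) are assumed, per the description above.
sample = """---------- interview 1 ---------
(1) How do you select artists?
(2) Mostly through open calls.
---------- interview 2 ---------
(1) What role does activism play?
(2) A growing one.
"""

interviews = split_interviews(sample)
print(len(interviews))  # 2
```

Splitting on a pattern rather than an exact string tolerates the varying dash counts visible in the example separator.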