E-health requires patient-related data to be shared when and where necessary. Electronic health records (EHRs) promise to improve communication between health care providers, thereby improving the quality of patient treatment and reducing costs. EHRs allow medical data needed for clinical research studies to be collected in a structured and extensible way. This not only enables the optimization of clinical studies but also yields higher statistical significance owing to larger sample sizes. While the digitization of medical data and its organization within EHRs have been introduced in some areas, large volumes of paper-based health records are still produced every day. These records must be retained for 30 years for legal reasons, yet they are of no benefit to research organizations, because the unstructured medical data they contain cannot be used efficiently in clinical studies. Furthermore, legal regulations prohibit the use of documents containing both personal and medical data in clinical studies, which leads to expensive data acquisition phases and small sample sizes. This project develops a system for recognizing and pseudonymizing personal data in paper-based health records, with the overall goal of (i) providing clinical studies with medical data drawn from existing paper-based health records while (ii) ensuring patients' privacy by pseudonymizing the (digitized) data. The system integrates novel methods for (i) automatically distinguishing personal from medical data, (ii) automatically annotating the optical character recognition (OCR) output of paper-based health records with standard-compliant metadata, and (iii) automatically pseudonymizing the personal data.
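The abstract does not specify how pseudonymization is performed. As a purely illustrative sketch, one common approach is keyed hashing (HMAC-SHA256). It is shown here under the assumption that personal-data spans have already been located in the OCR text as (start, end, category) offsets. The functions `pseudonym` and `pseudonymize`, the token format, and the key are hypothetical, not the project's method.

```python
import hashlib
import hmac


def pseudonym(value: str, key: bytes, category: str) -> str:
    """Derive a stable token for one personal-data value (hypothetical scheme).

    The same value, category, and key always yield the same token, so records of
    one patient stay linkable; without the key, tokens cannot feasibly be recomputed.
    """
    message = f"{category}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"[{category.upper()}-{digest[:10]}]"


def pseudonymize(text: str, spans: list[tuple[int, int, str]], key: bytes) -> str:
    """Replace detected personal-data spans with pseudonyms, keeping medical text.

    Spans are (start, end, category) offsets into the original text and are assumed
    not to overlap. They are applied right to left so earlier offsets stay valid.
    """
    out = text
    for start, end, category in sorted(spans, reverse=True):
        out = out[:start] + pseudonym(text[start:end], key, category) + out[end:]
    return out


if __name__ == "__main__":
    key = b"demo-secret-key"  # hypothetical; a real key would be managed securely
    record = "Patient: Anna Huber, born 1957-03-12. Diagnosis: type 2 diabetes."
    n = record.index("Anna Huber")
    d = record.index("1957-03-12")
    print(pseudonymize(record, [(n, n + 10, "name"), (d, d + 10, "dob")], key))
```

The medical content ("type 2 diabetes") passes through unchanged while the name and birth date are replaced by tokens. A production system would additionally need the detection step, key management, and, if re-identification must be possible, a securely stored mapping table.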
With the project results, (Austrian) health care organizations benefit from (i) stronger clinical research, yielding faster and more reliable results at reduced cost, and (ii) an environment of trust for their patients and employees in which privacy is guaranteed.