I am using autocoding as the first step in my analysis of a range of literature (.pdf Internals), and running the autocoding both within sources from a journal and then across multiple journals to check the consistency of coding and variation between journals.
I have 4 journals, each with a number of articles (ranging from 4 to 47 per journal), for a total of 100 internals across all the journals. All records were exported from EndNote as .xml, imported to Nvivo through the EndNote import functionality, and then classified and coded accordingly.
I have run the autocoding on the items from 3 of the journals without any issue. With the last journal (Journal of Organisational Change Management), I get the following error if I try to autocode all 38 items in the list for this journal: "Unable to identify themes - An error occurred when coding your sources. Please try again." Having tried several times, it appears there is an issue with one or two of the .pdfs: Nvivo cannot read and therefore cannot code them, so the whole autocoding stops.
I've run the autocoding on all the items except the offending 2, and the coding then completes, so the issue appears to be with these .pdfs. In trying to diagnose why this is happening I have:
1. checked the .pdfs are not secure/locked etc in Adobe Acrobat - correct, no security is applied to them.
2. opened the .pdfs in Nvivo and tried to manually code sections of the text in these documents - yes this is possible and the coding appears in the coding stripes immediately.
3. tried to run the autocoding from within the .pdf/internal by opening it, right-clicking, selecting autocode, choosing the "use existing coding" option, and selecting the codes created when the autocoding succeeded with these two internals excluded from the set - the autocoding still fails (all codes in the set come up under "no coding" in the results of analysis list) and then I get an error that says "The autocoding wizard was unable to code the selected source(s) based on existing coding patterns. Make sure that your sources contain text and select nodes that already contain sufficient content coding references to determine new coding".
4. to check whether Nvivo treats the .pdfs as text sources, copied and pasted text from the .pdf internal into my project memo - successful, copy and paste worked and the section appeared correctly in the memo, so it would seem to be text.
5. run the autocode on one of the offending sources, using the existing codes from the autocoding of the remainder of the set from that journal, but this time with only 3 of the 5 top-level nodes (which have 160-760 references each coded to them) and none of the child nodes (many of which have fewer than 10 references coded to them) - coding failed and I got the same error as in step 3 above.
6. converted the .pdfs to .docx, imported them as internals and then repeated step 5 above - same result, no coding and same error.
7. run autocode on the entire .pdf data set (i.e. across all journal items together) - same "unable to identify themes" error resulted.
8. run autocode on the entire .pdf data set less the two offending sources - autocodes fine when the offending two sources are excluded.
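For anyone wanting to reproduce the isolation, one way to narrow a failing batch down to the offending sources is to split it in half and retest each half. The following is only a rough sketch of that idea, not something that drives Nvivo: `batch_ok` is a hypothetical stand-in for running the autocode wizard by hand on a subset and noting whether it completes, the file names are made up, and it assumes a batch fails exactly when it contains at least one problem source.

```python
from typing import Callable, List, Sequence


def find_failing(items: Sequence[str],
                 batch_ok: Callable[[Sequence[str]], bool]) -> List[str]:
    """Return the items that make a batch fail.

    Assumption: a batch fails if and only if it contains at least one
    problem item, so a passing batch can be ruled out as a whole.
    """
    if not items or batch_ok(items):
        return []                      # nothing to test, or whole batch is fine
    if len(items) == 1:
        return [items[0]]              # a single failing item is a culprit
    mid = len(items) // 2
    # Retest each half separately and collect the culprits from both.
    return (find_failing(items[:mid], batch_ok)
            + find_failing(items[mid:], batch_ok))


if __name__ == "__main__":
    # Hypothetical file names: 38 sources, two of which break the batch.
    sources = [f"article_{i:02d}.pdf" for i in range(1, 39)]
    bad = {"article_07.pdf", "article_23.pdf"}

    def simulated_autocode(batch: Sequence[str]) -> bool:
        # Stand-in for "did the autocode wizard complete on this subset?"
        return not any(name in bad for name in batch)

    print(find_failing(sources, simulated_autocode))
```

With only a couple of bad sources in a set of 38, this takes far fewer autocode runs than testing each source one at a time.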
So by now, I am about out of ideas as to what is going on here, especially because the same thing happened with both the .pdf and the .docx versions. I'd appreciate suggestions on what might be wrong with these two .pdfs/internals and what to do about it so I can run the analysis on my full data set.