Monday, May 20, 2024

Python – how do I bulk update file properties


Hello,

I have a requirement to bulk update the properties of 1.2 million files. The new property values are in CSV files, which I have loaded into pandas DataFrames. I can currently find the right file in the SharePoint document library and update it one at a time. However, this is extremely time-consuming: I can only update about 200 records per minute.
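For a sense of scale: at a constant 200 updates per minute, 1.2 million files need 1,200,000 / 200 = 6,000 minutes, or about 100 hours of uninterrupted running. The tiny helper below (its name is made up for illustration) only does that arithmetic. Note that the posted code makes two requests per file (one `execute_query()` to look up the item and one `execute_query_with_incremental_retry()` to save it), so the number of round trips, rather than the Python code, is the likely bottleneck.

```python
def estimated_hours(total_files: int, files_per_minute: float) -> float:
    """Hours needed to process total_files at a constant rate of files_per_minute."""
    return total_files / files_per_minute / 60


# Figures from the question: 1.2 million files, about 200 updates per minute,
# spread over 20,000 cases.
print(estimated_hours(1_200_000, 200))  # 100.0
print(1_200_000 / 20_000)               # 60.0 files per case on average
```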

I can’t figure out how to load the file objects in bulk (into a list or dict) and then issue a single bulk update for all of the files within one case (the 1.2 million files are distributed across 20,000 cases). I’ve used execute_query_with_incremental_retry() to avoid throttling issues, but update performance is still quite poor. Here is the code snippet. Some of the code is there to let me restart from a checkpoint in case the update fails.
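One way to load the items in bulk is to fetch the `ID` and `FileRef` (server-relative URL) fields for a whole case with one paged query and keep them in a dict, so that finding a file no longer costs its own request. How you run that query depends on your version of Office365-REST-Python-Client (for example, selecting `ID` and `FileRef` on the list's `items` collection, or a CAML query); this sketch assumes the rows are already available as plain dicts and only shows the indexing step. `build_path_index`, the site prefix and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def build_path_index(items: Iterable[Mapping[str, object]], site_prefix: str) -> Dict[str, int]:
    """Map each item's site-relative path (lower-cased) to its list item ID.

    `items` is assumed to hold dicts with the SharePoint fields 'FileRef'
    (server-relative URL) and 'ID', e.g. rows returned by one paged query.
    """
    prefix = site_prefix.rstrip('/') + '/'
    index: Dict[str, int] = {}
    for item in items:
        file_ref = str(item['FileRef'])
        # Strip the site part so keys look like 'Lists/<list>/<folder>/<file>'
        if file_ref.lower().startswith(prefix.lower()):
            file_ref = file_ref[len(prefix):]
        index[file_ref.lstrip('/').lower()] = int(item['ID'])
    return index


# Illustrative rows only; real ones would come from SharePoint.
rows = [
    {'ID': 7, 'FileRef': '/sites/cases/Lists/SharedCasesDoc1/Folder_A/report.pdf'},
    {'ID': 8, 'FileRef': '/sites/cases/Lists/SharedCasesDoc1/Completed/Folder_B/memo.docx'},
]
index = build_path_index(rows, '/sites/cases')
print(index.get('lists/sharedcasesdoc1/folder_a/report.pdf'))  # 7
```

Inside the file loop, `index.get(fullpath.lower())` would then supply the item ID in place of the `get_file_by_server_relative_url(...)` lookup and its `execute_query()`; a `None` result points to a file that is missing from the library or whose path was sanitized differently. Keys are lower-cased because SharePoint URLs are case-insensitive.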

```python
import base64

from office365.sharepoint.client_context import ClientContext

# site_url, context_auth and the DataFrames df, dfMasterFileFolderDocumentList,
# dfCaseLocation and dfCaseURL are created earlier in the script.

# Characters that SharePoint does not allow in file and folder names
ILLEGAL_CHARS = ['/', '#', ':', '*', '?', '|', '%', '>', '<', '"']


def decode_b64(value):
    """Decode a base64-encoded UTF-16 string from the CSV data."""
    return base64.b64decode(str(value)).decode('UTF-16', 'ignore')


def sanitize(name):
    """Replace every illegal character in name with '_'."""
    for ch in ILLEGAL_CHARS:
        name = name.replace(ch, '_')
    return name


alreadyupdatedfiles = set()  # paths already updated, so a rerun can skip them
ctx = ClientContext(site_url, context_auth)  # one context reused for all requests

for i in range(0, len(df), 10):
    case_number = df.loc[i, 'Case Number New']
    files_in_case = dfMasterFileFolderDocumentList[
        dfMasterFileFolderDocumentList['Case Number New'] == case_number
    ].reset_index(drop=True)
    case_location = dfCaseLocation.loc[dfCaseLocation['Case Number New'] == case_number]
    SharePointDocumentListName = case_location['SharePointListName'].item().replace("Shared Cases", "Shared Cases Documents")
    SharePointListInternalName = case_location['SharePointListInternalName'].item().replace("Shared_Cases", "SharedCasesDoc")
    if df.loc[i, 'Case Completion Status'] == 'Completed':
        root = 'Lists/{0}/Completed'.format(SharePointListInternalName)
    else:
        root = 'Lists/{0}'.format(SharePointListInternalName)
    caseURL = dfCaseURL.loc[dfCaseURL['Title'] == case_number]['Case Url'].item()

    for j in range(len(files_in_case)):
        row = files_in_case.loc[j]
        # Folder: spaces removed, illegal characters replaced, no trailing '.'
        file_folder = sanitize(decode_b64(files_in_case.iloc[j, 2]).replace(' ', ''))
        if file_folder.endswith('.'):
            file_folder = file_folder[:-1]
        # File name: trailing whitespace stripped; backslashes and tabs also replaced
        fileNameActual = sanitize(decode_b64(row['FullFileName']).rstrip())
        fileNameActual = fileNameActual.replace('\\', '_').replace('\t', '_')
        fullpath = "{0}/{1}/{2}".format(root, file_folder, fileNameActual)
        if fullpath in alreadyupdatedfiles:
            continue
        print(fullpath)

        # Request 1: look up the list item behind the file to get its ID
        target_file = ctx.web.get_file_by_server_relative_url(fullpath).listItemAllFields
        ctx.load(target_file)
        ctx.execute_query()

        # Request 2: set the new property values and save them
        item = ctx.web.lists.get_by_title(SharePointDocumentListName).get_item_by_id(target_file.id)
        item.set_property("Title", decode_b64(row['doc_title']))
        item.set_property("CaseNumber", row['Case Number New'])
        item.set_property("DocumentId", row['Sequence Number'])
        if str(row['Document_Type SPID']) != 'nan':
            item.set_property("DocumentTypeLKId", str(row['Document_Type SPID']))
        item.set_property("CaseUrl", caseURL)
        if str(row['DownloadList']) != 'nan':
            item.set_property("DownloadLists", {'results': str(row['DownloadList']).split(',')})
        item.update()
        ctx.execute_query_with_incremental_retry()
        alreadyupdatedfiles.add(fullpath)
```
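The `alreadyupdatedfiles` collection only lives in memory, so if the Python process itself dies, the restart point is lost with it. A minimal sketch of a file-backed alternative, assuming one path per line in a local text file (the `Checkpoint` class and the file name are hypothetical): every successful update is appended to the file right away, and a new run reloads the file before starting.

```python
import os
import tempfile


class Checkpoint:
    """Append-only record of updated paths so a restarted run can skip them."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.done = set()
        if os.path.exists(path):
            with open(path, encoding='utf-8') as fh:
                self.done = {line.rstrip('\n') for line in fh if line.strip()}

    def __contains__(self, item_path: str) -> bool:
        return item_path in self.done

    def add(self, item_path: str) -> None:
        """Record item_path in memory and append it to the file (closed, so flushed)."""
        if item_path in self.done:
            return
        self.done.add(item_path)
        with open(self.path, 'a', encoding='utf-8') as fh:
            fh.write(item_path + '\n')


# Demo: the second instance simulates a restarted script.
demo_path = os.path.join(tempfile.mkdtemp(), 'updated_files.txt')
first_run = Checkpoint(demo_path)
first_run.add('Lists/SharedCasesDoc1/Folder_A/report.pdf')
second_run = Checkpoint(demo_path)
print('Lists/SharedCasesDoc1/Folder_A/report.pdf' in second_run)  # True
```

Because it supports `in` and `add()`, an instance could stand in for `alreadyupdatedfiles`, with `add()` called wherever a path is recorded as updated.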

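For the bulk update itself, the general pattern is to queue many item updates and send them together instead of one request per file, backing off when the server throttles (SharePoint Online answers throttled requests with HTTP 429 or 503). The sketch below shows only the batching and retry logic with a stand-in `submit` callable; `ThrottledError`, `submit_in_batches` and the default batch size of 100 are assumptions, not library APIs. In a real run, `submit` might call `set_property(...)` and `update()` for every item in the batch on one `ClientContext` and then send them in a single batch request (recent Office365-REST-Python-Client releases offer `ClientContext.execute_batch()`; check that your version has it and how it reports throttling).

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar('T')


class ThrottledError(Exception):
    """Hypothetical error raised by `submit` when the server answers 429/503."""


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def submit_in_batches(updates: Iterable[T], submit: Callable[[List[T]], None],
                      batch_size: int = 100, max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Send `updates` in batches, retrying a throttled batch with exponential backoff.

    Returns the number of batches sent; re-raises after `max_retries` retries.
    """
    sent = 0
    for batch in chunked(updates, batch_size):
        for attempt in range(max_retries + 1):
            try:
                submit(batch)
                break
            except ThrottledError:
                if attempt == max_retries:
                    raise
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        sent += 1
    return sent


# Demo with a fake submit that is throttled exactly once.
calls = []
def fake_submit(batch):
    calls.append(len(batch))
    if len(calls) == 1:
        raise ThrottledError()

delays = []
n = submit_in_batches(range(250), fake_submit, batch_size=100, sleep=delays.append)
print(n, calls, delays)  # 3 [100, 100, 100, 50] [1.0]
```

The `sleep` parameter is injectable so the demo can record delays instead of waiting; in production the default `time.sleep` is used.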