Tuesday 18 October 2016

python - How to load a dataframe from a printed dataframe string?




People often post questions that include the output of print(dataframe). It would be convenient to have a quick way of loading that output back into a pandas.DataFrame object.




What are the recommended ways of loading a dataframe from a dataframe string (which may or may not be properly formatted)?



Example-1



If you wanted to load the following string as a dataframe, what would you do?



# Dummy Data
s1 = """
Client NumberOfProducts ID

A 1 2
A 5 1
B 1 2
B 6 1
C 9 1
"""


Example-2




This format is closer to what you find in a CSV file.



# Dummy Data
s2 = """
Client, NumberOfProducts, ID
A, 1, 2
A, 5, 1
B, 1, 2
B, 6, 1
C, 9, 1

"""


Expected Output



[Image: the expected dataframe]





Note: The following two links do not address the specific situation presented in Example-1. I do not think my question is a duplicate, because the string in Example-1 cannot be loaded with any of the solutions posted on those links (at the time of writing).





  1. Create Pandas DataFrame from a string. Note that pd.read_csv(StringIO(s1), sep), as suggested here, doesn't really work for Example-1. You get the following output.
    [Image: the output of that call]


  2. This question was marked as a duplicate of two links. One of them is the one above, which fails to address the case presented in Example-1. Among all the answers on the second link, only one looked like it might work for Example-1, but it did not.




# Could not read the clipboard and raised an error
pd.read_clipboard(sep=r'\s\s+')



Error Thrown:



PyperclipException: 
Pyperclip could not find a copy/paste mechanism for your system.
For more information, please visit https://pyperclip.readthedocs.org

Answer



I can suggest two methods to approach this problem.



Method-1




Process the string with regex and numpy to build the dataframe. In my experience this works most of the time, and it works for the case presented in "Example-1".



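import re

import numpy as np
import pandas as pd

# Make Dataframe
s = s1  # the string defined in Example-1

ncols = 3  # number of columns in the printed dataframe
# Collapse every run of whitespace (newlines included) into a single comma
ss = re.sub(r'\s+', ',', s.strip())
# Split into a flat array of tokens, then reshape into rows of ncols tokens
sa = np.array(ss.split(',')).reshape(-1, ncols)
# First row holds the column names, the rest holds the data (all as strings)
df = pd.DataFrame(dict(zip(sa[0], sa[1:].T)))
df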
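Method-1 hard-codes the number of columns. As a rough sketch of how the same idea could be wrapped up, the column count can be read off the header line instead. The helper name frame_from_whitespace_text and the sample string are my own, for illustration only, and the sketch assumes that the first non-empty line is the header and that no column name or value contains whitespace.

```python
import re

import numpy as np
import pandas as pd


def frame_from_whitespace_text(text):
    """Build a DataFrame from whitespace-separated text (hypothetical helper).

    Assumes the first non-empty line is the header and that no column
    name or value contains whitespace.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    # Infer the column count from the header instead of hard-coding it
    ncols = len(lines[0].split())
    # Same idea as Method-1: flatten to tokens, then reshape row by row
    tokens = re.split(r'\s+', text.strip())
    if len(tokens) % ncols != 0:
        raise ValueError("token count is not a multiple of the column count")
    table = np.array(tokens).reshape(-1, ncols)
    # First row holds the column names, the remaining rows hold the data
    return pd.DataFrame(table[1:], columns=table[0])


# Stand-in for the string in Example-1
sample = """
Client NumberOfProducts ID

A 1 2
A 5 1
B 1 2
B 6 1
C 9 1
"""

df = frame_from_whitespace_text(sample)
print(df.columns.tolist())
print(df.shape)
```

Every value comes back as a string with this approach, so numeric columns would still need a conversion step, for example pd.to_numeric(df['ID']).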


Method-2



Use io.StringIO to feed the string into pandas.read_csv(). This only works if the separator is well defined, for instance if your data looks like "Example-2". Source credit




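from io import StringIO

import pandas as pd

# Make Dataframe
s = s2  # the string defined in Example-2
# skipinitialspace drops the space that follows each comma in Example-2
df = pd.read_csv(StringIO(s), sep=',', skipinitialspace=True)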
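One detail worth checking with data like "Example-2" is the space after each comma. The sketch below compares the column names read_csv produces with and without its skipinitialspace option. The sample string is a stand-in for Example-2, and the variable names are only illustrative.

```python
from io import StringIO

import pandas as pd

# Stand-in for Example-2: every comma is followed by a space
sample = """
Client, NumberOfProducts, ID
A, 1, 2
A, 5, 1
B, 1, 2
B, 6, 1
C, 9, 1

"""

# With only sep=',', the space after each comma stays in the column names
plain = pd.read_csv(StringIO(sample), sep=',')
print(plain.columns.tolist())

# skipinitialspace=True drops whitespace that directly follows the delimiter
clean = pd.read_csv(StringIO(sample), sep=',', skipinitialspace=True)
print(clean.columns.tolist())
```

In the first call the names come back as ' NumberOfProducts' and ' ID', each with a leading space, so a lookup such as plain['ID'] would raise a KeyError. The second call gives the names without the padding.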


Output




[Image: the resulting dataframe]

