Often people ask questions that include the output of print(dataframe). It is convenient to have a way of quickly loading that output into a pandas.DataFrame object.
What is/are the recommended way(s) of loading a dataframe from a dataframe-string (which may or may not be properly formatted)?
Example-1
If you want to load the following string as a dataframe what would you do?
# Dummy Data
s1 = """
Client NumberOfProducts ID
A 1 2
A 5 1
B 1 2
B 6 1
C 9 1
"""
Example-2
This type is closer to what you find in a CSV file.
# Dummy Data
s2 = """
Client, NumberOfProducts, ID
A, 1, 2
A, 5, 1
B, 1, 2
B, 6, 1
C, 9, 1
"""
Expected Output
Note: The following two links do not address the specific situation presented in Example-1. The reason I think my question is not a duplicate is that I think one cannot load the string in Example-1 using any of the solutions already posted on those links (at the time of writing).
Create Pandas DataFrame from a string. Note that pd.read_csv(StringIO(s1), sep), as suggested here, doesn't really work for Example-1. You get the following output.
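To make the failure concrete, here is a minimal sketch. It assumes the call is made with the default comma separator, since the exact sep value used in the linked answer is not shown above. Because the Example-1 string contains no commas, each line is read as a single cell:

```python
from io import StringIO

import pandas as pd

# Same string as Example-1
s1 = """
Client NumberOfProducts ID
A 1 2
A 5 1
B 1 2
B 6 1
C 9 1
"""

# Assumption: default separator (','); s1 contains no commas
df_bad = pd.read_csv(StringIO(s1))

print(df_bad.shape)          # one column instead of the three we want
print(list(df_bad.columns))  # the whole header line becomes one column name
```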
This question was marked as a duplicate of two links. One of them is the one above, which fails to address the case presented in Example-1. Among all the answers presented in the second one, only one looked like it might work for Example-1, but it did not work.
# Could not read the clipboard and threw an error
pd.read_clipboard(sep=r'\s\s+')
Error Thrown:
PyperclipException:
Pyperclip could not find a copy/paste mechanism for your system.
For more information, please visit https://pyperclip.readthedocs.org
Answer
I can suggest two methods to approach this problem.
Method-1
Process the string with regex and numpy to make the dataframe. In my experience this works most of the time. This would work for the case presented in "Example-1".
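import re
import numpy as np
import pandas as pd

# Make Dataframe
s = s1  # the string from Example-1
ncols = 3  # number of columns
# Collapse every run of whitespace (spaces and newlines) into a single comma
ss = re.sub(r'\s+', ',', s.strip())
# Split into cells and reshape into rows of ncols cells each
sa = np.array(ss.split(',')).reshape(-1, ncols)
# The first row holds the column names; the remaining rows hold the data (as strings)
df = pd.DataFrame(dict(zip(sa[0, :], sa[1:, :].T)))
df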
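Method-1 hard-codes the number of columns and leaves every value as a string. The sketch below wraps the same idea in a function that infers the column count from the header line and converts the columns that are fully numeric. The function name frame_from_whitespace_string is hypothetical, and the sketch assumes that the first non-empty line is the header and that no cell contains whitespace.

```python
import re

import numpy as np
import pandas as pd


def frame_from_whitespace_string(s):
    """Hypothetical helper: build a DataFrame from whitespace-separated text.

    Assumes the first non-empty line is the header and that no cell
    contains whitespace.
    """
    text = s.strip()
    # Infer the column count from the header line
    ncols = len(text.splitlines()[0].split())
    # Every run of whitespace (spaces or newlines) separates two cells
    cells = re.split(r"\s+", text)
    sa = np.array(cells).reshape(-1, ncols)
    df = pd.DataFrame(sa[1:], columns=sa[0].tolist())
    # Cells are strings at this point; convert the columns that are fully numeric
    for col in df.columns:
        try:
            df[col] = pd.to_numeric(df[col])
        except (ValueError, TypeError):
            pass  # leave non-numeric columns (such as Client) as strings
    return df


s1 = """
Client NumberOfProducts ID
A 1 2
A 5 1
B 1 2
B 6 1
C 9 1
"""

df1 = frame_from_whitespace_string(s1)
print(df1)
print(df1.dtypes)
```

If a row has a missing cell or a value that contains a space, the cell count is no longer a multiple of the column count and reshape raises a ValueError, so this approach only suits cleanly printed frames.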
Method-2
Use io.StringIO to feed the string into pandas.read_csv(). But this works only if the separator is well defined, for instance if your data looks similar to "Example-2". Source credit
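import pandas as pd
from io import StringIO

# Make Dataframe
s = s2  # the string from Example-2
# skipinitialspace=True drops the space that follows each comma in s2,
# so the column names and values carry no leading blanks
df = pd.read_csv(StringIO(s), sep=',', skipinitialspace=True)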
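A whitespace-separated string such as Example-1 also has a separator that can be described, namely a run of whitespace written as a regular expression. The sketch below is not part of the original answer. It relies on read_csv accepting a regular expression for sep, and it assumes that no cell contains whitespace.

```python
from io import StringIO

import pandas as pd

# Same string as Example-1
s1 = """
Client NumberOfProducts ID
A 1 2
A 5 1
B 1 2
B 6 1
C 9 1
"""

# Treat any run of whitespace as the column separator
df1 = pd.read_csv(StringIO(s1), sep=r"\s+")

print(df1)
print(df1.dtypes)
```

Compared with Method-1, this needs no column count and lets pandas infer the numeric dtypes on its own.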